I agree that the quality of each implementation is very pertinent here. I wonder whether the B-Tree implementation in question used the Lehman & Yao B-link technique to obviate the need for "latch coupling"/"crabbing". At the same time, it's interesting to consider whether the LSM implementation is similarly optimized to minimize low-level lock/semaphore overhead, if that's even possible.
In general, B-Tree index performance has plenty to do with low-level optimizations that go beyond what gets published in academic papers. I'm talking about things like minimizing the number of CPU cache misses on internal pages, the use of interpolation search, and so on.
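
For anyone unfamiliar with interpolation search, here's a rough sketch in Python. It assumes a sorted list of integer keys that are spread fairly evenly, which is my own simplification: the function name and the flat list are hypothetical, and this is not how any particular B-Tree implementation lays out or searches its pages.

```python
def interpolation_search(keys, target):
    """Return the index of target in the sorted list keys, or -1 if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            # All remaining keys are equal; avoid dividing by zero.
            return lo if keys[lo] == target else -1
        # Guess the position by assuming keys are evenly spread over [lo, hi].
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

The idea is that, instead of always probing the midpoint as binary search does, the probe position is estimated from the key values themselves. With evenly distributed keys this tends to need fewer probes; with skewed keys it can do worse than binary search.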