LSM-Trees vs. B-Trees: The Tale of Two Delivery Companies
Mar 24, 2025
Once upon a time in DataTown, two delivery services, Lightning Speed Movers (LSM-Trees) and Balanced Routes Express (B-Trees), competed to be the fastest, most efficient delivery network. Both had very different ways of handling deliveries, and their unique styles made them a hot topic among the citizens.
⚡ Lightning Speed Movers (LSM-Trees)
At Lightning Speed Movers, their motto was: “Drop the packages off first, sort them later!”
How they operated:
-
Taking Orders: They jotted down every order quickly in a notebook (a log). No organizing upfront — just record and move on.
-
Delivering Packages: Packages were delivered in chaotic batches. Later, similar deliveries were grouped and optimized in what they called "compaction." Canceled orders? Crossed out and forgotten.
-
Finding a Package: Looking up an order involved flipping through multiple lists. To help, they used sticky notes (Bloom Filters) to guess where a package might be.
Strengths:
-
✅ Super fast for incoming orders.
-
✅ Great at handling delivery surges.
Weaknesses:
-
❌ Slower to find old packages.
-
❌ Needed frequent reorganizing.
🗂️ Balanced Routes Express (B-Trees)
Balanced Routes Express had a different motto: “Everything in its right place, always.”
How they operated:
-
Taking Orders: Every new order was placed immediately into a structured filing system — carefully sorted and stored.
-
Delivering Packages: No need for later reorganization. The routes were already optimized. Canceled orders? Instantly removed from the system.
-
Finding a Package: Packages could be located quickly using their neatly organized system. No guessing — just direct access.
Strengths:
-
✅ Extremely fast when looking up package status.
-
✅ Always balanced and efficient.
Weaknesses:
-
❌ Takes a bit longer to insert new orders.
-
❌ Not ideal for handling massive bursts of traffic.
🎯 The Big Showdown: Holiday Rush
A holiday sale flooded both companies with thousands of orders.
-
Lightning Speed Movers (LSM-Trees): Took in orders rapidly, but struggled when customers wanted tracking updates.
-
Balanced Routes Express (B-Trees): Took in orders slower, but tracking was instant thanks to its structured system.
Who Wins?
-
If you need high-speed writes for a massive influx of data (e.g., NoSQL like Cassandra, RocksDB, LevelDB), LSM-Trees are your best bet.
-
If you need structured, predictable lookups with a balance of reads and writes (e.g., MySQL, PostgreSQL), B-Trees are the way to go.
🛠️ Technical Breakdown: How LSM-Trees and B-Trees Work
LSM-Trees (Log-Structured Merge-Trees)
-
Writes: New data is appended to an in-memory table (MemTable) and later flushed to disk as an SSTable. LSM-Trees never modify files in place; instead, they append new data and later delete obsolete files through compaction.
-
Reads: Requires searching the MemTable first, then checking multiple SSTables.
-
Compaction: Periodic merging of SSTables to remove duplicates and optimize reads.
-
Segments: Data is stored in variable-sized segments, which grow until compaction merges them into optimized structures.
✔️ Best for high write throughput, great for NoSQL and logging systems.
❌ Reads can be slow when multiple SSTables need searching.
B-Trees (Balanced Trees)
-
Writes: Data is inserted into a balanced tree, ensuring sorted order. Unlike LSM-Trees, B-Trees overwrite data in place without changing the location of a page, meaning all references to that page remain intact.
-
Reads: Direct, predictable lookups with O(log n) complexity.
-
Rebalancing: The tree restructures itself as data is added or deleted.
-
Segments: Data is stored in fixed-size blocks (pages), which ensure consistent lookup performance and structured storage.
✔️ Best for fast lookups and databases requiring quick data retrieval.
❌ Writes are slower as data must be structured properly upfront.
For a deeper dive into these concepts, refer to the book "Designing Data-Intensive Applications" by Martin Kleppmann.
TL;DR: Quick Cheat Sheet