LSM-Trees vs. B-Trees: The Tale of Two Delivery Companies

Mar 24, 2025

LSM-Trees vs. B-Trees

Once upon a time in DataTown, two delivery services, Lightning Speed Movers (LSM-Trees) and Balanced Routes Express (B-Trees), competed to be the fastest, most efficient delivery network. Both had very different ways of handling deliveries, and their unique styles made them a hot topic among the citizens.


⚡ Lightning Speed Movers (LSM-Trees)

At Lightning Speed Movers, their motto was: “Drop the packages off first, sort them later!”

How they operated:

  • Taking Orders: They jotted down every order quickly in a notebook (a log). No organizing upfront — just record and move on.

  • Delivering Packages: Packages were delivered in chaotic batches. Later, similar deliveries were grouped and optimized in what they called "compaction." Canceled orders? Crossed out and forgotten.

  • Finding a Package: Looking up an order involved flipping through multiple lists. To help, they used sticky notes (Bloom Filters) to guess where a package might be.

Strengths:

  • ✅ Super fast for incoming orders.

  • ✅ Great at handling delivery surges.

Weaknesses:

  • ❌ Slower to find old packages.

  • ❌ Needed frequent reorganizing.


🗂️ Balanced Routes Express (B-Trees)

Balanced Routes Express had a different motto: “Everything in its right place, always.”

How they operated:

  • Taking Orders: Every new order was placed immediately into a structured filing system — carefully sorted and stored.

  • Delivering Packages: No need for later reorganization. The routes were already optimized. Canceled orders? Instantly removed from the system.

  • Finding a Package: Packages could be located quickly using their neatly organized system. No guessing — just direct access.

Strengths:

  • ✅ Extremely fast when looking up package status.

  • ✅ Always balanced and efficient.

Weaknesses:

  • ❌ Takes a bit longer to insert new orders.

  • ❌ Not ideal for handling massive bursts of traffic.


🎯 The Big Showdown: Holiday Rush

A holiday sale flooded both companies with thousands of orders.

  • Lightning Speed Movers (LSM-Trees): Took in orders rapidly, but struggled when customers wanted tracking updates.

  • Balanced Routes Express (B-Trees): Took in orders slower, but tracking was instant thanks to its structured system.

Who Wins?

  • If you need high-speed writes for a massive influx of data (e.g., NoSQL like Cassandra, RocksDB, LevelDB), LSM-Trees are your best bet.

  • If you need structured, predictable lookups with a balance of reads and writes (e.g., MySQL, PostgreSQL), B-Trees are the way to go.


🛠️ Technical Breakdown: How LSM-Trees and B-Trees Work

LSM-Trees (Log-Structured Merge-Trees)

  • Writes: New data is appended to an in-memory table (MemTable) and later flushed to disk as an SSTable. LSM-Trees never modify files in place; instead, they append new data and later delete obsolete files through compaction.

  • Reads: Requires searching the MemTable first, then checking multiple SSTables.

  • Compaction: Periodic merging of SSTables to remove duplicates and optimize reads.

  • Segments: Data is stored in variable-sized segments, which grow until compaction merges them into optimized structures.

✔️ Best for high write throughput, great for NoSQL and logging systems.

❌ Reads can be slow when multiple SSTables need searching.

B-Trees (Balanced Trees)

  • Writes: Data is inserted into a balanced tree, ensuring sorted order. Unlike LSM-Trees, B-Trees overwrite data in place without changing the location of a page, meaning all references to that page remain intact.

  • Reads: Direct, predictable lookups with O(log n) complexity.

  • Rebalancing: The tree restructures itself as data is added or deleted.

  • Segments: Data is stored in fixed-size blocks (pages), which ensure consistent lookup performance and structured storage.

✔️ Best for fast lookups and databases requiring quick data retrieval.

❌ Writes are slower as data must be structured properly upfront.


For a deeper dive into these concepts, refer to the book "Designing Data-Intensive Applications" by Martin Kleppmann.

TL;DR: Quick Cheat Sheet LSM-Trees vs. B-Trees