ποΈ 22012025 2351
ABSTRACT
Chapter 3 of DDIA talks about the details of how databases manage data on the disk - with the goal of making reads and writes efficient, even with large amounts of data
This is important because you need to know:
- how to select a storage engine that is appropriate for your application, from the many that are available
 - storage engine tuning > so that DB can perform well on your kind of workload
 
To break it down into smaller notes
- 
Hash Indexes
- A simple way to map keys to their values using a hash table.
 - Works well for fast lookups but struggles with range queries (e.g., "find all keys between X and Y").
 
 - 
SSTables (Sorted String Tables)
- A data structure where key-value pairs are sorted and stored in files on disk.
 - Because the data is sorted, SSTables are efficient for range queries and merge operations.
 
 - 
LSM Trees (Log-Structured Merge Trees)
- A write-optimized data structure that uses SSTables in a layered, hierarchical way.
 - Writes are first stored in memory (e.g., in a memtable) and periodically flushed to disk as sorted SSTables.
 - Efficiently handles writes and compacts data over time to optimize reads.
 
 
How They All Connectβ
- Hash Indexes are basic but fast for key-value lookups. Theyβre good for simple use cases but lack advanced features like range queries.
 - SSTables solve range query issues by storing sorted data on disk. However, you need to handle compaction and merging.
 - LSM Trees build on SSTables, organizing them into levels or tiers, enabling highly efficient writes while keeping reads manageable.
 
Key Takeawaysβ
- Hash Indexes: Fast, simple, but not flexible.
 - SSTables: Efficient range queries with immutable, sorted storage.
 - LSM Trees: Write-optimized solution that combines SSTables with compaction and layering.
 
Referencesβ
- DDIA Chapter 3
 - ChatGPT