A wizard-level exploration of the PostgreSQL codebase, from SQL string to disk blocks and back.
How to Read This Book
Each chapter follows a zoom-in / zoom-out pattern:
- Chapter index: bird’s-eye overview, key concepts, and how the subsystem fits into PostgreSQL as a whole
- Topic pages: deep dives into specific mechanisms, with source file references (file:line), struct layouts, and diagrams
- Connections section at the bottom of every page: links back out to related subsystems
You can read linearly or jump to any topic. The dependency arrows in each chapter index will guide you.
Prerequisites
- Comfortable reading C code
- Basic understanding of operating systems (processes, virtual memory, file I/O)
- Familiarity with SQL and relational databases
- A cloned PostgreSQL source tree (this book references src/ paths throughout)
Acknowledgments
Built by studying the PostgreSQL source code, READMEs in src/backend/*/README, and the following references:
Table of Contents
Architecture
- Memory Layout
- Process Model
- Query Lifecycle
Storage Engine
- Async I/O
- Buffer Manager
- Free Space Map
- Page Layout
- smgr and Forks
- Visibility Map
Access Methods
- BRIN Index
- B-tree Index
- GIN Index
- GiST Index
- Hash Index
- Heap Access Method
- SP-GiST Index
- Table AM API
Transactions & MVCC
- CLOG and Subtransactions
- Isolation Levels
- MVCC and Tuple Versioning
- Snapshots
- Serializable Snapshot Isolation
- Two-Phase Commit
Write-Ahead Logging
- Checkpoints
- Recovery
- WAL for Extensions
- WAL Internals
Locking
- Deadlock Detection
- Heavyweight Locks
- Lightweight Locks
- Predicate Locks
- Spinlocks
Parsing & Rewriting
- Lexer & Parser
- Rewrite Rules
- Semantic Analysis
Query Optimizer
- Cost Model
- GEQO
- Join Ordering
- Path Generation
- Plan Creation
- Preprocessing
Executor
- Aggregation
- Expression Evaluation
- Join Nodes
- Parallel Query
- Scan Nodes
- Sort and Materialize
- Volcano Iterator Model
Caches
- Catalog Cache
- Invalidation
- Plan Cache
- Relation Cache
- Type Cache
Memory Management
- Dynamic Shared Areas
- Memory Contexts
- Resource Owners
IPC
- Latches and Wait Events
- Message Queues
- ProcArray and PGPROC
- Shared Memory
Replication
- Conflict Resolution
- Logical Replication
- Streaming Replication
- Synchronous Replication
Statistics & Monitoring
- Activity Monitoring
- Extended Statistics
- pg_statistic and Single-Column Statistics
Platform Layer
- Atomic Operations and Memory Barriers
- I/O Backends
- Portability and OS Abstraction
- SIMD, CRC, and Hardware Acceleration
Extensions
- Background Workers
- Custom Access Methods
- Foreign Data Wrappers
- Hooks