PostgreSQL implements concurrency control through Multi-Version Concurrency Control (MVCC), a design where readers never block writers and writers never block readers. Rather than overwriting a row in place, PostgreSQL keeps multiple physical versions (tuples) of each row and uses visibility rules to determine which version a given transaction can see. Writers still block other writers of the same row.
This chapter walks through the full machinery that makes this possible – from the transaction ID stamped on every tuple, through the snapshot mechanism that decides visibility, to the commit log that records final outcomes.
Traditional lock-based concurrency control, such as strict two-phase locking, forces readers to wait for writers and vice versa. MVCC eliminates this contention at the cost of storing old row versions and periodically cleaning them up with VACUUM. The tradeoff generally favors MVCC for read-heavy workloads, which include many OLTP systems.
| Section | What It Covers |
|---|---|
| MVCC and Tuple Versioning | HeapTupleHeaderData, xmin/xmax, t_infomask hint bits, visibility rules |
| Snapshots | SnapshotData, ProcArray, GetSnapshotData(), xmin horizon |
| Isolation Levels | Read Committed vs Repeatable Read vs Serializable behavior |
| Serializable Snapshot Isolation | Cahill/Fekete algorithm, predicate locks, rw-conflict detection |
| CLOG and Subtransactions | Commit log (pg_xact), subtransaction tracking (pg_subtrans), SLRU buffer |
| Two-Phase Commit | PREPARE TRANSACTION, GlobalTransactionData, crash recovery |

Key source files:

| File | Purpose |
|---|---|
| src/include/access/htup_details.h | HeapTupleHeaderData, infomask flags |
| src/include/utils/snapshot.h | SnapshotData, SnapshotType enum |
| src/include/storage/proc.h | PGPROC shared-memory struct |
| src/include/storage/procarray.h | GetSnapshotData(), TransactionIdIsInProgress() |
| src/backend/access/transam/xact.c | Top-level transaction lifecycle |
| src/backend/access/transam/clog.c | Commit log read/write |
| src/backend/access/transam/subtrans.c | Subtransaction parent mapping |
| src/backend/access/transam/twophase.c | Two-phase commit |
| src/backend/storage/lmgr/predicate.c | SSI predicate lock manager |
| src/backend/access/heap/heapam_visibility.c | HeapTupleSatisfiesMVCC() and friends |

```mermaid
flowchart TD
    subgraph "Transaction Lifecycle"
        BEGIN["BEGIN"] --> ASSIGN["Assign XID (lazy)"]
        ASSIGN --> OPS["INSERT / UPDATE / DELETE"]
        OPS --> COMMIT["COMMIT"]
        COMMIT --> CLOG["Write status to CLOG"]
    end
    subgraph "Tuple Versioning"
        OPS -->|"stamps xmin"| INSERT_TUPLE["New tuple version"]
        OPS -->|"stamps xmax"| OLD_TUPLE["Old tuple version"]
    end
    subgraph "Visibility Decision"
        QUERY["SELECT"] --> SNAP["Get Snapshot"]
        SNAP --> VIS{"HeapTupleSatisfiesMVCC()"}
        VIS -->|"check xmin/xmax against snapshot"| VISIBLE["Return row"]
        VIS -->|"not visible"| SKIP["Skip row"]
        VIS -->|"hint bit unknown"| CLOG_LOOKUP["Look up CLOG"]
        CLOG_LOOKUP --> SET_HINT["Set hint bits on tuple"]
        SET_HINT --> VIS
    end
```

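
The visibility decision in the flowchart can be sketched as a toy model in C. This is not the code in heapam_visibility.c. Every name below (`ToyXid`, `ToySnapshot`, `ToyTuple`, `toy_tuple_visible`, `toy_demo_committed`) is hypothetical. The sketch also assumes that XIDs never wrap around, and it ignores hint bits, command IDs, a transaction's own changes, and row locks stored in xmax. The commit-status callback stands in for a CLOG lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;          /* hypothetical stand-in for TransactionId */
#define TOY_INVALID_XID 0u        /* xmax value meaning "never deleted" */

/* Toy snapshot: XIDs below xmin had finished, XIDs at or above xmax had
 * not started, and xip[] lists XIDs in between that were still running. */
typedef struct ToySnapshot {
    ToyXid        xmin;
    ToyXid        xmax;
    const ToyXid *xip;
    size_t        xcnt;
} ToySnapshot;

typedef struct ToyTuple {
    ToyXid xmin;                  /* inserting transaction */
    ToyXid xmax;                  /* deleting/updating transaction, or invalid */
} ToyTuple;

/* Hypothetical commit-status oracle standing in for a CLOG lookup. */
typedef bool (*ToyCommittedFn)(ToyXid xid);

/* Did the snapshot consider this XID still running (or not yet started)? */
bool toy_xid_in_progress(ToyXid xid, const ToySnapshot *snap)
{
    if (xid < snap->xmin)
        return false;
    if (xid >= snap->xmax)
        return true;
    for (size_t i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return true;
    return false;
}

/* A transaction's effect counts only if it committed before the snapshot. */
bool toy_xid_effect_visible(ToyXid xid, const ToySnapshot *snap,
                            ToyCommittedFn committed)
{
    return !toy_xid_in_progress(xid, snap) && committed(xid);
}

/* Visible if the insert counts and no delete counts. */
bool toy_tuple_visible(const ToyTuple *tup, const ToySnapshot *snap,
                       ToyCommittedFn committed)
{
    assert(snap->xmin <= snap->xmax);
    if (!toy_xid_effect_visible(tup->xmin, snap, committed))
        return false;
    if (tup->xmax == TOY_INVALID_XID)
        return true;
    return !toy_xid_effect_visible(tup->xmax, snap, committed);
}

/* Demo oracle: XID 103 aborted, every other XID committed. */
bool toy_demo_committed(ToyXid xid)
{
    return xid != 103u;
}
```

With a snapshot of xmin 100, xmax 105, and XID 102 still running, a tuple deleted by XID 102 stays visible in this model, while one deleted by the committed XID 101 does not. A tuple whose deleter aborted (XID 103 in the demo oracle) also stays visible.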
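PostgreSQL uses 32-bit transaction IDs that wrap around. Where an unambiguous value is needed, the 64-bit FullTransactionId (epoch in the high 32 bits, xid in the low 32 bits) is used instead. Comparisons between normal XIDs use modular arithmetic. TransactionIdPrecedes(), declared in src/include/access/transam.h, reduces to the following for normal XIDs (shown here as a simplified macro):

```c
/*
 * TransactionIdPrecedes -- is id1 logically before id2?
 *
 * Simplified: the real function also special-cases permanent XIDs
 * (those below FirstNormalTransactionId), which are compared as plain
 * unsigned values.
 */
#define TransactionIdPrecedes(id1, id2) \
    ((int32) ((id1) - (id2)) < 0)
```
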
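This means a normal XID compares correctly only against XIDs within about 2 billion (2^31) transactions of it. If the gap between the oldest unfrozen XID and the newest approaches that limit, wraparound becomes dangerous. This is why VACUUM freezes old tuples, why autovacuum forces anti-wraparound runs, and why XID wraparound is a production concern.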
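
The modular comparison and the 64-bit epoch-plus-xid layout can be tried out in a small standalone sketch. The names (`SketchXid`, `sketch_xid_precedes`, `sketch_full_xid`) are hypothetical and do not come from the PostgreSQL headers. The reserved-XID cutoff of 3 mirrors FirstNormalTransactionId. The cast from an out-of-range unsigned value to `int32_t` is implementation-defined in older C standards, although common compilers wrap it as two's complement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;           /* hypothetical stand-in for TransactionId */

/* XIDs 0, 1 and 2 are reserved (invalid, bootstrap, frozen). */
#define SKETCH_FIRST_NORMAL_XID 3u

/* Is a logically before b? Normal XIDs are compared modulo 2^32. */
bool sketch_xid_precedes(SketchXid a, SketchXid b)
{
    /* Permanent XIDs compare as plain unsigned values. */
    if (a < SKETCH_FIRST_NORMAL_XID || b < SKETCH_FIRST_NORMAL_XID)
        return a < b;

    /* The unsigned subtraction wraps; the sign of the 32-bit result
     * says which side of the 2^31 half-circle b falls on. */
    return (int32_t) (a - b) < 0;
}

/* Combine a wraparound count (epoch) and a 32-bit xid into 64 bits. */
uint64_t sketch_full_xid(uint32_t epoch, SketchXid xid)
{
    assert(xid >= SKETCH_FIRST_NORMAL_XID);
    return ((uint64_t) epoch << 32) | xid;
}
```

In this sketch, XID 4294967000 precedes XID 100 because the counter is assumed to have wrapped between them. Once two normal XIDs are more than 2^31 apart, the comparison flips, which is the failure that freezing prevents. The 64-bit form has no such limit: `sketch_full_xid(0, 4294967295u)` is always less than `sketch_full_xid(1, 3u)`.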