The heap is PostgreSQL’s default (and currently only built-in) table access
method. It stores tuples in an unordered collection of 8 kB pages. Each tuple
carries MVCC metadata (xmin, xmax, cid, ctid) that enables snapshot
isolation without read locks. Two critical optimizations – Heap-Only Tuples
(HOT) and TOAST – address the costs of updates and large values,
respectively.
| File | Purpose |
|---|---|
src/backend/access/heap/heapam.c |
Core heap operations: insert, delete, update, scan |
src/backend/access/heap/heapam_handler.c |
heap_tableam_handler() – wires TableAmRoutine callbacks |
src/backend/access/heap/heapam_visibility.c |
Snapshot visibility functions (HeapTupleSatisfiesMVCC, etc.) |
src/backend/access/heap/heaptoast.c |
TOAST storage for oversized attributes |
src/backend/access/heap/hio.c |
Heap I/O – finding free space, extending relations |
src/backend/access/heap/pruneheap.c |
Page pruning, dead tuple removal, HOT chain repair |
src/backend/access/heap/vacuumlazy.c |
Lazy VACUUM implementation |
src/backend/access/heap/visibilitymap.c |
Two bits per page: all-visible, all-frozen |
src/backend/access/heap/README.HOT |
Detailed design notes for HOT updates |
src/include/access/heapam.h |
HeapScanDescData, HeapTupleFreeze, PruneFreezeResult |
src/include/access/htup.h |
HeapTupleData – the in-memory tuple handle |
src/include/access/htup_details.h |
HeapTupleHeaderData, HeapTupleFields, flag bits |
src/include/access/hio.h |
RelationGetBufferForTuple() prototype |
Every heap page follows the standard PostgreSQL page layout:
0 8192
+------------------+------+------+-----+--------------+
| PageHeaderData | lp1 | lp2 | ... | free space |
| (24 bytes) | (4B) | (4B) | | |
+------------------+------+------+-----+--+-----------+
|
+--------------------------------+
v
+---------+--------+---------+---+--------+
| tuple N | ... | tuple 2 | | tuple 1|
+---------+--------+---------+---+--------+
8192
ItemIdData) grow forward from after the header.pd_lower marks the end of line pointers; pd_upper marks the start of
the lowest tuple. Free space is between them.Every on-disk tuple begins with HeapTupleHeaderData (23 bytes before
alignment):
// src/include/access/htup_details.h
typedef struct HeapTupleFields
{
TransactionId t_xmin; // inserting transaction
TransactionId t_xmax; // deleting/locking transaction
union {
CommandId t_cid; // insert/delete command ID
TransactionId t_xvac; // old-style VACUUM full xact
} t_field3;
} HeapTupleFields;
Key flags in t_infomask:
HEAP_XMIN_COMMITTED / HEAP_XMIN_INVALID – hint bits to skip pg_xact lookupsHEAP_XMAX_INVALID – tuple is live (no deleter)HEAP_HOT_UPDATED – this tuple has a HOT successorHEAP_ONLY_TUPLE – this tuple is heap-only (not indexed)heap_getnextslot() iterates pages sequentially. For each page:
ReadBufferExtended).HeapTupleSatisfiesVisibility().TupleTableSlot.Synchronized scans (syncscan.c) allow concurrent sequential scans on the
same relation to share I/O by starting at similar offsets.
heap_insert()
-> RelationGetBufferForTuple() // find page with enough free space (via FSM)
-> heap_prepare_insert() // set xmin, infomask, alignment
-> XLogInsert() // write WAL record
-> PageAddItemExtended() // copy tuple into page
-> MarkBufferDirty()
heap_update() is effectively delete-old + insert-new, with special handling
for HOT. It:
t_xmax on the old tuple to the current XID.t_ctid.HOT is the single most important heap optimization. When an UPDATE does not change any indexed column AND the new tuple fits on the same page:
HEAP_HOT_UPDATED.HEAP_ONLY_TUPLE – it has no index entry.t_ctid of the old tuple points to the new tuple.graph LR
IDX["Index Entry"]
LP1["lp[1]"]
V1["Tuple v1<br/>xmin=99, xmax=100<br/>HEAP_HOT_UPDATED<br/>t_ctid -> lp[3]"]
LP3["lp[3]"]
V2["Tuple v2<br/>xmin=100<br/>HEAP_ONLY_TUPLE"]
IDX --> LP1
LP1 --> V1
V1 -. "t_ctid" .-> LP3
LP3 --> V2
subgraph "Single Heap Page"
LP1
V1
LP3
V2
end
style IDX fill:#ffe6e6,stroke:#333
style V1 fill:#fff3e6,stroke:#333
style V2 fill:#e6ffe6,stroke:#333
Index scans follow HOT chains: after finding a TID via the index, the heap AM
follows t_ctid pointers on the page until it finds the version visible to
the current snapshot.
Pruning (pruneheap.c) reclaims dead HOT chain members during normal
page access, without requiring VACUUM. This is called page-level cleanup
or micro-vacuum.
HOT chain example (single page):
Index entry -> lp[1]
|
v
tuple v1 (xmax=100, HOT_UPDATED)
t_ctid -> lp[3]
|
v
tuple v2 (xmin=100, ONLY_TUPLE)
After pruning (if xmax=100 is visible to all):
lp[1] now redirects to lp[3] (LP_REDIRECT)
tuple v1's space is reclaimed
When a tuple exceeds roughly 2 kB (the TOAST threshold, ~1/4 of a page), PostgreSQL applies TOAST strategies to the largest variable-length attributes:
| Strategy | attstorage |
Behavior |
|---|---|---|
| PLAIN | p |
No TOAST, value must fit in a page |
| EXTENDED | x |
Compress first, then store out-of-line if still too big |
| EXTERNAL | e |
Store out-of-line without compression |
| MAIN | m |
Compress only; try hard to keep in-line |
Out-of-line values go to a TOAST table (pg_toast.pg_toast_<oid>), a
regular heap table with a B-tree index on (chunk_id, chunk_seq). The
original attribute is replaced with an 18-byte varatt_external pointer.
Key source: src/backend/access/heap/heaptoast.c and
src/include/access/toast_internals.h.
// src/include/access/htup.h
typedef struct HeapTupleData
{
uint32 t_len; // length of tuple (incl. header)
ItemPointerData t_self; // SelfItemPointer (page, offset)
Oid t_tableOid; // table the tuple came from
HeapTupleHeader t_data; // pointer to the actual tuple header+data
} HeapTupleData;
// src/include/access/heapam.h
typedef struct HeapScanDescData
{
TableScanDescData rs_base; // base class (relation, snapshot, nkeys...)
// ... fields for controlling read-stream, block ranges,
// page-at-a-time mode, and synchronized scan state
} HeapScanDescData;
visibilitymap.c maintains two bits per heap page in a side fork:
HeapTupleSatisfiesMVCC(tuple, snapshot)
|
+-- Is xmin committed?
| No -> Is xmin the current txn? (check cid)
| Yes -> Is xmax set and committed?
| No -> VISIBLE
| Yes -> Was xmax committed before snapshot?
| Yes -> INVISIBLE (deleted)
| No -> VISIBLE (delete not yet visible)
+-- Set hint bits if outcome is definitive
Lazy VACUUM (vacuumlazy.c) performs two phases:
TidStore.ambulkdelete()
to remove index entries pointing to dead TIDs.LP_UNUSED, update the FSM.Freezing replaces old xmin/xmax values with FrozenTransactionId to
prevent XID wraparound. The HeapTupleFreeze struct describes the freeze plan
for each tuple.
ReadBuffer() /
ReleaseBuffer(). The FSM and VM are also buffer-managed fork files.heapam_xlog.c defines redo routines for all heap modifications.
Critical for crash recovery.(key, TID) pairs. The TID points into the
heap. HOT is specifically designed to reduce index maintenance costs.heapam_handler.c registers all the callbacks that make
the heap the default TableAmRoutine.