The Free Space Map (FSM) is a per-relation data structure that tracks the amount of free space available on each heap page. It enables INSERT to quickly find a page with enough room for a new tuple, without scanning the entire relation.
Each relation’s FSM is stored in a dedicated fork (FSM_FORKNUM, file suffix _fsm). Rather than recording exact byte counts, the FSM stores free space at a granularity of BLCKSZ / 256 (32 bytes with the default 8 KB page size), using a single byte per heap page. A value of 255 means nearly a full page is free; 0 means the page is full.
The FSM is organized as a three-level tree of pages, where each FSM page internally contains a binary tree (max-heap) of byte values. This design allows finding a page with sufficient free space in O(log N) time, where N is the number of heap pages.
The FSM is not WAL-logged. It is a hint structure that self-corrects: if it is lost or corrupted, the worst case is temporarily suboptimal page selection. VACUUM rebuilds it from scratch.
| File | Purpose |
|---|---|
src/include/storage/freespace.h |
Public API: GetPageWithFreeSpace(), RecordPageWithFreeSpace() |
src/include/storage/fsm_internals.h |
FSMPageData struct, per-page binary tree constants |
src/backend/storage/freespace/freespace.c |
Higher-level FSM logic: tree traversal across pages |
src/backend/storage/freespace/fsmpage.c |
Within-page binary tree operations: fsm_search_avail(), fsm_set_avail() |
src/backend/storage/freespace/indexfsm.c |
Simplified FSM interface for index free space |
src/backend/storage/freespace/README |
Comprehensive design document for the FSM |
Free space is encoded as a single byte using integer division:
stored_value = free_bytes / (BLCKSZ / 256)
= free_bytes / 32 (for BLCKSZ = 8192)
This means the FSM can distinguish free space in 32-byte increments. When searching for space to hold a tuple of size S bytes, the search value is:
search_value = (S + 31) / 32 (rounding up)
Each FSM page contains a binary tree stored as an array in FSMPageData:
typedef struct
{
int fp_next_slot; /* round-robin starting slot */
uint8 fp_nodes[FLEXIBLE_ARRAY_MEMBER]; /* binary tree array */
} FSMPageData;
The tree has two regions in fp_nodes[]:
0 to NonLeafNodesPerPage - 1: internal (non-leaf) nodes storing the maximum value of their children.NonLeafNodesPerPage to NodesPerPage - 1: leaf nodes, each representing one heap page.With default BLCKSZ = 8192:
| Constant | Value | Formula |
|---|---|---|
NodesPerPage |
~8140 | BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - offsetof(FSMPageData, fp_nodes) |
NonLeafNodesPerPage |
4095 | BLCKSZ / 2 - 1 |
LeafNodesPerPage |
~4045 | NodesPerPage - NonLeafNodesPerPage |
SlotsPerFSMPage |
~4045 | Same as LeafNodesPerPage |
The binary tree is not quite perfect because the page header consumes some space, so a few rightmost leaf nodes are missing.
graph TD
subgraph "FSM Page (binary tree in array)"
R["Root [0]<br/>max=200"]
L1["[1]<br/>max=200"]
R1["[2]<br/>max=150"]
L2["[3]<br/>200"]
R2["[4]<br/>100"]
L3["[5]<br/>150"]
R3["[6]<br/>50"]
end
R --- L1
R --- R1
L1 --- L2
L1 --- R2
R1 --- L3
R1 --- R3
style L2 fill:#c8e6c9
style R2 fill:#fff9c4
style L3 fill:#c8e6c9
style R3 fill:#fff9c4
note["Leaf nodes (green/yellow)<br/>= one heap page each<br/>Internal nodes = max of children"]
Search (fsm_search_avail): Starting from the root, descend to a leaf where value >= minvalue. When both children qualify, use fp_next_slot to alternate, spreading concurrent inserts across different pages.
Update (fsm_set_avail): Set a leaf’s value, then bubble up by recomputing each ancestor as max(left_child, right_child) until reaching the root or finding no change.
A single FSM page with ~4045 slots can track ~4045 heap pages (~31 MB). To support PostgreSQL’s maximum relation size of 2^32 blocks (~32 TB), the FSM uses a three-level tree of pages:
| Level | Contains | Represents |
|---|---|---|
| 0 (leaf) | Leaf values = free space on individual heap pages | ~4045 heap pages per FSM page |
| 1 | Leaf values = root values from level-0 pages | ~4045 level-0 pages |
| 2 (root) | Leaf values = root values from level-1 pages | ~4045 level-1 pages |
The root page is always at physical block 0 of the FSM fork. With three levels: 4045^3 > 66 billion > 2^32, so three levels always suffice.
graph TD
subgraph "Level 2 (root FSM page, block 0)"
R2ROOT["Root of level-2 page<br/>= max free space in entire relation"]
R2L0["Slot 0: max from L1-page-0"]
R2L1["Slot 1: max from L1-page-1"]
end
subgraph "Level 1 (FSM pages)"
L1P0["L1 Page 0"]
L1P0S0["Slot 0: max from L0-page-0"]
L1P0S1["Slot 1: max from L0-page-1"]
end
subgraph "Level 0 (leaf FSM pages)"
L0P0["L0 Page 0"]
L0P0S["Slots = free space<br/>for heap pages 0..4044"]
L0P1["L0 Page 1"]
L0P1S["Slots = free space<br/>for heap pages 4045..8089"]
end
R2L0 --> L1P0
L1P0S0 --> L0P0
L1P0S1 --> L0P1
subgraph "Heap Pages"
HP["heap page 0, 1, 2, ... 4044"]
end
L0P0S --> HP
The physical block number for a given (level, logical_page) is computed by a formula that counts how many pages at each level precede it:
block = n + (n/F + 1) + (n/F^2 + 1) + ... + 1
where F = SlotsPerFSMPage and n is the logical page number at level 0.
fsm_search_avail() to find a qualifying slot.Only one FSM page is locked (shared lock) at a time; the parent is released before locking the child. If the child has been concurrently modified and no longer has enough space, the search restarts from the root after correcting the parent’s stale value.
When VACUUM (or INSERT after a failed search) learns the actual free space on a heap page:
fsm_set_avail() to update the leaf and bubble up within the page.The FSM is not WAL-logged. Recovery mechanisms:
fsm_rebuild_page()).FreeSpaceMapVacuumRange() to propagate upward.wal_log_hints is enabled.GetPageWithFreeSpace() to find a target page. If no page has enough space, the relation is extended.FSM_FORKNUM.16384_fsm), managed by the same smgr/md infrastructure.PageHeaderData header, but the content area is the FSMPageData binary tree rather than line pointers and tuples.