The Commit Log (CLOG, stored in pg_xact/) records the final status of every transaction: in-progress, committed, aborted, or sub-committed. The subtransaction log (pg_subtrans/) maps each subtransaction to its parent. Both are implemented using the SLRU (Simple LRU) buffered access mechanism.
| File | Purpose |
|---|---|
| src/backend/access/transam/clog.c | CLOG read/write implementation |
| src/include/access/clog.h | Transaction status constants, API |
| src/backend/access/transam/subtrans.c | Subtransaction parent mapping |
| src/include/access/subtrans.h | SubTransSetParent(), SubTransGetTopmostTransaction() |
| src/backend/access/transam/slru.c | SLRU buffer management framework |
| src/include/access/slru.h | SlruSharedData, page status enum |
Each transaction gets a 2-bit status in the CLOG:
#define TRANSACTION_STATUS_IN_PROGRESS 0x00
#define TRANSACTION_STATUS_COMMITTED 0x01
#define TRANSACTION_STATUS_ABORTED 0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
The SUB_COMMITTED status is special: it means a subtransaction has committed locally, but its parent transaction has not yet committed. The subtransaction’s effects only become visible when the top-level transaction commits.
With 2 bits per transaction, each 8 KB CLOG page holds the status of 32,768 transactions (8192 bytes × 4 statuses per byte). Pages are grouped into segment files of 32 pages (256 KB) each, stored in pg_xact/ (historically pg_clog/) and named with zero-padded hexadecimal segment numbers (e.g., 0000, 0001).
flowchart TD
subgraph "CLOG Page (8 KB)"
BYTE1["Byte 0:<br/>XID 0-3<br/>(2 bits each)"]
BYTE2["Byte 1:<br/>XID 4-7"]
DOTS["..."]
BYTEN["Byte 8191:<br/>XID 32764-32767"]
end
subgraph "On Disk: pg_xact/"
SEG0["0000<br/>(XIDs 0 - ~1M)"]
SEG1["0001<br/>(XIDs ~1M - ~2M)"]
SEG2["0002<br/>(...)"]
end
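The layout above reduces to integer arithmetic on the XID. The following is a minimal sketch of that mapping, assuming the default 8 KB block size and 32 pages per SLRU segment. `ClogLocation`, `clog_locate()` and `clog_segment_name()` are hypothetical names used for illustration, not PostgreSQL functions.

```c
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ                 8192  /* default PostgreSQL block size */
#define CLOG_BITS_PER_XACT     2
#define CLOG_XACTS_PER_BYTE    4
#define CLOG_XACTS_PER_PAGE    (BLCKSZ * CLOG_XACTS_PER_BYTE)  /* 32768 */
#define SLRU_PAGES_PER_SEGMENT 32

/* Hypothetical helper type: where one XID's status bits live. */
typedef struct ClogLocation
{
    uint32_t segment;    /* segment file number under pg_xact/ */
    uint32_t page;       /* page number, counted from XID 0 */
    uint32_t byte;       /* byte offset within the page */
    uint32_t bit_shift;  /* bit offset of the 2-bit field within the byte */
} ClogLocation;

static ClogLocation
clog_locate(uint32_t xid)
{
    ClogLocation loc;

    loc.page = xid / CLOG_XACTS_PER_PAGE;
    loc.segment = loc.page / SLRU_PAGES_PER_SEGMENT;
    loc.byte = (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;
    loc.bit_shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
    return loc;
}

/* Format a segment number as four uppercase hex digits, e.g. "0001". */
static void
clog_segment_name(uint32_t segment, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%04X", segment);
}
```

For example, XID 32768 is the first entry of page 1, and XID 1048576 (32 pages × 32,768 XIDs) is the first entry of segment 0001.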
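Writing: When a transaction commits, TransactionIdSetTreeStatus() sets the status of the top-level XID and all of its subtransaction XIDs. If every XID falls on the same CLOG page, the whole tree is marked COMMITTED in a single atomic step. If the XIDs span several pages, subtransactions on the other pages are first marked SUB_COMMITTED, then the top-level XID (together with any subtransactions on its page) is marked COMMITTED, and finally the remaining subtransactions are changed from SUB_COMMITTED to COMMITTED. The order matters for crash safety: if the system crashes before the top-level XID is marked, that XID still reads as IN_PROGRESS (the initial zero state), and recovery treats the whole tree as aborted.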
Reading: TransactionIdGetStatus() reads the 2-bit status for a given XID. This is called during visibility checks when hint bits are not yet set on the tuple.
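Reading and writing a status both come down to shifting and masking one byte. The sketch below works on a single in-memory page and assumes the 2-bit encoding listed earlier. `ToyClogPage`, `toy_get_status()` and `toy_set_status()` are illustrative stand-ins, and the locking, WAL interaction and buffer management of the real code are left out.

```c
#include <stdint.h>
#include <string.h>

#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define CLOG_XACT_BITMASK   ((1 << CLOG_BITS_PER_XACT) - 1)

#define TRANSACTION_STATUS_IN_PROGRESS 0x00
#define TRANSACTION_STATUS_COMMITTED   0x01
#define TRANSACTION_STATUS_ABORTED     0x02

/* Toy stand-in for one CLOG page held in a buffer slot. */
typedef struct ToyClogPage
{
    uint8_t data[BLCKSZ];
} ToyClogPage;

/* A fresh page is all zeros, i.e. every XID reads as IN_PROGRESS. */
static void
toy_page_init(ToyClogPage *page)
{
    memset(page->data, 0, BLCKSZ);
}

static int
toy_get_status(const ToyClogPage *page, uint32_t xid)
{
    uint32_t entry = xid % CLOG_XACTS_PER_PAGE;
    uint32_t byteno = entry / CLOG_XACTS_PER_BYTE;
    uint32_t shift = (entry % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    return (page->data[byteno] >> shift) & CLOG_XACT_BITMASK;
}

static void
toy_set_status(ToyClogPage *page, uint32_t xid, int status)
{
    uint32_t entry = xid % CLOG_XACTS_PER_PAGE;
    uint32_t byteno = entry / CLOG_XACTS_PER_BYTE;
    uint32_t shift = (entry % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
    uint8_t  b = page->data[byteno];

    b &= (uint8_t) ~(CLOG_XACT_BITMASK << shift);  /* clear the old 2 bits */
    b |= (uint8_t) (status << shift);              /* store the new status */
    page->data[byteno] = b;
}
```

Because four XIDs share a byte, a write must preserve the neighbouring entries, which is why the update is a read-modify-write rather than a plain store.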
The first time a visibility check encounters a tuple without hint bits, it calls TransactionIdDidCommit(), which reads from CLOG. Once the status is known, hint bits are set on the tuple header, so later checks of that tuple skip the CLOG lookup:
flowchart TD
VIS["Visibility check"] --> HINTS{"Hint bits set?"}
HINTS -->|Yes| DONE["Use cached status"]
HINTS -->|No| CLOG["TransactionIdGetStatus() from CLOG"]
CLOG --> STATUS{"Status?"}
STATUS -->|COMMITTED| SET_COMMITTED["Set HEAP_XMIN_COMMITTED"]
STATUS -->|ABORTED| SET_ABORTED["Set HEAP_XMIN_INVALID"]
STATUS -->|IN_PROGRESS| WAIT["Cannot set hints yet"]
STATUS -->|SUB_COMMITTED| PARENT["Look up parent via pg_subtrans"]
PARENT --> STATUS
SET_COMMITTED --> DONE
SET_ABORTED --> DONE
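The flow above can be sketched as a function that consults the hint bits first and only falls through to a status lookup when neither is set. This is a simplified illustration: `ToyTuple`, `toy_clog_status()` and `toy_xmin_committed()` are hypothetical, the CLOG is replaced by a small array, and the SUB_COMMITTED branch and the other conditions the real code checks before setting a hint bit are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRANSACTION_STATUS_IN_PROGRESS 0x00
#define TRANSACTION_STATUS_COMMITTED   0x01
#define TRANSACTION_STATUS_ABORTED     0x02

/* Hint-bit flags in the tuple header's infomask. */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200

/* Toy tuple header: only the fields this sketch needs. */
typedef struct ToyTuple
{
    uint32_t xmin;
    uint16_t infomask;
} ToyTuple;

#define TOY_CLOG_SIZE 8
static int toy_clog[TOY_CLOG_SIZE];  /* zero-initialized: IN_PROGRESS */
static int toy_clog_lookups = 0;     /* counts simulated CLOG reads */

/* Stand-in for a CLOG read; unknown XIDs read as IN_PROGRESS. */
static int
toy_clog_status(uint32_t xid)
{
    toy_clog_lookups++;
    return (xid < TOY_CLOG_SIZE) ? toy_clog[xid]
                                 : TRANSACTION_STATUS_IN_PROGRESS;
}

/* Did the inserting transaction commit? Caches the answer as a hint bit. */
static bool
toy_xmin_committed(ToyTuple *tup)
{
    if (tup->infomask & HEAP_XMIN_COMMITTED)
        return true;                 /* cached: no CLOG access */
    if (tup->infomask & HEAP_XMIN_INVALID)
        return false;                /* cached: no CLOG access */

    switch (toy_clog_status(tup->xmin))
    {
        case TRANSACTION_STATUS_COMMITTED:
            tup->infomask |= HEAP_XMIN_COMMITTED;
            return true;
        case TRANSACTION_STATUS_ABORTED:
            tup->infomask |= HEAP_XMIN_INVALID;
            return false;
        default:
            return false;            /* still running: leave hints unset */
    }
}
```

Calling the function twice on the same committed tuple performs one lookup, not two. A tuple whose inserter is still in progress triggers a lookup on every check until the outcome is final.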
The subtransaction log maps each subtransaction XID to its immediate parent XID. This allows PostgreSQL to walk up the parent chain to find the top-level transaction for any given XID.
void SubTransSetParent(TransactionId xid, TransactionId parent);
TransactionId SubTransGetParent(TransactionId xid);
TransactionId SubTransGetTopmostTransaction(TransactionId xid);
Each entry is a 4-byte TransactionId (the parent XID). With 8 KB pages, each page holds 2048 entries. Files are stored in pg_subtrans/.
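A minimal sketch of the parent-chain walk, using a flat array in place of the paged pg_subtrans storage. The `toy_` functions are hypothetical analogues of the API above, and the `TransactionId` typedef is declared locally so the example compiles on its own. The sketch simply follows the chain until it reaches an XID with no recorded parent and ignores any early-exit conditions the real function applies.

```c
#include <stdint.h>

typedef uint32_t TransactionId;            /* local stand-in for the real type */
#define InvalidTransactionId ((TransactionId) 0)

#define BLCKSZ 8192
#define SUBTRANS_XACTS_PER_PAGE ((uint32_t) (BLCKSZ / sizeof(TransactionId)))

#define TOY_MAX_XID 4096                   /* hypothetical bound for the sketch */
static TransactionId toy_parent[TOY_MAX_XID];  /* 0 means "no parent" */

static void
toy_set_parent(TransactionId xid, TransactionId parent)
{
    toy_parent[xid] = parent;
}

static TransactionId
toy_get_parent(TransactionId xid)
{
    return toy_parent[xid];
}

/* Follow parent links until an XID with no recorded parent is reached. */
static TransactionId
toy_get_topmost(TransactionId xid)
{
    TransactionId cur = xid;

    for (;;)
    {
        TransactionId parent = toy_get_parent(cur);

        if (parent == InvalidTransactionId)
            break;
        cur = parent;
    }
    return cur;
}

/* Page and slot that would hold xid's entry, given 4-byte entries. */
static uint32_t
toy_subtrans_page(TransactionId xid)
{
    return xid / SUBTRANS_XACTS_PER_PAGE;
}

static uint32_t
toy_subtrans_entry(TransactionId xid)
{
    return xid % SUBTRANS_XACTS_PER_PAGE;
}
```

With XID 102 nested under 101, and 101 nested under 100, `toy_get_topmost(102)` follows two links and returns 100.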
Subtransaction parent lookups are needed when:

- A snapshot’s subxip[] array has overflowed (suboverflowed = true) and a visibility check encounters an XID not in the main xip[] array. The check must determine whether the XID is a subtransaction of a known in-progress transaction.
- The CLOG status is SUB_COMMITTED and the system needs to find the top-level transaction to check its commit status.
Each backend caches up to PGPROC_MAX_CACHED_SUBXIDS (64) subtransaction XIDs in its PGPROC struct:
#define PGPROC_MAX_CACHED_SUBXIDS 64
struct XidCache
{
TransactionId xids[PGPROC_MAX_CACHED_SUBXIDS];
};
When a snapshot is taken, these cached XIDs are copied into the snapshot’s subxip[] array. If any backend has more than 64 active subtransactions, its cache overflows, and every snapshot taken while that backend's transaction is still running has suboverflowed set. Visibility checks using such a snapshot must be prepared to fall back to pg_subtrans lookups.
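The overflow behaviour can be illustrated with a bounded cache and a flag. In this sketch `ToySubxidCache`, `ToySnapshot` and both functions are hypothetical. It also assumes that an overflowed cache is treated as unusable rather than partially copied, a simplification chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGPROC_MAX_CACHED_SUBXIDS 64
#define TOY_MAX_BACKENDS 4              /* hypothetical limit for this sketch */

typedef uint32_t TransactionId;         /* local stand-in for the real type */

/* Per-backend cache: XIDs plus the count and overflow flag. */
typedef struct ToySubxidCache
{
    TransactionId xids[PGPROC_MAX_CACHED_SUBXIDS];
    int           count;
    bool          overflowed;
} ToySubxidCache;

typedef struct ToySnapshot
{
    TransactionId subxip[TOY_MAX_BACKENDS * PGPROC_MAX_CACHED_SUBXIDS];
    int           subxcnt;
    bool          suboverflowed;
} ToySnapshot;

/* Record a new subtransaction; past 64 entries only the flag is set. */
static void
toy_cache_add(ToySubxidCache *cache, TransactionId xid)
{
    if (cache->count < PGPROC_MAX_CACHED_SUBXIDS)
        cache->xids[cache->count++] = xid;
    else
        cache->overflowed = true;
}

/*
 * Fold one backend's cache into a snapshot. Assumption: an overflowed
 * cache contributes no XIDs and just marks the snapshot as overflowed.
 * The caller must not fold in more than TOY_MAX_BACKENDS caches.
 */
static void
toy_snapshot_add_backend(ToySnapshot *snap, const ToySubxidCache *cache)
{
    if (cache->overflowed)
    {
        snap->suboverflowed = true;
        return;
    }
    for (int i = 0; i < cache->count; i++)
        snap->subxip[snap->subxcnt++] = cache->xids[i];
}
```

A single backend exceeding the limit is enough to set the flag on the snapshot, even if every other backend's cache is complete.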
Both CLOG and subtrans are built on the SLRU framework, which provides a simple LRU buffer pool for page-structured data stored on disk.
typedef struct SlruSharedData
{
int num_slots; /* number of buffer slots */
char **page_buffer; /* page data arrays */
SlruPageStatus *page_status; /* EMPTY, READ_IN_PROGRESS, VALID, WRITE_IN_PROGRESS */
bool *page_dirty;
int64 *page_number;
int *page_lru_count;
LWLockPadded *buffer_locks; /* per-buffer I/O locks */
LWLockPadded *bank_locks; /* bank-level access locks */
pg_atomic_uint64 latest_page_number;
} SlruSharedData;
| State | Meaning |
|---|---|
| SLRU_PAGE_EMPTY | Slot not in use |
| SLRU_PAGE_READ_IN_PROGRESS | I/O in progress reading page from disk |
| SLRU_PAGE_VALID | Page loaded and available |
| SLRU_PAGE_WRITE_IN_PROGRESS | Being written to disk |
Modern PostgreSQL partitions SLRU slots into banks, each with its own LWLock. This reduces contention compared to the original single-lock design. Victim selection happens within a bank, using per-bank LRU counters.
flowchart TD
subgraph "SLRU Buffer Pool"
subgraph "Bank 0"
S0["Slot 0<br/>Page 42"]
S1["Slot 1<br/>Page 17"]
S2["Slot 2<br/>EMPTY"]
end
subgraph "Bank 1"
S3["Slot 3<br/>Page 99"]
S4["Slot 4<br/>Page 3"]
S5["Slot 5<br/>Page 55"]
end
end
DISK["pg_xact/ files on disk"]
S0 -.->|"read/write"| DISK
S3 -.->|"read/write"| DISK
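A rough sketch of per-bank victim selection follows. It assumes that a page maps to a bank by page number modulo the bank count, and that recency is tracked by comparing each slot's last-use counter with a per-bank counter. The bank size of 3 is chosen to keep the example small and is not PostgreSQL's real value. Every name here is hypothetical.

```c
#include <stdint.h>

#define TOY_BANK_SIZE 3   /* slots per bank; illustrative only */
#define TOY_NUM_BANKS 2
#define TOY_NUM_SLOTS (TOY_BANK_SIZE * TOY_NUM_BANKS)
#define TOY_EMPTY     (-1)

typedef struct ToySlru
{
    int64_t page_number[TOY_NUM_SLOTS];     /* TOY_EMPTY if slot unused */
    int     page_lru_count[TOY_NUM_SLOTS];  /* bank counter at last use */
    int     bank_lru_count[TOY_NUM_BANKS];  /* advances on every access */
} ToySlru;

static void
toy_slru_init(ToySlru *s)
{
    for (int i = 0; i < TOY_NUM_SLOTS; i++)
    {
        s->page_number[i] = TOY_EMPTY;
        s->page_lru_count[i] = 0;
    }
    for (int b = 0; b < TOY_NUM_BANKS; b++)
        s->bank_lru_count[b] = 0;
}

/* Assumption: a page can only live in the bank chosen by this mapping. */
static int
toy_bank_for_page(int64_t pageno)
{
    return (int) (pageno % TOY_NUM_BANKS);
}

/* Make pageno resident and return its slot, evicting within its bank. */
static int
toy_slru_load(ToySlru *s, int64_t pageno)
{
    int bank = toy_bank_for_page(pageno);
    int first = bank * TOY_BANK_SIZE;
    int chosen = -1;
    int best_age = -1;

    /* Pass 1: already cached? */
    for (int slot = first; slot < first + TOY_BANK_SIZE; slot++)
        if (s->page_number[slot] == pageno)
            chosen = slot;

    /* Pass 2: otherwise take an empty slot, or the least recently used. */
    for (int slot = first; chosen < 0 && slot < first + TOY_BANK_SIZE; slot++)
        if (s->page_number[slot] == TOY_EMPTY)
            chosen = slot;
    if (chosen < 0)
    {
        for (int slot = first; slot < first + TOY_BANK_SIZE; slot++)
        {
            int age = s->bank_lru_count[bank] - s->page_lru_count[slot];

            if (age > best_age)
            {
                best_age = age;
                chosen = slot;
            }
        }
    }

    s->page_number[chosen] = pageno;
    s->page_lru_count[chosen] = ++s->bank_lru_count[bank];
    return chosen;
}
```

Because the search never leaves the page's bank, only that bank's lock would need to be held while choosing a victim, which is the contention benefit described above.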
| SLRU Instance | Directory | Entry Size | Purpose |
|---|---|---|---|
| CLOG | pg_xact/ | 2 bits | Transaction commit status |
| Subtrans | pg_subtrans/ | 4 bytes | Parent XID mapping |
| MultiXact Offsets | pg_multixact/offsets/ | 4 bytes | MultiXactId to offset |
| MultiXact Members | pg_multixact/members/ | variable | Members of a MultiXact |
| Commit Timestamp | pg_commit_ts/ | 10 bytes | Commit timestamps |
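CLOG and subtrans pages for sufficiently old XIDs are no longer needed. CLOG pages can go once every XID they cover is older than the oldest unfrozen XID in the cluster. Subtrans pages can go once their XIDs are older than any transaction that could still be considered running. PostgreSQL truncates both logs periodically:

- TruncateCLOG(oldestXact) removes pages for transactions older than oldestXact. It runs at the end of VACUUM, when the cluster-wide frozen-XID horizon advances.
- TruncateSUBTRANS(oldestXact) does the same for the subtransaction log. It runs during checkpoints.

CLOG page creation and truncation are WAL-logged: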
#define CLOG_ZEROPAGE 0x00 /* new page initialized to zeros */
#define CLOG_TRUNCATE 0x10 /* old pages removed */
However, individual status changes (marking a transaction committed) within an existing page are not separately WAL-logged. The commit status is recoverable from the transaction’s commit WAL record – during recovery, PostgreSQL replays commit records and sets the CLOG status accordingly.
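The truncation point described earlier is again page arithmetic: find the page that holds the oldest XID still needed, then drop every segment file that lies wholly before it. The sketch below deliberately ignores XID wraparound, which the real code must handle. Both function names are hypothetical.

```c
#include <stdint.h>

#define CLOG_XACTS_PER_PAGE    32768  /* 8 KB page, 2 bits per XID */
#define SLRU_PAGES_PER_SEGMENT 32

/* Page holding oldest_xid; every earlier page is no longer needed. */
static uint32_t
toy_cutoff_page(uint32_t oldest_xid)
{
    return oldest_xid / CLOG_XACTS_PER_PAGE;
}

/*
 * Number of leading segment files that can be removed. A segment is only
 * removable if all 32 of its pages precede the cutoff page, so the count
 * is the cutoff page rounded down to a segment boundary.
 * Simplification: assumes XIDs have not wrapped around.
 */
static uint32_t
toy_removable_segments(uint32_t oldest_xid)
{
    return toy_cutoff_page(oldest_xid) / SLRU_PAGES_PER_SEGMENT;
}
```

With an oldest needed XID of 5,000,000, the cutoff page is 152 and segments 0000 through 0003 can be removed. Segment 0004 stays because it contains the cutoff page.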
| Structure | Location | Role |
|---|---|---|
| XidStatus | clog.h | 2-bit transaction status (int typedef) |
| xl_clog_truncate | clog.h | WAL record for CLOG truncation |
| SlruSharedData | slru.h | Shared-memory SLRU buffer pool |
| XidCacheStatus | proc.h | Per-backend: count + overflow flag for subxids |
| XidCache | proc.h | Per-backend: cached subtransaction XIDs |
When suboverflowed is true, visibility falls back to pg_subtrans lookups. See Snapshots.