PostgreSQL allocates a single large shared memory region at postmaster startup.
Every backend inherits a mapping to this region via fork(). The region holds
the buffer pool, lock tables, ProcArray, PGPROC slots, and dozens of other
structures. For parallel queries and background workers, a second mechanism –
dynamic shared memory (DSM) – creates additional segments on demand.
This page covers both the fixed shared memory allocator and the dynamic shared memory subsystem.
Shared memory in PostgreSQL has two layers:
1. **Fixed shared memory (`shmem.c`)** – A bump allocator that carves the startup region into named chunks. Allocations are permanent; there is no free. Every subsystem requests its space during `CreateOrAttachShmemStructs()`.
2. **Dynamic shared memory (`dsm.c`, `dsm_impl.c`)** – On-demand segments created after startup, primarily for parallel query. These segments have explicit lifetimes and are cleaned up automatically when all mappings are removed.
```mermaid
graph TD
subgraph "Main Shared Memory Region"
Header["PGShmemHeader"]
ShmIdx["Shmem Index (Hash Table)"]
BufDesc["Buffer Descriptors"]
BufBlk["Buffer Blocks\n(NBuffers x 8KB)"]
Locks["Lock Partition\nHash Tables"]
PGP["PGPROC Array"]
PA["ProcArray +\nDense Arrays"]
CLOG["CLOG / Subtrans /\nMultixact Buffers"]
WAL["WAL Buffers"]
DSMCtrl["DSM Control Segment"]
Ext["Extension Shmem\n(pg_stat_statements, etc.)"]
Header --> ShmIdx --> BufDesc --> BufBlk --> Locks --> PGP --> PA --> CLOG --> WAL --> DSMCtrl --> Ext
end
subgraph "Dynamic Shared Memory"
DSMA["DSM Segment A\n(parallel query 1)\nshm_mq + plan state"]
DSMB["DSM Segment B\n(parallel query 2)\nshm_mq + plan state"]
end
DSMCtrl -. "tracks" .-> DSMA
DSMCtrl -. "tracks" .-> DSMB
```
| File | Role |
|---|---|
| `src/backend/storage/ipc/shmem.c` | Bump allocator, Shmem Index hash table |
| `src/backend/storage/ipc/ipci.c` | Startup orchestration, `CalculateShmemSize()` |
| `src/include/storage/shmem.h` | Public API: `ShmemAlloc`, `ShmemInitStruct`, `ShmemInitHash` |
| `src/include/storage/pg_shmem.h` | OS-level segment header (`PGShmemHeader`) |
| `src/backend/port/sysv_shmem.c` | System V / mmap shared memory creation (Unix) |
| `src/backend/storage/ipc/dsm.c` | DSM segment lifecycle management |
| `src/backend/storage/ipc/dsm_impl.c` | Platform backends for DSM |
| `src/include/storage/dsm.h` | DSM public API |
| `src/include/storage/dsm_impl.h` | DSM backend constants and `dsm_op` enum |
| `src/backend/storage/ipc/dsm_registry.c` | Named DSM segments that persist across backends |
The shared memory region starts with a `PGShmemHeader` at offset zero, followed
by a contiguous pool of unallocated bytes. The free offset, stored in the
header's `freeoffset` field, advances monotonically as subsystems claim space:
```
PGShmemHeader
+------------------+
| totalsize | Total bytes in the region
| freeoffset | -> Points to first free byte (bump pointer)
| ... |
+------------------+
| Shmem Index (HT) | Hash table mapping names -> {location, size}
+------------------+
| Buffer Descs | Allocated by BufferShmemInit
+------------------+
| Buffer Blocks | The actual 8 KB pages
+------------------+
| Lock Tables |
+------------------+
| PGPROC Array |
+------------------+
| ProcArray |
+------------------+
| ...more... |
+------------------+
| (free space) |
+------------------+
```
ShmemAlloc(size) simply advances the free pointer by size (aligned to
MAXIMUM_ALIGNOF) and returns the old pointer. The spinlock ShmemLock protects
concurrent allocation, but in practice allocations only happen during startup
in the postmaster (single-threaded).
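For illustration, here is a simplified sketch of that bump step, modeled on `ShmemAllocRaw()` in `shmem.c` (`shmem_alloc_sketch` is a hypothetical name; the real code also cache-aligns allocations and reports the rounded-up size back to the caller):

```c
#include "postgres.h"
#include "storage/pg_shmem.h"
#include "storage/shmem.h"
#include "storage/spin.h"

/* Hypothetical sketch of the bump-allocation step; error paths simplified. */
void *
shmem_alloc_sketch(PGShmemHeader *hdr, char *base, Size size)
{
	void	   *result;
	Size		newFree;

	size = MAXALIGN(size);				/* round up to MAXIMUM_ALIGNOF */

	SpinLockAcquire(ShmemLock);
	newFree = hdr->freeoffset + size;
	if (newFree <= hdr->totalsize)
	{
		result = base + hdr->freeoffset;	/* old free pointer is the allocation */
		hdr->freeoffset = newFree;
	}
	else
		result = NULL;					/* out of shared memory */
	SpinLockRelease(ShmemLock);

	return result;
}
```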
Each named structure is registered in the Shmem Index, a shared hash table
whose entries are ShmemIndexEnt:
```c
/* src/include/storage/shmem.h */
typedef struct
{
char key[SHMEM_INDEX_KEYSIZE]; /* 48-byte string name */
void *location; /* address in shared memory */
Size size; /* requested size */
Size allocated_size; /* actual allocated size */
} ShmemIndexEnt;
```
ShmemInitStruct(name, size, &found) looks up name in the index. If the
entry does not exist, it allocates size bytes and inserts the entry. If it
already exists, it returns the existing pointer and sets found = true. This
two-phase pattern allows both the postmaster (which creates) and forked backends
(which attach) to use identical code paths. It is especially important on
EXEC_BACKEND platforms (Windows) where backends cannot inherit pointers from
the postmaster.
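For example, an extension's `shmem_startup_hook` typically uses exactly this pattern (a sketch; `MyCounters` and the segment name are hypothetical, and the space must have been reserved earlier with `RequestAddinShmemSpace()` in a `shmem_request_hook`):

```c
#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

typedef struct MyCounters		/* hypothetical shared structure */
{
	long		hits;
	long		misses;
} MyCounters;

static MyCounters *counters = NULL;

static void
my_shmem_startup(void)
{
	bool		found;

	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
	counters = (MyCounters *)
		ShmemInitStruct("my_extension counters", sizeof(MyCounters), &found);
	if (!found)
	{
		/* First process here (normally the postmaster) initializes */
		counters->hits = 0;
		counters->misses = 0;
	}
	LWLockRelease(AddinShmemInitLock);
	/* Every other process falls through with the existing pointer */
}
```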
ipci.c contains CalculateShmemSize(), which calls every subsystem’s size
estimation function:
```c
/* Abbreviated from src/backend/storage/ipc/ipci.c */
size = 100000;		/* base overhead */
size = add_size(size, SpinlockSemaSize());
size = add_size(size, BufferShmemSize());
size = add_size(size, LockShmemSize());
size = add_size(size, ProcGlobalShmemSize());
size = add_size(size, ProcArrayShmemSize());
/* ... dozens more ... */
size = add_size(size, total_addin_request);		/* extensions */
```
add_size() checks for overflow at each step. The resulting total is passed to
PGSharedMemoryCreate(), which allocates the OS-level segment.
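`add_size()` itself is tiny; in `shmem.c` it is essentially:

```c
/*
 * Overflow-checked addition for shared memory sizing. Size is unsigned,
 * so wraparound shows up as a sum smaller than one of the operands.
 */
Size
add_size(Size s1, Size s2)
{
	Size		result;

	result = s1 + s2;
	if (result < s1 || result < s2)
		ereport(ERROR,
				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
				 errmsg("requested shared memory size overflows size_t")));
	return result;
}
```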
The fixed region is sized at startup and cannot grow. Parallel queries need per-query memory (tuple queues, worker state, hash tables) that varies wildly in size and lifetime. DSM provides this.
A small area of the fixed shared memory holds the DSM control segment: an
array of dsm_control_item slots, one per active DSM segment:
```c
/* src/backend/storage/ipc/dsm.c */
typedef struct dsm_control_item
{
	dsm_handle	handle;			/* Randomly generated 32-bit ID */
	uint32		refcnt;			/* 2+ = active, 1 = moribund, 0 = gone */
	size_t		first_page;		/* Location in the preallocated DSM pool */
	size_t		npages;			/* (used when min_dynamic_shared_memory > 0) */
	void	   *impl_private_pm_handle; /* Only needed on Windows */
	bool		pinned;			/* Pinned to postmaster lifetime? */
} dsm_control_item;
```
The number of slots is `PG_DYNSHMEM_FIXED_SLOTS + PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends`.
```
dsm_create(size)
|
| 1. Pick a random dsm_handle
| 2. Call dsm_impl_op(DSM_OP_CREATE, ...) to create OS segment
| 3. Map the segment into our address space
| 4. Register in DSM control array
| 5. Register resource-owner cleanup callback
|
v
dsm_segment *seg --> seg->mapped_address (usable memory)
|
| Share seg->handle with workers (via bgworker args or shm_toc)
|
dsm_attach(handle) [in worker]
|
| 1. Find handle in DSM control array
| 2. Call dsm_impl_op(DSM_OP_ATTACH, ...)
| 3. Map into worker's address space
|
v
dsm_segment *seg --> same physical memory, different virtual address
|
dsm_detach(seg) [automatic or explicit]
|
| 1. Run on_dsm_detach callbacks
| 2. Call dsm_impl_op(DSM_OP_DETACH, ...)
| 3. When last mapping removed: DSM_OP_DESTROY
```
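In code, the creator and worker halves of this flow look roughly like the following (a sketch; error handling and the mechanism for shipping the handle to the worker — bgworker arguments, `shm_toc`, etc. — are omitted):

```c
#include "postgres.h"
#include "storage/dsm.h"

/* Creator side (e.g., a parallel leader) */
static dsm_handle
leader_create_segment(void)
{
	dsm_segment *seg = dsm_create(1024 * 1024, 0);	/* 1 MB segment */
	char	   *base = dsm_segment_address(seg);	/* usable memory */

	/* ... lay out a shm_toc / shm_mq structures at 'base' ... */
	(void) base;
	return dsm_segment_handle(seg);					/* ship this to workers */
}

/* Worker side */
static void
worker_attach_segment(dsm_handle handle)
{
	dsm_segment *seg = dsm_attach(handle);
	char	   *base = dsm_segment_address(seg);	/* may differ from leader's address */

	/* ... communicate via the structures the leader laid out ... */
	(void) base;
	dsm_detach(seg);	/* last detach anywhere triggers DSM_OP_DESTROY (unless pinned) */
}
```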
By default, a DSM segment is destroyed when its creator’s resource owner is released. Two pinning mechanisms extend the lifetime:
- `dsm_pin_mapping(seg)` – Pins the mapping to the session. The segment stays mapped even after the current resource owner is released.
- `dsm_pin_segment(seg)` – Pins the segment to the postmaster. It survives until an explicit `dsm_unpin_segment()` or postmaster shutdown.

**min_dynamic_shared_memory GUC**

When set to a nonzero value, PostgreSQL pre-allocates a pool of DSM space inside
the main shared memory region. Small DSM requests are carved from this pool using
a free-page manager (`FreePageManager`), avoiding the overhead of OS-level segment
creation. This is especially beneficial on systems where `shm_open()` or `shmget()`
has high latency. An example configuration follows.
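A hypothetical `postgresql.conf` snippet (the values are illustrative, not recommendations):

```
# Pre-allocate a 64MB DSM pool inside the main shared memory region
min_dynamic_shared_memory = 64MB

# Platform backend for segments that don't fit the pool (see table below)
dynamic_shared_memory_type = posix
```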
Each backend tracks the segments it has mapped through a backend-local `dsm_segment` struct:

```c
/* src/backend/storage/ipc/dsm.c */
struct dsm_segment
{
dlist_node node; /* Link in backend-local list */
ResourceOwner resowner; /* Owning resource owner */
dsm_handle handle; /* 32-bit segment identifier */
uint32 control_slot; /* Slot in DSM control array */
void *impl_private; /* OS-level handle (fd, shmid, etc.) */
void *mapped_address; /* Virtual address of mapping */
Size mapped_size; /* Size of the mapping */
slist_head on_detach; /* Cleanup callback chain */
};
```
The dsm_impl_op() function dispatches to one of four platform backends:
| Constant | Mechanism | Notes |
|---|---|---|
| `DSM_IMPL_POSIX` | `shm_open()` / `mmap()` | Default on Linux, macOS, FreeBSD. Uses `/dev/shm` on Linux. |
| `DSM_IMPL_SYSV` | `shmget()` / `shmat()` | Fallback on older Unix systems. Subject to kernel `SHMMAX`/`SHMALL` limits. |
| `DSM_IMPL_MMAP` | `open()` / `mmap()` on files in `pg_dynshmem/` | Filesystem-backed. Useful when POSIX shm and SysV are unavailable. |
| `DSM_IMPL_WINDOWS` | `CreateFileMapping()` | Windows-only. |
The GUC dynamic_shared_memory_type selects which backend to use. The default
is determined at compile time by DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE.
dsm_registry.c provides named DSM segments that any backend can look up
by name. This is useful for extensions that need a shared memory region but
cannot predict which backend will create it first. The registry stores
{name -> dsm_handle} mappings in a hash table in the main shared memory region.
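A minimal sketch of registry usage, assuming the PostgreSQL 17 `GetNamedDSMSegment()` API (`MyRegState`, `my_reg_init`, and the segment name are hypothetical):

```c
#include "postgres.h"
#include "storage/dsm_registry.h"

typedef struct MyRegState		/* hypothetical shared state */
{
	int			counter;
} MyRegState;

/* Runs exactly once, in whichever backend first creates the segment */
static void
my_reg_init(void *ptr)
{
	MyRegState *state = (MyRegState *) ptr;

	state->counter = 0;
}

static MyRegState *
get_my_state(void)
{
	bool		found;

	/* Creates the named segment on first call anywhere; attaches afterwards */
	return (MyRegState *) GetNamedDSMSegment("my_extension_state",
											 sizeof(MyRegState),
											 my_reg_init,
											 &found);
}
```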
```
+==============================================================+
| Main Shared Memory Region |
| |
| PGShmemHeader |
| +----------------------------------------------------------+ |
| | Shmem Index (hash table) | |
| +----------------------------------------------------------+ |
| | Buffer Descriptors (NBuffers * sizeof(BufferDesc)) | |
| +----------------------------------------------------------+ |
| | Buffer Blocks (NBuffers * BLCKSZ) | |
| +----------------------------------------------------------+ |
| | Lock Partition Hash Tables | |
| +----------------------------------------------------------+ |
| | PGPROC Array (MaxBackends + aux + prepared) | |
| +----------------------------------------------------------+ |
| | ProcArray + Dense Arrays (xids[], statusFlags[], ...) | |
| +----------------------------------------------------------+ |
| | CLOG Buffers, Subtrans Buffers, Multixact Buffers | |
| +----------------------------------------------------------+ |
| | WAL Buffers | |
| +----------------------------------------------------------+ |
| | DSM Control Segment | |
| +----------------------------------------------------------+ |
| | Pre-allocated DSM Pool (if min_dynamic_shared_memory > 0)| |
| +----------------------------------------------------------+ |
| | Extension shmem (pg_stat_statements, etc.) | |
| +----------------------------------------------------------+ |
| | Free space (unused) | |
| +----------------------------------------------------------+ |
+==============================================================+
+========================+ +========================+
| DSM Segment A | | DSM Segment B |
| (parallel query 1) | | (parallel query 2) |
| | | |
| shm_toc -> shm_mq x N | | shm_toc -> shm_mq x M |
| query state, params | | query state, params |
+========================+ +========================+
```
**Fixed shmem never shrinks.** Once allocated, a chunk in the main region is permanent; there is no deallocation. This simplifies the allocator to a single spinlock-protected bump pointer, but it means the region must be sized correctly at startup.
**Same virtual address in every process.** The main shared memory region is mapped at the same address in every backend. This allows raw pointers to be stored in shared structures (e.g., linked lists of `PGPROC` nodes).
**DSM segments may map at different addresses.** Unlike the main region, a DSM segment can appear at a different virtual address in each process. Code that uses DSM must work with offsets or `shm_toc` lookups rather than raw pointers, as sketched below.
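A hand-rolled sketch of offset-based linking (`SharedNode` and `node_at` are hypothetical illustrations, not PostgreSQL APIs; `src/include/utils/relptr.h` provides macros for this pattern):

```c
#include "postgres.h"

/* Nodes inside a DSM segment link by offset from the segment base */
typedef struct SharedNode
{
	Size		next_off;		/* offset of next node; 0 marks end of list */
	int			value;
} SharedNode;

/* Translate an offset into a pointer valid in this process's mapping */
static inline SharedNode *
node_at(char *segbase, Size off)
{
	return (off == 0) ? NULL : (SharedNode *) (segbase + off);
}
```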
**Cleanup is automatic.** DSM segments are tied to resource owners, so a backend that exits or aborts detaches its mappings automatically. After a crash, the postmaster walks the DSM control segment and removes any orphaned segments as part of recovery.
- **Latches and Wait Events** – Latches (stored in `PGPROC`, which lives in fixed shared memory) are the signaling mechanism between processes.
- **ProcArray** – The `PGPROC` array and ProcArray dense arrays are allocated from fixed shared memory during startup.
- **Message Queues** – `shm_mq` queues are laid out inside DSM segments, organized by `shm_toc`.
- **Chapter 1 (Storage)** – The buffer pool is the single largest consumer of shared memory, often 90%+ of the total.
- **Chapter 5 (Locking)** – LWLock tranches and heavyweight lock hash tables are allocated from fixed shared memory.
- **Chapter 10 (Memory)** – DSA (Dynamic Shared Area) provides a palloc-style allocator on top of DSM segments, used by parallel hash joins and other consumers.