Memory contexts prevent memory leaks; resource owners prevent everything-else leaks — buffer pins, locks, file descriptors, cache references, and DSM segments are all guaranteed to be released at transaction end, even if the query fails mid-execution.
A ResourceOwner is a container that tracks non-memory resources acquired during query
execution. Just as memory contexts form a tree of lifetimes for memory, resource owners
form a parallel tree of lifetimes for resources such as buffer pins, heavyweight
(lock manager) locks, relation cache references, catalog cache references, tuple
descriptors, file descriptors, and DSM segments.
When a portal closes or a transaction ends, ResourceOwnerRelease() walks the tree and
releases all tracked resources in a well-defined three-phase order: first resources that
are visible to other backends (buffer pins, DSM segments), then locks, then
backend-internal resources (cache references, files). This guarantees that when a lock
is released, no externally-visible state depends on it.
Consider a query that pins three buffers, acquires two locks, opens a file for a sort run, and grabs several catalog cache references. If the query fails at any point, all of these resources must be released. Without centralized tracking, every function in the call chain would need error-handling code to release the specific resources it acquired. This is fragile and error-prone.
The ResourceOwner acts as a ledger. When a buffer is pinned, the pin is registered
with CurrentResourceOwner. When the pin is released normally, it is deregistered.
If the transaction aborts before the pin is released, ResourceOwnerRelease() finds
the orphaned pin and releases it silently. If a pin is still registered at commit,
it is released as well, but with a WARNING, since the code that acquired it should
have released it by then.
Resource owners mirror the transaction/subtransaction/portal nesting:
TopTransactionResourceOwner
  +-- CurTransactionResourceOwner (same as Top at top level)
  |     +-- Portal ResourceOwner
  |     +-- Portal ResourceOwner
  +-- Subtransaction ResourceOwner
        +-- Portal ResourceOwner
When a child resource owner is released (e.g., when a portal closes), its locks are handed to the parent on commit and released on abort; any other resources still registered are released outright.
| File | Role |
|---|---|
| src/include/utils/resowner.h | Public API: ResourceOwnerDesc, phases, priorities, ResourceOwnerRemember/Forget |
| src/backend/utils/resowner/resowner.c | Implementation: ResourceOwnerData, hash table, release logic |
| src/backend/utils/resowner/README | Design rationale and usage guide |
Every tracked resource type is described by a ResourceOwnerDesc:
typedef struct ResourceOwnerDesc
{
    const char *name;                           /* for debugging */
    ResourceReleasePhase release_phase;         /* BEFORE_LOCKS, LOCKS, or AFTER_LOCKS */
    ResourceReleasePriority release_priority;   /* ordering within phase */
    void (*ReleaseResource)(Datum res);         /* cleanup callback */
    char *(*DebugPrint)(Datum res);             /* optional: for leak warnings */
} ResourceOwnerDesc;
Resources are tracked through three operations:
/* Before acquiring a resource, ensure the hash table has space */
ResourceOwnerEnlarge(CurrentResourceOwner);
/* After acquiring the resource, register it */
ResourceOwnerRemember(CurrentResourceOwner, PointerGetDatum(myresource), &myresource_desc);
/* When releasing the resource normally */
ResourceOwnerForget(CurrentResourceOwner, PointerGetDatum(myresource), &myresource_desc);
The Enlarge / Remember / Forget pattern exists because Remember must not fail
(it may be called after a resource is already acquired), so Enlarge pre-allocates
hash table space while failure is still safe.
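The ordering constraint can be illustrated with a small standalone C sketch. It is a toy model, not PostgreSQL code: `ToyOwner`, `toy_enlarge`, `toy_remember`, and `toy_open_tracked` are hypothetical names, and a counter stands in for acquiring a real file descriptor. The only point is that the step that can fail runs before anything has been acquired.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a resource owner: a growable list of resource ids. */
typedef struct ToyOwner
{
    int    *slots;
    size_t  used;
    size_t  capacity;
} ToyOwner;

/* May fail (allocation), so it must be called BEFORE acquiring anything. */
static bool
toy_enlarge(ToyOwner *o)
{
    if (o->used < o->capacity)
        return true;            /* a free slot already exists */

    size_t newcap = o->capacity ? o->capacity * 2 : 4;
    int   *p = realloc(o->slots, newcap * sizeof(int));

    if (p == NULL)
        return false;           /* nothing acquired yet, so nothing leaks */
    o->slots = p;
    o->capacity = newcap;
    return true;
}

/* Cannot fail: toy_enlarge has already reserved the slot. */
static void
toy_remember(ToyOwner *o, int res)
{
    o->slots[o->used++] = res;
}

/* Hypothetical "acquire": hands out increasing fake descriptor numbers. */
static int toy_next_fd = 3;

static int
toy_open_tracked(ToyOwner *o)
{
    if (!toy_enlarge(o))        /* step 1: reserve space (may fail) */
        return -1;

    int fd = toy_next_fd++;     /* step 2: acquire the resource */

    toy_remember(o, fd);        /* step 3: record it (cannot fail) */
    return fd;
}
```

If the reservation and the acquisition were swapped, an allocation failure would leave an acquired descriptor that no ledger knows about.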
ResourceOwnerRelease() must be called three times, once per phase:
Phase 1: RESOURCE_RELEASE_BEFORE_LOCKS
Resources that are visible to other backends must be released before locks. If locks were released first, another backend could see inconsistent state.
| Resource | Priority | Why Before Locks |
|---|---|---|
| Buffer I/O (in-progress) | 100 | Must complete or cancel I/O before unpinning |
| Buffer pins | 200 | Other backends may be waiting to evict the page |
| Relcache references | 300 | Must close relation before releasing lock on it |
| DSM segments | 400 | Shared memory visible to other processes |
| JIT contexts | 500 | Backend-internal but involves shared state |
| Crypto/HMAC contexts | 600-700 | May hold OS resources |
Phase 2: RESOURCE_RELEASE_LOCKS
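Lock release is handled specially. When a subtransaction or portal owner is released on commit, its locks transfer to the parent resource owner, so they persist until the end of the top-level transaction. On abort, and when the top-level transaction itself ends, the locks are actually released.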
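A minimal sketch of the transfer-versus-release decision, assuming a toy owner with a small fixed-size lock array. `ToyLockOwner` and `toy_release_locks` are hypothetical names, the lock ids are arbitrary integers, and treating an owner without a parent as one that releases its locks even on commit is a simplification of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOCKS 8

typedef struct ToyLockOwner
{
    struct ToyLockOwner *parent;    /* NULL for the top-level owner */
    int  locks[TOY_MAX_LOCKS];      /* ids of locks held by this owner */
    int  nlocks;
} ToyLockOwner;

/*
 * Handle the lock phase for one owner.
 * Returns the number of locks actually released, or -1 if the parent's
 * fixed-size array has no room (a limitation of this toy only).
 */
static int
toy_release_locks(ToyLockOwner *o, bool is_commit)
{
    int released = 0;

    if (is_commit && o->parent != NULL)
    {
        /* Commit of a nested owner: the parent inherits the locks. */
        if (o->parent->nlocks + o->nlocks > TOY_MAX_LOCKS)
            return -1;
        for (int i = 0; i < o->nlocks; i++)
            o->parent->locks[o->parent->nlocks++] = o->locks[i];
    }
    else
    {
        /* Abort, or nothing to transfer to: the locks go away. */
        released = o->nlocks;
    }
    o->nlocks = 0;
    return released;
}
```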
Phase 3: RESOURCE_RELEASE_AFTER_LOCKS
Backend-internal resources that no other process can see:
| Resource | Priority | Description |
|---|---|---|
| Catcache references | 100 | Catalog cache entry reference counts |
| Catcache list references | 200 | Catalog cache list reference counts |
| Plan cache references | 300 | Cached plan reference counts |
| Tuple descriptor references | 400 | Reference-counted tuple descriptors |
| Snapshot references | 500 | Registered snapshots |
| Files | 600 | Open file descriptors |
| Wait event sets | 700 | Event monitoring handles |
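The phase and priority columns together define a total release order. The following standalone sketch sorts a handful of held resources by (phase, priority), lowest first, which is one straightforward way to obtain that order. `ToyKind`, `ToyPhase`, and the function names are hypothetical, and the real implementation may arrange its sort differently.

```c
#include <stdlib.h>

typedef enum ToyPhase
{
    TOY_BEFORE_LOCKS = 1,
    TOY_LOCKS,
    TOY_AFTER_LOCKS
} ToyPhase;

/* Toy counterpart of a resource kind: a name plus its release ordering. */
typedef struct ToyKind
{
    const char *name;
    ToyPhase    phase;
    unsigned    priority;
} ToyKind;

/* qsort comparator: earlier phase first, then lower priority number first. */
static int
toy_release_cmp(const void *a, const void *b)
{
    const ToyKind *ka = *(const ToyKind *const *) a;
    const ToyKind *kb = *(const ToyKind *const *) b;

    if (ka->phase != kb->phase)
        return (ka->phase < kb->phase) ? -1 : 1;
    if (ka->priority != kb->priority)
        return (ka->priority < kb->priority) ? -1 : 1;
    return 0;
}

/* Reorder an array of held resource kinds into release order. */
static void
toy_sort_for_release(const ToyKind **held, size_t n)
{
    qsort(held, n, sizeof(held[0]), toy_release_cmp);
}
```

With the priorities from the tables above, an in-progress buffer I/O (100) sorts ahead of a buffer pin (200), and both sort ahead of every AFTER_LOCKS resource.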
The ResourceOwnerData struct uses a hybrid storage strategy optimized for the common
case (few resources, short-lived):
struct ResourceOwnerData
{
    ResourceOwner parent;
    ResourceOwner firstchild;
    ResourceOwner nextchild;
    const char *name;
    bool releasing;         /* true during ResourceOwnerRelease */
    bool sorted;            /* true after sorting for release */
    uint8 nlocks;           /* locks in the local cache */
    uint8 narr;             /* items in the fixed array */
    uint32 nhash;           /* items in the hash table */

    /* Fixed-size array for most-recently-added resources (fast path) */
    ResourceElem arr[RESOWNER_ARRAY_SIZE];  /* 32 entries */

    /* Open-addressing hash table for overflow */
    ResourceElem *hash;
    uint32 capacity;        /* allocated slots */
    uint32 grow_at;         /* resize threshold */

    /* Separate fast cache for local locks */
    LOCALLOCK *locks[MAX_RESOWNER_LOCKS];   /* 15 entries */

    /* AIO handles (registered in critical sections) */
    dlist_head aio_handles;
};
Why a fixed array plus a hash table? The most common pattern is: acquire a resource, use it briefly, release it (e.g., pin a buffer, read a tuple, unpin). The fixed array of 32 entries handles this fast path with a simple linear scan. Only when the array fills up do entries spill into the hash table.
Why a separate lock cache? Lock operations are extremely frequent. Rather than storing locks in the general-purpose array/hash, each resource owner has a dedicated 15-entry cache. When this overflows, the lock manager’s own hash table is used as fallback, but the per-owner cache speeds up the common case of a handful of locks per query.
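The array-plus-hash idea can be sketched in a few dozen lines of standalone C. This is a toy under stated assumptions, not the PostgreSQL implementation: the names are hypothetical, the array holds 4 entries instead of 32, the hash table has a fixed 16 slots and never grows, and resource ids must be nonzero because 0 marks an empty slot.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ARRAY_SIZE 4    /* small on purpose; the text describes 32 */
#define TOY_HASH_SIZE  16   /* fixed power of two; no growth in this toy */
#define TOY_EMPTY      0u   /* ids must be nonzero */

typedef struct ToyStore
{
    uint32_t arr[TOY_ARRAY_SIZE];   /* most recently added ids */
    uint32_t narr;
    uint32_t hash[TOY_HASH_SIZE];   /* open addressing, linear probing */
    uint32_t nhash;
} ToyStore;

static uint32_t
toy_home_slot(uint32_t id)
{
    return (id * 2654435761u) & (TOY_HASH_SIZE - 1);
}

/* Caller guarantees at least one empty slot. */
static void
toy_hash_insert(ToyStore *s, uint32_t id)
{
    uint32_t idx = toy_home_slot(id);

    while (s->hash[idx] != TOY_EMPTY)
        idx = (idx + 1) & (TOY_HASH_SIZE - 1);
    s->hash[idx] = id;
    s->nhash++;
}

/* Add an id; a full array is first spilled into the hash table. */
static bool
toy_store_add(ToyStore *s, uint32_t id)
{
    if (s->narr == TOY_ARRAY_SIZE)
    {
        if (s->nhash + s->narr > TOY_HASH_SIZE)
            return false;           /* toy limitation: table cannot grow */
        for (uint32_t i = 0; i < s->narr; i++)
            toy_hash_insert(s, s->arr[i]);
        s->narr = 0;
    }
    s->arr[s->narr++] = id;         /* fast path: plain append */
    return true;
}

/* Remove an id: scan the array newest-first, then fall back to the hash. */
static bool
toy_store_remove(ToyStore *s, uint32_t id)
{
    for (uint32_t i = s->narr; i-- > 0;)
    {
        if (s->arr[i] == id)
        {
            s->arr[i] = s->arr[s->narr - 1];    /* fill the hole */
            s->narr--;
            return true;
        }
    }

    /* Probe every slot, so emptied slots never hide later entries. */
    uint32_t idx = toy_home_slot(id);

    for (uint32_t i = 0; i < TOY_HASH_SIZE; i++)
    {
        if (s->hash[idx] == id)
        {
            s->hash[idx] = TOY_EMPTY;
            s->nhash--;
            return true;
        }
        idx = (idx + 1) & (TOY_HASH_SIZE - 1);
    }
    return false;
}
```

A resource that is added and removed right away never leaves the array, so the common acquire/use/release cycle pays only for a short linear scan.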
When releasing, the tree is processed children before parents within each phase. The full sequence for a three-phase release of a parent with one child:
Phase 1 (BEFORE_LOCKS):
child: release buffer I/Os (priority 100)
child: release buffer pins (priority 200)
child: release relcache refs (priority 300)
...
parent: release buffer I/Os (priority 100)
parent: release buffer pins (priority 200)
parent: release relcache refs (priority 300)
...
Phase 2 (LOCKS):
child: release/transfer locks
parent: release/transfer locks
Phase 3 (AFTER_LOCKS):
child: release catcache refs (priority 100)
child: release plan cache refs (priority 300)
child: release files (priority 600)
...
parent: release catcache refs (priority 100)
parent: release plan cache refs (priority 300)
parent: release files (priority 600)
...
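The traversal shown above amounts to running one depth-first pass per phase, visiting children before their parent. A minimal standalone sketch, assuming a toy tree with the same firstchild/nextchild linkage; `ToyNode`, `ToyLog`, and the function names are hypothetical, and the log merely records the visiting order instead of releasing anything.

```c
#include <stddef.h>

#define TOY_LOG_MAX 32

/* Toy owner node using firstchild/nextchild sibling linkage. */
typedef struct ToyNode
{
    int             id;
    struct ToyNode *firstchild;
    struct ToyNode *nextchild;
} ToyNode;

/* Records (phase, owner id) pairs in the order they are processed. */
typedef struct ToyLog
{
    int phase[TOY_LOG_MAX];
    int id[TOY_LOG_MAX];
    int n;
} ToyLog;

/* Process one phase for a subtree: all children first, then the node. */
static void
toy_release_phase(const ToyNode *node, int phase, ToyLog *log)
{
    for (const ToyNode *c = node->firstchild; c != NULL; c = c->nextchild)
        toy_release_phase(c, phase, log);

    if (log->n < TOY_LOG_MAX)
    {
        log->phase[log->n] = phase;
        log->id[log->n] = node->id;
        log->n++;
    }
}

/* The caller drives the three phases, one full tree pass per phase. */
static void
toy_release_all_phases(const ToyNode *root, ToyLog *log)
{
    for (int phase = 1; phase <= 3; phase++)
        toy_release_phase(root, phase, log);
}
```

Because each phase is a complete pass over the tree, no owner starts phase 2 until every owner, parent or child, has finished phase 1.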
At commit, if ResourceOwnerRelease finds any resources still registered, it emits
a WARNING for each one (using the DebugPrint callback if available). This indicates a
bug: the code that acquired the resource should have released it before commit. At
abort, finding unreleased resources is expected and normal — that is the entire
point of the resource owner mechanism.
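The asymmetry between commit and abort can be sketched as follows. This is a toy model with hypothetical names (`ToyRes`, `toy_release_leftovers`), and the message text is illustrative only; what it shows is that leftovers are released in both cases, while only the commit path treats them as a bug worth reporting.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy record of one tracked resource. */
typedef struct ToyRes
{
    const char *kind;   /* e.g. "buffer pin" */
    int         id;
    bool        held;   /* still registered with the owner? */
} ToyRes;

/*
 * Release whatever is still held. On commit each leftover is reported,
 * on abort it is cleaned up silently. Returns the number of warnings.
 */
static int
toy_release_leftovers(ToyRes *res, int n, bool is_commit)
{
    int warnings = 0;

    for (int i = 0; i < n; i++)
    {
        if (!res[i].held)
            continue;           /* released normally, nothing to do */
        if (is_commit)
        {
            fprintf(stderr, "WARNING: resource was not closed: %s %d\n",
                    res[i].kind, res[i].id);
            warnings++;
        }
        res[i].held = false;    /* cleanup happens either way */
    }
    return warnings;
}
```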
Extensions can track custom resources by defining a ResourceOwnerDesc:
static const ResourceOwnerDesc myresource_desc = {
    .name = "MyFancyResource",
    .release_phase = RESOURCE_RELEASE_AFTER_LOCKS,
    .release_priority = RELEASE_PRIO_FIRST,
    .ReleaseResource = ReleaseMyFancyResource,
    .DebugPrint = NULL      /* use default format */
};
Then use ResourceOwnerEnlarge, ResourceOwnerRemember, and ResourceOwnerForget
as shown above. The alternative RegisterResourceReleaseCallback API still exists
for legacy compatibility but is less convenient.
ResourceOwnerData
 +-- parent / firstchild / nextchild   (tree linkage, same pattern as MemoryContextData)
 +-- arr[32]                           (fixed array of ResourceElem)
 |     Each entry: { Datum item, const ResourceOwnerDesc *kind }
 +-- hash (dynamically allocated)      (open-addressing hash table)
 |     Initial capacity: 64 slots
 |     Grows at 75% fill (minus array size headroom)
 +-- locks[15]                         (LOCALLOCK* fast cache)
 +-- aio_handles                       (dlist for async I/O handles)
typedef struct ResourceElem
{
    Datum item;                     /* the tracked resource (pointer, fd, etc.) */
    const ResourceOwnerDesc *kind;  /* NULL means empty slot in hash table */
} ResourceElem;
RESOURCE_RELEASE_BEFORE_LOCKS
    priority 100: Buffer I/Os
    priority 200: Buffer Pins
    priority 300: Relcache Refs
    priority 400: DSM Segments
    priority 500: JIT Contexts
    priority 600: Crypto Hash Contexts
    priority 700: HMAC Contexts
RESOURCE_RELEASE_LOCKS
    (locks handled specially: transfer on commit, release on abort)
RESOURCE_RELEASE_AFTER_LOCKS
    priority 100: Catcache Refs
    priority 200: Catcache List Refs
    priority 300: Plan Cache Refs
    priority 400: Tuple Descriptor Refs
    priority 500: Snapshot Refs
    priority 600: Files
    priority 700: Wait Event Sets
BEGIN;                              TopTransactionResourceOwner
                                      |
SAVEPOINT sp1;                        +-- SubtransactionResourceOwner
                                      |     |
SELECT * FROM t WHERE ...;            |     +-- PortalResourceOwner
  (pins buffers, acquires locks)      |     |     pins: [buf#42, buf#99]
                                      |     |     locks: [rel 16384 AccessShareLock]
-- query completes normally --        |     |     (all forgotten on normal release)
-- portal closes --                   |     +-- (deleted, locks transfer to parent)
                                      |
RELEASE SAVEPOINT sp1;                +-- (subtxn resources merge into parent)
                                      |
COMMIT;                               (release all: pins WARNING if leaked,
                                       locks released, cache refs released)
ReadBuffer(rel, blockno)
|
v
ResourceOwnerEnlarge(CurrentResourceOwner) <-- reserve space before pinning
|
v
narr < 32?
|
YES --> nothing to do, a free array slot already exists
|
NO --> move all arr entries to hash table (growing it if needed), narr = 0
|
v
PinBuffer(buf)
|
v
ResourceOwnerRemember(owner, buf, &buffer_pin_desc)
|
v
arr[narr++] = {buf, &buffer_pin_desc} (fast path, no hashing, no allocation)
|
v
... use buffer ...
|
v
ReleaseBuffer(buf)
|
v
ResourceOwnerForget(owner, buf, &buffer_pin_desc)
|
v
Scan arr[] for matching (buf, kind)
|
FOUND --> swap with arr[narr-1], narr-- (fast path)
|
NOT FOUND --> scan hash table, remove entry
Every transaction has both a TopTransactionContext and a TopTransactionResourceOwner. They are
separate because their usage patterns differ: memory contexts manage bulk allocation
and deallocation; resource owners track individual countable resources.
Catalog cache references (SearchSysCache results) are tracked as
AFTER_LOCKS resources. They are released after locks because the cache entries
are backend-local and invisible to other processes.
AbortTransaction calls ResourceOwnerRelease in all three
phases on TopTransactionResourceOwner, ensuring complete cleanup regardless of
where the error occurred.