GIN is an inverted index designed for values that contain multiple
elements – arrays, full-text documents (tsvector), JSONB keys. For each
distinct element (key), GIN stores a sorted list of TIDs (a posting list)
referencing all heap tuples that contain that element. GIN trades insert speed
for fast multi-element lookups, and its fast update buffer amortizes insert
costs over time.
| File | Purpose |
|---|---|
src/backend/access/gin/gininsert.c |
gininsert() – insert into GIN (or pending list) |
src/backend/access/gin/ginfast.c |
Fast update pending list management |
src/backend/access/gin/ginget.c |
gingetbitmap() – scan and intersect posting lists |
src/backend/access/gin/ginbtree.c |
B-tree operations on the GIN entry tree |
src/backend/access/gin/ginentrypage.c |
Entry page (internal tree) management |
src/backend/access/gin/gindatapage.c |
Posting tree (data page) management |
src/backend/access/gin/ginpostinglist.c |
Varbyte-encoded posting list compression |
src/backend/access/gin/ginbulk.c |
Bulk insert accumulator |
src/backend/access/gin/ginscan.c |
Scan initialization and key setup |
src/backend/access/gin/ginlogic.c |
Consistent function evaluation logic |
src/backend/access/gin/ginutil.c |
ginhandler(), utilities |
src/backend/access/gin/ginvacuum.c |
VACUUM support |
src/backend/access/gin/ginxlog.c |
WAL redo |
src/backend/access/gin/ginarrayproc.c |
Array opclass support functions |
src/backend/access/gin/README |
Design document |
src/include/access/gin.h |
GinStatsData |
src/include/access/ginblock.h |
GinPageOpaqueData, GinMetaPageData, PostingItem, GinPostingList |
src/include/access/gin_private.h |
GinState, GinBtreeData, GinScanKeyData, GinScanEntryData |
GIN has two kinds of B-trees internally:
graph TD
META["Meta Page<br/>(pending list head/tail)"]
ET_ROOT["Entry Tree Root"]
ET_L1["Leaf: 'apple'"]
ET_L2["Leaf: 'banana'"]
ET_L3["Leaf: 'cherry'"]
PL1["Inline Posting List<br/>[TID_2, TID_15]"]
PL2["Inline Posting List<br/>[TID_3, TID_7, TID_42]"]
PT_ROOT["Posting Tree Root<br/>(separate B-tree)"]
PT_L1["Data Page<br/>[TID_1..TID_500]"]
PT_L2["Data Page<br/>[TID_501..TID_1200]"]
META --> ET_ROOT
ET_ROOT --> ET_L1
ET_ROOT --> ET_L2
ET_ROOT --> ET_L3
ET_L1 --> PL1
ET_L2 --> PL2
ET_L3 -->|"too many TIDs"| PT_ROOT
PT_ROOT --> PT_L1
PT_ROOT --> PT_L2
style META fill:#e6f3ff,stroke:#333
style ET_ROOT fill:#fff3e6,stroke:#333
style ET_L1 fill:#e6ffe6,stroke:#333
style ET_L2 fill:#e6ffe6,stroke:#333
style ET_L3 fill:#e6ffe6,stroke:#333
style PL1 fill:#f0f0f0,stroke:#333
style PL2 fill:#f0f0f0,stroke:#333
style PT_ROOT fill:#ffe6e6,stroke:#333
style PT_L1 fill:#ffe6e6,stroke:#333
style PT_L2 fill:#ffe6e6,stroke:#333
+------------------+
| Meta page |
| (pending list |
| head/tail) |
+--------+---------+
|
+--------v---------+
| Entry Tree | <-- B-tree of distinct keys
| (key1, key2, ..) | (e.g., lexemes)
+--+-----+-----+--+
| | |
v v v
Posting Posting Posting
list/ list/ list/
tree tree tree
If posting list fits on one page: stored inline in the entry tuple
If too large: stored as a separate B-tree ("posting tree")
Entry tree: A B-tree whose keys are the distinct indexed elements (lexemes, array elements, JSONB keys/values). Each leaf entry either contains an inline posting list or a pointer to a posting tree.
Posting list / posting tree: For each key, the sorted list of TIDs referencing heap tuples containing that key. Small lists are stored inline (varbyte-compressed). Large lists get their own B-tree of data pages.
ginpostinglist.c uses varbyte encoding to compress TID lists. Since TIDs
are sorted, only the delta between consecutive TIDs is stored, and each delta
is encoded in a variable number of bytes. This typically achieves 2-3x
compression over raw TID arrays.
// src/include/access/ginblock.h
typedef struct
{
ItemPointerData first; // first TID (uncompressed)
uint16 nbytes; // size of compressed data
unsigned char bytes[FLEXIBLE_ARRAY_MEMBER]; // varbyte-encoded deltas
} GinPostingList;
Inserting into GIN is expensive because a single heap tuple may contain many keys (e.g., hundreds of lexemes in a document). Each key requires a B-tree lookup and potential posting list update.
The fast update optimization (gin_pending_list_limit, default 4 MB)
defers these inserts:
gin_pending_list_limit.VACUUM runs.gin_clean_pending_list() is called.Controlled by fastupdate = on | off storage parameter.
gingetbitmap(scan)
-> for each scan key (e.g., 'cat' & 'dog' for ts_query):
-> look up key in entry tree
-> retrieve posting list (inline or from posting tree)
-> scan pending list for additional matches
-> intersect / union posting lists according to query logic
(ginlogic.c evaluates consistent / triConsistent)
-> return TID bitmap
GIN always returns a bitmap (amgetbitmap), never individual tuples
(amgettuple = NULL). The executor then does a bitmap heap scan.
// src/include/access/ginblock.h
typedef struct GinMetaPageData
{
BlockNumber head; // first page of pending list
BlockNumber tail; // last page of pending list
uint32 tailFreeSize; // free space on tail page
int64 nPendingPages;
int64 nPendingHeapTuples;
int64 nTotalPages; // total pages in GIN index
int64 nEntryPages; // pages in entry tree
int64 nDataPages; // pages in posting trees
int64 nEntries; // number of distinct keys
int32 ginVersion;
} GinMetaPageData;
// src/include/access/ginblock.h
typedef struct GinPageOpaqueData
{
BlockNumber rightlink; // right sibling
OffsetNumber maxoff; // max offset on page
uint16 flags; // GIN_DATA, GIN_LEAF, GIN_DELETED, GIN_META,
// GIN_LIST, GIN_LIST_PENDING, GIN_COMPRESSED
} GinPageOpaqueData;
// src/include/access/gin_private.h
typedef struct GinScanKeyData
{
// ... scan key definition
ScanKey scanEntry; // underlying scan key
GinScanEntry *scanEntries; // array of entries to look up
int nentries;
// ... tri-state consistent function, required entries, etc.
bool (*triConsistentFn)(GinScanKey key);
} GinScanKeyData;
INSERT INTO docs(body) VALUES ('the cat sat on the mat');
tsvector extraction produces keys: {cat, mat, sat}
With fastupdate=on:
+----------+ +----------+ +----------+
| pending | --> | pending | --> | pending |
| page 1 | | page 2 | | page 3 |
| (TID,keys)| | (TID,keys)| | (TID,keys)|
+----------+ +----------+ +----------+
On drain (VACUUM or limit reached):
Entry tree: Posting lists:
+------+
| cat | --> [TID_1, TID_47, TID_203]
+------+
| mat | --> [TID_1, TID_88]
+------+
| sat | --> [TID_1, TID_5, TID_12, ...] --> posting tree (if large)
+------+
GIN supports a ternary consistent function for optimization. Instead of just true/false, it returns:
GIN_TRUE – the query definitely matches.GIN_FALSE – the query definitely does not match.GIN_MAYBE – need to recheck against the heap tuple.This allows GIN to skip posting lists for keys that cannot affect the outcome, and to perform partial matching during scan.
| Data Type | Operators | Use Case |
|---|---|---|
tsvector |
@@ |
Full-text search |
array |
@>, <@, &&, = |
Array containment and overlap |
jsonb |
@>, ?, ?\|, ?& |
JSONB key/value existence and containment |
hstore |
@>, ?, ?\|, ?& |
Key-value store queries |
ginvacuum.c removes dead TIDs from posting lists and cleans
the pending list. Critical for preventing unbounded pending list growth.ginxlog.c handles redo for entry tree changes, posting list
updates, pending list operations, and page splits.