Genetic Query Optimizer (GEQO)

When a query involves many tables (default threshold: 12), the dynamic programming join search becomes impractical – the number of possible join orders grows super-exponentially. PostgreSQL switches to a Genetic Algorithm (GA) that explores the search space stochastically, trading optimality guarantees for bounded planning time. This is the Genetic Query Optimizer, or GEQO.

Summary

GEQO models the join-ordering problem as a variant of the Traveling Salesman Problem. Each candidate solution (a “chromosome”) encodes a permutation of the query’s tables, which is decoded into a join tree. A population of chromosomes evolves over multiple generations through selection, crossover, and mutation. The fitness of each chromosome is the cost of the resulting plan. After a configurable number of generations, the best chromosome found is returned.

Critically, GEQO only decides the join order. For each proposed join pair, it still calls the standard add_paths_to_joinrel() machinery to enumerate all join methods (nested loop, merge, hash) and uses the normal cost model. GEQO is a heuristic wrapper around the exhaustive path-generation code.

Key Source Files

File	Purpose
`src/backend/optimizer/geqo/geqo_main.c`	`geqo()` – top-level entry point, GA loop
`src/backend/optimizer/geqo/geqo_eval.c`	`geqo_eval()` – decode chromosome into join tree, compute cost
`src/backend/optimizer/geqo/geqo_pool.c`	Population management (initialization, replacement)
`src/backend/optimizer/geqo/geqo_selection.c`	Parent selection (linear bias)
`src/backend/optimizer/geqo/geqo_recombination.c`	Common crossover infrastructure
`src/backend/optimizer/geqo/geqo_erx.c`	Edge Recombination Crossover (ERX) – the default
`src/backend/optimizer/geqo/geqo_ox1.c`	Order Crossover variant 1
`src/backend/optimizer/geqo/geqo_ox2.c`	Order Crossover variant 2
`src/backend/optimizer/geqo/geqo_pmx.c`	Partially Mapped Crossover
`src/backend/optimizer/geqo/geqo_cx.c`	Cycle Crossover
`src/backend/optimizer/geqo/geqo_px.c`	Position Crossover
`src/backend/optimizer/geqo/geqo_mutation.c`	Mutation operator (swap two genes)
`src/backend/optimizer/geqo/geqo_random.c`	Deterministic pseudo-random number generator
`src/backend/optimizer/geqo/geqo_copy.c`	Chromosome copy routines
`src/backend/optimizer/geqo/geqo_misc.c`	Debugging/printing utilities
`src/include/optimizer/geqo.h`	GEQO data types and configuration

How It Works

When GEQO Activates

The decision is made in make_rel_from_joinlist() in allpaths.c:

if (enable_geqo && list_length(joinlist) >= geqo_threshold)
    return geqo(root, list_length(joinlist), initial_rels);
else
    return standard_join_search(root, levels_needed, initial_rels);

Default geqo_threshold is 12 tables. Below this, exhaustive DP search is used. Above it, the factorial growth of join orders makes DP impractical.

The Genetic Algorithm Loop

geqo(root, number_of_rels, initial_rels):

  1. Determine pool_size and number_of_generations from GUC params
  2. Initialize population: pool_size random chromosomes
     Each chromosome = random permutation of [1..number_of_rels]

  3. Evaluate each chromosome:
     For each chromosome in pool:
       cost = geqo_eval(chromosome)
       chromosome.fitness = cost

  4. Sort pool by fitness (lowest cost first)

  5. Evolution loop:
     For generation = 1 to number_of_generations:
       a. Select two parents (momma, daddy) via linear bias
       b. Create offspring via crossover (ERX by default)
       c. Evaluate offspring: cost = geqo_eval(offspring)
       d. If offspring is better than worst in pool:
            Replace worst chromosome with offspring
            Re-sort pool

  6. Return the best chromosome's RelOptInfo

Chromosome Encoding

A chromosome is an array of integers representing a permutation of the relation indices:

typedef struct Chromosome
{
    Gene   *string;     /* array of Gene (= int) values */
    Cost    worth;      /* fitness = total plan cost */
} Chromosome;

For a 5-table query, a chromosome might be [3, 1, 4, 2, 5], meaning: start with relation 3, then join relation 1, then join relation 4, etc.

Decoding: gimme_tree() in geqo_eval.c

The chromosome is decoded into an actual join tree by gimme_tree():

gimme_tree(chromosome, num_rels):
  clumps = empty list

  For each gene in chromosome:
    new_clump = the base RelOptInfo for this gene

    Repeatedly try to merge new_clump with existing clumps:
      For each existing clump C:
        If new_clump and C share a join clause:
          joined = make_join_rel(new_clump, C)
          Replace C with joined, continue merging

    If no merge possible:
      Add new_clump to clumps list

  If clumps has more than one entry:
    Force clauseless joins to combine them

  Return the final single clump's RelOptInfo

The “clump” approach is important: it prefers joining relations that have join clauses between them, avoiding Cartesian products when possible. The chromosome order only determines tie-breaking when multiple join partners are available.

Each make_join_rel() call inside gimme_tree() invokes the full add_paths_to_joinrel() machinery, generating all possible join methods and keeping only Pareto-optimal paths. This means GEQO does not sacrifice join-method quality – only join-order optimality.

Crossover Operators

The default crossover is Edge Recombination Crossover (ERX), which preserves adjacency relationships from both parents. This is appropriate because the TSP-like nature of the problem means that which relations are adjacent in the join order matters more than their absolute positions.

Available crossover operators (selected at compile time via #define in geqo.h):

Operator	Description
ERX	Edge Recombination – preserves adjacency edges from both parents
PMX	Partially Mapped – preserves absolute position of some genes
CX	Cycle Crossover – preserves position via cycles
PX	Position Crossover – direct position inheritance
OX1	Order Crossover 1 – preserves relative order
OX2	Order Crossover 2 – variant of OX1

Selection

Parent selection uses linear bias: chromosomes are ranked by fitness, and the probability of being selected as a parent is linearly proportional to rank. The bias is controlled by geqo_selection_bias (default 2.0, range 1.5–2.0). Higher bias favors fitter individuals more strongly.

Mutation

When CX crossover is enabled, a mutation operator is also applied: it randomly swaps two genes in the chromosome. With the default ERX crossover, mutation is not used because ERX already introduces sufficient diversity.

Configuration Parameters

GUC	Default	Description
`geqo`	on	Enable/disable GEQO
`geqo_threshold`	12	Minimum FROM items to trigger GEQO
`geqo_effort`	5	Controls pool size and generations (1–10)
`geqo_pool_size`	0	Pool size (0 = auto from effort and num_rels)
`geqo_generations`	0	Number of generations (0 = auto from pool_size)
`geqo_selection_bias`	2.0	Selection pressure (1.5–2.0)
`geqo_seed`	0.0	Random seed for reproducibility (0.0 = use PID)

Auto-Sizing

When geqo_pool_size = 0 (default), pool size is computed as:

pool_size = 2 * pow(2.0, rint(log(number_of_rels) / log(2.0)));
/* Clamped to range [128, 1024] for geqo_effort = 5 */

When geqo_generations = 0 (default):

generations = pool_size;  /* one generation per pool member */

Higher geqo_effort values increase both pool size and generations, spending more planning time to explore more of the search space.

Memory Management

GEQO creates a temporary memory context for each chromosome evaluation:

/* From geqo_eval.c */
mycontext = AllocSetContextCreate(CurrentMemoryContext,
                                   "GEQO",
                                   ALLOCSET_DEFAULT_SIZES);
oldcxt = MemoryContextSwitchTo(mycontext);

/* Build join tree for this chromosome */
joinrel = gimme_tree(root, tour, num_gene);

/* Copy the surviving path out of the temp context */
MemoryContextSwitchTo(oldcxt);
MemoryContextDelete(mycontext);

This ensures that the many intermediate RelOptInfos and Paths created during evaluation of rejected chromosomes are freed promptly, rather than accumulating for the entire planning phase.

Diagram: GEQO Evolution

Generation 0 (random initialization):

  Chromosome 1: [3,1,4,2,5]  cost=1500
  Chromosome 2: [1,2,3,4,5]  cost=1200  <-- current best
  Chromosome 3: [5,4,3,2,1]  cost=1800
  Chromosome 4: [2,5,1,3,4]  cost=1350
  ...

Generation 1:
  Select parents: #2 (best) and #4 (good)
  Crossover (ERX): offspring [1,2,5,3,4]
  Evaluate: cost=1100  <-- new best!
  Replace worst (#3) with offspring

Generation 2:
  Select parents: offspring and #2
  Crossover: [1,5,2,3,4]
  Evaluate: cost=1250
  Replace worst remaining
  ...

Final:
  Best chromosome: [1,2,5,3,4]  cost=1100
  Decode to join tree:
    ((rel1 JOIN rel2) JOIN rel5) JOIN (rel3 JOIN rel4)

GEQO vs. Standard Search: Trade-offs

Aspect	Standard DP	GEQO
Optimality	Guaranteed optimal for the cost model	Heuristic – may miss the true optimum
Planning time	O(2^N) for N tables	O(pool_size * generations * join_cost)
Reproducibility	Deterministic	Deterministic if `geqo_seed` is fixed
Small queries	Fast and optimal	Not used (below threshold)
Large queries	Impractical (12+ tables)	Practical for hundreds of tables
Join methods	All considered	All considered (only join ORDER is heuristic)

Practical Considerations

Raise geqo_threshold if planning time is acceptable. For data warehouse queries with 15–20 tables, DP often finishes in reasonable time and finds better plans than GEQO.
Fix geqo_seed for reproducibility. By default, GEQO uses a seed derived from the PID, so the same query can produce different plans across connections. Set geqo_seed to a nonzero constant to get deterministic behavior.
GEQO and explicit JOINs. If parts of the join tree are constrained by explicit JOIN syntax (not flattened due to join_collapse_limit), those sub-trees are planned with DP before GEQO sees them. Only the top-level FROM list enters the GA.
Plugin hook. join_search_hook allows extensions to replace both standard_join_search() and geqo() with custom join-order algorithms.

Connections

Subsystem	Relationship
Join Ordering	GEQO is the alternative to standard_join_search() for the same problem
Path Generation	GEQO calls add_paths_to_joinrel() for every join it considers
Cost Model	Chromosome fitness is the total_cost of the resulting plan
Preprocessing	join_collapse_limit determines how many tables enter GEQO
Memory Management	GEQO uses temporary memory contexts to avoid leaking intermediate paths

← Previous Next →

↑ Back to Table of Contents