PostgreSQL relies on two fundamentally different kinds of statistics. Planner statistics describe the shape of data inside tables – how many distinct values a column has, which values appear most often, how values are distributed – and the query optimizer uses them to choose between sequential scans, index scans, hash joins, merge joins, and other plan alternatives. Activity statistics describe what the server is doing right now – which queries are running, what each backend is waiting on, how far along a long-running VACUUM has progressed – and they are the primary tool for operational monitoring and troubleshooting.
Both systems are read-heavy by design. Planner statistics are written only during ANALYZE (or autovacuum-triggered analysis) and read on every query planning cycle. Activity statistics are written continuously by every backend into shared memory and read by monitoring queries against pg_stat_activity and related views.
Without accurate planner statistics, PostgreSQL falls back on hardcoded constants (selectivity of 0.005 for equality on an unknown column, for example) and produces plans that can be orders of magnitude slower than necessary. A missing ANALYZE after bulk loading a table is one of the most common causes of sudden performance regressions in production.
Without activity monitoring, operators cannot answer the most basic questions: “What is this backend doing?” and “Why is it slow?” The wait event infrastructure, introduced incrementally from PostgreSQL 9.6 onward, finally gave PostgreSQL a lightweight, always-on mechanism to answer those questions without resorting to external profiling tools.
| Section | What It Covers |
|---|---|
| pg_statistic and Single-Column Statistics | Histograms, MCVs, correlation, ndistinct, the slot system, ANALYZE sampling |
| Extended Statistics | Multi-column dependencies, ndistinct, MCV lists, CREATE STATISTICS |
| Activity Monitoring | pg_stat_activity, wait events, progress reporting, cumulative stats |
| File | Purpose |
|---|---|
src/include/catalog/pg_statistic.h |
Single-column statistics catalog definition, slot kinds |
src/include/catalog/pg_statistic_ext.h |
Extended statistics object definitions |
src/include/catalog/pg_statistic_ext_data.h |
Extended statistics data storage |
src/include/statistics/statistics.h |
MVNDistinct, MVDependency, MCVList structs |
src/include/statistics/extended_stats_internal.h |
Internal build/serialize/deserialize functions |
src/backend/commands/analyze.c |
ANALYZE command, sampling, per-column stats computation |
src/backend/statistics/extended_stats.c |
Extended statistics build orchestration |
src/backend/statistics/dependencies.c |
Functional dependency mining and estimation |
src/backend/statistics/mcv.c |
Multi-column MCV list build and selectivity |
src/backend/statistics/mvdistinct.c |
Multi-column ndistinct computation |
src/backend/utils/activity/backend_status.c |
PgBackendStatus shared memory management |
src/backend/utils/activity/wait_event.c |
Wait event reporting |
src/backend/utils/activity/backend_progress.c |
Progress reporting for long-running commands |
src/backend/utils/activity/pgstat.c |
Cumulative statistics collector infrastructure |
src/include/utils/backend_status.h |
PgBackendStatus, BackendState enum |
src/include/utils/backend_progress.h |
ProgressCommandType, progress parameter API |
flowchart TD
subgraph "Planner Statistics (Data Shape)"
ANALYZE["ANALYZE / autovacuum"]
ANALYZE --> SAMPLE["Sample 300 * statistics_target rows"]
SAMPLE --> SINGLE["Compute per-column stats"]
SINGLE --> PG_STAT["pg_statistic catalog"]
SAMPLE --> EXTENDED["Compute extended stats"]
EXTENDED --> PG_EXT["pg_statistic_ext_data catalog"]
PLANNER["Query Planner"] -->|"reads"| PG_STAT
PLANNER -->|"reads"| PG_EXT
PLANNER --> PLAN["Chosen query plan"]
end
subgraph "Activity Statistics (Runtime Behavior)"
BACKEND["Backend process"]
BACKEND -->|"updates"| SHMEM["PgBackendStatus\nin shared memory"]
BACKEND -->|"sets"| WAIT["Wait event ID"]
BACKEND -->|"updates"| PROGRESS["Progress counters"]
MON["pg_stat_activity\nquery"] -->|"reads snapshot"| SHMEM
MON -->|"reads"| WAIT
MON -->|"reads"| PROGRESS
end
subgraph "Cumulative Statistics"
BACKEND -->|"accumulates"| LOCAL["Local stat counters"]
LOCAL -->|"flushes to"| STATS_SHMEM["pgstat shared memory\n(hash table)"]
VIEWS["pg_stat_user_tables\npg_stat_bgwriter\npg_stat_io ..."] -->|"reads"| STATS_SHMEM
end
PostgreSQL has two separate architectures for statistics, and confusing them is a common source of misunderstanding.
Planner statistics (pg_statistic, pg_statistic_ext_data) are stored as regular catalog rows, updated only by ANALYZE, and cached in the relcache/syscache. They describe data distributions and are consumed exclusively by the planner’s selectivity estimation functions.
Activity/cumulative statistics (pgstat infrastructure) live in shared memory as of PostgreSQL 15. Before that, they were maintained by a separate stats collector process that communicated via temporary files. The shared-memory approach eliminates the stats collector process and provides instant visibility into counters. These statistics track operational metrics: row counts, block hits, sequential scans, function call counts, and so on.
The default_statistics_target parameter (default: 100) controls how many entries appear in MCV lists and histogram bins for single-column stats. It also determines the ANALYZE sample size: 300 * statistics_target rows. Per-column overrides are possible via ALTER TABLE ... ALTER COLUMN ... SET STATISTICS. Extended statistics objects can have their own target set via ALTER STATISTICS ... SET STATISTICS.
Higher targets produce more accurate statistics at the cost of longer ANALYZE times, more catalog storage, and slightly slower planning (more MCV entries to scan).