Spinlocks are the lowest-level synchronization primitive in PostgreSQL. They protect very short critical sections – typically a few dozen instructions – by busy-waiting until the lock becomes available. Spinlocks serve as the foundation upon which lightweight locks are built.
A spinlock is a single memory word that a CPU atomically tests and sets using a hardware instruction (test-and-set, compare-and-swap, or load-linked / store-conditional depending on the architecture). If the word is already set, the caller spins in a tight loop, periodically backing off with increasing delays, until the lock is released or a timeout fires.
Spinlocks provide no deadlock detection, no automatic release on error, and no fairness guarantees. They exist purely because they are extremely fast in the uncontended case: a single atomic instruction with no kernel involvement.
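Before diving into the PostgreSQL sources, a minimal conceptual sketch may help. The loop below implements the same test-and-set idea with C11 atomics; it is purely illustrative and not code from the tree.

/*
 * Conceptual illustration only (not PostgreSQL code): a test-and-set
 * spinlock built on C11 atomics.
 */
#include <stdatomic.h>

typedef atomic_flag concept_spinlock;

static inline void
concept_spin_acquire(concept_spinlock *lock)
{
    /* test_and_set returns the previous value: false means we now own it */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
        /* busy-wait; a real implementation backs off, as described below */
    }
}

static inline void
concept_spin_release(concept_spinlock *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}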
| File | Purpose |
|---|---|
| src/include/storage/s_lock.h | Platform-specific TAS/TAS_SPIN macros, S_LOCK/S_UNLOCK API |
| src/backend/storage/lmgr/s_lock.c | Portable spin-wait loop with exponential backoff |
| src/include/storage/spin.h | Public API: SpinLockInit, SpinLockAcquire, SpinLockRelease, SpinLockFree |
The core of every spinlock is the TAS() macro. On x86-64, this compiles to
a single xchg instruction (which has an implicit LOCK prefix):
/*
 * Simplified from s_lock.h for x86-64:
 */
static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    __asm__ __volatile__(
        "lock; xchgb %0,%1\n"
        : "+q"(_res), "+m"(*lock)
        :
        : "memory");
    return (int) _res;      /* 0 = acquired, nonzero = failed */
}
On ARM (aarch64), an LDXR/STXR (load-exclusive / store-exclusive) pair is used instead. The key requirement is atomicity: exactly one CPU can win the transition from unlocked to locked, no matter how many attempt it simultaneously.
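For orientation, the fast path and the slow path are tied together by the S_LOCK() macro. The sketch below is simplified from s_lock.h (the real header uses a portability macro for the function name); it tries the atomic instruction once and falls into s_lock() only on failure.

/*
 * Simplified sketch of S_LOCK(): one TAS attempt, then the spin-and-sleep
 * loop described below if the lock was already held.
 */
#define S_LOCK(lock) \
    (TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)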
When TAS() fails, s_lock() in s_lock.c enters a wait loop with
adaptive backoff:
s_lock(lock)
  |
  +-- init_spin_delay()
  |     spins_per_delay starts at DEFAULT_SPINS_PER_DELAY (typically 100)
  |
  +-- while TAS_SPIN(lock) fails:
  |     |
  |     +-- perform_spin_delay()
  |           |
  |           +-- If spins < spins_per_delay:
  |           |     Spin (CPU pause / yield hint instruction)
  |           |     spins++
  |           |
  |           +-- Else (exhausted spin budget):
  |                 Increment delays counter
  |                 If delays > NUM_DELAYS (1000):
  |                     PANIC -- "stuck spinlock detected"
  |                 pg_usleep(cur_delay)
  |                   cur_delay starts at 1 ms, grows by a random factor,
  |                   and wraps back to 1 ms once it exceeds 1 s
  |                 Reset spins to 0
  |
  +-- finish_spin_delay()
        Adapt spins_per_delay based on whether we had to sleep:
        - If we never slept: increase toward MAX_SPINS_PER_DELAY (1000)
        - If we had to sleep at all: decrease toward MIN_SPINS_PER_DELAY (10)
The adaptive algorithm is designed to handle both uniprocessor machines (where spinning is wasteful because no other CPU can release the lock) and multiprocessor machines (where a short spin is cheaper than a kernel sleep).
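Putting the pieces together, the contended path itself is small. The following is a simplified sketch modeled on s_lock() in s_lock.c; exact signatures and return values may differ between PostgreSQL versions.

/*
 * Simplified sketch of the contended path: spin, sleep with backoff,
 * and adapt the spin budget once the lock is finally acquired.
 */
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
    SpinDelayStatus delayStatus;

    init_spin_delay(&delayStatus, file, line, func);

    while (TAS_SPIN(lock))
        perform_spin_delay(&delayStatus);   /* spin, then sleep, eventually PANIC */

    finish_spin_delay(&delayStatus);        /* adapt spins_per_delay for next time */

    return delayStatus.delays;
}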
| Constant | Value | Meaning |
|---|---|---|
| MIN_SPINS_PER_DELAY | 10 | Minimum spin iterations before sleeping |
| MAX_SPINS_PER_DELAY | 1000 | Maximum spin iterations before sleeping |
| NUM_DELAYS | 1000 | Number of sleep cycles before PANIC |
| MIN_DELAY_USEC | 1,000 (1 ms) | Initial sleep duration |
| MAX_DELAY_USEC | 1,000,000 (1 s) | Maximum sleep duration |
With these settings, a stuck spinlock will PANIC after roughly 1-2 minutes.
The TAS and S_UNLOCK macros include appropriate memory barriers:

- On x86-64, the xchg instruction provides a full memory fence implicitly.
- On ARM and POWER, explicit barrier instructions (dmb, lwsync) are emitted.
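To illustrate the release side: on x86-64, S_UNLOCK only needs to stop the compiler from reordering, because ordinary x86 stores already have release semantics. The sketch below is simplified from the x86-64 definition; weakly ordered platforms emit a hardware barrier instruction here instead.

/*
 * Simplified sketch of the x86-64 release path: a compiler barrier plus a
 * plain store. ARM and POWER use dmb / lwsync here instead.
 */
#define S_UNLOCK(lock) \
    do { __asm__ __volatile__("" : : : "memory"); *(lock) = 0; } while (0)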
/*
* slock_t is platform-dependent. On most platforms it is a simple
* unsigned char or int:
*/
typedef unsigned char slock_t; /* x86-64 */
/*
* SpinDelayStatus tracks the adaptive backoff state:
*/
typedef struct SpinDelayStatus
{
    int         spins;      /* spin iterations so far in current cycle */
    int         delays;     /* number of times we called pg_usleep */
    int         cur_delay;  /* current sleep duration, in microseconds */
    const char *file;       /* source location for diagnostics */
    int         line;
    const char *func;
} SpinDelayStatus;
SpinLockInit(lock)       SpinLockAcquire(lock)              SpinLockRelease(lock)
        |                         |                                  |
        v                         v                                  v
    *lock = 0               TAS(lock) == 0?                      *lock = 0
    (unlocked)              Yes: acquired                     + memory barrier
                            No: enter s_lock() spin loop
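As a usage illustration, a spinlock is typically embedded in a shared-memory struct and guards one or two fields. The struct and function names below are hypothetical, not from the PostgreSQL tree.

/*
 * Hypothetical example: a spinlock guarding a counter in shared memory.
 */
#include "postgres.h"
#include "storage/spin.h"

typedef struct MyCounter
{
    slock_t     mutex;          /* protects 'value' */
    uint64      value;
} MyCounter;

void
MyCounterInit(MyCounter *c)
{
    SpinLockInit(&c->mutex);
    c->value = 0;
}

uint64
MyCounterAdd(MyCounter *c, uint64 n)
{
    uint64      result;

    SpinLockAcquire(&c->mutex);
    result = (c->value += n);   /* critical section: a few instructions */
    SpinLockRelease(&c->mutex);

    return result;
}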
The README in src/backend/storage/lmgr/ lays out strict rules for spinlock
usage:
- Hold for at most a few dozen instructions. Never hold a spinlock across a kernel call, subroutine call, or any operation that might block (see the usage sketch after this list).
- No nested spinlocks. There is no deadlock detection; acquiring a second spinlock while holding one risks permanent deadlock.
- Interrupts are deferred. Query cancel and die() signals are held off while a spinlock is held, which prevents a backend from being killed while a shared data structure is in an inconsistent state.
- Do not use for user-visible locking. Spinlocks are infrastructure for LWLocks and other internal mechanisms only.
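To make the first rule concrete, a common pattern is to copy shared state out while holding the lock and to do anything that could block or elog() only after release. The sketch below reuses the hypothetical MyCounter struct from the earlier example.

/*
 * Copy the shared value out under the lock; log it only after release,
 * since elog() must never run while a spinlock is held.
 */
uint64
MyCounterReport(MyCounter *c)
{
    uint64      snapshot;

    SpinLockAcquire(&c->mutex);
    snapshot = c->value;        /* no elog, no palloc, no I/O in here */
    SpinLockRelease(&c->mutex);

    elog(DEBUG1, "counter is now " UINT64_FORMAT, snapshot);
    return snapshot;
}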
s_lock.h contains a substantial block of #ifdef directives providing
TAS implementations for every supported architecture:
- x86-64: xchgb instruction
- ARM (aarch64): ldxr / stxr pair with a dmb barrier
- POWER: lwarx / stwcx. pair with lwsync / isync

The SPIN_DELAY() macro emits a "pause" or "yield" hint where available (e.g., PAUSE on x86, YIELD on ARM) to reduce pipeline stalls and power consumption during spinning.
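On x86, for example, the hint is the PAUSE instruction (encoded as rep; nop). The following is a simplified sketch of the corresponding spin_delay() routine in s_lock.h.

/*
 * Simplified sketch of the x86 spin-delay hint: PAUSE tells the CPU this
 * is a spin-wait loop, reducing pipeline flushes and power draw.
 */
static __inline__ void
spin_delay(void)
{
    __asm__ __volatile__(" rep; nop \n");
}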
     CPU 0                               CPU 1
       |                                   |
       | TAS(lock) -> success (0)          |
       | [lock = 1, CPU 0 owns it]         |
       |                                   | TAS(lock) -> fail (1)
       | ... critical section ...          | spin... spin... spin...
       |                                   | TAS_SPIN(lock) -> fail (1)
       | S_UNLOCK(lock)                    | pg_usleep(1ms)
       | [lock = 0]                        | TAS_SPIN(lock) -> success (0)
       |                                   | [lock = 1, CPU 1 owns it]
       |                                   | ... critical section ...
       |                                   | S_UNLOCK(lock)
A few closing notes:

- The SpinDelayStatus adaptive backoff is reused in LWLock wait loops.
- Higher-level shared structures are protected by LWLocks built on this layer (for example, BufMappingLock).
- Code that must run before the LWLock machinery is available (for example, shmem_alloc during startup) may use a spinlock directly.