switching to /read: c6in.8xlarge, 50 Gbps, and IRQ pinning on a loaded NIC

790k RPS, server at 56% NIC, 85% CPU: IRQ pinning still flat

June 08, 2026 — DONE
c6in read NIC IRQ pipelining autocannon profiling

why /read and why c6in

Every previous experiment used /simple, which returns {"message":"hi"}, a 16-byte body. At 18M RPS the server TX was 18M × ~200 bytes per response (including headers) ≈ 3.6 GB/s, but that traffic was pipelined: 100 requests in flight per TCP connection, so the actual packet rate was roughly 18M / 100 = 180k pps. IRQ cores at 180k pps sit at 0-3% CPU. There is nothing to isolate.
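The packet-rate arithmetic can be sketched as a tiny shell helper. It makes the same idealized assumption as the paragraph above, namely that one batch of pipelined requests travels as roughly one packet each way. `packet_rate` is a hypothetical name, not an autocannon feature.

```shell
# packet_rate RPS PIPELINE_DEPTH
# Rough packets-per-second estimate for a pipelined HTTP benchmark.
# Assumption (same as the text): one pipeline batch ~ one packet.
packet_rate() {
  local rps=$1 depth=$2
  echo $(( rps / depth ))
}

packet_rate 18000000 100   # /simple with --pipelining 100 -> 180000
packet_rate 800000 1       # /read with --pipelining 1     -> 800000
```

Real packet counts also depend on MTU, segmentation offload and ACK behaviour, so treat the result as an order-of-magnitude figure.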

IRQ pinning is a packet rate optimization. To test it, you need packets. That means no pipelining and a bigger payload.

/read returns a pre-serialized product: UUID, name, brand, description (5 paragraphs), tags, attributes, images. Measured: ~4.5KB per response. With autocannon --pipelining 1, each request is an independent TCP round trip. One packet in, one packet out per request.

The math at 800k RPS: 800k × 4.5 KB = 3.6 GB/s TX = 28.8 Gbps. On a 50 Gbps NIC that is 57.6% utilization. At that volume the NIC queues are doing real work, and this is the regime where IRQ core saturation becomes possible.
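The same arithmetic as a reusable one-liner, in case you want to plug in other RPS targets or payload sizes. This is a sketch: `nic_utilization` is a hypothetical helper, it uses decimal units (1 KB = 1000 bytes) to match the numbers above, and it counts response payload only, not TCP/IP or Ethernet headers.

```shell
# nic_utilization RPS RESPONSE_BYTES LINK_GBPS
# Prints payload throughput in Gbps and as a share of the link speed.
nic_utilization() {
  awk -v rps="$1" -v bytes="$2" -v link="$3" 'BEGIN {
    gbps = rps * bytes * 8 / 1e9
    printf "%.1f Gbps (%.1f%% of %d Gbps)\n", gbps, gbps / link * 100, link
  }'
}

nic_utilization 800000 4500 50   # -> 28.8 Gbps (57.6% of 50 Gbps)
```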

Hardware: c6in.8xlarge for both server and client. 32 vCPU, 64 GB RAM, 50 Gbps dedicated (not “up to”). IRQ affinity default: ENA driver distributes 16 NIC queues one-per-core.

setup

# server — irqbalance already disabled, RPS/RFS applied by user_data
nohup ./fiber_server > /tmp/fiber.log 2>&1 &

# client — connection ramp, no pipelining
for CONN in 100 500 1000 2000 5000 10000; do
  autocannon -c $CONN --pipelining 1 -w 30 -d 20 \
    "http://SERVER_INTERNAL_IP:8083/read"
  sleep 3
done

# IRQ pinning: pin all 16 NIC queues to cores 16-31
IRQ_MASK="ffff0000"   # bits 16-31 = cores 16-31
grep ens5 /proc/interrupts | awk -F: '{print $1}' | tr -d ' ' | while read -r irq; do
  echo "$IRQ_MASK" | sudo tee "/proc/irq/$irq/smp_affinity" > /dev/null
done

# restart fiber workers pinned to cores 0-15
pkill fiber_server
taskset -c 0-15 nohup ./fiber_server > /tmp/fiber_pinned.log 2>&1 &

# server metrics: run in a second SSH window during each benchmark point
# NIC throughput (1s sample), in decimal MB/s to match the 6250 MB/s (50 Gbps) ceiling
NIC=$(ip route show default | awk '/default/{print $5}' | head -1)
R1=$(grep "${NIC}:" /proc/net/dev | awk '{print $2,$10}')   # rx_bytes tx_bytes
sleep 1
R2=$(grep "${NIC}:" /proc/net/dev | awk '{print $2,$10}')
echo "$R1 $R2" | awk '{tx=($4-$2)/1e6; printf "TX: %.1f MB/s (%.1f%% of 6250)\n",tx,tx/6250*100}'

# CPU, averaged across all cores (LC_ALL=C gives 24h timestamps, so "all" stays in field 2)
LC_ALL=C mpstat -P ALL 1 1 | awk '/^[0-9]/ && $2=="all" {printf "usr:%.1f%% sys:%.1f%% idle:%.1f%%\n",$3,$5,$12}'

# IRQ rate per NIC queue: read /proc/interrupts twice, print the 1s delta
irq_totals() { grep "$NIC" /proc/interrupts | awk '{total=0; for(i=2;i<=NF-3;i++) total+=$i; print $1, total}'; }
I1=$(irq_totals); sleep 1; I2=$(irq_totals)
paste <(echo "$I1") <(echo "$I2") | awk '{print $1, $4-$2, "irq/s"}'
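The pinning step hard-codes IRQ_MASK="ffff0000". If you want to try a different core split, the mask can be derived from a core range instead of worked out by hand. A minimal sketch, assuming a machine with at most 32 CPUs (larger masks need comma-separated 32-bit groups, which this does not emit). `cpu_range_mask` is a hypothetical helper name.

```shell
# cpu_range_mask FIRST LAST
# Prints the hex affinity mask with bits FIRST..LAST set, in the form
# /proc/irq/<n>/smp_affinity accepts. Sketch for <= 32 CPUs only.
cpu_range_mask() {
  local cpu mask
  cpu=$1
  mask=0
  while [ "$cpu" -le "$2" ]; do
    mask=$(( mask | (1 << cpu) ))
    cpu=$(( cpu + 1 ))
  done
  printf '%x\n' "$mask"
}

cpu_range_mask 16 31   # IRQ cores in this experiment -> ffff0000
cpu_range_mask 0 15    # worker cores                 -> ffff
```

Writing a CPU list such as 16-31 to /proc/irq/<n>/smp_affinity_list sidesteps the mask entirely. Reading that file back afterwards is also a quick way to confirm the pinning took effect.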

results — connection ramp (baseline, no IRQ pinning)

connections  RPS avg  throughput MB/s  NIC TX%  p50 ms  p95 ms  p99 ms
100          444,621  1,964            31%      <1      <1      <1
500          790,253  3,491            56%      <1      1       1
1,000        786,694  3,475            71%      1       3       3
2,000        766,189  3,385            69%      1       5       7
5,000        701,382  3,098            50%      5       13      17
10,000       673,763  2,977            48%      11      26      35

Peak at 500 connections: 790k RPS, 3.49 GB/s TX. Server at 85% CPU busy (100% minus 15% idle), NIC at 56%. Past 500 connections, RPS falls as latency climbs: more concurrent goroutines mean more scheduler overhead, not more throughput. The NIC ceiling of 6,250 MB/s was never reached.

Interrupt rate during peak: 16 queues × ~22k interrupts/sec ≈ 352k interrupts/sec total.

results — IRQ pinning (cores 16-31 for IRQs, 0-15 for workers)

connections  baseline RPS  pinned RPS  p50    p95 base  p95 pinned  p99 base  p99 pinned
100          444,621       471,000     <1ms   <1ms      <1ms        <1ms      <1ms
500          790,253       787,194     <1ms   1ms       1ms         1ms       1ms
1,000        786,694       782,688     1ms    3ms       3ms         3ms       3ms
2,000        766,189       768,634     1ms    5ms       6ms         7ms       7ms
5,000        701,382       703,328     5ms    13ms      14ms        17ms      18ms
10,000       673,763       669,590     11ms   26ms      28ms        35ms      38ms

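Flat. From 500 connections up, every RPS delta is under 1%, which is within noise. Only the 100-connection point moved (444,621 → 471,000, about +6%), and that point sits far below the peak. p99 at 10k connections got slightly worse (35ms → 38ms) because pinning workers to 16 cores halved the worker count from 32 to 16.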
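To put a number on "flat", here is the relative RPS change per row, computed from the baseline and pinned columns of the table above. A small sketch; `pct_change` is a hypothetical helper name.

```shell
# pct_change BASELINE PINNED -> signed percentage change, two decimals
pct_change() {
  awk -v b="$1" -v p="$2" 'BEGIN { printf "%+.2f%%\n", (p - b) / b * 100 }'
}

# connections, baseline RPS, pinned RPS (copied from the results table)
while read -r conns base pinned; do
  printf '%6s conns: %s\n' "$conns" "$(pct_change "$base" "$pinned")"
done <<'EOF'
100 444621 471000
500 790253 787194
1000 786694 782688
2000 766189 768634
5000 701382 703328
10000 673763 669590
EOF
```

A single run per point says nothing about variance, so repeating each point a few times would be needed to state a real noise band.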

Finding: At 352k interrupts/sec, IRQ cores sit at 0-3% CPU. HAProxy saw gains at ~4M pps on a 100 Gbps NIC. We are at ~800k pps on a 50 Gbps NIC. The IRQ cores have nothing to do. Pinning interrupts to dedicated cores solves a problem that isn't occurring.

what the server metrics showed

During peak load (500 connections, baseline):

Server TX:  3,491 MB/s  (56% of 6,250 MB/s ceiling)
Server CPU: usr 20%  sys 25%  idle 15%  → 85% busy
IRQ rate:   ~22k/sec per queue × 16 queues = 352k/sec total
Connections: 500 established on :8083

The server is CPU-bound, not NIC-bound: 85% CPU at 790k RPS, while the NIC still has 44% headroom. The bottleneck is somewhere in the request path, not in interrupt handling or packet steering.

IRQ pinning changes which cores handle interrupts. It does not change how much CPU the request path consumes. That is a different problem.
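The mpstat one-liner in the setup prints only usr, sys and idle, so it cannot show where the rest of the busy time goes. A fuller breakdown needs the irq and softirq counters too. Here is a minimal sketch, assuming the standard Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal). `cpu_shares` is a hypothetical helper, and the counters in the example are synthetic, not measurements from this run.

```shell
# cpu_shares "BEFORE" "AFTER"
# Each argument is the aggregate "cpu ..." line from /proc/stat.
# Prints the share of elapsed CPU time spent in each state, in percent.
cpu_shares() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
    NR == 2 {
      for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
      printf "usr:%.1f sys:%.1f irq:%.1f softirq:%.1f idle:%.1f\n",
        d[2] / total * 100, d[4] / total * 100, d[7] / total * 100,
        d[8] / total * 100, d[5] / total * 100
    }'
}

# Synthetic counters, for illustration only:
cpu_shares "cpu 1000 0 1000 8000 0 0 0 0" "cpu 1300 0 1200 8400 0 20 80 0"

# Live use on the server during a run:
#   before=$(grep '^cpu ' /proc/stat); sleep 1; after=$(grep '^cpu ' /proc/stat)
#   cpu_shares "$before" "$after"
```

Running this on the per-core lines (cpu0, cpu1, ...) instead of the aggregate would show whether interrupt work concentrates on the IRQ cores after pinning.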

Note on IRQ pinning and packet rate: IRQ pinning helps when interrupt processing competes with goroutine execution on the same cores. That competition is proportional to packet rate, not RPS. With pipelining=100, 18M RPS = 180k pps. Without pipelining, 800k RPS = ~800k pps. Neither is enough to saturate IRQ cores on a 16-queue ENA NIC. The IRQ regime starts around 4M pps on hardware with NUMA effects and saturated interrupt queues.
