switching to /read: c6in.8xlarge, 50 Gbps, and IRQ pinning on a loaded NIC

c6in read NIC IRQ pipelining autocannon profiling

why /read and why c6in

Every previous experiment used /simple, which returns {"message":"hi"}, a 16-byte body. At 18M RPS the server TX was 18M × 200 bytes per response ≈ 3.6 GB/s, but that ran with pipelining at 100 requests in flight per TCP connection, so the actual packet rate was roughly 18M / 100 = 180k pps. IRQ cores at 180k pps sit at 0-3% CPU. There is nothing to isolate.

IRQ pinning is a packet rate optimization. To test it, you need packets. That means no pipelining and a bigger payload.
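
The packet-rate arithmetic can be sketched as a small shell helper. It makes the same simplifying assumption as the estimate above: one pipelined batch of N requests travels as roughly one packet per direction. `pps_estimate` is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Rough packets-per-second estimate from request rate and pipelining depth.
# Assumption (same as the text): a pipelined batch of N requests ~ one packet.
pps_estimate() {
  local rps=$1 depth=$2
  echo $(( rps / depth ))
}

pps_estimate 18000000 100   # /simple with pipelining 100
pps_estimate 800000 1       # /read with pipelining 1
```

The two calls reproduce the 180k pps and 800k pps figures this entry works with.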

/read returns a pre-serialized product: UUID, name, brand, description (5 paragraphs), tags, attributes, images. Measured: ~4.5KB per response. With autocannon --pipelining 1, each request is an independent TCP round trip. One packet in, one packet out per request.

The math at 800k RPS: 800k × 4.5KB = 3.6 GB/s TX = 28.8 Gbps. On a 50 Gbps NIC that’s 57.6% utilization. NIC queues are actually handling traffic volume. This is the regime where IRQ core saturation is possible.
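
The same bandwidth math as a sketch, useful for plugging in other request rates or payload sizes. It treats 4.5KB as 4,500 bytes (decimal units, which is what the 3.6 GB/s figure implies) and ignores TCP/IP framing overhead. `nic_utilization` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Estimate TX bandwidth and NIC utilization for a given request rate.
# Args: requests/sec, response size in bytes, NIC line rate in Gbps.
# Assumption: decimal units throughout, framing overhead ignored.
nic_utilization() {
  awk -v rps="$1" -v bytes="$2" -v gbps="$3" 'BEGIN {
    bps  = rps * bytes        # response bytes per second
    gbit = bps * 8 / 1e9      # decimal gigabits per second
    printf "%.1f GB/s = %.1f Gbps = %.1f%% of %d Gbps\n",
           bps / 1e9, gbit, gbit / gbps * 100, gbps
  }'
}

nic_utilization 800000 4500 50
```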

Hardware: c6in.8xlarge for both server and client. 32 vCPU, 64 GB RAM, 50 Gbps dedicated (not “up to”). IRQ affinity default: ENA driver distributes 16 NIC queues one-per-core.

setup

# server — irqbalance already disabled, RPS/RFS applied by user_data
nohup ./fiber_server > /tmp/fiber.log 2>&1 &

# client: connection ramp, no pipelining
# (-w 30 = autocannon worker threads, -d 20 = seconds per run)
for CONN in 100 500 1000 2000 5000 10000; do
  autocannon -c "$CONN" --pipelining 1 -w 30 -d 20 \
    "http://SERVER_INTERNAL_IP:8083/read"
  sleep 3
done

# IRQ pinning — pin all 16 NIC queues to cores 16-31
IRQ_MASK="ffff0000"   # bits 16-31 = cores 16-31
grep ens5 /proc/interrupts | awk -F: '{print $1}' | tr -d ' ' | while read irq; do
  echo "$IRQ_MASK" | sudo tee /proc/irq/$irq/smp_affinity > /dev/null
done

# restart fiber workers pinned to cores 0-15
pkill fiber_server
taskset -c 0-15 nohup ./fiber_server > /tmp/fiber_pinned.log 2>&1 &

# server metrics — run in a second SSH window during each benchmark point
# NIC throughput (1s sample)
NIC=$(ip route show default | awk '/default/{print $5}' | head -1)
R1=$(grep "${NIC}:" /proc/net/dev | awk '{print $2,$10}')
sleep 1
R2=$(grep "${NIC}:" /proc/net/dev | awk '{print $2,$10}')
# decimal MB/s, to match the 6250 MB/s (50 Gbps / 8) ceiling
echo "$R1 $R2" | awk '{tx=($4-$2)/1000000; printf "TX: %.1f MB/s (%.1f%% of 6250)\n",tx,tx/6250*100}'

# CPU: average across all cores
# LC_ALL=C forces a 24-hour timestamp so the CPU column stays in field 2
LC_ALL=C mpstat -P ALL 1 1 | awk '/^[0-9]/ && $2=="all" {printf "usr:%.1f%% sys:%.1f%% idle:%.1f%%\n",$3,$5,$12}'

# cumulative IRQ count per NIC queue; run twice and subtract to get a rate
grep "$NIC" /proc/interrupts | awk '{total=0; for(i=2;i<=NF-3;i++) total+=$i; print $1, total}'
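
The one-liner above prints cumulative counters, so getting a rate means sampling twice and subtracting. A minimal sketch of that delta follows. `irq_total` and `irq_rate` are hypothetical helper names, the interval must be a whole number of seconds, and the column count is taken from the `CPU0 CPU1 ...` header row rather than from `NF-3`.

```shell
#!/usr/bin/env bash
# Sum interrupt counts across all CPUs for every IRQ line matching a NIC name.
# Args: NIC name, path to an interrupts file (defaults to /proc/interrupts).
irq_total() {
  local nic=$1 file=${2:-/proc/interrupts}
  awk -v nic="$nic" '
    NR == 1 { ncpu = NF; next }          # header row: one column per CPU
    index($0, nic) {                     # IRQ line that mentions the NIC
      for (i = 2; i <= ncpu + 1; i++) total += $i
    }
    END { printf "%.0f\n", total }
  ' "$file"
}

# Interrupts per second, averaged over an interval in whole seconds (default 1).
irq_rate() {
  local nic=$1 interval=${2:-1} before after
  before=$(irq_total "$nic")
  sleep "$interval"
  after=$(irq_total "$nic")
  echo $(( (after - before) / interval ))
}

# Usage during a benchmark point, e.g.:
#   irq_rate ens5 1
```

Dividing the result by the number of queues gives the per-queue rate.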

results — connection ramp (baseline, no IRQ pinning)

connections	RPS avg	throughput MB/s	NIC TX%	p50 ms	p95 ms	p99 ms
100	444,621	1,964	31%	<1	<1	<1
500	790,253	3,491	56%	<1	1	1
1,000	786,694	3,475	56%	1	3	3
2,000	766,189	3,385	54%	1	5	7
5,000	701,382	3,098	50%	5	13	17
10,000	673,763	2,977	48%	11	26	35

Peak at 500 connections: 790k RPS, 3.49 GB/s TX. The server is 85% busy (100% minus 15% idle) and the NIC is at 56%. Beyond 500 connections, RPS drops as latency climbs: more concurrent goroutines means more scheduler overhead, not more throughput. The NIC ceiling of 6,250 MB/s was never reached.

Interrupt rate during peak: 16 queues × ~22k interrupts/sec = 352k total interrupts/sec.

results — IRQ pinning (cores 16-31 for IRQs, 0-15 for workers)

connections	baseline RPS	pinned RPS	p50	p95 base	p95 pinned	p99 base	p99 pinned
100	444,621	471,000	<1ms	<1ms	<1ms	<1ms	<1ms
500	790,253	787,194	<1ms	1ms	1ms	1ms	1ms
1,000	786,694	782,688	1ms	3ms	3ms	3ms	3ms
2,000	766,189	768,634	1ms	5ms	6ms	7ms	7ms
5,000	701,382	703,328	5ms	13ms	14ms	17ms	18ms
10,000	673,763	669,590	11ms	26ms	28ms	35ms	38ms

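Flat. From 500 connections up, every pinned result is within 1% of baseline, which is within run-to-run noise. The only larger gap is at 100 connections (+5.9%), well below the saturation point. p99 at 10k connections got slightly worse (35ms → 38ms), most likely because pinning workers to 16 cores halved the worker count from 32 to 16.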
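
To put a number on the comparison, the per-row difference can be computed directly from the table. This is a sketch only: `pct_change` is a hypothetical helper and the rows are copied from the table above.

```shell
#!/usr/bin/env bash
# Percent change of pinned RPS relative to baseline RPS, one decimal place.
pct_change() {
  awk -v base="$1" -v pinned="$2" \
    'BEGIN { printf "%+.1f\n", (pinned - base) / base * 100 }'
}

# connections:baseline RPS:pinned RPS
while IFS=: read -r conns base pinned; do
  printf '%6s connections: %s%%\n' "$conns" "$(pct_change "$base" "$pinned")"
done <<'EOF'
100:444621:471000
500:790253:787194
1000:786694:782688
2000:766189:768634
5000:701382:703328
10000:673763:669590
EOF
```

Without repeated runs per point there is no measured variance to compare against, so these deltas show only how small the differences are, not that they are statistically insignificant.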

Finding: At 352k interrupts/sec, the IRQ cores sit at 0-3% CPU. HAProxy saw gains at ~4M pps on a 100 Gbps NIC; this test runs at ~800k pps on a 50 Gbps NIC. The IRQ cores have nothing to do, so moving interrupts to dedicated cores solves a problem that isn't occurring.

what the server metrics showed

During peak load (500 connections, baseline):

Server TX:  3,491 MB/s  (56% of 6,250 MB/s ceiling)
Server CPU: usr 20%  sys 25%  idle 15%  → 85% busy
IRQ rate:   ~22k/sec per queue × 16 queues = 352k/sec total
Connections: 500 established on :8083

The server is CPU-bound, not NIC-bound. 85% CPU at 790k RPS. The NIC has 44% headroom. The bottleneck is somewhere in the request path — not interrupt handling, not packet steering.

IRQ pinning changes what cores handle interrupts. It does not change how much CPU the request path consumes. That is a different problem.
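
Which cores handle interrupts is decided by the hex mask written to smp_affinity in the setup. A minimal sketch for deriving that mask from a core range, rather than hand-computing constants like ffff0000, is below. `cpu_mask` is a hypothetical helper, and it only covers cores 0-31 (a single 32-bit mask word), which is enough for this 32 vCPU instance.

```shell
#!/usr/bin/env bash
# Build a hex CPU affinity mask for an inclusive core range.
# Limitation: cores 0-31 only (one 32-bit mask word).
cpu_mask() {
  local lo=$1 hi=$2
  local width=$(( hi - lo + 1 ))
  printf '%08x\n' $(( ((1 << width) - 1) << lo ))
}

cpu_mask 16 31   # cores reserved for NIC IRQs
cpu_mask 0 15    # cores left for the workers
```

After writing a mask, reading /proc/irq/<irq>/smp_affinity_list shows the same setting as a decimal core range, which is an easy way to confirm the pinning took effect.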

Note on IRQ pinning and packet rate: IRQ pinning helps when interrupt processing competes with goroutine execution on the same cores. That competition is proportional to packet rate, not RPS. With pipelining=100, 18M RPS = 180k pps. Without pipelining, 800k RPS = ~800k pps. Neither is enough to saturate IRQ cores on a 16-queue ENA NIC. The IRQ regime starts around 4M pps on hardware with NUMA effects and saturated interrupt queues.

Next: The bottleneck is somewhere in the request path at 85% CPU. Not IRQ. Not NIC. Profile it: entry 09.
