Every previous experiment used /simple: {"message":"hi"}, a 16-byte body. At 18M RPS the server TX was 18M × ~200 bytes (body plus HTTP headers) ≈ 3.6 GB/s, but that ran with pipelining at depth 100. Each TCP connection kept 100 requests in flight, so responses left in batches and the actual packet rate was roughly 18M / 100 = 180k pps. IRQ cores at 180k pps sit at 0-3% CPU. There is nothing to isolate.
IRQ pinning is a packet rate optimization. To test it, you need packets. That means no pipelining and a bigger payload.
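/read returns a pre-serialized product: UUID, name, brand, description (5 paragraphs), tags, attributes, images. Measured: ~4.5KB per response. With autocannon --pipelining 1, each connection has only one request in flight, so every request is its own round trip: one packet in, one packet out. This assumes the 4.5KB response fits in a single frame, as it does at the 9001-byte jumbo MTU that EC2 typically uses inside a VPC.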
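Whether "one packet out" holds depends on the MTU. A rough check is to divide the response size by the TCP payload that fits in one frame. This sketch assumes IPv4 plus TCP with the timestamp option (52 header bytes per frame, since Linux enables TCP timestamps by default) and counts on-wire frames. Segmentation offload can hand the NIC bigger buffers, but the wire still carries MTU-sized frames. `frames_per_response` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: on-wire frames for one response of BYTES at a given MTU.
# Assumes 20 B IPv4 + 20 B TCP + 12 B TCP timestamp option = 52 B of headers.
frames_per_response() {
  local bytes=$1 mtu=$2
  local payload=$((mtu - 52))                    # TCP payload per frame
  echo $(( (bytes + payload - 1) / payload ))    # ceiling division
}

frames_per_response 4500 9001   # jumbo MTU: prints 1
frames_per_response 4500 1500   # standard Ethernet MTU: prints 4
```

At a 1500-byte MTU the same 4.5KB response would leave as four frames, not one.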
The math at 800k RPS: 800k × 4.5KB = 3.6 GB/s TX = 28.8 Gbps. On a 50 Gbps NIC that's 57.6% utilization. At one packet per request, that is also ~800k pps in each direction, over 4× the 180k pps of the pipelined runs. The NIC queues now carry real traffic. This is the regime where IRQ core saturation is possible.
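The bandwidth arithmetic can be scripted so other payload sizes and link speeds are easy to try. This sketch uses decimal units (1 GB = 1e9 bytes) to match the 50 Gbps / 6,250 MB/s figures used throughout. `link_util` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: TX bandwidth and link utilization for RPS x bytes/response.
# Decimal units: 1 GB = 1e9 bytes, 1 Gbps = 1e9 bits/s.
link_util() {
  awk -v rps="$1" -v bytes="$2" -v gbps="$3" 'BEGIN {
    bps = rps * bytes                       # bytes per second on the wire
    printf "%.2f GB/s = %.2f Gbps (%.1f%% of %s Gbps)\n",
           bps / 1e9, bps * 8 / 1e9, bps * 8 / (gbps * 1e9) * 100, gbps
  }'
}

link_util 800000 4500 50     # /read, no pipelining
link_util 18000000 200 50    # /simple, pipelined
```

Both workloads come out at the same 3.6 GB/s, so bandwidth alone does not separate them. The difference is packets per request.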
Hardware: c6in.8xlarge for both server and client. 32 vCPU, 64 GB RAM, 50 Gbps dedicated (not “up to”). IRQ affinity default: ENA driver distributes 16 NIC queues one-per-core.
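To see the default layout before changing it, list the NIC's queue IRQs and read each one's `smp_affinity_list`. This is a minimal sketch. It assumes the queue IRQs carry the interface name in `/proc/interrupts`, which is what the `grep ens5` in the pinning step relies on. Both function names are hypothetical, and `nic_irqs` takes an optional file argument only so it can be tested against a sample.

```bash
#!/usr/bin/env bash
# Hypothetical helper: IRQ numbers whose /proc/interrupts line mentions IFACE.
nic_irqs() {
  local iface=$1 file=${2:-/proc/interrupts}
  awk -v nic="$iface" 'index($0, nic) { sub(":", "", $1); print $1 }' "$file"
}

# Hypothetical helper: which CPUs each NIC queue IRQ is currently allowed on.
show_irq_affinity() {
  local irq
  for irq in $(nic_irqs "$1"); do
    printf 'irq %s -> cpus %s\n' "$irq" "$(cat "/proc/irq/$irq/smp_affinity_list")"
  done
}

# usage: show_irq_affinity ens5
```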
```bash
# server: irqbalance already disabled, RPS/RFS applied by user_data
nohup ./fiber_server > /tmp/fiber.log 2>&1 &

# client: connection ramp, no pipelining
for CONN in 100 500 1000 2000 5000 10000; do
  autocannon -c "$CONN" --pipelining 1 -w 30 -d 20 \
    "http://SERVER_INTERNAL_IP:8083/read"
  sleep 3
done
```
```bash
# IRQ pinning: allow all 16 NIC queue IRQs on cores 16-31
# (every IRQ gets the same 16-31 mask; each is delivered to one CPU chosen from it)
IRQ_MASK="ffff0000"   # bits 16-31 = cores 16-31
grep ens5 /proc/interrupts | awk -F: '{print $1}' | tr -d ' ' | while read -r irq; do
  echo "$IRQ_MASK" | sudo tee /proc/irq/"$irq"/smp_affinity > /dev/null
done

# restart fiber workers pinned to cores 0-15
pkill -x fiber_server
sleep 1   # give the old process time to exit and release :8083
taskset -c 0-15 nohup ./fiber_server > /tmp/fiber_pinned.log 2>&1 &
```
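Whether cores 0-15 and 16-31 are physically separate depends on how the instance numbers its vCPUs. With hyperthreading, a worker vCPU and an IRQ vCPU can be two threads of one physical core. The sketch below checks this by reading the kernel's `thread_siblings_list` in sysfs. `print_siblings` and the `SYSFS_ROOT` override are hypothetical, and the override exists only so the function can run against a fake tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SMT siblings of each given vCPU.
# If cpu0 prints "0,16", vCPU 0 (workers) and vCPU 16 (IRQs) share a physical core.
print_siblings() {
  local root=${SYSFS_ROOT:-/sys} cpu
  for cpu in "$@"; do
    echo "cpu$cpu: $(cat "$root/devices/system/cpu/cpu$cpu/topology/thread_siblings_list")"
  done
}

# usage:
#   print_siblings 0 16
#   taskset -cp "$(pgrep -x fiber_server)"   # confirm the server's CPU list
```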
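```bash
# server metrics: run in a second SSH window during each benchmark point
NIC=$(ip route show default | awk '/default/{print $5}' | head -1)

# NIC throughput (1 s sample). sysfs counters avoid the /proc/net/dev trap where
# a wide byte counter can fuse with the "ens5:" label and shift awk's fields.
STATS=/sys/class/net/$NIC/statistics
TX1=$(cat "$STATS/tx_bytes"); sleep 1; TX2=$(cat "$STATS/tx_bytes")
# 50 Gbps / 8 = 6250 MB/s (decimal), so divide by 1e6, not 1024*1024
awk -v a="$TX1" -v b="$TX2" 'BEGIN{tx=(b-a)/1e6; printf "TX: %.1f MB/s (%.1f%% of 6250)\n",tx,tx/6250*100}'

# CPU: average across all cores. LC_ALL=C keeps the "Average:" label and column
# layout fixed; %irq and %soft count toward neither usr nor sys, so print them too.
LC_ALL=C mpstat -P ALL 1 1 | awk '$1=="Average:" && $2=="all" {printf "usr:%.1f%% sys:%.1f%% irq:%.1f%% soft:%.1f%% idle:%.1f%%\n",$3,$5,$7,$8,$NF}'

# IRQ rate per NIC queue: read /proc/interrupts twice, one second apart
irq_totals() { grep "$NIC" /proc/interrupts | awk '{t=0; for(i=2;i<=NF-3;i++) t+=$i; print $1, t}'; }
A=$(irq_totals); sleep 1; B=$(irq_totals)
paste <(echo "$A") <(echo "$B") | awk '{d=$4-$2; sum+=d; print $1, d "/s"} END{print "total", sum "/s"}'
```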
Baseline results (default IRQ affinity):

| connections | avg RPS | throughput (MB/s) | NIC TX (% of 6,250 MB/s) | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|---|---|---|
| 100 | 444,621 | 1,964 | 31% | <1 | <1 | <1 |
| 500 | 790,253 | 3,491 | 56% | <1 | 1 | 1 |
| 1,000 | 786,694 | 3,475 | 71% | 1 | 3 | 3 |
| 2,000 | 766,189 | 3,385 | 69% | 1 | 5 | 7 |
| 5,000 | 701,382 | 3,098 | 50% | 5 | 13 | 17 |
| 10,000 | 673,763 | 2,977 | 48% | 11 | 26 | 35 |
Peak at 500 connections: 790k RPS, 3.49 GB/s TX. Server at 85% CPU busy (100% minus 15% idle), NIC at 56%. Beyond 500 connections, RPS drops while latency climbs: more concurrent goroutines likely add scheduler overhead rather than throughput. The 6,250 MB/s NIC ceiling was never reached.
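Little's law offers a quick consistency check on "RPS drops as latency climbs". With `--pipelining 1`, each connection has one request outstanding, so mean latency ≈ connections / RPS. This sketch assumes a closed loop with no client think time. `mean_latency_ms` is a hypothetical helper, and the input rows are taken from the baseline table above.

```bash
#!/usr/bin/env bash
# Little's law: in-flight requests = throughput x mean latency.
# Assumes in-flight requests == connections (closed loop, --pipelining 1).
mean_latency_ms() {
  awk -v conns="$1" -v rps="$2" 'BEGIN { printf "%.2f\n", conns / rps * 1000 }'
}

# connections and avg RPS from the baseline table
while read -r conns rps; do
  echo "$conns connections: $(mean_latency_ms "$conns" "$rps") ms mean"
done <<'EOF'
100 444621
500 790253
1000 786694
2000 766189
5000 701382
10000 673763
EOF
```

The implied means run from about 0.22 ms at 100 connections to about 14.84 ms at 10,000. Each is consistent with the p50 to p99 range measured for that row. Once RPS stops growing, extra connections turn into queueing delay.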
NIC interrupt rate at peak: 16 queues × ~22k interrupts/s ≈ 352k interrupts/s total.
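Measured against the request rate, that interrupt rate is below one per request. Every request needs at least one received and one transmitted packet, so each interrupt already covers the packet work of about two requests. NAPI polls the queue for a batch of packets per interrupt. The division can be sketched as follows, with `irq_per_request` as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: total NIC interrupts/s and interrupts per request.
irq_per_request() {
  awk -v queues="$1" -v per_queue="$2" -v rps="$3" 'BEGIN {
    total = queues * per_queue
    printf "%d irq/s, %.2f per request\n", total, total / rps
  }'
}

irq_per_request 16 22000 790253   # peak, 500 connections
```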
With NIC IRQs pinned to cores 16-31 and fiber_server pinned to cores 0-15:

| connections | baseline RPS | pinned RPS | p50 (both) | p95 baseline | p95 pinned | p99 baseline | p99 pinned |
|---|---|---|---|---|---|---|---|
| 100 | 444,621 | 471,000 | <1ms | <1ms | <1ms | <1ms | <1ms |
| 500 | 790,253 | 787,194 | <1ms | 1ms | 1ms | 1ms | 1ms |
| 1,000 | 786,694 | 782,688 | 1ms | 3ms | 3ms | 3ms | 3ms |
| 2,000 | 766,189 | 768,634 | 1ms | 5ms | 6ms | 7ms | 7ms |
| 5,000 | 701,382 | 703,328 | 5ms | 13ms | 14ms | 17ms | 18ms |
| 10,000 | 673,763 | 669,590 | 11ms | 26ms | 28ms | 35ms | 38ms |
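Flat. Within noise. Tail latency at 10k connections got slightly worse (p95 26ms → 28ms, p99 35ms → 38ms). This is likely because `taskset -c 0-15` halved the Go runtime's default GOMAXPROCS from 32 to 16. Go sizes it from the CPU affinity mask at startup, so the pinned server ran on half the CPUs.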
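The halving is easy to see from the shell. GNU `nproc`, like Go's default GOMAXPROCS, counts only the CPUs in the process's affinity mask. In the sketch below, `visible_cpus` is a hypothetical wrapper around `taskset` and `nproc`:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: how many CPUs a process started under `taskset -c LIST` sees.
visible_cpus() {
  taskset -c "$1" nproc
}

visible_cpus 0        # one CPU in the mask
# on the 32-vCPU server: visible_cpus 0-15 would report 16
```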
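During peak load (500 connections, baseline):

- Server TX: 3,491 MB/s (56% of the 6,250 MB/s ceiling)
- Server CPU: usr 20%, sys 25%, idle 15% → 85% busy. usr + sys is only 45%; the other 40 points sit in mpstat columns outside usr and sys (%nice, %iowait, %irq, %soft, %steal, %guest)
- NIC interrupts: ~22k/s per queue × 16 queues ≈ 352k/s total
- Connections: 500 established on :8083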
The server is CPU-bound, not NIC-bound. 85% CPU at 790k RPS. The NIC has 44% headroom. The bottleneck is somewhere in the request path — not interrupt handling, not packet steering.
IRQ pinning changes what cores handle interrupts. It does not change how much CPU the request path consumes. That is a different problem.