The c8i.32xlarge experiment showed the server at 26% CPU even at 18M requests/s. IRQ interference can’t matter when the server is mostly idle. We downsized to c6i.2xlarge (8 vCPU) so the client could saturate the server and create conditions where interference would be visible.
Server: c6i.2xlarge (8 vCPU, 6.25 Gbps)
Client: c6i.8xlarge (32 vCPU, 25 Gbps)
irqbalance is a daemon that periodically reassigns IRQs across cores. Each time it rebalances, it can undo manual affinity settings. Stop it before any IRQ experiment:
sudo systemctl stop irqbalance
sudo systemctl disable irqbalance
# verify
systemctl is-active irqbalance
# inactive
# how many hardware queues does the NIC have?
sudo ethtool -l ens5
# Combined: 8 (c6i.2xlarge)
# which IRQ numbers belong to this NIC?
grep "ens5" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '
# 28 29 30 31 32 33 34 35
# which CPU is currently handling each queue?
NIC=ens5
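# header fields start at CPU0, data rows start with the IRQ number, so count column i belongs to header field i-1
awk "NR==1{for(i=1;i<=NF;i++)cpu[i+1]=\$i} \$NF~/$NIC/{q=\$NF;for(i=2;i<=NF-3;i++){if(\$i+0>0)printf \"%-28s -> %-8s (%d)\n\",q,cpu[i],\$i}}" /proc/interrupts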
# output (default, before pinning):
# ens5-Tx-Rx-0 -> CPU0 (3500418)
# ens5-Tx-Rx-1 -> CPU1 (3675033)
# ...one queue per core, all mixed with fiber workers
Server state: no IRQ pinning, no RPS/RFS, fiber on all 8 cores.
# start fiber normally — no taskset, no pinning
pkill fiber_server 2>/dev/null
nohup ./fiber_server > /tmp/fiber.log 2>&1 &
# verify RPS/RFS is off
cat /sys/class/net/ens5/queues/rx-0/rps_cpus
# 00
cat /proc/sys/net/core/rps_sock_flow_entries
# 0
# benchmark — 30 workers, pipelining 100
./autocannon_bench.sh SERVER_INTERNAL_IP 100 30 30 simple
| connections | RPS | p50 ms | p99 ms | throughput MB/s |
|---|---|---|---|---|
| 1000 | 2,473,242 | 37 | 85 | 306 |
| 2000 | 2,152,192 | 91 | 195 | 266 |
| 5000 | 2,318,061 | 231 | 505 | 287 |
SERVER: AVG usr: 62% sys: 14% idle: 11% — all 8 cores hot
CLIENT: AVG usr: 81% sys: 7.7% idle: 6%
NIC interrupts: 8 queues on CPUs 0-7 — one per core, all mixed with fiber workers
Pin all 8 NIC IRQs to cores 6-7. Restart fiber on cores 0-5 with taskset.
# cores 6-7 on an 8-core system = bits 6+7 = 0xc0
MASK="c0"
for irq in $(seq 28 35); do
echo $MASK | sudo tee /proc/irq/$irq/smp_affinity > /dev/null
done
# verify
cat /proc/irq/28/smp_affinity_list
# 6-7
# restart fiber restricted to cores 0-5
pkill fiber_server 2>/dev/null; sleep 1
nohup taskset -c 0-5 ./fiber_server > /tmp/fiber.log 2>&1 &
./autocannon_bench.sh SERVER_INTERNAL_IP 100 30 30 simple
| connections | RPS | p50 ms | p99 ms | throughput MB/s |
|---|---|---|---|---|
| 1000 | 2,489,660 | 35 | 95 | 308 |
| 2000 | 2,165,822 | 90 | 195 | 268 |
| 5000 | 2,300,485 | 232 | 525 | 285 |
CPU0-5 (fiber cores): usr 72-85% sys 6-16% — doing real work, no interrupts
CPU6-7 (IRQ cores): usr 0-1% sys 0-1% idle 57% — mostly sleeping
IRQ pinning is working: the separation is visible in per-core CPU. But throughput is essentially unchanged (+0.7% requests/s at 1000 connections).
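The `c0` mask used above is just bits 6 and 7 set. For other core layouts, a small helper can derive the hex value that `smp_affinity` expects. This is a minimal sketch; `cpus_to_mask` is a hypothetical name, not part of the original setup, and it only covers CPU indices that fit in a single shell integer.

```bash
#!/usr/bin/env bash
# cpus_to_mask: hypothetical helper (not from the original experiment).
# Builds the hex bitmask written to /proc/irq/<n>/smp_affinity from a
# list of CPU indices: bit N set means CPU N may handle the IRQ.
# Assumes CPU indices 0-62; larger systems use comma-separated
# 32-bit groups, which this sketch does not produce.
cpus_to_mask() {
  local mask=0 cpu
  for cpu in "$@"; do
    mask=$(( mask | (1 << cpu) ))
  done
  printf '%x\n' "$mask"
}

cpus_to_mask 6 7           # c0  (the IRQ cores used here)
cpus_to_mask 0 1 2 3 4 5   # 3f  (the fiber cores)
```

Writing the CPU list to `smp_affinity_list` (for example `6-7`) avoids the mask arithmetic entirely; the mask form is shown because it is what the commands above use.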
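Apply RPS/RFS (Receive Packet Steering / Receive Flow Steering, not the requests-per-second "RPS" column in the tables) on top of the IRQ pinning already in place.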
NIC=ens5
# RPS: allow all 8 cores to process softirqs
for f in /sys/class/net/$NIC/queues/rx-*/rps_cpus; do
echo ff | sudo tee $f > /dev/null
done
# RFS: steer packets to the CPU that last ran the socket's goroutine
echo 32768 | sudo tee /proc/sys/net/core/rps_sock_flow_entries > /dev/null
for f in /sys/class/net/$NIC/queues/rx-*/rps_flow_cnt; do
echo 4096 | sudo tee $f > /dev/null
done
# verify
cat /sys/class/net/$NIC/queues/rx-0/rps_cpus
# ff
cat /proc/sys/net/core/rps_sock_flow_entries
# 32768
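The 4096 written to each `rps_flow_cnt` is the global `rps_sock_flow_entries` split evenly across the 8 RX queues (32768 / 8). The sketch below derives that value for a different queue count. Both function names are hypothetical, and the queues directory is passed as an argument so it can point at `/sys/class/net/$NIC/queues`.

```bash
#!/usr/bin/env bash
# count_rx_queues: hypothetical helper. Counts rx-* entries in a
# queues directory such as /sys/class/net/ens5/queues.
count_rx_queues() {
  local n=0 q
  for q in "$1"/rx-*; do
    [ -d "$q" ] && n=$(( n + 1 ))
  done
  echo "$n"
}

# per_queue_flow_cnt: hypothetical helper. Splits the global flow
# table size evenly across the RX queues (integer division).
per_queue_flow_cnt() {
  local total=$1 queues=$2
  echo $(( total / queues ))
}

per_queue_flow_cnt 32768 8   # 4096, the value used above

# Usage on a real host (not executed here):
#   n=$(count_rx_queues "/sys/class/net/$NIC/queues")
#   per_queue_flow_cnt 32768 "$n"
```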
| connections | RPS | p50 ms | p99 ms | throughput MB/s |
|---|---|---|---|---|
| 1000 | 2,542,891 | 32 | 90 | 315 |
| 2000 | 2,226,432 | 87 | 202 | 276 |
| 5000 | 2,304,654 | 231 | 537 | 285 |
| | baseline | IRQ only | IRQ + RPS/RFS |
|---|---|---|---|
| 1000c RPS | 2,473,242 | 2,489,660 | 2,542,891 (+2.8%) |
| 1000c p50 | 37ms | 35ms | 32ms |
| 2000c RPS | 2,152,192 | 2,165,822 | 2,226,432 (+3.4%) |
| 5000c RPS | 2,318,061 | 2,300,485 | 2,304,654 (~flat) |
~3% improvement. Within noise for most practical purposes.
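The percentages in the comparison are plain relative changes against the baseline row. A minimal sketch for recomputing them from the raw requests/s figures; `pct_change` is a hypothetical helper name, and the inputs are the 1000-connection numbers from the tables above.

```bash
#!/usr/bin/env bash
# pct_change: hypothetical helper. Prints the signed relative change
# of $2 versus $1 as a percentage with one decimal place.
pct_change() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%+.1f\n", (new - base) / base * 100 }'
}

pct_change 2473242 2489660   # +0.7  (IRQ only vs baseline, 1000c)
pct_change 2473242 2542891   # +2.8  (IRQ + RPS/RFS vs baseline, 1000c)
```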
/compute saturates the server CPU. At 97% server CPU, IRQ interference on fiber cores should be more visible.
./autocannon_bench.sh SERVER_INTERNAL_IP 100 30 30 compute
| config | 1000c RPS | p99 ms |
|---|---|---|
| baseline (8 cores) | 69,060 | 5430 |
| IRQ + RPS/RFS (6 cores) | 69,849 | 6852 |
Near-identical RPS. But p99 got worse with tuning (+26%). Why: restricting fiber to 6 cores with taskset lost 2 compute cores. For CPU-bound work, fewer cores = fewer parallel goroutines = higher queueing latency at the tail.
We applied both simultaneously:
echo c0 | sudo tee /proc/irq/28/smp_affinity # IRQs on cores 6-7
taskset -c 0-5 ./fiber_server # fiber on cores 0-5
These are independent knobs:
smp_affinity controls which cores handle a given IRQ
taskset controls which cores a process is allowed to run on
The correct experiment would have tested them separately:
Step 1: baseline (8 cores, IRQs anywhere)
Step 2: IRQ pin only (8 cores for fiber, IRQs on 6-7) ← we skipped this
Step 3: taskset only (fiber on 0-5, IRQs anywhere)
Step 4: IRQ + taskset (fiber on 0-5, IRQs on 6-7)
Step 2 (IRQ pinning without restricting fiber cores) would have had zero compute cost and shown the pure effect of interrupt isolation.
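The four steps can be laid out as a dry-run plan, so each knob is changed on its own. The script below is only a sketch: `plan_step` is a hypothetical helper that prints commands rather than running them, the IRQ numbers are the ones listed for this host, and it assumes the default IRQ affinity was saved beforehand so the "IRQs anywhere" steps can restore it.

```bash
#!/usr/bin/env bash
# Dry-run planner for the four-step matrix. Prints the commands for
# each step instead of executing them. All names are illustrative.
IRQS="28 29 30 31 32 33 34 35"   # NIC IRQ numbers on this host

# plan_step NAME FIBER_CPUS IRQ_CPUS
#   FIBER_CPUS empty = no taskset; IRQ_CPUS empty = default affinity.
plan_step() {
  local name=$1 fiber_cpus=$2 irq_cpus=$3
  echo "## $name"
  if [ -n "$irq_cpus" ]; then
    echo "for irq in $IRQS; do echo $irq_cpus | sudo tee /proc/irq/\$irq/smp_affinity_list >/dev/null; done"
  else
    echo "# IRQs: restore the default affinity saved before the experiment"
  fi
  if [ -n "$fiber_cpus" ]; then
    echo "taskset -c $fiber_cpus ./fiber_server"
  else
    echo "./fiber_server"
  fi
}

plan_step "1 baseline"      ""    ""
plan_step "2 IRQ pin only"  ""    "6-7"
plan_step "3 taskset only"  "0-5" ""
plan_step "4 IRQ + taskset" "0-5" "6-7"
```

Comparing step 2 against step 1 isolates the interrupt effect, and step 3 against step 1 isolates the cost of losing two cores.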
The HAProxy 2M+ RPS post describes pinning 32 NIC IRQs to 16 dedicated cores. It worked because:
"the network saturates at around 4.15 million packets per second… the network-dedicated cores regularly appear at 100%"
Their IRQ cores were at 100% CPU. Ours were at 0-3%. The IRQ cores are almost sleeping at our packet rate.
The reason: pipelining. With --pipelining 100 and 1000 connections:
1000 TCP connections × pipelining 100 = 100k requests in-flight
but actual packet rate ≈ 97,000 packets/sec (much lower than RPS)
÷ 8 NIC queues = ~12,000 interrupts/sec per queue (at most, assuming one interrupt per packet)
= one interrupt every 83 microseconds per queue
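The estimate above can be reproduced with integer arithmetic. This is a rough sketch that assumes packets spread evenly across queues and one interrupt per packet, which is an upper bound since the NIC and kernel can batch packets per interrupt. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# per_queue_rate: hypothetical helper. Packets/sec divided evenly
# across NIC queues, as an upper bound on interrupts/sec per queue.
per_queue_rate() {
  local pps=$1 queues=$2
  echo $(( pps / queues ))
}

# interval_us: hypothetical helper. Whole microseconds between
# interrupts on one queue at that rate (integer division).
interval_us() {
  local pps=$1 queues=$2
  echo $(( 1000000 * queues / pps ))
}

per_queue_rate 97000 8   # 12125, rounded to ~12,000 in the text
interval_us 97000 8      # 82, i.e. the ~83 microseconds quoted above

# The same arithmetic for the HAProxy figures, assuming their 32 IRQs
# correspond to 32 queues (an assumption, not stated in their post):
per_queue_rate 4150000 32
interval_us 4150000 32
```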
HAProxy was generating 4.15M packets/sec, roughly 40× more interrupt pressure, because they used no pipelining. At their packet rate, dedicating cores to interrupt handling was necessary to prevent constant preemption of the worker threads.
IRQ pinning is a packet-rate optimisation, not a requests-per-second optimisation. Pipelining gives high requests/s at low packet rates. Our IRQ cores never got loaded enough for isolation to matter.
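To check which regime a host is in, the interrupt rate can be measured directly instead of estimated. A minimal sketch, assuming the `/proc/interrupts` layout shown earlier (a header row of CPU names, then one row per IRQ with the queue name in the last field). `irq_total` and `irq_rate` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# irq_total: hypothetical helper. Sums the per-CPU interrupt counts of
# every row whose last field matches the NIC name. The file argument
# defaults to /proc/interrupts and exists so the parser can be tested.
irq_total() {
  local nic=$1 file=${2:-/proc/interrupts}
  awk -v nic="$nic" '
    NR == 1 { ncpu = NF; next }    # header row: one field per CPU
    $NF ~ nic { for (i = 2; i <= ncpu + 1; i++) sum += $i }
    END { printf "%.0f\n", sum }
  ' "$file"
}

# irq_rate: hypothetical helper. Average NIC interrupts/sec over an
# interval given in whole seconds (default 1).
irq_rate() {
  local nic=$1 secs=${2:-1} before after
  before=$(irq_total "$nic")
  sleep "$secs"
  after=$(irq_total "$nic")
  echo $(( (after - before) / secs ))
}

# Usage while the benchmark is running (not executed here):
#   irq_rate ens5 5
```

A result in the tens of thousands per second puts the host in the low-pressure regime described above; a rate in the millions is where dedicated IRQ cores start to matter.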
To reach the HAProxy regime on our setup:
We combined taskset and IRQ pinning in the same step. For CPU-bound workloads, restricting fiber to fewer cores costs more (here, +26% p99 at unchanged RPS) than interrupt isolation saves. These should always be tested independently.