This entry will be completed once the AWS vCPU quota increase is approved and the experiment has run. The plan below is what we intend to do; nothing here has been executed yet.
The HAProxy 2M+ RPS post describes a key observation:
It took me a while to figure out how to completely stabilize the platform because while virtualized, there are still 32 interrupts (aka IRQs) assigned to the network queues, delivered to 32 cores. This could possibly explain the lower performance with a lower number of cores… Moving the interrupts to the 32 upper cores left the 32 lower ones unused and simplified the setup a lot.
In our last run, NIC IRQs were concentrated on 7 cores (CPU5, CPU7, CPU9, CPU11, CPU18, CPU21, CPU28), wherever irqbalance happened to place them rather than where we chose. Fiber workers ran on all 128 cores, including those 7, so NIC interrupt processing competed with request handling on the same cores.
# irqbalance continuously reassigns NIC IRQs — undoes manual pinning
sudo systemctl stop irqbalance
# crond wakes every 60s — causes latency spikes (documented in HAProxy blog)
sudo systemctl stop crond
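A minimal sketch, assuming a systemd host, to confirm that both services are actually inactive before any pinning is done; `services_stopped` is a hypothetical helper name. Note that `systemctl stop` only lasts until the next reboot; `systemctl disable` additionally keeps a service from starting at boot.

```shell
# Hypothetical helper: report each service's state and fail if any is still active.
services_stopped() {
  local rc=0 svc state
  for svc in "$@"; do
    # systemctl is-active prints e.g. active / inactive / failed / unknown
    state=$(systemctl is-active "$svc" 2>/dev/null)
    printf '%s: %s\n' "$svc" "${state:-unknown}"
    if [ "$state" = "active" ]; then
      rc=1
    fi
  done
  return "$rc"
}

# usage: services_stopped irqbalance crond || echo "something is still running"
```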
# server
grep enp95s0 /proc/interrupts | awk -F: '{print $1}' | tr -d ' '
# gives IRQ numbers (143-158 in our last run)
# client
grep ens5 /proc/interrupts | awk -F: '{print $1}' | tr -d ' '
# gives IRQ numbers (28-35 in our last run)
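The two pipelines above only print IRQ numbers. A small extension, sketched here as hypothetical helpers `nic_irqs` and `show_irq_affinity`, also prints where each IRQ is currently allowed to fire by reading `/proc/irq/<N>/smp_affinity_list`, which is handy for recording the irqbalance-assigned layout before changing it. Like the `grep` above, it matches the interface name as a substring.

```shell
# Hypothetical helper: print the IRQ numbers whose /proc/interrupts line
# mentions the interface (optional 2nd arg: a saved copy of /proc/interrupts).
nic_irqs() {
  local dev=$1 src=${2:-/proc/interrupts}
  awk -v dev="$dev" 'index($0, dev) { sub(":", "", $1); print $1 }' "$src"
}

# Print "IRQ -> CPUs <list>" for every IRQ of the interface.
show_irq_affinity() {
  local irq
  for irq in $(nic_irqs "$1"); do
    printf '%s -> CPUs %s\n' "$irq" "$(cat /proc/irq/"$irq"/smp_affinity_list)"
  done
}

# usage: show_irq_affinity enp95s0   (server)
#        show_irq_affinity ens5      (client)
```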
Pin all 16 NIC IRQs on the server to cores 112-127 (upper 16 cores).
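# smp_affinity is a hex bitmask, written as comma-separated 32-bit groups,
# most significant group first (CPUs 96-127, 64-95, 32-63, 0-31)
# cores 112-127 = bits 112-127 set = top 16 bits of the highest group
# = ffff0000,00000000,00000000,00000000 (128-bit)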
for irq in $(grep enp95s0 /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
echo ffff0000,00000000,00000000,00000000 | sudo tee /proc/irq/$irq/smp_affinity
done
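Hand-writing a 128-bit mask is easy to get wrong by one group. A minimal sketch, assuming bash or dash with 64-bit shell arithmetic, that derives the smp_affinity string for a CPU range; `affinity_mask` is a hypothetical helper name. For CPUs 112-127 on a 128-CPU host it yields the same string as the `echo` in the loop. The kernel also exposes `/proc/irq/<N>/smp_affinity_list`, which accepts a CPU list such as `112-127` directly and sidesteps the mask arithmetic altogether.

```shell
# Hypothetical helper: smp_affinity string for CPUs FIRST..LAST on a host with
# NCPUS CPUs, as comma-separated 32-bit hex groups, most significant first.
affinity_mask() {
  local first=$1 last=$2 ncpus=$3
  local g=$(( (ncpus + 31) / 32 - 1 )) out="" word bit cpu
  while [ "$g" -ge 0 ]; do
    word=0
    bit=0
    while [ "$bit" -lt 32 ]; do
      cpu=$(( g * 32 + bit ))
      if [ "$cpu" -ge "$first" ] && [ "$cpu" -le "$last" ]; then
        word=$(( word | (1 << bit) ))   # CPU n is bit n, counted from the right
      fi
      bit=$(( bit + 1 ))
    done
    out="$out$(printf '%08x' "$word"),"
    g=$(( g - 1 ))
  done
  printf '%s\n' "${out%,}"              # drop the trailing comma
}

# affinity_mask 112 127 128   → ffff0000,00000000,00000000,00000000
```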
With irqbalance stopped, this keeps NIC hardware interrupts off cores 0-111 entirely.
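The counters in `/proc/interrupts` are cumulative since boot, so one way to check this is to take two snapshots under load and see which CPUs' counters moved. A sketch, with `nic_irq_active_cpus` as a hypothetical helper; once pinning is in effect, it should list only CPUs in 112-127.

```shell
# Hypothetical check: given two saved copies of /proc/interrupts, print every
# CPU whose summed counters for the interface's IRQs increased in between.
nic_irq_active_cpus() {
  awk -v dev="$1" '
    FNR == 1 { ncpu = NF; snap++; next }   # header row (CPU0 CPU1 ...) of each snapshot
    index($0, dev) {
      # field 1 is the IRQ number, fields 2..ncpu+1 are the per-CPU counts
      for (i = 2; i <= ncpu + 1; i++) total[snap, i - 2] += $i
    }
    END { for (c = 0; c < ncpu; c++) if (total[2, c] > total[1, c]) print c }
  ' "$2" "$3"
}

# usage (under load):
#   cat /proc/interrupts > /tmp/irq.before; sleep 10; cat /proc/interrupts > /tmp/irq.after
#   nic_irq_active_cpus enp95s0 /tmp/irq.before /tmp/irq.after
```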
# pin fiber_server to cores 0-111: launching it under taskset means every
# thread it spawns inherits that affinity
# (for an already-running process: taskset -a -p -c 0-111 <pid>)
taskset -c 0-111 ./fiber_server
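A runtime could still override the inherited mask with its own per-thread pinning, so it seems worth confirming that every thread really ended up restricted to 0-111. A sketch that reads `Cpus_allowed_list` from `/proc/<pid>/task/<tid>/status`; `check_thread_affinity` is a hypothetical helper, and the `pgrep -x fiber_server` usage assumes the process name matches the binary name.

```shell
# Hypothetical check: compare each thread's allowed-CPU list with the expected
# one and report mismatches; returns non-zero if any thread differs.
check_thread_affinity() {
  local pid=$1 want=$2 bad=0 task got
  for task in /proc/"$pid"/task/*; do
    got=$(awk '/^Cpus_allowed_list:/ { print $2 }' "$task/status")
    if [ "$got" != "$want" ]; then
      printf 'thread %s allowed on %s (expected %s)\n' "${task##*/}" "$got" "$want"
      bad=1
    fi
  done
  return "$bad"
}

# usage: check_thread_affinity "$(pgrep -x fiber_server)" 0-111
```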
Current client: c6i.4xlarge, 16 vCPU, 12.5 Gbps — saturated at 82% CPU.
Planned: c8i.32xlarge, 128 vCPU, 50 Gbps.
# same autocannon command, matched hardware
taskset -c 0-111 autocannon -m GET \
--connections 5000 \
--duration 30 \
--pipelining 100 \
--workers 120 \
"http://SERVER_PRIVATE_IP:8083/simple"
# --connections 5000: 5× previous run → 500k simultaneous in-flight requests
# ramp: 1000 → 2000 → 5000 to observe scaling behaviour
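The ramp can be scripted so each step reuses the same flags and only the connection count changes. A sketch; `ramp` is a hypothetical wrapper, the 1000/2000/5000 steps and all other flags are taken from the command above, and setting `RUN=echo` prints the commands instead of executing them. In-flight requests per step are connections × pipelining, i.e. 100k, 200k and 500k.

```shell
# Hypothetical ramp driver: one 30 s autocannon run per connection count.
# Set RUN=echo to print the commands without running them.
ramp() {
  local target=$1 pipelining=100 conns
  for conns in 1000 2000 5000; do
    echo "# $conns connections x $pipelining pipelined = $(( conns * pipelining )) in flight"
    ${RUN:-} taskset -c 0-111 autocannon -m GET \
      --connections "$conns" \
      --duration 30 \
      --pipelining "$pipelining" \
      --workers 120 \
      "$target"
  done
}

# usage: ramp "http://SERVER_PRIVATE_IP:8083/simple" | tee ramp.log
```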
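RPS and RFS remain active. They run in software after the hardware IRQ has fired: the pinned core takes the interrupt, then RPS/RFS decide which core does the protocol processing (RFS steers it toward the core where the consuming thread runs). IRQ pinning and RPS/RFS are therefore complementary, not conflicting.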
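To see which cores RPS is allowed to steer packet processing to, and whether RFS is configured at all, the per-queue sysfs settings can be dumped. A sketch; `show_rps_rfs` is a hypothetical helper, and its optional second argument (a path prefix) exists only so it can be exercised against a fake directory tree. An `rps_cpus` mask of all zeros means RPS is off for that queue.

```shell
# Hypothetical check: print the global RFS socket-flow table size and, per RX
# queue, the RPS CPU mask and the RFS flow count.
# Optional 2nd arg: a prefix for the /proc and /sys paths (testing only).
show_rps_rfs() {
  local dev=$1 root=${2:-} q
  echo "rps_sock_flow_entries: $(cat "$root/proc/sys/net/core/rps_sock_flow_entries")"
  for q in "$root"/sys/class/net/"$dev"/queues/rx-*; do
    echo "${q##*/}: rps_cpus=$(cat "$q/rps_cpus") rps_flow_cnt=$(cat "$q/rps_flow_cnt")"
  done
}

# usage: show_rps_rfs enp95s0
```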
With NIC IRQs on dedicated cores (112-127) and fiber workers on isolated cores (0-111):
With matched c8i.32xlarge client:
Entry will be updated with actual results after the experiment runs.