...
- 2022-09-02 krowe: The output of sysctl -a | grep <10Gb NIC> differs between naasc-vs-3/naasc-vs-5 and naasc-vs-4
- naasc-vs-4 has entries for VLANs 101 and 140 while naasc-vs-3 and naasc-vs-5 have entries for VLANs 192 and 96.
- 2022-09-02 krowe: Compared sysctl -a on naasc-vs-4 and naasc-vs-5 and found many questionable differences
- naasc-vs-4: net.iw_cm.default_backlog = 256
- Is this because the IB modules are loaded?
- naasc-vs-4: net.rdma_ucm.max_backlog = 1024
- Is this because the IB modules are loaded?
- naasc-vs-4: sunrpc.rdma*
- Is this because the IB modules are loaded?
- naasc-vs-4: net.netfilter.nf_log.2 = nfnetlink_log
- nfnetlink is a module for packet mangling. Could this interfere with the docker swarm networking?
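- 2022-09-06 krowe: The ethtool -k <NIC> output for naasc-vs-3/naasc-vs-5 is very different from naasc-vs-4. In each pair below, the first value is from naasc-vs-3/naasc-vs-5 and the second is from naasc-vs-4.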
- hw-tc-offload: off vs hw-tc-offload: on
- rx-gro-hw: off vs rx-gro-hw: on
- rx-vlan-offload: off vs rx-vlan-offload: on
- rx-vlan-stag-hw-parse: off vs rx-vlan-stag-hw-parse: on
- tcp-segmentation-offload: off vs tcp-segmentation-offload: on
- tx-gre-csum-segmentation: off vs tx-gre-csum-segmentation: on
- tx-gre-segmentation: off vs tx-gre-segmentation: on
- tx-gso-partial: off vs tx-gso-partial: on
- tx-ipip-segmentation: off vs tx-ipip-segmentation: on
- tx-sit-segmentation: off vs tx-sit-segmentation: on
- tx-tcp-segmentation: off vs tx-tcp-segmentation: on
- tx-udp_tnl-csum-segmentation: off vs tx-udp_tnl-csum-segmentation: on
- tx-udp_tnl-segmentation: off vs tx-udp_tnl-segmentation: on
- tx-vlan-offload: off vs tx-vlan-offload: on
- tx-vlan-stag-hw-insert: off vs tx-vlan-stag-hw-insert: on
- 2022-09-15 krowe: The VM Hosts have different 10Gb network cards
- naasc-vs-2 uses a Solarflare Communications SFC9220
- naasc-vs-3 uses a Solarflare Communications SFC9020
- naasc-vs-4 uses a Broadcom BCM57412 NetXtreme-E
- naasc-vs-5 uses a Solarflare Communications SFC9020
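The ethtool -k comparison above was presumably done by eye. A minimal sketch of how it could be automated, assuming the ethtool -k output of each host has already been saved as text (the helper names parse_features and diff_features are hypothetical, not part of any existing tool):

```python
def parse_features(text):
    """Parse `ethtool -k <NIC>` output into {feature: state}.

    Lines without a value (such as the "Features for em1:" header) are
    skipped, and a trailing "[fixed]" marker is dropped.
    """
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue
        features[name.strip()] = value.split()[0]
    return features


def diff_features(a, b):
    """Return {feature: (state_in_a, state_in_b)} for features that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # Abbreviated sample text using two of the features listed above.
    vs3 = "Features for em1:\nrx-gro-hw: off\ntcp-segmentation-offload: off\n"
    vs4 = "Features for em1:\nrx-gro-hw: on\ntcp-segmentation-offload: on\n"
    for name, (left, right) in diff_features(parse_features(vs3),
                                             parse_features(vs4)).items():
        print(f"{name}: {left} vs {name}: {right}")
```

The printed lines follow the same "off vs on" layout used in the list above, so the result can be pasted straight into these notes.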
I see that rx-gro-hw is enabled on naasc-vs-4 but disabled on naasc-vs-2, 3, and 5. You can see this with ethtool -k em1 | grep gro. I found articles suggesting that GRO can make traffic slower when it is enabled, especially when using vxlan, which Docker Swarm uses. So I disabled it on naasc-vs-4 with ethtool -K em1 gro off, and iperf3 tests now show about 2Gb/s in both directions!!!
- GRO = Generic Receive Offload. GRO is an aggregation technique that coalesces several received packets from a stream into a single large packet, saving CPU cycles because the kernel has fewer packets to process. The rx-gro-hw feature is the variant performed in hardware on the physical NIC.
- https://bugzilla.redhat.com/show_bug.cgi?id=1424076
- https://access.redhat.com/solutions/20278
- https://techdocs.broadcom.com/us/en/storage-and-ethernet-connectivity/ethernet-nic-controllers/bcm957xxx/adapters/Tuning/tcp-performance-tuning/nic-tuning_22/gro-generic-receive-offload.html
- https://techdocs.broadcom.com/us/en/storage-and-ethernet-connectivity/ethernet-nic-controllers/bcm957xxx/adapters/Tuning/ip-forwarding-tunings/nic-tuning_48.html
- https://techdocs.broadcom.com/us/en/storage-and-ethernet-connectivity/ethernet-nic-controllers/bcm957xxx/adapters/Tuning/tcp-performance-tuning/os-tuning-linux.html
After disabling rx-gro-hw, I no longer see TCP Retransmission or TCP Out-Of-Order packets when tracing the iperf3 test from na-arc-3 to na-arc-2.
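The check-and-disable step can be scripted. This is a minimal sketch, assuming ethtool is installed and the script runs with enough privilege to change NIC features. The function names gro_states and disable_gro are hypothetical. The two ethtool invocations are the ones quoted in these notes (ethtool -k em1 and ethtool -K em1 gro off).

```python
import subprocess


def gro_states(ethtool_output):
    """Return {feature: state} for features whose name contains "gro".

    This mirrors `ethtool -k <NIC> | grep gro` on the feature lines.
    """
    states = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and "gro" in name and value.strip():
            states[name.strip()] = value.split()[0]
    return states


def disable_gro(nic, run=subprocess.run):
    """Run `ethtool -K <nic> gro off`, then report the GRO-related features.

    `run` defaults to subprocess.run; it is a parameter only so the function
    can be exercised without touching a real NIC.
    """
    run(["ethtool", "-K", nic, "gro", "off"], check=True)
    shown = run(["ethtool", "-k", nic], check=True,
                capture_output=True, text=True)
    return gro_states(shown.stdout)
```

Calling disable_gro("em1") on naasc-vs-4 would correspond to the manual step described above. Settings changed with ethtool -K apply to the running system, so they would need to be reapplied after a reboot unless made persistent some other way.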
Table 7: iperf3 TCP throughput from/to ingress_sbox with rx-gro-hw=off (Mb/s)

| | na-arc-1 (naasc-vs-4) | na-arc-2 (naasc-vs-4) | na-arc-3 (naasc-vs-3) | na-arc-4 (naasc-vs-4) | na-arc-5 (naasc-vs-5) | na-arc-6 (naasc-vs-2) |
|---|---|---|---|---|---|---|
| na-arc-1 | | 4460 | 2580 | 4630 | 2860 | 3150 |
| na-arc-2 | 4060 | | 2590 | 4220 | 3690 | 2570 |
| na-arc-3 | 2710 | 2580 | | 3080 | 2770 | 2920 |
| na-arc-4 | 1090 | 3720 | 2200 | | 2970 | 3200 |
| na-arc-5 | 4010 | 3970 | 2340 | 4010 | | 3080 |
| na-arc-6 | 3380 | 3060 | 3060 | 3010 | 3080 | |
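To compare Table 7 against the "about 2Gb/s" figure, the measurements can be summarized with a few lines of Python. This is a minimal sketch: the numbers are copied row by row from Table 7, keyed by row label only, and the names TABLE7, summarize and below are hypothetical.

```python
from statistics import mean

# Throughput values in Mb/s, copied row by row from Table 7.
TABLE7 = {
    "na-arc-1": [4460, 2580, 4630, 2860, 3150],
    "na-arc-2": [4060, 2590, 4220, 3690, 2570],
    "na-arc-3": [2710, 2580, 3080, 2770, 2920],
    "na-arc-4": [1090, 3720, 2200, 2970, 3200],
    "na-arc-5": [4010, 3970, 2340, 4010, 3080],
    "na-arc-6": [3380, 3060, 3060, 3010, 3080],
}


def summarize(rows):
    """Return the minimum, maximum and mean over all measurements."""
    values = [v for row in rows.values() for v in row]
    return {"min": min(values), "max": max(values),
            "mean": round(mean(values), 1)}


def below(rows, threshold):
    """Return (row_label, value) pairs for measurements under threshold."""
    return [(name, v) for name, row in rows.items()
            for v in row if v < threshold]


if __name__ == "__main__":
    print(summarize(TABLE7))
    print(below(TABLE7, 2000))  # measurements under 2 Gb/s
```

Listing the measurements under a threshold makes outliers easy to spot when the test matrix is rerun after further tuning.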
References
- Prepare offline infrastructure from the scratch (Describes docker swarm setup)
- file:///tmp/ALMA%20Offline%20Software%20Test_Deployment%20Concept(2).pdf