...
- https://ictjira.alma.cl/browse/AES-52
- https://confluence.alma.cl/pages/viewpage.action?pageId=91826715
- You can see poor performance with a command like
- wget --no-check-certificate http://almaportal.cv.nrao.edu/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits
- krowe has narrowed this down to the ingress overlay network created by Docker Swarm, which re-routes traffic that arrives at a host not running the target service.
- On na-arc-2
  - nsenter --net=/var/run/docker/netns/ingress_sbox
  - iperf3 -B 10.0.0.21 -s
- On na-arc-3
  - nsenter --net=/var/run/docker/netns/ingress_sbox
  - iperf3 -B 10.0.0.19 -c 10.0.0.21
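To compare runs numerically, the client can be started with iperf3's `-J` flag, which prints a JSON report instead of the usual table. The sketch below is a minimal, hypothetical helper (the function names are not from these notes) that runs the client inside the `ingress_sbox` namespace via `nsenter` and pulls sender/receiver throughput out of the report. It assumes the `end.sum_sent` / `end.sum_received` layout that iperf3 uses for TCP tests; check the field names against the installed iperf3 version.

```python
import json
import subprocess

# Namespace path used in the notes above; assumed to exist on every Swarm node.
INGRESS_NETNS = "/var/run/docker/netns/ingress_sbox"


def run_client_in_ingress(bind_ip: str, server_ip: str, seconds: int = 10) -> dict:
    """Run `iperf3 -c` inside the ingress_sbox namespace and return its JSON report.

    Needs root (for nsenter) and an iperf3 server already listening on server_ip.
    """
    cmd = [
        "nsenter", f"--net={INGRESS_NETNS}",
        "iperf3", "-J", "-t", str(seconds), "-B", bind_ip, "-c", server_ip,
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def throughput_gbps(report: dict) -> tuple[float, float]:
    """Return (sender, receiver) throughput in Gb/s from an `iperf3 -J` report."""
    end = report["end"]
    return (
        end["sum_sent"]["bits_per_second"] / 1e9,
        end["sum_received"]["bits_per_second"] / 1e9,
    )
```

On na-arc-3, with the server from na-arc-2 running, `throughput_gbps(run_client_in_ingress("10.0.0.19", "10.0.0.21"))` would return the two rates for one run.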
- SOLUTION: Set rx-gro-hw=off on naasc-vs-4. See Conclusions for more details.
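The setting is changed with `ethtool -K <iface> rx-gro-hw off`, and `ethtool -k <iface>` lists the current state of each offload feature. The following is a minimal sketch for checking and disabling it from Python. The interface name is not recorded in these notes and must be supplied, the helper names are hypothetical, and a change made with `ethtool -K` does not survive a reboot unless it is persisted separately.

```python
import subprocess


def parse_ethtool_features(text: str) -> dict[str, str]:
    """Parse `ethtool -k IFACE` output into {feature: "on" | "off"}.

    Lines look like 'rx-gro-hw: off [fixed]'; qualifiers such as '[fixed]' are dropped.
    """
    features = {}
    for line in text.splitlines():
        if ":" not in line or line.startswith("Features for"):
            continue  # skip the 'Features for eth0:' header and blank lines
        name, _, value = line.partition(":")
        value = value.strip()
        if value:
            features[name.strip()] = value.split()[0]
    return features


def rx_gro_hw_state(iface: str) -> str:
    """Return 'on', 'off', or 'unsupported' for the given interface."""
    out = subprocess.run(
        ["ethtool", "-k", iface], check=True, capture_output=True, text=True
    ).stdout
    return parse_ethtool_features(out).get("rx-gro-hw", "unsupported")


def disable_rx_gro_hw(iface: str) -> None:
    """Equivalent to `ethtool -K IFACE rx-gro-hw off` (requires root)."""
    subprocess.run(["ethtool", "-K", iface, "rx-gro-hw", "off"], check=True)
```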
- On na-arc-2
- 2022-09-19 krowe: With rx-gro-hw=off, retransmissions have dropped, but I still see them when sending to naasc-vs-5, and even more when sending to naasc-vs-2. Sending to naasc-vs-3 or naasc-vs-4 produces no retransmissions. This is surprising given how similar naasc-vs-3 and naasc-vs-5 are. I suspect congestion is the cause, since the retransmissions are not easily reproducible.
- On naasc-vs-2
  - iperf3 -B 10.2.120.107 -s
- On naasc-vs-3 or naasc-vs-4 or naasc-vs-5 or naasc-cont-1 or...
  - iperf3 -B <LOCAL_IP> -c 10.2.120.107
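Because the retransmissions come and go, a single run says little. One way to probe the congestion hypothesis is to repeat the client run several times and count how many runs saw any retransmissions. A sketch, assuming iperf3's `-J` JSON report, where TCP retransmissions on Linux appear as `end.sum_sent.retransmits`; the function names are hypothetical.

```python
import json
import subprocess

SERVER_IP = "10.2.120.107"  # naasc-vs-2, from the notes above


def retransmits_from_report(report: dict) -> int:
    """Sender-side TCP retransmissions from an `iperf3 -J` report (0 if absent)."""
    return report["end"]["sum_sent"].get("retransmits", 0)


def run_once(local_ip: str, server_ip: str = SERVER_IP, seconds: int = 10) -> int:
    """One iperf3 client run bound to local_ip; returns its retransmission count."""
    cmd = ["iperf3", "-J", "-t", str(seconds), "-B", local_ip, "-c", server_ip]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return retransmits_from_report(json.loads(out))


def summarize_runs(counts: list[int]) -> dict:
    """Aggregate per-run retransmission counts."""
    return {
        "runs": len(counts),
        "runs_with_retrans": sum(1 for c in counts if c > 0),
        "total_retrans": sum(counts),
        "max": max(counts, default=0),
    }
```

On naasc-vs-3, for example, `summarize_runs([run_once("<LOCAL_IP>") for _ in range(10)])` would run ten tests back to back. The runs are kept sequential because an iperf3 server handles only one test at a time.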
- 2022-09-21 cfultz: Replaced the 10Gb network cable on naasc-vs-2 ("the cable was nearly bent in half at the router"). Unfortunately, I am still seeing retransmissions.
- On naasc-vs-2
- 2022-09-19 krowe: With rx-gro-hw=off, throughput over the VXLAN overlay network (e.g. ingress_sbox) has improved but still ranges from 1 Gb/s to 4.6 Gb/s. The network is rated at 10 Gb/s, and VLAN plus VXLAN encapsulation introduces roughly a 10% overhead penalty, so I would expect throughput closer to 8 Gb/s to 9 Gb/s. Granted, this isn't a real issue yet, since the NGAS servers are currently on a 1 Gb/s network, but that may change someday.
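The expectation above is simple arithmetic. This small sketch makes it explicit and expresses a measurement as a fraction of that expectation. The ~10% overhead is taken from the note as an input, not measured, and the function names are hypothetical.

```python
LINK_GBPS = 10.0  # rated link speed from the note
OVERHEAD = 0.10   # the note's rough VLAN + VXLAN overhead estimate


def expected_gbps(link_gbps: float = LINK_GBPS, overhead: float = OVERHEAD) -> float:
    """Throughput left after removing a fractional encapsulation overhead."""
    return link_gbps * (1.0 - overhead)


def efficiency(measured_gbps: float,
               link_gbps: float = LINK_GBPS,
               overhead: float = OVERHEAD) -> float:
    """Measured throughput as a fraction of the expected throughput."""
    return measured_gbps / expected_gbps(link_gbps, overhead)
```

With the 10% figure, `expected_gbps()` gives 9 Gb/s, so the observed 1 Gb/s to 4.6 Gb/s is roughly 11% to 51% of what the overlay should carry.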
...