
Poor Download Performance

This issue was first reported on 2022-04-18 and is documented in https://ictjira.alma.cl/browse/AES-52. Downloads are sometimes extremely slow (tens of kB/s), and sometimes the connection is closed before the transfer completes, so data is missing from the downloaded file. At other times download speeds are perfectly reasonable (~10 MB/s). The problem was reproducible with a command like the following:

wget --no-check-certificate http://almascience.nrao.edu/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits
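The two failure modes above (slow transfers and truncated transfers) can be told apart in a repeatable way by timing a download and comparing the bytes received with the Content-Length the server announced. The following is a minimal sketch using only the Python standard library; `timed_download` and `transfer_summary` are hypothetical helpers, not part of any existing tool, and certificate verification is disabled only to mirror `--no-check-certificate`. A transfer that is reset outright raises an exception rather than returning a result.

```python
import ssl
import time
import urllib.request


def transfer_summary(bytes_received, content_length, elapsed_s):
    """Return (average rate in bytes/s, truncated flag) for one download."""
    rate = bytes_received / elapsed_s if elapsed_s > 0 else 0.0
    # Truncated means the server announced a length that we did not receive.
    truncated = content_length is not None and bytes_received < content_length
    return rate, truncated


def timed_download(url, chunk_size=1 << 20, timeout=60):
    """Stream url without saving it and time the transfer (hypothetical helper).

    Certificate checks are disabled to mirror wget --no-check-certificate.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    received = 0
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        length = resp.headers.get("Content-Length")
        content_length = int(length) if length is not None else None
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # server finished or closed the connection early
                break
            received += len(chunk)
    return transfer_summary(received, content_length, time.monotonic() - start)
```

For example, `timed_download("http://almascience.nrao.edu/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits")` returns the average rate in bytes/s and whether the file came back short; repeating it several times should show whether the rate swings between tens of kB/s and ~10 MB/s.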

Shortly after this report, the almascience portal was redirected from the production docker swarm to the test-prod docker swarm because the latter gave better download performance, although still not the tens of MB/s that was expected. Around the same time, the MTU on the production docker swarm nodes was changed from 1500 to 9000.
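Because the MTU was raised to 9000, one quick end-to-end check is to ping with fragmentation prohibited and a payload sized to fill a whole frame: the largest payload is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The sketch below assumes IPv4 without header options and the Linux iputils `ping`, where `-M do` forbids fragmentation; `max_ping_payload` and `ping_probe_cmd` are hypothetical helper names.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest `ping -s` payload that fits in a single IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_probe_cmd(host, mtu):
    """iputils ping command that sets DF (-M do), so an oversized probe fails
    instead of being fragmented when the path MTU is smaller than mtu."""
    return ["ping", "-c", "3", "-M", "do", "-s", str(max_ping_payload(mtu)), host]
```

With a 9000-byte MTU this gives `ping -c 3 -M do -s 8972 <host>`, and with a 1500-byte MTU the limit is `-s 1472`; if some hop on the path still carries a 1500-byte MTU, the jumbo-sized probes typically fail while the 1472-byte ones succeed.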

It was noticed that one of the production docker swarm nodes, na-arc-3, was configured differently than the other na-arc-* nodes:

  • Pinging na-arc-[1,2,4,5] from na-arc-3 with any payload larger than -s 1490 drops all packets.
  • iperf tests show 10 Gb/s between the VM host of na-arc-3 (naasc-vs-3 p5p1.120) and the VM host of na-arc-5 (naasc-vs-5 p2p1.120), which rules out a faulty network card in either VM host.
  • The iptables rules on na-arc-3 look different from those on na-arc-[2,4,5]; na-arc-1 also looks a bit different.
  • The docker_gwbridge interface on na-arc-[1,2,4,5] shows NO-CARRIER, but on na-arc-3 it does not.
  • na-arc-3 has a veth10fd1da@if37 interface; none of the other na-arc-* nodes has a veth interface.
  • iperf3 tests between all the na-arc-* nodes showed that na-arc-3 was about 10e4 times slower at both sending and receiving.
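The note does not record exactly how the iperf3 tests were run. One way to make such a comparison systematic is to test every ordered client/server pair (running `iperf3 -s` on the server and `iperf3 -c SERVER -J` on the client) and flag any node whose typical throughput sits far below the rest. The Python sketch below covers only the bookkeeping: it parses the JSON that `iperf3 -J` emits for a TCP test and applies a threshold; the helper names and the 100x factor are illustrative assumptions.

```python
import itertools
import json
import statistics

NODES = ["na-arc-1", "na-arc-2", "na-arc-3", "na-arc-4", "na-arc-5"]


def client_server_pairs(nodes):
    """Every ordered (client, server) pair, so both directions get tested."""
    return list(itertools.permutations(nodes, 2))


def iperf3_bits_per_second(json_text):
    """Return (sent, received) bits/s from `iperf3 -c SERVER -J` output (TCP)."""
    end = json.loads(json_text)["end"]
    return end["sum_sent"]["bits_per_second"], end["sum_received"]["bits_per_second"]


def slow_nodes(results, factor=100.0):
    """Flag nodes whose median throughput is more than `factor` times below
    the overall median.

    results maps (client, server) -> bits/s; each node's median is taken
    over every pair it takes part in, as client or as server.
    """
    overall = statistics.median(results.values())
    per_node = {}
    for (client, server), bps in results.items():
        per_node.setdefault(client, []).append(bps)
        per_node.setdefault(server, []).append(bps)
    return sorted(node for node, vals in per_node.items()
                  if statistics.median(vals) * factor < overall)
```

Because each node appears in both directions, a single misconfigured node drags down every pair it takes part in, so its median stands out while the other nodes' medians stay close to the overall value.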

Given the number of issues with na-arc-3, it was decided to simply recreate it from a clone of na-arc-2. This was done on 2022-08-11, and since then iperf3 tests between all the na-arc-* nodes have shown the expected performance.

On 2022-08-12 http://almaportal.cv.nrao.edu/ was created so that we could internally test the production docker swarm nodes in a manner similar to how external users would use it.