Shortly after this report, the almascience portal was redirected from the production docker swarm to the test-prod docker swarm because the latter gave better download performance, although still not as good as expected (tens of MB/s). Around the same time, the MTU on the production docker swarm nodes was changed from 1500 to 9000.

It was noticed that one of the production docker swarm nodes, na-arc-3, was configured differently than the other na-arc-* nodes:

  • pinging na-arc-[1,2,4,5] from na-arc-3 with any payload size larger than -s 1490 drops all packets.
  • iperf tests show 10Gb/s between the VM host of na-arc-3 (naasc-vs-3 p5p1.120) and the VM host of na-arc-5 (naasc-vs-5 p2p1.120).  So it isn't a bad card in either of the VM hosts.
  • iptables on na-arc-3 looks different from iptables on na-arc-[2,4,5]. na-arc-1 also looks a bit different.
  • The docker_gwbridge interface on na-arc-[1,2,4,5] shows NO-CARRIER, but on na-arc-3 it does not.
  • na-arc-3 has a veth10fd1da@if37 interface.  None of the other na-arc-* nodes have a veth interface.
  • iperf3 tests between all the na-arc-* nodes showed na-arc-3 was performing about 10e4 times slower on both sending and receiving.
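For reading the ping result above alongside the MTU change from 1500 to 9000, the following minimal sketch relates a ping -s payload size to the size of the resulting IPv4 packet. It assumes IPv4 with no IP options (20-byte header) and the standard 8-byte ICMP echo header; the function names are hypothetical and only for illustration. It is plain arithmetic and does not diagnose the fault on na-arc-3.

```python
# Assumptions: IPv4 header without options is 20 bytes, ICMP echo header is 8 bytes.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def icmp_packet_size(payload_bytes):
    """Size of the IPv4 packet produced by `ping -s payload_bytes`."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def max_ping_payload(mtu):
    """Largest `ping -s` value that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: largest unfragmented ping payload is {max_ping_payload(mtu)}")
    print(f"ping -s 1490 produces a {icmp_packet_size(1490)}-byte IPv4 packet")
```

Under these assumptions, a payload of 1490 bytes already yields a packet larger than a 1500-byte MTU, so such pings depend on fragmentation or on a larger MTU along the whole path.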

Given the number of issues with na-arc-3, it was decided to simply recreate it from a clone of na-arc-2. This was done on 2022-08-11, and since then iperf3 tests between all the na-arc-* nodes have shown the expected performance.
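The all-pairs iperf3 check mentioned above could be organised along the lines of the sketch below. It is an illustration only, not the procedure that was actually used: the helper names are hypothetical, it assumes nodes na-arc-1 through na-arc-5, that an iperf3 server (iperf3 -s) is already listening on each target node, and that TCP tests are run with JSON output (-J). How the command is launched on each client node is left out.

```python
import itertools
import json

# Assumed node list, taken from the node names mentioned in the text.
NODES = [f"na-arc-{i}" for i in range(1, 6)]


def node_pairs(nodes):
    """Return every ordered (client, server) pair so both directions get tested."""
    return list(itertools.permutations(nodes, 2))


def iperf3_command(server):
    """Client command to run on the client node; -J requests JSON output."""
    return ["iperf3", "-c", server, "-J"]


def throughput_gbps(iperf3_json):
    """Receiver-side throughput in Gb/s from the JSON of a TCP iperf3 run."""
    report = json.loads(iperf3_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    for client, server in node_pairs(NODES):
        print(f"on {client}: {' '.join(iperf3_command(server))}")
```

Collecting the throughput for each ordered pair makes a node that is slow in both directions, as na-arc-3 was, stand out as a row and column of low values.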

On 2022-08-12 http://almaportal.cv.nrao.edu/ was created so that we could internally test the production docker swarm nodes in a manner similar to how external users would use them.