...
The newest NAASC VM Host (naasc-vs-2) often shows over 100 TCP retransmissions per second when doing iperf3 tests. Other nodes like naasc-vs-3 and naasc-vs-4 show 0 TCP retransmissions per second. While I can't say these TCP retransmissions are indicative of a problem, they could become a problem with increased load and they certainly will make debugging more difficult when there is a problem. I suggest the reason for these TCP retransmissions be found and resolved.
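For reference, here is a minimal sketch of how these retransmissions can be observed; the hostname is a placeholder. iperf3 reports retransmissions in its Retr column, and the kernel-wide counter can be read with nstat.

# Run an iperf3 server on the target host, then from the client (hostname is a placeholder):
iperf3 -c naasc-vs-2 -t 30        # the Retr column shows TCP retransmissions per reporting interval
# Kernel-wide TCP retransmission counter:
nstat -az TcpRetransSegs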
MTU
At some point the Maximum Transmission Unit (MTU) for ethernet frames on the production servers was changed from 1500 to 9000. This is a common technique to improve performance in certain situations. But in order to benefit from a 9000 MTU, every ethernet device in the data path must be set to a 9000 MTU. Simply changing the interfaces on the naasc-vs and na-arc nodes is not enough. All the NGAS nodes, docker containers, and namespaces in the data path must also be changed. This means recreating the entire ingress overlay network, among other changes. Also, since it is unlikely the end user is going to have an MTU of 9000, there is little advantage in setting an MTU of 9000 if your goal is to move data to the user faster. Finally, because of the overhead of vxlan, an MTU of 8900 would be better than 9000. I suggest leaving the MTU at the default 1500 until there is good evidence that a larger MTU is an improvement.
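To check whether jumbo frames actually survive end to end, the interface MTU and the path MTU can be verified directly. A minimal sketch, assuming an interface name and a destination host (both placeholders):

# Show the MTU configured on an interface (interface name is an assumption):
ip link show eth0
# Test whether a 9000-byte frame crosses the path without fragmentation
# (8972 = 9000 minus 20 bytes IP header and 8 bytes ICMP header):
ping -M do -c 3 -s 8972 na-arc-1
# The equivalent test for a standard 1500 MTU path uses -s 1472.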
Dropped packets
Some of the NAASC VM hosts show lots of dropped Rx packets. The rate ranges from 2 to over 100 per minute. This is really unacceptable on a modern, well-designed network. While I can't say these dropped packets are indicative of a problem, they could become a problem with increased load and they certainly will make debugging more difficult when there is a problem. I suggest the reason for these dropped packets be found and resolved.
Further tests show patterns. It looks like the same packets may be dropped on naasc-vs-2 and naasc-vs-4, as they report the same dropped packet rate. For example, I wrote a simple script to print dropped packets per time interval (a sketch appears after the table below) and ran it at the same time on all four naasc-vs hosts. You can see that naasc-vs-2 and naasc-vs-4 show a similar pattern, while naasc-vs-3 and naasc-vs-5 show a different pattern.
| naasc-vs-2 | naasc-vs-3 | naasc-vs-4 | naasc-vs-5 |
|---|---|---|---|
| 30 | 0 | 30 | 0 |
| 22 | 0 | 24 | 0 |
| 13 | 1 | 11 | 1 |
| 9 | 0 | 9 | 0 |
| 8 | 0 | 8 | 0 |
| 12 | 1 | 12 | 1 |
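A minimal sketch of such a per-interval counter script; the interface name and the 60-second interval are assumptions. It simply prints the change in the kernel's rx_dropped counter each interval.

#!/bin/bash
# Print dropped Rx packets per interval for one interface.
# The interface name (eth0) and interval (60 s) are assumptions.
IFACE=eth0
INTERVAL=60
prev=$(cat /sys/class/net/$IFACE/statistics/rx_dropped)
while true; do
    sleep $INTERVAL
    cur=$(cat /sys/class/net/$IFACE/statistics/rx_dropped)
    echo "$(date +%H:%M:%S) $((cur - prev))"
    prev=$cur
done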
I don't think these dropped packets are viewable with tcpdump. At least I haven't seen a set of packets in a tcpdump that matches the number of dropped packets. I suppose there may be more than one type of packet being dropped, but that is very difficult to tell.
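When tcpdump does not show the drops, the NIC and kernel counters can at least narrow down where they happen. A sketch, with the interface name as an assumption:

# Per-queue and driver-level drop counters (interface name is an assumption):
ethtool -S eth0 | grep -i drop
# dropwatch reports the kernel functions where packets are being dropped;
# after it starts, type "start" at its prompt:
dropwatch -l kas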
...
According to Alvaro's document https://confluence.alma.cl/display/OFFLINE/Documentation, docker swarm nodes should have a minimum of 16 cores and 32GB of memory. None of the production docker swarm nodes meet this requirement. There is a plan to address this, though.
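A quick way to confirm this on each node is to compare core and memory counts against the requirement; the hostnames below are placeholders for the production docker swarm nodes.

for h in na-arc-1 na-arc-2 na-arc-3 na-arc-4 na-arc-5; do
    echo -n "$h: "
    ssh $h 'echo "$(nproc) cores, $(free -g | awk "/^Mem:/{print \$2}") GB"'
done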
ARC benchmarks
I think it would be worthwhile for each ARC to benchmark their download performance. This should be done regularly (weekly, monthly, quarterly, etc.) using as similar a procedure at each ARC as possible. This will provide two useful sets of data: 1. it will show when performance has dropped at an ARC, hopefully before users start complaining, and 2. it will provide a history of benchmarks against which current measurements can be compared. A simple wget script could be used to do this and shared among the ARCs. E.g.
wget --no-check-certificate https://almascience.nrao.edu/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
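A minimal sketch of such a benchmark wrapper, using the URL from the example above; the log file location is an assumption. Each run appends the date and elapsed time so the history can be compared later.

#!/bin/bash
# Simple download benchmark wrapper; the log location is an assumption.
URL=https://almascience.nrao.edu/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
LOG=$HOME/arc_benchmark.csv
start=$(date +%s)
wget --no-check-certificate -q -O /dev/null "$URL"
end=$(date +%s)
# Record date and elapsed seconds; throughput follows from the known file size.
echo "$(date -I),$((end - start))" >> "$LOG"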
...
Better use of docker swarm
The web proxy points each connection to the next na-arc node in a round-robin manner. Each na-arc node runs no more than one copy of each of the docker containers. There are five na-arc nodes, so for any given service only about one request in five lands on the node actually running that container. This means that 80% of requests go to the wrong host and have to be re-routed to the correct host over the docker swarm overlay ingress network (vxlan). This seems very inefficient.
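One way to see where a service is actually running, and a possible mitigation (publishing its port in host mode so the proxy can target the right node directly rather than relying on the ingress network), is sketched below; the service name and port numbers are assumptions.

# Show which node is running a given service (service name is an assumption):
docker service ps --filter desired-state=running some-service
# Optionally publish the service port in host mode so traffic is not routed
# through the ingress overlay network (port numbers are assumptions):
docker service update --publish-rm 8080 --publish-add mode=host,target=8080,published=8080 some-service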
RHEL8 shortcomings
The version of RHEL8 installed on naasc-vs-2 seems to be a small subset of the full RHEL8 distribution. For over a decade, NRAO installed all packages that came with the Operating System because disk space is cheap and we might need tools like iperf3, dropwatch, or tcpretrans.
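For reference, on RHEL8 these tools can typically be pulled in with dnf, with tcpretrans shipping as part of bcc-tools; the package names are to the best of my knowledge and worth verifying against the configured repositories.

# iperf3 and dropwatch are in the standard repositories; tcpretrans comes with bcc-tools.
dnf install iperf3 dropwatch bcc-tools
# The bcc tools are installed under /usr/share/bcc/tools, e.g.:
/usr/share/bcc/tools/tcpretrans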
...