...
Table 7: iperf3 TCP throughput from/to ingress_sbox with rx-gro-hw=off (Mb/s)

| | na-arc-1 (naasc-vs-4) | na-arc-2 (naasc-vs-4) | na-arc-3 (naasc-vs-3) | na-arc-4 (naasc-vs-4) | na-arc-5 (naasc-vs-5) | na-arc-6 (naasc-vs-2) |
|---|---|---|---|---|---|---|
| na-arc-1 | | 4460 | 2580 | 4630 | 2860 | 3150 |
| na-arc-2 | 4060 | | 2590 | 4220 | 3690 | 2570 |
| na-arc-3 | 2710 | 2580 | | 3080 | 2770 | 2920 |
| na-arc-4 | 1090 | 3720 | 2200 | | 2970 | 3200 |
| na-arc-5 | 4010 | 3970 | 2340 | 4010 | | 3080 |
| na-arc-6 | 3380 | 3060 | 3060 | 3010 | 3080 | |
Documentation
The NAASC doesn't have a documented procedure for creating a VM guest or for making it a docker swarm node. This needs to be documented so that such nodes can be recreated without error or variation. Alvaro's documentation (https://confluence.alma.cl/display/OFFLINE/Documentation) is a good start but far from sufficient.
This to-be-written documentation should also capture one-off settings like ethtool -K em1 gro off.
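As a minimal sketch, the current state can be checked before and after applying that setting (em1 is the interface named above; whether the same name applies on every host should be verified):

```
# Check whether generic receive offload is currently enabled on em1
ethtool -k em1 | grep -i generic-receive-offload

# Disable GRO; takes effect immediately but does not survive a reboot
ethtool -K em1 gro off
```

Since the setting is per-boot, the documentation should also record how it is persisted. On RHEL-family hosts an ETHTOOL_OPTS entry in the interface's ifcfg file is one common mechanism, but the right choice depends on how these hosts manage their network configuration and should be confirmed when the procedure is written up.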
Consistent Hardware
The VM hosts used at the NAASC run on varied hardware. This led to the largest performance issue found here: the GRO feature on naasc-vs-4. I suggest making the hardware as consistent as possible to avoid such issues in the future.
NGAS network limit
Much effort has gone into putting the docker swarm nodes on a 10Gb/s network, yet the links to the NGAS nodes are only 1Gb/s. So even where there is a 10Gb/s path between the docker swarm nodes and an archive user's download site, transfers will still be capped at 1Gb/s by the NGAS links.
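To confirm that ceiling empirically, a quick iperf3 run from a swarm node to an NGAS node should plateau around 940 Mb/s, which is about the TCP payload rate of a 1Gb/s link. A sketch, with the NGAS hostname left as a placeholder:

```
# On the NGAS node (placeholder hostname): run an iperf3 server
iperf3 -s

# On a docker swarm node: 30-second TCP throughput test toward it
iperf3 -c <ngas-node> -t 30
```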
Upgrade swarm to meet ALMA requirements
According to Alvaro's document (https://confluence.alma.cl/display/OFFLINE/Documentation), docker swarm nodes should have a minimum of 16 cores and 32GB of memory. None of the production docker swarm nodes meets this requirement, though there are plans to address it.
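As a rough audit (a sketch, assuming the swarm manager can resolve and SSH to each node by hostname), core and memory counts can be pulled from every node and compared against the 16-core/32GB minimum:

```
#!/bin/bash
# Report each docker swarm node's CPU core count and memory, from the manager.
for node in $(docker node ls --format '{{.Hostname}}'); do
    echo "=== $node ==="
    ssh "$node" 'echo "cores: $(nproc)"; free -g | awk "/^Mem:/ {print \"memory: \" \$2 \" GB\"}"'
done
```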
ARC benchmarks
I think it would be worthwhile for each ARC to benchmark its download performance. This should be done regularly (weekly, monthly, quarterly, etc.) using as similar a procedure at each ARC as possible. This will provide two useful sets of data: (1) it will show when performance has dropped at an ARC, hopefully before users start complaining, and (2) it will build a history of benchmarks against which current results can be measured. A simple wget script, shared among the ARCs, could be used for this. E.g.:
wget --no-check-certificate https://almascience.nrao.edu/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
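Building on that command, a small wrapper (a sketch; the log and scratch paths are arbitrary choices) could timestamp each run and log the achieved throughput so the history accumulates automatically:

```
#!/bin/bash
# Download a fixed test file and append a timestamped throughput figure to a log.
URL="https://almascience.nrao.edu/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits"
LOG="$HOME/arc-benchmark.log"
OUT=/tmp/arc-benchmark.fits

start=$(date +%s)
wget --no-check-certificate -q -O "$OUT" "$URL"
end=$(date +%s)

# bytes * 8 / seconds / 10^6 = Mb/s; guard against a sub-second run
elapsed=$((end - start)); [ "$elapsed" -gt 0 ] || elapsed=1
size=$(stat -c %s "$OUT")
echo "$(date -Is) $((size * 8 / elapsed / 1000000)) Mb/s" >> "$LOG"
rm -f "$OUT"
```

Run from cron on the agreed schedule at each ARC, this yields directly comparable trend data.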
Dropped packets
Some of the NAASC VM hosts show a large number of dropped Rx packets, at rates ranging from 2 to over 100 per minute. This is unacceptable on a modern, well-designed network. While I can't say these dropped packets are indicative of a problem, they could become one with increased load, and they will certainly make debugging more difficult when there is a problem. I suggest the reason for these dropped packets be found and resolved.
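As a starting point for that investigation, the interface drop counters can be sampled to turn raw counts into a rate (a sketch, using the em1 interface named earlier; run on each VM host):

```
# Sample the Rx "dropped" counter over one minute and report the delta
before=$(ip -s link show em1 | awk '/RX:/ {getline; print $4}')
sleep 60
after=$(ip -s link show em1 | awk '/RX:/ {getline; print $4}')
echo "Rx packets dropped in the last minute: $((after - before))"

# The NIC's own statistics often say *why* packets were dropped
ethtool -S em1 | grep -i drop
```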
TCP retransmissions
The newest NAASC VM host (naasc-vs-2) shows over 100 TCP retransmissions per second during iperf3 tests, while other hosts like naasc-vs-3 and naasc-vs-4 show none at all. As with the dropped packets, I can't say these retransmissions are indicative of a problem, but they could become one with increased load and will make debugging more difficult when there is a problem. I suggest the reason for these TCP retransmissions be found and resolved.
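iperf3 reports these itself in its Retr column, but the kernel's counter can also be watched directly while a test runs, which helps tie the retransmissions to a particular host; a sketch:

```
# Absolute count of TCP segments retransmitted since boot
nstat -az TcpRetransSegs

# Without flags, nstat prints the delta since its previous run,
# so this loop approximates retransmissions per second
while sleep 1; do nstat TcpRetransSegs; done
```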