...
- https://ictjira.alma.cl/browse/AES-52
- https://confluence.alma.cl/pages/viewpage.action?pageId=91826715
- You can see poor performance with a command like
- wget --no-check-certificate http://almaportal.cv.nrao.edu/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits
Diagrams
Timeline of events
- 2020-03-19: ALMA suspends science observing and stows the array because of COVID-19.
- 2020-06-24: Archive webapps (aq, asaz, rh, etc, but not SP) moved to new Docker Swarm (na-arc-*) system. See more.
- 2021-03-17: ALMA re-starts limited science observations, resuming Cycle 7. See more.
- 2021-10-01: ALMA starts Cycle 8 observations. See more.
- 2022-02-03: Science Portal (SP) upgraded Plone, Python, RHEL and moved into Docker Swarm. All other webapps had already been in Docker Swarm.
- 2022-04-18: First documented report of performance issues. Webapps moved to pre-production Docker Swarm (natest-arc-*). See more
- 2022-05-09: moved the Science Portal (SP) from Docker Swarm to an rsync copy on http://almaportal.cv.nrao.edu/ because of performance issues
- 2022-05-31: moved the Science Portal (SP) from the rsync copy back to Docker Swarm
- 2022-06-30: Tracy changed the eth0 MTU on the production docker swarm nodes (na-arc-*) from the default 1500 to 9000. The test swarm is still 1500.
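- A sketch (not from the original notes) of how to check the MTU and verify that jumbo frames actually pass end-to-end; the interface and target host are examples:
ip link show eth0 | grep mtu                  # confirm the current MTU
ip link set dev eth0 mtu 9000                 # set jumbo frames (not persistent across reboots)
ping -M do -s 8972 -c 3 na-arc-1              # 8972 bytes of payload + 28 bytes of headers = 9000; fails if any hop has a smaller MTU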
- 2022-07-25: Jeff Kern asked K. Scott Rowe to head a tiger team to investigate the various issues that have affected the ALMA Archive.
- 2022-08-11: cloned na-arc-2, moved the clone to naasc-vs-3 as na-arc-3, and changed its MTU to 1500. The other na-arc nodes are at 9000, but changing na-arc-3 to 9000 would require changing naasc-vs-3, which could affect other, non-archive VM guests.
- 2022-08-12: set up http://almaportal.cv.nrao.edu/ which uses the five na-arc nodes. This is for internal testing. Results show download speed at about 32KB/s regardless of which na-arc node the web proxy chooses.
- 2022-08-17 krowe: Changed eth0 on na-arc-5 from qdisc pfifo_fast to qdisc fq_codel to match all the other na-arc and natest-arc nodes. This seemed to have no effect on performance.
- tc qdisc replace dev eth0 root fq_codel
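- To confirm which qdisc is actually active (a quick check, not from the original notes):
tc qdisc show dev eth0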
- 2022-08-19 krowe: For some reason, all the swarm services on na-arc-5 shut down around 11am Central on Aug. 18, 2022. Now my wget tests are getting about 100MB/s, and I tested this five times to walk through all four nodes. I then moved the httpd to na-arc-5 and now na-arc-[1,2,4] download at ~32KB/s while na-arc-[3,5] download at ~100MB/s.
- 2022-08-25 krowe: Tracy changed the following sysctl options on naasc-vs-5 to match the other VM Hosts (see the sketch after this list). Sadly it seems to have had no effect on wget performance. na-arc-1, na-arc-2, na-arc-4 are 32KB/s while na-arc-3 and na-arc-5 are 45MB/s.
- net.ipv4.conf.all.accept_redirects = 0
- net.ipv4.conf.all.forwarding = 1
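- For reference, a sketch of how such sysctl changes are applied and persisted (the file name here matches the 99-nrao.conf referenced later in these notes):
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv4.conf.all.forwarding=1
echo 'net.ipv4.conf.all.forwarding = 1' >> /etc/sysctl.d/99-nrao.conf   # persist across reboots
sysctl -p /etc/sysctl.d/99-nrao.conf                                    # reload the file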
- 2022-09-01: Tracy rebooted naasc-vs-5 which hosts na-arc-5 just in case this was necessary for the net.ipv4.conf.all.forwarding sysctl change to take effect. Sadly, no change in performance.
...
- 2022-09-02 krowe: sysctl -a | grep br97 across naasc-vs-{3..5} are identical.
- 2022-09-02 krowe: sysctl -a | grep <vnet> across naasc-vs-{3..5} are identical except for the vnet name (e.g. vnet2, vnet4, etc.)
- 2022-09-02 krowe: sysctl -a | grep <10Gb NIC> across naasc-vs-{3..5} are different
- naasc-vs-3 is identical to naasc-vs-5 except for the name of the NIC (e.g. p5p1, p2p1).
- naasc-vs-4 has entries for VLANs 101 and 140 while naasc-vs-3 and naasc-vs-5 have entries for VLANs 192 and 96.
- 2022-09-02 krowe: Compared sysctl -a on naasc-vs-3 and naasc-vs-5: no significant differences.
- 2022-09-02 krowe: Compared sysctl -a on naasc-vs-4 and naasc-vs-5 and found many questionable differences (see the comparison sketch after this list)
- naasc-vs-4: net.iw_cm.default_backlog = 256
- Is this because the IB modules are loaded?
- naasc-vs-4: net.rdma_ucm.max_backlog = 1024
- Is this because the IB modules are loaded?
- naasc-vs-4: sunrpc.rdma*
- Is this because the IB modules are loaded?
- naasc-vs-4: net.netfilter.nf_log.2 = nfnetlink_log
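- A sketch of one way to make these host-to-host comparisons (assumes root ssh to the VM hosts):
for h in naasc-vs-3 naasc-vs-4 naasc-vs-5; do ssh root@$h 'sysctl -a 2>/dev/null' | sort > /tmp/$h.sysctl; done
diff /tmp/naasc-vs-4.sysctl /tmp/naasc-vs-5.sysctl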
- 2022-09-06 krowe: CV-NEXUS switch port capabilities for naasc-vs-{3..5} are identical.
- 2022-09-06 krowe: CV-NEXUS9K switch port capabilities for naasc-vs-{3..5} are identical.
- Though the recorded output rate of naasc-vs-5 is about 500 Mb/s while naasc-vs-{3..4} is about 300 Kb/s.
- And the recorded input rate of naasc-vs-5 is about 500 Mb/s while naasc-vs-{3..4} is about 5 Mb/s.
- This is very strange, as it seemed naasc-vs-5 was the limiting factor, but the switch ports suggest otherwise. Perhaps this data rate is caused by other VM guests on naasc-vs-5 (helpdesk-prod, naascweb2-prod, cartaweb-prod, natest-arc-3, cobweb2-dev).
- 2022-09-06 krowe: ethtool -k <NIC> for naasc-vs-3 and naasc-vs-5 are identical except for the NIC name
- 2022-09-06 krowe: ethtool -k <NIC> for naasc-vs-3 and naasc-vs-4 are very different (see the sketch after this list).
- hw-tc-offload: off vs hw-tc-offload: on
- rx-gro-hw: off vs rx-gro-hw: on
- rx-vlan-offload: off vs rx-vlan-offload: on
- rx-vlan-stag-hw-parse: off vs rx-vlan-stag-hw-parse: on
- tcp-segmentation-offload: off vs tcp-segmentation-offload: on
- tx-gre-csum-segmentation: off vs tx-gre-csum-segmentation: on
- tx-gre-segmentation: off vs tx-gre-segmentation: on
- tx-gso-partial: off vs tx-gso-partial: on
- tx-ipip-segmentation: off vs tx-ipip-segmentation: on
- tx-sit-segmentation: off vs tx-sit-segmentation: on
- tx-tcp-segmentation: off vs tx-tcp-segmentation: on
- tx-udp_tnl-csum-segmentation: off vs tx-udp_tnl-csum-segmentation: on
- tx-udp_tnl-segmentation: off vs tx-udp_tnl-segmentation: on
- tx-vlan-offload: off vs tx-vlan-offload: on
- tx-vlan-stag-hw-insert: off vs tx-vlan-stag-hw-insert: on
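- A sketch of how the offload settings could be compared and toggled for testing (NIC names are examples; tso/gso/gro/rxvlan/txvlan are the ethtool -K abbreviations; the change reverts on reboot unless made persistent):
diff <(ssh root@naasc-vs-3 ethtool -k p5p1) <(ssh root@naasc-vs-4 ethtool -k p2p1)
ethtool -K p2p1 tso off gso off gro off rxvlan off txvlan off    # on naasc-vs-4, temporarily match naasc-vs-3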
na-arc-1,2,3,4,5
Diagrams
...
Questions
- Where is the main docker config (yaml file)?
- Why does na-arc-5 still have net.ipv4.conf.all.accept_redirects = 1 even after a reboot while all the other na-arc nodes have this set to 0?
- 2022-09-06 krowe: probably because na-arc-5 didn't reboot when naasc-vs-5 rebooted. I expect it was suspended instead of rebooted. Yet natest-arc-3 and naascweb2-prod were rebooted. I just checked virt-manager and na-arc-5 is hosted by naasc-vs-5. Can we reboot na-arc-5?
- Why does naasc-vs-4 have all the infiniband modules loaded? I don't see an IB card. naasc-vs-1 and naasc-dev-vs also have some IB modules loaded but naasc-vs-3 and naasc-vs-5 don't have any IB modules loaded.
- Why is nfnetlink logging enabled on naasc-vs-4? You can see this with cat /proc/net/netfilter/nf_log and lsmod|grep -i nfnet
- Why is the eth1 interface in the ingress_sbox namespace on na-arc-1 172.18.0.2 while on all the other na-arc nodes it is 172.19.0.2?
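- A sketch of how to inspect that hidden swarm namespace (assumes the standard Docker netns location):
nsenter --net=/run/docker/netns/ingress_sbox ip -4 addr show eth1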
- Here are some diffs in sysctl on the na-arc nodes (a comparison sketch follows this list). I tried changing na-arc-4 and na-arc-5 to match the others but performance was the same. I then changed all the nodes to match na-arc-{1..3} and still no change in performance. I still don't understand how na-arc-{4..5} got different settings. I did find that there is another directory for sysctl settings in /usr/lib/sysctl.d but that isn't why these are different.
- na-arc-1, na-arc-2, na-arc-3, natest-arc-1, natest-arc-2, natest-arc-3
net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 1
- na-arc-4, na-arc-5
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
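- A sketch of one way to compare (and re-check after changes) these bridge settings across all the swarm nodes, assuming root ssh access:
for h in na-arc-{1..5} natest-arc-{1..3}; do
    echo "== $h"
    ssh root@$h sysctl net.bridge.bridge-nf-call-arptables net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-iptables
done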
- I see sysctl differences between the natest-arc servers and the na-arc servers. Here is a diff of /etc/sysctl.d/99-nrao.conf on natest-arc-1 and na-arc-5
< #net.ipv4.tcp_tw_recycle = 1
---
> net.ipv4.tcp_tw_recycle = 1
22,39d21
< net.ipv4.conf.all.accept_redirects=0
< net.ipv4.conf.default.accept_redirects=0
< net.ipv4.conf.all.secure_redirects=0
< net.ipv4.conf.default.secure_redirects=0
<
< #net.ipv6.conf.all.disable_ipv6 = 1
< #net.ipv6.conf.default.disable_ipv6 = 1
<
< # Mellanox recommends the following
< net.ipv4.tcp_timestamps = 0
< net.core.netdev_max_backlog = 250000
<
< net.core.rmem_default = 16777216
< net.core.wmem_default = 16777216
< net.core.optmem_max = 16777216
< net.ipv4.tcp_mem = 16777216 16777216 16777216
< net.ipv4.tcp_low_latency = 1
- If I set net.ipv4.tcp_timestamps = 0 on na-arc-5, the wget download drops to nothing (--.-KB/s).
- If I set all of the above sysctl options, except net.ipv4.tcp_timestamps, on all five na-arc nodes, wget download performance doesn't change. It is still about 32KB/s. Also, I still see ZeroWindow packets.
- Try rebooting VMs after making changes?
- I see ZeroWindow packets sent from na-arc-5 to nangas13 while downloading a file from nangas13 using wget (see the capture sketch below). This is na-arc-5 telling nangas13 to wait because its receive buffer is full.
- Is this because of qdisc pfifo_fast? No. krowe changed eth0 to *qdisc fq_codel* and is still seeing ZeroWindow packets.
- Now that I have moved the rh_download to na-arc-1 and put httpd on na-arc-5 I no longer see ZeroWindow packets on na-arc-5. But I am seeing them on na-arc-1 which is where the rh_downloader is now. Is this because the rh_downloader is being stalled talking to something else like httpd and therefore telling nangas13 to wait?
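- A sketch of how the ZeroWindow packets can be spotted (the window field is bytes 14-15 of the TCP header; RSTs are excluded since they normally carry a zero window):
tcpdump -ni eth0 'host nangas13 and tcp[14:2] = 0 and tcp[tcpflags] & tcp-rst = 0'
tshark -i eth0 -Y tcp.analysis.zero_window     # equivalent, if wireshark-cli is installed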
- Why does almaportal use ens3 while almascience uses eth0?
- What if we move the rh-downloader container to a different node? In fact walk it through all five nodes and test.
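- A sketch of how a swarm service can be pinned to a specific node for such a test (the service name below is a placeholder; check docker service ls first):
docker service ls                                                        # find the exact rh-download service name
docker service update --constraint-add 'node.hostname==na-arc-2' <rh-download-service>
docker service ps <rh-download-service>                                  # confirm which node it landed on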
- Why do I see cv-6509 when tracerouting from na-arc-5 to nangas13 but not from natest-arc-1?
[root@na-arc-5 ~]# traceroute nangas13
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan97.cv.nrao.edu (10.2.97.1) 0.426 ms 0.465 ms 0.523 ms
2 cv-6509.cv.nrao.edu (10.2.254.5) 0.297 ms 0.277 ms 0.266 ms
3  nangas13.cv.nrao.edu (10.2.140.33)  0.197 ms  0.144 ms  0.109 ms
[root@natest-arc-1 ~]# traceroute nangas13
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan96.cv.nrao.edu (10.2.96.1) 0.459 ms 0.427 ms 0.402 ms
2  nangas13.cv.nrao.edu (10.2.140.33)  0.184 ms  0.336 ms  0.311 ms
- Derek wrote that 10.2.99.1 = CV-NEXUS and 10.2.96.1 = CV-6509.
- Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
- virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
- When I run ethtool eth0 on na-arc-{1..5} and natest-arc-{1..2} as root, the result is just Link detected: yes instead of the full report with speed, while natest-arc-3 shows 100Mb/s.
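- A sketch of how the emulated NIC model could be checked and changed (the rtl8139 model emulates a 100Mb/s card; virtio is the usual choice for VM guests):
virsh domiflist natest-arc-3        # shows the interface model (rtl8139 here)
virsh edit natest-arc-3             # change <model type='rtl8139'/> to <model type='virtio'/>
# the guest needs a full shutdown and start (not just a reboot) for the new model to take effect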
- Why do iperf tests from natest-arc-1 and natest-arc-2 to natest-arc-3 get about half the expected performance (0.5Gb/s), especially when the reverse tests get the expected performance (0.9Gb/s)?
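- For reference, the kind of test being run (shown with iperf3 syntax; the original tests may have used iperf2; -R reverses the direction):
iperf3 -s                          # on natest-arc-3
iperf3 -c natest-arc-3             # from natest-arc-1: ~0.5Gb/s observed
iperf3 -c natest-arc-3 -R          # reverse direction: ~0.9Gb/s observed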
- Is putting the production swarm nodes (na-arc-*) on the 10Gb/s network a good idea? Sure it makes a fast connection to cvsan but it adds one more hop to the nangas servers (e.g. na-arc-1 -> cv-nexus9k -> cv-nexus -> nangas11)
- When I connect to the container acralmaprod001.azurecr.io/offline-production/rh-download:2022.06.01.2022jun I get errors like unknown user 1009. I get the same errors on the natest-arc-1 container.
- Does it matter that the na-arc nodes are on 10.2.97.x and their VM hosts are on 10.2.99.x, while the natest-arc nodes are on 10.2.96.x and their VM hosts (well, 2 out of 3) are also on 10.2.96.x? Is this why I see cv-6509.cv.nrao.edu when running traceroute from the na-arc nodes?
- When running wget --no-check-certificate http://na-arc-3.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits I see traffic going through veth14ce034 on na-arc-3 but I can't find a container associated with that veth.
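- A sketch of one way to map a veth back to its owner (the veth may terminate in a hidden swarm namespace rather than a regular container):
IDX=$(cat /sys/class/net/veth14ce034/ifindex)
for c in $(docker ps -q); do
    P=$(docker exec $c cat /sys/class/net/eth0/iflink 2>/dev/null)
    [ "$P" = "$IDX" ] && docker inspect -f '{{.Name}}' $c    # containers with more interfaces (eth1, eth2) would need those checked too
done
for ns in /run/docker/netns/*; do
    nsenter --net=$ns ip -o link | grep "@if${IDX}:" && echo "   -> $ns"
done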
- Why does the httpd container have eth0 (10.0.0.8)? This is the ingress network. I don't see any other container with an interface on 10.0.0.0/24.
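- A sketch of how to see what is attached to the ingress network (services that publish ports get an interface on it, plus a hidden ingress-sbox endpoint per node):
docker network inspect ingress            # the Containers section lists the endpoints attached on this node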
...