...
- Create na-arc-6 on new naasc-vs-2 (https://support.nrao.edu/show-ticket.php?ticketid=144552)
- Test iperf between ingress_sbox on new na-arc-6 when it is available
- Set ethtool -K em1 gro off permanently on naasc-vs-4 and document it. How do we do this?
- Double-check the switch port settings for naasc-vs-2. I am seeing many TCP retransmissions. (dhart)
- Check and perhaps replace the 10Gb network cable to naasc-vs-2. Does that help with the TCP retransmissions?
- Are the retransmissions to naasc-vs-2 causing my wget to na-arc-6 to fail?
- Strawman proposal for reassigning VM guests
Answers
- Why does iperf show 10Gb/s between na-arc-[1,2,4]? How is this possible if the default interface on the respective VM hosts is 1Gb/s?
- ANSWER: The vnets for the VM guests are tied to the 10Gb/s NICs on the VM hosts not the 1Gb/s NICs.
- Why do natest-arc-{1..3} have 9 veth* interfaces in the output of ip addr show while na-arc-{1..5} don't have any?
- Each container creates a veth* interface.
- Why does na-arc-3 have such poor network performance to the other na-arc nodes?
- ping na-arc-[1,2,4,5] with anything larger than -s 1490 drops all packets
- iperf tests show 10Gb/s between the VM host of na-arc-3 (naasc-vs-3 p5p1.120) and the VM host of na-arc-5 (naasc-vs-5 p2p1.120). So it isn't a bad card in either of the VM hosts.
- iptables on na-arc-3 looks different from iptables on na-arc-[2,4,5]. na-arc-1 also looks a bit different.
- docker_gwbridge interface on na-arc-[1,2,4,5] shows NO_CARRIER but not on na-arc-3.
- na-arc-3 has a veth10fd1da@if37 interface. None of the other na-arc-* nodes have a veth interface.
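The -s 1490 observation is easier to read alongside the IPv4 arithmetic: ping's -s value is only the ICMP payload, so each packet on the wire is 28 bytes larger (a 20-byte IP header plus an 8-byte ICMP header, assuming no IP options). A minimal sketch of that arithmetic; the helper names are hypothetical.

```python
# Relate `ping -s` payload sizes to IPv4 packet sizes and interface MTUs.
# Assumption: IPv4 with no IP options (20-byte IP header + 8-byte ICMP echo header).
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def ip_packet_size(payload: int) -> int:
    """Bytes in the IPv4 packet sent by `ping -s payload`."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES


def max_unfragmented_payload(mtu: int) -> int:
    """Largest `ping -s` value whose packet still fits in a single MTU-sized packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits(payload: int, mtu: int) -> bool:
    """True if `ping -s payload` fits within `mtu` without fragmentation."""
    return ip_packet_size(payload) <= mtu


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: largest unfragmented ping -s is {max_unfragmented_payload(mtu)}")
    print(f"ping -s 1490 sends {ip_packet_size(1490)}-byte IPv4 packets")
```

On Linux (iputils), ping -M do -s <size> <host> sets the don't-fragment bit. A probe larger than the path allows is then reported as an error or goes unanswered instead of being fragmented. Stepping <size> around the values printed here is one way to locate the limit.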
Production docker swarm iperf tests measured in Gb/s.
            na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5
            (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)
na-arc-1    -             18            0.002         20            10
na-arc-2    20            -             0.002         20            10
na-arc-3    0.002         0.002         -             0.002         0.002
na-arc-4    20            19            0.002         -
na-arc-5    10            10            0.002         10            10
There is clearly something wrong with na-arc-3.
- ANSWER: Since there were so many problems with na-arc-3, it was decided to recreate it. It was recreated from a clone of na-arc-2.
- Is putting all the 1Gb/s production docker swarm nodes on the same ASIC on the same Fabric Extender of the cv-nexus switch a good idea?
- I am thinking it does not matter because it looks like the production docker swarm nodes use the 10Gb/s network which is on cv-nexus9k
- Can we set up a test archive query that uses the "other" docker swarm which in this case would be the production swarm (na-arc-*)?
- Why are there VLANs on the VM hosts, e.g. em1.97 on naasc-vs-4?
- 2022-08-12 dhart: If you want all of your guest VMs to be on the same subnet as the VM host, then VLAN awareness isn't needed. However, in most cases we want the flexibility of being able to have VM guests on different networks (from one another and/or the VM host), so the VM host is configured with a trunk interface to the network to allow for any VLAN to be passed to the underlying VM guests housed on that VM host machine.
- 2022-08-12 dhart: 10.2.97.x (and 10.2.96.x) = internal VLAN for servers (primarily)
- 10.2.99.x = internal VLAN for server management
- 10.2.120.x = internal VLAN for 10 GE connections
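dhart's subnet notes can be turned into a quick lookup for which VLAN a given address sits on. This is a minimal sketch using Python's ipaddress module. The /24 prefix lengths are an assumption (the naasc-vs-2 routing table on this page only confirms /24 routes for 10.2.99.0 and 10.2.120.0), and the VLANS table is hypothetical, not an authoritative network map.

```python
import ipaddress

# Hypothetical lookup built from dhart's notes. The /24 prefix lengths are an
# assumption and are not confirmed for 10.2.96.x and 10.2.97.x.
VLANS = {
    ipaddress.ip_network("10.2.96.0/24"): "internal VLAN for servers",
    ipaddress.ip_network("10.2.97.0/24"): "internal VLAN for servers",
    ipaddress.ip_network("10.2.99.0/24"): "internal VLAN for server management",
    ipaddress.ip_network("10.2.120.0/24"): "internal VLAN for 10 GE connections",
}


def classify(address: str) -> str:
    """Return the VLAN description for `address`, or "unknown" if no listed subnet matches."""
    ip = ipaddress.ip_address(address)
    for network, description in VLANS.items():
        if ip in network:
            return description
    return "unknown"


if __name__ == "__main__":
    # na-arc-6 resolves to 10.2.97.76; 10.2.99.1 is the default gateway on naasc-vs-2;
    # nangas13 (10.2.140.33) is outside the subnets listed in the notes.
    for address in ("10.2.97.76", "10.2.99.1", "10.2.140.33"):
        print(f"{address} -> {classify(address)}")
```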
- Where is the main docker config (yaml file)?
- 2022-09-20 krowe: Why does naasc-vs-2 have APIPA-configured routes (169.254.0.0/16)? Aren't these usually created only when a network is misconfigured?
[root@naasc-vs-2 ~]# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.2.99.1       0.0.0.0         UG        0 0          0 eno1
10.2.99.0       0.0.0.0         255.255.255.0   U         0 0          0 eno1
10.2.120.0      0.0.0.0         255.255.255.0   U         0 0          0 ens1f0np0.120
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0.120
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br97
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br101
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
- 2022-09-28 krowe: APIPA routes are created by /etc/sysconfig/network-scripts/ifup-eth, which is installed from the network-scripts RPM. This RPM is legacy on RHEL8 (naasc-vs-2 runs RHEL 8.6) and must have been installed specifically. It is not installed on any other RHEL8 machine I have checked.
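To check the other VM hosts for the same APIPA routes without eyeballing each table, the netstat -nr output can be filtered for destinations inside 169.254.0.0/16. A minimal sketch, assuming the 8-column netstat -nr layout shown for naasc-vs-2; apipa_routes is a hypothetical helper name. On hosts that do use the legacy network-scripts, ifup-eth normally skips this zeroconf route when NOZEROCONF=yes is set (for example in /etc/sysconfig/network). That behavior is worth confirming against the installed ifup-eth before relying on it.

```python
import ipaddress
from typing import List, Tuple

# IPv4 link-local ("APIPA"/zeroconf) range.
APIPA = ipaddress.ip_network("169.254.0.0/16")


def apipa_routes(netstat_output: str) -> List[Tuple[str, str]]:
    """Return (network, iface) for every route that falls inside 169.254.0.0/16.

    Expects the 8-column layout printed by `netstat -nr`:
    Destination Gateway Genmask Flags MSS Window irtt Iface
    """
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skips the shell prompt and the "Kernel IP routing table" title
        dest, _gateway, genmask, *_counters, iface = fields
        try:
            net = ipaddress.ip_network(f"{dest}/{genmask}", strict=False)
        except ValueError:
            continue  # the column header row is not an address
        if net.subnet_of(APIPA):
            found.append((str(net), iface))
    return found


SAMPLE = """\
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.2.99.1 0.0.0.0 UG 0 0 0 eno1
10.2.99.0 0.0.0.0 255.255.255.0 U 0 0 0 eno1
10.2.120.0 0.0.0.0 255.255.255.0 U 0 0 0 ens1f0np0.120
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 ens1f0np0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 ens1f0np0.120
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br97
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br101
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
"""

if __name__ == "__main__":
    for net, iface in apipa_routes(SAMPLE):
        print(f"{net} via {iface}")
```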
- 2022-09-26 krowe: Can an older Solarflare card (Solarflare Communications SFC9020) replace the card in naasc-vs-2 to see if that helps with the TCP retransmissions?
- 2022-09-28 krowe: No. When thalstea replaced the card with an old SFC9020 card from cv-vs-1, the machine would not boot, so the original SFC9022 is back in naasc-vs-2. See ticket https://support.nrao.edu/show-ticket.php?ticketid=145153 for details.
- Why can't I download via na-arc-6? I don't think it is properly set up yet.
- wget --no-check-certificate http://na-arc-6.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
--2022-09-15 10:22:32-- http://na-arc-6.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
Resolving na-arc-6.cv.nrao.edu (na-arc-6.cv.nrao.edu)... 10.2.97.76
Connecting to na-arc-6.cv.nrao.edu (na-arc-6.cv.nrao.edu)|10.2.97.76|:8088... failed: Connection timed out.
- 2022-09-29 krowe: Apparently docker just needed to be restarted on na-arc-6. Now I can download files via wget through na-arc-6 at the same rate as through the other na-arc nodes.
- wget --no-check-certificate http://na-arc-6.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
- Why do I see the cv-6509 hop (10.2.254.5) when tracerouting from na-arc-5 to nangas13 but not from natest-arc-1?
[root@na-arc-5 ~]# traceroute nangas13
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan97.cv.nrao.edu (10.2.97.1) 0.426 ms 0.465 ms 0.523 ms
2 cv-6509.cv.nrao.edu (10.2.254.5) 0.297 ms 0.277 ms 0.266 ms
3 nangas13.cv.nrao.edu (10.2.140.33) 0.197 ms 0.144 ms 0.109 ms
[root@natest-arc-1 ~]# traceroute nangas13
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan96.cv.nrao.edu (10.2.96.1) 0.459 ms 0.427 ms 0.402 ms
2 nangas13.cv.nrao.edu (10.2.140.33) 0.184 ms 0.336 ms 0.311 ms
- Derek wrote that 10.2.99.1 = CV-NEXUS and 10.2.96.1 = CV-6509.
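When comparing paths from many nodes, it can help to pull the hop names out of traceroute output and diff them. A minimal sketch using the two traces recorded here; the regular expression and the hops helper are hypothetical and only handle responding hops in the "N host (address) times" form shown.

```python
import re
from typing import List, Tuple

# Matches responding hop lines such as
# "2 cv-6509.cv.nrao.edu (10.2.254.5) 0.297 ms 0.277 ms 0.266 ms".
# Hops that time out ("* * *") are not matched by this sketch.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([0-9.]+)\)")


def hops(traceroute_output: str) -> List[Tuple[int, str, str]]:
    """Return (hop number, hostname, address) for each responding hop."""
    result = []
    for line in traceroute_output.splitlines():
        match = HOP_RE.match(line)
        if match:
            result.append((int(match.group(1)), match.group(2), match.group(3)))
    return result


NA_ARC_5 = """\
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan97.cv.nrao.edu (10.2.97.1) 0.426 ms 0.465 ms 0.523 ms
2 cv-6509.cv.nrao.edu (10.2.254.5) 0.297 ms 0.277 ms 0.266 ms
3 nangas13.cv.nrao.edu (10.2.140.33) 0.197 ms 0.144 ms 0.109 ms
"""

NATEST_ARC_1 = """\
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan96.cv.nrao.edu (10.2.96.1) 0.459 ms 0.427 ms 0.402 ms
2 nangas13.cv.nrao.edu (10.2.140.33) 0.184 ms 0.336 ms 0.311 ms
"""

if __name__ == "__main__":
    path_a = [name for _, name, _ in hops(NA_ARC_5)]
    path_b = [name for _, name, _ in hops(NATEST_ARC_1)]
    print("only via na-arc-5:", sorted(set(path_a) - set(path_b)))
```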
Done
- Recreate na-arc-3 so it gets the same performance as the other na-arc-* nodes, which is apparently at least 10Gb/s. (pmurphy)
- 2022-08-11: cloned na-arc-2 and moved the clone to naasc-vs-3 (zbutcher)
- 2022-08-11: moved old na-arc-3 to na-arc-3-OLD (thalstea)
- 2022-08-11: Renamed the clone to na-arc-3. We connected it to the swarm successfully, but it had a low connection speed.
- 2022-08-11: Changed the model of na-arc-3's vnet5 interface on naasc-vs-3 from rtl8139 to virtio to match all the other na-arc-* nodes. Performance was still poor.
- 2022-08-11: Changed the MTU of na-arc-3 eth0 to 1500. This is different from all the other na-arc-* nodes, but it was either that or change p5p1.120 and br97 on naasc-vs-3 from 9000 to 1500, which may have impacted other VM guests on that host. Performance was now reasonable: 7Gb/s. I was expecting about 9Gb/s, but perhaps the 1500 MTU is affecting performance.
- 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
- Launch services on production swarm (sbooth)
- 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
- Test the production docker swarm with a test web interface. (lsharp)
- 2022-08-12: http://almaportal.cv.nrao.edu/
- 2022-08-12 krowe: Ran tcpdump dst almaportal on all five na-arc-{1..5} nodes and then downloaded a data file (wget --no-check-certificate https://almaportal.cv.nrao.edu/dataPortal/2013.1.00226.S_uid___A001_X122_X1f1_001_of_001.tar). With each execution of wget, I could see the next na-arc host report the traffic. This is because the web proxy on almaportal selects the next na-arc node via round-robin. All five nodes were providing about 6KB/s to cvpost-master.
- 2022-08-12 krowe: I ran iperf tests host to host along the entire chain (nangas14 -> na-arc-{1..5} -> almaportal -> cvpost-master). Every step achieved at least 900Mb/s, yet downloading with wget ran at about 0.06Mb/s.
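The tcpdump observation (each new wget showing up on the next na-arc host) is what round-robin selection produces. A minimal illustration; it is a hypothetical model, not the actual almaportal proxy software or configuration, and starting at na-arc-1 is an assumption (a real proxy may start anywhere in the list).

```python
from itertools import cycle
from typing import List, Sequence

# Hypothetical model of round-robin backend selection. It does not reproduce the
# almaportal proxy; starting at the first node is an assumption.
NA_ARC_NODES = ("na-arc-1", "na-arc-2", "na-arc-3", "na-arc-4", "na-arc-5")


def assign(requests: int, backends: Sequence[str] = NA_ARC_NODES) -> List[str]:
    """Return the backend chosen for each of `requests` consecutive requests."""
    rotation = cycle(backends)
    return [next(rotation) for _ in range(requests)]


if __name__ == "__main__":
    for n, node in enumerate(assign(7), start=1):
        print(f"wget #{n} -> {node}")
```

A consequence of this design is that five consecutive downloads are needed to exercise every node once through the proxy, so one slow node shows up as one slow download in five.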
- Ask the other ARCs whether they use MTU 9000 on 10Gb networks. (krowe)
- JAO uses MTU of 1500
- ESO uses two VM hosts running VMware with 10Gb/s and MTU of 1500
- 2022-08-17 krowe: Changed eth0 on na-arc-5 from qdisc pfifo_fast to qdisc fq_codel to match all the other na-arc and natest-arc nodes. This seemed to have no effect on performance.
- tc qdisc replace dev eth0 root fq_codel
- 2022-08-25 krowe: Tracy changed the following sysctl options on na-arc-5 to match the other VM hosts. Sadly, it seems to have had no effect on wget performance: na-arc-1, na-arc-2, and na-arc-4 are at 32KB/s while na-arc-3 and na-arc-5 are at 45MB/s.
- net.ipv4.conf.all.accept_redirects = 0
- net.ipv4.conf.all.forwarding = 1
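These notes mix KB/s, MB/s, Mb/s and Gb/s. Putting the rates on one scale makes the gaps explicit: 6KB/s is roughly 0.05Mb/s, about four orders of magnitude below the 900Mb/s iperf floor, and 45MB/s (360Mb/s) is roughly 1,400 times 32KB/s. A small conversion sketch, assuming decimal units; the unit table and to_mbps are hypothetical helpers.

```python
# Convert the transfer rates quoted in these notes to one unit (Mb/s).
# Assumption: decimal SI units (1 KB = 1,000 bytes, 1 Mb = 1,000,000 bits).
# With binary units (1 KiB = 1,024 bytes) the results shift by roughly 2-5%.
BITS_PER_SECOND = {
    "KB/s": 1_000 * 8,
    "MB/s": 1_000_000 * 8,
    "Mb/s": 1_000_000,
    "Gb/s": 1_000_000_000,
}


def to_mbps(value: float, unit: str) -> float:
    """Convert a rate such as (6, "KB/s") to megabits per second."""
    return value * BITS_PER_SECOND[unit] / 1_000_000


if __name__ == "__main__":
    wget_proxy = to_mbps(6, "KB/s")    # wget through almaportal, 2022-08-12
    iperf_min = to_mbps(900, "Mb/s")   # slowest iperf hop in the chain
    slow = to_mbps(32, "KB/s")         # na-arc-1, -2, -4 on 2022-08-25
    fast = to_mbps(45, "MB/s")         # na-arc-3, -5 on 2022-08-25
    print(f"wget {wget_proxy:.3f} Mb/s vs iperf {iperf_min:.0f} Mb/s: {iperf_min / wget_proxy:,.0f}x")
    print(f"slow nodes {slow:.3f} Mb/s vs fast nodes {fast:.0f} Mb/s: {fast / slow:,.0f}x")
```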
- 2022-09-01: Tracy rebooted naasc-vs-5 which hosts na-arc-5 just in case this was necessary for the net.ipv4.conf.all.forwarding sysctl change to take effect. Sadly, no change in performance.
- Why does na-arc-5 still have net.ipv4.conf.all.accept_redirects = 1 even after a reboot while all the other na-arc nodes have this set to 0?
- 2022-09-06 krowe: Probably because na-arc-5 didn't reboot when naasc-vs-5 rebooted. I expect it was suspended instead of rebooted, yet natest-arc-3 and naascweb2-prod were rebooted. I just checked virt-manager and na-arc-5 is hosted by naasc-vs-5. Can we reboot na-arc-5?
- 2022-09-07 krowe: rebooted na-arc-5 and now net.ipv4.conf.all.accept_redirects = 0
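Comparing these settings node by node is what exposed the stale accept_redirects value on na-arc-5. A small sketch that reads the values straight from /proc/sys (the same values sysctl <key> prints) and reports any that differ from the expected ones. The helper names are hypothetical, and the expected values are just the two settings listed in these notes.

```python
from pathlib import Path
from typing import Dict, Optional


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key (e.g. net.ipv4.conf.all.forwarding) to its /proc/sys file.

    This sketch splits on every dot, so it does not handle keys for interfaces
    whose names themselves contain dots (e.g. VLAN interfaces such as em1.97).
    """
    return Path(root, *key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value of `key`, or None if the setting does not exist."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except FileNotFoundError:
        return None


def mismatches(expected: Dict[str, str], root: str = "/proc/sys") -> Dict[str, Optional[str]]:
    """Return {key: actual value} for every key whose value differs from `expected`."""
    actual = {key: read_sysctl(key, root) for key in expected}
    return {key: value for key, value in actual.items() if value != expected[key]}


if __name__ == "__main__":
    # Values these notes expect on every na-arc node.
    wanted = {
        "net.ipv4.conf.all.accept_redirects": "0",
        "net.ipv4.conf.all.forwarding": "1",
    }
    print(mismatches(wanted) or "all settings match")
```

Run on each na-arc node (for example over ssh). An empty result means the node matches the expected settings.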
- 2022-09-21 cfultz: Replaced the 10Gb network cable on naasc-vs-2. "the cable was nearly bent in half at the router".
NAASC Archive Stabilization Solutions
People (not necessarily team members)
- K. Scott Rowe - Tiger Team Lead
- CJ Allen - sysadmin
- Tom Booth - programmer
- Liz Sharp - sysadmin
- Brian Mason - DRM Scientist
- Zhon Butcher - sysadmin
- Tracy Halstead - sysadmin
- Alvaro Aguirre - ALMA software
- Pat Murphy - CIS lead
- Rachel Rosen - previous ICT lead
- Laura Jenson - current ICT lead
- Catherine Vlahakis - Scientist
Conclusions
...
References
- Prepare offline infrastructure from the scratch (Describes docker swarm setup)
- file:///tmp/ALMA%20Offline%20Software%20Test_Deployment%20Concept(2).pdf