...
- 2022-09-02 krowe: sysctl -a | grep br97 across naasc-vs-{3..5} are identical.
- 2022-09-02 krowe: sysctl -a | grep <vnet> across naasc-vs-{3..5} are identical except for the vnet name (e.g. vnet2, vnet4, etc.)
- 2022-09-02 krowe: sysctl -a | grep <10Gb NIC> across naasc-vs-{3..5} are different:
- naasc-vs-3 is identical to naasc-vs-5 except for the name of the NIC (e.g. p5p1, p2p1).
- naasc-vs-4 has entries for VLANs 101 and 140 while naasc-vs-3 and naasc-vs-5 have entries for VLANs 192 and 96.
- 2022-09-02 krowe: sysctl -a on naasc-vs-3 and naasc-vs-5 shows no significant differences.
- 2022-09-02 krowe: compared sysctl -a on naasc-vs-4 and naasc-vs-5 and found many questionable differences (a sketch of the comparison follows this list):
- naasc-vs-4: net.iw_cm.default_backlog = 256
- Is this because the IB modules are loaded?
- naasc-vs-4: net.rdma_ucm.max_backlog = 1024
- Is this because the IB modules are loaded?
- naasc-vs-4: sunrpc.rdma*
- Is this because the IB modules are loaded?
- naasc-vs-4: net.netfilter.nf_log.2 = nfnetlink_log
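- A minimal sketch of reproducing the sysctl comparison above, assuming root ssh from a workstation to the VM hosts:
# collect sorted sysctl output from each VM host, then diff any pair
for h in naasc-vs-3 naasc-vs-4 naasc-vs-5; do
  ssh root@$h 'sysctl -a 2>/dev/null | sort' > /tmp/sysctl.$h
done
diff /tmp/sysctl.naasc-vs-3 /tmp/sysctl.naasc-vs-5   # no significant differences
diff /tmp/sysctl.naasc-vs-4 /tmp/sysctl.naasc-vs-5   # many differences (see list above)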
- 2022-09-06 krowe: CV-NEXUS switch port capabilities for naasc-vs-{3..5} are identical.
- 2022-09-06 krowe: CV-NEXUS9K switch port capabilities for naasc-vs-{3..5} are identical.
- Though the recorded output rate of naasc-vs-5 is about 500 Mb/s while naasc-vs-{3..4} is about 300Kb/s.
- And the recorded input rate of naasc-vs-5 is about 500 Mb/s while naasc-vs-{3..4} is about 5 Mb/s.
- This is very strange, as it seemed naasc-vs-5 was the limiting factor, but the switch ports suggest otherwise. Perhaps this data rate is caused by other VM guests on naasc-vs-5 (helpdesk-prod, naascweb2-prod, cartaweb-prod, natest-arc-3, cobweb2-dev).
- 2022-09-06 krowe: ethtool -k <NIC> for naasc-vs-3 and naasc-vs-5 are identical except for the NIC name.
- 2022-09-06 krowe: ethtool -k <NIC> for naasc-vs-3 and naasc-vs-4 are very different (naasc-vs-3 vs naasc-vs-4; a sketch for aligning these flags follows the list):
- hw-tc-offload: off vs hw-tc-offload: on
- rx-gro-hw: off vs rx-gro-hw: on
- rx-vlan-offload: off vs rx-vlan-offload: on
- rx-vlan-stag-hw-parse: off vs rx-vlan-stag-hw-parse: on
- tcp-segmentation-offload: off vs tcp-segmentation-offload: on
- tx-gre-csum-segmentation: off vs tx-gre-csum-segmentation: on
- tx-gre-segmentation: off vs tx-gre-segmentation: on
- tx-gso-partial: off vs tx-gso-partial: on
- tx-ipip-segmentation: off vs tx-ipip-segmentation: on
- tx-sit-segmentation: off vs tx-sit-segmentation: on
- tx-tcp-segmentation: off vs tx-tcp-segmentation: on
- tx-udp_tnl-csum-segmentation: off vs tx-udp_tnl-csum-segmentation: on
- tx-udp_tnl-segmentation: off vs tx-udp_tnl-segmentation: on
- tx-vlan-offload: off vs tx-vlan-offload: on
- tx-vlan-stag-hw-insert: off vs tx-vlan-stag-hw-insert: on
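- A sketch (not yet run) for turning these offload flags off on naasc-vs-4 to match naasc-vs-3 for testing; assumes a recent ethtool that accepts the long feature names from -k, and the NIC name is a placeholder:
NIC=p2p1   # assumption: replace with the actual 10Gb NIC name on naasc-vs-4 (cf. p5p1/p2p1 above)
ethtool -K $NIC hw-tc-offload off rx-gro-hw off rx-vlan-offload off tx-vlan-offload off \
        tcp-segmentation-offload off tx-gso-partial off
ethtool -k $NIC | grep -E 'hw-tc-offload|rx-gro-hw|vlan-offload|tcp-segmentation|gso-partial'   # verify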
na-arc-1,2,3,4,5
Diagrams
...
- Where is the main docker config (yaml file)?
- Why does naasc-vs-4 have all the infiniband modules loaded? I don't see an IB card. naasc-vs-1 and naasc-dev-vs also have some IB modules loaded.
- Why is nfnetlink logging enabled on naasc-vs-4? You can see this with cat /proc/net/netfilter/nf_log and lsmod|grep -i nfnet
- Why does na-arc-5 still have net.ipv4.conf.all.accept_redirects = 1 even after a reboot while all the other na-arc nodes have this set to 0?
- Here are some diffs in sysctl on the na-arc nodes. I tried changing na-arc-4 and na-arc-5 to match the others but performance was the same. I then changed all the nodes to match na-arc-{1..3} (see the sketch after this list) and still saw no change in performance. I still don't understand how na-arc-{4..5} got different settings. I did find that there is another directory for sysctl settings in /usr/lib/sysctl.d, but that isn't why these are different.
- na-arc-1, na-arc-2, na-arc-3, natest-arc-1, natest-arc-2, natest-arc-3
net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 1
- na-arc-4, na-arc-5
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
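- A minimal sketch of one way to apply the na-arc-{1..3} values on na-arc-{4..5} (takes effect immediately; a drop-in under /etc/sysctl.d/ would be needed to persist across reboots):
sysctl -w net.bridge.bridge-nf-call-arptables=0
sysctl -w net.bridge.bridge-nf-call-ip6tables=0
# net.bridge.bridge-nf-call-iptables is already 1 on all nodes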
- I see sysctl differences between the natest-arc servers and the na-arc servers. Here is a diff of /etc/sysctl.d/99-nrao.conf on natest-arc-1 and na-arc-5
< #net.ipv4.tcp_tw_recycle = 1
---
> net.ipv4.tcp_tw_recycle = 1
22,39d21
< net.ipv4.conf.all.accept_redirects=0
< net.ipv4.conf.default.accept_redirects=0
< net.ipv4.conf.all.secure_redirects=0
< net.ipv4.conf.default.secure_redirects=0
<
< #net.ipv6.conf.all.disable_ipv6 = 1
< #net.ipv6.conf.default.disable_ipv6 = 1
<
< # Mellanox recommends the following
< net.ipv4.tcp_timestamps = 0
< net.core.netdev_max_backlog = 250000
<
< net.core.rmem_default = 16777216
< net.core.wmem_default = 16777216
< net.core.optmem_max = 16777216
< net.ipv4.tcp_mem = 16777216 16777216 16777216
< net.ipv4.tcp_low_latency = 1
- If I set net.ipv4.tcp_timestamps = 0 on na-arc-5, the wget download drops to nothing (--.-KB/s).
- If I set all the above sysctl options, except net.ipv4.tcp_timestamps, on all five na-arc nodes (see the sketch after this item), wget download performance doesn't change. It is still about 32KB/s. Also I still see ZeroWindow packets.
- Try rebooting VMs after making changes?
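- A sketch of pushing the extra 99-nrao.conf settings (everything except net.ipv4.tcp_timestamps, which killed the wget) to all five na-arc nodes; assumes root ssh:
for h in na-arc-1 na-arc-2 na-arc-3 na-arc-4 na-arc-5; do
  ssh root@$h sysctl -w \
    net.ipv4.conf.all.accept_redirects=0 net.ipv4.conf.default.accept_redirects=0 \
    net.ipv4.conf.all.secure_redirects=0 net.ipv4.conf.default.secure_redirects=0 \
    net.core.netdev_max_backlog=250000 net.core.rmem_default=16777216 \
    net.core.wmem_default=16777216 net.core.optmem_max=16777216 \
    net.ipv4.tcp_low_latency=1
done
# net.ipv4.tcp_mem takes three values and would need quoting if set over ssh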
- I see ZeroWindow packets sent from na-arc-5 to nangas13 while downloading a file from nangas13 using wget. This is na-arc-5 telling nangas13 to wait because its network buffer is full (a capture-filter sketch follows below).
- Is this because of qdisc pfifo_fast? No. krowe changed eth0 to *qdisc fq_codel* and still saw ZeroWindow packets.
- Now that I have moved the rh_download to na-arc-1 and put httpd on na-arc-5 I no longer see ZeroWindow packets on na-arc-5. But I am seeing them on na-arc-1 which is where the rh_downloader is now. Is this because the rh_downloader is being stalled talking to something else like httpd and therefore telling nangas13 to wait?
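- A capture-filter sketch for spotting the ZeroWindow packets directly (interface name assumed; run on whichever node currently hosts the downloader):
# TCP window is bytes 14-15 of the TCP header; exclude RSTs, which also carry a zero window
tcpdump -nn -i eth0 'host nangas13 and tcp[14:2] = 0 and tcp[tcpflags] & tcp-rst == 0'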
- Why does almaportal use ens3 while almascience uses eth0?
- What if we move the rh-downloader container to a different node? In fact walk it through all five nodes and test.
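- A sketch of walking the downloader across the nodes with a placement constraint; the service name rh_download is an assumption (check docker service ls), and this must run on a swarm manager:
for n in na-arc-1 na-arc-2 na-arc-3 na-arc-4 na-arc-5; do
  docker service update --constraint-add "node.hostname==$n" rh_download
  # ...run the wget test through the portal and record the throughput...
  docker service update --constraint-rm "node.hostname==$n" rh_download
done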
- Why do I see cv-6509 when tracerouting from na-arc-5 to nangas13 but not from natest-arc-1?
[root@na-arc-5 ~]# traceroute nangas13
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan97.cv.nrao.edu (10.2.97.1) 0.426 ms 0.465 ms 0.523 ms
2 cv-6509.cv.nrao.edu (10.2.254.5) 0.297 ms 0.277 ms 0.266 ms
3 nangas13.cv.nrao.edu (10.2.140.33) 0.197 ms 0.144 ms 0.109 ms
[root@natest-arc-1 ~]# traceroute nangas13
traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
1 cv-6509-vlan96.cv.nrao.edu (10.2.96.1) 0.459 ms 0.427 ms 0.402 ms
2 nangas13.cv.nrao.edu (10.2.140.33) 0.184 ms 0.336 ms 0.311 ms
- Derek wrote that 10.2.99.1 = CV-NEXUS and 10.2.96.1 = CV-6509
- Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
- virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
- When I run ethtool eth0 on na-arc-{1..5} and natest-arc-{1..2} as root, the result is just Link detected: yes instead of the full report with speed, while natest-arc-3 shows 100Mb/s.
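- A sketch for confirming the emulated NIC model on natest-arc-3 and what would need to change; this is the same rtl8139 -> virtio fix that was applied to na-arc-3 in the Done section:
virsh domiflist natest-arc-3                        # on its VM host; Model column shows rtl8139
virsh dumpxml natest-arc-3 | grep -A2 '<interface'
# virsh edit natest-arc-3, set <model type='virtio'/>, then shut down and start the guest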
- Why do iperf tests from natest-arc-1 and natest-arc-2 to natest-arc-3 get about half the expected performance (0.5Gb/s), especially when the reverse tests get the expected performance (0.9Gb/s)?
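- The kind of test pair behind those numbers, assuming iperf3 (so both directions can be measured from one side with -R):
iperf3 -s                            # on natest-arc-3
iperf3 -c natest-arc-3 -t 30         # on natest-arc-1: ~0.5 Gb/s
iperf3 -c natest-arc-3 -t 30 -R      # same client, reversed direction: ~0.9 Gb/s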
- Is putting the production swarm nodes (na-arc-*) on the 10Gb/s network a good idea? Sure it makes a fast connection to cvsan but it adds one more hop to the nangas servers (e.g. na-arc-1 -> cv-nexus9k -> cv-nexus -> nangas11)
- When I connect to the container acralmaprod001.azurecr.io/offline-production/rh-download:2022.06.01.2022jun I get errors like unknown user 1009. I get the same errors on the natest-arc-1 container.
- Does it matter that the na-arc nodes are on 10.2.97.x and their VM hosts are on 10.2.99.x, while the natest-arc nodes are on 10.2.96.x and their VM hosts (well, 2 out of 3) are also on 10.2.96.x? Is this why I see cv-6509.cv.nrao.edu when running traceroute from the na-arc nodes?
- When running wget --no-check-certificate http://na-arc-3.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits I see traffic going through veth14ce034 on na-arc-3 but I can't find a container associated with that veth.
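- A sketch for mapping a host veth to the container that owns its peer: inside a container, /sys/class/net/eth0/iflink is the ifindex of the host-side veth. If nothing matches, the peer may live in a namespace docker ps doesn't show (e.g. the swarm ingress sandbox):
VIDX=$(cat /sys/class/net/veth14ce034/ifindex)      # on na-arc-3
for c in $(docker ps -q); do
  if [ "$(docker exec "$c" cat /sys/class/net/eth0/iflink 2>/dev/null)" = "$VIDX" ]; then
    docker ps --filter "id=$c" --format '{{.ID}} {{.Names}}'
  fi
done
# only checks eth0; containers with several interfaces (overlay + ingress) would need each one checked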
- Why does the httpd container have eth0 (10.0.0.8)? This is the ingress network. I don't see any other container with an interface on 10.0.0.0/24.
To Do
Done
- Recreate na-arc-3 so it gets the same performance as other na-arc-* nodes which is apparently at least 10Gb/s. (pmurphy)
- 2022-08-11: cloned na-arc-2 and moved the clone to naasc-vs-3 (zbutcher)
- 2022-08-11: moved old na-arc-3 to na-arc-3-OLD (thalstea)
- 2022-08-11: Renamed the clone to na-arc-3. We connected it to the swarm successfully, but it had a low connection speed.
- 2022-08-11: Changed the model of na-arc-3's vnet5 interface on naasc-vs-3 from rtl8139 to virtio to match all the other na-arc-* nodes. Performance was still poor.
- 2022-08-11: Changed the MTU of na-arc-3 eth0 to 1500. This is different than all the other na-arc-* nodes, but it was either that or change p5p1.120 and br97 on naasc-vs-3 from 9000 to 1500, which may have impacted other VM guests on that host. Performance was now reasonable (7Gb/s). I was expecting about 9Gb/s but perhaps the 1500 MTU is affecting performance.
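- For reference, the mismatch this leaves in place (values from the note above):
ip link show dev eth0 | grep -o 'mtu [0-9]*'        # on na-arc-3: mtu 1500
ip link show dev br97 | grep -o 'mtu [0-9]*'        # on naasc-vs-3: mtu 9000
ip link show dev p5p1.120 | grep -o 'mtu [0-9]*'    # on naasc-vs-3: mtu 9000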
- 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
- Launch services on production swarm (sbooth)
- 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
- Test the production docker swarm with a test web interface. (lsharp)
- 2022-08-12: http://almaportal.cv.nrao.edu/
- 2022-08-12 krowe: ran tcpdump on all five na-arc-{1..5} nodes (tcpdump dst almaportal) and then downloaded a data file with wget --no-check-certificate https://almaportal.cv.nrao.edu/dataPortal/2013.1.00226.S_uid___A001_X122_X1f1_001_of_001.tar. With each execution of the wget, I could see the next na-arc host report the traffic, because the web proxy on almaportal selects the next na-arc node via round-robin. All five nodes were providing about 6KB/s speeds to cvpost-master.
- 2022-08-12 krowe: I did iperf tests from host to host in the entire chain (nangas14 -> na-arc-{1..5} -> almaportal -> cvpost-master) and at each step the performance was at least 900Mb/s, yet downloading with wget was about 0.06Mb/s.
- Ask other ARC if they use MTU 9000 on 10Gb. (krowe)
- JAO uses MTU of 1500
- ESO uses two VM hosts running VMware with 10Gb/s and MTU of 1500
- 2022-08-17 krowe: Changed eth0 on na-arc-5 from qdisc pfifo_fast to qdisc fq_codel to match all the other na-arc and natest-arc nodes. This seemed to have no effect on performance.
- tc qdisc replace dev eth0 root fq_codel
- 2022-08-25 krowe: Tracy changed the following sysctl options on na-arc-5 to match the other VM hosts. Sadly it seems to have had no effect on wget performance. na-arc-1, na-arc-2, and na-arc-4 are 32KB/s while na-arc-3 and na-arc-5 are 45MB/s.
- net.ipv4.conf.all.accept_redirects = 0
- net.ipv4.conf.all.forwarding = 1
- 2022-09-01: Tracy rebooted naasc-vs-5 which hosts na-arc-5 just in case this was necessary for the net.ipv4.conf.all.forwarding sysctl change to take effect. Sadly, no change in performance.
...