Questions

  • ifconfig shows dropped RX packets on all naasc-vs-* nodes.  Is that count still increasing with time?  What is causing this?
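    • A sketch for checking whether the drop counter is still climbing (the interface name eno1 is an assumption; use whichever interface shows the drops):
      [root@naasc-vs-2 ~]# ip -s link show eno1 | grep -A1 'RX:'                 # note the "dropped" column
      [root@naasc-vs-2 ~]# sleep 600; ip -s link show eno1 | grep -A1 'RX:'      # re-check later; a larger value means drops are still accruing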
  • It looks like device eno1 on naasc-vs-2 is configured via DHCP instead of STATIC.  Is that correct?
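    • A quick way to confirm the boot protocol (assumes initscripts-style ifcfg files; the nmcli connection name may differ):
      [root@naasc-vs-2 ~]# grep BOOTPROTO /etc/sysconfig/network-scripts/ifcfg-eno1
      [root@naasc-vs-2 ~]# nmcli -f ipv4.method connection show eno1             # "auto" means DHCP, "manual" means static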
  • Why does naasc-vs-2 have APIPA (169.254.0.0) networks configured?  Aren't these usually created only if there are misconfigured networks?  (See the check sketched after the routing table below.)
    • [root@naasc-vs-2 ~]# netstat -nr
      Kernel IP routing table
      Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
      0.0.0.0         10.2.99.1       0.0.0.0         UG        0 0          0 eno1
      10.2.99.0       0.0.0.0         255.255.255.0   U         0 0          0 eno1
      10.2.120.0      0.0.0.0         255.255.255.0   U         0 0          0 ens1f0np0.120
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0.120
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br97
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br101
      192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
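    • These 169.254.0.0/16 routes are usually added by the zeroconf route feature of the RHEL network scripts rather than by an actual APIPA address.  A sketch for checking and disabling that (assumes initscripts-style networking):
      [root@naasc-vs-2 ~]# grep NOZEROCONF /etc/sysconfig/network
      [root@naasc-vs-2 ~]# echo 'NOZEROCONF=yes' >> /etc/sysconfig/network       # then restart networking; the 169.254 routes should disappear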
  • Why can't I download via na-arc-6?  I don't think it is properly set up yet.  (Some checks are sketched after the wget output below.)
    • wget --no-check-certificate http://na-arc-6.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
      --2022-09-15 10:22:32--  http://na-arc-6.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1284_Xc9b.spt2349-56_sci.spw19.cube.I.pbcor.fits
      Resolving na-arc-6.cv.nrao.edu (na-arc-6.cv.nrao.edu)... 10.2.97.76
      Connecting to na-arc-6.cv.nrao.edu (na-arc-6.cv.nrao.edu)|10.2.97.76|:8088... failed: Connection timed out.
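    • A sketch for narrowing down the timeout (whether firewalld is in use on na-arc-6 is an assumption):
      [root@na-arc-6 ~]# ss -tlnp | grep 8088              # is anything listening on port 8088 on this node?
      [root@na-arc-6 ~]# firewall-cmd --list-ports         # if firewalld is running, is 8088 allowed?
      [root@na-arc-1 ~]# docker node ls                    # on a manager: is na-arc-6 Ready/Active in the swarm?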
  • Why, with rx-gro-hw=off on naasc-vs-4, does na-arc-6 see so many retransmissions and small Congestion Window (Cwnd)?
    • [root@na-arc-6 ~]# iperf3 -B 10.0.0.16 -c 10.0.0.21
      Connecting to host 10.0.0.21, port 5201
      [  4] local 10.0.0.16 port 38534 connected to 10.0.0.21 port 5201
      [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [  4]   0.00-1.00   sec   302 MBytes  2.54 Gbits/sec  523    207 KBytes
      [  4]   1.00-2.00   sec   322 MBytes  2.70 Gbits/sec  596    186 KBytes
      [  4]   2.00-3.00   sec   312 MBytes  2.62 Gbits/sec  687    245 KBytes
      [  4]   3.00-4.00   sec   335 MBytes  2.81 Gbits/sec  638    278 KBytes
      [  4]   4.00-5.00   sec   309 MBytes  2.60 Gbits/sec  780    146 KBytes

    • [root@na-arc-3 ~]# iperf3 -B 10.0.0.19 -c 10.0.0.21
      Connecting to host 10.0.0.21, port 5201
      [  4] local 10.0.0.19 port 52986 connected to 10.0.0.21 port 5201
      [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [  4]   0.00-1.00   sec   309 MBytes  2.59 Gbits/sec  232    638 KBytes
      [  4]   1.00-2.00   sec   358 MBytes  3.00 Gbits/sec    0    967 KBytes
      [  4]   2.00-3.00   sec   351 MBytes  2.95 Gbits/sec    0   1.18 MBytes 
      [  4]   3.00-4.00   sec   339 MBytes  2.84 Gbits/sec   74   1.36 MBytes
      [  4]   4.00-5.00   sec   359 MBytes  3.01 Gbits/sec    0   1.54 MBytes
    • Actually, the retransmissions seem to vary quite a lot from one run to another.  That is the more important question.  The throughput also varies, from 1Gb/s to 4Gb/s, and of course the more retransmissions, the less throughput.  Granted, this is a second-order effect and, given that the nangas hosts have 1Gb/s links, it probably won't be seen.  But if we ever put 10Gb/s cards in the nangas nodes we will see this and be sad.
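    • A sketch for quantifying the run-to-run variance while confirming the offload state (the VM host interface name is an assumption borrowed from the naasc-vs-2 routing table above):
      [root@naasc-vs-4 ~]# ethtool -k ens1f0np0 | grep rx-gro-hw                 # confirm hardware GRO really is off during the runs
      [root@na-arc-6 ~]# for i in $(seq 1 10); do iperf3 -B 10.0.0.16 -c 10.0.0.21 -t 5 | grep sender; done   # one summary line (Bandwidth and Retr) per run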
  • Why does naasc-vs-3 have a br120 in state UNKNOWN?  None of the other naasc-vs nodes have a br120.
  • Why does naasc-vs-4 have all the infiniband modules loaded?  I don't see an IB card.  naasc-vs-1 and naasc-dev-vs also have some IB modules loaded but naasc-vs-3 and naasc-vs-5 don't have any IB modules loaded.
    • Tracy will look into this
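    • A sketch for confirming there is no IB hardware and finding what loads the modules (the module name patterns are the usual IB/Mellanox ones):
      [root@naasc-vs-4 ~]# lspci | grep -i -E 'mellanox|infiniband'              # any IB-capable hardware at all?
      [root@naasc-vs-4 ~]# lsmod | grep -E '^(ib_|mlx|rdma)'                     # which IB/RDMA modules are loaded
      [root@naasc-vs-4 ~]# grep -rlE 'ib_|rdma' /etc/modules-load.d/ /etc/rdma/ 2>/dev/null   # anything (e.g. the rdma service) explicitly loading them?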
  • Why is nfnetlink logging enabled on naasc-vs-4?  You can see this with cat /proc/net/netfilter/nf_log and lsmod|grep -i nfnet
    • nfnetlink is a module for packet mangling.  Could this interfere with the docker swarm networking?
  • Why are the eth1 interfaces in all the containers, and docker_gwbridge, on na-arc-1 in the 172.18.x.x range while all the other na-arcs are in the 172.19.x.x range?  Does it matter?
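    • A sketch for comparing the gwbridge subnets across nodes (docker_gwbridge is local to each node, so the subnets don't have to match; it should only matter if one overlaps a real routed network):
      [root@na-arc-1 ~]# docker network inspect docker_gwbridge -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}'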
  • Here are some diffs in sysctl settings on the na-arc nodes.  I tried changing na-arc-4 and na-arc-5 to match the others but performance was the same.  I then changed all the nodes to match na-arc-{1..3} and still saw no change in performance.  I still don't understand how na-arc-{4..5} got different settings.  I did find that there is another directory for sysctl settings, /usr/lib/sysctl.d, but that isn't why these are different.  (A sketch for checking and matching these settings follows the list.)
    • na-arc-1, na-arc-2, na-arc-3, natest-arc-1, natest-arc-2, natest-arc-3
      • net.bridge.bridge-nf-call-arptables = 0

        net.bridge.bridge-nf-call-ip6tables = 0

        net.bridge.bridge-nf-call-iptables = 1

    • na-arc-4, na-arc-5
      • net.bridge.bridge-nf-call-arptables = 1

        net.bridge.bridge-nf-call-ip6tables = 1

        net.bridge.bridge-nf-call-iptables = 1
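    • A sketch for finding where a given setting comes from and forcing all nodes to the same value (values shown match na-arc-{1..3}):
      [root@na-arc-4 ~]# grep -r bridge-nf-call /etc/sysctl.conf /etc/sysctl.d /usr/lib/sysctl.d 2>/dev/null
      [root@na-arc-4 ~]# sysctl -w net.bridge.bridge-nf-call-arptables=0 net.bridge.bridge-nf-call-ip6tables=0   # runtime change only
      [root@na-arc-4 ~]# sysctl --system                                         # re-apply all persistent files in precedence order and show which file sets what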

  • I see sysctl differences between the natest-arc servers and the na-arc servers.  Here is a diff of /etc/sysctl.d/99-nrao.conf on natest-arc-1 and na-arc-5
    • < #net.ipv4.tcp_tw_recycle = 1
      ---
      > net.ipv4.tcp_tw_recycle = 1
      22,39d21
      < net.ipv4.conf.all.accept_redirects=0
      < net.ipv4.conf.default.accept_redirects=0
      < net.ipv4.conf.all.secure_redirects=0
      < net.ipv4.conf.default.secure_redirects=0
      <
      < #net.ipv6.conf.all.disable_ipv6 = 1
      < #net.ipv6.conf.default.disable_ipv6 = 1
      <
      < # Mellanox recommends the following
      < net.ipv4.tcp_timestamps = 0
      < net.core.netdev_max_backlog = 250000
      <
      < net.core.rmem_default = 16777216
      < net.core.wmem_default = 16777216
      < net.core.optmem_max = 16777216
      < net.ipv4.tcp_mem = 16777216 16777216 16777216
      < net.ipv4.tcp_low_latency = 1
    • If I set net.ipv4.tcp_timestamps = 0 on na-arc-5, the wget download drops to nothing (--.-KB/s).

    • If I set all the above sysctl options, except net.ipv4.tcp_timestamps, on all five na-arc nodes, wget download performance doesn't change.  It is still about 32KB/s.  Also, I still see ZeroWindow packets.
    • Try rebooting VMs after making changes?
  • I see ZeroWindow packets sent from na-arc-5 to nangas13 while downloading a file from nangas13 using wget.  This is na-arc-5 telling nangas13 to wait because its network buffer is full.  (A tcpdump filter for spotting these is sketched below.)
    • Is this because of qdisc pfifo_fast?  No.  krowe changed eth0 to *qdisc fq_codel* and still saw ZeroWindow packets.
    • Now that I have moved the rh_download to na-arc-1 and put httpd on na-arc-5 I no longer see ZeroWindow packets on na-arc-5.  But I am seeing them on na-arc-1 which is where the rh_downloader is now.  Is this because the rh_downloader is being stalled talking to something else like httpd and therefore telling nangas13 to wait?
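    • A sketch for watching the zero-window advertisements directly (tcp[14:2] is the TCP window field; SYN/FIN/RST are excluded because a zero window is normal there):
      [root@na-arc-1 ~]# tcpdump -ni eth0 'host nangas13 and tcp[14:2] == 0 and tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) == 0'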
  • Why does almaportal use ens3 while almascience uses eth0?
  • What if we move the rh-downloader container to a different node?  In fact walk it through all five nodes and test.
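    • A sketch for pinning the service to one node at a time (run on a manager; the service name rh_download is taken from the notes above and may differ):
      [root@na-arc-1 ~]# docker service update --constraint-add node.hostname==na-arc-2 rh_download
      [root@na-arc-1 ~]# docker service ps rh_download                           # confirm where it landed, then rerun the wget test
      [root@na-arc-1 ~]# docker service update --constraint-rm node.hostname==na-arc-2 --constraint-add node.hostname==na-arc-3 rh_download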
  • Why do I see cv-6509 when tracerouting from na-arc-5 to nangas13 but not from natest-arc-1?
    • [root@na-arc-5 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan97.cv.nrao.edu (10.2.97.1)  0.426 ms  0.465 ms  0.523 ms
       2  cv-6509.cv.nrao.edu (10.2.254.5)  0.297 ms  0.277 ms  0.266 ms
       3  nangas13.cv.nrao.edu (10.2.140.33)  0.197 ms  0.144 ms  0.109 ms
       
    • [root@natest-arc-1 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan96.cv.nrao.edu (10.2.96.1)  0.459 ms  0.427 ms  0.402 ms
       2  nangas13.cv.nrao.edu (10.2.140.33)  0.184 ms  0.336 ms  0.311 ms
    • Derek wrote that 10.2.99.1 = CV-NEXUS and 10.2.96.1 = CV-6509
  • Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
    • virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
    • When I run ethtool eth0 on na-arc-{1..5} and natest-arc-{1..2} as root, the result is just Link detected: yes instead of the full report with speed, while natest-arc-3 shows 100Mb/s.
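    • A sketch for switching the NIC model to virtio (run on whichever VM host carries natest-arc-3; virt-xml comes from the virt-install package and the guest needs a power cycle afterwards):
      [root@vm-host ~]# virsh domiflist natest-arc-3                             # note the existing MAC address
      [root@vm-host ~]# virt-xml natest-arc-3 --edit --network model=virtio      # or change the <model type='...'/> line via virsh edit
      [root@vm-host ~]# virsh shutdown natest-arc-3 && virsh start natest-arc-3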
  • Why do iperf tests from natest-arc-1 and natest-arc-2 to natest-arc-3 get about half the expected performance (0.5Gb/s), especially when the reverse tests get the expected performance (0.9Gb/s)?
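    • The asymmetry can be confirmed from one side with iperf3's reverse mode, which keeps the client setup identical in both directions:
      [root@natest-arc-1 ~]# iperf3 -c natest-arc-3                              # data flows natest-arc-1 -> natest-arc-3
      [root@natest-arc-1 ~]# iperf3 -c natest-arc-3 -R                           # same connection, data flows natest-arc-3 -> natest-arc-1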
  • Is putting the production swarm nodes (na-arc-*) on the 10Gb/s network a good idea?  Sure it makes a fast connection to cvsan but it adds one more hop to the nangas servers (e.g. na-arc-1 -> cv-nexus9k -> cv-nexus -> nangas11)
  • When I connect to the container acralmaprod001.azurecr.io/offline-production/rh-download:2022.06.01.2022jun I get errors like unknown user 1009.  I get the same errors on the natest-arc-1 container.
  • Does it matter that the na-arc nodes are on 10.2.97.x and their VM hosts are on 10.2.99.x, while the natest-arc nodes are on 10.2.96.x and their VM hosts (well, 2 out of 3) are also on 10.2.96.x?  Is this why I see cv-6509.cv.nrao.edu when running traceroute from the na-arc nodes?
  • When running wget --no-check-certificate http://na-arc-3.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits I see traffic going through veth14ce034 on na-arc-3 but I can't find a container associated with that veth.
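    • A sketch for mapping a veth back to the namespace that owns its peer (inside a container, eth0's iflink equals the host-side veth's ifindex):
      [root@na-arc-3 ~]# cat /sys/class/net/veth14ce034/ifindex
      [root@na-arc-3 ~]# for c in $(docker ps -q); do echo -n "$c "; docker exec $c cat /sys/class/net/eth0/iflink 2>/dev/null; done   # the container whose iflink matches owns the veth
      If nothing matches, the peer may live in one of docker's hidden swarm namespaces under /var/run/docker/netns (e.g. ingress_sbox or an lb_ namespace).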
  • Why does the httpd container have eth0 (10.0.0.8)?  This is the ingress network.  I don't see any other container with an interface on 10.0.0.0/24.
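    • In docker swarm, any container for a service with a published port gets an interface on the ingress overlay in addition to its other networks, which may be all this is.  A sketch for listing everything attached to ingress (run on a manager):
      [root@na-arc-1 ~]# docker network inspect ingress --verbose | grep -E '"Name"|"IPv4Address"'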
  • Do we want to use jumbo frames?  If so, some recommend using mtu=8900 and there are a lot of places it needs to be set.
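    • A sketch of the places the MTU would have to change if jumbo frames are adopted (mtu=8900 per the note above; the switch ports in between need it as well, and my-overlay is a placeholder name):
      [root@naasc-vs-4 ~]# ip link set dev ens1f0np0 mtu 8900                    # physical NICs and bridges on the VM hosts
      [root@na-arc-1 ~]# docker network create -d overlay --opt com.docker.network.driver.mtu=8900 my-overlay   # overlay networks must be (re)created with the larger MTU; the ingress network would need the same treatment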
