Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Using Apache Benchmarks every hour to load http://almascience.nrao.edu/ on rastan.aoc.nrao.edu
    • ssh.aoc.nrao.edu:/users/krowe/alma_archive/benchmarks/almascience.nrao.edu/data (times are in milliseconds)
      • Mode load time is 98ms
    • ssh.aoc.nrao.edu:/users/krowe/alma_archive/benchmarks/almaportal.cv.nrao.edu/data (times are in milliseconds)
      • Mode load time is 123ms
  • Using wget to get 2013.1.00226.S-small (about 700MB) every hour on cvpost-master.aoc.nrao.edu
    • ssh.cv.nrao.edu:/lustre/cv/users/krowe/tickets/scg-207/benchmarks/almascience.nrao.edu/2013.1.00226.S-small
      • 2022-08-16: average time to download is about 42 seconds which is about 16MB/s
  • iperf tests using iperf3 -s -B <local IP> and  iperf3 -B <local IP> -c <dest IP>
  • roduction docker swarm iperf tests measured in Gb/s.

    2022-08-11: After re-creating na-arc-3 (a clone of na-arc-2).  Also set the MTU to 1500.  The VM Host interfaces (p5p1.97 and br97 on naasc-vs-3) were still 1500 so we changed the interface on the VM guest (na-arc-3) to 1500 instead of changing the interfaces on the VM host to 9000 because there was concern that may interfere with other running VM guests on that host.


    na-arc-1

    (naasc-vs-4)

    na-arc-2

    (naasc-vs-4)

    na-arc-3

    (naasc-vs-3)

    na-arc-4

    (naasc-vs-4)

    na-arc-5

    (naasc-vs-5)

    na-arc-1
    1992110

    na-arc-2

    22
    92010
    na-arc-377
    77
    na-arc-421219
    10
    na-arc-5109810


    Test docker swarm iperf tests measured in Gb/s


    natest-arc-1

    (naasc-dev-vs)

    natest-arc-2

    (naasc-vs-1)

    natest-arc-3

    (naasc-vs-5)

    natest-arc-1
    0.90.9
    natest-arc-20.9
    0.9
    natest-arc-30.50.5

    The test docker swarm (natest-arc-*) are performing as expected.  The VM hosts have 1Gb/s links so getting 80% to 90% bandwidth is about as good as one can expect.

  • 2022-08-15 krowe: I had tcpdump running on each na-arc-{1..5} nodes watching for traffic from almaportal tcpdump dst almaportal.  Then I would run the following wget on cvpost-master.  The first execution would be shown by tcpdump on na-arc-1, the second execution on na-arc-2 and so forth.  This is because of the round-robin nature of the web proxy on almaportal and was a nice confirmation of that process.  However, each execution also downloaded at about 32KB/s (0.3Mb/s) after a minute or so of downloading, which is about 300 times slower than expected.  Using the test swarm (natest-arc-{1..3}) I can download the same file at about 10MB/s (100Mb/s). Also, I did not see any difference in performance across the five nodes which was also surprising given that one of the nodes runs the downloader container and the other four need to forward traffic to the one download container.
    • cvpost-master wget --no-check-certificate https://almaportal.cv.nrao.edu/dataPortal/2013.1.00226.S_uid___A001_X122_X1f1_001_of_001.tar
  • 2022-08-15 krowe: I ran iperf tests from end to end and don't see any unexpected performance.
    • [nangas11] -- ~900Mb/s --> [rh-download container on na-arc-5] -- ~8,000Mb/s --> [almaportal] -- ~900Mb/s --> [cvpost-master]
    • [nangas11] -- ~900Mb/s --> [na-arc-5] -- ~8,000Mb/s --> [almaportal] -- ~900Mb/s --> [cvpost-master]
  • 2022-08-17 krowe: doing scp tests of a 784MB file
    • [root@rh-download-na-production-2022jun tmp]# scp krowe@nangas13:/NGAS1/volume1/afa/2022-08-17/1/member.uid___A001_X158f_X90c.IRAS_09022-3615_sci.spw29.cube.I.pb.fits.gz /tmp (93MB/s)
    • [root@rh-download-na-production-2022jun tmp]# scp member.uid___A001_X158f_X90c.IRAS_09022-3615_sci.spw29.cube.I.pb.fits.gz krowe@almaportal:/tmp (70MB/s)
    • almaportal krowe >scp /tmp/member.uid___A001_X158f_X90c.IRAS_09022-3615_sci.spw29.cube.I.pb.fits.gz krowe@cvpost-master:/tmp (110MB/s)
  • tcpdump bandwidth tests
    • When I download a file from na-arc-5 like so `wget --no-check-certificate http://na-arc-5.cv.nrao.edu:8088/dataPortal/member.uid___A001_X122_X1f1.LKCA_15_13CO_cube.image.fits` which lives on nangas13, to cvpost-master, the download runs at about 32KB/s.
      • On nangas13 I see about that much traffic (32KB/s to 50KB/s) almost all of it going to na-arc-5.
      • on na-arc-5 (rh-download container) I see between about 200KB/s and 300KB/s of traffic.
      • on na-arc-2 (httpd container) I see between about 100KB/s and 150KB/s of traffic.  It seems like it is about half the traffic na-arc-5 sees.
  • 2022-08-19 krowe: For some reason, all the swarm services on na-arc-5 shutdown about 11am Central Aug. 18, 2022.  Now my wget tests are getting about 100MB/s and I tested this five times to walk through all four nodes.
    • na-arc-5 was running
      • acralmaprod001.azurecr.io/offline-production/asax-elasticsearch:2022.02.01.2022feb (now on na-arc-3)
      • acralmaprod001.azurecr.io/offline-production/asax-explorer:2022.04.01.2022apr (now on na-arc-2)
      • acralmaprod001.azurecr.io/offline-production/asax-ingestor:2022.06.01.2022jun (now on na-arc-3)
      • acralmaprod001.azurecr.io/offline-production/rh-download:2022.06.01.2022jun (now on na-arc-2)
      • acralmaprod001.azurecr.io/offline-production/rh-logging:2022.06.01.2022jun (now on na-arc-4)
    • Looks like na-arc-5 lost its heartbeat with the swarm.  This is around the time I learned that setting net.ipv4.tcp_timestamps = 0 makes wget performance drop to 0.0KB/s.  So it may have been me that caused na-arc-5 to loose its heartbeat with the swarm although setting net.ipv4.tcp_timestamps=0 on na-arc-5 again didn't cause docker to move the service off it this time.
      • Aug 18 13:34:16 na-arc-5 dockerd: time="2022-08-18T13:34:14.131474019-04:00" level=warning msg="memberlist: Refuting a suspect message (from: c30261b68826)"

        Aug 18 13:34:16 na-arc-5 dockerd: time="2022-08-18T13:34:15.929428007-04:00" level=info msg="memberlist: Suspect 886f1454e2b4 has failed, no acks received"

        Aug 18 13:34:16 na-arc-5 dockerd: time="2022-08-18T13:34:16.061224152-04:00" level=error msg="heartbeat to manager {xojanp58fu1ysx3yk0rpvjsft 10.2.97.71:2377} failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session).heartbeat" module=node/agent node.id=1l5cnfmt16f6hyg5it0rq39rr session.id=fl2thh44rmjfxgu7xidnakjg8 sessionID=fl2thh44rmjfxgu7xidnakjg8

    • I moved the rh-download service back to na-arc-5 with docker service update --force production_requesthandler_download and wget performance is back to about 32KB/s.
    • I moved rh-download from na-arc-5 back to na-arc-2 by draining na-arc-5 docker node update --availability drain na-arc-5 and wget performance was back to about 100MB/s.  I ran it four times to make sure.
    • Then I moved rh-download from na-arc-2 to na-arc-1 by forcing a rebalance again with docker service update --force production_requesthandler_download.  This is because na-arc-5 was still drained.  wget performance was back to about 100MB/s.  I ran it four times to make sure.  I wanted to make sure the performance was good because rh-download wasn't on na-arc-5 and not because it was on na-arc-2.  I think I have shown that.  So, the question is why is performance so poor when rh-download is on na-arc-5?
    • I moved production_httpd from na-arc-2 to na-arc-5 and wget performance is vaiable.
      • wget --no-check-certificate http://na-arc-1.cv.nrao.edu:8088/dataPortal/member.uid___A001_X122_X1f1.LKCA_15_13CO_cube.image.fits is about 32KB/s
      • wget --no-check-certificate http://na-arc-2.cv.nrao.edu:8088/dataPortal/member.uid___A001_X122_X1f1.LKCA_15_13CO_cube.image.fits is about 32KB/s
      • wget --no-check-certificate http://na-arc-3.cv.nrao.edu:8088/dataPortal/member.uid___A001_X122_X1f1.LKCA_15_13CO_cube.image.fits is about 100MB/s
      • wget --no-check-certificate http://na-arc-4.cv.nrao.edu:8088/dataPortal/member.uid___A001_X122_X1f1.LKCA_15_13CO_cube.image.fits is about 32KB/s
      • wget --no-check-certificate http://na-arc-5.cv.nrao.edu:8088/dataPortal/member.uid___A001_X122_X1f1.LKCA_15_13CO_cube.image.fits is about 100MB/s
      • My first thought was this has something to do with naasc-vs-4 since na-arc-[1,2,4] are on that VM host.  But iperf tests still show about 900MB/s between all hosts and cvpost-master.
        • There are some differences in the VM hosts sysctl settings
          • naasc-vs-3
            • net.ipv4.conf.all.accept_redirects = 0

            • net.ipv4.conf.all.forwarding = 1
          • naasc-vs-4
            • net.ipv4.conf.all.accept_redirects = 0
            • net.ipv4.conf.all.forwarding = 1
          • naasc-vs-5
            • net.ipv4.conf.all.accept_redirects = 1
            • net.ipv4.conf.all.forwarding = 0
  • 2022-08-31 krowe: I have learned how to do iperf tests ingress network (docker mesh).
    • Login to an na-arc node and run the following to get a shell in the ingress_sbox namespace
      • nsenter --net=/var/run/docker/netns/ingress_sbox
    • You can now run ip -c addr to see the internal IPs of the ingress network
    • Below is a table using default iperf3 test.  The values are rounded for simplicity.  Bold indicates the node is receiving.

...

  • Where is the main docker config (yaml file)?
  • Why does the httpd container have eth0(10.0.0.8).  This is the ingress network.  I don't see any other conrainter with an interface on 10.0.0.0/24.
  • I see sysctl differences between the natest-arc servers and the na-arc servers.  Here is a diff of /etc/sysctl.d/99-nrao.conf on natest-arc-1 and na-arc-5
    • < #net.ipv4.tcp_tw_recycle = 1
      ---
      > net.ipv4.tcp_tw_recycle = 1
      22,39d21
      < net.ipv4.conf.all.accept_redirects=0
      < net.ipv4.conf.default.accept_redirects=0
      < net.ipv4.conf.all.secure_redirects=0
      < net.ipv4.conf.default.secure_redirects=0
      <
      < #net.ipv6.conf.all.disable_ipv6 = 1
      < #net.ipv6.conf.default.disable_ipv6 = 1
      <
      < # Mellanox recommends the following
      < net.ipv4.tcp_timestamps = 0
      < net.core.netdev_max_backlog = 250000
      <
      < net.core.rmem_default = 16777216
      < net.core.wmem_default = 16777216
      < net.core.optmem_max = 16777216
      < net.ipv4.tcp_mem = 16777216 16777216 16777216
      < net.ipv4.tcp_low_latency = 1
    • If I set net.ipv4.tcp_timestamps = 0 on na-arc-5, the wget download drops to nothing (--.-KB/s).

    • If I set all the above sysctl options, execpt net.ipv4.tcp_timestamps, on all five na-arc nodes, wget download performance doesn't change.  It is still about 32KB/s.  Also I still zeeo ZeroWindow packets.
    • Try rebooting VMs after making changes?
  • I see ZeroWindow packets sent from na-arc-5 to nangas13 while downloading a file from nangas13 using wget.  This is na-arc-5 telling nangas13 to wiat because its network buffer is full.
    • Is this because of qdisc pfifo_fast?  No.  krowe changed eth0 to *qdisc fq_codel* and still seeing ZeroWait packets.
    • Now that I have moved the rh_download to na-arc-1 and put httpd on na-arc-5 I no longer see ZeroWindow packets on na-arc-5.  But I am seeing them on na-arc-1 which is where the rh_downloader is now.  Is this because the rh_downloader is being stalled talking to something else like httpd and therefore telling nangas13 to wait?
  • Why does almaportal use ens3 while almascience uses eth0?
  • What if we move the rh-downloader container to a different node?  In fact walk it through all five nodes and test.
  • Why do I see cv-6509 when tracerouting from na-arc-5 to nangas13 but not on natest-arc-1
    • [root@na-arc-5 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan97.cv.nrao.edu (10.2.97.1)  0.426 ms  0.465 ms  0.523 ms
       2  cv-6509.cv.nrao.edu (10.2.254.5)  0.297 ms  0.277 ms  0.266 ms
       3  nangas13.cv.nrao.edu (10.2.140.33)  0.197 ms  0.144 ms  0.109 ms
       
    • [root@natest-arc-1 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan96.cv.nrao.edu (10.2.96.1)  0.459 ms  0.427 ms  0.402 ms
       2  nangas13.cv.nrao.edu (10.2.140.33)  0.184 ms  0.336 ms  0.311 ms
    • Derek wrote that 10.2.99.1 = CV-NEXUS and 10.2.96.1 = CV-6509
  • Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
    • virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
    • When I run ethtool eth0 on nar-arc-{1..5} natest-arc-{1..2} as root, the result is just Link detected: yes instead of the full report with speed while na-arc-3 shows 100Mb/s.
  • Why do iperf tests from natest-arc-1 and natest-arc-2 to natest-arc-3 get about half the performance (0.5Gb/s) expected especially when the reverse tests get expected performance (0.9Gb/s).
  • Is putting the production swarm nodes (na-arc-*) on the 10Gb/s network a good idea?  Sure it makes a fast connection to cvsan but it adds one more hop to the nangas servers (e.g. na-arc-1 -> cv-nexus9k -> cv-nexus -> nangas11)
  • When I connect to the container acralmaprod001.azurecr.io/offline-production/rh-download:2022.06.01.2022jun I get errors like unknown user 1009  I get the same errors on the natest-arc-1 container.
  • Does it matter that the na-arc nodes are on 10.2.97.x, their VM host is on 10.2.99.x while the natest-arc nodes are on 10.2.96.x and their VM hosts (well 2 out of 3) are also on 10.2.96.x?  Is this why I see cv-509.cv.nrao.edu when running traceroute from the na-arc nodes?
  • When running wget --no-check-certificate http://na-arc-3.cv.nrao.edu:8088/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits I see traffic going through veth14ce034 on na-arc-3 but I can't find a container associated with that veth.

...