
On Jul. 25, 2022, Jeff Kern asked K. Scott Rowe to head a tiger team to investigate the various issues that have affected the ALMA Archive hosted in CV over the past few weeks to months.  The team was initially just K. Scott.


Documented Issues


Timeline of events

  • 2020-03-19: ALMA suspends science observing and stows the array because of COVID-19.
  • 2020-06-24: Archive webapps (aq, asaz, rh, etc, but not SP) moved to new Docker Swarm (na-arc-*) system.  See more.
  • 2021-03-17: ALMA restarts limited science observations, resuming Cycle 7.  See more.
  • 2021-10-01: ALMA starts Cycle 8 observations.  See more.
  • 2022-02-03: Science Portal (SP) upgraded Plone, Python, RHEL and moved into Docker Swarm.  All other webapps had already been in Docker Swarm.
  • 2022-04-18: First documented report of performance issues.  Webapps moved to pre-production Docker Swarm (natest-arc-*).  See more
  • 2022-05-09: Moved the Science Portal (SP) from Docker Swarm to an rsync copy on http://almaportal.cv.nrao.edu/ because of performance issues.
  • 2022-05-31: Moved the Science Portal (SP) from the rsync copy back to Docker Swarm.
  • 2022-06-30: Tracy changed the eth0 MTU on the production docker swarm nodes (na-arc-*) from the default 1500 to 9000.  The test swarm is still 1500.
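A quick way to confirm the current MTU on both swarms (a minimal sketch; assumes ssh access to the nodes and that the production nodes still name their primary interface eth0):

    # Report the MTU of the primary interface on each swarm node.
    # natest-arc-3 names its interface ens3 rather than eth0 (see Questions below).
    for h in na-arc-{1..5} natest-arc-{1..2}; do
        echo -n "$h eth0 mtu: "; ssh "$h" cat /sys/class/net/eth0/mtu
    done
    echo -n "natest-arc-3 ens3 mtu: "; ssh natest-arc-3 cat /sys/class/net/ens3/mtu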

Benchmarks

  • Using ApacheBench (ab) every hour to load http://almascience.nrao.edu/ from rastan.aoc.nrao.edu
    • ssh.aoc.nrao.edu:/users/krowe/alma_archive/benchmarks/almascience.nrao.edu/data (times are in milliseconds)
    • ssh.aoc.nrao.edu:/users/krowe/alma_archive/benchmarks/almaportal.cv.nrao.edu/data (times are in milliseconds)
  • Using download script to get 2013.1.00226.S-small (no ASDM tarballs) every hour on cvpost-master.aoc.nrao.edu
    • ssh.cv.nrao.edu:/lustre/cv/users/krowe/tickets/scg-207/benchmarks/2013.1.00226.S-small
  • iperf tests using iperf3 -s -B <local IP> and  iperf3 -B <local IP> -c <dest IP>
  • 2022-08-15 krowe: I had tcpdump running on each of the na-arc-{1..5} nodes watching for traffic from almaportal (tcpdump dst almaportal), and then ran the following wget on cvpost-master.  The first execution showed up in tcpdump on na-arc-1, the second on na-arc-2, and so forth.  This is the round-robin behavior of the web proxy on almaportal and was a nice confirmation of that process.  However, each execution also downloaded at only about 32KB/s (0.3Mb/s) after a minute or so, which is about 300 times slower than expected.  Using the test swarm (natest-arc-{1..3}) I can download the same file at about 10MB/s (100Mb/s).  Also, I did not see any difference in performance across the five nodes, which was surprising given that one of the nodes runs the downloader container and the other four need to forward traffic to it.  (The commands are collected in the sketch after this list.)
    • On cvpost-master: wget --no-check-certificate https://almaportal.cv.nrao.edu/dataPortal/2013.1.00226.S_uid___A001_X122_X1f1_001_of_001.tar
  • 2022-08-15 krowe: I ran iperf tests from end to end and don't see any unexpected performance.
    • [nangas11] -- ~900Mb/s --> [rh-download container on na-arc-5] -- ~8,000Mb/s --> [almaportal] -- ~900Mb/s --> [cvpost-master]
    • [nangas11] -- ~900Mb/s --> [na-arc-5] -- ~8,000Mb/s --> [almaportal] -- ~900Mb/s --> [cvpost-master]
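
The individual tests above can be reproduced with commands along these lines (a sketch; host names and the test file come from the notes above, while the ab request counts are illustrative):

    # Hourly page-load benchmark (run on rastan.aoc.nrao.edu).
    ab -n 10 -c 1 http://almascience.nrao.edu/

    # Raw throughput between two hosts.
    iperf3 -s -B <local IP>              # on the receiving host
    iperf3 -B <local IP> -c <dest IP>    # on the sending host

    # Watch which swarm node receives the proxied request (run on each na-arc node)...
    tcpdump dst almaportal

    # ...while pulling a test file through the portal (run on cvpost-master).
    wget --no-check-certificate https://almaportal.cv.nrao.edu/dataPortal/2013.1.00226.S_uid___A001_X122_X1f1_001_of_001.tar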


Table 1

Production docker swarm iperf tests measured in Gb/s.

2022-08-11: After re-creating na-arc-3 (as a clone of na-arc-2), we also set its MTU to 1500.  The VM host interfaces (p5p1.97 and br97 on naasc-vs-3) were still at 1500, so we changed the interface on the VM guest (na-arc-3) to 1500 rather than raising the interfaces on the VM host to 9000, because of concern that the latter might interfere with other VM guests running on that host.  (A sketch of this kind of guest-side change follows the table.)


             na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5
             (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)
na-arc-1          -            19             9            21            10
na-arc-2         22             -             9            20            10
na-arc-3          7             7             -             7             7
na-arc-4         21            21             9             -            10
na-arc-5         10             9             8            10             -
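A minimal sketch of how a guest-side MTU change like this can be applied and persisted (the NetworkManager connection name below is an assumption; the exact method used on na-arc-3 was not recorded here):

    # On the VM guest (na-arc-3): apply the MTU for the running system.
    ip link set dev eth0 mtu 1500

    # Persist across reboots on a NetworkManager-managed RHEL guest.
    # The connection name "eth0" is an assumption; check with: nmcli connection show
    nmcli connection modify eth0 802-3-ethernet.mtu 1500
    nmcli connection up eth0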


Test docker swarm iperf tests measured in Gb/s


               natest-arc-1    natest-arc-2    natest-arc-3
               (naasc-dev-vs)  (naasc-vs-1)    (naasc-vs-5)
natest-arc-1        -               0.9             0.8
natest-arc-2       0.9              -               0.8
natest-arc-3       0.3             0.4              -

The test docker swarm nodes (natest-arc-*) are performing as expected.  The VM hosts have 1Gb/s links, so getting 80% to 90% of that bandwidth is about as good as one can expect.

Diagrams

Questions

  • Why is na-arc-5 using qdisc pfifo_fast for eth0 while all the other na-arc nodes are using qdisc fq_codel for eth0? (see ip addr)
    • na-arc-5 is running the rh-downloader service so its interface would affect all downloads.
    • Should we change it to match the others, e.g. with tc qdisc replace dev eth0 root fq_codel?  (See the command sketch after this list.)
  • Is putting all the 1Gb/s production docker swarm nodes on the same ASIC on the same Fabric Extender of the cv-nexus switch a good idea?
    • I am thinking it does not matter because it looks like the production docker swarm nodes use the 10Gb/s network which is on cv-nexus9k
  • Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
    • virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
    • When I run ethtool eth0 on na-arc-{1..5} and natest-arc-{1..2} as root, the result is just Link detected: yes instead of the full report with speed, while natest-arc-3 shows 100Mb/s.
  • Can we set up a test archive query that uses the "other" docker swarm which in this case would be the production swarm (na-arc-*)?
  • Is putting the production swarm nodes (na-arc-*) on the 10Gb/s network a good idea?  Sure it makes a fast connection to cvsan but it adds one more hop to the nangas servers (e.g. na-arc-1 -> cv-nexus9k -> cv-nexus -> nangas14)
  • Why are there VLANs on the VM hosts (e.g. em1.97 on naasc-vs-4)?
    • 2022-08-12 dhart: If you want all of your guest VMs to be on the same subnet as the VM host, then VLAN awareness isn't needed.  However, in most cases we want the flexibility of having VM guests on different networks (from one another and/or the VM host), so the VM host is configured with a trunk interface to the network, which allows any VLAN to be passed to the underlying VM guests housed on that VM host machine.

    • 2022-08-12 dhart: 10.2.97.x (and 10.2.96.x) = internal VLAN for servers (primarily); 10.2.99.x = internal VLAN for server management; 10.2.120.x = internal VLAN for 10 GE connections
  • When I connect to the container acralmaprod001.azurecr.io/offline-production/rh-download:2022.06.01.2022jun I get errors like unknown user 1009
  • Can we put 10Gb/s NICs in the nangas nodes?
  • Why does almaportal use ens3 while almascience uses eth0?
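
Commands for chasing down the qdisc and NIC-model questions above (a sketch; run as root, and only make the qdisc change if we decide na-arc-5 should match the others):

    # Compare the queueing discipline on each node's primary interface.
    tc qdisc show dev eth0

    # Make na-arc-5 match the other nodes (as suggested above).
    tc qdisc replace dev eth0 root fq_codel

    # On the VM host: show which NIC model a guest was given (rtl8139 vs virtio).
    virsh domiflist natest-arc-3

    # On the guest: report link speed; virtio interfaces typically report no speed,
    # while the emulated rtl8139 reports 100Mb/s.
    ethtool eth0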

To Do

  1. Done: Recreate na-arc-3 so it gets the same performance as other na-arc-* nodes which is apparently at least 10Gb/s. (pmurphy)
    1. 2022-08-11: cloned na-arc-2 and moved the clone to naasc-vs-3 (zbutcher)
    2. 2022-08-11: moved old na-arc-3 to na-arc-3-OLD (thalstea)
    3. 2022-08-11: Renamed the clone to na-arc-3.  We connected it to the swarm successfully, but it had a low connection speed.
    4. 2022-08-11: Changed the model of  na-arc-3's vnet5 interface on naasc-vs-3 from rtl8139 to virtio to match all the other na-arc-* nodes.  Performance was still poor.
    5. 2022-08-11: Changed the MTU of na-arc-3 eth0 to 1500.  This is different from all the other na-arc-* nodes, but it was either that or change p5p1.120 and br97 on naasc-vs-3 from 9000 to 1500, which may have impacted other VM guests on that host.  Performance was now reasonable, about 7Gb/s.  I was expecting about 9Gb/s, but perhaps the 1500 MTU is affecting performance.
    6. 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
  2.  Done: Launch services on production swarm (sbooth)
    1. 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
  3. Test the production docker swarm with a test web interface. (lsharp)
    1. 2022-08-12: http://almaportal.cv.nrao.edu/
    2. 2022-08-12 krowe: ran tcpdump on all five na-arc-{1..5} nodes (tcpdump dst almaportal) and then downloaded a data file with wget --no-check-certificate https://almaportal.cv.nrao.edu/dataPortal/2013.1.00226.S_uid___A001_X122_X1f1_001_of_001.tar.  With each execution of the wget, I could see the next na-arc host report the traffic, because the web proxy on almaportal selects the next na-arc node via round-robin.  All five nodes were providing about 6KB/s speeds to cvpost-master.
    3. 2022-08-12 krowe: I did iperf tests from host to host along the entire chain (nangas14 -> na-arc-{1..5} -> almaportal -> cvpost-master) and at each step the performance was at least 900Mb/s, yet downloading with wget was about 0.06Mb/s.
  4. Done: Ask other ARC if they use MTU 9000 on 10Gb. (krowe)
    1. JAO uses MTU of 1500
    2. ESO uses two VM hosts running VMware with 10Gb/s and MTU of 1500
  5. Switch the production docker swarm back to MTU 1500, since the test docker swarm uses MTU 1500 and is performing better?
  6. Fix natest-arc-3 so its NIC model is virtio instead of rtl8139.  (A sketch of items 5 and 6 follows this list.)
  7. Upgrade production swarm to meet ALMA requirements (16-core, 32GB)
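
A sketch of how items 5 and 6 could be carried out (assumes the standard libvirt tooling on the VM hosts; device and guest names come from the notes above and should be verified first):

    # Item 5: revert a production swarm node's MTU to 1500 (run on each na-arc guest).
    ip link set dev eth0 mtu 1500

    # Item 6: switch natest-arc-3's NIC model to virtio (run on its VM host).
    # Shut the guest down, edit its libvirt definition, changing the interface's
    # <model type='rtl8139'/> to <model type='virtio'/>, then start it again.
    virsh shutdown natest-arc-3
    virsh edit natest-arc-3
    virsh start natest-arc-3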

People (not necessarily team members)

  • K. Scott Rowe - Tiger Team Lead
  • CJ Allen - sysadmin
  • Tom Booth - programmer
  • Liz Sharp - sysadmin
  • Brian Mason - DRM Scientist
  • Zhon Butcher - sysadmin
  • Tracy Halstead - sysadmin
  • Alvaro Aguirre - ALMA software
  • Pat Murphy - CIS lead
  • Rachel Rosen - previous ICT lead
  • Laura Jenson - current ICT lead
  • Catherine Vlahakis - Scientist


Communication lines


Answers

  • Why does iperf show 10Gb/s between na-arc-5 and na-arc-[1,2,4]?  How is this possible if the default interface on the respective VM Hosts is 1Gb/s?
    • ANSWER: The vnets for the VM guests are tied to the 10Gb/s NICs on the VM hosts not the 1Gb/s NICs.
  • Why do natest-arc-{1..3} have 9 veth* interfaces in ip addr show while na-arc-{1..5} don't have any veth* interfaces?
    • ANSWER: Each container creates a veth* interface.
  • Why does na-arc-3 have such poor network performance to the other na-arc nodes?
    • ping na-arc-[1,2,4,5] with anything larger than -s 1490 drops all packets (see the ping sketch at the end of this section)
    • iperf tests show 10Gb/s between the VM host of na-arc-3 (naasc-vs-3 p5p1.120) and the VM host of na-arc-5 (naasc-vs-5 p2p1.120).  So it isn't a bad card in either of the VM hosts.
    • iptables on na-arc-3 looks different from iptables on na-arc-[2,4,5].  na-arc-1 also looks a bit different.
    • docker_gwbridge interface on na-arc-[1,2,4,5] shows NO_CARRIER but not on na-arc-3.
    • na-arc-3 has a veth10fd1da@if37 interface.  None of the other na-arc-* nodes have a veth interface.
    • Production docker swarm iperf tests measured in Gb/s.

                     na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5
                     (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)
      na-arc-1            -            18          0.002           20            10
      na-arc-2           20             -          0.002           20            10
      na-arc-3        0.002         0.002              -         0.002         0.002
      na-arc-4           20            19          0.002            -            10
      na-arc-5           10            10          0.002           10             -

      There is clearly something wrong with na-arc-3
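
      The packet-size behaviour above can be reproduced with a don't-fragment ping sweep (a sketch; 1472 bytes is the largest ICMP payload that fits a 1500-byte MTU, so failures just above that point at an MTU/fragmentation mismatch):

          # Run from na-arc-3; payloads above ~1470 bytes were the ones being dropped.
          for size in 1400 1450 1472 1490 1500 8000; do
              echo "== payload $size bytes =="
              ping -c 3 -M do -s "$size" na-arc-5
          done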

References
