...

  • Running ApacheBench (ab) every hour on rastan.aoc.nrao.edu to load http://almascience.nrao.edu/
    • ssh.aoc.nrao.edu:/users/krowe/alma_archive/benchmarks/almascience.nrao.edu/data (times are in milliseconds)
      • Mode load time is 98ms
    • ssh.aoc.nrao.edu:/users/krowe/alma_archive/benchmarks/almaportal.cv.nrao.edu/data (times are in milliseconds)
      • Mode load time is 123ms
  • Using wget to get 2013.1.00226.S-small (about 700MB) every hour on cvpost-master.aoc.nrao.edu
    • ssh.cv.nrao.edu:/lustre/cv/users/krowe/tickets/scg-207/benchmarks/almascience.nrao.edu/2013.1.00226.S-small
      • 2022-08-16: average download time is about 42 seconds, which is about 16MB/s
  • iperf tests using `iperf3 -s -B <local IP>` on the server side and `iperf3 -B <local IP> -c <dest IP>` on the client side
  • 2022-08-15 krowe: I had tcpdump running on each of the na-arc-{1..5} nodes, watching for traffic from almaportal (`tcpdump dst almaportal`).  Then I ran the following wget on cvpost-master repeatedly.  tcpdump showed the first execution on na-arc-1, the second on na-arc-2, and so forth.  This is because of the round-robin nature of the web proxy on almaportal, and was a nice confirmation of that process.  However, each execution also downloaded at about 32KB/s (0.3Mb/s) after a minute or so of downloading, which is about 300 times slower than expected.  Using the test swarm (natest-arc-{1..3}), I can download the same file at about 10MB/s (100Mb/s).  Also, I did not see any difference in performance across the five nodes, which was surprising given that one node runs the downloader container and the other four need to forward traffic to it.
    • cvpost-master wget --no-check-certificate https://almaportal.cv.nrao.edu/dataPortal/2013.1.00226.S_uid___A001_X122_X1f1_001_of_001.tar
  • 2022-08-15 krowe: I ran iperf tests from end to end and didn't see any unexpected performance.
    • [nangas11] -- ~900Mb/s --> [rh-download container on na-arc-5] -- ~8,000Mb/s --> [almaportal] -- ~900Mb/s --> [cvpost-master]
    • [nangas11] -- ~900Mb/s --> [na-arc-5] -- ~8,000Mb/s --> [almaportal] -- ~900Mb/s --> [cvpost-master]
  • 2022-08-17 krowe: doing scp tests of a 784MB file
    • [root@rh-download-na-production-2022jun tmp]# scp krowe@nangas13:/NGAS1/volume1/afa/2022-08-17/1/member.uid___A001_X158f_X90c.IRAS_09022-3615_sci.spw29.cube.I.pb.fits.gz /tmp (93MB/s)
    • [root@rh-download-na-production-2022jun tmp]# scp member.uid___A001_X158f_X90c.IRAS_09022-3615_sci.spw29.cube.I.pb.fits.gz krowe@almaportal:/tmp (70MB/s)
    • almaportal krowe >scp /tmp/member.uid___A001_X158f_X90c.IRAS_09022-3615_sci.spw29.cube.I.pb.fits.gz krowe@cvpost-master:/tmp (110MB/s)
  • tcpdump bandwidth tests
    • When I download a file that lives on nangas13 from na-arc-5 to cvpost-master with `wget --no-check-certificate http://na-arc-5.cv.nrao.edu:8088/dataPortal/member.uid___A001_X122_X1f1.LKCA_15_13CO_cube.image.fits`, the download runs at about 32KB/s.
      • On nangas13 I see about that much traffic (32KB/s to 50KB/s), almost all of it going to na-arc-5.
      • On na-arc-5 (rh-download container) I see between about 200KB/s and 300KB/s of traffic.
      • On na-arc-2 (httpd container) I see between about 100KB/s and 150KB/s of traffic, which seems to be about half the traffic na-arc-5 sees.
  • 2022-08-19 krowe: For some reason, all the swarm services on na-arc-5 shut down about 24 hours ago (around 11am Central on Aug. 18, 2022).  My wget tests are now getting about 100MB/s; I ran the test five times to walk through all five nodes.
    • na-arc-5 was running
      • acralmaprod001.azurecr.io/offline-production/asax-elasticsearch:2022.02.01.2022feb (now on na-arc-3)
      • acralmaprod001.azurecr.io/offline-production/asax-explorer:2022.04.01.2022apr (now on na-arc-2)
      • acralmaprod001.azurecr.io/offline-production/asax-ingestor:2022.06.01.2022jun (now on na-arc-3)
      • acralmaprod001.azurecr.io/offline-production/rh-download:2022.06.01.2022jun (now on na-arc-2)
      • acralmaprod001.azurecr.io/offline-production/rh-logging:2022.06.01.2022jun (now on na-arc-4)
    • na-arc-5 didn't reboot.  It has been up for 29 days.
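
The rates quoted in the notes above can be cross-checked with a little arithmetic. The following is only a minimal sketch: the helper names are hypothetical, and it assumes decimal units (1 MB = 1000 KB) and 8 bits per byte, ignoring protocol overhead. It recomputes the hourly wget rate, converts the slow download rate to Mb/s, compares it with the test swarm rate, and takes the slowest hop of the iperf chain as the upper bound for an end-to-end transfer.

```python
# Sanity-check arithmetic for figures quoted in the notes.
# Assumptions (not from the notes): decimal units, 8 bits per byte,
# no allowance for protocol overhead. Helper names are hypothetical.


def mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s for size_mb transferred in seconds."""
    return size_mb / seconds


def kb_s_to_mbit_s(kb_per_s: float) -> float:
    """Convert kilobytes per second to megabits per second."""
    return kb_per_s * 8 / 1000


def mbit_s_to_mb_s(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


def bottleneck(links_mbit_s: list[float]) -> float:
    """Hops in series cannot carry more than the slowest hop."""
    return min(links_mbit_s)


# Hourly wget test: about 700MB in about 42 seconds.
wget_rate = mb_per_s(700, 42)

# Slow production download: about 32KB/s, expressed in Mb/s.
slow_mbit = kb_s_to_mbit_s(32)

# Test swarm (about 10MB/s) versus production (about 32KB/s).
slowdown = (10 * 1000) / 32

# iperf chain: nangas11 -> na-arc-5 -> almaportal -> cvpost-master.
chain_limit = bottleneck([900, 8000, 900])
chain_limit_mb = mbit_s_to_mb_s(chain_limit)

print(f"wget rate:        {wget_rate:.1f} MB/s")
print(f"slow download:    {slow_mbit:.3f} Mb/s")
print(f"slowdown factor:  {slowdown:.1f}x")
print(f"chain upper bound: {chain_limit} Mb/s ({chain_limit_mb} MB/s)")
```

Under these assumptions the iperf chain allows roughly 900Mb/s end to end, so the 32KB/s downloads are far below what the network links alone would explain, while the roughly 100MB/s seen on 2022-08-19 is close to that bound.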


Table 1

Production docker swarm iperf tests measured in Gb/s.

...