...
- https://ictjira.alma.cl/browse/AES-52
- https://confluence.alma.cl/pages/viewpage.action?pageId=91826715
- You can see poor performance with a command like
- wget --no-check-certificate http://almaportal.cv.nrao.edu/dataPortal/member.uid___A001_X1358_Xd2.3C286_sci.spw31.cube.I.pbcor.fits
Diagrams
Timeline of events
- 2020-03-19: ALMA suspends science observing and stows the array because of COVID-19.
- 2020-06-24: Archive webapps (aq, asaz, rh, etc, but not SP) moved to new Docker Swarm (na-arc-*) system. See more.
- 2021-03-17: ALMA re-starts limited science observations, resuming Cycle 7. See more.
- 2021-10-01: ALMA starts Cycle 8 observations. See more.
- 2022-02-03: Science Portal (SP) upgraded Plone, Python, RHEL and moved into Docker Swarm. All other webapps had already been in Docker Swarm.
- 2022-04-18: First documented report of performance issues. Webapps moved to pre-production Docker Swarm (natest-arc-*). See more
- 2022-05-09: moved Science Portal (SP) from Docker Swarm to an rsync copy on http://almaportal.cv.nrao.edu/ for performance issues
- 2022-05-31: moved Science Portal (SP) from rsync copy back to Docker Swarm
- 2022-06-30: Tracy changed the eth0 MTU on the production docker swarm nodes (na-arc-*) from the default 1500 to 9000. The test swarm is still 1500.
- 2022-07-25: Jeff Kern asked K. Scott Rowe to head a tiger team to investigate the various issues that have affected the ALMA Archive.
- 2022-08-11: cloned na-arc-2 and moved the clone to naasc-vs-3 as na-arc-3 and change MTU to 1500. Other na-arc nodes are 9000 but changing na-arc-3 to 9000 would require changing naasc-vs-3 which could affect other, non-archive, VM guests.
- 2022-08-12: setup http://almaportal.cv.nrao.edu/ which uses the five na-arc nodes. This is for internal testing. Results show download speed at about 32KB/s regaurdless of which na-arc node the web proxy chooses.
- 2022-08-17 krowe: Changed eth0 on na-arc-5 from qdisc pfifo_fast to qdisc fq_codel to match all the other na-arc and natest-arc nodes. This seemed to have no affect on performance.
- tc qdisc replace dev eth0 root fq_codel
- 2022-08-19 krowe: For some reason, all the swarm services on na-arc-5 shutdown about 11am Central Aug. 18, 2022. Now my wget tests are getting about 100MB/s and I tested this five times to walk through all four nodes. I then moved the httpd to na-arc-5 and now na-arc-[1,2,4] download at ~32KB/s while na-arc-[3,5] download at ~100MB/s.
- 2022-08-25 krowe: Tracy cahnged the following sysctl options on naasc-vs-5 to match the other VM Hosts. Sadly it seems to have had no effect on wget performance. na-arc-1, na-arc-2, na-arc-4 are 32KB/s while na-arc-3 and na-arc-5 are 45MB/s.
- net.ipv4.conf.all.accept_redirects = 0
- net.ipv4.conf.all.forwarding = 1
- 2022-09-01: Tracy rebooted naasc-vs-5 which hosts na-arc-5 just in case this was necessary for the net.ipv4.conf.all.forwarding sysctl change to take effect. Sadly, no change in performance.
...