...

The reports appear to be (in order of confidence and severity): VLASS calibration pipeline performance with CASA-5/CASA-6; VLASS imaging slowdown with current CASA-5 vs CASA-5 months ago, particularly during the major cycle; ALMA cube imaging in NM vs CV; and apocryphal miscellaneous reports such as DA plotms calls. Each will be tracked in a sub page.

...

Lustre:  Tests vs NVME

OS version: CASA-5.6 on RHEL6 vs RHEL7

Kernel: Current vs New.

...

Which shows the heavy cost of the spin lock vs the small amount of CASA processing during that sample window.

...

  1. Modify CASA tclean() to not open cfcache in write mode.  Presumably it opens for write; on error it creates the cache, otherwise it continues.  It should instead test for existence, create and close the cache if it doesn't exist, and then open it for read.
  2. In the meantime all SE cont jobs should utilize their own copy of the cfcache.  It would be good to have the executing scripts simply copy it in before starting CASA and delete it after CASA exits, thus avoiding the contention without creating many copies of an otherwise large directory (33GB or so).
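The copy-in/delete-after pattern from step 2 could be sketched roughly as below. This is a minimal illustration only, not the script actually used: the helper names `private_cfcache` and `run_with_private_cfcache` and the `CFCACHE_DIR` environment variable are hypothetical, and it assumes the job's scratch area has room for the 33GB or so copy.

```python
import os
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_cfcache(shared_cfcache, scratch_root):
    """Yield a job-private copy of the shared cfcache and remove it on exit."""
    # A unique working directory per job keeps concurrent jobs from colliding.
    workdir = Path(tempfile.mkdtemp(prefix="cfcache_", dir=scratch_root))
    local_copy = workdir / Path(shared_cfcache).name
    try:
        shutil.copytree(shared_cfcache, local_copy)
        yield local_copy
    finally:
        # Delete the copy even if the job failed, so copies do not pile up.
        shutil.rmtree(workdir, ignore_errors=True)


def run_with_private_cfcache(command, shared_cfcache, scratch_root):
    """Run `command` with a private cfcache copy and return its exit code."""
    with private_cfcache(shared_cfcache, scratch_root) as local_copy:
        # CFCACHE_DIR is a hypothetical variable; the job script is assumed
        # to read it and hand the path to the imaging task.
        env = {**os.environ, "CFCACHE_DIR": str(local_copy)}
        return subprocess.run(command, env=env, check=False).returncode
```

In such a wrapper the imaging script would point its cfcache setting at the private path instead of the shared one, so the shared directory is only ever read during the copy.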

Step 2 successfully reduced the MDS load.  Below is a plot of the MDS load for the week of February 21 to 28.  It shows the initial load reduction on Wed. (Feb. 26, 2020) when all VLASS jobs were converted to local cfcache.  Thu. (Feb. 27, 2020) shows a large spike while a pointed observation was tested, and Fri. (Feb. 28, 2020) shows steady state after all awproject jobs were either converted to local cfcache or stopped.

...

To test this, we rebooted nmpost071 through nmpost090 back into RHEL6/Lustre-2.5.5 so that vlasstest jobs will run on them.  I have started 10 CASA jobs, each one on a node in this range, started about 20 minutes apart.  The system CPU usage on aocmds has not significantly changed because of these jobs, which is inconclusive.  The proper test of this idea may just be upgrading the Lustre servers to 2.10.

...