Page for tracking an apparent slowdown of CASA-6 relative to CASA-5 for VLASS calibration: https://open-jira.nrao.edu/browse/PIPE-568
Comparing CASA-5 and CASA-6 (casa-pipeline-validation-8) across the two CPU types available for batch processing in NM and CV shows that the newer CPUs (E5-2640v3) run a small calibration job (6.7GB) about 1.25 times faster than the older CPUs (E5-2670), with CASA-6 slower in every case. There was no significant run-time difference between NM and CV on similar hardware and software. Results are in minutes.
Here is the full pipeline script I used for all of these tests: casa_pipescript.py. For some tests, I commented out everything except hifv_importdata.
Full, serial pipeline with small dataset
RHEL7 - 6.7GB dataset with NM Lustre-2.5.5 (results are in minutes)
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost038 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 114, 117 | 110, 111 | 144, 143 | 140, 141 |
6 | 156*, 164* | 156*, 158* | 200*, 201* | 197*, 199* |
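The ~1.25x CPU speedup quoted above can be checked directly against the first 6.7GB table. The sketch below pools the NM and CV runs for each CPU type (reasonable here, since their times were similar), then compares mean run times across CPUs and across CASA versions. The grouping and function names are illustrative, not part of the pipeline.

```python
from statistics import mean

# Run times in minutes from the first 6.7GB table (NM Lustre-2.5.5).
# NM and CV nodes with the same CPU are pooled, since their times were similar.
runs = {
    "5": {"E5-2640v3": [114, 117, 110, 111], "E5-2670": [144, 143, 140, 141]},
    "6": {"E5-2640v3": [156, 164, 156, 158], "E5-2670": [200, 201, 197, 199]},
}


def cpu_speedup(casa):
    """Mean time on the older E5-2670 divided by mean time on the newer E5-2640v3."""
    return mean(runs[casa]["E5-2670"]) / mean(runs[casa]["E5-2640v3"])


def casa6_slowdown(cpu):
    """Mean CASA-6 time divided by mean CASA-5 time on one CPU type."""
    return mean(runs["6"][cpu]) / mean(runs["5"][cpu])


for casa in ("5", "6"):
    print(f"CASA-{casa}: E5-2640v3 is {cpu_speedup(casa):.2f}x faster than E5-2670")
for cpu in ("E5-2640v3", "E5-2670"):
    print(f"{cpu}: CASA-6 takes {casa6_slowdown(cpu):.2f}x as long as CASA-5")
```

With these numbers the CPU speedup comes out at about 1.26 for both CASA versions, and CASA-6 takes about 1.40 times as long as CASA-5 on either CPU. Keep in mind that every CASA-6 run in this table completed with tclean() errors.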
RHEL7 - 6.7GB dataset after the NM upgrade to Lustre-2.10.8; CV results copied from the previous test (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 113, 110 | 110, 111 | 142, 141 | 140, 141 |
6 | 155* | 156*, 158* | 198* | 197*, 199* |
Mar. 3, 2020 krowe: I tried the nmpost051-casa6-rhel7 configuration with the latest casa-pipeline-validation-17. The run time was unchanged, as were the tclean() errors.
"*" Means it completed with tclean() errors
Just serial hifv_importdata() with large dataset
RHEL7 - 350GB dataset with NM Lustre-2.5.5 (results are in minutes)
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost038 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 192 | 196 | 239 | 251 |
6 | 328 | 364, 378 | 411, 427 | 453 |
Running only hifv_importdata() on a larger dataset (350GB) shows that nmpost nodes run about 2% to 10% faster than comparable cvpost nodes, with CASA-6 slower in every case.
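The NM-versus-CV comparison above can be reproduced from the first 350GB hifv_importdata() table. The sketch below pairs each nmpost node with the cvpost node that has the same CPU, averages repeated runs, and reports how much longer the cvpost node took. The pairing and the function name are illustrative assumptions, not something taken from the pipeline.

```python
from statistics import mean

# Minutes for serial hifv_importdata() on the 350GB dataset (NM Lustre-2.5.5).
# Each key pairs an nmpost node with the cvpost node that has the same CPU;
# values are (nmpost runs, cvpost runs).
pairs = {
    ("5", "E5-2640v3"): ([192], [196]),
    ("5", "E5-2670"): ([239], [251]),
    ("6", "E5-2640v3"): ([328], [364, 378]),
    ("6", "E5-2670"): ([411, 427], [453]),
}


def cv_excess_percent(nm_runs, cv_runs):
    """Percent by which the mean cvpost time exceeds the mean nmpost time."""
    return 100.0 * (mean(cv_runs) / mean(nm_runs) - 1.0)


for (casa, cpu), (nm, cv) in pairs.items():
    print(f"CASA-{casa} {cpu}: cvpost takes {cv_excess_percent(nm, cv):.1f}% longer")
```

Averaging the repeated runs, the cvpost excess ranges from about 2% (CASA-5, E5-2640v3) to about 13% (CASA-6, E5-2640v3), with the CASA-6 E5-2670 pair near 8%. The upper end of the range therefore depends on which runs are compared and which node is used as the baseline.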
RHEL7 - 350GB dataset with NM Lustre-2.10.8 (results are in minutes)
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost048 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 187 | 196 | 244 | 251 |
6 | | 364, 378 | | 453 |
Full, serial pipeline with large dataset
RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 3,045^ | 3,011^ | 3,431^ | 3,401^ |
6 | 3,640 | 3,466 | 4,511 | 4,392 |
"^" Means "SEVERE setjy No rows were selected"
Full, serial pipeline with large dataset and Profiling Metrics
RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 4,095^, 2,645^ | 4,559^ | 3,397^ | 3,410^ |
6 | 3,527 | 4,442 |
"^" Means "SEVERE setjy No rows were selected"
Current Pipeline Script
Mar. 17, 2020: I switched to the same pipeline script that Brian is currently using.
Full, new, serial pipeline with small dataset
RHEL7 - 6.7GB dataset with NM Lustre-2.10.x (results are in minutes)
I tested a CASA-6 job with and without cf.validate_parameters = False, and both jobs took the same amount of time, +/- 1 minute.
"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"
Full, new, serial pipeline with large dataset
RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 3,350*^ | 3,362*^ | 4,605*^ | 4,480*^ |
6 | 4,016* | 3,943* | 5,671* | 5,253* |
"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"
"^" Means "SEVERE setjy No rows were selected"
Full, new, serial pipeline with large dataset and profiling metrics
RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 3,326*^ | 4,485*^ | | |
6 | 4,172* |
"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"
"^" Means "SEVERE setjy No rows were selected"
TASK | NM (E5-2640v3) CASA-5 (seconds) | NM (E5-2640v3) CASA-6 (seconds) |
---|---|---|
h_init | 0 | 0 |
hifv_importdata | 10930 | 19740 |
hifv_hanning | 7682 | 8625 |
hifv_flagdata | 13112 | 15448 |
hifv_vlasetjy | 3488 | 11115 |
hifv_priorcals | 11973 | 15125 |
hifv_testBPdcals | 3516 | 4652 |
hifv_flagbaddef | 22 | 32 |
hifv_checkflag | 3064 | 3206 |
hifv_semiFinalBPdcals | 3352 | 7335 |
hifv_checkflag | 8386 | 11990 |
hifv_solint | 2927 | 4108 |
hifv_fluxboot2 | 4823 | 8685 |
hifv_finalcals | 7444 | 8448 |
hifv_circfeedpolcal | 1431 | 1586 |
hifv_flagcal | 15 | 25 |
hifv_applycals | 10021 | 10807 |
hifv_checkflag | 75033 | 74867 |
hifv_statwt | 27565 | 28557 |
hifv_plotsummary | 4785 | 15778 |
h_save | 3 | 1 |
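Summing each column of the per-stage table reproduces the profiling-run totals: the CASA-5 column adds up to 199,572, which is 3,326 minutes if read as seconds and matches the CASA-5 NM (E5-2640v3) entry in the profiling table above. The CASA-6 column gives about 4,169 minutes against the reported 4,172; the few-minute gap is presumably time spent outside the listed stages. The sketch below sums the columns and ranks stages by how much extra time CASA-6 needs. The list layout is an illustrative choice, not a pipeline data format.

```python
# Per-stage times in seconds from the profiling table (NM, E5-2640v3).
# hifv_checkflag appears three times, so stages stay in an ordered list
# of (stage, CASA-5 seconds, CASA-6 seconds) rather than a dict.
stages = [
    ("h_init", 0, 0),
    ("hifv_importdata", 10930, 19740),
    ("hifv_hanning", 7682, 8625),
    ("hifv_flagdata", 13112, 15448),
    ("hifv_vlasetjy", 3488, 11115),
    ("hifv_priorcals", 11973, 15125),
    ("hifv_testBPdcals", 3516, 4652),
    ("hifv_flagbaddef", 22, 32),
    ("hifv_checkflag", 3064, 3206),
    ("hifv_semiFinalBPdcals", 3352, 7335),
    ("hifv_checkflag", 8386, 11990),
    ("hifv_solint", 2927, 4108),
    ("hifv_fluxboot2", 4823, 8685),
    ("hifv_finalcals", 7444, 8448),
    ("hifv_circfeedpolcal", 1431, 1586),
    ("hifv_flagcal", 15, 25),
    ("hifv_applycals", 10021, 10807),
    ("hifv_checkflag", 75033, 74867),
    ("hifv_statwt", 27565, 28557),
    ("hifv_plotsummary", 4785, 15778),
    ("h_save", 3, 1),
]

total5 = sum(t5 for _, t5, _ in stages)
total6 = sum(t6 for _, _, t6 in stages)
print(f"CASA-5 total: {total5} s = {total5 / 60:.0f} min")
print(f"CASA-6 total: {total6} s = {total6 / 60:.0f} min")

# Rank stages by the extra seconds CASA-6 spends relative to CASA-5.
by_extra = sorted(stages, key=lambda s: s[2] - s[1], reverse=True)
for name, t5, t6 in by_extra[:5]:
    # Guard against a zero CASA-5 time (h_init) when computing the ratio.
    ratio = t6 / t5 if t5 else float("inf")
    print(f"{name:22s} +{t6 - t5:6d} s ({ratio:.2f}x)")
```

On these numbers, hifv_plotsummary, hifv_importdata and hifv_vlasetjy together account for just over half of the roughly 50,560 s (about 843 min) difference, while the largest single stage, the third hifv_checkflag, is essentially unchanged between the two versions.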