Page for tracking an apparent slowdown in CASA-6 relative to CASA-5 for VLASS calibration: https://open-jira.nrao.edu/browse/PIPE-568
Comparing CASA-5 and CASA-6 (casa-pipeline-validation-8) on the two CPU models available for batch processing in NM and CV shows two things. First, the newer CPUs (E5-2640v3) run a small calibration job (6.7GB) about 1.25 times faster than the older CPUs (E5-2670). Second, CASA-6 is slower than CASA-5 in every case. There was no significant run-time difference between NM and CV for similar hardware and software. Results are in minutes.
Here is the full pipeline script I used for all of these tests: casa_pipescript.py. For some tests, I commented out every task except hifv_importdata.
Full, serial pipeline with small dataset
RHEL7 - 6.7GB dataset with NM Lustre-2.5.5 (results are in minutes)
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost038 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 114, 117 | 110, 111 | 144, 143 | 140, 141 |
6 | 156*, 164* | 156*, 158* | 200*, 201* | 197*, 199* |
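The ~1.25x CPU speedup quoted at the top of this page can be checked against the table above. The sketch below pools the NM and CV runs for each CPU model, since there was no significant NM/CV difference. It compares mean run times, and the names `runs`, `cpu_speedup` and `casa6_slowdown` are only illustrative. With the same numbers it also shows that CASA-6 takes roughly 1.4x as long as CASA-5 on this small dataset. Keep in mind that every CASA-6 run here finished with tclean() errors.

```python
from statistics import mean

# Run times in minutes, copied from the table above (6.7GB dataset, NM Lustre-2.5.5).
# Keys are (CASA version, CPU model); values pool the NM and CV runs for that CPU.
runs = {
    ("5", "E5-2640v3"): [114, 117, 110, 111],
    ("5", "E5-2670"): [144, 143, 140, 141],
    ("6", "E5-2640v3"): [156, 164, 156, 158],
    ("6", "E5-2670"): [200, 201, 197, 199],
}


def cpu_speedup(casa):
    """Mean run time on the older E5-2670 divided by mean run time on the E5-2640v3."""
    return mean(runs[(casa, "E5-2670")]) / mean(runs[(casa, "E5-2640v3")])


def casa6_slowdown(cpu):
    """Mean CASA-6 run time divided by mean CASA-5 run time on one CPU model."""
    return mean(runs[("6", cpu)]) / mean(runs[("5", cpu)])


for casa in ("5", "6"):
    print(f"CASA-{casa}: E5-2640v3 is {cpu_speedup(casa):.2f}x faster than E5-2670")
for cpu in ("E5-2640v3", "E5-2670"):
    print(f"{cpu}: CASA-6 takes {casa6_slowdown(cpu):.2f}x as long as CASA-5")
```

Both CASA versions give a speedup of about 1.26x, which is consistent with the "about 1.25 times" figure.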
RHEL7 - 6.7GB dataset after the NM upgrade to Lustre-2.10.8; CV results copied from the previous test (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 113, 110 | 110, 111 | 142, 141 | 140, 141 |
6 | 155* | 156*, 158* | 198* | 197*, 199* |
Mar. 3, 2020 krowe: I re-ran the nmpost051-casa6-rhel7 test with the latest casa-pipeline-validation-17. The run time was the same, and so were the tclean() errors.
"*" means the run completed with tclean() errors
Just serial hifv_importdata() with large dataset
RHEL7 - 350GB dataset with NM Lustre-2.5.5 (results are in minutes)
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost038 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 192 | 196 | 239 | 251 |
6 | 328 | 364, 378 | 411, 427 | 453 |
Running only hifv_importdata() on a larger dataset (350GB) shows that nmpost nodes run about 2% to 10% faster than comparable cvpost nodes, with CASA-6 again slower in every case.
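The NM vs. CV percentages above can be reproduced from the hifv_importdata() table as a quick arithmetic check. This sketch takes an assumption for the cells with repeated runs: it uses the fastest run, which is a common benchmarking convention but not necessarily what was done here. Using the mean instead moves the two CASA-6 figures by about two percentage points in opposite directions. The names `pairs` and `nm_advantage_pct` are hypothetical.

```python
# hifv_importdata() run times in minutes, 350GB dataset, NM Lustre-2.5.5
# (copied from the table above). Each entry pairs an nmpost node with the
# cvpost node that has the same CPU model.
pairs = {
    ("5", "E5-2640v3"): {"nm": [192], "cv": [196]},
    ("5", "E5-2670"): {"nm": [239], "cv": [251]},
    ("6", "E5-2640v3"): {"nm": [328], "cv": [364, 378]},
    ("6", "E5-2670"): {"nm": [411, 427], "cv": [453]},
}


def nm_advantage_pct(casa, cpu):
    """Extra time (percent) the cvpost node needs, using the fastest run in each cell."""
    cell = pairs[(casa, cpu)]
    return (min(cell["cv"]) / min(cell["nm"]) - 1.0) * 100.0


for casa, cpu in pairs:
    print(f"CASA-{casa} {cpu}: cvpost takes {nm_advantage_pct(casa, cpu):.1f}% longer")
```

This gives about 2% and 5% for CASA-5 and about 10% to 11% for CASA-6, which is roughly the range stated above.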
RHEL7 - 350GB dataset with NM Lustre-2.10.8 (results are in minutes)
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost048 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 187 | 196 | 244 | 251 |
6 |  | 364, 378 |  | 453 |
Full, serial pipeline with large dataset
RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 3,045^ | 3,011^ | 3,431^ | 3,401^ |
6 | 3,640 | 3,466 | 4,511 | 4,392 |
"^" means "SEVERE setjy No rows were selected"
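To put a number on the CASA-6 slowdown for the full pipeline on the large dataset, the table above can be reduced to per-host ratios. This is a minimal sketch, and the dictionary keys are just labels for the table columns. One caveat applies: every CASA-5 run here ended with the setjy SEVERE message, so the two columns may not be doing exactly the same work.

```python
# Full serial pipeline, 350GB dataset, run times in minutes (copied from the table above).
casa5 = {"NM E5-2640v3": 3045, "CV E5-2640v3": 3011, "NM E5-2670": 3431, "CV E5-2670": 3401}
casa6 = {"NM E5-2640v3": 3640, "CV E5-2640v3": 3466, "NM E5-2670": 4511, "CV E5-2670": 4392}

# Ratio > 1 means CASA-6 took longer than CASA-5 on the same host/CPU combination.
ratios = {host: casa6[host] / casa5[host] for host in casa5}

for host, ratio in ratios.items():
    extra = casa6[host] - casa5[host]
    print(f"{host}: CASA-6 takes {ratio:.2f}x as long ({extra} min more)")
```

On these numbers, CASA-6 takes roughly 1.15x to 1.20x as long on the E5-2640v3 nodes and roughly 1.29x to 1.31x as long on the E5-2670 nodes.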
Full, serial pipeline with large dataset and Profiling Metrics
RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 and CASA 6.0.0.23a100.dev17 (results are in minutes)
CASA | NM (E5-2640v3) | CV (E5-2640v3) | NM (E5-2670) | CV (E5-2670) |
---|---|---|---|---|
5 | 4,095^, 2,645^ | 4,559^ | 3,397^ | |
6 | 3,527 | 4,442 |
"^" means "SEVERE setjy No rows were selected"
Current Pipeline Script
Mar. 17, 2020: I started using the same pipeline script that Brian is currently using.
Full, new, serial pipeline with small dataset
RHEL7 - 6.7GB dataset with NM Lustre-2.10.x (results are in minutes)
I tested a CASA-6 job with and without cf.validate_parameters = False, and both jobs took the same amount of time (+/- 1 minute).
"*" means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"
Full, new, serial pipeline with large dataset
RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)