VLASS calibration pipeline CASA-5 vs CASA-6

Page for tracking an apparent slowdown in CASA-6 relative to CASA-5 for VLASS calibration: https://open-jira.nrao.edu/browse/PIPE-568

Comparing CASA-5 and CASA-6 (casa-pipeline-validation-8) on the two CPU types available for batch processing in NM and CV shows that the newer CPUs (E5-2640v3) run a small calibration job (6.7GB) about 1.25 times faster than the older CPUs (E5-2670), and that CASA-6 is slower in every case, taking roughly 1.4 times as long as CASA-5. There was no significant run-time difference between NM and CV for similar hardware and software. Results are in minutes.
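A minimal Python sketch that recomputes the CPU speed-up and the CASA-6/CASA-5 ratio from the first 6.7GB table on this page (NM Lustre-2.5.5). It assumes NM and CV runs on the same CPU model can be pooled, since no significant NM/CV difference was seen, and it averages the repeated runs in each cell. The names `RUNS`, `CPU_OF_HOST` and `mean_minutes` are illustrative only.

```python
from statistics import mean

# Run-times in minutes from the first 6.7GB table (RHEL7, NM Lustre-2.5.5).
# Keys are (CASA major version, host); values are the repeated runs.
RUNS = {
    (5, "nmpost051"): [114, 117],
    (5, "cvpost020"): [110, 111],
    (5, "nmpost038"): [144, 143],
    (5, "cvpost003"): [140, 141],
    (6, "nmpost051"): [156, 164],
    (6, "cvpost020"): [156, 158],
    (6, "nmpost038"): [200, 201],
    (6, "cvpost003"): [197, 199],
}

# CPU model of each batch host, as listed in the table header.
CPU_OF_HOST = {
    "nmpost051": "E5-2640v3",
    "cvpost020": "E5-2640v3",
    "nmpost038": "E5-2670",
    "cvpost003": "E5-2670",
}


def mean_minutes(casa, cpu):
    """Mean run-time of all NM and CV runs for one CASA version on one CPU model."""
    # Assumption: NM and CV hosts with the same CPU are pooled together.
    return mean(
        t
        for (version, host), times in RUNS.items()
        if version == casa and CPU_OF_HOST[host] == cpu
        for t in times
    )


for casa in (5, 6):
    speedup = mean_minutes(casa, "E5-2670") / mean_minutes(casa, "E5-2640v3")
    print(f"CASA-{casa}: E5-2640v3 is {speedup:.2f}x faster than E5-2670")

for cpu in ("E5-2640v3", "E5-2670"):
    slowdown = mean_minutes(6, cpu) / mean_minutes(5, cpu)
    print(f"{cpu}: CASA-6 takes {slowdown:.2f}x as long as CASA-5")
```

This prints 1.26x for the CPU comparison under both CASA versions and 1.40x for CASA-6 versus CASA-5 on both CPU models.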

Here is the full pipeline script I have used for all of these tests: casa_pipescript.py. For some tests, I commented out everything except hifv_importdata.

Full, serial pipeline with small dataset

RHEL7 - 6.7GB dataset with NM Lustre-2.5.5 (results are in minutes)

CASA	nmpost051 (E5-2640v3)	cvpost020 (E5-2640v3)	nmpost038 (E5-2670)	cvpost003 (E5-2670)
5	114, 117	110, 111	144, 143	140, 141
6	156, 164	156, 158	200, 201	197, 199

RHEL7 - 6.7GB dataset after the NM upgrade to Lustre-2.10.8; CV results copied from the previous test (results are in minutes)

CASA	NM (E5-2640v3)	CV (E5-2640v3)	NM (E5-2670)	CV (E5-2670)
5	113, 110	110, 111	142, 141	140, 141
6	155*	156, 158	198*	197, 199

"*" Means it completed with tclean() errors

Mar. 3, 2020 krowe: I re-ran the nmpost051-casa6-rhel7 test with the latest casa-pipeline-validation-17. The run-time was the same, as were the tclean() errors.

Just serial hifv_importdata() with large dataset

RHEL7 - 350GB dataset with NM Lustre-2.5.5 (results are in minutes)

CASA	nmpost051 (E5-2640v3)	cvpost020 (E5-2640v3)	nmpost038 (E5-2670)	cvpost003 (E5-2670)
5	192	196	239	251
6	328	364, 378	411, 427	453

Running just hifv_importdata() on the larger dataset (350GB) shows that nmpost nodes take about 2% to 12% less time than comparable cvpost nodes, and that CASA-6 is slower in every case, taking roughly 1.7 to 1.9 times as long as CASA-5.
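A minimal Python sketch of the NM-versus-CV comparison for the 350GB hifv_importdata() table (NM Lustre-2.5.5). It assumes the gap is measured as the percentage of the CV run-time saved on the NM node, with repeated runs averaged; that metric and the names `PAIRS` and `nm_saving_percent` are choices made for illustration, not taken from the original tests.

```python
from statistics import mean

# hifv_importdata() run-times in minutes from the 350GB / NM Lustre-2.5.5 table.
# Keys are (CASA major version, CPU model); values are (NM runs, CV runs).
PAIRS = {
    (5, "E5-2640v3"): ([192], [196]),       # nmpost051 vs cvpost020
    (5, "E5-2670"): ([239], [251]),         # nmpost038 vs cvpost003
    (6, "E5-2640v3"): ([328], [364, 378]),  # nmpost051 vs cvpost020
    (6, "E5-2670"): ([411, 427], [453]),    # nmpost038 vs cvpost003
}


def nm_saving_percent(nm_runs, cv_runs):
    """Percent of the (mean) CV run-time saved by running on the NM host."""
    nm, cv = mean(nm_runs), mean(cv_runs)
    return 100 * (cv - nm) / cv


for (casa, cpu), (nm_runs, cv_runs) in PAIRS.items():
    saving = nm_saving_percent(nm_runs, cv_runs)
    print(f"CASA-{casa} {cpu}: NM takes {saving:.1f}% less time than CV")
```

This prints 2.0%, 4.8%, 11.6% and 7.5% in table order. Measuring the gap as a speed ratio (CV/NM - 1) instead gives somewhat larger figures, e.g. about 13% for CASA-6 on the E5-2640v3 nodes.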

RHEL7 - 350GB dataset with NM Lustre-2.10.8 (results are in minutes)

CASA	nmpost051 (E5-2640v3)	cvpost020 (E5-2640v3)	nmpost048 (E5-2670)	cvpost003 (E5-2670)
5	187	196	244	251
6		364, 378		453

Full, serial pipeline with large dataset

RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA	NM (E5-2640v3)	CV (E5-2640v3)	NM (E5-2670)	CV (E5-2670)
5	3,045^	3,011^	3,431^	3,401^
6	3,640	3,466	4,511	4,392

"^" Means "SEVERE setjy No rows were selected"

Full, serial pipeline with large dataset and Profiling Metrics

RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA	NM (E5-2640v3)	CV (E5-2640v3)	NM (E5-2670)	CV (E5-2670)
5	4,095^, 2,645^	4,559^	3,397^	3,410^
6	3,527		4,442

"^" Means "SEVERE setjy No rows were selected"

Current Pipeline Script

Mar. 17, 2020: I switched to the same pipeline script that Brian is currently using.

Full, new, serial pipeline with small dataset

RHEL7 - 6.7GB dataset with NM Lustre-2.10.x (results are in minutes)

I tested a CASA-6 job with and without cf.validate_parameters = False, and both jobs took the same amount of time (+/- 1 minute).

CASA	NM (E5-2640v3)	CV (E5-2640v3)	NM (E5-2670)	CV (E5-2670)
5	95*
6	130*

"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"

Full, new, serial pipeline with large dataset

RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA	NM (E5-2640v3)	CV (E5-2640v3)	NM (E5-2670)	CV (E5-2670)
5	3,350*^	3,362*^	4,605*^	4,480*^
6	4,016*	3,943*	5,671*	5,253*

"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"

"^" Means "SEVERE setjy No rows were selected"

Full, new, serial pipeline with large dataset and profiling metrics

RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA	NM (E5-2640v3)	CV (E5-2640v3)	NM (E5-2670)	CV (E5-2670)
5	3,326*^		4,485*^
6	4,172*

"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"

"^" Means "SEVERE setjy No rows were selected"
