
Page for tracking an apparent slowdown between CASA-5 and CASA-6 for VLASS calibration (Brian Kent's issue): https://open-jira.nrao.edu/browse/PIPE-568

Comparing CASA-5 and CASA-6 (casa-pipeline-validation-8) across the two different CPUs available for batch processing in NM and CV shows that the newer CPUs (E5-2640v3) run a simple small calibration job (6.7GB) about 1.25 times faster than the old CPUs (E5-2670) with CASA-6 performing slower in every case.

...

There was no significant run-time difference between NM and CV for similar hardware and software. Results are in minutes.

Here is the full pipeline script I have used for all of these tests: casa_pipescript.py. For some tests, I commented out all but hifv_importdata.

Full, serial pipeline with small dataset

RHEL7 - 6.7GB dataset with NM Lustre-2.5.5 (results are in minutes)

CASA	nmpost051 (E5-2640v3)	cvpost020 (E5-2640v3)	nmpost038 (E5-2670)	cvpost003 (E5-2670)
5	114, 117	110, 111	144, 143	140, 141
6	156, 164	156, 158	200, 201	197, 199
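The ratios quoted above can be rechecked from this table. The following is a minimal sketch that averages the repeated runs per node and computes the old-CPU to new-CPU run-time ratio and the CASA-6 to CASA-5 ratio. The dictionary layout and function names are illustrative only and are not part of the pipeline.

```python
from statistics import mean

# Run times in minutes, copied from the RHEL7 / Lustre-2.5.5 table above.
runs = {
    5: {"nmpost051": [114, 117], "cvpost020": [110, 111],
        "nmpost038": [144, 143], "cvpost003": [140, 141]},
    6: {"nmpost051": [156, 164], "cvpost020": [156, 158],
        "nmpost038": [200, 201], "cvpost003": [197, 199]},
}

# Pair each old-CPU node (E5-2670) with the new-CPU node (E5-2640v3) at the same site.
site_pairs = [("nmpost038", "nmpost051"), ("cvpost003", "cvpost020")]


def cpu_ratio(casa, old, new):
    """Mean run time on the old CPU divided by mean run time on the new CPU."""
    return mean(runs[casa][old]) / mean(runs[casa][new])


def casa_ratio(node):
    """Mean CASA-6 run time divided by mean CASA-5 run time on one node."""
    return mean(runs[6][node]) / mean(runs[5][node])


for casa in runs:
    for old, new in site_pairs:
        print(f"CASA-{casa} {old}/{new}: {cpu_ratio(casa, old, new):.2f}")
for node in runs[5]:
    print(f"{node} CASA-6/CASA-5: {casa_ratio(node):.2f}")
```

For these numbers the CPU ratio comes out between roughly 1.24 and 1.27, and CASA-6 takes about 1.4 times as long as CASA-5 on every node.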

RHEL7 - 6.7GB dataset after the NM upgrade to Lustre-2.10.8; CV results are copied from the previous test (results are in minutes)

CASA	nmpost051 NM (E5-2640v3)	cvpost020 CV (E5-2640v3)	nmpost048 NM (E5-2670)	cvpost003 CV (E5-2670)
5	113, 110	110, 111	142, 141	140, 141
6	155*	156, 158	198*	197*, 199

...

Mar. 3, 2020 krowe: I tried the nmpost051-casa6-rhel7 test with the latest casa-pipeline-validation-17. The run-time was the same, as were the tclean() errors.

"*" Means the run completed with tclean() errors

Full, new, serial pipeline with large dataset

Mar. 17, 2020 I started using the same pipeline script that Brian is currently using.

RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA	nmpost051 NM (E5-2640v3)	cvpost020 CV (E5-2640v3)	nmpost038 NM (E5-2670)	cvpost003 CV (E5-2670)
5	192	196	239	251
6	328	364, 378	411, 427	453
5	3,350*^	3,362*^	4,605*^	4,480*^
6	4,016*	3,943*	5,671*	5,253*

"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"

"^" Means "SEVERE setjy No rows were selected"

Full, new, serial pipeline with large dataset and profiling metrics

Mar. 17, 2020 I started using the same pipeline script that Brian is currently using.

Running just hifv_importdata() on a larger data set (350GB) shows that nmpost nodes run about 2% to 10% faster than similar cvpost nodes, with CASA-6 performing slower in every case.

RHEL7 - 350GB dataset with NM Lustre-2.10.8, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA	nmpost051 NM (E5-2640v3)	cvpost020 CV (E5-2640v3)	nmpost048 NM (E5-2670)	cvpost003 CV (E5-2670)
5	187	196	244	251
6		364, 378		453

Running both hifv_importdata() and hifv_hanning().

RHEL6 - with NM Lustre-2.5.5

...

RHEL7 - with NM Lustre-2.5.5

...

Running entire pipeline with -n 8

RHEL7

...

"*" After 14 days of running the setjy task, and using Felipe's profiling metrics, I canceled the job.

Running entire pipeline with -n 9

RHEL7

5	3,326*^		4,485*^
6	4,172*		5,572*

"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"

"^" Means "SEVERE setjy No rows were selected"

Full, new serial pipeline with large dataset and times per pipeline task

Comparing two profiling jobs against one of Brian's jobs (/lustre/aoc/sciops/bkent/pipetest/llama3/workingtest60_2) on the same hardware (E5-2670) in NM. Times were calculated from the CASA logs. Times are in minutes.
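One way such per-task times might be pulled out of a log is sketched below. This is not the method used for the table that follows. It assumes every log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp and that task boundaries are marked by "Begin Task: <name>" and "End Task: <name>"; both the patterns and the sample lines are assumptions to be adjusted against the real CASA log layout.

```python
import re
from datetime import datetime

# ASSUMPTION: timestamp at the start of the line, and "Begin Task:" / "End Task:"
# markers naming the task. Adjust the pattern if the actual log layout differs.
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?(Begin|End) Task: (\w+)"
)


def task_minutes(lines):
    """Return a list of (task, minutes) in the order the tasks ended."""
    started = {}
    durations = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        kind, task = match.group(2), match.group(3)
        if kind == "Begin":
            started[task] = stamp
        elif task in started:
            delta = stamp - started.pop(task)
            durations.append((task, round(delta.total_seconds() / 60)))
    return durations


# Synthetic lines for illustration only; not taken from a real log.
sample = [
    "2020-04-08 10:00:00\tINFO\thifv_importdata::::\t##### Begin Task: hifv_importdata #####",
    "2020-04-08 10:05:00\tINFO\thifv_importdata::::\tsome intermediate message",
    "2020-04-08 10:30:00\tINFO\thifv_importdata::::\t##### End Task: hifv_importdata #####",
]
print(task_minutes(sample))
```

Keying the start times by task name lets nested tasks be timed independently, and rounding to whole minutes means per-task values may not sum exactly to a total computed from unrounded times.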

Large dataset (350GB) times are in minutes	CASA-5.6.3-9, Pipeline 43128	CASA-6.0.0.23-pipeline-validation-17, Pipeline master-v0.1-145-ge322387-dirty	CASA-6.0.0.23-pipeline-validation-17, Pipeline master-v0.1-18-g2de4d78-dirty	CASA-6.0.0.23-pipeline-validation-17, Pipeline master-v0.1-18-g2de4d78-dirty
Task	kent2-pr-c5-l-70	kent2-pr-c6-l-70	kent3b-no-c6-l-70	CASA-6 Bkent
hifv_importdata	247	425	403	392
hifv_hanning	175	188	334	460
hifv_flagdata	272	323	374	452
hifv_vlasetjy	75	199	255	357
hifv_priorcals	254	281	539	494
hifv_testBPdcals	74	84	98	123
hifv_flagbaddef	0	1	0	0
hifv_checkflag	68	70	69	69
hifv_semiFinalBPdcals	75	153	154	154
hifv_checkflag	189	254	250	253
hifv_solint	66	89	105	105
hifv_fluxboot2	104	181	185	175
hifv_finalcals	162	182	177	177
hifv_circfeedpolcal	31	33	32	32
hifv_flagcal	0	1	0	0
hifv_applycals	205	212	358	437
hifv_checkflag	1741	1840	2388	2930
hifv_statwt	645	710	812	500
hifv_plotsummary	101	346	350	350

TOTAL (minutes)	4484	5573	6884	7460
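To see where the extra CASA-6 time goes, the first two columns (kent2-pr-c5-l-70 and kent2-pr-c6-l-70) can be compared task by task. The sketch below copies those values from the table; the function name and the 10-minute cutoff for ignoring very short tasks are arbitrary choices made for illustration.

```python
# (task, CASA-5 minutes, CASA-6 minutes) for kent2-pr-c5-l-70 and kent2-pr-c6-l-70.
# hifv_checkflag runs three times, so a list of tuples is used instead of a dict.
rows = [
    ("hifv_importdata", 247, 425),
    ("hifv_hanning", 175, 188),
    ("hifv_flagdata", 272, 323),
    ("hifv_vlasetjy", 75, 199),
    ("hifv_priorcals", 254, 281),
    ("hifv_testBPdcals", 74, 84),
    ("hifv_flagbaddef", 0, 1),
    ("hifv_checkflag", 68, 70),
    ("hifv_semiFinalBPdcals", 75, 153),
    ("hifv_checkflag", 189, 254),
    ("hifv_solint", 66, 89),
    ("hifv_fluxboot2", 104, 181),
    ("hifv_finalcals", 162, 182),
    ("hifv_circfeedpolcal", 31, 33),
    ("hifv_flagcal", 0, 1),
    ("hifv_applycals", 205, 212),
    ("hifv_checkflag", 1741, 1840),
    ("hifv_statwt", 645, 710),
    ("hifv_plotsummary", 101, 346),
]


def slowdowns(rows, min_minutes=10):
    """Return (task, CASA-6/CASA-5 ratio) sorted by ratio, skipping very short tasks."""
    ratios = [(task, c6 / c5) for task, c5, c6 in rows if c5 >= min_minutes]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


total5 = sum(c5 for _, c5, _ in rows)
total6 = sum(c6 for _, _, c6 in rows)
print(f"totals: CASA-5 {total5} min, CASA-6 {total6} min")
for task, ratio in slowdowns(rows)[:5]:
    print(f"{task}: {ratio:.2f}x")
```

With these values the largest ratios belong to hifv_plotsummary, hifv_vlasetjy, hifv_semiFinalBPdcals, hifv_fluxboot2 and hifv_importdata. The summed CASA-6 column comes to 5572, one minute below the listed total, presumably because the per-task values are rounded.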

K. Scott finished three runs on Apr. 8, 2020, separated by about an hour each, using Brian's large dataset (350GB), CASA-6.0.0.23-pipeline-validation-17 and Pipeline master-v0.1-18-g2de4d78-dirty. Each job requested 1 node with 8 cores and 96GB of memory, essentially one NUMA node. system.resources.memory was unset and _cf.validate_parameters = False. (Times are in minutes)

Task	kent3a-no-c6-l-70	kent3b-no-c6-l-70	kent3c-no-c6-l-70
hifv_importdata	410	403	407
hifv_hanning	364	334	359
hifv_flagdata	381	374	386
hifv_vlasetjy	263	255	256
hifv_priorcals	513	539	511
hifv_testBPdcals	97	98	98
hifv_flagbaddef	0	0	0
hifv_checkflag	68	69	68
hifv_semiFinalBPdcals	153	154	152
hifv_checkflag	251	250	250
hifv_solint	105	105	106
hifv_fluxboot2	174	185	174
hifv_finalcals	178	177	180
hifv_circfeedpolcal	31	32	31
hifv_flagcal	0	0	0
hifv_applycals	353	358	366
hifv_checkflag	2501	2388	2302
hifv_statwt	832	812	806
hifv_plotsummary	348	350	345

TOTAL (minutes)	7023	6884	6799
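The run-to-run variation of these three otherwise identical jobs can be summarized with a short calculation. The sketch below copies the values from the table, reports the relative spread of the totals, and finds the task whose time varies most in absolute terms; the helper name `rel_spread` is illustrative.

```python
from statistics import mean

# Minutes per task for kent3a-no-c6-l-70, kent3b-no-c6-l-70, kent3c-no-c6-l-70.
rows = [
    ("hifv_importdata", (410, 403, 407)),
    ("hifv_hanning", (364, 334, 359)),
    ("hifv_flagdata", (381, 374, 386)),
    ("hifv_vlasetjy", (263, 255, 256)),
    ("hifv_priorcals", (513, 539, 511)),
    ("hifv_testBPdcals", (97, 98, 98)),
    ("hifv_flagbaddef", (0, 0, 0)),
    ("hifv_checkflag", (68, 69, 68)),
    ("hifv_semiFinalBPdcals", (153, 154, 152)),
    ("hifv_checkflag", (251, 250, 250)),
    ("hifv_solint", (105, 105, 106)),
    ("hifv_fluxboot2", (174, 185, 174)),
    ("hifv_finalcals", (178, 177, 180)),
    ("hifv_circfeedpolcal", (31, 32, 31)),
    ("hifv_flagcal", (0, 0, 0)),
    ("hifv_applycals", (353, 358, 366)),
    ("hifv_checkflag", (2501, 2388, 2302)),
    ("hifv_statwt", (832, 812, 806)),
    ("hifv_plotsummary", (348, 350, 345)),
]
totals = (7023, 6884, 6799)  # TOTAL row as listed in the table


def rel_spread(values):
    """(max - min) divided by the mean of the values."""
    return (max(values) - min(values)) / mean(values)


print(f"total spread: {rel_spread(totals):.1%}")

# Task with the largest absolute difference between its fastest and slowest run.
task, values = max(rows, key=lambda row: max(row[1]) - min(row[1]))
print(f"most variable task: {task} {values}, range {max(values) - min(values)} min")
```

The totals differ by about 3%, and most of that difference comes from the final hifv_checkflag call, whose range of 199 minutes accounts for the bulk of the 224-minute range in the totals.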

...
