Track Brian Kent's issue. Page for tracking an apparent slowdown of CASA-6 relative to CASA-5 for VLASS calibration: https://open-jira.nrao.edu/browse/PIPE-568

Comparing CASA-5 and CASA-6 (casa-pipeline-validation-8) across the two different CPUs available for batch processing in NM and CV shows that the newer CPUs (E5-2640v3) run a simple small calibration job (6.7GB) about 1.25 times faster than the old CPUs (E5-2670) with CASA-6 performing slower in every case.  There was no significant run-time difference between NM and CV for similar hardware and software.  Results are in minutes.
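For example, in the CASA-5 row of the first table below, nmpost038 (E5-2670) took 144 minutes versus 114 minutes on nmpost051 (E5-2640v3), a ratio of roughly 1.26.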

Here is the full pipeline script I have used for all of these tests: casa_pipescript.py. For some tests, I commented out all but hifv_importdata.
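
For orientation, here is a minimal sketch of the hifv_* task sequence such a script runs, based on the per-task tables further down this page. The dataset name is a placeholder, task parameters are omitted, and the script is assumed to be executed inside a pipeline-enabled CASA session where the h_* and hifv_* tasks are already defined; the actual casa_pipescript.py may differ in detail.

    # Sketch of a VLASS calibration pipeline script (not the exact script used here).
    __rethrow_casa_exceptions = True       # surface task errors instead of swallowing them
    context = h_init()                     # start a new pipeline context
    try:
        hifv_importdata(vis=['mydata.sdm'])   # placeholder dataset name
        hifv_hanning()
        hifv_flagdata()
        hifv_vlasetjy()
        hifv_priorcals()
        hifv_testBPdcals()
        hifv_flagbaddef()
        hifv_checkflag()
        hifv_semiFinalBPdcals()
        hifv_checkflag()        # repeated stages may use different checkflagmode values
        hifv_solint()
        hifv_fluxboot2()
        hifv_finalcals()
        hifv_circfeedpolcal()
        hifv_flagcal()
        hifv_applycals()
        hifv_checkflag()        # this pass dominates the run time in the tables below
        hifv_statwt()
        hifv_plotsummary()
    finally:
        h_save()                # persist the pipeline context even on failure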

Full, serial pipeline with small dataset

RHEL7 - 6.7GB dataset with NM Lustre-2.5.5 (results are in minutes with hh:mm in parentheses)

CASA | nmpost051 (E5-2640v3)    | cvpost020 (E5-2640v3)    | nmpost038 (E5-2670)      | cvpost003 (E5-2670)
5    | 114 (1:54), 117 (1:57)   | 110 (1:50), 111 (1:51)   | 144 (2:22), 143 (2:23)   | 140 (2:20), 141 (2:21)
6    | 156 (2:36)*, 164 (2:44)* | 156 (2:36)*, 158 (2:38)* | 200 (3:20)*, 201 (3:21)* | 197 (3:17)*, 199 (3:19)

Running just hifv_importdata() on a larger data set (350GB) shows that nmpost nodes run about 2% to 10% faster than similar cvpost nodes with CASA-6 performing slower in every case.
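
For those import-only tests, the script above was trimmed so that only the first task runs; a minimal sketch, again with a placeholder dataset name:

    # Import-only timing run: everything after hifv_importdata is commented out.
    context = h_init()
    try:
        hifv_importdata(vis=['large_350GB_dataset.sdm'])   # placeholder name
        # hifv_hanning()
        # ... remaining hifv_* tasks stay commented out ...
    finally:
        h_save()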

RHEL7 (Times are in minutes with hh:mm in parentheses)



RHEL7 - 6.7GB dataset after the NM Lustre upgrade to 2.10.8; CV results copied from the last test (results are in minutes)

CASA | nmpost051 NM (E5-2640v3) | cvpost020 CV (E5-2640v3) | nmpost038 NM (E5-2670) | cvpost003 CV (E5-2670)
5    | 192 (3:12)               | 196 (3:16)               | 239 (3:59)             | 251 (4:11)
6    | 328 (5:28)               | 364 (6:04), 378 (6:18)   | 411 (6:51), 427 (7:07) | 453 (7:33)

Running both hifv_importdata() and hifv_hanning().

RHEL6

CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost038 (E5-2670) | cvpost003 (E5-2670)
5    | 113, 110              | 110, 111              | 142, 141            | 140, 141
6    | 155*                  | 156*, 158*            | 198*                | 197*, 199*

Mar. 3, 2020 krowe: I tried nmpost051-casa6-rhel7 with the latest casa-pipeline-validation-17. The run-time was the same, as were the tclean() errors.

"*" Means it completed with tclean() errors


Full, new, serial pipeline with large dataset

Mar. 17, 2020 I started using the same pipeline script that Brian is currently using.

RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA | nmpost051 NM (E5-2640v3) | cvpost020 CV (E5-2640v3) | nmpost038 NM (E5-2670) | cvpost003 CV (E5-2670)
5    | 344 (5:44)               |                          |                        |

RHEL7 (Times are in minutes with hh:mm in parentheses)

5 | 3,350*^ | 3,362*^ | 4,605*^ | 4,480*^
6 | 4,016*  | 3,943*  | 5,671*  | 5,253*

"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"

"^" Means "SEVERE setjy No rows were selected"


Full, new, serial pipeline with large dataset and profiling metrics

Mar. 17, 2020 I started using the same pipeline script that Brian is currently using.

RHEL7 - 350GB dataset with NM Lustre-2.10.x, CASA-pipeline-5.6.3-9 or CASA 6.0.0.23a100.dev17 (results are in minutes)

CASA | nmpost051 NM (E5-2640v3) | cvpost020 CV (E5-2640v3) | nmpost038 NM (E5-2670) | cvpost003 CV (E5-2670)
5    | 343 (5:43)               | 413 (6:53)               |                        |
6    |                          |                          |                        |

Running entire pipeline with -n 8 (times are in minutes)

RHEL6

...

RHEL7 (Times are in minutes with hh:mm in parentheses)

...

Running entire pipeline with -n 9 (times are in minutes)

RHEL7 (Times are in minutes with hh:mm in parentheses)

5 | 3,326*^ [profiling plot attached] | 4,485*^ [profiling plot attached]
6 | 4,172* [profiling plot attached]  | 5,572* [profiling plot attached]

"*" Means "SEVERE pipeline.hifv.tasks.flagging No flag summary statistics"

"^" Means "SEVERE setjy No rows were selected"


Full, new serial pipeline with large dataset and times per pipeline task

Comparing two profiling jobs against one of Brian's jobs (/lustre/aoc/sciops/bkent/pipetest/llama3/workingtest60_2) on the same hardware (E5-2670) in NM.  Times were calculated from the CASA logs.  Times are in minutes.
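
As a rough illustration of how per-task times can be derived from a CASA log, here is a minimal sketch. It assumes CASA's usual tab-separated log prefix (timestamp, severity, origin) and simply differences the first and last timestamps that mention each task name; repeated stages such as hifv_checkflag would need stage-by-stage bookkeeping, and the numbers in the tables below were not produced with this exact code.

    # Hedged sketch: estimate per-task wall-clock minutes from a CASA log.
    import re
    from datetime import datetime

    # Assumed log prefix: "YYYY-MM-DD HH:MM:SS<TAB>SEVERITY<TAB>origin ..."
    LINE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\t\S+\t(\S+)')

    TASKS = ('hifv_importdata', 'hifv_hanning', 'hifv_flagdata', 'hifv_statwt')
    # ...extend with the rest of the hifv_* tasks as needed.

    def task_minutes(logfile):
        """Return {task: minutes} from the first/last log line mentioning each task."""
        first, last = {}, {}
        with open(logfile) as fh:
            for line in fh:
                m = LINE.match(line)
                if not m:
                    continue
                stamp = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
                origin = m.group(2)
                for task in TASKS:
                    if task in origin:
                        first.setdefault(task, stamp)
                        last[task] = stamp
        return {t: (last[t] - first[t]).total_seconds() / 60.0 for t in first}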

Large dataset (350GB) times are in minutes

Software versions for the four table columns, in order:

CASA-5.6.3-9, Pipeline 43128
CASA-6.0.0.23-pipeline-validation-17, Pipeline master-v0.1-145-ge322387-dirty
CASA-6.0.0.23-pipeline-validation-17, Pipeline master-v0.1-18-g2de4d78-dirty
CASA-6.0.0.23-pipeline-validation-17, Pipeline master-v0.1-18-g2de4d78-dirty

Task                  | kent2-pr-c5-l-70 | kent2-pr-c6-l-70 | kent3b-no-c6-l-70 | CASA-6 Bkent
hifv_importdata       |  247 |  425 |  403 |  392
hifv_hanning          |  175 |  188 |  334 |  460
hifv_flagdata         |  272 |  323 |  374 |  452
hifv_vlasetjy         |   75 |  199 |  255 |  357
hifv_priorcals        |  254 |  281 |  539 |  494
hifv_testBPdcals      |   74 |   84 |   98 |  123
hifv_flagbaddef       |    0 |    1 |    0 |    0
hifv_checkflag        |   68 |   70 |   69 |   69
hifv_semiFinalBPdcals |   75 |  153 |  154 |  154
hifv_checkflag        |  189 |  254 |  250 |  253
hifv_solint           |   66 |   89 |  105 |  105
hifv_fluxboot2        |  104 |  181 |  185 |  175
hifv_finalcals        |  162 |  182 |  177 |  177
hifv_circfeedpolcal   |   31 |   33 |   32 |   32
hifv_flagcal          |    0 |    1 |    0 |    0
hifv_applycals        |  205 |  212 |  358 |  437
hifv_checkflag        | 1741 | 1840 | 2388 | 2930
hifv_statwt           |  645 |  710 |  812 |  500
hifv_plotsummary      |  101 |  346 |  350 |  350
TOTAL (minutes)       | 4484 | 5573 | 6884 | 7460




K. Scott finished three runs on Apr. 8, 2020 using Brian's large dataset (350GB), CASA-6.0.0.23-pipeline-validation-17, and Pipeline master-v0.1-18-g2de4d78-dirty, separated by about an hour each. Each job requested 1 node with 8 cores and 96 GB of memory (essentially a NUMA node). system.resources.memory was unset and _cf.validate_parameters = False. (Times are in minutes)

Task                  | kent3a-no-c6-l-70 | kent3b-no-c6-l-70 | kent3c-no-c6-l-70
hifv_importdata       |  410 |  403 |  407
hifv_hanning          |  364 |  334 |  359
hifv_flagdata         |  381 |  374 |  386
hifv_vlasetjy         |  263 |  255 |  256
hifv_priorcals        |  513 |  539 |  511
hifv_testBPdcals      |   97 |   98 |   98
hifv_flagbaddef       |    0 |    0 |    0
hifv_checkflag        |   68 |   69 |   68
hifv_semiFinalBPdcals |  153 |  154 |  152
hifv_checkflag        |  251 |  250 |  250
hifv_solint           |  105 |  105 |  106
hifv_fluxboot2        |  174 |  185 |  174
hifv_finalcals        |  178 |  177 |  180
hifv_circfeedpolcal   |   31 |   32 |   31
hifv_flagcal          |    0 |    0 |    0
hifv_applycals        |  353 |  358 |  366
hifv_checkflag        | 2501 | 2388 | 2302
hifv_statwt           |  832 |  812 |  806
hifv_plotsummary      |  348 |  350 |  345
TOTAL (minutes)       | 7023 | 6884 | 6799

...