Track Brian Kent's issue.
Comparing CASA-5 and CASA-6 (casa-pipeline-validation-8) on the two CPU types available for batch processing in NM and CV shows that the newer CPUs (E5-2640v3) run a small calibration job (6.7GB) about 1.25 times faster than the older CPUs (E5-2670). CASA-6 was slower than CASA-5 in every case. There was no significant run-time difference between NM and CV for similar hardware and software.
Full, serial pipeline with small dataset
RHEL7 - 6.7GB dataset with NM Lustre-2.5.5
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost038 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 114, 117 | 110, 111 | 144, 143 | 140, 141 |
6 | 156*, 164* | 156*, 158* | 200*, 201* | 197*, 199* |
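
As a rough cross-check of the "about 1.25 times faster" figure in the summary, the sketch below averages the repeated runs from the table above and divides the E5-2670 time by the E5-2640v3 time at each site. The variable and function names are illustrative, the pairing of nodes by site is an assumption taken from the column headers, and the times are used in whatever unit the table reports.

```python
from statistics import mean

# Run times copied from the table above (unit as reported there).
# Keys are (CASA major version, node name).
runs = {
    ("5", "nmpost051"): [114, 117],
    ("5", "cvpost020"): [110, 111],
    ("5", "nmpost038"): [144, 143],
    ("5", "cvpost003"): [140, 141],
    ("6", "nmpost051"): [156, 164],
    ("6", "cvpost020"): [156, 158],
    ("6", "nmpost038"): [200, 201],
    ("6", "cvpost003"): [197, 199],
}

# Assumed pairing: newer-CPU node -> older-CPU node at the same site.
old_node_for = {"nmpost051": "nmpost038", "cvpost020": "cvpost003"}


def cpu_speedup(casa, new_node):
    """Mean time on the older CPU divided by mean time on the newer CPU."""
    old_node = old_node_for[new_node]
    return mean(runs[(casa, old_node)]) / mean(runs[(casa, new_node)])


def casa6_slowdown(node):
    """Mean CASA-6 time divided by mean CASA-5 time on the same node."""
    return mean(runs[("6", node)]) / mean(runs[("5", node)])


speedups = {
    (casa, node): round(cpu_speedup(casa, node), 2)
    for casa in ("5", "6")
    for node in old_node_for
}
slowdowns = {
    node: round(casa6_slowdown(node), 2)
    for node in ("nmpost051", "cvpost020", "nmpost038", "cvpost003")
}

for key, value in speedups.items():
    print("CPU speedup", key, value)
for node, value in slowdowns.items():
    print("CASA-6 / CASA-5", node, value)
```

With these numbers the CPU speedup falls between 1.24 and 1.27 for all four combinations, and CASA-6 takes roughly 1.4 times as long as CASA-5 on every node.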
RHEL7 - 6.7GB dataset after NM upgrade to Lustre-2.10.8 (CV results copied from the previous test)
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost048 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 113 | 110, 111 | 142 | 140, 141 |
6 | 155* | 156*, 158* | | 197*, 199* |
"*" means the run completed with tclean() errors
Just serial hifv_importdata() with large dataset
RHEL7 - 350GB dataset with NM Lustre-2.5.5
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost038 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 192 | 196 | 239 | 251 |
6 | 328 | 364, 378 | 411, 427 | 453 |
Running just hifv_importdata() on a larger dataset (350GB) shows that nmpost nodes run about 2% to 10% faster than comparable cvpost nodes. CASA-6 was again slower than CASA-5 in every case.
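
The NM versus CV comparison can be reproduced from the 350GB Lustre-2.5.5 table with a short calculation. This is a sketch only: the names are illustrative, repeated runs are averaged (an assumption about how to combine them), and nodes are paired by CPU model as in the column headers.

```python
from statistics import mean

# hifv_importdata() run times from the 350GB Lustre-2.5.5 table
# (unit as reported there). Keys are (CASA major version, node name).
import_runs = {
    ("5", "nmpost051"): [192],
    ("5", "cvpost020"): [196],
    ("5", "nmpost038"): [239],
    ("5", "cvpost003"): [251],
    ("6", "nmpost051"): [328],
    ("6", "cvpost020"): [364, 378],
    ("6", "nmpost038"): [411, 427],
    ("6", "cvpost003"): [453],
}

# Assumed pairing of NM and CV nodes that share a CPU model.
cpu_pairs = {
    "E5-2640v3": ("nmpost051", "cvpost020"),
    "E5-2670": ("nmpost038", "cvpost003"),
}


def nm_advantage_percent(casa, cpu):
    """Percentage by which the NM node's mean time is below the CV node's."""
    nm_node, cv_node = cpu_pairs[cpu]
    nm_time = mean(import_runs[(casa, nm_node)])
    cv_time = mean(import_runs[(casa, cv_node)])
    return 100.0 * (cv_time - nm_time) / cv_time


advantage = {
    (casa, cpu): round(nm_advantage_percent(casa, cpu), 1)
    for casa in ("5", "6")
    for cpu in cpu_pairs
}

for key, value in advantage.items():
    print(key, f"{value}%")
```

Averaging the repeated runs gives differences from about 2% (CASA-5, E5-2640v3) to about 12% (CASA-6, E5-2640v3), in the same range as the figure quoted above; the upper end depends on which of the two cvpost020 CASA-6 runs is used.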
RHEL7 - 350GB dataset with NM Lustre-2.10.8
CASA | nmpost051 (E5-2640v3) | cvpost020 (E5-2640v3) | nmpost048 (E5-2670) | cvpost003 (E5-2670) |
---|---|---|---|---|
5 | 187 | 196 | 244 | 251 |
6 | | 364, 378 | | 453 |
Both serial hifv_importdata() and hifv_hanning().
RHEL6 - with NM Lustre-2.5.5
RHEL7 - with NM Lustre-2.5.5
Full parallel pipeline with -n 8
RHEL7
"*" After 14 days of running the setjy task, and using Felipe's profiling metrics, I canceled the job.
Full parallel pipeline with -n 9
RHEL7
"*" means the run completed with pbcor errors