This page presents results; the description of the tests is on the parent page.
Calibration pipeline
Goal: Test the behavior of Tier0 parallelization of calibrator imaging in the calibration pipeline.
Table: Runtime of the calibration pipeline vs. CASA version, 8-way parallel runs (unless stated otherwise) with a 256 GB memory limit.
Project | casa-pipeline-release-5.6.1-8.el7 | casa-6.1.0-63 | casa-CAS-9386-53 | casa-CAS-9386-53_2MPI |
2017.1.00717.S | 28h51m11s | 32h14m43s | 31h37m52s | 33h09m28s |
2017.1.00750.T | 05h17m12s | 04h52m57s | 04h50m23s | 04h42m11s |
2017.1.00884.S | 09h15m26s | 08h46m16s | 08h43m33s | 08h36m50s |
2017.1.00983.S | 55h41m06s | 51h29m49s | 51h27m05s | 58h05m08s |
2017.1.01214.S | 28h54m29s | 22h41m17s | 22h39m23s | 20h49m49s |
E2E6.1.00080.S | 14h01m18s | 13h14m39s | 13h18m04s | 14h25m03s |
E2E6.1.00092.S | 61h26m59s | 65h46m20s | 66h13m38s | 70h14m20s |
No obvious issues in calibration; the slight increase in runtime is not unexpected given the tclean() runtime change. Some behaviors could be investigated further, but there are no current plans to do so given competing time demands.
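For reference, the runtime differences can be quantified by converting the table entries to seconds. The sketch below does this for one row; parse_runtime is our own helper (not part of the pipeline), and the values are copied from the 2017.1.00717.S row above.

```python
import re

def parse_runtime(s):
    """Convert a 'HHhMMmSSs' runtime string into seconds."""
    h, m, sec = map(int, re.fullmatch(r"(\d+)h(\d+)m(\d+)s", s).groups())
    return h * 3600 + m * 60 + sec

# 2017.1.00717.S: casa-pipeline-release-5.6.1-8.el7 vs casa-CAS-9386-53
baseline = parse_runtime("28h51m11s")
refactor = parse_runtime("31h37m52s")
change = 100.0 * (refactor - baseline) / baseline
print(f"runtime change: {change:+.1f}%")
```

For this row the change is just under 10%, i.e. a "slight increase" relative to the total pipeline runtime.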
Imaging pipeline
Goal: Demonstrate that the refactored code has the desired memory footprint effect.
Table: Memory footprint (GB) of entire pipeline run vs. CASA version, 8-way parallel runs with a 128 GB memory limit.
Project | casa-pipeline-release-5.6.1-8.el7 | casa-6.1.0-63 | casa-CAS-9386-53 |
2017.1.00717.S | 15.96028519 | 15.35385895 | 17.16140747 |
2017.1.00750.T | 5.300235748 | 3.992595673 | 4.630950928 |
2017.1.00884.S | 48.88618088 | 53.2240715 | 79.35606766 |
2017.1.00983.S | 50.89418411 | 50.69113159 | 59.88224411 |
2017.1.01214.S | 20.72197723 | 20.30682755 | 24.82666397 |
E2E6.1.00080.S | 47.14878082 | 45.94787979 | 49.49978256 |
E2E6.1.00092.S | 23.10336304 | 22.3993721 | 57.35647202 |
Memory footprint is systematically higher for the refactored case, which is the opposite of what was expected. Potentially this reflects an efficiency improvement in the chanchunk estimation rather than the actual natural unconstrained limit, which would suggest that memory usage would increase as available memory increases.
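The size of the effect can be read off directly as refactor-to-baseline ratios; a minimal sketch using values (GB, rounded) for casa-6.1.0-63 vs casa-CAS-9386-53 from the table above:

```python
# (casa-6.1.0-63 GB, casa-CAS-9386-53 GB), rounded from the table above
footprints = {
    "2017.1.00717.S": (15.35, 17.16),
    "2017.1.00884.S": (53.22, 79.36),
    "E2E6.1.00092.S": (22.40, 57.36),
}
for project, (baseline, refactor) in footprints.items():
    # Ratio > 1 means the refactored code used more memory
    print(f"{project}: {refactor / baseline:.2f}x")
```

The ratios range from roughly 1.1x to 2.6x, so the increase is not a constant offset across data sets.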
Goal: Demonstrate the runtime cost of the refactored code and determine whether the overhead is fixed, so that its contribution goes to zero for larger data sets, or whether it scales with image complexity.
Table: Runtime of the imaging pipeline vs. CASA version, 8-way parallel runs with a 128 GB memory limit.
Project | casa-pipeline-release-5.6.1-8.el7 | casa-6.1.0-63 | casa-CAS-9386-53 |
2017.1.00717.S | 18h36m15s | 17h21m13s | 15h43m25s |
2017.1.00750.T | 06h41m32s | 05h30m07s | 02h55m08s |
2017.1.00884.S | 07h17m28s | 06h45m21s | 05h50m07s |
2017.1.00983.S | 110h33m48s | 117h53m42s | 276h24m43s |
2017.1.01214.S | 06h46m09s | 06h11m58s | 05h31m04s |
E2E6.1.00080.S | 35h22m22s | 35h16m56s | 29h31m27s |
E2E6.1.00092.S | 126h30m37s | 130h18m17s | 53h35m03s |
Several unanticipated or difficult-to-explain behaviors were seen in the imaging pipeline:
Why the significant (>2x) refactor runtime increase with 2017.1.00983.S?
Why the significant decrease with E2E6.1.00092.S?
Why the systematic slight decrease in runtime with all other data sets? A slight increase in runtime was expected.
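The first two effects can be quantified from the runtime table above (casa-6.1.0-63 vs casa-CAS-9386-53); a quick sketch, with parse_runtime as our own helper:

```python
import re

def parse_runtime(s):
    """Convert a 'HHhMMmSSs' runtime string into seconds."""
    h, m, sec = map(int, re.fullmatch(r"(\d+)h(\d+)m(\d+)s", s).groups())
    return h * 3600 + m * 60 + sec

# Values copied from the imaging-pipeline runtime table above
slowdown_983 = parse_runtime("276h24m43s") / parse_runtime("117h53m42s")
speedup_092 = parse_runtime("130h18m17s") / parse_runtime("53h35m03s")
print(f"2017.1.00983.S refactor slowdown: {slowdown_983:.2f}x")
print(f"E2E6.1.00092.S refactor speedup:  {speedup_092:.2f}x")
```

Both effects work out to roughly 2.3-2.4x in opposite directions.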
For 2017.1.00983.S the two most likely culprits are the time per major cycle and the number of major cycles.
The average major-cycle time (plus major-to-minor-cycle transition time) per tclean call varies, with a bias toward the refactored code taking longer, as expected, but not at the 2x-plus level. In some cases it is faster, which is yet to be explained.
There is a significant increase in the number of major cycles for SPWs 29 and 31 for both targets. So for 2017.1.00983.S the increase in pipeline runtime comes from a slight increase in per-major-cycle time and a significant increase in the number of major cycles for two SPWs. The latter probably warrants further examination; real-world data sets could see significantly longer run times due to convergence issues with the new cube stopping criteria.
For E2E6.1.00092.S, there are three possible explanations: decreased major-cycle runtime (which shouldn't be possible), a decreased number of major cycles, or parallelization effects.
The above plot shows much less runtime per major cycle for the refactored code, about 5x less, which would be consistent with serial vs. parallel execution with 8 engines.
The number of major cycles per tclean call is consistent across versions.
The Ganglia plots below show the CPU load of cube imaging for casa-6.1.0-63 and casa-CAS-9386-53.el7, respectively.
The casa5 and casa6 logs show tclean is explicitly called with parallel=False, while the refactored run uses parallel=True.
The I/O wait at 15:50 is most likely a local Lustre response issue.
Lastly, the log-scale plot of imaging pipeline runtime shows improvement that is inverted relative to expectation.
Below is a plot of the number of major-cycle references per CASA version per data set (excluding E2E6.1.00092.S and 2017.1.00750.T because of parallelization). This covers *all* tclean calls for all imaging cases in the pipeline.