This page is for tracking the testing of the CASA 6 cube refactor documented here:
https://docs.google.com/document/d/1Jv-t_4k5Vv1Rgh_eYeU4j84UWrrnO3d6INLBFtuMD6s/edit?usp=sharing
There are three primary goals to our tests:
- Test the behavior of Tier0 parallelization of calibrator imaging in the calibration pipeline (provides CASA 6-based calibrated MSes as a side effect for the imaging runs)
- Demonstrate that the refactored code has the desired memory footprint effect. We'll start with the referenced data set and then expand to larger data sets.
- Demonstrate the runtime cost of the refactored code, and whether it is a fixed overhead (so its relative contribution goes to zero for larger data sets) or an overhead that scales with image complexity
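One way to tell a fixed overhead from a scaling one is to fit a straight line to the per-data-set overhead (refactor runtime minus baseline runtime) against the baseline runtime. The sketch below is only an illustration: the function name `fit_overhead` and all runtime numbers are hypothetical placeholders, not measurements from these tests. A slope near zero with a positive intercept suggests a fixed overhead; a clearly positive slope suggests the overhead grows with the size of the job.

```python
def fit_overhead(baseline_hours, refactor_hours):
    """Least-squares fit of overhead = intercept + slope * baseline.

    Hypothetical helper for illustration; not part of CASA or the pipeline.
    """
    overheads = [r - b for b, r in zip(baseline_hours, refactor_hours)]
    n = len(baseline_hours)
    mean_x = sum(baseline_hours) / n
    mean_y = sum(overheads) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline_hours)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(baseline_hours, overheads))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Placeholder runtimes in hours (made up, NOT results from this page).
baseline = [1.0, 2.0, 4.0, 8.0]

# Case A: every run is 0.5 h slower, i.e. a fixed overhead.
fixed_intercept, fixed_slope = fit_overhead(baseline, [1.5, 2.5, 4.5, 8.5])

# Case B: every run is 10% slower, i.e. an overhead that scales.
scaled_intercept, scaled_slope = fit_overhead(baseline, [1.1, 2.2, 4.4, 8.8])

print(f"fixed case:  intercept={fixed_intercept:.2f} h, slope={fixed_slope:.2f}")
print(f"scaled case: intercept={scaled_intercept:.2f} h, slope={scaled_slope:.2f}")
```

With only five to seven data sets, a fit like this is indicative at best, and baseline runtime is just a stand-in for image complexity.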
Results for the tests described in this page are shown in the results page:
Phase 1, calibrator imaging tests run vs hifacal.py (8 way parallelization unless stated otherwise) (runs located at /lustre/aoc/sciops/fmadsen/tests/tclean_cube_refactor/<casa version>/calibration/<project>/working)
ALMA dataset (project) | casa-pipeline-release-5.6.1-8.el7 pipeline rev. 42866 (hifacal.py) | casa-6.1.0-63 Pipeline master-v0.1-143-g6f5b3d8 (hifacal.py) | casa-CAS-9386-51 (CASA 6.1.0.54a9386.dev51) Pipeline master-v0.1-145-ge322387-dirty (hifacal.py) | casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53) Pipeline master-v0.1-143-g6f5b3d8 (hifacal.py) | casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53) Pipeline master-v0.1-143-g6f5b3d8 (hifacal.py) 2 way parallelization |
---|---|---|---|---|---|
2017.1.00717.S | complete | complete | complete (local run) | complete | complete |
2017.1.01214.S | complete | complete | complete (local run) | complete | complete |
2017.1.00884.S | complete | complete | complete (local run) | complete | complete |
E2E6.1.00080.S | complete | complete | complete (local run) | complete | complete |
2017.1.00983.S | complete | complete | complete (local run) | complete | complete |
2017.1.00750.T | complete | complete | - | complete | complete |
E2E6.1.00092.S | complete | complete | - | complete | complete |
For all tests below, record the tclean parameters and telemetry data for each of the 3 tclean calls.
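As a rough illustration of what recording parameters and telemetry per call could look like, the sketch below wraps a callable and stores its keyword arguments, wall-clock time, and peak resident memory. This is an assumption-laden sketch, not the mechanism used for these tests: `run_with_telemetry` and `fake_tclean` are hypothetical names, the parameter values are placeholders, and `resource.getrusage` only reports the calling process, so it would not capture memory used by separate parallel worker processes.

```python
import json
import resource  # Unix only
import time


def run_with_telemetry(label, func, **params):
    """Call func(**params) and return (result, telemetry record).

    Hypothetical helper for illustration; not part of CASA or the pipeline.
    """
    start = time.perf_counter()
    result = func(**params)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the peak resident set size of this process only
    # (kilobytes on Linux, bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    record = {
        "label": label,
        "params": params,
        "wall_seconds": elapsed,
        "peak_rss": peak_rss,
    }
    return result, record


def fake_tclean(**params):
    """Stand-in for a real imaging call; just returns the parameter names."""
    return sorted(params)


# Placeholder parameter values, chosen only to make the example run.
result, record = run_with_telemetry(
    "call_1", fake_tclean, specmode="cube", niter=0
)
print(json.dumps(record, indent=2))
```

Writing one such record per tclean call to a JSON file would make it straightforward to compare the same call across CASA versions.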
Run each data set generated by the standard ALMA imaging pipeline through the following 3 CASA revisions. All tests run with 8-way parallelization and a 128 GB memory limit. All tests run within AWS.
ALMA dataset (project) | casa-pipeline-release-5.6.1-8.el7 pipeline rev. 42866 hifatargets.py | casa-6.1.0-63 Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py | casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53) Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py |
---|---|---|---|
2017.1.00717.S | complete (aws); complete (local run) | complete | complete |
2017.1.01214.S | complete (aws); complete (local run) | complete | complete |
2017.1.00884.S | complete (aws); complete (local run) | complete | complete |
E2E6.1.00080.S | complete (aws); complete (local run) | complete | complete (dev66) |
2017.1.00983.S | complete (aws); complete (local run) | complete | complete |
The following data sets will run on NRAO clusters with 8-way parallelization and a 128 GB memory limit, some as a check against AWS runs. Of the 5 test data sets, 2017.1.00884.S has the highest memory footprint and 2017.1.00983.S is the longest running.
ALMA dataset (project) | casa-pipeline-release-5.6.1-8.el7 pipeline rev. 42866 hifatargets.py | casa-6.1.0-63 Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py | casa-CAS-9386-66 (CASA 6.1.0.54a9386.dev66) (dev53 marked with *) Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py | casa-CAS-9386-66 (CASA 6.1.0.54a9386.dev66) Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py NVME |
---|---|---|---|---|
2017.1.00750.T* | complete | complete | complete | - |
E2E6.1.00092.S* | complete | complete | complete | - |
2017.1.00884.S | complete | complete | complete; previously failed (segfault; imageprecheck) | complete |
2017.1.00983.S | testing | failed (segfault during cube imaging); rerunning | testing | testing |
The following tests vary the memory environment for each data set, all using the casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53) refactor code. All tests run within AWS.
ALMA dataset (project) | 128 GB memory 8 way parallelization hifatargets.py | 256 GB memory 8 way parallelization hifatargets.py | 512 GB memory 8 way parallelization hifatargets.py |
---|---|---|---|
2017.1.00717.S | complete | complete | complete |
2017.1.01214.S | complete | complete | complete |
2017.1.00884.S | complete | complete | complete |
E2E6.1.00080.S | complete (dev66) | complete | complete |
2017.1.00983.S | complete | complete | complete |
As a control, the following two data sets will be run on NRAO clusters as a check against AWS runs. Of the 5 test data sets, 2017.1.00884.S has the highest memory footprint and 2017.1.00983.S is the longest running.
ALMA dataset (project) | 128 GB memory 8 way parallelization | 128 GB memory 8 way parallelization NVME | 256 GB memory 8 way parallelization | 512 GB memory 8 way parallelization |
---|---|---|---|---|
2017.1.00884.S | complete; previously failed (segfault; imageprecheck) | complete | failed (userlock) | failed (userlock) |
2017.1.00983.S | not started (due to other runs failing) | - | failed (userlock) | failed (userlock) |