...

  1. Test the behavior of Tier0 parallelization of calibrator imaging in the calibration pipeline (this provides CASA6-based calibrated MSes as a side effect for the imaging run). Results
  2. Demonstrate that the refactored code has the desired effect on memory footprint. We'll start with the referenced data set and then expand to larger data sets. Results
  3. Demonstrate the runtime cost of the refactored code, and whether it is a fixed overhead, so that its relative contribution goes to zero for larger data sets, or whether the overhead scales with image complexity. Results
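
One way to tell a fixed overhead from a scaling one (item 3) is to fit the refactored runtimes against the baseline runtimes with a straight line. The sketch below is illustrative only: the function names are hypothetical, and the runtimes are made-up numbers, not measurements from these tests. The intercept estimates the fixed overhead and the slope estimates the scaling factor.

```python
def fit_overhead(baseline_s, refactored_s):
    """Least-squares fit of refactored = fixed + scale * baseline.

    Returns (fixed, scale): 'fixed' is the constant overhead in seconds,
    'scale' is the multiplicative factor (1.0 means no scaling overhead).
    """
    n = len(baseline_s)
    if n < 2 or n != len(refactored_s):
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(baseline_s) / n
    mean_y = sum(refactored_s) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline_s)
    if sxx == 0:
        raise ValueError("baseline runtimes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(baseline_s, refactored_s))
    scale = sxy / sxx
    fixed = mean_y - scale * mean_x
    return fixed, scale


def relative_overhead(fixed, scale, baseline_runtime_s):
    """Overhead as a fraction of the baseline runtime."""
    extra = fixed + (scale - 1.0) * baseline_runtime_s
    return extra / baseline_runtime_s


# Made-up runtimes in seconds (NOT results from this test campaign):
# every refactored run is exactly 30 s slower than its baseline.
baseline = [100.0, 200.0, 400.0]
refactored = [130.0, 230.0, 430.0]

fixed, scale = fit_overhead(baseline, refactored)
small = relative_overhead(fixed, scale, 100.0)
large = relative_overhead(fixed, scale, 400.0)
```

With a purely fixed overhead the slope stays near 1 and the relative overhead shrinks as the data set grows; a slope clearly above 1 would instead point to an overhead that scales with the size of the job.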


Results for the tests described in this page are shown in the results page:

Children Display


Phase 1: calibrator imaging tests run vs hifacal.py (8-way parallelization unless stated otherwise). Runs are located at /lustre/aoc/sciops/fmadsen/tests/tclean_cube_refactor/<casa version>/calibration/<project>/working

ALMA dataset (project)

casa-pipeline-release-5.6.1-8.el7

pipeline rev. 42866 (hifacal.py)

casa-6.1.0-63

Pipeline master-v0.1-143-g6f5b3d8

(hifacal.py)

casa-CAS-9386-51 (CASA 6.1.0.54a9386.dev51)

Pipeline master-v0.1-145-ge322387-dirty (hifacal.py)

casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53)

Pipeline master-v0.1-143-g6f5b3d8 (hifacal.py)

casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53)

Pipeline master-v0.1-143-g6f5b3d8 (hifacal.py)

2 way parallelization

2017.1.00717.S

complete

not startednot started

complete

complete (local run)

complete
complete
2017.1.01214.S
complete
not started
complete

complete (local run)

complete
complete
not started
2017.1.00884.S
complete complete
not started

complete (local run)

complete
complete
not started
E2E6.1.00080.S
complete complete

complete


not started

(local run)

complete
complete
not startednot started
2017.1.00983.S
complete
not started
complete

complete (local run)

completecomplete
2017.1.00750.T

complete

complete
-complete
complete

E2E6.1.00092.S

complete
complete
-complete
complete

For all tests below, record tclean parameters and telemetry data for each of the 3 tclean calls.
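
A minimal sketch of how such per-call records could be collected is shown below. It is an assumption for illustration, not the pipeline's actual telemetry mechanism: `record_call` and `fake_imaging_call` are hypothetical names, the parameter values are placeholders, and the stand-in function takes the place of a real tclean call. It uses only the Python standard library (`resource` is available on Unix-like systems).

```python
import json
import resource
import time


def record_call(label, func, **params):
    """Run func(**params) and return (result, record).

    The record holds the parameters, the wall-clock time of the call, and
    the peak resident set size of this process so far. ru_maxrss is a
    high-water mark for the whole process (kilobytes on Linux), so it does
    not isolate a single call and does not include other processes such as
    parallel workers.
    """
    start = time.perf_counter()
    result = func(**params)
    elapsed = time.perf_counter() - start
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    record = {
        "label": label,
        "parameters": params,
        "wall_seconds": elapsed,
        "peak_rss_kb": peak_kb,
    }
    return result, record


def fake_imaging_call(imagename, niter):
    """Hypothetical stand-in for an imaging call; does no real work."""
    return f"{imagename}:{niter}"


# Placeholder parameter sets for three calls (illustrative values only).
call_params = [
    {"imagename": "example.iter0", "niter": 0},
    {"imagename": "example.iter1", "niter": 100},
    {"imagename": "example.iter2", "niter": 1000},
]

records = []
for index, params in enumerate(call_params, start=1):
    _, rec = record_call(f"call_{index}", fake_imaging_call, **params)
    records.append(rec)

print(json.dumps(records, indent=2))
```

Writing one JSON record per call keeps the parameters and the measurements together, which makes it easier to compare the same call across CASA versions and memory limits.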

...

Run each standard data set generated by the ALMA imaging pipeline through the following 3 CASA revisions. All tests run with 8-way parallelization and a 128 GB memory limit. All tests run within AWS.

ALMA dataset (project)

casa-pipeline-release-5.6.1-8.el7

pipeline rev. 42866 hifatargets.py

casa-6.1.0-63

Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py


casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53)

Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py

2017.1.00717.S
testing
complete (aws); complete (local run)
testing
complete
testing
complete
2017.1.01214.S
testing
complete (aws); complete (local run)
not started
complete
not started
complete
2017.1.00884.S
testing
complete (aws); complete (local run)
testing
complete
testing
complete
E2E6.1.00080.S
testing
complete (aws); complete (local run)
not startednot started
completecomplete (dev66)
2017.1.00983.S
testing
complete (aws); complete (local run)complete
complete


The following data sets will run on NRAO clusters with 8-way parallelization and a 128 GB memory limit, some as a check against AWS runs. Of the 5 test data sets, 2017.1.00884.S has the highest memory footprint and 2017.1.00983.S is the longest running.

ALMA dataset (project)

casa-pipeline-release-5.6.1-8.el7

pipeline rev. 42866 hifatargets.py

casa-6.1.0-63

Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py


casa-CAS-9386-66 (CASA 6.1.0.54a9386.dev66)

(dev53 marked with *)

Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py

casa-CAS-9386-66 (CASA 6.1.0.54a9386.dev66)

Pipeline master-v0.1-143-g6f5b3d8 hifatargets.py

NVME

2017.1.00750.T*
complete
complete
complete
-
E2E6.1.00092.S*
complete
complete
complete
-
2017.1.00884.S

complete

completecomplete; previously failed (segfault; imageprecheck)
complete
2017.1.00983.Stestingfailed (segfault during cube imaging); rerunning
testingtesting



The following tests vary the memory environment for each data set, all using the casa-CAS-9386-53 (CASA 6.1.0.54a9386.dev53) refactored code. All tests run within AWS.

ALMA dataset (project)

128 GB memory 8 way parallelization

hifatargets.py

256 GB memory 8 way parallelization

hifatargets.py

512 GB memory 8 way parallelization

hifatargets.py

2017.1.00717.Stestingcompletetesting

complete

testingcomplete
2017.1.01214.Snot startedcompletecompletenot started

completenot started

2017.1.00884.Stestingcompletetestingcomplete
testingcomplete
E2E6.1.00080.Snot startednot startedcomplete (dev66)completecompletenot started
2017.1.00983.Scompletetesting
testing
complete
completetesting


As a control, the following two data sets will be run on NRAO clusters as a check against AWS runs. Of the 5 test data sets, 2017.1.00884.S has the highest memory footprint and 2017.1.00983.S is the longest running.

ALMA dataset (project)

128 GB memory 8 way parallelization

128 GB memory 8 way parallelization

NVME

256 GB memory 8 way parallelization

512 GB memory 8 way parallelization


2017.1.00884.Snot started
not startedcomplete; previously failed (segfault; imageprecheck)
completefailed (userlock)
not startedfailed (userlock)
2017.1.00983.Snot started not started(due to other runs failing)-failed (userlock)
not startedfailed (userlock)