This work is a joint effort of SCG and ARDG to decompose imaging into small work units that can be processed as independent jobs in an HTC environment, making use of HTCondor's DAGMan.
Initial testing consists of running a single gridding cycle on a previously partitioned MS. The original scripts are located at /lustre/aoc/users/sbhatnag/11B-157/Continuum/IMAGING_CTB80/MTWBAWP/PARALLEL/HTCondor/SCRIPT_TEST
At this stage, all scripts assume a shared file system across all compute nodes (Lustre). A short description of each script follows.
imaging.py | Python script that splits the input MS into smaller MSes and produces the DAG (in the tclean.dag file). It also contains the tclean parameters. |
mkres.py | Python script that sets up the SynthesisImager tool of CASA, runs the gridder on the input MS, and produces images with the given basename. The input MSes (passed in via the DAG nodes) are the sub-MSes produced by imaging.py. |
tclean.dag | The DAG that converts the sub-MSes into sub-images. Uses the tclean.htc HTCondor submit file. |
tclean.htc | The HTCondor submit file that uses CASA to run mkres.py with the (sub-)MS and image name. |
The convolution functions have to be obtained prior to partitioning the MS. They are contained in the (sub)directory cf.tt_tclean_allSPW_withW.ps. A copy of the original scripts, the data (before and after partitioning), and the convolution functions is located in the directory script_test_0 under /lustre/aoc/sciops/fmadsen/HTCondor/imaging_ctb80, which will be used as the root directory for subsequent testing.
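For reference, one way to populate such a CF cache is a single gridding pass over the full MS before partitioning. The sketch below only illustrates that idea: the use of tclean with the awproject gridder, the MS name, and all imaging parameter values are assumptions, not a record of how the cache in script_test_0 was actually produced.

```python
# Hypothetical sketch: populate the convolution-function cache before partitioning
# by running a single gridding pass (no deconvolution) over the full MS.
# The MS name, gridder choice and all parameter values are placeholders.
from casatasks import tclean  # CASA 6; in CASA 5 tclean is a built-in task

tclean(vis='ctb80_full.ms', imagename='cfgen',
       imsize=[4096, 4096], cell='2.0arcsec', specmode='mfs',
       gridder='awproject', cfcache='cf.tt_tclean_allSPW_withW.ps',
       wprojplanes=-1, niter=0, calcres=True, calcpsf=True)
```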
A top-level diagram of the imaging process is shown below. It is updated as development progresses to reflect the software structure as well as important findings and remarks.
The DAG condor_imaging.dag under /lustre/aoc/sciops/fmadsen/HTCondor/imaging_ctb80/script_test_1 runs MSpartition as its first job. This job partitions the MS based on the inputs and writes the sub-DAG allImagers.dag, which condor_imaging.dag runs as the child of MSpartition. The file allImagers.dag must exist at DAG submission time, but it can be empty.
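To illustrate the idea, the following is a minimal sketch of the kind of logic the MSpartition step could use to split the MS and emit the sub-DAG. It is not the actual dagutils code: the real daggen partitions by a maximum sub-MS size, while this sketch splits per spectral window, and the node names, VARS names, and file layout are assumptions.

```python
# Hypothetical sketch of the MSpartition step: split the MS and write allImagers.dag.
# Partitioning criterion, node/variable names and file layout are assumptions.
import os
from casatasks import split  # CASA 6; in CASA 5 split is a built-in task

def daggen_sketch(vis, spw_list, workdir, subdag='allImagers.dag'):
    """Split `vis` into one sub-MS per spectral window and write one imager
    node per sub-MS into the sub-DAG consumed by condor_imaging.dag."""
    lines = []
    for i, spw in enumerate(spw_list):
        subms = os.path.join(workdir, 'sub_{:03d}.ms'.format(i))
        split(vis=vis, outputvis=subms, spw=spw, datacolumn='data')
        # One DAG node per sub-MS; allImagers.htc picks up MSNAME/IMGNAME from VARS.
        lines.append('JOB imager{0} allImagers.htc'.format(i))
        lines.append('VARS imager{0} MSNAME="{1}" IMGNAME="subimg_{0:03d}"'.format(i, subms))
    with open(os.path.join(workdir, subdag), 'w') as f:
        f.write('\n'.join(lines) + '\n')
```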
The software structure has changed: dagutils.py (the former imaging.py with small changes for integration) is now the module imported into CASA that holds all the function definitions used by both MSpartition.py and allImagers.py.
With respect to "test_0", the following was accomplished in "test_1":
- software integration: a common source of function definitions is used by the scripts that do MS partitioning and run the first major cycle to produce the sub-images
- a DAG that runs MS partitioning as the first job and gridding/imaging of the sub-MSes as the second, and that can easily be extended to run the addition of sub-images and deconvolution
The new software modules are:
condor_imaging.dag | Top-level DAG, currently defining the job MSpartition and the SUBDAG allImagers. |
dagutils.py | Based on the former imaging.py; intended to be a centralized source of definitions for all stages of the imaging process. |
MSpartition.py | Simple script that imports dagutils.py and runs daggen, which partitions the original MS based on a maximum sub-MS size and generates the sub-DAG (allImagers.dag) with the jobs that produce the sub-images from all the sub-MSes. |
MSpartition.htc | HTCondor submit file containing the job submission definitions to run MSpartition.py. |
allImagers.dag | Empty at DAG submission; written by MSpartition.py and run as a SUBDAG by condor_imaging.dag after the MSpartition job completes. |
allImagers.py | Simple script that imports dagutils.py and runs mkImage on the input (sub-)MS to create a (sub-)image (an illustrative sketch follows the table). |
allImagers.htc | HTCondor submit file containing the job submission definitions to run allImagers.py. |
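For context, this is a rough sketch of what the per-node mkImage step does for a single sub-MS, built from the PySynthesisImager methods referenced later on this page. The import paths (CASA 6 style) and all parameter values are assumptions, not the actual contents of dagutils.py.

```python
# Hypothetical sketch of the per-node imaging step (allImagers.py / mkImage):
# grid one sub-MS into a residual image with the PySynthesisImager tool.
# Import paths follow CASA 6; imaging parameter values are placeholders.
from casatasks.private.imagerhelpers.imager_base import PySynthesisImager
from casatasks.private.imagerhelpers.input_parameters import ImagerParameters

def mkimage_sketch(msname, imagename, cfcache='cf.tt_tclean_allSPW_withW.ps'):
    params = ImagerParameters(msname=msname, imagename=imagename,
                              imsize=[4096, 4096], cell='2.0arcsec',
                              specmode='mfs', gridder='awproject',
                              cfcache=cfcache, weighting='natural')
    imager = PySynthesisImager(params=params)
    imager.initializeImagers()
    imager.initializeNormalizers()
    imager.setWeighting()
    imager.runMajorCycle()   # grid the visibilities -> <imagename>.residual
    imager.deleteTools()
```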
What has not yet changed in "test_1":
- convolution functions are obtained manually prior to running the DAG
- although the imaging parameters now have a single source in dagutils.py, they are still not exposed as configurable parameters for general imaging
- sub-images are not (yet) added
We successfully applied this approach to demonstrate partitioning of the MS, running independent gridding on each sub-MS, and gathering all residual (sub-)images into one residual image for deconvolution. The software modules, sub-MSes, and images generated in each step are stored under the directories 'test_2' and 'integrate_gatherimages'.
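As an illustration of the gather step, the sketch below combines per-sub-MS residual images into a single residual image using immath. The image names are placeholders, and a full implementation would also combine the corresponding weight/sumwt images and renormalize, which is omitted here.

```python
# Hypothetical sketch of the gather step: sum the per-sub-MS residual images
# into one residual image for deconvolution. Image names are placeholders;
# combining the weight/sumwt images and renormalizing is omitted.
from casatasks import immath

def gather_residuals_sketch(sub_residuals, outfile='full.residual'):
    # Build an expression such as 'IM0+IM1+IM2' over the input images.
    expr = '+'.join('IM{}'.format(i) for i in range(len(sub_residuals)))
    immath(imagename=sub_residuals, mode='evalexpr', expr=expr, outfile=outfile)

# Example: gather_residuals_sketch(['subimg_000.residual', 'subimg_001.residual'])
```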
After adding the sub-images to generate a single image for deconvolution, we decided to pursue a separate implementation to improve our understanding and control of breaking up the imaging cycle and of iteration control, without the added complexity of partitioning the MS and processing sub-images.
Breaking up the imaging cycle and iteration control
Related open-Jira ticket: https://open-jira.nrao.edu/browse/SCG-95
This implementation is built on the CASA tools that run under the hood in tclean, in order to gain the flexibility to break the imaging cycle into modular steps. This is particularly interesting in the context of HTCondor because it allows for shorter-running jobs: the main idea of the modular approach is that the steps of the imaging process can run as separate jobs in a DAG, and that iteration control can be performed at the DAG level.
We started this development track by implementing separate Python functions that use CASA tools to execute subsets of tclean's functionality (the tools referred to in the list below are methods of imagerhelpers.imager_base.PySynthesisImager; an illustrative sketch follows the list):
- makePB_PSF: this function computes the primary beam and the point spread function. It runs only once, before iterating on imaging cycles
- runResidualCycle: calls casa tool runMajorCycle() to grid the visibilities and generate (update) residual images
- runModelCycle: calls casa tool runMinorCycle() to iterate on PSF subtraction and derive model images from residual images
- stopIterations: performs the convergence tests implemented in hasConverged() to return a boolean used to stop imaging cycle iterations
- finalizeImages: after stopping imaging cycle iterations, performs PSF and primary beam corrections to generate final images
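The sketch below illustrates how these functions can be expressed in terms of PySynthesisImager methods. It is not the contents of imagingtools.py: the import paths follow CASA 6, and rebuilding the imager from the same ImagerParameters at the start of each standalone session is an assumption about how state is shared between sessions (via the images on disk).

```python
# Hypothetical sketch of the modular steps listed above, each wrapping
# PySynthesisImager methods so it can run in its own CASA session / DAG job.
# Import paths follow CASA 6; parameter handling and per-session state
# reconstruction are assumptions for illustration only.
from casatasks.private.imagerhelpers.imager_base import PySynthesisImager
from casatasks.private.imagerhelpers.input_parameters import ImagerParameters

def make_imager(params):
    """Rebuild the imager tools from the same parameters in every session."""
    imager = PySynthesisImager(params=params)
    imager.initializeImagers()
    imager.initializeNormalizers()
    imager.setWeighting()
    imager.initializeDeconvolvers()
    imager.initializeIterationControl()
    return imager

def makePB_PSF(params):
    imager = make_imager(params)
    imager.makePB()               # primary beam (runs once, before iterating)
    imager.makePSF()              # point spread function
    imager.deleteTools()

def runResidualCycle(params):
    imager = make_imager(params)
    imager.runMajorCycle()        # grid the visibilities, update the residual images
    imager.deleteTools()

def runModelCycle(params):
    imager = make_imager(params)
    imager.runMinorCycle()        # PSF subtraction, update the model images
    imager.deleteTools()

def stopIterations(params):
    imager = make_imager(params)
    done = imager.hasConverged()  # convergence test used to stop the iterations
    imager.deleteTools()
    return done

def finalizeImages(params):
    imager = make_imager(params)
    imager.restoreImages()        # restore the final images
    imager.pbcorImages()          # primary-beam correction
    imager.deleteTools()
```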
This is implemented in /lustre/aoc/sciops/fmadsen/tickets/scg-95/toolcalls/bin/imagingtools.py. Each of the functions described above is designed to run as a standalone CASA session, and each will ultimately become a standalone HTCondor job called by a DAG. Initial testing of this software structure can be done with a shell script that calls each standalone CASA session and performs the iteration control, so that CASA is only invoked when more iterations of the imaging cycle are required.
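The control flow of such a driver can be sketched as follows. This is not the actual shell script: the step script names, the CASA command-line flags used here, and the convention of reporting convergence through the process exit status are all assumptions for illustration.

```python
# Hypothetical sketch of the driver described above: each imaging step runs
# as its own CASA session, and iteration control lives outside CASA.
# Script names and the exit-status convention for convergence are assumptions.
import subprocess

def run_step(script):
    """Launch one standalone CASA session for a single imaging step."""
    return subprocess.call(['casa', '--nogui', '--nologger', '-c', script])

def drive_imaging(max_cycles=10):
    run_step('makePB_PSF.py')                   # once, before iterating
    for cycle in range(max_cycles):
        run_step('runResidualCycle.py')         # major cycle (gridding)
        run_step('runModelCycle.py')            # minor cycle (deconvolution)
        if run_step('stopIterations.py') != 0:  # nonzero exit -> converged (assumed convention)
            break
    run_step('finalizeImages.py')               # restore / primary-beam correct
```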