VLASS VIP script in HTCondor

I list each task here and what is necessary to run it in HTCondor. I am assuming this will run without a shared filesystem and without access to NRAO filesystems, so any reference to /lustre/aoc, /users/<username>, or similar paths needs to be altered to be site-agnostic.

Every DAG or task creates .log, .out and maybe .png files that we want to keep. Also, .last files like tclean.last are often created. These are not necessary but can be useful for debugging. I assume that almost all tasks require the Measurement Set (MS). Which tasks actually modify the MS? run_tclean() defaults to using the corrected datacolumn. Does that mean it is changing this column? No: datacolumn states which column tclean should read from and does not imply any change to the MS. Task07 sets savemodel='modelcolumn', and that does modify the MS.

This document is not complete. I am sure I am missing inputs and perhaps outputs as well.

In this document, "data", when referenced as an input or an output, is a directory containing the Measurement Set (e.g. VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms/).

How do we handle wanting to start a job at a given task? For example, say a job ran to completion but you want to re-run it after altering something in Task17. It would be unfortunate to have to re-run tasks 1 through 16. It would be better to start at Task17 and run through to the end of Task25. This requires saving the output of each task. But how? Incremental or differential? Using prolog and epilog scripts? Other?
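Whatever storage scheme is chosen, the bookkeeping for a restart is simple: restore the saved state from the last task before the restart point, then run from there to the end. A minimal Python sketch of that bookkeeping follows. The function name, the two-digit data-<NN>/working-<NN> checkpoint naming, and the decision to skip Task02 are assumptions for illustration, not part of the VIP script.

```python
# Hypothetical bookkeeping for restarting the pipeline at a given task.
# Assumes every completed task leaves behind data-<NN> and working-<NN>.

ALL_TASKS = [n for n in range(1, 26) if n != 2]  # Task02 is a leftover and is skipped


def restart_plan(start_task):
    """Return (checkpoint_suffix, tasks_to_run) for a restart at start_task."""
    if start_task not in ALL_TASKS:
        raise ValueError("unknown or removed task: %d" % start_task)
    earlier = [n for n in ALL_TASKS if n < start_task]
    # Nothing to restore when starting from the very first task.
    checkpoint = "%02d" % earlier[-1] if earlier else None
    tasks_to_run = [n for n in ALL_TASKS if n >= start_task]
    return checkpoint, tasks_to_run


checkpoint, tasks = restart_plan(17)
print("restore data-%s and working-%s, then run tasks %s" % (checkpoint, checkpoint, tasks))
```

This says nothing about whether the checkpoints are full copies or differences against the previous step; it only shows which one a restart would need.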

The jobs are run in the working directory so any file references are relative to that.

This process doesn't need the SYSPOWER table. How can we remove it from the MS? Presumably we can just cp /dev/null SYSPOWER/table.f0 and cp /dev/null SYSPOWER/table.f0i
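The cp /dev/null idea amounts to truncating the two storage files to zero length. A minimal Python sketch of that is below, assuming the bulk of the SYSPOWER subtable lives in table.f0 and table.f0i as stated above. The function name is made up for illustration, and whether CASA tolerates the emptied files has not been verified here.

```python
import os


def empty_syspower(ms_path):
    """Truncate SYSPOWER/table.f0 and table.f0i inside ms_path to zero bytes.

    Returns the number of bytes freed. Files that do not exist are skipped.
    """
    freed = 0
    for name in ("table.f0", "table.f0i"):
        path = os.path.join(ms_path, "SYSPOWER", name)
        if os.path.isfile(path):
            freed += os.path.getsize(path)
            # Opening for writing truncates the file, like cp /dev/null <file>.
            with open(path, "w"):
                pass
    return freed
```

Doing this once on the submit host, before the MS is transferred, would avoid shipping the unused bytes to every job.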

Does run_tclean need just the .psf directories or does it need more than that? tclean will need all image types (suffixes) for the named image. For instance, Task01 makes a set of 'iter0' images and Task04 makes an 'iter1' set of images. Task05 references both. It would be acceptable to pass all of iter0* and iter1*, but in practice it only needs the PSFs from both, so something like iter0*.psf and iter1*.psf should work.

How do we transfer input files for each DAG?

- Explicitly list every file/directory in transfer_input_files (it doesn't grok regexps). This would be a large list. E.g.
  - transfer_input_files = "working/VIP_iter0.gridwt, working/VIP_iter0.pb.tt0, working/VIP_iter0.psf.tt0, working/VIP_iter0.psf.tt1, working/VIP_iter0.psf.tt2, working/VIP_iter0.sumwt.tt0, working/VIP_iter0.sumwt.tt1, working/VIP_iter0.sumwt.tt2, working/VIP_iter0.weight.tt0, working/VIP_iter0.weight.tt1, working/VIP_iter0.weight.tt2"
- Can transfer_input_files take a manifest? E.g. a file containing the list of files to transfer.
- Make a temporary directory on the submit host and transfer that (possibly tarring it up).
- Set the inputs and outputs for both data and working as variables in the unified DAG file. The task.sh script uses rsync to merge the various data_inputs together into one data directory and the various working_inputs together into one working directory. Then at the end, task.sh moves data to data-<dagstep> and working to working-<dagstep>, and the appropriate dirs/files from these are transferred back to the submit host. The result of all this is that the data needed as an input for a step (e.g. Task08) may need to be combined from multiple places (initial data and data output from Task07).
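Since transfer_input_files does not expand patterns, the long explicit list could be generated on the submit host just before submission. A minimal sketch, assuming the patterns are expanded with Python's glob module relative to the submit directory; the function name and the example pattern are illustrative only.

```python
import glob


def transfer_list(patterns):
    """Expand shell-style patterns into a comma-separated transfer_input_files value."""
    paths = []
    for pattern in patterns:
        # sorted() keeps the generated submit file stable between runs.
        for path in sorted(glob.glob(pattern)):
            if path not in paths:
                paths.append(path)
    return ", ".join(paths)


# Example: everything belonging to the iter0 image set.
print("transfer_input_files = " + transfer_list(["working/VIP_iter0.*"]))
```

The same helper would work for narrower patterns such as only the PSF images, which keeps the per-step transfer small.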

Task01

Doesn't alter the MS

run_tclean( 'iter0', cfcache=cfcache_nowb, robust=-2.0, uvtaper='3arcsec', calcres=False )

input: ../data
input: cfcache_nowb='/mnt/scratch/cfcache/cfcache_spw2-17_imsize16384_cell0.6arcsec_w32_conjT_psf_wbawp_False.cf'
output: VIP_iter0.*

Task02

This task creates VIP_iter0b.* but I don't see those files ever referenced in this script again. What does this task do that is necessary to other tasks? Josh Marvil said that this is a leftover task and can be removed.

~~Doesn't alter the MS~~

~~run_tclean( 'iter0b', cfcache=cfcache_nowb, calcres=False )~~

~~input: ../data~~
~~input: cfcache_nowb='/mnt/scratch/cfcache/cfcache_spw2-17_imsize16384_cell0.6arcsec_w32_conjT_psf_wbawp_False.cf'~~
~~output: VIP_iter0b.*~~

Task03

This task doesn't parallelize and only takes tens of seconds to run. Should it be appended to the end of Task01?

Doesn't alter the MS

mask_from_catalog(inext=inext,outext="QLcatmask.mask",catalog_search_size=1.5,catalog_fits_file='../VLASS1Q.fits')

input: ../data
input: ../VLASS1Q.fits, VIP_iter0.psf.tt0
output: mask_from_cat.crtf, VIP_QLcatmask.mask

Task04

Doesn't alter the MS

run_tclean( 'iter1', robust=-2.0, uvtaper="3arcsec" )

input: ../data
output: VIP_iter1.*

Task05

Doesn't alter the MS

replace_psf('iter1','iter0')

This is just some Python that deletes VIP_iter1.psf.* and copies VIP_iter0.psf.* to VIP_iter1.psf.*. It is inefficient to ever make this task its own DAG. I suggest it always be in the same DAG as Task04. It will produce an error because *.workdirectory doesn't exist, but that error can be ignored.

input: VIP_iter0.psf.*, VIP_iter1.psf.*
output: VIP_iter1.psf.*
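From the description above, the behaviour of replace_psf can be sketched as follows. This is a reconstruction from that description, not the script's actual implementation; the workdir argument and the VIP_ prefix default are assumptions. CASA images are directories, hence rmtree and copytree.

```python
import glob
import os
import shutil


def replace_psf_sketch(dst_iter, src_iter, workdir=".", prefix="VIP_"):
    """Replace <prefix><dst_iter>.psf.* with copies of <prefix><src_iter>.psf.*."""
    dst_base = os.path.join(workdir, prefix + dst_iter + ".psf")
    src_base = os.path.join(workdir, prefix + src_iter + ".psf")
    # Delete the existing destination PSF images (CASA images are directories).
    for old in glob.glob(dst_base + ".*"):
        shutil.rmtree(old)
    # Copy each source PSF term (.tt0, .tt1, ...) under the destination name.
    copied = []
    for src in sorted(glob.glob(src_base + ".*")):
        dst = dst_base + src[len(src_base):]
        shutil.copytree(src, dst)
        copied.append(dst)
    return copied
```

Because the step only touches the .psf.* images of two iterations, those are the only working files it needs transferred in.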

Task06

Doesn't alter the MS

run_tclean( 'iter1', robust=-2.0, uvtaper="3arcsec", niter=20000, nsigma=5.0, mask="QLcatmask.mask", calcres=False, calcpsf=False )

input: ../data
input: VIP_iter1.*, VIP_QLcatmask.mask
output: VIP_iter1.*

Task07

Alters the MS

run_tclean( 'iter1', calcres=False, calcpsf=False, savemodel='modelcolumn', parallel=False )

Task08

Alters the MS

Tasks 08, 09, 10 and 11 take only minutes to run so could be combined into one DAG step.
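The groupings suggested in this document (Task05 with Task04, Task08 through Task11 together, Task15 with Task14, Task23 with Task22) could be written down once and turned into a DAGMan file. A minimal sketch, assuming the steps simply run one after another in script order; the stepNN names and stepNN.sub submit file names are hypothetical, and only the JOB and PARENT ... CHILD lines of DAGMan syntax are used.

```python
# Tasks grouped into DAG steps; Task02 is omitted because it is a leftover.
STEPS = [[1], [3], [4, 5], [6], [7], [8, 9, 10, 11], [12], [13],
         [14, 15], [16], [17], [18], [19], [20], [21], [22, 23], [24], [25]]


def dag_lines(steps):
    """Return DAGMan lines that run the given steps one after another."""
    # Each step is named after the first task it contains.
    names = ["step%02d" % tasks[0] for tasks in steps]
    lines = ["JOB %s %s.sub" % (name, name) for name in names]
    # Chain each step to the next so the DAG is strictly sequential.
    for parent, child in zip(names, names[1:]):
        lines.append("PARENT %s CHILD %s" % (parent, child))
    return lines


print("\n".join(dag_lines(STEPS)))
```

Keeping the grouping in one list makes it cheap to try a different split, for example folding Task03 into the Task01 step.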

flagdata(vis=vis, mode='rflag', datacolumn='residual_data',timedev='tdev.txt',freqdev='fdev.txt',action='calculate')

replace_rflag_levels()

flagdata(vis=vis, mode='rflag', datacolumn='residual_data',timedev='tdev.txt',freqdev='fdev.txt',action='apply',extendflags=False)

flagdata(vis=vis, mode='extend', extendpols=True, growaround=True)

input: ../data
output: tdev.txt, fdev.txt
output: ../data
output: ../data/VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions

Task09

Alters the MS

statwt(vis=vis,combine='field,scan,state,corr',chanbin=1,timebin='1yr', datacolumn='residual_data' )

input: ../data
output: ../data
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions/FLAG_VERSION_LIST
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions/flags.statwt_1/table.f1
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions/flags.statwt_1/table.dat
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions/flags.statwt_1/table.f0
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions/flags.statwt_1/table.lock
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions/flags.statwt_1/table.info
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms.flagversions/flags.statwt_1/table.f0_TSM1
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms/table.f25_TSM1
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms/table.f22_TSM1
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms/table.f6
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms/HISTORY/table.f0
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms/HISTORY/table.lock
- VLASS1.2.sb36491855.eb36574404.58585.53016267361_split.ms/table.lock

Task10

Doesn't alter the MS

gaincal(vis=vis,caltable='g.0',gaintype='T',calmode='p',refant='0',combine='field,spw',minsnr=5)

input: ../data
output: g.0

Task11

Alters the MS

applycal(vis=vis,calwt=False,applymode='calonly',gaintable='g.0',spwmap=18*[2], interp='nearest')

Task12

Doesn't alter the MS

run_tclean( 'iter0c', datacolumn='corrected', cfcache=cfcache_nowb, robust=-2.0, uvtaper='3arcsec', calcres=False )

input: ../data
output: VIP_iter0c.*

Task13

Doesn't alter the MS

run_tclean( 'iter0d', datacolumn='corrected', cfcache=cfcache_nowb, calcres=False )

input: ../data
output: VIP_iter0d.*

Task14

Doesn't alter the MS

run_tclean( 'iter1b', datacolumn='corrected', robust=-2.0, uvtaper="3arcsec" )

input: ../data
output: VIP_iter1b.*

Task15

Doesn't alter the MS

replace_psf('iter1b','iter0c')

This is just some Python that deletes VIP_iter1b.psf.* and copies VIP_iter0c.psf.* to VIP_iter1b.psf.*. It is inefficient to ever make this task its own DAG. I suggest it always be in the same DAG as Task14. It will produce an error because *.workdirectory doesn't exist, but that error can be ignored.

input: VIP_iter1b.psf.*, VIP_iter0c.psf.*
output: VIP_iter1b.psf.*

Task16

Doesn't alter the MS

run_tclean( 'iter1b', datacolumn='corrected', robust=-2.0, uvtaper="3arcsec", niter=20000, nsigma=5.0, mask="QLcatmask.mask", calcres=False, calcpsf=False )

input: ../data
input: VIP_iter1b.*, VIP_QLcatmask.mask
output: VIP_iter1b.*

Task17

imsmooth(imagename=imagename_base+"iter1b.image.tt0", major='5arcsec', minor='5arcsec', pa='0deg', outfile=imagename_base+"iter1b.image.smooth5.tt0")

input: ../data
input: VIP_iter1b.image.tt0
output: VIP_iter1b.image.smooth5.tt0

Task18

exportfits(imagename=imagename_base+"iter1b.image.smooth5.tt0", fitsimage=imagename_base+"iter1b.image.smooth5.fits")

input: ../data
input: VIP_iter1b.image.smooth5.tt0
output: VIP_iter1b.image.smooth5.fits

Task19

subprocess.call(['/users/jmarvil/scripts/run_bdsf.py', imagename_base+'iter1b.image.smooth5.fits'],env={'PYTHONPATH':''})

This needs some modification. It calls a script from Josh's homedir and runs bdsf out of /lustre.
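One possible modification is to take the script location from the job environment instead of hard-coding a home directory. A minimal sketch, assuming run_bdsf.py is transferred along with the job; the VIP_BDSF_SCRIPT variable name and its default are hypothetical, and the sketch does not address where bdsf itself is installed.

```python
import os


def bdsf_command(fits_image, environ=None):
    """Build the argument list for the source finder without site-specific paths."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable; falls back to a copy shipped in the job sandbox.
    script = environ.get("VIP_BDSF_SCRIPT", "./run_bdsf.py")
    return [script, fits_image]


# The result would be passed to subprocess.call(..., env={'PYTHONPATH': ''}) as before.
print(bdsf_command("VIP_iter1b.image.smooth5.fits", environ={}))
```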

Task20

edit_pybdsf_islands(catalog_fits_file=imagename_base+'iter1b.image.smooth5.cat.fits')

mask_from_catalog(inext=inext,outext="secondmask.mask",catalog_fits_file=imagename_base+'iter1b.image.smooth5.cat.edited.fits', catalog_search_size=1.5)

input: VIP_iter1b.image.smooth5.cat.fits
input: VIP_iter1b.image.smooth5.cat.edited.fits
output: secondmask.mask

Task21

immath(imagename=[imagename_base+'secondmask.mask',imagename_base+'QLcatmask.mask'],expr='IM0+IM1',outfile=imagename_base+'sum_of_masks.mask')

im.mask(image=imagename_base+'sum_of_masks.mask',mask=imagename_base+'combined.mask',threshold=0.5)

input: secondmask.mask, QLcatmask.mask
output: sum_of_masks.mask
input: sum_of_masks.mask
output: combined.mask

Task22

run_tclean( 'iter2', datacolumn='corrected' )

input: ../data
output: VIP_iter2.*

Task23

replace_psf('iter2', 'iter0d')

This is just some Python that deletes VIP_iter2.psf.* and copies VIP_iter0d.psf.* to VIP_iter2.psf.*. It is inefficient to ever make this task its own DAG. I suggest it always be in the same DAG as Task22.

input: VIP_iter2.psf.*, VIP_iter0d.psf.*
output: VIP_iter2.psf.*

Task24

run_tclean( 'iter2', datacolumn='corrected', scales=[0,5,12], nsigma=3.0, niter=20000, cycleniter=3000, mask="QLcatmask.mask", calcres=False, calcpsf=False )

input: ../data
input: VIP_iter2.*, QLcatmask.mask
output: VIP_iter2.*

Task25

os.system('rm -rf *.workdirectory')

os.mkdir('iter2_intermediate_results')

os.system('cp -r *iter2* iter2_intermediate_results')

shutil.rmtree(imagename_base+'iter2.mask')

shutil.copytree(imagename_base+'combined.mask',imagename_base+'iter2.mask')

run_tclean( 'iter2', datacolumn='corrected', scales=[0,5,12], nsigma=3.0, niter=20000, cycleniter=3000, mask="", calcres=False, calcpsf=False )

This does some file cleaning and then runs run_tclean. Where do we want to do that file cleaning? In the previous task? On the submit host?

input: ../data
input: VIP_iter2.*
output: VIP_iter2.*
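Wherever the file cleaning ends up, the shell calls could be replaced by plain Python so the step behaves the same on any host. A minimal sketch of the cleanup portion only (everything before run_tclean), assuming imagename_base is 'VIP_' and adding a workdir argument for illustration. Unlike cp -r *iter2*, it skips the backup directory itself, whose name also matches *iter2*.

```python
import glob
import os
import shutil


def prepare_final_clean(workdir=".", base="VIP_"):
    """Back up the iter2 images and swap in the combined mask."""
    # Equivalent of rm -rf *.workdirectory (directories only).
    for path in glob.glob(os.path.join(workdir, "*.workdirectory")):
        shutil.rmtree(path, ignore_errors=True)
    backup = os.path.join(workdir, "iter2_intermediate_results")
    os.mkdir(backup)
    # Equivalent of cp -r *iter2* iter2_intermediate_results, minus the backup dir.
    for path in glob.glob(os.path.join(workdir, "*iter2*")):
        if os.path.abspath(path) == os.path.abspath(backup):
            continue
        target = os.path.join(backup, os.path.basename(path))
        if os.path.isdir(path):
            shutil.copytree(path, target)
        else:
            shutil.copy2(path, target)
    # Replace the iter2 mask with the combined mask.
    mask = os.path.join(workdir, base + "iter2.mask")
    shutil.rmtree(mask)
    shutil.copytree(os.path.join(workdir, base + "combined.mask"), mask)
```

Written this way, the cleanup could run at the end of the Task24 step or at the start of the Task25 step without depending on a shell.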
