The AUDI imaging task will sometimes crash, most commonly when the restore on an early Cycle 5 dataset fails. For that specific case, a workaround known as "The Kludge" has been written by SSA. Other failure modes may still need manual runs. For fully manual runs, a dedicated node (currently cvpost016) is needed to run the interactive job, as the almapipe credentials are only supported on one processing node at a time.
Step-by-step guide - "Classic" Kludged runs - restore with 5.1.1-5
- Run will send fail email with error code 2 (note that error code 2 can also refer to other issues such as incomplete ASDM downloads, so double-check that the ASDMs in the raw directory do not have any ASDMBinary files ending .missing)
- Starting in the spool/xxxx/xxxx/working directory, chmod the directory to make it group-writeable (chmod g+w working), then run almaReimageCube with the restore and imaging CASA versions specified (full paths needed), supplying the job ID and the directory uid (not the MOUS uid) to the --request parameter, e.g. almaReimageCube --restore_casa /home/casa/packages/RHEL7/release/casa-release-5.1.1-5 --image_casa /home/casa/packages/RHEL7/release/casa-6.5.4-9-pipeline-2023.1.0.124 --request 475229560 uid___A002_Xc89480_X1a40 (new --image_casa path: /home/casa/packages/RHEL8/release/casa-6.6.1-17-pipeline-2024.1.0.8). Note that this only works in pipelines that have the separate imaging recipe (CASA 6+).
- The run should terminate as usual and the usual QA should be possible.
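The check for incomplete ASDM downloads mentioned in the first step can be scripted. The following is a minimal sketch only: the function name find_missing_binaries is hypothetical, and it assumes that incomplete downloads appear as files with names ending in .missing somewhere below an ASDMBinary directory inside the raw data directory.

```python
from pathlib import Path


def find_missing_binaries(raw_dir):
    """Return sorted paths of '.missing' files found below an ASDMBinary directory.

    raw_dir is the raw data directory of the failed run (an assumption:
    pass whatever path holds the downloaded ASDMs).
    """
    raw = Path(raw_dir)
    return sorted(
        p for p in raw.rglob("*.missing")
        if p.is_file() and "ASDMBinary" in p.parts
    )
```

An empty list from find_missing_binaries suggests the ASDM downloads are complete and the error code 2 has another cause; any paths returned point to ASDMs that need to be downloaded again before rerunning.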
Step-by-step guide - Kludged restores for renorm issue - restore with 6.4.1
- Run will send fail email with error code 2 (note that error code 2 can also refer to other issues such as incomplete ASDM downloads, so double-check that the ASDMs in the raw directory do not have any ASDMBinary files ending .missing)
- Starting in the spool/xxxx/xxxx/working directory, chmod the directory to make it group-writeable (chmod g+w working), then run almaReimageCube with the restore and imaging CASA versions specified (full paths needed), supplying the job ID and the directory uid (not the MOUS uid) to the --request parameter, e.g. almaReimageCube --restore_casa /home/casa/packages/RHEL7/release/casa-6.4.1-12-pipeline-2022.2.0.68 --image_casa /home/casa/packages/RHEL7/release/casa-6.5.4-9-pipeline-2023.1.0.124 --request 475229560 uid___A002_Xc89480_X1a40 (note this only works in pipelines that have the separate imaging recipe, CASA 6+)
- The run should terminate as usual and the usual QA should be possible.
Step-by-step guide - Kludged restores for Session mapping bug - restore with 5.4.2-8
- Run will send fail email with error code 2 (note that error code 2 can also refer to other issues such as incomplete ASDM downloads, so double-check that the ASDMs in the raw directory do not have any ASDMBinary files ending .missing)
- Starting in the spool/xxxx/xxxx/working directory, run almaReimageCube with specified restore and imaging versions (full path needed) and supply the job ID and directory uid (not the MOUS uid) to the --request parameter e.g. almaReimageCube --restore_casa /home/casa/packages/RHEL7/release/casa-release-5.4.2-8 --image_casa /home/casa/packages/pipeline/casa-6.1.1-10-pipeline-2020.1.0.36 --request 475229560 uid___A002_Xc89480_X1a40 (note this only works in pipelines that have the separate imaging recipe, CASA 6+)
- The run should terminate as usual and the usual QA should be possible.
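The kludged variants above differ only in the CASA paths passed to almaReimageCube, so the command can be assembled by a small helper. This is a sketch under stated assumptions: the helper names make_group_writable and build_reimage_command are hypothetical, and only the --restore_casa, --image_casa and --request options quoted in the steps above are used.

```python
import os
import stat


def make_group_writable(path):
    """Add the group-write bit to path (the equivalent of chmod g+w)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWGRP)


def build_reimage_command(restore_casa, image_casa, job_id, dir_uid):
    """Assemble the almaReimageCube argument list described in the steps above.

    Both CASA locations must be full paths; dir_uid is the directory uid,
    not the MOUS uid.
    """
    for label, value in (("restore_casa", restore_casa), ("image_casa", image_casa)):
        if not os.path.isabs(value):
            raise ValueError(f"{label} must be a full path, got {value!r}")
    return [
        "almaReimageCube",
        "--restore_casa", restore_casa,
        "--image_casa", image_casa,
        "--request", str(job_id), dir_uid,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the working directory. Building the command as a list rather than a single string avoids quoting problems with the long paths.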
Step-by-step guide - fully manual runs
- Run the restore on the dataset using the appropriate version of CASA.
- Copy the <uid>casapipescript.py from the original failed imaging run's spool/<uid>/products directory into the working directory of the restore.
- Edit the imaging casapipescript to replace hifa_restore for the raw ASDM with hifa_importdata(vis=<calibrated msname[s]>, session=[sessionid], dbservice=False) for the restored MS, and add hifa_exportdata(imaging_products_only=True) at the end.
- Remove the products directory from the restore run.
- start casa --pipeline
- execfile('<uid>casapipescript.py')
- Make a <jobid>/<uid> directory in the image-qa area
- Copy the rawdata, products and working directories into the image-qa area
- Do QA and run audiPass in the usual way.
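The script edit described in the steps above (swap the restore call for hifa_importdata, then add hifa_exportdata) can be sketched as a text rewrite. This is an illustration only: the function name rewrite_pipescript is hypothetical, and it assumes the casapipescript has one pipeline task call per line and that "at the end" means directly after the last hif_/hifa_ task call. Check the edited script by eye before running it.

```python
import re


def rewrite_pipescript(text, ms_names, session_ids):
    """Return casapipescript text edited for a manual imaging run.

    Assumes one pipeline task call per line (an assumption about the
    script layout, not something guaranteed by the pipeline).
    """
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("hifa_restore"):
            # Keep the original indentation so the call stays in its block.
            indent = line[: len(line) - len(stripped)]
            out.append(
                f"{indent}hifa_importdata(vis={ms_names!r}, "
                f"session={session_ids!r}, dbservice=False)"
            )
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no hifa_restore call found in script")

    # Insert the export call after the last hif_/hifa_ task call.
    task = re.compile(r"^\s*hifa?_\w+\(")
    last = max(i for i, l in enumerate(out) if task.match(l))
    indent = re.match(r"^\s*", out[last]).group(0)
    out.insert(last + 1, f"{indent}hifa_exportdata(imaging_products_only=True)")
    return "\n".join(out) + "\n"
```

Read the copied script into a string, pass it through rewrite_pipescript with the restored MS names and session IDs, and write the result back before starting casa --pipeline.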
Step-by-step guide - large cubes
- Run the restore on the dataset using the appropriate version of CASA or grab the mses from the working directory of the failed run.
- Typically the first attempt will have failed in findcont and needs to be rerun. The options are: (1) download the cont.dat file from the ALMA archive (in auxproducts) and rerun without findcont, or (2) insert hif_productsize into the script or PPR and see if it will rerun with mitigations (this can also be used to generate a casa_pipescript template for option (1)), then copy the cont.dat file from that run.
- Typically you will need to interact with the user to find acceptable mitigations (e.g. smaller image size).
- Then proceed as for fully manual runs above.