VLA calibration workflow:

  • Pipeline rerun script:
    • The DA needs to be able to rerun the pipeline in the correct environment with added flag template file(s) and possibly a modified PPR (a sketch of such a script follows this list).
      • Input: execution block ID, (optional) modified PPR, flagtemplate.txt
      • Results: new pipeline run using the modified PPR and/or flagtemplate.txt (if supplied)
  • Archive ingest script: 
    • needs to enable the DA to add a QA report file (text) to the weblog, set an SRDP QA state (Pass/Fail/Null), and ingest the calibration products, together with the flagtemplate.txt and flagtargetstemplate.txt files (if created by the DA), for a given EB into the archive.
      • Input: Pipeline run ID, QAFlag=[Pass/Fail/Null], (optional) qa_report.txt, flagtemplate.txt, flagtargetstemplate.txt
      • Results: the contents of the products directory for the pipeline run are ingested into the archive with the correct metadata corresponding to the relevant EB, the QAFlag is registered in the archive database, the QA report (text file) is added to the weblog, and the weblog is archived. If added, flagtemplate.txt and flagtargetstemplate.txt are also ingested.
  • Cleanup script:
    • run after archive ingest. Checks that the data were successfully ingested into the archive, then deletes the pipeline run from disk.
      • Input: Pipeline run ID
      • Result: if the EB was correctly ingested, proceed to delete the pipeline directory. If not, exit with an error: "The data products are not in the archive, please wait and try again. If failures are repeated, please file an SSA ticket."
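
A minimal sketch of what the rerun script's command-line surface might look like, in Python. The run-directory lookup and the workflow launcher command ("launch-cipl-rerun") are assumptions for illustration, not existing SSA tools:

    import argparse
    import shutil
    import subprocess
    from pathlib import Path

    def lookup_run_directory(eb_id: str) -> Path:
        # Assumption: the real tool would resolve this from the processing area.
        return Path("/path/to/processing") / eb_id

    def main() -> None:
        parser = argparse.ArgumentParser(
            description="Re-run the calibration pipeline for an execution block")
        parser.add_argument("eb_id", help="execution block ID")
        parser.add_argument("--ppr", type=Path, help="optional modified PPR")
        parser.add_argument("--flags", type=Path, help="optional flagtemplate.txt")
        args = parser.parse_args()

        run_dir = lookup_run_directory(args.eb_id)
        if args.flags:
            shutil.copy(args.flags, run_dir / "flagtemplate.txt")
        if args.ppr:
            shutil.copy(args.ppr, run_dir / "PPR.xml")  # PPR file name is an assumption

        # Hand off to the workflow so CASA runs in the correct environment
        # ("launch-cipl-rerun" is a placeholder, not a real command).
        subprocess.run(["launch-cipl-rerun", args.eb_id], check=True)

    if __name__ == "__main__":
        main()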

ALMA User-initiated imaging workflow:

  • Pipeline completion notification:
    • Notification should be sent to the SRDP operations manager upon completion of a pipeline job, so that they may assign the QA to a DA.
      • Input: none
      • Output: email notification to SRDP operations manager upon completion of a user-initiated imaging job, containing pipeline run ID.
  • Pipeline rerun script:
    • The DA needs to be able to rerun the pipeline in the correct environment with an optional added flag target template file and possibly a modified PPR.
      • Input: MOUS UID, (optional) modified PPR, flagtargetstemplate.txt
      • Results: new pipeline run using the modified PPR and/or flagtargetstemplate.txt (if supplied)
  • Archive ingest script: 
    • needs to enable the DA to add a QA report file (text) to the weblog and ingest the imaging products for a given MOUS into the archive.
      • Input: Pipeline run ID, (optional) qa_report.txt, flagtargetstemplate.txt
      • Results: the contents of the products directory for the pipeline run are ingested into the archive with the correct metadata corresponding to the relevant MOUS, the QAFlag is registered in the archive database as "Pass", the QA report (text file) is added to the weblog, and the weblog is archived. If added, flagtargetstemplate.txt is also ingested.
  • Cleanup script:
    • run after archive ingest. Checks that the data were successfully ingested into the archive, then deletes the pipeline run from disk (a sketch of this check follows this list).
      • Input: Pipeline run ID
      • Result: if the MOUS was correctly ingested, proceed to delete the pipeline directory. If not, exit with an error: "The data products are not in the archive, please wait and try again. If failures are repeated, please file an SSA ticket."
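
A minimal sketch of the cleanup check described above, assuming a placeholder archive query; the real test would consult the archive database for the pipeline run's products:

    import shutil
    import sys
    from pathlib import Path

    def archive_contains(pipeline_run_id: str) -> bool:
        # Placeholder: a real implementation would query the archive metadata
        # database for the products of this pipeline run.
        raise NotImplementedError

    def cleanup(pipeline_run_id: str, run_dir: Path) -> None:
        if archive_contains(pipeline_run_id):
            shutil.rmtree(run_dir)  # products confirmed in the archive; safe to delete
        else:
            sys.exit("The data products are not in the archive, please wait and "
                     "try again. If failures are repeated, please file an SSA ticket.")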


The work to put all of this in place is being tracked via SSA-5599.


16 Comments

  1. Much of what is desired for the VLA currently exists in some form.  I'll lay out my understanding of the VLA process and what the existing tools do, so we can refine what might need to change.

    Once an observation has completed ingestion, the CIPL workflow is launched.  That tool does the following:

    1. Sets up the working directory & emails the DAs to notify them that the process has begun
    2. Fetches the data from NGAS
    3. Runs CASA with the appropriate recipe
    4. Based on the results of the CASA run:
      1. Success: moves the products/working/rawdata directories to an identically named directory in the qa area & emails the DAs to let them know it is ready for review
      2. Failure: notifies the DAs to let them know about the failure so they can assess it.

    The DAs perform their assessment of the data using the results of the pipeline run.  My experience is largely with what happens to data in the qa area (which is typically of more interest than the failed runs, which remain in the processing area).  Those are either 'Passed' or 'Failed' with the appropriate scripts.  Both cause a set of updates to the database & launch a workflow. 

    The qaPass command:

    1. Initiates a workflow which:
      1. prepares the calibration TAR by collecting all files from the products/ subdirectory that are neither FITS nor an unzipped weblog (including any additions by the DAs; see the sketch after this list)
      2. moves the working directory back to the processing area (so it will be automatically deleted, once enough time has passed)
      3. caches the calibrated measurement set for this observation (also automatically removed after a period of time)
      4. ingests metadata about the tar file and, if appropriate, places the tar file into NGAS
      5. signals whether the workflow completed successfully
    2. Upon receipt of the signal regarding the workflow's outcome:
      1. upon successful completion, the EB is marked 'Calibrated', and the project is re-indexed to display the new calibration & allow the restore option in the archive front-end.
      2. the case where the calibration ingestion workflow fails is not as well handled.  Typically it is indicated by the calibration failing to display, or I catch it in my monitoring of the system.
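
    A sketch of the file selection in step 1.1 above; the assumption that the unzipped weblog sits in a pipeline-* subdirectory of products/ is mine, for illustration only:

        import tarfile
        from pathlib import Path

        def build_calibration_tar(products_dir: Path, tar_path: Path) -> None:
            """Bundle everything that is neither FITS nor the unzipped weblog."""
            with tarfile.open(tar_path, "w:gz") as tar:
                for item in products_dir.iterdir():
                    if item.suffix.lower() == ".fits":
                        continue  # FITS products are handled separately
                    if item.is_dir() and item.name.startswith("pipeline-"):
                        continue  # assumed location of the unzipped weblog
                    tar.add(item, arcname=item.name)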

    The qaFail command:

    1. marks the EB as 'Do Not Calibrate'
    2. initiates a workflow to remove the working directory from the qa area
      1. if the workflow does not initiate, or fails for some reason, typically the DAs will manually remove the directory. 


    Now a more detailed discussion about the VLA requirements above:

    1. re-run pipeline script
      1. This exists, but it does not currently get used.  In all likelihood it has drifted out of date with the rest of the system.  I'll explore that and get it back into working order.
      2. As it currently stands, it takes the directories (complete with DA modifications) and runs a modified version of the CIPL workflow (mostly data movement, notifications, and re-executing CASA)
    2. Archive ingest script:
      1. The 'qaPass' command line tool is used for this aspect. Note in the description above: any additional files added to the calibration will be gathered in the process. 
      2. There is already a capacity for a set of notes written by the DA to be included (and displayed in the weblog automatically).  That likely covers the requirements for the qa report file. 
    3. Much of the working area used by the AAT/PPI (including the processing area, but NOT the qa area) is already monitored, and old items are deleted on a DA-specifiable timescale.  Since the observer can request their calibrated data for up to 3 weeks post-calibration, this system has worked rather well to date.


  2. The ALMA Optimized Imaging looks to be headed toward a similar work pattern to that of the VLA calibrations.  That will impact the design of the full workflow (after the basics are validated with the CLI).  Is the VLA pattern the one SRDP is adopting here?  

    1. Pipeline Completion Notification
      1. The basic imaging CLI already allows for a notification email to be sent as requested.  This uses an infrastructure built around the workflow system after the fact, and is somewhat limited.  If more detailed feedback is required, the notification could be built into the full workflow (a la the CIPL case). 
    2. Pipeline Re-run Script
      1. This is certainly doable, and it may be possible to generalize the existing workflow & CLI (yet to be seen).
      2. The procedure for this process is subject to more complete requirements for the initial workflow. 
    3. Archive Ingest Script
      1. There is already a low-level script which will archive a set of FITS images and a tar bundle of 'associated files' from the products directory of a processing workflow.
      2. The workflow it initiates currently operates on the assumption of all the FITS files being equal members of the 'Image Set'.  With the addition of the Products System (SSA-5330), it will be necessary for SRDP to define which results are science products and which are ancillary. 
    4. Cleanup Script
      1. The processing area for ALMA data is handled the same way as that for the VLA.  There is an automatic expiration of old data, on an easily-modified timescale (simply editing a text file; a sketch of such a sweep follows this comment).
      2. The existing 'clean-up' workflow could have a direct CLI interface, and be modified to remove any assumptions about the location upon which it works if that is required.
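
    A sketch of what such an expiration sweep amounts to, with the timescale read from a plain text file as described in 4.1 above; the paths and file name here are illustrative, not the real ones:

        import shutil
        import time
        from pathlib import Path

        PROCESSING_AREA = Path("/path/to/processing")      # illustrative
        TIMESCALE_FILE = Path("/path/to/expiry_days.txt")  # illustrative

        def expire_old_runs() -> None:
            max_age = float(TIMESCALE_FILE.read_text().strip()) * 86400  # days -> s
            now = time.time()
            for run_dir in PROCESSING_AREA.iterdir():
                if run_dir.is_dir() and now - run_dir.stat().st_mtime > max_age:
                    shutil.rmtree(run_dir)  # past the expiration timescale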
  3. Thanks Jim,

    For the VLA workflow, it sounds like the current rerun script will do the job once you can confirm that it is working.

    For the archive ingest, ideally we would want a new QA state or column in the database to indicate SRDP QA Pass/Fail. I understand, though, that this would be extra work, and I seem to remember we agreed with Stephan that, provided the SRDP QA is included in the comments entered by the DA, SSA would parse those and add the field later. 

    So with those caveats I think we are in good shape.

  4. (I just noticed that I still had flagtargetstemplate.txt listed in the VLA calibration workflow - we have since realized that we just need flagtemplate.txt (as target flags can be included in that) - so I have struck that out of the VLA requirements.)

  5. For the ALMA workflow:

    Indeed, the work pattern for ALMA is similar to the VLA one, but rather than a calibration being automatically kicked off upon receipt of new data, the ALMA pipeline is for imaging only and is triggered by a user interaction with the AAT. We will provide a list of primary and ancillary products (a sketch of the shape that might take follows). We will also need the ability to add a QA report to the weblog; it sounds like this would be easy, given that it is already possible for the VLA calibration workflow.
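
    For illustration only, the primary/ancillary list might take a shape like the following; the file names and field names are invented here, not a defined SSA schema:

        PRODUCTS = [
            {"file": "member.uid___A001_X1234_X56.spw25.cube.I.pbcor.fits",
             "type": "science"},    # primary science product
            {"file": "member.uid___A001_X1234_X56.spw25.cube.I.pb.fits",
             "type": "ancillary"},  # primary beam, ancillary
            {"file": "weblog.tgz",
             "type": "ancillary"},
        ]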

  6. The re-run script for the VLA has a minor flaw:  it looks like we overwrite the PPR before running CASA again.  That should be fairly easy to fix (one possible shape of the fix is sketched below). 

    I've started an Epic ticket to collect what's needed for these tools.
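
    Purely illustrative (the real change lives in the rerun workflow itself): one shape of the fix is to skip regenerating the PPR when one is already present, so the DA's edits survive the rerun.

        from pathlib import Path

        def write_ppr(run_dir: Path, ppr_content: str) -> None:
            ppr = run_dir / "PPR.xml"  # file name is an assumption
            if ppr.exists():
                return  # preserve the DA's (possibly edited) PPR instead of overwriting
            ppr.write_text(ppr_content)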

  7. Question on the initial handling of SRDP QA Information for VLA Calibrations:

    Right now, the database structures are not in place for this to be persisted, so we need to preserve the information in the tar files for later extraction (a parsing sketch follows the field list below).  I would suggest a simple format for the information to be appended to the existing qa_notes.txt (I think that's the name, Drew/Nick would know for sure).  I realize this puts more onus on the DAs, but they're also more flexible than the software (since time is an issue).

    From the Science Product QA System Requirements page, it looks as though we'll want (at minimum):

    qa_analyst:

    qa_score:

    qa_standard:

    (qa_status: can be derived from the EB's calibration_status value (Calibrated = Pass, Do Not Calibrate = Fail, otherwise NULL))

    (date: can be obtained from the ingestion_time of the tar file)

    (comments: are any other contents of the file)

    Is that correct?  Are there more fields we want, or some fields we can default (qa_standard, perhaps?)
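
    A sketch of the later extraction, assuming the fields are appended to qa_notes.txt one per line as "key: value":

        from pathlib import Path

        QA_KEYS = {"qa_analyst", "qa_score", "qa_standard", "qa_status", "qa_comments"}

        def parse_qa_block(notes_path: Path) -> dict:
            """Pull the appended key: value pairs out of qa_notes.txt."""
            fields = {}
            for line in notes_path.read_text().splitlines():
                key, sep, value = line.partition(":")
                if sep and key.strip().lower() in QA_KEYS:
                    fields[key.strip().lower()] = value.strip()
            return fields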

  8. That seems like a sensible plan - I will write a template and attach it to the workflow page at obsolete: QA Checklist - VLA Calibration. There are some subtleties we need to consider: right now there is no QA score evaluated, so I suggest we just default to 1 for a pass and 0 for a fail. The QA standard will be "SRDP Pilot" (I'll hard-wire that into the template). The QA status will need to be set, as we may have a situation where it would still be useful to ingest the caltables with the "qa2pass" script, but the data will fail SRDP QA (for example, if too much is flagged). So I propose as a template:

    qa_standard: SRDP Pilot

    qa_analyst: Your Name Here

    qa_status: Pass/Fail

    qa_comments: Add comments if needed

    (The other fields are then defaulted as: qa_score = 1 or 0 for pass or fail; the date is the ingest date. A sketch of that defaulting follows.)
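
    The defaulting rule written out as a sketch; the ingest date would come from the tar file's ingestion_time elsewhere in the system:

        def derived_fields(qa_status: str, ingest_date: str) -> dict:
            return {
                "qa_score": 1 if qa_status.strip().lower() == "pass" else 0,
                "qa_standard": "SRDP Pilot",  # hard-wired per the template
                "date": ingest_date,          # from the tar file's ingestion_time
            }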

  9. I was thinking all the initial QA report stuff from the DAs would go into qa_notes.html as this all gets archived.

  10. I've even been using this as my plan when training the new DA (AL): we would have a text template, as we do with VLASS QA, for the calibration and/or imaging pipelines. We can then paste this into qa_notes.html and fill in our findings per the checklist, along with our initials, date, and an overall finding/note section ... just like in VLASS.

  11. Agreed, this was also how my thinking was evolving. Having seen that the content of qa_notes.html gets placed fairly prominently in the weblog, I don't think there's any need for a separate QA report like there is for ALMA.

  12. Given that agreement, where are we with the scripts as they stand? Our intention is to go after the workspace next (and the workspace is kind of the foundation of the QA system), so I'd rather not sink too much effort into the scripts now. Can we live with what we have?

  13. Hi Stephan Witz - for the VLA, we are just missing the pipeline rerun script, so we can rerun the VLA pipeline with manually-added flags, and possibly an edited flux.csv and/or a modified PPR. For ALMA imaging, we are in a much worse state: none of the scripts in the requirements above exist (though some might be fairly easy modifications to the VLA ones).


    1. The ticket veered off into speculation about directory names, can you be more specific about what we missed?

  14. VLA: The script interface for the pipeline re-run workflow is in place (and has been for years).  That workflow doesn't do quite what's desired.  The problems which exist are, to my knowledge, amenable to manual workaround while we hammer out requirements. 


    ALMA: There ARE command line tools for initiating both imaging and reimaging of ALMA cubes, but they are currently restricted to the development system.  There will shortly be a direct launch wrapper around our 'go delete that from qa' workflow (again, in the development environment).  Ingestion of ALMA cubes is somewhat complicated because the base ingestion functionality is undergoing change for the products system.  I'm in the process of building the supporting infrastructure.


  15. The basic workflow is:

    1) The pipeline runs for the first time and creates a directory, a PPR, and a flux.csv file.

    2) Data analyst makes a flagtemplate.txt file (manually) listing the flags to be added, possibly edits the flux.csv file, and possibly edits the PPR.

    3) (what we need the script for) the pipeline is rerun, with the flagtemplate.txt, flux.csv and edited PPR being used in the pipeline run.

    How you choose to implement this is really up to you. Drew and Jim were discussing a specific implementation where the old working directory was moved (to e.g. old_working_n) and the pipeline rerun in the original directory (where it automatically picks up the flagtemplate.txt, flux.csv and edited PPR files if present and uses them); that variant is sketched below. Alternatively (the ALMA and, I think, the VLASS model), a new pipeline directory could be created, and the three files picked up from a central repository or database.
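
    A sketch of the directory-rotation variant, assuming the DA-edited files live in the working directory; the hand-off to the pipeline launcher is left as a placeholder:

        import shutil
        from pathlib import Path

        DA_INPUTS = ("flagtemplate.txt", "flux.csv", "PPR.xml")  # names per above

        def rotate_working(run_dir: Path) -> Path:
            """Move working/ to the next free old_working_n, then re-stage the
            DA-edited inputs in a fresh working/ so the rerun picks them up."""
            n = 1
            while (run_dir / f"old_working_{n}").exists():
                n += 1
            old = run_dir / f"old_working_{n}"
            shutil.move(str(run_dir / "working"), str(old))
            fresh = run_dir / "working"
            fresh.mkdir()
            for name in DA_INPUTS:
                src = old / name
                if src.exists():
                    shutil.copy2(src, fresh / name)
            return fresh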