Data taken by the VLA and VLBA sometimes have issues that need to be communicated to potential users of those data. These issues need to be documented in the metadata for those datasets so that the information is preserved for future use of the data. Recently, the data issues and annotations from the legacy archive were migrated to the new NRAO archive and are displayed with those data. However, there are still some obvious shortcomings: DAs cannot annotate data themselves and must instead file Jira tickets, which take up SSA developers' time to attach each issue to the metadata. Also, the legacy archive has a mechanism to automatically add annotations for some common issues, such as missing BDFs; the new archive does not detect and annotate these issues.
We request the following:
- An annotation tool that DAs can use to attach data-issue annotations to individual VLA, VLBA, GBT, and ALMA datasets/correlations in the new archive.
- This tool should allow selection of multiple datasets for annotation of identical issues through various mechanisms.
- by a project code (for projects that have multiple datasets/correlations)
- by date range
- by list of FSIDs
- by configuration
- by band
- GBT and ALMA support is needed so that users can be alerted to uncorrected data issues from those observatories.
- e.g., ALMA renormalization issue, VLA restore bug
- The capability to remove data issues is also needed for cases where calibrations or data are fixed in the archive.
- e.g., ALMA renormalization issue, VLA restore bug
- This is encapsulated in: WS-1392
- The new archive needs to implement the automated annotations that the legacy archive has in place.
- These are expected to consist only of the detection of missing BDFs, but any other automated annotation functionality in the legacy archive should also be replicated.
- e.g., compare the data for project 21A-147 observed on May 13, 2021 in the legacy archive vs. the new archive
- This work is encapsulated in: WS-1355
- (added 03/28/2022) A second type of annotation is needed to describe data issues that prevented the pipeline from operating correctly.
- These should appear as an informational indicator (similar to the yellow '!' used for issues with the data themselves, but associated with the Cals column).
- DAs should be able to enter free-format text that will be associated with the EB to document why the pipeline failed in this instance and what corrective action a user could take.
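As a rough illustration of the selection and annotation requests above, here is a minimal Python sketch. Everything in it is hypothetical: the `Dataset` fields, the `select`, `annotate`, and `remove_annotation` helpers, and the placeholder FSID strings are assumptions for discussion only and do not reflect the archive's actual schema or API. It shows selecting multiple datasets by project code, date range, list of FSIDs, configuration, or band; applying one identical annotation to all of them; keeping pipeline-failure notes separate from data-issue notes; and removing an annotation once the underlying problem is fixed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical archive record; field names are illustrative only."""
    fsid: str                     # placeholder identifier, not a real FSID
    project_code: str
    telescope: str                # e.g. "VLA", "VLBA", "GBT", "ALMA"
    obs_date: date
    configuration: Optional[str] = None
    band: Optional[str] = None
    data_issues: list = field(default_factory=list)     # shown with the data
    pipeline_notes: list = field(default_factory=list)  # shown with the Cals column


def select(datasets, project_code=None, date_range=None, fsids=None,
           configuration=None, band=None):
    """Return the datasets that match every criterion that was supplied."""
    selected = []
    for ds in datasets:
        if project_code is not None and ds.project_code != project_code:
            continue
        if date_range is not None and not (date_range[0] <= ds.obs_date <= date_range[1]):
            continue
        if fsids is not None and ds.fsid not in fsids:
            continue
        if configuration is not None and ds.configuration != configuration:
            continue
        if band is not None and ds.band != band:
            continue
        selected.append(ds)
    return selected


def _target(ds, pipeline):
    """Pick the list that holds pipeline notes or data-issue notes."""
    return ds.pipeline_notes if pipeline else ds.data_issues


def annotate(datasets, text, pipeline=False):
    """Attach the same free-format text to every dataset, skipping duplicates."""
    for ds in datasets:
        notes = _target(ds, pipeline)
        if text not in notes:
            notes.append(text)


def remove_annotation(datasets, text, pipeline=False):
    """Remove an annotation, e.g. after the calibration or data were fixed."""
    for ds in datasets:
        notes = _target(ds, pipeline)
        if text in notes:
            notes.remove(text)


if __name__ == "__main__":
    # Placeholder records: FSIDs and the 22A-126 dates are invented for the demo.
    archive = [
        Dataset("fsid-0001", "22A-126", "VLA", date(2022, 3, 1)),
        Dataset("fsid-0002", "22A-126", "VLA", date(2022, 3, 5)),
        Dataset("fsid-0003", "21A-147", "VLA", date(2021, 5, 13)),
    ]
    affected = select(archive, project_code="22A-126")
    annotate(affected,
             "Incomplete SysPower.bin; requantizer gain table highly flagged.",
             pipeline=True)
    for ds in archive:
        print(ds.fsid, ds.pipeline_notes)
```

Keeping pipeline notes in a separate list from data issues mirrors the request that the two kinds of annotation be displayed with different indicators; a real implementation would presumably store these in the archive's metadata database rather than in memory.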
5 Comments
Frank Schinzel
Tickets relating to annotating data:
Edward Starr
Example use case:
For Project 22A-126, all 35 EBs had a resource setup that triggered a bug in the MCAF software, causing an incomplete SysPower.bin. This produced a highly flagged requantizer gain table, which the pipeline applies by default, resulting in very high total flagging by the end of calibration, often up to 97%. The data are 8-bit and do not require the rq.tbl, so manual calibration (or a modified pipeline) is possible. We decided to qaFail all EBs. The PI has been informed of the issue through a helpdesk ticket. We would like a way to warn future users of the problem. A hack solution would be to remove the calibration files and archive the weblog alone with a qa note explaining the issue. We chose not to do this, since a standalone weblog with no cal files is not what we want in the archive in the long term. Additionally, the setup was spectral line and several EBs had zeros clipping >1%, which we might also note in the archive.
Drew Medlin
This is also a potential place to upload a weblog, including the qa_notes.html file inside it, to associate with these problems.
John Tobin
Do you mean a weblog as an example? Not the weblog associated with the dataset? This might still be the case where we need to link to a page outside the archive with more detailed information.
Drew Medlin
Weblog associated with the dataset. I thought it wouldn't likely be an option, but wanted to mention it just in case.