About the NRAO Archive Access Tool
The AAT/PPI (Archive Access Tool/Pipeline Processing Interface) is designed to replace the NRAO's current archive tool (https://archive.nrao.edu). It gives astronomers the ability to use the NRAO's processing clusters to manipulate observation data with CASA. This preliminary release can run EVLA and ALMA observations through their respective calibration pipelines; additional capabilities will be forthcoming.
Additional features include:
- Two interfaces, basic and advanced.
- Responsive interface that works on mobile devices.
- Natural language (text-based) searches of an observation's title, abstract, source names, authors and more, e.g. search on 'nova'.
- The 'my data' feature: log in, press a button and see a list of your observations.
- If you link your ALMA and NRAO accounts, you can use either to access your observations.
- For ALMA and VLA you can download your SDM as a measurement set and filter by scan intent.
Testing and Feedback
We'd like testers to focus on authentication and authorization. Before we allow outside access to this tool, we'd like a high degree of confidence that we won't be exposing proprietary data through it. We have tested it as well as we could, but feel it could benefit from broader testing.
Any request for data during the proprietary period requires authentication and authorization: you need to log in and be attached to the observation. This goes for both download and reprocessing requests.
Any reprocessing request requires authentication, because we want to keep anonymous users from tying up cluster resources. If the data is in the proprietary period, the request also requires authorization. ALMA reprocessing requires the user to have an ALMA account; VLA reprocessing requires the user to have an NRAO account.
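The rules above can be summarized as a small decision function, which testers may find useful as a checklist of expected behavior. This is only a sketch of the policy as described here, not the AAT/PPI's actual code: the `User` and `Observation` types, their fields, and the function names are all hypothetical, and the sketch assumes that downloads of public (non-proprietary) data need no login.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping taken from the text: which account type each
# telescope's reprocessing requires.
REQUIRED_ACCOUNT = {"ALMA": "ALMA", "VLA": "NRAO"}


@dataclass(frozen=True)
class User:
    name: str
    account_types: frozenset  # e.g. {"NRAO"}, {"ALMA"}, or both if linked
    observations: frozenset   # IDs of observations the user is attached to


@dataclass(frozen=True)
class Observation:
    obs_id: str
    telescope: str     # "ALMA" or "VLA"
    proprietary: bool  # True while inside the proprietary period


def may_download(user: Optional[User], obs: Observation) -> bool:
    """Proprietary data needs a logged-in user attached to the observation."""
    if not obs.proprietary:
        return True  # assumption: public data can be downloaded anonymously
    return user is not None and obs.obs_id in user.observations


def may_reprocess(user: Optional[User], obs: Observation) -> bool:
    """Reprocessing always needs a login with the matching account type."""
    if user is None:
        return False  # anonymous users may not tie up cluster resources
    required = REQUIRED_ACCOUNT.get(obs.telescope)
    if required is None or required not in user.account_types:
        return False
    if obs.proprietary and obs.obs_id not in user.observations:
        return False  # proprietary data also needs authorization
    return True
```

Each combination of anonymous or logged-in user, attached or unattached observation, and public or proprietary data corresponds to one case worth testing against the real system.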
Successful requests result in the files being staged for download and a URL for them being given to the user. This URL cannot be guessed by other users; it is the key to the files. To share the files with other users, give them the URL. The URL will be active for at least five days after the request, though we will have to adjust that in the future based on storage space and demand.
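To illustrate what an unguessable, time-limited URL means in practice, here is a minimal sketch using Python's standard `secrets` module. It is not how the AAT/PPI generates its URLs, which this document does not describe: the base URL, the function names, and the use of exactly five days as the retention period are assumptions made for the example.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Assumptions for illustration only.
BASE_URL = "https://example.invalid/downloads"  # hypothetical staging host
RETENTION = timedelta(days=5)                   # "at least five days"


def make_download_url(now=None):
    """Return (url, expires_at) for a newly staged set of files."""
    now = now or datetime.now(timezone.utc)
    # 32 random bytes from the OS's cryptographic random source; the
    # token is the only secret, so anyone holding the URL can fetch
    # the files.
    token = secrets.token_urlsafe(32)
    return f"{BASE_URL}/{token}", now + RETENTION


def is_active(expires_at, now=None):
    """True while the staged files are still within their retention period."""
    now = now or datetime.now(timezone.utc)
    return now < expires_at
```

Because possession of the URL is the only check in a scheme like this, sharing the URL shares the files, which matches the behavior described above.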
- Users without special privileges should try to access data sets they should not be able to access.
- Users should also confirm that they can access everything they should be able to.
Known Issues
- Stale search indexes: the ALMA search index is rebuilt daily, so it should never be more than a day out of date. The VLA/VLBA/GBT search index is built less frequently and can be out of date by a week or more. We are working on it.
- Missing fields: the NRAO metadata database, which we use to populate the search index, spans four decades of observations, and some observations have missing fields such as array configuration or start and stop dates. We have no evidence of incorrect fields, just missing ones. There is no easy fix: we will have to extract each affected observation and update the database from the metadata in the files. We will start that work after the software is released.
- Poor error feedback: many kinds of failure are possible (observations that can't run through the calibration pipeline, observations that are missing some files, permission errors when writing files to user-specified unwritable locations, and so on). The AAT/PPI doesn't currently tell the user why something failed, and sometimes it doesn't report the failure at all, so the request never seems to complete. This is something we are actively working on.
- At this time only the calibration and flagging tables are downloadable after reprocessing, not the full calibrated measurement set.
- ALMA Cycle 0 and Cycle 1 observations are known to fail reprocessing, as are VLA observations from before 2013. There is currently no feedback to the user that this is the case, so users are advised to avoid reprocessing these data sets.
- GBT data delivery format: the SDFITS format for GBT data is not yet supported. GBT data can only be downloaded as a set of GBT-FITS files.
- Fragility to database downtime: the distributed applications that make up the AAT/PPI don't yet recover well from database downtime and have to be restarted to resume functioning. This is exacerbated by the fact that the system relies on several databases, not all of which are under the team's control. We are working to reduce the number of databases and to make the system more robust.
Release Notes - 2.5.2 - 2017-05-04
- [SSA-3988] - Announcement banner / MOTD in the new archive
- [SSA-3991] - pre-populate download/reprocessing email address
- [SSA-3994] - filter by scan intent
- [SSA-3995] - front end touches
- [SSA-3998] - position input validation
Release Notes - 2.5.3 - 2017-05-12
- [SSA-4010] - Improve reliability of workflow-job signalling
- [SSA-4006] - problems under load/maui kills jobs
Release Notes - 2.5.4 - 2017-05-26
- [SSA-3878] - missing feature: NED/SIMBAD name resolution
- [SSA-3879] - missing feature: different formats for source position input
- [SSA-4003] - input position as HH:MM:SS.SSS
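
Several of the items above (SSA-3998, SSA-3879, SSA-4003) concern validating and parsing source positions. As a rough illustration of what handling an HH:MM:SS.SSS value can involve, the sketch below parses such a string and converts it to decimal degrees. It assumes the value is a right ascension expressed in hours; the function name, the accepted pattern, and the range checks are illustrative and are not taken from the AAT/PPI code.

```python
import re

# One or two digits for hours and minutes, seconds with an optional
# fractional part, e.g. "12:30:00.000" or "6:05:09.5".
_RA_PATTERN = re.compile(r"^\s*(\d{1,2}):(\d{1,2}):(\d{1,2}(?:\.\d+)?)\s*$")


def ra_to_degrees(text):
    """Convert a right ascension 'HH:MM:SS.SSS' string to decimal degrees."""
    match = _RA_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in HH:MM:SS.SSS form: {text!r}")
    hours = int(match.group(1))
    minutes = int(match.group(2))
    seconds = float(match.group(3))
    if hours >= 24 or minutes >= 60 or seconds >= 60:
        raise ValueError(f"field out of range: {text!r}")
    # 24 hours of right ascension span 360 degrees, so 1 hour = 15 degrees.
    return 15.0 * (hours + minutes / 60 + seconds / 3600)
```

Rejecting malformed or out-of-range input with a clear error, rather than silently searching at a wrong position, is the point of validating before the search is run.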