About the NRAO Archive Access Tool

Table of Contents

  • Summary of Changes since Phase 2.0
  • About the NRAO Archive Access Tool

  • Searching
  • Authentication and Authorization

  • Reprocessing

  • Known Issues

    • Search Index and Database

    • Performance

    • Reprocessing Errors

  • Reporting Bugs (NRAO staff)

  • Notes to the Users Committee

Summary of Changes since Phase 2.0

  • Some planned changes
  • Some random changes

Introduction

This is the updated NRAO Science Data Archive, a tool that provides access to the vast repository of astronomical data we have collected over the years. That includes both public data and data still in its proprietary period, provided you proposed for and observed it or have been given access to it.

We also offer the option to reprocess raw data and retrieve the results, so the resources we have available can help you reduce your data.

This release is the culmination of the Phase 2.5 development effort. Everything present in Phases 1 and 2 is still present and operational. This release focuses on two functional areas: data delivery and the incorporation of workflow processing in Charlottesville.

Searching

The tool offers both a basic interface and an advanced interface with a multitude of searchable fields, including:

  • Telescope

    • ALMA array type (12m, 7m, etc)

    • VLA array configuration

    • ALMA maximum resolvable scale

    • GBT backends and receivers

  • Receiver band

  • Polarizations

  • Start/end date

  • Observation ID

  • Project code / title / abstract / PI / authors

  • Cone search source position with RA/Dec, coordinate system, and search radius

  • Full-width/half-max

  • Spatial/spectral resolution

  • Low/high frequency

  • Exposure time
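The cone search selects observations whose position lies within the search radius of a given RA/Dec. The Python below is a minimal sketch of that geometry. It is not the archive's implementation. It assumes both positions are equatorial coordinates in the same frame (e.g. J2000), it uses the haversine formula for angular separation, and all function names are hypothetical. It also converts sexagesimal input such as HH:MM:SS.SSS, which is one of the position formats listed in the release notes.

```python
import math


def hms_to_deg(ra_hms: str) -> float:
    """Right ascension 'HH:MM:SS.SSS' -> degrees (1 hour of RA = 15 degrees)."""
    h, m, s = (float(part) for part in ra_hms.split(":"))
    return 15.0 * (h + m / 60.0 + s / 3600.0)


def dms_to_deg(dec_dms: str) -> float:
    """Declination '[+-]DD:MM:SS.SS' -> degrees; the sign applies to the whole value."""
    sign = -1.0 if dec_dms.strip().startswith("-") else 1.0
    d, m, s = (abs(float(part)) for part in dec_dms.split(":"))
    return sign * (d + m / 60.0 + s / 3600.0)


def separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two (RA, Dec) positions in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 so rounding can never push the value outside asin's domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(hav))))


def in_cone(center_ra: float, center_dec: float, radius_deg: float,
            ra: float, dec: float) -> bool:
    """True if (ra, dec) lies within radius_deg of the cone centre."""
    return separation_deg(center_ra, center_dec, ra, dec) <= radius_deg


# Example: the centre of a cone at RA 12:00:00, Dec -00:30:00 (hypothetical values)
center = (hms_to_deg("12:00:00"), dms_to_deg("-00:30:00"))
```

The haversine form is used here because it stays numerically well-behaved for the small radii typical of cone searches.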

In the advanced interface, search fields are ANDed together and multiple selections within a field are ORed. For example, if you select Telescope: VLA, VLA Array Configurations: A, D, and Start Date: 2015-01-01, you are searching for any VLA observation that started on or after January 1, 2015 and was taken in the A or D configuration.
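Here is a minimal sketch of those semantics in Python. It uses hypothetical field names (`telescope`, `configuration`, `start_date`) and an in-memory list rather than the archive's actual search index. The selections for each field form a set, and matching any member satisfies that field (OR). A record must satisfy every field (AND).

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def matches(obs: Dict[str, Any], selections: Dict[str, Set[Any]],
            start_on_or_after: Optional[str] = None) -> bool:
    """Advanced-search semantics: AND across fields, OR within a field."""
    for field, allowed in selections.items():
        # Any one of the selected values satisfies this field (OR) ...
        if obs.get(field) not in allowed:
            # ... but every field must be satisfied (AND).
            return False
    if start_on_or_after is not None:
        start = obs.get("start_date")
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings; a record
        # with no start date cannot satisfy the date criterion.
        if start is None or start < start_on_or_after:
            return False
    return True


def advanced_search(observations: Iterable[Dict[str, Any]],
                    selections: Dict[str, Set[Any]],
                    start_on_or_after: Optional[str] = None) -> List[Dict[str, Any]]:
    return [o for o in observations if matches(o, selections, start_on_or_after)]


# Hypothetical records mirroring the example in the text.
observations = [
    {"id": "obs1", "telescope": "VLA", "configuration": "A", "start_date": "2016-03-01"},
    {"id": "obs2", "telescope": "VLA", "configuration": "B", "start_date": "2016-03-01"},
    {"id": "obs3", "telescope": "VLA", "configuration": "D", "start_date": "2014-12-31"},
    {"id": "obs4", "telescope": "GBT", "configuration": None, "start_date": "2017-01-01"},
]
hits = advanced_search(observations,
                       {"telescope": {"VLA"}, "configuration": {"A", "D"}},
                       start_on_or_after="2015-01-01")
print([o["id"] for o in hits])  # ['obs1']
```

In this sketch, a record with a vacant field never matches a criterion on that field. That is one way the missing-metadata issue described under Known Issues can surface in search results.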

The basic interface offers a quick way to search the archive for specific topics, authors, source names and so on; it checks the text you provide against all of the fields above.
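As a sketch of that behaviour, a case-insensitive substring match stands in below for whatever text indexing the archive actually uses. The field names and records are hypothetical. A record is returned if the query appears in any of its populated fields.

```python
from typing import Any, Dict, Iterable, List


def basic_search(observations: Iterable[Dict[str, Any]], text: str) -> List[Dict[str, Any]]:
    """Return records where the query text appears in any populated field."""
    needle = text.strip().casefold()
    results = []
    for obs in observations:
        for value in obs.values():
            # Case-insensitive substring match against every field, not just one.
            if value is not None and needle in str(value).casefold():
                results.append(obs)
                break  # one matching field is enough for this record
    return results


# Hypothetical records.
observations = [
    {"project_code": "XX-0001", "title": "Radio monitoring of a classical nova", "pi": "A. Observer"},
    {"project_code": "XX-0002", "title": "Dust continuum in protoplanetary disks", "pi": "B. Observer"},
]
print([o["project_code"] for o in basic_search(observations, "NOVA")])  # ['XX-0001']
```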

Authentication and Authorization

It has always been possible to submit download requests anonymously for non-proprietary data. You may now also log in and download or reprocess your proprietary data or data that you have been granted access to.

The archive also supports SSO level 2, so you may authenticate with either your ALMA or your NRAO account. If the two accounts are linked at ALMA, you will see the same data either way.

Data Delivery

For Phase 2.5, downloadable data formats include those provided by the current NRAO Archive: raw data formats for the EVLA, ALMA, VLBA, GBT, and legacy VLA. In addition, EVLA and ALMA data can be downloaded as a CASA measurement set (MS). When creating the MS, you can choose to:

  • apply flags generated during observing
  • apply online averaging (spectral, time averaging)
  • select which scans to include in the MS
  • deliver the data as a tar file
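Filtering by scan intent is also listed among the AAT/PPI's features. Selecting scans that way amounts to keeping the scans whose intents overlap the ones you ask for. Below is a minimal sketch under that assumption. The `Scan` type, the function name, and the intent names are illustrative. This page does not describe how the AAT actually passes the selection to CASA.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Scan:
    number: int
    intents: FrozenSet[str]


def select_scans(scans: Iterable[Scan], wanted_intents: Iterable[str]) -> List[int]:
    """Keep scans that carry at least one of the requested intents."""
    wanted = set(wanted_intents)
    return [scan.number for scan in scans if scan.intents & wanted]


# Hypothetical scan list; intent names are in the SDM style but illustrative.
scans = [
    Scan(1, frozenset({"CALIBRATE_FLUX", "CALIBRATE_BANDPASS"})),
    Scan(2, frozenset({"CALIBRATE_PHASE"})),
    Scan(3, frozenset({"OBSERVE_TARGET"})),
    Scan(4, frozenset({"CALIBRATE_PHASE"})),
]
selected = select_scans(scans, ["OBSERVE_TARGET", "CALIBRATE_PHASE"])
scan_selection = ",".join(str(n) for n in selected)  # "2,3,4"
```

The comma-separated string is the kind of scan list commonly used in CASA data selection. Whether the AAT builds its selection this way is an assumption.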

AUI staff may also have the data delivered to a directory they specify, in addition to the default location in /lustre.

Workflow Processing in Charlottesville

A workflow for downloading and processing data hosted in Charlottesville has been incorporated. Data hosted in the Charlottesville archive, including ALMA and GBT data, are fetched and processed via the Charlottesville workflow, and the results are delivered to the Charlottesville file system.

Reprocessing

Reprocessing requests are now supported for raw data. ALMA requests, and VLA requests for data taken after 2013, are functional. (Jansky VLA data sets from before 2013 have issues with scan intents and can't be reprocessed without manual intervention.)

Known Issues

Search Index and Database

The Archive searches a combined index of ALMA and non-ALMA observations. This index is built from a database that has certain inconsistencies. We are in the middle of reimplementing the database and refining the procedure that builds it from the legacy database. In the meantime, some results may be missing essential fields (start or stop dates, array configurations and so on). We have no evidence of observations with incorrect fields, only fields that are vacant.
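A vacant field cannot satisfy a criterion on that field. An observation with a missing start date, for example, will not appear in a date-restricted search. The sketch below separates complete records from incomplete ones so the latter can be checked by hand. It assumes search results are held as Python dictionaries, which is an assumption since this page does not describe a result format. The field names are hypothetical and the code is not part of the archive.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Hypothetical names for the kinds of metadata the text says may be vacant.
ESSENTIAL_FIELDS: Tuple[str, ...] = ("start_date", "end_date", "configuration")


def vacant_fields(obs: Dict[str, Any],
                  required: Sequence[str] = ESSENTIAL_FIELDS) -> List[str]:
    """Names of required fields that are absent, None, or empty in a record."""
    return [name for name in required if obs.get(name) in (None, "")]


def split_by_completeness(observations: Iterable[Dict[str, Any]],
                          required: Sequence[str] = ESSENTIAL_FIELDS):
    """Partition records into (complete, incomplete) lists."""
    complete, incomplete = [], []
    for obs in observations:
        (incomplete if vacant_fields(obs, required) else complete).append(obs)
    return complete, incomplete
```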

Performance

Unlike the ALMA archive, the NRAO archive stages downloads before it allows access to them. The files that make up a file set are pulled from long-term storage and assembled before the option to download them is presented. As a consequence, requests take longer than users of the ALMA system might expect.
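Because staging happens before files can be downloaded, a script that waits on a request should poll patiently rather than retry rapidly. Below is a minimal sketch with exponential backoff. `is_ready` is a hypothetical callable, for example one that checks whether the staged download location exists yet, because this page does not describe a status API. The timeout and delay values are placeholders.

```python
import time
from typing import Callable


def wait_until_staged(is_ready: Callable[[], bool],
                      timeout_s: float = 3600.0,
                      first_delay_s: float = 5.0,
                      max_delay_s: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready() with exponential backoff.

    Returns True as soon as is_ready() reports success, or False once the
    accumulated sleep time reaches timeout_s without success.
    """
    waited = 0.0
    delay = first_delay_s
    while True:
        if is_ready():
            return True
        if waited >= timeout_s:
            return False
        step = min(delay, timeout_s - waited)  # never sleep past the deadline
        sleep(step)
        waited += step
        delay = min(delay * 2, max_delay_s)  # double the gap, up to a cap
```

The `sleep` parameter is injectable so the backoff schedule can be exercised without actually waiting.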

Reprocessing Errors

The NRAO Archive offers primitive reprocessing capabilities: it can re-run raw data files through the ALMA or NRAO calibration pipeline and allow the results to be downloaded. These features are very much a work in progress. One area that needs improvement is error reporting. The Archive currently allows users to reprocess data sets that are known to be bad, and the only feedback the user gets is that the request completed and produced no files to download.

Data Delivery Formats

...


Introduction

The AAT/PPI (Archive Access Tool/Pipeline Processing Interface) is designed to replace the NRAO's current archive tool (https://archive.nrao.edu). It gives astronomers the ability to use the NRAO's processing clusters to process observation data with CASA. This preliminary release provides the capability to run EVLA and ALMA observations through their respective calibration pipelines; additional capabilities will be forthcoming.

Additional features include:

  • Two interfaces, basic and advanced.
  • Responsive interface that works on mobile devices.
  • Natural-language (text-based) searches of observation titles, abstracts, source names, authors and more, e.g. a search on 'nova'.
  • The 'my data' feature: log in, press a button and see a list of your observations.
  • If you link your ALMA and NRAO accounts, you can use either to access your observations.
  • For ALMA and VLA you can download your SDM as a measurement set and filter by scan intent.

Testing and Feedback

We'd like testers to focus on authentication and authorization. Before we allow outside access to this tool, we want a high degree of confidence that it won't expose proprietary data. We have tested it as well as we could, but feel it would benefit from broader testing.

Any request for data in its proprietary period requires authentication and authorization: you need to log in and be attached to the observation. This applies to both download and reprocessing requests.

Any reprocessing request requires authentication (we want to keep anonymous users from tying up cluster resources); if the data is in its proprietary period, it also requires authorization. ALMA reprocessing requires the user to have an ALMA account; VLA reprocessing requires the user to have an NRAO account.
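The access rules in the two paragraphs above can be condensed into one decision function. This is an illustrative sketch of the policy as described on this page, not the AAT/PPI's code. The `User` type and argument names are hypothetical. Being "attached to the observation" is modelled as membership in a set of project codes. Linked ALMA/NRAO accounts are modelled as a user holding both account flags.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class User:
    has_nrao_account: bool = False
    has_alma_account: bool = False
    attached_projects: FrozenSet[str] = frozenset()  # projects the user may access

    @property
    def authenticated(self) -> bool:
        return self.has_nrao_account or self.has_alma_account


def may_submit(user: Optional[User], telescope: str, project: str,
               proprietary: bool, reprocess: bool) -> bool:
    """Apply the download/reprocessing rules described on this page."""
    if user is None or not user.authenticated:
        # Anonymous users may only download non-proprietary data.
        return not proprietary and not reprocess
    if proprietary and project not in user.attached_projects:
        # Proprietary data also needs authorization, for downloads and reprocessing alike.
        return False
    if reprocess:
        # ALMA reprocessing needs an ALMA account; VLA reprocessing needs an NRAO account.
        if telescope == "ALMA":
            return user.has_alma_account
        if telescope == "VLA":
            return user.has_nrao_account
        return False  # reprocessing is only described for ALMA and VLA data
    return True
```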

Successful requests result in the files being staged for download and a URL for them being given to the user. This URL can't be guessed by other users; it is the key to the files, so to share the files with other users, give them the URL. The URL will remain active for at least five days after the request, though we will have to adjust that in the future based on storage space and demand.
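Sharing by URL is only safe if the token in the URL is unguessable. Below is a minimal Python sketch of how such a link could be minted and its guaranteed lifetime checked. It is not the archive's implementation. The base URL is a placeholder, and the five-day window is taken from the paragraph above.

```python
import secrets
from datetime import datetime, timedelta, timezone

# "At least five days" per the text; the real retention period may change.
MIN_RETENTION = timedelta(days=5)


def new_download_url(base_url: str) -> str:
    """Build a download URL around an unguessable token.

    secrets.token_urlsafe(32) draws 32 random bytes from the OS CSPRNG and
    encodes them as URL-safe base64, so the token cannot feasibly be guessed.
    """
    token = secrets.token_urlsafe(32)
    return f"{base_url.rstrip('/')}/{token}/"


def guaranteed_active(requested_at: datetime, now: datetime) -> bool:
    """True while the request is still inside the minimum retention window."""
    return now - requested_at <= MIN_RETENTION


url = new_download_url("https://example.org/download")  # hypothetical base URL
requested = datetime.now(timezone.utc)
```

The check is named for the guaranteed window only: per the text, a URL may remain usable beyond five days.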

  • Users without special privileges should try to access data sets they shouldn't be able to.
  • Users should also confirm they can access the data sets they should be able to.

Please report your feedback to Mark Lacy (mlacy@nrao.edu), Claire Chandler (cchandle@nrao.edu) and Stephan Witz (switz@nrao.edu).

Known Issues

  • Stale search indexes: the ALMA search index is rebuilt daily, so it should never be more than a day out of date; the VLA/VLBA/GBT search index is built less frequently and can be out of date by a week or more. We are working on it.
  • Missing fields: the NRAO metadata database, which we use to populate the search index, spans four decades of observations, and some observations have missing fields such as array configuration or start and stop dates. We have no evidence of incorrect fields, only missing ones. There is no easy fix: we will have to extract each affected observation and update the database from the metadata in its files. We will start that work after the software is released.
  • Poor error feedback: many kinds of failure are possible (observations that can't run through the calibration pipeline, observations that are missing some files, permission errors when writing files to un-writable user-specified locations, and so on). The AAT/PPI doesn't currently tell the user why something failed, and sometimes doesn't report the failure at all, so the request never seems to complete. We are actively working on this.
  • At this time only the calibration and flagging tables are downloadable after reprocessing, not the full calibrated measurement set.
  • ALMA Cycle 0 and Cycle 1 observations are known to fail reprocessing, as are VLA observations from before 2013, and there is currently no feedback to the user when that is the case. Users are advised not to reprocess these data sets.
  • GBT data delivery format: the SDFITS format for GBT data is not yet supported. GBT data can only be downloaded as a set of GBT-FITS files.
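Until the AAT/PPI reports these failures itself, a client-side pre-check can catch the known-bad reprocessing cases before a request is submitted. Below is a minimal sketch. It assumes the caller already knows the telescope, the observation date, and (for ALMA) the cycle number. The function name is hypothetical, and reading "before 2013" as "before 1 January 2013" is an assumption.

```python
from datetime import date
from typing import Optional

VLA_CUTOFF = date(2013, 1, 1)  # assumption: "before 2013" means before 1 January 2013
ALMA_KNOWN_BAD_CYCLES = {0, 1}


def reprocessing_warning(telescope: str, obs_date: date,
                         alma_cycle: Optional[int] = None) -> Optional[str]:
    """Return why a reprocessing request is expected to fail, or None if no known issue applies."""
    if telescope == "ALMA":
        if alma_cycle is None:
            return "ALMA cycle unknown; cannot check the Cycle 0/1 restriction"
        if alma_cycle in ALMA_KNOWN_BAD_CYCLES:
            return f"ALMA Cycle {alma_cycle} observations are known to fail reprocessing"
        return None
    if telescope == "VLA":
        if obs_date < VLA_CUTOFF:
            return "VLA observations from before 2013 are known to fail reprocessing"
        return None
    # Only EVLA and ALMA calibration pipelines are offered in this release.
    return "reprocessing is only offered for ALMA and VLA data"
```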

Reporting Bugs (NRAO Staff)

...

  • The distributed applications that make up the AAT/PPI are fragile with respect to database downtime: they don't yet recover well from it and have to be restarted to resume functioning. This is exacerbated by the fact that the system relies on several databases, not all of which are under the team's control. We are working to reduce the number of databases and to make the system more robust.

Release Notes - 2.5.2 - 2017-05-04

<h4>        Feature
</h4>
<ul>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3988'>SSA-3988</a>] -         Announcement banner / MOTD in the new archive
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3991'>SSA-3991</a>] -         pre-populate download/reprocessing email address
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3994'>SSA-3994</a>] -         filter by scan intent
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3995'>SSA-3995</a>] -         front end touches
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3998'>SSA-3998</a>] -         position input validation
</li>
</ul>
    
<h4>        Bug
</h4>
<ul>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3997'>SSA-3997</a>] -         front end, floats and ints
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3999'>SSA-3999</a>] -         ArrayIndexOutOfBounds exception
</li>
</ul>


Release Notes - 2.5.3 - 2017-05-12

<h4>        Engineering Task
</h4>
<ul>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-4010'>SSA-4010</a>] -         Improve reliability of workflow-job signalling
</li>
</ul>
        
<h4>        Bug
</h4>
<ul>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-4006'>SSA-4006</a>] -         problems under load/maui kills jobs
</li>
</ul>
        


Release Notes - 2.5.4 - 2017-05-26

<h4>        Feature
</h4>
<ul>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3878'>SSA-3878</a>] -         missing feature: NED/SIMBAD name resolution
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3879'>SSA-3879</a>] -         missing feature: different formats for source position input
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-4003'>SSA-4003</a>] -         input position as HH:MM:SS.SSS
</li>
</ul>
    
<h4>        Bug
</h4>
<ul>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-3983'>SSA-3983</a>] -         RA/DEC notation defaults and range.
</li>
<li>[<a href='https://open-jira.nrao.edu/browse/SSA-4033'>SSA-4033</a>] -         local delivery permissions
</li>
</ul>