
About the NRAO Archive Access Tool

Introduction

This is the updated NRAO Science Data Archive, a tool that provides access to the vast repository of astronomical data we have collected over the years. This includes both public data and data still in its proprietary period, meaning data you proposed for and observed, or data you have been given access to.

We also offer the option to reprocess raw data and retrieve the results, so that our computing resources can help you reduce your data.

This release is the culmination of the Phase 2.5 development effort. Everything present in Phases I and II is still present and operational. This release focuses on two functional areas: data delivery, and the incorporation of workflow processing in Charlottesville.

Summary of Changes since Phase 2.0

  • The download format can be specified as raw data or as a CASA measurement set (EVLA and ALMA only)
  • When a measurement set is created, flags generated during observing can be applied and online averaging (spectral, time) can be selected
  • Downloaded files can be delivered as a single tar file or as a directory of individual files 
  • AUI staff can select a delivery directory for downloads
  • Download requests for ALMA and GBT data are processed and delivered in Charlottesville instead of streamed from Charlottesville to Socorro and then processed and delivered in Socorro

Searching

The tool offers both a basic interface and an advanced interface. The basic interface offers a quick way to search the archive for specific topics, authors, source names and so on, and checks the text you provide against all of the fields below:

  • Telescope

    • ALMA array type (12m, 7m, etc.)

    • VLA array configuration

    • ALMA maximum resolvable scale

    • GBT backends and receivers

  • Receiver band

  • Polarizations

  • Start/end date

  • Observation ID

  • Project code / title / abstract / PI / authors

  • Cone search source position with RA/Dec, coordinate system, and search radius

  • Full width at half maximum (FWHM)

  • Low/high frequency
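A cone search selects observations whose position lies within a given angular radius of a target position. The sketch below shows the underlying test using the haversine formula for great-circle distance. It assumes both positions are already in the same coordinate system (for example ICRS) and are given in decimal degrees; the function names are hypothetical and are not part of the archive's interface.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the small separations typical of cone searches.
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(target_ra, target_dec, obs_ra, obs_dec, radius_deg):
    """True if the observed position lies within radius_deg of the target position."""
    return angular_separation_deg(target_ra, target_dec, obs_ra, obs_dec) <= radius_deg


# A position about 0.0085 degrees from the target.
print(in_cone(83.633, 22.0145, 83.64, 22.02, radius_deg=0.05))   # True
print(in_cone(83.633, 22.0145, 83.64, 22.02, radius_deg=0.005))  # False
```

If positions arrive in different frames, they would need to be converted to a common frame before this test is applied.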

In the advanced interface, the search fields are ANDed together, and multiple selections within a field are ORed. For example, if you select Telescope: VLA, VLA Array Configurations: A, D, and Start Date: 2015-01-01, you are searching for any VLA observation that started on or after January 1, 2015 and was taken in the A or D configuration.
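The AND/OR rule above can be sketched as a small filter. The records and field names here are hypothetical stand-ins, not the archive's actual index schema: each selected field must match (AND), and within a field the record's value may be any one of the selected values (OR).

```python
from datetime import date
from typing import Dict, Optional, Set

# Hypothetical observation records; the real index schema is not described here.
observations = [
    {"telescope": "VLA", "configuration": "A", "start": date(2015, 3, 2)},
    {"telescope": "VLA", "configuration": "C", "start": date(2016, 1, 10)},
    {"telescope": "VLA", "configuration": "D", "start": date(2014, 11, 5)},
    {"telescope": "ALMA", "configuration": None, "start": date(2015, 6, 1)},
]


def matches(obs: dict, criteria: Dict[str, Set[str]],
            start_on_or_after: Optional[date] = None) -> bool:
    # OR within a field: the record's value may be any of the selected values.
    # AND across fields: every selected field must match.
    fields_ok = all(obs.get(field) in allowed for field, allowed in criteria.items())
    start = obs.get("start")
    date_ok = start_on_or_after is None or (start is not None and start >= start_on_or_after)
    return fields_ok and date_ok


criteria = {"telescope": {"VLA"}, "configuration": {"A", "D"}}
hits = [o for o in observations if matches(o, criteria, date(2015, 1, 1))]
print([(o["configuration"], o["start"].isoformat()) for o in hits])  # [('A', '2015-03-02')]
```

In this sketch, a record with a vacant field never matches a selection on that field.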


Authentication and Authorization

It has always been possible to submit download requests anonymously for non-proprietary data. You may now also log in and download or reprocess your proprietary data or data that you have been granted access to.

The archive also supports SSO level 2, so you may authenticate with either your ALMA or your NRAO account. If the two accounts are linked at ALMA, you will see the same data either way.

Data Delivery

For Phase 2.5, the downloadable data formats include those provided by the current NRAO Archive: raw data formats for the EVLA, ALMA, VLBA, GBT, and legacy VLA. In addition, EVLA and ALMA data can be downloaded as a CASA measurement set (MS). When creating the MS, options are provided to:

  • apply flags generated during observing
  • choose online averaging (Spectral, Time Averaging)
  • deliver the data as a tar file

AUI staff also have the option of having the data delivered to a directory of their choice, alongside the default location in /lustre.
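The delivery rules above can be summarized as consistency checks on a download request. The sketch below is purely illustrative: `DownloadRequest`, its fields, and the assumed units for the averaging options are hypothetical and do not describe the archive's actual request form or API. It encodes only what this page states: MS creation is limited to EVLA and ALMA data, flagging and averaging are MS creation options, and only AUI staff may choose an extra delivery directory.

```python
from dataclasses import dataclass
from typing import List, Optional

# Telescopes whose data can be delivered as a CASA measurement set.
MS_TELESCOPES = {"EVLA", "ALMA"}


@dataclass
class DownloadRequest:
    """Hypothetical download request; field names are illustrative only."""
    telescope: str
    as_measurement_set: bool = False
    apply_online_flags: bool = False
    spectral_averaging: Optional[int] = None   # assumed: channels combined per output channel
    time_averaging_s: Optional[float] = None   # assumed: averaging interval in seconds
    tar_file: bool = False                     # single tar file vs. directory of files
    delivery_dir: Optional[str] = None         # extra destination besides /lustre
    is_aui_staff: bool = False

    def validate(self) -> List[str]:
        """Return a list of rule violations (empty if the request is consistent)."""
        errors = []
        if self.as_measurement_set and self.telescope not in MS_TELESCOPES:
            errors.append("measurement sets are available for EVLA and ALMA data only")
        ms_only = (self.apply_online_flags
                   or self.spectral_averaging is not None
                   or self.time_averaging_s is not None)
        if ms_only and not self.as_measurement_set:
            errors.append("flagging and averaging options apply only when creating an MS")
        if self.delivery_dir is not None and not self.is_aui_staff:
            errors.append("only AUI staff may choose a delivery directory")
        return errors


req = DownloadRequest("ALMA", as_measurement_set=True, apply_online_flags=True, tar_file=True)
print(req.validate())  # []
```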

Workflow Processing in Charlottesville

A workflow for downloading and processing data hosted in Charlottesville has been incorporated. Data hosted in the Charlottesville archive, including ALMA and GBT data, are now fetched and processed by the Charlottesville workflow and delivered to the Charlottesville file system.

Reprocessing

Reprocessing requests are now supported for raw data. Requests for ALMA data and for VLA data taken after 2013 are functional. Jansky VLA data sets from before 2013 have problems with their intents and cannot be reprocessed without manual intervention.
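The eligibility rule above can be written as a small classifier. This is a sketch, not the archive's logic: the page says "after 2013" and "from before 2013" without an exact date, so 1 January 2013 is assumed as the cutoff, and the function name and return labels are hypothetical.

```python
from datetime import date

# Assumption: the exact cutoff is not stated; 1 January 2013 is used here.
VLA_CUTOFF = date(2013, 1, 1)


def reprocessing_status(telescope: str, observed: date) -> str:
    """Classify a raw data set according to the reprocessing notes on this page."""
    if telescope == "ALMA":
        return "supported"
    if telescope == "VLA":
        # Older Jansky VLA data sets have problems with their intents.
        return "supported" if observed >= VLA_CUTOFF else "needs manual intervention"
    # Other telescopes are not covered by these notes.
    return "not covered"


print(reprocessing_status("VLA", date(2012, 6, 1)))  # needs manual intervention
```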

Known Issues

Search Index and Database

The Archive searches a combined index of ALMA and non-ALMA observations. This index is built from a database that has certain inconsistencies. We are in the middle of reimplementing the database and refining the procedure that builds it from the legacy database; in the meantime, some results may be missing essential fields (start or stop dates, array configurations, and so on). We have no evidence of observations with incorrect fields, only fields that are vacant. The search index for ALMA data is rebuilt nightly and should be up to date within a day of observing. For non-ALMA data, the process is not yet automated, and we rebuild the search index about once a week.

Performance

Unlike the ALMA archive, the NRAO archive stages downloads before it allows access to them: the files making up a file set are pulled from long-term storage and assembled before you are offered the option to download them. As a consequence, requests take longer than users of the ALMA system might expect.

Reprocessing Errors

The NRAO Archive offers primitive reprocessing capabilities: it can re-run raw data files through the ALMA or NRAO calibration pipeline and let you download the results. These features are very much a work in progress. One area that needs improvement is error reporting. The Archive currently allows users to reprocess data sets that are known to be bad, and the only feedback the user receives is that the request was cancelled.

Data Delivery Formats

The SDFITS format for GBT data is not yet supported. GBT data can only be downloaded as a set of GBT-FITS files.

Reporting Bugs (NRAO Staff)

Please report bugs and/or suggestions to Stephan Witz (switz@nrao.edu), Claire Chandler (cchandle@nrao.edu) and Mark Lacy (mlacy@nrao.edu).
