Digital Object Identifiers (DOIs) are becoming the preferred way to link to data from the scientific papers that use those data. DOIs are "minted" by trusted third-party organizations and provide a persistent identifier for a source of data.

Our aim is to provide DOIs for products in the NRAO archive. The three sets below are the highest priority:

1) For "Collections" - final products from large projects, for example VLASS or ALFALFA.

2) For AUDI images

3) For VLA images (pipelined and user-driven).

Issues for discussion/consideration:

1) granularity - one DOI per FITS image, or per collection?

2) versioning - should we issue a new DOI when a new version of a product is archived? Another option is to keep the existing DOI and add a link to the new version on the landing page, but a new DOI is probably cleanest.

The AAS Pubs board held a data linking meeting on 8/28/20. There was no consensus on the granularity or versioning issues, but there was some sense that aggregations of data links (DOIs or others) could themselves be assigned a DOI. We also learned that DataCite (https://datacite.org) is the primary issuer of DOIs. Membership costs EUR 2000/yr (which can be split across a consortium) plus EUR 500 per individual organization, then EUR 0.80 per DOI, with bulk prices for >2000 DOIs/yr. We later confirmed that DataCite is in fact the only suitable DOI issuer: the others are for foreign language/origin material or for publications rather than data.
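The fee structure quoted above can be turned into a rough annual budget estimate. The following is a minimal sketch, assuming the amounts are in euros and that a consortium splits the membership fee equally among its members (the equal split is an assumption, not something stated at the meeting). The function name and constants are illustrative. Because the bulk rate above 2000 DOIs/yr was not given, the sketch refuses to guess in that range.

```python
# Figures as quoted from the data linking meeting; treated here as euros.
MEMBERSHIP_EUR = 2000.0    # annual membership, may be shared by a consortium
ORGANIZATION_EUR = 500.0   # annual fee per individual organization
PER_DOI_EUR = 0.8          # fee per DOI minted
BULK_THRESHOLD = 2000      # above this many DOIs/yr, unknown bulk prices apply


def estimate_annual_cost(dois_per_year: int, consortium_members: int = 1) -> float:
    """Estimate one organization's yearly cost for minting DOIs.

    Assumes the membership fee is split equally among consortium members.
    """
    if dois_per_year < 0:
        raise ValueError("dois_per_year must be non-negative")
    if consortium_members < 1:
        raise ValueError("consortium_members must be at least 1")
    if dois_per_year > BULK_THRESHOLD:
        # The bulk rate was not quoted, so no estimate is attempted here.
        raise ValueError("bulk pricing applies above 2000 DOIs/yr; rate unknown")

    membership_share = MEMBERSHIP_EUR / consortium_members
    return membership_share + ORGANIZATION_EUR + PER_DOI_EUR * dois_per_year
```

For instance, 1000 DOIs in a year with no consortium comes to 2000 + 500 + 800 = 3300, while the same volume inside a four-member consortium comes to 500 + 500 + 800 = 1800.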

Draft Requirements:

  1. The NRAO archive database should contain a DOI field for image entries to enable searches by DOI.
  2. An API should be written for the archive that allows a search by DOI (e.g. https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html?searchQuery=%7B%22service%22:%22DOIOBS%22,%22inputText%22:%2210.17909/t9-9skz-qw10%22%7D).
  3. For user-supplied collections based on large projects, and NRAO collections (e.g. NVSS, legacy image archive) one DOI will be issued per collection, linked to the collection such that clicking on the DOI will direct the user to the collection web page on Confluence (which in turn should have a query link to find the data with that DOI in the archive). This DOI should be displayed prominently during archive searches, and on the Confluence page describing the collection's data release.  A new DOI will be issued if a new reduction of most or all of the dataset is supplied, but not in the case of small updates.
  4. For VLASS, several DOIs will be issued based on product collections (Quick Look, Single Epoch, Cumulative, per EDP set supplied), each linking to a page on the VLASS Science website that contains query links to find the data in the archive by DOI. Again, these should be displayed prominently in archive searches and on the VLASS Science website.
  5. VLA pipeline images will receive one generic DOI per major pipeline version (e.g. data processed with 2020.x and with 2021.x would get different DOIs, but data processed with 2020.1 and 2020.2 would share the same DOI). These will be prominently shown in product searches. Each DOI will resolve to a landing page with a brief general description of the imaging pipeline, release notes for each version, and a link to a query by DOI that returns the archive listing for image products created from that pipeline version.
  6. User-requested images (AUDI and VUDI) will receive one DOI per primary FITS image created. These will be prominently shown in product searches and resolve to a landing page generated automatically for each image, containing a summary of the image metadata and a link to the files in the archive by DOI.
  7. DOI metadata following the DataCite schema (http://schema.datacite.org, for an example see http://archive.stsci.edu/doi/resolve/resolve.html?doi=10.17909/t9-9skz-qw10) should be provided when creating each DOI.
  8. Users will be strongly encouraged to cite the DOI in papers (for example, as a footnote, see Long et al. (2020) using the dataset keyword http://archive.stsci.edu/doi/faq.html#faq31 for AAS journals), along with any journal article describing the dataset. This description should be linked prominently from the archive webpages and other related pages (VLASS Science, per-collection Confluence pages and NRAO Library pages, possibly as a condition of publication page charge support).
  9. If a new version of an existing product or collection with an existing DOI is archived, a new DOI should be made for the new version.
  10. DOI links should be regularly tested to ensure they still resolve correctly, and updated if necessary.
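
Three of the requirements above lend themselves to a short illustration: the rule that pipeline images share a DOI per major pipeline version (5), the metadata supplied when creating a DOI (7), and the regular testing of DOI links (10). The following is a minimal sketch, not a proposed implementation. All function names are hypothetical, the version strings are assumed to look like "2020.1", and the metadata field names reflect one reading of the DataCite JSON representation of its mandatory properties, which should be checked against the current schema before use.

```python
import re
import urllib.request


def pipeline_doi_key(pipeline_version: str) -> str:
    """Return the major-version key that selects the generic pipeline DOI.

    Assumes versions look like "2020.1": the major version is the run of
    digits before the first dot.
    """
    match = re.match(r"^(\d+)\.", pipeline_version)
    if match is None:
        raise ValueError(f"unrecognized pipeline version: {pipeline_version!r}")
    return match.group(1)


def minimal_datacite_attributes(doi, title, creator, publisher, year):
    """Collect the mandatory descriptive fields for a DOI record in a dict.

    Field names are an assumption based on DataCite's JSON conventions.
    """
    return {
        "doi": doi,
        "titles": [{"title": title}],
        "creators": [{"name": creator}],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Dataset"},
    }


def doi_url(doi: str) -> str:
    """Return the public resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if following the DOI ends at a page answering with 2xx.

    Uses a HEAD request; some landing pages reject HEAD, so a False result
    means "needs a manual look" rather than "definitely broken".
    """
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers HTTP errors, DNS/connection failures, and timeouts.
        return False
```

With this keying, images processed with 2020.1 and 2020.2 map to the same key and would therefore share one DOI, while 2021.x images map to a different one. A periodic job could call `doi_resolves` on every DOI stored in the archive database and flag the failures for review.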