Digital Object Identifiers (DOIs) are becoming the preferred method of linking scientific papers to the data they use. DOIs are "minted" by trusted third-party organizations and provide a persistent identifier for a source of data.
Our aim is to provide DOIs for products in the NRAO archive. The three sets below are the highest priority:
1) For "Collections" - final products from large projects, for example VLASS or ALFALFA.
2) For AUDI images
3) For VLA images (pipelined and user-driven).
Issues for discussion/consideration:
1) granularity - one DOI per FITS image, or per collection?
2) versioning - should we reissue a DOI if a new version of the product is archived?
The AAS Pubs board held a data linking meeting on 8/28/20. We saw no consensus on the granularity or versioning issues, but there was some sense that aggregations of data links (DOIs or others) could be assigned a DOI. We also learned that DataCite (https://datacite.org) is the primary issuer of DOIs. Membership costs EUR 2000/yr (which can be split by consortia) plus EUR 500 per individual organization, then EUR 0.80 per DOI, with bulk prices for >2000 DOIs/yr.
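As a rough illustration of the fee structure quoted above, the following Python sketch estimates a yearly cost. It assumes the membership fee is split equally among consortium members and that the per-organization fee is paid in full by each organization; both are assumptions, not confirmed DataCite terms. The function name and parameters are hypothetical. Because the bulk rate above 2000 DOIs/yr is not given in these notes, the sketch declines to estimate that case rather than guess.

```python
def annual_doi_cost_eur(n_dois, consortium_members=1,
                        membership_fee=2000.0, org_fee=500.0,
                        per_doi_fee=0.8, bulk_threshold=2000):
    """Estimate the yearly DOI cost in EUR from the figures in the notes.

    Assumptions (not confirmed pricing terms):
      - the membership fee is shared equally by `consortium_members`
      - the per-organization fee is not shared
      - the flat per-DOI rate applies up to `bulk_threshold` DOIs per year
    """
    if n_dois < 0:
        raise ValueError("n_dois must be non-negative")
    if consortium_members < 1:
        raise ValueError("consortium_members must be at least 1")
    if n_dois > bulk_threshold:
        # The bulk rate is not quantified in the notes, so do not guess.
        raise ValueError("bulk pricing applies; rate not known in this sketch")
    share_of_membership = membership_fee / consortium_members
    return share_of_membership + org_fee + per_doi_fee * n_dois


if __name__ == "__main__":
    # A single organization minting 1000 DOIs in a year.
    print(annual_doi_cost_eur(1000))
    # The same volume with the membership fee shared four ways.
    print(annual_doi_cost_eur(1000, consortium_members=4))
```

Under these assumptions the fixed fees dominate at low volume, so the granularity decision mainly affects cost once the number of per-image DOIs approaches the bulk threshold.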
Draft Requirements:
- For user-supplied collections based on large projects, and NRAO collections (e.g. NVSS, legacy image archive) one DOI will be issued per collection, linked to the collection such that clicking on the DOI will direct the user to the collection web page on Confluence (which in turn should have a query link to find the data in the archive). This DOI should be displayed prominently during archive searches, and on the Confluence page describing the collection's data release. A new DOI will be issued if a new reduction of the bulk of the dataset is supplied, but not in the case of small updates.
- For VLASS, several DOIs will be issued based on product collections (Quick Look, Single Epoch, Cumulative, per EDP set supplied), which link to pages on the VLASS Science website that contain query links to find the data in the archive. Again, these should be displayed prominently in archive searches and on the VLASS Science website.
- VLA pipeline images will receive one generic DOI per major pipeline version (e.g. data processed with 2020.x and 2021.x would get different DOIs, but data processed with 2020.1 and 2020.2 would share the same DOI). These will be prominently shown in product searches. Each DOI will resolve to a landing page with a brief general description of the imaging pipeline, release notes for each version, and a link to a query that returns the archive listing for image products created from that pipeline version.
- User-requested images (AUDI and VUDI) will receive one DOI per primary FITS image created. These will be prominently shown in product searches and resolve to a landing page generated automatically for each image, containing a summary of the image metadata and a link to the files in the archive.
- DOI metadata following the DataCite schema (http://schema.datacite.org; for an example see http://archive.stsci.edu/doi/resolve/resolve.html?doi=10.17909/t9-9skz-qw10) should be provided when creating each DOI.
- Users will be strongly encouraged to cite the DOI in papers (for example, as a footnote; see Long et al. (2020), using the dataset keyword, http://archive.stsci.edu/doi/faq.html#faq31, for AAS journals), along with any journal article describing the dataset. This description should be linked prominently from the archive webpages and other related pages (VLASS science, per-collection Confluence pages and NRAO Library pages).
- DOI links should be regularly tested to ensure they still perform the correct searches, and updated if necessary.
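
The granularity rules in the draft requirements above can be sketched as a function that maps an archive product to a grouping key, where products with the same key would share one DOI. This is only an illustration: the function name, the dictionary fields (`kind`, `collection`, `reduction`, `product_set`, `pipeline_version`, `image_id`) and the idea of a numeric reduction counter are all hypothetical and do not describe the actual archive schema.

```python
def doi_group_key(product):
    """Return a hashable key; products with equal keys would share one DOI.

    `product` is a plain dict with hypothetical field names chosen for
    this sketch, not the real archive metadata model.
    """
    kind = product["kind"]
    if kind == "collection":
        # One DOI per collection; a new bulk reduction gets a new DOI,
        # while small updates keep the same reduction number.
        return ("collection", product["collection"], product.get("reduction", 1))
    if kind == "vlass":
        # One DOI per VLASS product collection (e.g. Quick Look).
        return ("vlass", product["product_set"])
    if kind == "pipeline_image":
        # One DOI per major pipeline version: 2020.1 and 2020.2 share a key.
        major = product["pipeline_version"].split(".")[0]
        return ("pipeline_image", major)
    if kind == "user_image":
        # AUDI/VUDI: one DOI per primary FITS image.
        return ("user_image", product["image_id"])
    raise ValueError(f"unknown product kind: {kind!r}")


if __name__ == "__main__":
    a = {"kind": "pipeline_image", "pipeline_version": "2020.1"}
    b = {"kind": "pipeline_image", "pipeline_version": "2020.2"}
    c = {"kind": "pipeline_image", "pipeline_version": "2021.1"}
    print(doi_group_key(a) == doi_group_key(b))  # same major version
    print(doi_group_key(a) == doi_group_key(c))  # different major version
```

Keeping the rule in one place like this would make it easier to revisit the open granularity and versioning questions without touching the code that mints or displays DOIs.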