Overview

Ingestion of RealFast SDMs is an extension of the standard SDM ingestion process.  In short, the process involves: 

  1. Ingest the SDM for the EB as normal
  2. Fill in the RealFast collection metadata table
  3. Ingest the associated PNG candidate file(s)
  4. Link the new EB to its donor EB 

The details are covered in the following pages: 

SSA-5201

Collection Support: realfast

Using the System

Assumptions: 

  • RealFast team has access to the AAT/PPI command line installation areas (/users/vlapipe/workflows/)
  • The vlapipe user (and group) has Read, Write, and Execute access.
  • Everyone has Read access (to facilitate AAT/PPI services access)
  • The defined staging area (see below) is on the same filesystem as the data to be ingested
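
The last assumption can be checked before running anything. The sketch below is not part of the workflow; it compares the device IDs reported by os.stat for two paths, which on a typical POSIX system indicates whether they live on the same mounted filesystem. The function name is illustrative only.

```python
import os


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, the function could be called with the realfastStagePath and realfastSdmPath values from the CAPO profile. A False result would suggest that linking data into the staging area may not work as assumed.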

The process is contained within a special-purpose workflow, which can be initiated with the 'realfastIngest' command, installed under the vlapipe account.  

NOTE: There are separate installations for the Test and Production systems, both living in the vlapipe account's area.  Be careful to use the one you want. 

CLI Arguments
usage: realfastIngest [-h] [-P PROFILE] [-s SDM_PATH] [-p PNG_PATH]
                      sdmName [sdmName ...]

RealFast SDM Ingestion, version 3.9.0b2: Initiates an ingestion workflow for the SDM and ancillary 
    files for each execution block listed.

positional arguments:
  sdmName               FileSet Identifiers(s) to ingest

optional arguments:
  -h, --help            show this help message and exit
  -P PROFILE, --profile PROFILE
                        profile name to use, e.g. nmtest, mnprod
  -s SDM_PATH, --sdm_path SDM_PATH
                        Path to the RealFast SDM (overrides CAPO setting)
  -p PNG_PATH, --png_path PNG_PATH
                        Path to the candidate PNG files (overrides CAPO setting)
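
The help text above has the shape produced by Python's argparse module. As an aid to reading it, here is a minimal reconstruction of an equivalent parser. It is inferred from the help output alone and is not the tool's actual source; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented realfastIngest interface (a reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="realfastIngest",
        description="RealFast SDM Ingestion: initiates an ingestion workflow "
        "for the SDM and ancillary files for each execution block listed.",
    )
    parser.add_argument("-P", "--profile", help="profile name to use")
    parser.add_argument(
        "-s", "--sdm_path",
        help="Path to the RealFast SDM (overrides CAPO setting)",
    )
    parser.add_argument(
        "-p", "--png_path",
        help="Path to the candidate PNG files (overrides CAPO setting)",
    )
    # One or more positional names, matching "sdmName [sdmName ...]".
    parser.add_argument("sdmName", nargs="+", help="FileSet identifier(s) to ingest")
    return parser


args = build_parser().parse_args(
    ["-P", "dsoc-test",
     "realfast_18B-320.sb38241161.eb38244520.59002.47251115741_1591099113820"]
)
```

Here args.sdmName is always a list, even for a single execution block, and the two path options are None unless given, which is when the CAPO defaults would apply.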

The two path arguments are provided for flexibility, but the default values in the CAPO profiles (dsoc-test, dsoc-prod/nmprod) are assumed to be the typical locations.  If those paths are correct, the command can be invoked with only the SDM name:

Testing:

As vlapipe:

activate_profile dsoc-test

realfastIngest realfast_18B-320.sb38241161.eb38244520.59002.47251115741_1591099113820

Not as vlapipe: 

/users/vlapipe/workflows/dsoc-test/bin/realfastIngest realfast_18B-320.sb38241161.eb38244520.59002.47251115741_1591099113820


Production: 

As vlapipe:

activate_profile dsoc-prod

realfastIngest realfast_18B-320.sb38241161.eb38244520.59002.47251115741_1591099113820

Not as vlapipe: 

/users/vlapipe/workflows/dsoc-prod/bin/realfastIngest realfast_18B-320.sb38241161.eb38244520.59002.47251115741_1591099113820



This will initiate the process, and the SDM will shortly be available in the UI (a matter of ~10 minutes).  

The workflow will gather all the materials (SDM, PNG, files required for full ingestion) in one place (realfastStagePath), and initiate ingestion of those files.  After successful ingestion of the metadata, and (if requested) of the files into NGAS, the workflow will trigger a reindex of the project.
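
The four invocations shown earlier differ only in the profile name and in whether the full installation path is needed. The helper below makes that choice explicit by building the argument list for a given profile. It is a sketch for scripting convenience, not part of the delivered tooling; the function name is hypothetical, and when running as vlapipe it assumes that activate_profile has already been run in the calling shell.

```python
from typing import List


def realfast_ingest_command(profile: str, sdm_name: str, as_vlapipe: bool) -> List[str]:
    """Build the argv for realfastIngest under the given profile."""
    if as_vlapipe:
        # Assumes 'activate_profile <profile>' was run first, so the
        # command resolves through the vlapipe account's PATH.
        executable = "realfastIngest"
    else:
        # Full path into the profile-specific installation area.
        executable = f"/users/vlapipe/workflows/{profile}/bin/realfastIngest"
    return [executable, sdm_name]
```

The returned list can be passed to subprocess.run, which avoids any shell quoting concerns with the long SDM names.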


The CAPO profiles contain a set of values for use with this workflow: 

RealFast CAPO Settings
edu.nrao.archive.workflow.config.collection.RealfastSettings.serviceUrl = https://webtest.aoc.nrao.edu/archiveServices/
#
edu.nrao.archive.workflow.config.collection.RealfastSettings.pngNameArgument = realfast_ancillaries?path=
edu.nrao.archive.workflow.config.collection.RealfastSettings.donorLocatorArgument = realfast_associate?path=
edu.nrao.archive.workflow.config.collection.RealfastSettings.collectionMetadataArgument = realfast_collection?path=
#
edu.nrao.archive.workflow.config.collection.RealfastSettings.ingestNGAS = false
edu.nrao.archive.workflow.config.collection.RealfastSettings.realfastStagePath = /lustre/aoc/cluster/pipeline/nmtest/stage_products
#
edu.nrao.archive.workflow.config.collection.RealfastSettings.realfastSdmPath = /lustre/aoc/sciops/pdemores/realfast_sdms
edu.nrao.archive.workflow.config.collection.RealfastSettings.realfastPngPath = /lustre/aoc/sciops/pdemores/realfast_sdms
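
The settings above suggest that each service request is formed by joining serviceUrl, one of the "...Argument" values, and a path. The sketch below parses settings in this format and composes such a URL. The concatenation rule and the idea that the path is the SDM's location are assumptions drawn from the shape of the values, not confirmed behavior; both function names are hypothetical.

```python
from urllib.parse import quote

PREFIX = "edu.nrao.archive.workflow.config.collection.RealfastSettings."


def parse_realfast_settings(text: str) -> dict:
    """Collect RealfastSettings.* entries, keyed by their short names."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' separator/comment lines
        # Split on the first '=' only; values such as
        # 'realfast_ancillaries?path=' contain '=' themselves.
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith(PREFIX):
            settings[key[len(PREFIX):]] = value.strip()
    return settings


def service_request_url(settings: dict, argument_key: str, sdm_path: str) -> str:
    """Assumed rule: serviceUrl + argument + percent-encoded path."""
    return settings["serviceUrl"] + settings[argument_key] + quote(sdm_path, safe="/")
```

With the test values shown, service_request_url(settings, "pngNameArgument", path) would begin with https://webtest.aoc.nrao.edu/archiveServices/realfast_ancillaries?path= followed by the path.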


Under The Hood

Note that the realfastIngest command does no processing itself; it only prepares the metadata and initiates the workflow.  With a bit of additional work, it could provide some limited feedback (the name of a working directory where some log files are kept, and a success/fail email).

What the workflow does in more detail: 

  1. Obtain the PNG file's partial name 
    1. (via service which reads the JSON under Annotation.xml)
  2. Find and link the required PNG file into a subdirectory of the staging area
  3. Obtain the donor SDM's SPL
    1. (via service which reads the JSON under Annotation.xml & queries the AAT)
  4. Obtain the collection metadata 
    1. (via service which reads the JSON under Annotation.xml)
  5. Write the collection metadata to a file in the staging area
  6. Link the SDM & BDFs into the staging area
  7. Write the Ingestion Manifest 
    1. SDM Science Product, with PNG ancillary product
    2. Associate Group with the donor SDM 
  8. Prepare ingestion artifacts
  9. Trigger ingestion 
    1. Ingestion sends a 'complete' signal upon success


