The ALMA OUS for the AAT-PPI

The Observation Unit Set (OUS) used by ALMA serves multiple purposes:

an organizational structure for separating the multiple types of data generated from the high-level Science Goals defined in a proposal
a grouping mechanism for processing data, allowing hierarchical data processing (e.g., calibrate data from two separate configurations, then combine them into a single image)
a state-keeping mechanism recording what has happened at each organizational level

For the time being, the EVLA mostly needs the latter two types of information (similar to Daniel's notion of a make-like workflow engine). SRDP will likely move us toward automatic decomposition of proposals into Scheduling Blocks (SBs), with the possibility of similar multiple-configuration processing. Fundamentally, think of an OUS-like structure as a metadata analog to our filegroups: it links the 'prerequisites' for an archived product to the generating workflow and to the product itself:

TODO: MAKE A PRETTY PICTURE HERE

The OUS is a recursive structure. Any given OUS contains a link to the product(s) it produced and to the starting point for that processing: one or more EBs, or another OUS. Let's try to organize the VLASS single-epoch processing in this structure:
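A minimal Python sketch of this recursive shape, assuming an in-memory model only (the class names `OUS` and `ExecutionBlock` and the `source_ebs` helper are hypothetical, not an existing archive API). Each OUS holds its precursors, which are either EBs or other OUSs, plus the products it generated, so walking down the precursors recovers every EB that a product ultimately came from.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ExecutionBlock:
    """Stand-in for an archived EB; only its name is modelled here."""
    name: str


@dataclass
class OUS:
    """Hypothetical in-memory model of a single OUS.

    precursors: the starting point for processing (EBs or other OUSs).
    products:   identifiers of the archived products this OUS generated.
    """
    name: str
    precursors: List[Union[ExecutionBlock, OUS]] = field(default_factory=list)
    products: List[str] = field(default_factory=list)

    def source_ebs(self) -> List[ExecutionBlock]:
        """Recursively collect every EB underneath this OUS."""
        found: List[ExecutionBlock] = []
        for item in self.precursors:
            if isinstance(item, ExecutionBlock):
                found.append(item)
            else:
                found.extend(item.source_ebs())
        return found


# Illustrative names only: a two-EB calibration feeding one image set.
cal = OUS("calibration", [ExecutionBlock("eb1"), ExecutionBlock("eb2")], ["caltables"])
img = OUS("imaging", [cal], ["image_set"])
print([eb.name for eb in img.source_ebs()])  # ['eb1', 'eb2']
```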

Calibration: Calibration is performed on one or more EBs and results in calibration tables, which are archived. For a multiple-EB calibration, the tables are not necessarily applicable to any individual EB, so they are linked to the OUS, which is the more appropriate relationship. Once the calibration tables are archived, the OUS is marked complete, indicating that the linked data can be restored. In this case, the OUS links directly to the EB(s) to be processed and to the calibration tables.
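The completion rule for a calibration OUS could be sketched as below. This assumes hypothetical state names (`PENDING`, `PROCESSING`, `COMPLETE`) and a `CalibrationOUS` class invented for illustration; the point is that the calibration tables hang off the OUS rather than off any single EB, and the OUS only reports its data as restorable once those tables are archived.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class OUSState(Enum):
    # Hypothetical states; the page only says an OUS is "marked complete".
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETE = "complete"


@dataclass
class CalibrationOUS:
    eb_ids: List[str]                                       # the EB(s) calibrated together
    caltable_ids: List[str] = field(default_factory=list)   # linked to the OUS, not to one EB
    state: OUSState = OUSState.PENDING

    def archive_caltables(self, ids: List[str]) -> None:
        """Link newly archived calibration tables and mark the OUS complete."""
        if not ids:
            raise ValueError("no calibration tables to archive")
        self.caltable_ids.extend(ids)
        self.state = OUSState.COMPLETE

    def can_restore(self) -> bool:
        """Calibrated data can only be restored from a complete OUS."""
        return self.state is OUSState.COMPLETE


cal = CalibrationOUS(eb_ids=["eb_a", "eb_b"])
print(cal.can_restore())   # False: nothing archived yet
cal.archive_caltables(["caltables_01"])
print(cal.can_restore())   # True
```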

Imaging: To perform imaging, you need a calibrated measurement set (which is not archived). The precursor for imaging, then, would be one or more OUSs like the one discussed above (which link the pieces needed to create the calibrated measurement set). The resulting product here is the image set.

Catalogs: From the images created above, we want to make a catalog to facilitate the creation of the coarse cubes. The precursor is then the imaging structure created above, and we perform the (yet to be determined) processing on those images to create the catalog needed for the masking process in cube computation. It is unclear whether catalogs will be created in a 1:1 correspondence with images or in some other relationship; however, an OUS can have as many 'child' OUSs as needed for the precursor information. However it is decided, these catalogs will be archived, and this OUS will be used as a prerequisite below.

Cubes: Each cube requires both a calibrated measurement set and a catalog in order to be created properly. As I understand it, a cube itself is going to be an organizational structure (another grouping of images beyond the image set), but that is still under discussion.
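Taken together, the calibration, imaging, catalog, and cube steps form a dependency graph, which is where the make-like workflow idea comes in. A sketch using Python's standard `graphlib` module (Python 3.9+), with hypothetical OUS names and the EBs left out: any topological order runs each OUS only after all of its precursors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical OUS names for one VLASS single-epoch field, each mapped to the
# OUSs it uses as precursors (the EBs under "calibration" are omitted).
DEPENDENCIES: Dict[str, Set[str]] = {
    "calibration": set(),                 # runs directly on the EB(s)
    "imaging": {"calibration"},           # needs the calibrated MS
    "catalog": {"imaging"},               # built from the image set
    "cube": {"calibration", "catalog"},   # needs calibrated MS and catalog
}


def processing_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every OUS comes after all of its precursors.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


print(processing_order(DEPENDENCIES))
# ['calibration', 'imaging', 'catalog', 'cube']
```

With this particular graph only one order is possible, because the cube waits on the catalog, which waits on imaging.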

In order for this to work, we will need to define metadata structures for the products we will ingest. We already have this for images, but we need to create corresponding structures for calibration tables, catalogs, and cubes.

Questions of Functionality:

What types of these OUSs are we going to need, and how can we keep the system extensible as the needs of the users expand?

We may need to handle some sort of 'incremental update' functionality as well. For SRDP, they may want some reactive capability. Consider a monitoring project: overall, there will be 50 evenly spaced observations of a particular target. Those observations are all generated from the same SB, but we don't know what each EB's name will be beforehand. The monitoring project OUS might pre-create a child OUS that is looking for the next EB from this SB. Once an EB is created, it is placed into the child OUS and a new child is created, until we have all 50. At that point the parent OUS might have processing of its own to do (making a time-lapse movie, for instance). How do we handle that type of functionality?
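One way the monitoring case might look, as a sketch only: the parent keeps one empty placeholder child waiting on the SB, fills it when a matching EB arrives, and pre-creates the next child until the expected count is reached, at which point it signals that its own processing can start. `MonitoringOUS`, `ChildOUS`, `on_new_eb`, and the SB/EB identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChildOUS:
    sb_id: str                    # SB whose next EB this child is waiting for
    eb_id: Optional[str] = None   # filled in once that EB exists


@dataclass
class MonitoringOUS:
    """Hypothetical parent OUS for a monitoring project of n_expected EBs."""
    sb_id: str
    n_expected: int
    children: List[ChildOUS] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Pre-create the first child, waiting on the next EB from the SB.
        self.children.append(ChildOUS(self.sb_id))

    def filled(self) -> int:
        return sum(1 for c in self.children if c.eb_id is not None)

    def on_new_eb(self, eb_id: str, sb_id: str) -> bool:
        """Place a new EB in the waiting child.

        Returns True only when the last expected EB arrives, i.e. when the
        parent's own processing (a time-lapse movie, say) could be triggered.
        """
        if sb_id != self.sb_id or self.filled() >= self.n_expected:
            return False                      # not our SB, or already finished
        self.children[-1].eb_id = eb_id
        if self.filled() < self.n_expected:
            self.children.append(ChildOUS(self.sb_id))  # wait for the next EB
            return False
        return True


project = MonitoringOUS(sb_id="SB-example", n_expected=50)
flags = [project.on_new_eb(f"EB-{i}", "SB-example") for i in range(50)]
print(flags[-1], any(flags[:-1]), len(project.children))  # True False 50
```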

Caveat: Each ALMA Science Goal is restricted to a single receiver (or is it a single total IF/LO setup?). Any attempt to restructure EVLA data will likely need to account for factors other than just configuration.

Backward Compatibility:

ALMA creates a three-layer structure of OUSs by default, but only uses the lowest level (also called a Member OUS, or MOUS) for any automated processing. The upper layers were envisioned for more complicated imaging or analysis tasks, but those are neither in use nor likely to be used in the near future (~10 years). For the time being (as of October 2017), all ALMA data is managed within a single OUS, which is operated upon multiple times: first to calibrate the data and ingest the calibration tables, and second to perform imaging on that same group of data. Both sets of products are tied to the MOUS level of the structure. As we create a similar structure, we should be careful to identify where it parallels the ALMA structure, so we can expose that correspondence where relevant.

Multiple Products:

We already make multiple calibration tar files available when they exist, but there is no information describing how they differ or why a user might prefer one over another. That type of information should be part of the metadata we provide. A recalibration would constitute another OUS, and should specify how it differs from the standard process (additional flagging, parameter changes to tasks, etc.).
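A hypothetical shape for that metadata, assuming each calibration OUS can point at the standard run it recalibrates and carry free-form lists of extra flagging and changed task parameters (the class, field names, and example values are illustrative only):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CalibrationRecord:
    """Hypothetical metadata carried by each calibration OUS."""
    ous_id: int
    recalibration_of: Optional[int] = None   # ous_id of the standard calibration
    extra_flagging: List[str] = field(default_factory=list)
    parameter_changes: Dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        """Summarize how this calibration differs from the standard process."""
        if self.recalibration_of is None:
            return "standard calibration"
        parts = [f"recalibration of OUS {self.recalibration_of}"]
        if self.extra_flagging:
            parts.append("additional flagging: " + ", ".join(self.extra_flagging))
        for name, value in sorted(self.parameter_changes.items()):
            parts.append(f"{name} changed to {value}")
        return "; ".join(parts)


recal = CalibrationRecord(
    ous_id=7,
    recalibration_of=3,
    extra_flagging=["antenna ea05"],
    parameter_changes={"example_param": "new_value"},
)
print(recal.describe())
# recalibration of OUS 3; additional flagging: antenna ea05; example_param changed to new_value
```

Storing the differences as structured fields, rather than only as a free-text note, would let the archive display them next to each tar file.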

What's in a name?

The name 'Observation Unit Set' is not as descriptive as some would prefer. There is also quite a bit of resistance to importing ALMA concepts into EVLA processes. What's a better name for this structure? If we want one, we need to be quick. The next-gen PST requirements committee seems to be leaning toward the name 'Program'. Other ideas?

Initial Database Thoughts:

I've started a proposed layout of tables for the metadata database, but I haven't come to any satisfactory solution for handling the processing side. That portion seems more like a combination of a database and a crawler process than a pure database structure.
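To make the 'database + crawler' idea concrete, here is a sketch of a single crawler pass against an in-memory SQLite database. The `state` column and the `ous_prerequisites` table are assumptions, since the processing-management columns below are still undefined; the query simply selects pending OUSs none of whose prerequisites is still incomplete.

```python
import sqlite3

# Hypothetical tables: "state" and "ous_prerequisites" are assumptions.
SCHEMA = """
CREATE TABLE ous (ous_id INTEGER PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE ous_prerequisites (
    ous_id INTEGER REFERENCES ous(ous_id),
    prereq_ous_id INTEGER REFERENCES ous(ous_id)
);
"""

# One crawler pass: pending OUSs with no prerequisite that is still incomplete.
READY_QUERY = """
SELECT o.ous_id FROM ous AS o
WHERE o.state = 'pending'
  AND NOT EXISTS (
      SELECT 1 FROM ous_prerequisites AS p
      JOIN ous AS pre ON pre.ous_id = p.prereq_ous_id
      WHERE p.ous_id = o.ous_id AND pre.state != 'complete')
ORDER BY o.ous_id
"""


def ready_ous(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute(READY_QUERY)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# 1 = calibration (complete), 2 = imaging (needs 1), 3 = catalog (needs 2)
conn.executemany("INSERT INTO ous VALUES (?, ?)",
                 [(1, "complete"), (2, "pending"), (3, "pending")])
conn.executemany("INSERT INTO ous_prerequisites VALUES (?, ?)", [(2, 1), (3, 2)])
print(ready_ous(conn))  # [2]
```

A real crawler would run this periodically, launch the workflow for each ready OUS, and update its state as the workflow progresses.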

observation_set Table

Column Name	Column Type	Comments
ous_id	integer	auto-generated ID value for this OUS
Structural Columns
project_id	integer	parent Project for the OUS if it is not the child of another OUS
parent_ous_id	integer	parent OUS for this OUS (null for top-level OUSs)
has_data	boolean	Indicates whether a join with the data linking table will be useful
has_products	boolean	Indicates whether a join with the products linking table will be useful
Processing Management Columns
Metadata Columns
ous_name	varchar	Human-readable name for this processing structure
purpose	integer
time_of_creation	datetime	When this was created (may play into proprietary period)
last_update	datetime	Last addition to this structure (typically last ingested products)
alma_label	varchar	Where relevant, match our system to the SOUS/GOUS/MOUS structure for ALMA data for user familiarity.
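Because `parent_ous_id` makes the table self-referencing, fetching a whole OUS hierarchy is a recursive query. A sketch in SQLite (via Python's `sqlite3`), using a subset of the columns above with made-up rows shaped like ALMA's three levels; the production database may use a different SQL dialect, and `project_id = 100` is a placeholder.

```python
import sqlite3

# Subset of the proposed observation_set table (processing-management
# columns omitted, since they are not defined yet).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE observation_set (
    ous_id        INTEGER PRIMARY KEY,
    project_id    INTEGER,                   -- set only for top-level OUSs
    parent_ous_id INTEGER REFERENCES observation_set(ous_id),
    has_data      BOOLEAN NOT NULL DEFAULT 0,
    has_products  BOOLEAN NOT NULL DEFAULT 0,
    ous_name      TEXT
)""")

# Recursive walk from one OUS down through all of its descendants.
DESCENDANTS = """
WITH RECURSIVE tree(ous_id, depth) AS (
    SELECT ous_id, 0 FROM observation_set WHERE ous_id = ?
    UNION ALL
    SELECT c.ous_id, t.depth + 1
    FROM observation_set AS c JOIN tree AS t ON c.parent_ous_id = t.ous_id
)
SELECT ous_id, depth FROM tree ORDER BY depth, ous_id
"""

conn.executemany(
    "INSERT INTO observation_set (ous_id, project_id, parent_ous_id, ous_name) "
    "VALUES (?, ?, ?, ?)",
    [(1, 100, None, "project-level"), (2, None, 1, "group"),
     (3, None, 2, "member A"), (4, None, 2, "member B")],
)
print(conn.execute(DESCENDANTS, (1,)).fetchall())
# [(1, 0), (2, 1), (3, 2), (4, 2)]
```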

ous_to_execution_blocks

Column Name	Column Type	Comments
ous_id	integer	Foreign Key to the observation_set table
execution_block_id	integer	Foreign Key to the execution_blocks table

ous_to_products

Column Name	Column Type	Comments
ous_id	integer	Foreign Key to the observation_set table
product_type_id	integer	Foreign Key to the product_type table
product_id	integer	id value for the relevant product table
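Since `ous_to_products` stores a `product_type_id` and a bare `product_id`, resolving a product takes two lookups: first the `product_type` row (proposed on this page) to learn which table to read, then that table. A sketch in SQLite, with a hypothetical `image_sets` table and the assumption that `product_id` is each product table's rowid (true here because `image_set_id` is declared `INTEGER PRIMARY KEY`). Table names cannot be bound as SQL parameters, so the sketch checks them against an allow-list before formatting them into the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_type (product_type_id INTEGER PRIMARY KEY,
                           product_name TEXT, product_table TEXT);
CREATE TABLE ous_to_products (ous_id INTEGER, product_type_id INTEGER,
                              product_id INTEGER);
-- Hypothetical product table; real product tables are not yet defined here.
CREATE TABLE image_sets (image_set_id INTEGER PRIMARY KEY, label TEXT);
INSERT INTO product_type VALUES (2, 'Images', 'image_sets');
INSERT INTO image_sets VALUES (10, 'example image set');
INSERT INTO ous_to_products VALUES (5, 2, 10);
""")

# Table names cannot be bound as SQL parameters, so only known tables are used.
ALLOWED_TABLES = {"image_sets"}


def products_for(ous_id: int) -> list:
    """Return (product_name, product row) pairs linked to one OUS."""
    links = conn.execute(
        "SELECT t.product_name, t.product_table, p.product_id "
        "FROM ous_to_products AS p "
        "JOIN product_type AS t ON t.product_type_id = p.product_type_id "
        "WHERE p.ous_id = ?",
        (ous_id,),
    ).fetchall()
    results = []
    for name, table, product_id in links:
        if table not in ALLOWED_TABLES:
            raise ValueError(f"unexpected product table {table!r}")
        # Assumes product_id is the table's rowid (INTEGER PRIMARY KEY here).
        row = conn.execute(f"SELECT * FROM {table} WHERE rowid = ?", (product_id,)).fetchone()
        results.append((name, row))
    return results


print(products_for(5))  # [('Images', (10, 'example image set'))]
```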

ous_to_workflows

Column Name	Column Type	Comments
ous_id	integer	Foreign Key to the observation_set table
workflow_id	integer	Foreign Key to the workflows table

product_type

This table would be a slowly expanding list of our processing product types. Currently we have calibration_tables (as yet undefined) and image_sets. Coming shortly are catalogs (at least an initial version for VLASS processing), followed by cubes.

Column Name	Column Type	Comments
product_type_id	integer	Values chosen (powers of 2, or of 10, etc.) so that types can be combined, as is done for polarizations
product_name	varchar	Name of the type (Calibrations, Images, Catalogs, etc)
product_table	varchar	Table name for these types of products.
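If the ids are powers of 2, a set of product types can be packed into one integer and tested bitwise, as with polarizations. A sketch using Python's `enum.Flag`, with hypothetical id assignments:

```python
from enum import Flag


class ProductType(Flag):
    # Hypothetical id assignments: one bit per product type.
    CALIBRATIONS = 1
    IMAGES = 2
    CATALOGS = 4
    CUBES = 8


# An OUS that produced both an image set and a catalog.
produced = ProductType.IMAGES | ProductType.CATALOGS
print(produced.value)                       # 6
print(ProductType.CATALOGS in produced)     # True
print(ProductType.CUBES in produced)        # False
print([t.name for t in ProductType if t in produced])  # ['IMAGES', 'CATALOGS']
```

One consequence to keep in mind: a combined value such as 6 is not itself a row in product_type, so it would probably live in a separate summary column rather than in ous_to_products, which links one product at a time.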
