...

Where pipeaccount is vlapipe or almapipe, depending on the site you're working at, and myprofile is the appropriate profile name from the discussion of the three environments/installations in the introduction.
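
As a hypothetical illustration of that pattern (the host name is a placeholder, and the idea that profiles are selected via a CAPO_PROFILE environment variable is an assumption, not taken from this page):

    # log in as the site's pipeline account (vlapipe in NM, almapipe in CV)
    ssh vlapipe@<servername>
    # select the environment for this session; nmprod is the NM Production profile
    export CAPO_PROFILE=nmprod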

Caveats and Details

  • Resources: What's Separate and What's Shared Among Environments
  • NM vs CV
  • Python Virtual Environments
  • 'Test' is new
  • Naming convention change

...

Shared Resources Among Environments

Each deployment environment has its own copy of the AAT/PPI code (with the underlying software infrastructure) and its own copy of the AAT/PPI metadata database.  That separation reduces concerns about the size of the metadata database's connection pool, and because each environment also has a separate messaging infrastructure, the environments are insulated from one another.  However, some resources are shared among all environments.

NGAS

There is only one official file-storage system at each site, so all three environments draw from the same set of NGAS servers.  If multiple environments are in heavy use, it is possible to overload the system.

Lustre

The high-speed shared filesystem is another shared resource.  Each environment performs its processing in its own sub-area of the associated site's Lustre system, and those areas are parallel to one another (much like the software installations underneath the *pipe accounts).  For instance, /lustre/naasc/web/almapipe/pipeline/naasc-test is where the Test environment works in CV.
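
As a sketch of that parallel layout (only the naasc-test path is confirmed above; the sibling directory names are assumptions for illustration):

    /lustre/naasc/web/almapipe/pipeline/
        naasc-dev/     # Development environment's work area (name assumed)
        naasc-test/    # Test environment's work area (confirmed above)
        naasc-prod/    # Production environment's work area (name assumed)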

Cluster

The computing resources are shared among environments, since each site has only one set of dedicated processing nodes.  While the environments can be (and sometimes are) configured to submit to separate subsets of those nodes, the total set of computers to which jobs are submitted is limited.  This is often particularly noticeable in CV during periods of heavy ALMA data processing (particularly around pipeline validation testing).

NAASC Metadata Database

There is only a single database about ALMA data in CV.  All three AAT/PPI environments draw information from it using the same set of read-only credentials.  So far, however, there have not been severe issues with connectivity.

Site Differences (NM vs CV)

In NM, the entirety of the software making up the AAT/PPI is deployed on that environment's server; only a subset is deployed to its companion machine in CV.  A separate copy of the utilities is deployed to the almapipe account, and the CV server runs an independent workflow system.  However, since most of the AAT/PPI utilities simply send system messages to provoke action, the fact that the message-processing system (amygdala) is located in NM may add delay for some commands.

In addition, some of the services which facilitate processing are deployed only in NM, so information requests (like those made for image ingestion) may be slower from CV.

Python Virtual Environments

{needs more details} Python virtual environments are self-contained sets of Python libraries and associated commands.  This is how the utilities for the AAT/PPI are deployed.
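
As a minimal sketch of how such an environment works (the path and package name below are hypothetical, not the actual deployment locations):

    # create a self-contained set of Python libraries (hypothetical path)
    python3 -m venv /home/vlapipe/venvs/aat-ppi
    # activate it so its libraries and commands take precedence over system ones
    source /home/vlapipe/venvs/aat-ppi/bin/activate
    # utilities installed into the environment are now on the PATH
    pip install <utility-package>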

CAPO Files

The configuration details for the AAT/PPI are kept in text files in /home/casa/capo.  There is a set of properties for each combination of site (NM vs CV) and environment (Development, Test, Production).  That directory also includes a set of files for the VLASS project, containing configuration specific to its processing needs.

These properties define information like server names, where to process data, and which version of CASA to use as a default.  They are editable by the vlapipe account, so the values of interest can be readily modified by the Data Analysts.  It is possible (even likely) that some of those property names will change as the system evolves.
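
As a hedged sketch of what such a file might contain (the property names and values below are illustrative assumptions, not actual CAPO entries):

    # hypothetical entries in one per-site, per-environment file under /home/casa/capo
    archive.workflow.processingDirectory = /lustre/naasc/web/almapipe/pipeline/naasc-test
    archive.workflow.defaultCasaVersion = 6.2.1
    archive.workflow.serverName = <servername>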

The Renaming Of The Profiles

{needs more details} Symlinks are already in place and there is no official timeline, but be prepared for the replacement of nmprod with dsoc-prod, and the like.
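
As an illustration of what that transition might look like (this assumes the profiles are stored as properties files under /home/casa/capo; the file names and link direction are assumptions):

    # hypothetical listing showing an old profile name linked to its replacement
    $ ls -l /home/casa/capo
    dsoc-prod.properties
    nmprod.properties -> dsoc-prod.properties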