Introductory Comments
The AAT/PPI has 3 parallel 'environments'. Each environment consists of a set of software deployed to dedicated servers in both NM and CV, plus a set of configuration parameters (also called a CAPO Profile) for each deployment location. The 3 environments (and their associated profiles) are:
- Production (nmprod, vaprod)
- The current release of the AAT/PPI
- No redeployments without an announcement to the archive_issues list
- Web interface: archive-new.nrao.edu
- Test (dsoc-test, naasc-test)
- Used for integration testing & validation of changes (features or bug fixes)
- Also the home for pre-release candidates for release to production
- Typically stable on day+ timescales
- Major redeployments are announced on the archive_test list
- Web interface: archive-test.nrao.edu
- Development (nmtest, vatest)
- Developers' basic verification testing ground
- Often subject to sudden redeployments of subsystems
- Can have components from multiple branches of development deployed at one time
- Web interface: webtest.aoc.nrao.edu/portal/#/
- (Yes, I know, that name is unfortunate)
However, when it comes to the basic utilities for Data Analysts, the separation between these environments can be somewhat blurred: the tools are all installed next to each other within the vlapipe and almapipe accounts' home directories. That makes it easy to use the wrong set of software (or to mix software and settings from multiple environments) by mistake, and that's before accounting for bugs in the software itself. This document demonstrates a few suggested methods of accessing those tools in ways that should reduce the chance of confusing the installations.
Methods of Using AAT/PPI Commands
The vlapipe and almapipe Accounts
As a convenience, these two accounts have the activate_profile and deactivate_profile commands added to their Bash configuration. When given the name of a CAPO profile (nmprod, dsoc-test, etc.), activate_profile sets a few environment variables and adjusts your shell so that the corresponding set of utilities (and the correct supporting libraries) is on your $PATH, where it is easy to use.
You can run activate_profile again with a new argument to switch to a different configuration (for instance, when testing a new feature before release), or run deactivate_profile to clean up what was done and return to the basic account settings.
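The real commands live only in those accounts' Bash configuration; the following is a rough sketch (an assumption about the mechanism, not the actual implementation) of the kind of thing the pair does, which also shows why each activation should be undone before the next:

```shell
# Simplified sketch of the behavior (NOT the real implementation):
# activate_profile records the profile name and prepends that
# environment's bin directory to PATH; deactivate_profile undoes both.
activate_profile() {
    export CAPO_PROFILE="$1"
    export PATH="$HOME/workflows/$1/bin:$PATH"
}

deactivate_profile() {
    # strip the entry added above, then clear the variable
    PATH="${PATH#"$HOME/workflows/$CAPO_PROFILE/bin:"}"
    unset CAPO_PROFILE
}

activate_profile nmprod
echo "$CAPO_PROFILE"   # nmprod
deactivate_profile
```

The real commands also handle the supporting libraries; the sketch just illustrates that each activation rewrites your environment, which is why mixing profiles in one shell invites confusion.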
Running A Few Commands From Another Account
When running just a few commands, it may be easier to call the appropriate utility directly than to switch to the relevant 'pipe' account. In that case, pay particular attention to the directory from which you run the command (either change to that directory or type out the full path). The differences between /users/vlapipe/workflows/nmprod/bin/myCommand and /users/vlapipe/workflows/nmtest/bin/myCommand can include not just the details of the software itself, but also any differences in the profiles' parameters.
Specifying the Profile
You can provide most (if not all) of the AAT/PPI utilities with a -P option to specify the CAPO profile you want to use. Doing so is typically a good idea: you likely do not have the CAPO_PROFILE environment variable set, and not all of the utilities are capable of guessing the profile from their location.
Usually you'll want to ensure that the profile you provide matches that of the utility you're running, but there is one exception: in version 3.7.0 and later, restoreToCache can be given one of the VLASS profiles (vlass.test, vlass.w7, etc.) to run the restore workflow with the appropriate settings for that project.
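To make the precedence concrete, here is a small sketch (the function and its logic are an illustration of the rules just described, not the utilities' actual code) of how a utility might settle on a profile: an explicit -P argument wins, then the CAPO_PROFILE environment variable, then a guess from the installation path:

```shell
# Sketch only, not real AAT/PPI code. $1 is the value of a -P option
# (possibly empty), $2 is the path to the utility being run.
resolve_profile() {
    if [ -n "$1" ]; then
        echo "$1"                      # explicit -P wins
    elif [ -n "${CAPO_PROFILE:-}" ]; then
        echo "$CAPO_PROFILE"           # fall back to the environment
    else
        p="${2#*/workflows/}"          # guess from .../workflows/<profile>/bin/...
        echo "${p%%/*}"
    fi
}

resolve_profile vlass.test /anywhere/at/all
# prints vlass.test; with an empty first argument and CAPO_PROFILE unset,
# /users/vlapipe/workflows/nmtest/bin/myCommand would yield nmtest
```

Because the last fallback does not exist in every utility, passing -P explicitly is the safest habit.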
Setting A Window Up For a Single Environment
For running a larger set of commands against a single environment, there is an alternative to using the vlapipe/almapipe accounts, which is particularly helpful if you need access to directories where those accounts don't have permissions. This approach is not as comprehensive as activate_profile, so it is recommended that you not switch between environments with it.
It is possible to approximate what activate_profile does with:
export CAPO_PROFILE=myprofile
source ~pipeaccount/workflows/myprofile/bin/activate
Where pipeaccount is vlapipe or almapipe, depending on where you're working, and myprofile is the appropriate name from the discussion of the 3 environments/installations in the introduction.
Caveats and Details
Shared Resources Among Environments
Each deployment environment has a copy of the AAT/PPI code (with underlying software infrastructure), and its own copy of the AAT/PPI metadata database. That separation reduces concerns about the size of the connection pool for the metadata database, and each environment having a separate messaging infrastructure insulates them from one another. However, some resources are shared among all environments.
NGAS
There is only one official file-storage system in each location. Thus all 3 environments draw from the same set of servers. If multiple environments are in heavy use, it's possible to overload the system.
Lustre
The high-speed shared filesystem is another shared resource. Each environment has a particular sub-area in which it performs its processing. These areas are parallel to one another (much like the software installations underneath the *pipe accounts) on the associated site's Lustre system (for instance, /lustre/naasc/web/almapipe/pipeline/naasc-test is where the Test environment works in CV).
Cluster
The computing resources are shared among environments, as each site has only one set of dedicated processing nodes. While the environments can be (and sometimes are) configured to submit to separate subsets of these clusters, the total set of computers to which jobs are submitted is limited. This is often particularly noticeable in CV during periods of heavy ALMA data processing (particularly around pipeline validation testing).
NAASC Metadata Database
There is only a single database about ALMA data in CV. All 3 of the AAT/PPI environments draw information from it using the same set of read-only credentials. So far, however, there have not been severe issues with connectivity.
Site Differences (NM vs CV)
The entirety of the software making up the AAT/PPI is deployed on each environment's server in NM, with only a subset deployed to its companion machine in CV. There is a separate copy of the utilities deployed to the almapipe account, and an independent workflow system on the CV server. However, as most of the AAT/PPI utilities simply send system messages to provoke action, the fact that the message-processing system (amygdala) is located in NM may add delay for some commands.
In addition, some of the services which facilitate processing are deployed only to NM, so that information requests (like those made for image ingestion) may be slower.
Python Virtual Environments
{needs more details} Self-contained sets of Python libraries and associated commands; this is how the utilities for the AAT/PPI are deployed.
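As a concrete illustration of the mechanism (using a throwaway virtualenv as a stand-in for a real ~pipeaccount/workflows/<profile> installation), sourcing an environment's bin/activate prepends its bin directory to PATH, and the deactivate function it defines restores PATH afterwards:

```shell
# Throwaway demonstration of the virtualenv mechanics the deployments rely
# on; /tmp/demo-venv stands in for a real pipe-account installation.
python3 -m venv /tmp/demo-venv
source /tmp/demo-venv/bin/activate
echo "$VIRTUAL_ENV"     # /tmp/demo-venv
deactivate              # restores PATH; it would NOT unset CAPO_PROFILE
```

This is also why the manual export CAPO_PROFILE step appears alongside the source command earlier in this document: the virtualenv machinery knows nothing about CAPO.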
CAPO Files
The configuration details for the AAT/PPI are kept in text files in /home/casa/capo. This directory is replicated to CV via rsync on a cron job, so it is not immediately updated after an edit. There is a set of properties for each site (NM vs CV) and environment (Development, Test, Production). Also included in that directory is a set of files for the VLASS project, containing configuration specific to its processing needs.
These properties define information like server names, where to process data, and which version of CASA to use by default. They are editable by the vlapipe account, so values of interest can be readily modified by the Data Analysts. It is possible (even likely) that some of those variable names will change as the system evolves.
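As a sketch of the format (the file path and property names below are invented for illustration; the real files live in /home/casa/capo), CAPO properties are plain key = value text, so individual values are easy to inspect from the shell:

```shell
# Invented example of the key = value layout; real property names differ.
cat > /tmp/example.properties <<'EOF'
# hypothetical entries for a made-up profile
edu.nrao.example.defaultCasaVersion = /home/casa/packages/casa-6.5
edu.nrao.example.processingDirectory = /lustre/aoc/somewhere/nmtest
EOF

# extract one value: drop everything up to the '=' and the spaces after it
grep '^edu.nrao.example.defaultCasaVersion' /tmp/example.properties \
    | sed 's/^[^=]*= *//'
# prints /home/casa/packages/casa-6.5
```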
The Renaming Of The Profiles
{needs more details} There are already symlinks and no official timeline, but be prepared for the replacement of nmprod with dsoc-prod, nmtest with dsoc-dev, and the like.