
A restore of ALMA data only makes sense at the MOUS level. Individual EBs are not calibrated on their own, so the calibration products are all linked to the MOUS. There is currently no automated processing performed at the GOUS or SGOUS level, and the infrastructure for sub-MOUS restores is incomplete.

ALMA products are provided with a *.scriptForPI.py script that attempts to cover every way one might go about performing a restore, and it relies heavily on assumptions about directory layouts and the possible naming schemes for files within the directory structure. The aim of this document is to reduce an MOUS-level restore to its basics: a standard CASA products/rawdata/working layout with the critical data placed appropriately, and a PPR created to guide the pipeline.

NOTE: Still a work in progress with temporary tools until I refine the methodology. 


Can We Restore?

I'm assuming I start with an MOUS uid (for instance: uid://A001/X1284/X265f) and access to the ALMA metadata database. We first need to know whether it has been calibrated in the first place. If

    SELECT COUNT(*) FROM ASA_PRODUCT_FILES WHERE ASA_MOUS_UID = 'uid://A001/X1284/X265f' AND FILE_CLASS = 'calibration';

returns a count greater than zero, we're in luck.
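As a minimal sketch, the check can be scripted with Python and cx_Oracle (the connect string here is a placeholder; use your site's actual credentials and DSN):

    import cx_Oracle

    MOUS_UID = "uid://A001/X1284/X265f"

    # Placeholder connect string -- substitute your ALMA metadata DB settings.
    conn = cx_Oracle.connect("user/password@almadb")
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM ASA_PRODUCT_FILES "
        "WHERE ASA_MOUS_UID = :uid AND FILE_CLASS = 'calibration'",
        uid=MOUS_UID,
    )
    (count,) = cur.fetchone()
    print("restorable" if count > 0 else "not restorable")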

Setting up:

On a machine which can see CV's Lustre (Hopper and aatweb-dev are ones we already use for workflow basics), go to a working area (I'm using /lustre/naasc/web/almapipe/pipeline/vatest/testing) and create yourself a new directory (and the standard subdirectories for CASA processing).
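For example, a minimal sketch of laying out the standard subdirectories (the top-level directory name is arbitrary):

    import os

    # Your new directory under the working area; the name is up to you.
    base = "/lustre/naasc/web/almapipe/pipeline/vatest/testing/my_restore"
    for sub in ("rawdata", "products", "working"):
        os.makedirs(os.path.join(base, sub), exist_ok=True)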

Next we need the constituent ASDMs (or EBs):

    SELECT DISTINCT ASDM_UID FROM ASA_SCIENCE WHERE MEMBER_OUSS_ID='uid://A001/X1284/X265f';

which gives us four EBs: uid://A002/Xc7111c/X86a7, uid://A002/Xc7111c/X409e, uid://A002/Xc72427/X2dfe, uid://A002/Xc7111c/X3979. Using those ASDM_UIDs you can download the raw data with asdmExportLight -d rawdata {uid_here}.
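A sketch of fetching all four in a loop (assumes asdmExportLight is on your PATH and you are in the new top-level directory):

    import subprocess

    EB_UIDS = [
        "uid://A002/Xc7111c/X86a7",
        "uid://A002/Xc7111c/X409e",
        "uid://A002/Xc72427/X2dfe",
        "uid://A002/Xc7111c/X3979",
    ]

    for uid in EB_UIDS:
        # -d rawdata writes each ASDM into the rawdata subdirectory
        subprocess.run(["asdmExportLight", "-d", "rawdata", uid], check=True)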

To download the calibration products, I have modified a script used at JAO in Santiago to query and run wget to obtain the files directly from the local NGAS system (naasc_listfiles.py). The script needs access to cx_Oracle, so activate the vatest profile before you run it. In the products subdirectory, run that script with the MOUS uid from above (~almapipe/workflows/nmtest/bin/python3.6 ~almapipe/pipeline/vatest/testing/resources/naasc_listfiles.py --get uid://A001/X1284/X265f). This will also obtain all the script files, including the initial PPR and the pipeline manifest (the latter is required for a proper restore).

Two Roads Diverge:

Easy Way:

If we obtain the scripts which accompany the ALMA products (very similar to obtaining the calibration information above, just change the FILE_CLASS to 'scripts'), we could simply run their casa_piperestorescript.py:

    /home/casa/packages/RHEL6/release/casa-release-5.1.1-5/bin/casa --pipeline -c member.uid___A001_X1284_X265f.hifa_calimage.casa_piperestorescript.py

This fires right up and gets to work, but failed on the 3rd EB (of 4!) due to a typo in my download of the data.


For Greater Flexibility:

The restore script is very useful, but for the AAT-PPI we're trying to stick to a single method of interacting with the CASA pipeline: creating a PPR. We'll need more information for the PPR:

  1. Project Code
    1. SELECT DISTINCT PROJECT_CODE FROM ASA_PROJECT JOIN ASA_SCIENCE ON ASA_PROJECT.PROJECT_UID = ASA_SCIENCE.PROJECT_UID WHERE MEMBER_OUSS_ID='';
  2. ObsProject uid
  3. ObsUnitSet partId
    1. 2) and 3) can be obtained together: SELECT OBSUNITSETUID, OBSUNITSETPARTID FROM AQUA_OUS WHERE OUS_STATUS_ENTITY_ID = '';
  4. ProjectStatusRef
    1. SELECT OBS_PROJECT_STATUS_ID FROM OBS_UNIT_SET_STATUS WHERE STATUS_ENTITY_ID='';
  5. OUSStatusRef
    1. The entity ID for this is just the MOUS uid we're given.
  6. EB → Session Mappings:
    1. This one is nontrivial. First we extract the MOUS status XML file:
      1. SELECT XML FROM OBS_UNIT_SET_STATUS WHERE STATUS_ENTITY_ID='';
    2. Then we walk the XML and, for each SESSION element, create a list of EBs for that session.
    3. Then we arrange the data into the PPR, matching each session to its EBs (multiple EBs separated by '|'); see the sketch after this list.
  7. SchedBlockRef
    1. ASA_SCIENCE + MOUS uid
    2. If there's more than one, we have an issue, but that should be rare.
  8. SBStatusRef
    1. SCHED_BLOCK_STATUS + SB uid
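A sketch of step 6, building the session → EB mapping from the MOUS status XML. The tag and attribute names (session, execBlockRef, entityPartId, entityId) are assumptions on my part; inspect the XML you actually get back from OBS_UNIT_SET_STATUS and adjust.

    import xml.etree.ElementTree as ET

    def session_eb_map(status_xml):
        """Return {session_id: 'ebuid1|ebuid2|...'} from an MOUS status document.

        Tag/attribute names below are assumptions -- verify against the real XML.
        """
        root = ET.fromstring(status_xml)
        mapping = {}
        # Compare local tag names only, so we need not hard-code the schema namespace.
        for session in root.iter():
            if session.tag.split("}")[-1].lower() != "session":
                continue
            name = session.get("entityPartId", "unknown")
            ebs = [
                ref.get("entityId")
                for ref in session.iter()
                if ref.tag.split("}")[-1] == "execBlockRef"
            ]
            # Multiple EBs in one session are joined with '|' for the PPR.
            mapping[name] = "|".join(e for e in ebs if e)
        return mapping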

NOTE: Running it this way requires at minimum the 3 SCIPIPE environment variables (ROOT, LOG, SCRIPT), which I have notes about elsewhere.
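For reference, a sketch of setting them before launching CASA. The exact variable names (SCIPIPE_ROOTDIR, SCIPIPE_LOGDIR, SCIPIPE_SCRIPTDIR) are my expansion of the (ROOT, LOG, SCRIPT) shorthand, and the paths are hypothetical; check the notes referenced above for the definitive values.

    import os

    # Assumed names and example paths -- verify before use.
    os.environ["SCIPIPE_ROOTDIR"] = "/lustre/naasc/web/almapipe/pipeline/vatest/testing/my_restore"
    os.environ["SCIPIPE_LOGDIR"] = os.path.join(os.environ["SCIPIPE_ROOTDIR"], "working")
    os.environ["SCIPIPE_SCRIPTDIR"] = "/home/casa/pipeline/scripts"  # hypothetical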

NOTE: It may be possible to skip some of that information (2-5). We have been ignoring the ProjectStructure section of the PPR for years with no difficulties, but that does not work here at present. The way I have found that works is to include the ProjectStructure section and use runpipeline.py (instead of runvlapipeline.py) in conjunction with the full Project/SOUS/GOUS/MOUS directory structure. More experimentation (and perhaps a better understanding of the pipeline infrastructure system) is necessary before we can characterize this completely.

I've created a first-pass PPR for our test data set, but it has some flaw I'm not seeing yet (the pipeline rejects the ProcessingRequests): PPR_jls.xml

I have one that works as completely as the piperestorescript.py does, failing on an observatory position: PPR_jls_worked.xml


Now we need some further experimentation, and some idea of how to handle any variation in parameters to the hifa_restoredata call. 
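For orientation, a minimal sketch of the call the PPR ultimately drives, run inside CASA started with --pipeline. The MS/session pairing and parameter values here are assumptions for our four-EB test MOUS, not a verified invocation; take the session names from the session → EB mapping above.

    # Sketch only -- run inside CASA --pipeline; h_init/h_save frame the pipeline context.
    vislist = [
        "uid___A002_Xc7111c_X86a7",
        "uid___A002_Xc7111c_X409e",
        "uid___A002_Xc72427_X2dfe",
        "uid___A002_Xc7111c_X3979",
    ]
    # Assumed session names -- derive the real ones from the MOUS status XML.
    sessionlist = ["session_1", "session_2", "session_3", "session_4"]

    h_init()
    try:
        hifa_restoredata(vis=vislist, session=sessionlist, products_dir="../products")
    finally:
        h_save()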
