
A restore of ALMA data only makes sense at the MOUS level.  Individual EBs are not calibrated, so the calibration products are all linked to the MOUS.  There is currently no automated processing performed at the GOUS or SGOUS, and the infrastructure for sub-MOUS restores is incomplete. 

ALMA products are provided with a *.scriptForPI.py which attempts to cover all the possibilities of how one might go about performing a restore, and relies heavily on assumptions about directory layouts and the possible different naming schemes for files within the directory structure.  The aim of this document is to reduce an MOUS level restore to its basics:  a standard CASA products/rawdata/working layout with the critical data placed appropriately and a PPR created to guide the pipeline.

NOTE: Still a work in progress with temporary tools until I refine the methodology. 


Can We Restore?

I'm assuming I start with an MOUS uid (for instance: uid://A001/X1284/X265f) and access to the ALMA metadata database.  We first need to know whether it has been calibrated at all.  If: SELECT COUNT(*) FROM ASA_PRODUCT_FILES WHERE ASA_MOUS_UID = 'uid://A001/X1284/X265f' AND FILE_CLASS = 'calibration';  is greater than zero, we're in luck. 
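
The check above can be wrapped in a small helper. This is only a minimal sketch: it assumes a Python DB-API cursor that accepts named bind variables (cx_Oracle does), and the function name is_calibrated is made up for illustration.

```python
def is_calibrated(cursor, mous_uid):
    """Return True if the MOUS has at least one calibration product file.

    `cursor` is assumed to be a DB-API cursor that accepts named binds
    (:name), such as one obtained from a cx_Oracle connection.
    """
    cursor.execute(
        "SELECT COUNT(*) FROM ASA_PRODUCT_FILES "
        "WHERE ASA_MOUS_UID = :mous_uid AND FILE_CLASS = 'calibration'",
        {"mous_uid": mous_uid},
    )
    (count,) = cursor.fetchone()
    return count > 0
```

Using a bind variable rather than pasting the uid into the SQL string keeps the same statement reusable for any MOUS.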

Setting up:

On a machine which can see CV's Lustre (Hopper and aatweb-dev are ones we already use for workflow basics), go to a working area (I'm using /lustre/naasc/web/almapipe/pipeline/vatest/testing) and create yourself a new directory (and the standard subdirectories for CASA processing).
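
For repeatability, the layout can be created with a few lines of Python. A minimal sketch, assuming the standard subdirectories are the products/rawdata/working trio mentioned above; make_restore_area is a hypothetical helper name.

```python
from pathlib import Path


def make_restore_area(base_dir, name):
    """Create <base_dir>/<name> with the standard CASA subdirectories."""
    root = Path(base_dir) / name
    for sub in ("products", "rawdata", "working"):
        # exist_ok lets the helper be re-run safely on an existing area
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```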

Next we need the constituent ASDMs (or EBs):  SELECT DISTINCT ASDM_UID FROM ASA_SCIENCE WHERE MEMBER_OUSS_ID='uid://A001/X1284/X265f';  This gives us four EBs: uid://A002/Xc7111c/X86a7, uid://A002/Xc7111c/X409e, uid://A002/Xc72427/X2dfe, uid://A002/Xc7111c/X3979.  Using those ASDM_UIDs you can download the raw data using asdmExportLight -d rawdata {uid_here}. 
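
Since the download is one asdmExportLight call per EB, it is easy to script. The sketch below only uses the -d rawdata {uid} form shown above; the helper names and the dry_run switch are illustrative assumptions, not part of any existing tool.

```python
import subprocess


def export_command(asdm_uid, dest="rawdata"):
    """Build the asdmExportLight argument list for one ASDM."""
    return ["asdmExportLight", "-d", dest, asdm_uid]


def download_asdms(asdm_uids, dry_run=True):
    """Return the commands for all ASDMs; run them unless dry_run is set."""
    commands = [export_command(uid) for uid in asdm_uids]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failed export
            subprocess.run(cmd, check=True)
    return commands
```

Running it first with dry_run=True and eyeballing the uids is a cheap guard against the kind of download typo described further down.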

To download the calibration products, I have modified a script used at JAO in Santiago to query and run wget to obtain the files directly from the local NGAS system (naasc_listfiles.py).  The script needs access to cx_Oracle, so activate the vatest profile before you run it. In the products subdirectory, run that script with the MOUS uid from above (~almapipe/workflows/nmtest/bin/python3.6 ~almapipe/pipeline/vatest/testing/resources/naasc_listfiles.py --get uid://A001/X1284/X265f).  This will also obtain all the script files, including the initial PPR and the pipeline manifest (the latter is required for a proper restore).

Two Roads Diverge:

Easy Way:

If we obtain the scripts which accompany the ALMA products (very similar to obtaining the calibration information above, just change the FILE_CLASS to 'scripts'), we could simply run their .hifa_cal.casa_piperestorescript.py ( /home/casa/packages/RHEL6/release/casa-release-5.1.1-5/bin/casa --pipeline -c member.uid___A001_X1284_X265f.hifa_calimage.casa_piperestorescript.py).  This fires right up and gets to work, but it failed on the 3rd EB (of 4!) due to a typo in my download of the data. 


For Greater Flexibility:

The restore script is very useful, but for the AAT-PPI, we're trying to stick to a single method of interacting with the casa pipeline:  Creating a PPR.  We'll need more information for the PPR:

  1. Project Code
    1. SELECT DISTINCT PROJECT_CODE FROM ASA_PROJECT JOIN ASA_SCIENCE ON ASA_PROJECT.PROJECT_UID = ASA_SCIENCE.PROJECT_UID WHERE  MEMBER_OUSS_ID='';
  2. ObsProject uid
  3. ObsUnitSet partId             
    1. 2) and 3) can be gotten together: SELECT OBSUNITSETUID, OBSUNITSETPARTID FROM AQUA_OUS WHERE OUS_STATUS_ENTITY_ID = '';
  4. ProjectStatusRef
    1. SELECT OBS_PROJECT_STATUS_ID FROM OBS_UNIT_SET_STATUS WHERE STATUS_ENTITY_ID='';
  5. OUSStatusRef
    1. the entity ID for this is just the MOUS id we're given
  6. EB → Session Mappings: 
    1. This one is nontrivial, first we want a list of EBs:
      1. SELECT DISTINCT ASDM_UID FROM ASA_SCIENCE WHERE MEMBER_OUSS_ID='';
    2. Then, we extract the MOUS Status XML file:
      1. SELECT XML FROM OBS_UNIT_SET_STATUS WHERE STATUS_ENTITY_ID=''; 
    3. Then we walk the XML and, for each Session element, create a list of the EBs in that session.
    4. Then we arrange the data into the PPR, matching each session to its EBs (multiple EBs separated by '|').
  7. SchedBlockRef
    1. SELECT DISTINCT SCHEDBLOCK_UID FROM ALMA.ASA_SCIENCE WHERE MEMBER_OUSS_ID = '';
    2. If there's more than one, then we have an issue, but the DOMAIN_ENTITY_STATE of the SCHED_BLOCK_STATUS table should be able to tell us which we want.
  8. SBStatusRef
    1. SELECT STATUS_ENTITY_ID FROM SCHED_BLOCK_STATUS WHERE DOMAIN_ENTITY_ID='';
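
The EB → Session walk in item 6 can be sketched with the standard library XML parser. This is a rough sketch under assumptions: the namespace URIs are the ones declared in the OUSStatus document for the second test data set, sessions are numbered from 1 in document order (which matches the SESSION_1/SESSION_2 keywords in my PPR, but I have not confirmed that this is what the pipeline expects), and session_intents is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared on the ouss:OUSStatus root element
NS = {"ouss": "Alma/Scheduling/OUSStatus", "val": "Alma/ValueTypes"}


def session_intents(status_xml):
    """Map SESSION_n keywords to '|'-joined ExecBlock uids.

    Assumption: sessions are numbered from 1 in document order.
    """
    root = ET.fromstring(status_xml)
    intents = {}
    for number, session in enumerate(root.findall("ouss:Session", NS), start=1):
        ebs = [
            eb.text.strip()
            for eb in session.findall("ouss:ExecBlockRef/val:ExecBlockId", NS)
        ]
        intents["SESSION_%d" % number] = "|".join(ebs)
    return intents
```

Each key/value pair then becomes one Intents element (Keyword/Value) in the ProcessingIntents section of the PPR.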

NOTE: Running it this way requires at minimum the 3 SCIPIPE environment variables (ROOT, LOG, SCRIPT) which I have notes about elsewhere. 

NOTE:  It may be possible to skip some of that information (2-5).  We ignored the ProjectStructure section of the PPR for years with no difficulties, but that no longer works.  The way I have found that works is to incorporate the ProjectStructure section, and use runpipeline.py (instead of runvlapipeline.py) in conjunction with the full Project/SOUS/GOUS/MOUS directory structure.  More experimentation (and perhaps understanding of the pipeline infrastructure system) is necessary before we can characterize this completely. 

I've created a first-pass PPR for our test data set, but it has some flaw I'm not seeing yet (the pipeline rejects the ProcessingRequests).   PPR_jls.xml

I have one that works as completely as the piperestorescript.py : failing on an observatory position.  PPR_jls_worked.xml


Now we need some further experimentation, and some idea of how to handle any variation in parameters to the hifa_restoredata call. 


A New Test With uid://A001/X12a3/X80e

EBs:

Product Files (calibration & script FILE_CLASS):

  • member.uid___A001_X12a3_X80e.calimage.pipeline_manifest.xml
  • member.uid___A001_X12a3_X80e.hifa_calimage.auxproducts.tgz
  • member.uid___A001_X12a3_X80e.hifa_calimage.casa_piperestorescript.py
  • member.uid___A001_X12a3_X80e.hifa_calimage.casa_pipescript.py
  • member.uid___A001_X12a3_X80e.session_2.auxcaltables.tgz
  • member.uid___A001_X12a3_X80e.session_2.caltables.tgz
  • uid___A002_Xcf92df_X3453.ms.calapply.txt
  • uid___A002_Xcf92df_X3453.ms.flagversions.tgz
  • uid___A002_Xcf92df_X3453_target.ms.auxcalapply.txt
  • member.uid___A001_X12a3_X80e.scriptForPI.py

From the looks of this, we just have a single session (2), with 2 EBs.

Directory Structure:

  • 2017.1.01347.S
    • SOUS_uid___A001_X12a3_X80c
      • GOUS_uid___A001_X12a3_X80d
        • MOUS_uid___A001_X12a3_X80e

Info for the PPR (using Alma_Restore_PPR_Queries.sql.html):

  • Project Code: 2017.1.01347.S
  • ObsProject UID: uid://A001/X1221/Xdf8
  • ObsProject PartID: X659301709
  • ObsProjectStatusUID: uid://A001/X1221/Xdfc
  • OUSStatusRef: same as our base information
  • Scheduling Block UID: uid://A001/X12a3/X804
  • Scheduling Block Status UID: uid://A001/X12a3/X80f
  • Status XML:
    • <?xml version="1.0" encoding="UTF-8"?>
      <ouss:OUSStatus xmlns:val="Alma/ValueTypes" xmlns:orv="Alma/ObsPrep/ObsReview" xmlns:ouss="Alma/Scheduling/OUSStatus" xmlns:oat="Alma/ObsPrep/ObsAttachment" xmlns:ps="Alma/Scheduling/ProjectStatus" xmlns:sbs="Alma/Scheduling/SBStatus" xmlns:prj="Alma/ObsPrep/ObsProject" xmlns:ent="Alma/CommonEntity" xmlns:sbl="Alma/ObsPrep/SchedBlock" xmlns:prp="Alma/ObsPrep/ObsProposal" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" schemaVersion="13" revision="23" almatype="APDM::OUSStatus" xsi:type="ouss:OUSStatus">
      <ps:Status State="Delivered"/>
      <ps:TimeOfUpdate>2018-07-10 11:00:46.672</ps:TimeOfUpdate>
      <ps:executionsRemaining>-1</ps:executionsRemaining>
      <ps:successfulExecutions>2</ps:successfulExecutions>
      <ps:failedExecutions>0</ps:failedExecutions>
      <ps:secondsRemaining>7200</ps:secondsRemaining>
      <ps:successfulSeconds>0</ps:successfulSeconds>
      <ps:failedSeconds>0</ps:failedSeconds>
      <ps:hasExecutionCount>true</ps:hasExecutionCount>
      <ps:hasTimeLimit>true</ps:hasTimeLimit>
      <ps:bookkeepingInitialized>false</ps:bookkeepingInitialized>
      <ps:TotalUsedTimeInSec>0</ps:TotalUsedTimeInSec>
      <ps:ContainingObsUnitSetRef entityId="uid://A001/X12a3/X80d" entityTypeName="OUSStatus" documentVersion="1"/>
      <ps:ProjectStatusRef entityId="uid://A001/X1221/Xdfc" entityTypeName="ProjectStatus" documentVersion="1"/>
      <ouss:OUSStatusEntity entityId="uid://A001/X12a3/X80e" entityIdEncrypted="-- id encryption not implemented --" entityTypeName="OUSStatus" schemaVersion="13"/>
      <ouss:ObsUnitSetRef entityId="uid://A001/X1221/Xdf8" partId="X659301709" entityTypeName="ObsProject"/>
      <ouss:NumberSBsCompleted>2</ouss:NumberSBsCompleted>
      <ouss:SBStatusRef entityId="uid://A001/X12a3/X80f" entityTypeName="SBStatus" documentVersion="1"/>
      <ouss:Session entityPartId="X00000000" almatype="APDM::Session">
      <ouss:StartTime>2018-06-06 12:41:10.826</ouss:StartTime>
      <ouss:EndTime>2018-06-06 13:07:22.036</ouss:EndTime>
      <ouss:ExecBlockRef>
      <val:ExecBlockId>uid://A002/Xce574d/Xa33e</val:ExecBlockId>
      </ouss:ExecBlockRef>
      <ouss:SBStatusRef entityId="uid://A001/X12a3/X80f" entityTypeName="SBStatus"/>
      </ouss:Session>
      <ouss:Session entityPartId="X00000001" almatype="APDM::Session">
      <ouss:StartTime>2018-07-10 10:34:32.962</ouss:StartTime>
      <ouss:EndTime>2018-07-10 11:00:46.858</ouss:EndTime>
      <ouss:ExecBlockRef>
      <val:ExecBlockId>uid://A002/Xcf92df/X3453</val:ExecBlockId>
      </ouss:ExecBlockRef>
      <ouss:SBStatusRef entityId="uid://A001/X12a3/X80f" entityTypeName="SBStatus"/>
      </ouss:Session>
      </ouss:OUSStatus>


Ok, so I was wrong.  It looks like there are two sessions, which would indicate that we might be missing products in our list. 

But notice, the SBStatusRef and ExecBlockIds match what we found before, which might be a good consistency check if we can parse this XML reasonably.

And here's the PPR I created:

<?xml version="1.0" encoding="UTF-8"?>                                                                                                              
<SciPipeRequest xmlns:ent="Alma/CommonEntity"                                                                                                       
    xmlns:val="Alma/ValueTypes" xmlns:prp="Alma/ObsPrep/ObsProposal"                                                                                
    xmlns:orv="Alma/ObsPrep/ObsReview"                                                                                                              
    xmlns:ps="Alma/ObsPrep/ProjectStatus"                                                                                                           
    xmlns:oat="Alma/ObsPrep/ObsAttachment"                                                                                                          
    xmlns:prj="Alma/ObsPrep/ObsProject"                                                                                                             
    xmlns:sbl="Alma/ObsPrep/SchedBlock"                                                                                                             
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="SciPipeRequest">                                                                
    <SciPipeRequestEntity entityId="UID_UNASSIGNED"                                                                                                 
        entityTypeName="SciPipeRequest" datamodelVersion="0.1"/>                                                                                    
    <ProjectSummary>                                                                                                                                
        <ProposalCode>2017.1.01347.S</ProposalCode>                                                                                                 
        <ProposalTitle>This is our test data set</ProposalTitle>                                                                                    
        <Observatory>ALMA Joint Observatory</Observatory>                                                                                           
        <Telescope>ALMA</Telescope>                                                                                                                 
        <ProcessingSite>NAASC</ProcessingSite>                                                                                                      
        <Operator>Jim Sheckard</Operator>                                                                                                           
        <Mode>CSV</Mode>                                                                                                                            
        <Version>Undefined</Version>                                                                                                                
        <CreationTime>2018-08-07T13:18:00.000</CreationTime>                                                                                        
    </ProjectSummary>                                                                                                                               
    <ProjectStructure>                                                                                                                              
       <ObsUnitSetRef entityId="uid://A001/X1221/Xdf8"                                                                                              
            partId="X659301709" entityTypeName="ObsProject"/>                                                                                       
        <ObsUnitSetTitle>Undefined</ObsUnitSetTitle>                                                                                                
        <ObsUnitSetType>Member</ObsUnitSetType>                                                                                                     
        <ProjectStatusRef entityId="uid://A001/X1221/Xdfc"                                                                                          
            entityTypeName="ProjectStatus" documentVersion="1"/>                                                                                    
        <OUSStatusRef entityId="uid://A001/X12a3/X80e" entityTypeName="OUSStatus"/>                                                                 
    </ProjectStructure>                                                                                                                             
    <ProcessingRequests>                                                                                                                            
      <ProcessingRequest>                                                                                                                           
        <RootDirectory>/lustre/naasc/web/almapipe/pipeline/vatest/testing</RootDirectory>                                                           
        <ProcessingIntents>                                                                                                                         
          <Intents>                                                                                                                                 
            <Keyword>PROCESS</Keyword>                                                                                                              
            <Value>true</Value>                                                                                                                     
          </Intents>                                                                                                                                
          <Intents>                                                                                                                                 
            <Keyword>SESSION_1</Keyword>                                                                                                            
            <Value>uid://A002/Xce574d/Xa33e</Value>                                                                                                 
          </Intents>                                                                                                                                
          <Intents>                                                                                                                                 
            <Keyword>SESSION_2</Keyword>                                                                                                            
            <Value>uid://A002/Xcf92df/X3453</Value>                                                                                                 
          </Intents>                                                                                                                                
          <Intents>                                                                                                                                 
            <Keyword>INTERFEROMETRY_STANDARD_OBSERVING_MODE</Keyword>                                                                               
            <Value>Undefined</Value>                                                                                                                
          </Intents>                                                                                                                                
        </ProcessingIntents>                                                                                                                        
        <ProcessingProcedure>                                                                                                                       
          <ProcedureTitle>hifa_restore_jls</ProcedureTitle>                                                                                         
          <ProcessingCommand>                                                                                                                       
            <Command>hifa_restoredata</Command>
            <ParameterSet/>
          </ProcessingCommand>
        </ProcessingProcedure>
        <DataSet>
          <SchedBlockSet>
            <SchedBlockIdentifier>
              <RelativePath>jls_ous_restore_v1_myppr</RelativePath>
              <SchedBlockRef entityId="uid://A001/X12a3/X804"
                             entityTypeName="SchedBlock" documentVersion="1"/>
              <SBStatusRef entityId="uid://A001/X12a3/X80f" entityTypeName="SBStatus"/>
              <SBTitle>Undefined</SBTitle>
              <AsdmIdentifier>
                <AsdmRef>
                  <ExecBlockId>uid://A002/Xce574d/Xa33e</ExecBlockId>
                </AsdmRef>
                <AsdmDiskName>uid___A002_Xce574d_Xa33e</AsdmDiskName>
              </AsdmIdentifier>
              <AsdmIdentifier>
                <AsdmRef>
                  <ExecBlockId>uid://A002/Xcf92df/X3453</ExecBlockId>
                </AsdmRef>
                <AsdmDiskName>uid___A002_Xcf92df_X3453</AsdmDiskName>
              </AsdmIdentifier>
            </SchedBlockIdentifier>
          </SchedBlockSet>
        </DataSet>
      </ProcessingRequest>
    </ProcessingRequests>
    <ResultsProcessing>
        <ArchiveResults>false</ArchiveResults>
        <CleanUpDisk>false</CleanUpDisk>
        <UpdateProjectLifeCycle>false</UpdateProjectLifeCycle>
        <NotifyOperatorWhenDone>false</NotifyOperatorWhenDone>
        <PipelineOperatorAdress>Unknown</PipelineOperatorAdress>
    </ResultsProcessing>
</SciPipeRequest>



But I'm still a bit nervous about the one-session/two-session thing.  Doing a bit more looking, the manifest only mentions one of the two exec blocks, not both.  A quick peek at the AQUA_EXECBLOCK table shows one as SemiPass in QA0, so that may have something to do with it.  
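
A mismatch like this could be caught automatically by comparing the EBs listed in the status XML against the downloaded product file names. A minimal sketch, assuming per-EB products always start with the EB's on-disk name (true for the listing above, not verified in general); ebs_without_products is a hypothetical helper.

```python
def ebs_without_products(eb_uids, product_files):
    """Return the EB uids for which no product file name starts with the EB's disk name."""
    missing = []
    for uid in eb_uids:
        # uid://A002/Xcf92df/X3453 -> uid___A002_Xcf92df_X3453
        disk_name = uid.replace("://", "___").replace("/", "_")
        if not any(name.startswith(disk_name) for name in product_files):
            missing.append(uid)
    return missing
```

For this data set it flags uid://A002/Xce574d/Xa33e, the EB the manifest does not mention.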


Even Better: the pipeline-generated restore script:

from recipes.almahelpers import fixsyscaltimes # SACM/JAO - Fixes
__rethrow_casa_exceptions = True
h_init()
try:
    hifa_importdata (dbservice=False, bdfflags=False, vis=['../rawdata/uid___A002_Xcf92df_X3453'], session=['session_2'], ocorr_mode='ca')
    fixsyscaltimes(vis = 'uid___A002_Xcf92df_X3453.ms')# SACM/JAO - Fixes
    h_save() # SACM/JAO - Finish weblog after fixes
    h_init() # SACM/JAO - Restart weblog after fixes
    hifa_restoredata (vis=['uid___A002_Xcf92df_X3453'], session=['session_2'], ocorr_mode='ca')
finally:
    h_save()


So we would need to add two additional procedures.  How can we know when this is necessary? 
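
One possible way to find out, sketched here as an idea rather than a tested method: parse the delivered casa_piperestorescript.py and list the calls it makes, then flag anything beyond the plain h_init / hifa_restoredata / h_save sequence. The helper names and the choice of which calls count as "plain" are my assumptions.

```python
import ast


def pipeline_calls(script_source):
    """Return the names of functions called by bare name, in source order."""
    tree = ast.parse(script_source)
    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.append((node.lineno, node.col_offset, node.func.id))
    # ast.walk is not in source order, so sort by position
    return [name for _, _, name in sorted(calls)]


def extra_steps(script_source, plain=("h_init", "h_save", "hifa_restoredata")):
    """Return the calls that go beyond a plain restore, in source order."""
    return [name for name in pipeline_calls(script_source) if name not in plain]
```

For the script above this reports hifa_importdata and fixsyscaltimes, which are the two extra procedures the PPR would need.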

How common is this?

Catarina did a good job with this curveball.  Was it intentional?



