...

  •  Once SSA gives them the all-clear, stakeholders (John Tobin, Mark Lacy) test critical user-facing functions of the AAT/PPI under RHEL7
  •  User-facing things that need to be tested on the AAT/PPI production system (https://archive-new.nrao.edu):
    •  Downloads of VLA EBs: SDM-only (ML - job 316460859 OK), SDM (DM - job 316336595 OK), basic MS (DM - test OK, but tar'd even when I ask for untar'd!), CMS - ops tests on SRDP-348 (pass; DM: FAIL, no CASA 5.6.2, SSA-5935), SRDP-356 (pass).
    •  Downloads of VLA calibrations. ML - job 316444531 started for non-prop data, worked, so calling this good. X - ML was unable to download my own proprietary data (but did not try before the upgrade). JT - I have not had trouble yet. DM: no issue for me, either. CASA 5.6.2 issue - SSA-5935
    •  Downloads of VLASS images (ML - job 316404900)
    •  Downloads of VLBA UVFits files (SRDP-415)
    •  AUDI imaging (JT-309617329) - produces a cube and moves it to image-qa; not sure what happens beyond that
    •  ALMA restored MS download (JT-309621798)
    •  ASDM Download (JT-309629153)
    •  ALMA basic MS download (JT-309625789) (ML - my job 316420925 failed)
    •  Searches work
    •  Proprietary periods respected for VLA and ALMA - ML - I cannot access my proprietary VLA dataset 19A-306 (authentication through NRAO). SSA-5934 submitted - SW: I can't access 19A-306 via my test account, which is proper
  •  User-facing things that need to be tested on the legacy archive production system (https://archive.nrao.edu):
    •  Downloads of VLA EBs: basic MS (DM - pass for VLA; SDM FAIL: nothing written to destination path, but test ping successful. Tried with the two latest 19A-020 EBs).
  •  Stakeholders (Mark Lacy, Drew Medlin) test critical operations-facing functions of the AAT/PPI under RHEL7
  •  VNC functionality to cluster nodes (DM - currently failing with some people's default xstartup, workaround appears to exist, can move forward.)
  •  Operations-facing things that need to be tested:
    •  EB ingestion (SW: I'd note the system has been up for a day now, so we can test anything that should have come in last night)
    •  CIPL being triggered (DM: manual CIPL start works, so does the auto-trigger. Update: I set the workflow to RUN to trigger something, will remove it if it starts)
    •  CIPL working (DM pipeline manually started and works to completion, including moving of files)
    •  calibration ingestion (QAPass) (DM: ingested test run of 19A-020 from 2019-10-27; ingested cal file shows, can be downloaded, and is the same size as the original. Ingested with qa_notes.html indicating the results shouldn't be used; will replace with real run later)
  •  GO/NO-GO decision, 3pm MDT October 30th (Drew Medlin, John Tobin, Mark Lacy, Stephan Witz, Amy Kimball):
    •  If GO: Stephan Witz to work with CIS to undo the DNS change and MOTD banners, re-enable CIPL triggers. SW: Decision is GO; in the morning we'll walk back the DNS changes, undo the banners, and send the all-clear.
    •  If NO-GO: SSA to iterate with stakeholders until the result is GO. Note that this may mean putting out a bugfix release of AAT/PPI 3.6 and doing the same for VLASS

Stakeholder tests/actions

VLASS actions

    •  Update CASA versions in production Manager for active product types (QL imaging, SE calibration) to CASA RHEL7 versions

VLASS tests

    •  Run QL calibration job (test epoch on VLASS production manager). AEK: partial SUCCESS
      • 1st try FAIL (workflows not running): Test epoch product #34132 Test.ql.cal.vlassRF-sqdeg-3C286_rise: VLASS Manager "job" successfully created and submitted (execution #59093), but no cluster jobs running. First cluster job should create working directory.
      • 2nd try FAIL (workflow had been built around RHEL6): same product but new "job" and execution (#59094). Cluster job 707.nmpost-serv-1.aoc.nrao.edu-PrepareWorkingDirectoryJob.vlass.w7.ab510b59-c86e-425b-9a40-ac3985f21b49 successful, but status in Manager of downloadDataFormat, jobid, queue, etc. are all "undefined". Next step in execution is cluster job 710.nmpost-serv-1.aoc.nrao.edu-get-files.sh.vlass.w7.8d88844c-d361-4bef-a5b5-82c1f27c453d, which appears as "started" in VLASS Manager but doesn't appear on the cluster and seems to be hanging.
      • 3rd try SUCCESS SO FAR (jobs launched successfully; calibration step will take several hours). PARTIAL SUCCESS: execution completed correctly, but status of cluster job and vlass-job execution did not update in Manager
    •  Run QL imaging job (production manager). AEK: partial SUCCESS: job completed successfully, but not all steps tracked correctly in Manager
      •  Launched successfully: execution 59096, VLASS1.2_T17t10.J064711+243000
    •  A&A / R&A QL imaging jobs (the original A&A and R&A attempts completed later, after workflows were turned on)
      •  AEK: attempted to A&A QL image (execution 59091) SUCCESS; execution/job status changed in Manager but nothing happened on disk in spool or cache
      •  AEK: attempted to R&A QL image (execution 59092) SUCCESS; execution/job status changed in Manager but nothing happened on disk in spool or cache
    •  Create scheduling block and products (test epoch)

...