...
- Once SSA gives them the all-clear, stakeholders (John Tobin, Mark Lacy) test critical user-facing functions of the AAT/PPI under RHEL7
- User-facing things that need to be tested on the AAT/PPI production system (https://archive-new.nrao.edu):
- Downloads of VLA EBs: SDM-only (ML - job 316460859 OK), SDM (DM - job 316336595 OK), basic MS (DM - test OK, but tar'd even when I ask for untar'd!), CMS (ops tests on SRDP-348 pass; DM FAIL, no CASA 5.6.2, SSA-5935; SRDP-356 pass).
- Downloads of VLA calibrations: ML - job 316444531 started for non-proprietary data and worked, so calling this good. X - ML was unable to download my own proprietary data (but did not try before the upgrade). JT - I have not had trouble yet. DM - no issue for me, either. CASA 5.6.2 issue: SSA-5935.
- Downloads of VLASS images (ML - job 316404900)
- Downloads of VLBA UVFits files (SRDP-415)
- AUDI imaging (JT - 309617329): produces a cube and moves to image-qa; not sure about anything beyond that
- ALMA restored MS download (JT - 309621798)
- ASDM download (JT - 309629153)
- ALMA basic MS download (JT - 309625789; ML - my job 316420925 failed)
- Searches work
- Proprietary periods respected for VLA and ALMA: ML - I cannot access my proprietary VLA dataset 19A-306 (authentication through NRAO); SSA-5934 submitted. SW - I can't access 19A-306 via my test account, which is correct.
- User-facing things that need to be tested on the legacy archive production system (https://archive.nrao.edu):
- Downloads of VLA EBs: basic MS (DM - pass for VLA SDM; FAIL: nothing written to destination path, but test ping successful. Tried with the two latest 19A-020 EBs.)
- Stakeholders (Mark Lacy, Drew Medlin) test critical operations-facing functions of the AAT/PPI under RHEL7
- VNC functionality to cluster nodes (DM - currently failing with some people's default xstartup; a workaround appears to exist, so we can move forward.)
- Operations-facing things that need to be tested:
- EB ingestion (SW: I'd note the system has been up for a day now, so we can test anything that should have come in last night)
- CIPL being triggered (DM - manual CIPL start works, and so does the auto-trigger. Update: I set the workflow to RUN to trigger something; will remove it if it starts.)
- CIPL working (DM - pipeline manually started and ran to completion, including moving of files)
- Calibration ingestion (QAPass) (DM - ingested a test run of 19A-020 from 2019-10-27; the ingested cal file shows, can be downloaded, and is the same size as the original, with qa_notes.html indicating the results shouldn't be used. Will replace with a real run later.)
- GO/NO-GO decision, 3pm MDT, October 30th (Drew Medlin, John Tobin, Mark Lacy, Stephan Witz, Amy Kimball):
- If GO: Stephan Witz to work with CIS to undo the DNS change and MOTD banners, and re-enable CIPL triggers. SW: Decision is GO; in the morning we'll walk back the DNS changes, undo the banners, and send the all-clear.
- If NO-GO: SSA to iterate with stakeholders until the result is GO; note that this may mean putting out a bugfix release of AAT/PPI 3.6 and doing the same for VLASS
Stakeholder tests/actions
VLASS actions
- Update CASA versions in the production Manager for active product types (QL imaging, SE calibration) to the RHEL7 CASA versions
VLASS tests
- Run QL calibration job (test epoch on VLASS production manager) AEK: partial SUCCESS
- 1st try FAIL (workflows not running): Test epoch product #34132 Test.ql.cal.vlassRF-sqdeg-3C286_rise: VLASS Manager "job" successfully created and submitted (execution #59093), but no cluster jobs were running. The first cluster job should create the working directory.
- 2nd try FAIL (workflow had been built around RHEL6): same product but a new "job" and execution (#59094). Cluster job 707.nmpost-serv-1.aoc.nrao.edu-PrepareWorkingDirectoryJob.vlass.w7.ab510b59-c86e-425b-9a40-ac3985f21b49 was successful, but its downloadDataFormat, jobid, queue, etc. fields in Manager are all "undefined". The next step in the execution is cluster job 710.nmpost-serv-1.aoc.nrao.edu-get-files.sh.vlass.w7.8d88844c-d361-4bef-a5b5-82c1f27c453d, which appears as "started" in VLASS Manager but doesn't appear on the cluster and seems to be hanging.
- 3rd try SUCCESS SO FAR (jobs launched successfully; calibration step will take several hours). PARTIAL SUCCESS: execution completed correctly, but the status of the cluster job and vlass-job execution did not update in Manager.
- Run QL imaging job (production Manager) AEK: partial SUCCESS: job completed successfully, but not all steps were tracked correctly in Manager
- Launched successfully: execution 59096, VLASS1.2_T17t10.J064711+243000
- A&A / R&A QL imaging jobs (the original A&A and R&A attempts later completed after workflows were turned on)
- AEK: attempted to A&A QL image (execution 59091) SUCCESS; execution/job status changed in Manager but nothing happened on disk in spool or cache
- AEK: attempted to R&A QL image (execution 59092) SUCCESS; execution/job status changed in Manager but nothing happened on disk in spool or cache
- Create scheduling block and products (test epoch)
...