This is a proof of concept for running HERA calibration jobs at CHTC using HTCondor without a shared filesystem.
Mar. 26, 2020 krowe: I was able to run this through HTCondor at CHTC via makeflow -T condor small.mf
I think it worked, since it produced a file (zen.2458098.44615.HH.autos.uvh5), but makeflow itself never returned. It just seemed to hang even though the HTCondor job had finished.
wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh
small.mf
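This file was originally idr2_2.mf. It was renamed to small.mf, all but the first target were removed, and then the following changes were made to make it work at CHTC. In this diff, lines marked < are the CHTC version and lines marked > are the original.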
< zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log: /home/nu_kscott/hera/test1/wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh do_EXTRACT_AUTOS.sh _common.sh ../share/makeflow_sample/raw_data/zen.2458098.44615.HH.uvh5 hera_calibration_packages.tar.gz extract_autos.py
< ./wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh > zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log 2>&1
---
> zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.out: /lustre/aoc/projects/hera/krowe/hera_opm/pipelines/h1c/idr2/v2/task_scripts/do_EXTRACT_AUTOS.sh
> /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow/wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh > /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow/zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log 2>&1
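Since there is no shared filesystem, every file the command needs has to be listed as an input of the rule so makeflow can send it to the execution host. That is presumably why the wrapper, do_EXTRACT_AUTOS.sh, _common.sh, the raw data, the package tarball and extract_autos.py all appear on the < line. As a minimal sketch, a hypothetical helper named missing_inputs (not part of makeflow) can check that each listed input exists before submitting. It assumes the rule's target and inputs sit on one line in the form target: input1 input2 ... and that no file name contains spaces.

```shell
# Hypothetical pre-flight check, not part of makeflow.
# Takes one makeflow rule line of the form "target: input1 input2 ...",
# prints every listed input that does not exist (relative paths are
# resolved against the current directory), and returns 1 if any is missing.
# The body runs in a subshell "( ... )" so that set -f does not leak out.
missing_inputs() (
    status=0
    set -f                      # no glob expansion while splitting the list
    for f in ${1#*:}; do        # drop "target:" and split on whitespace
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    exit "$status"
)
```

For example, from the directory where makeflow is run: missing_inputs "$(grep -m1 'EXTRACT_AUTOS.log:' small.mf)" && makeflow -T condor small.mf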
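wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh
Changes made to run at CHTC (here lines marked < are the original and lines marked > are the CHTC version):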
2,3c2,3
< source ~/.bashrc
< conda activate base
---
> #source ~/.bashrc
> #conda activate base
5,12c5,25
< cd /lustre/aoc/projects/hera/krowe/makeflow_sample/raw_data
< timeout 24h /lustre/aoc/projects/hera/krowe/hera_opm/pipelines/h1c/idr2/v2/task_scripts/do_EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5
< if [ $? -eq 0 ]; then
< cd /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow
< touch zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.out
< else
< mv /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow/zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow/zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log.error
< fi
---
> wget --no-verbose http://proxy.chtc.wisc.edu/SQUID/chtc/python37.tar.gz
> tar xfz python37.tar.gz
> tar xfz hera_calibration_packages.tar.gz
> date
> export PYTHONPATH=${PWD}/hera_calibration_packages
> export PATH=.:${PWD}/python/bin:${PATH}
>
> #CHTC's python37 tarball has bin/python3 and not bin/python
> (cd python/bin ; ln -s python3 python)
>
> #cd /lustre/aoc/projects/hera/krowe/makeflow_sample/raw_data
>
> timeout 24h ./do_EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5
>
> # Perhaps wait and see where makeflow/condor put output and error.
> #if [ $? -eq 0 ]; then
> # cd /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow
> # touch zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.out
> #else
> # mv /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow/zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log /lustre/aoc/projects/hera/krowe/makeflow_sample/makeflow/zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log.error
> #fi
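The exit-status check is commented out above, presumably because its /lustre paths do not exist on the execution host. As a minimal sketch of one alternative, assuming GNU coreutils timeout (which exits with status 124 when the limit is reached), the hypothetical helper run_classified below runs the task and prints a one-word verdict. Because makeflow redirects the wrapper's output into zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log, the rule's target, the verdict would come back in that log. It does not reproduce the original .out and .log.error bookkeeping.

```shell
# Hypothetical helper: run a command under GNU timeout and print a verdict.
# timeout exits with 124 when the limit is reached; otherwise it passes
# through the command's own exit status, which this function also returns.
run_classified() (
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif [ "$rc" -eq 124 ]; then
        echo "timeout"
    else
        echo "failed ($rc)"
    fi
    exit "$rc"
)
```

In the wrapper it would replace the bare timeout line, for example run_classified 24h ./do_EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5. As long as it stays the last command, the wrapper still exits with the task's status, so makeflow should still see a failed task as failed.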
extract_autos.py
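This script was installed as part of hera_cal.git in the conda environment, so its first line hard-codes the path to the conda python. Since it has to run on the execution host, where that conda environment is not available, we change it to run the python37 we install per http://chtc.cs.wisc.edu/python-jobs.shtml, where we also symlink python to python3. With #!/usr/bin/env python the script uses whichever python comes first in PATH, and the wrapper puts ${PWD}/python/bin ahead of the system directories in PATH. Lines marked < are the original and lines marked > are the CHTC version.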
1c1
< #!/home/nu_kscott/hera/share/miniconda3/bin/python
---
> #!/usr/bin/env python
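The same edit could be scripted instead of made by hand, which may help if other hera_cal scripts need it too. This is a minimal sketch with a hypothetical function name, fix_shebang, assuming GNU sed (for -i without a backup suffix).

```shell
# Hypothetical helper: replace a hard-coded python interpreter on line 1 of a
# script with "#!/usr/bin/env python", so the script runs whichever python
# comes first in PATH. Requires GNU sed for "-i" without a backup suffix.
fix_shebang() {
    # Only line 1 is touched, and only if it is a shebang mentioning python.
    sed -i '1s|^#!.*python.*$|#!/usr/bin/env python|' "$1"
}
```

For example, run once on the submit host before calling makeflow: fix_shebang extract_autos.py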