...
It runs conda activate base
Use a python-packages tarball instead of conda on the execution hosts (see the sketch after this list).
Remove source ~/.bashrc
Remove conda activate base
Add wget http://proxy.chtc.wisc.edu/SQUID/chtc/python37.tar.gz
Add tar xfz python37.tar.gz
- Add (cd python/bin ; ln -s python3 python) so the default python is python3
Add tar xfz hera_calibration_packages.tar.gz
Add export PYTHONPATH=${PWD}/hera_calibration_packages
Add export PATH=${PWD}/python/bin:${PATH}
- Add rm -f python37.tar.gz to the end of the script
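Taken together, the environment-setup portion of the edited wrapper script would look something like this (a sketch assembled from the edits above, not the exact generated script):
wget http://proxy.chtc.wisc.edu/SQUID/chtc/python37.tar.gz
tar xfz python37.tar.gz
(cd python/bin ; ln -s python3 python)    # so the default python is python3
tar xfz hera_calibration_packages.tar.gz
export PYTHONPATH=${PWD}/hera_calibration_packages
export PATH=${PWD}/python/bin:${PATH}
# ... the task itself runs here ...
rm -f python37.tar.gz    # cleanup at the end of the script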
It changes directory to /lustre/aoc/projects/hera/krowe/makeflow_sample/raw_data
This line should just be removed
It runs /lustre/aoc/projects/hera/krowe/hera_opm/pipelines/h1c/idr2/v2/task_scripts/do_EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5
Instead run ./do_EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5 zen.2458098.44615.HH.uvh5
zen.2458098.44615.HH.uvh5 will need to be copied to the scratch area on the execution host.
- We could just make it a dependency in the .mf file, but it is too large for HTCondor's transfer mechanism.
- http://chtc.cs.wisc.edu/file-avail-largedata
do_EXTRACT_AUTOS.sh requires ${src_dir}/_common.sh
Copy ${src_dir}/_common.sh to the working directory and make it a dependency in the .mf file
do_EXTRACT_AUTOS.sh calls extract_autos.py which is from hera_cal.git
- Copy extract_autos.py to the working directory and make it a dependency in the .mf file
On success it touches the .out file
- This is unnecessary. The .out file can just be the target for the rule in the .mf file
On error it moves the .log file to .EXTRACT_AUTOS.log.error
- Is this really necessary?
I am confused about the difference between the .log, .out, and .log.error files.
...
idr2.2.mf
Even though condor copies do_EXTRACT_AUTOS.sh to the scratch area, it doesn’t run it. Instead, it runs the full-path version (/lustre/aoc/projects/hera/krowe/…). This is because the .mf file runs /lustre/.../wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh, which in turn runs /lustre/.../do_EXTRACT_AUTOS.sh. To fix this, the .mf file will need to be generated differently.
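For contrast, the rule as currently generated looks roughly like this (a reconstruction from the paths above, with the same directories elided; the exact generated rule may differ):
zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.out: /lustre/.../wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh
	/lustre/.../wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh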
Makeflow uses the classic Make syntax, like so:
target : prerequisites
	recipe
where it is expected that the recipe will update the target. So, I think what you want is for the wrapper script to write its output into a file (perhaps the .log file) and to make that file the target. E.g.
zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log: /home/nu_kscott/hera/test1/wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh do_EXTRACT_AUTOS.sh _common.sh ../share/makeflow_sample/raw_data/zen.2458098.44615.HH.uvh5 hera_calibration_packages.tar.gz extract_autos.py
	./wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh > zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log 2>&1
What is different?
The target (zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log) was changed from the .out file to the .log file. This is a common makefile pattern: the target just needs to be a file the recipe reliably updates, and the redirect guarantees the .log file is written. The .out file is still created by the wrapper_*.sh script in the recipe. Honestly, using either the .log or the .out file as the target is OK.
The prerequisites contain all the files we want HTCondor to copy to the execution host. hera_calibration_packages.tar.gz is about 250MB and zen.2458098.44615.HH.uvh5 is about 5GB. They are both too big for HTCondor's transfer mechanism; while this works in testing, we will need to either make them smaller or transfer them some other way. See http://chtc.cs.wisc.edu/file-avail-largedata
I replaced the absolute path to wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh with ./ so that Makeflow tells HTCondor to transfer this file to the scratch area on the execution host and to execute it from there.
I removed the absolute path from zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log so it can be written to the scratch area and HTCondor will copy it back from the execution host.
Build python-packages tarball
If it is over 100MB, have it installed in CHTC’s SQUID proxy. Using xz instead of gz compression reduces the file from 240MB to 178MB, but increases the time to uncompress it from 65s to 215s.
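For reference, the two compressions could be produced like this (a sketch; assuming the file in question is the hera_calibration_packages tarball from above):
tar cfz hera_calibration_packages.tar.gz hera_calibration_packages    # gz: larger, but fast to unpack
tar cfJ hera_calibration_packages.tar.xz hera_calibration_packages    # xz: smaller, but slower to unpack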
Python Package Size
The size of the python packages file (hera_calibration_packages.tar.gz) is about 250MB. According to http://chtc.cs.wisc.edu/file-avail-largedata, CHTC would like this to be under 100MB in order to use the HTCondor File Transfer mechanism. So, we either need to reduce the file size by removing packages and/or using better compression, or ask CHTC to add it to their SQUID web proxy.
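One hedged way to trim it before asking CHTC about SQUID (assumption: byte-compiled caches and bundled test suites are not needed at run time by this pipeline):
find hera_calibration_packages -type d -name __pycache__ -prune -exec rm -rf {} +
find hera_calibration_packages -type d -name tests -prune -exec rm -rf {} +
tar cfJ hera_calibration_packages.tar.xz hera_calibration_packages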
raw_data Size
The sample uvh5 data files I have seen are about 5GB in size. This is way too large for the HTCondor File Transfer mechanism according to http://chtc.cs.wisc.edu/file-avail-largedata. Is there a way these files can be split into just what each individual job needs? If not, then they will have to live on the Large Data Staging filesystem, which will limit the pool of available execution hosts.
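If splitting is feasible, one possibility is a pre-processing step that uses pyuvdata's partial-read support to pull out only what each job needs. A sketch (the ant_str keyword and the partial-read behavior should be checked against the installed pyuvdata version):
python - <<'EOF'
from pyuvdata import UVData

uv = UVData()
# Partial read: load only the autocorrelations from the 5GB file.
uv.read("zen.2458098.44615.HH.uvh5", ant_str="auto")
uv.write_uvh5("zen.2458098.44615.HH.autos.uvh5")
EOF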