
  • It runs conda activate base

    • Use a python-packages tarball instead of conda on the execution hosts.

      • Remove source ~/.bashrc

      • Remove conda activate base

      • Add wget http://proxy.chtc.wisc.edu/SQUID/chtc/python37.tar.gz

      • Add tar xfz python37.tar.gz

      • Add (cd python/bin ; ln -s python3 python) so the default python is python3

      • Add tar xfz hera_calibration_packages.tar.gz

      • Add export PYTHONPATH=${PWD}/hera_calibration_packages

      • Add export PATH=${PWD}/python/bin:${PATH}

      • Add rm -f python37.tar.gz to the end of the script
  • It changes directory to /lustre/aoc/projects/hera/krowe/makeflow_sample/raw_data

    • This line should just be removed

  • It runs /lustre/aoc/projects/hera/krowe/hera_opm/pipelines/h1c/idr2/v2/task_scripts/do_EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5

    • Instead run ./do_EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5 zen.2458098.44615.HH.uvh5

    • zen.2458098.44615.HH.uvh5 will need to be copied to the scratch area on the execution host.

    • do_EXTRACT_AUTOS.sh requires ${src_dir}/_common.sh

      • Copy ${src_dir}/_common.sh to the working directory and make it a dependency in the .mf file

    • do_EXTRACT_AUTOS.sh calls extract_autos.py, which comes from hera_cal.git

      • Copy extract_autos.py to the working directory and make it a dependency in the .mf file
  • On success it touches the .out file

    • This is unnecessary.  The .out file can just be the target for the rule in the .mf file
  • On error it moves the .log file to .EXTRACT_AUTOS.log.error

    • Is this really necessary?

I am confused as to the difference between the .log, .out and .log.error files.
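The wrapper changes listed above could be applied by whatever generates the wrappers. Below is a minimal sketch of that. The shell function make_wrapper is hypothetical and is not part of hera_opm. It assumes the wrapper is named wrapper_<first task argument>.<TASK>.sh, which matches the name used in idr2.2.mf. It also assumes that do_<TASK>.sh, _common.sh, extract_autos.py and hera_calibration_packages.tar.gz arrive in the scratch area as makeflow prerequisites. The python37.tar.gz URL and the setup commands come from the list above.

```shell
# Hypothetical helper (not part of hera_opm): write a wrapper script that
# applies the changes listed above.  Usage:
#   make_wrapper TASK TASK_ARGS...
# The first task argument is used to name the wrapper file.
make_wrapper() {
    local task="$1"; shift
    local obsid="$1"
    local args="$*"
    local out="wrapper_${obsid}.${task}.sh"

    # Unquoted EOF: ${task} and ${args} expand now; the \$ escapes are
    # written literally and expand later, on the execution host.
    cat > "${out}" <<EOF
#!/bin/bash
set -e
# Python 3.7 from the CHTC SQUID proxy replaces "source ~/.bashrc" + conda
wget http://proxy.chtc.wisc.edu/SQUID/chtc/python37.tar.gz
tar xfz python37.tar.gz
(cd python/bin ; ln -s python3 python)
# HERA packages transferred by HTCondor as a prerequisite
tar xfz hera_calibration_packages.tar.gz
export PYTHONPATH=\${PWD}/hera_calibration_packages
export PATH=\${PWD}/python/bin:\${PATH}
# Run the task script from the scratch area (no cd to /lustre) and keep
# its exit status so makeflow can tell whether the job failed
status=0
./do_${task}.sh ${args} || status=\$?
rm -f python37.tar.gz
exit \${status}
EOF
    chmod +x "${out}"
    echo "${out}"
}
```

For example, `make_wrapper EXTRACT_AUTOS zen.2458098.44615.HH.uvh5 zen.2458098.44615.HH.uvh5` writes wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh. The generated wrapper contains no cd and no absolute paths, so every file it touches is one HTCondor placed in the scratch area. It also does not touch a .out file or rename the .log file. Instead, it exits with the task's status and leaves the outcome to the makeflow target and that exit code.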



idr2.2.mf


Even though HTCondor copies do_EXTRACT_AUTOS.sh to the scratch area, that copy is not the one that runs.  Instead, the full-path version (/lustre/aoc/projects/hera/krowe/…) runs.  This is because the .mf file runs /lustre/.../wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh, which in turn runs /lustre/.../do_EXTRACT_AUTOS.sh.  To fix this, the .mf file will need to be generated differently.
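Before submitting, it may help to search the generated .mf file and wrapper scripts for the shared filesystem path so leftover references like these are caught early. The function check_portable is a hypothetical sketch, and /lustre/ is the only prefix it looks for.

```shell
# Hypothetical helper: report any line in the given files (e.g. the .mf
# file and wrapper_*.sh scripts) that still refers to /lustre/.
# Returns 1 if at least one such line is found, 0 otherwise.
check_portable() {
    local f status=0
    for f in "$@"; do
        if grep -n '/lustre/' "${f}"; then
            echo "${f}: still references /lustre/" >&2
            status=1
        fi
    done
    return "${status}"
}
```

A typical call would be `check_portable idr2.2.mf wrapper_*.sh`.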



Makeflow uses the classic Make syntax:

target : prerequisites
        recipe

where the recipe is expected to create or update the target.  So, I think what you want is for the wrapper script to write its output to a file (perhaps the .log file) that is the target.  For example:

zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log: /home/nu_kscott/hera/test1/wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh do_EXTRACT_AUTOS.sh _common.sh ../share/makeflow_sample/raw_data/zen.2458098.44615.HH.uvh5 hera_calibration_packages.tar.gz extract_autos.py
        ./wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh > zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log 2>&1

What is different?

  • The target (zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log) was changed from the .out file to the .log file.  This is common in makefiles, where the target is usually the file that the recipe itself produces.  The .out file is also created by the wrapper_*.sh script in the recipe, so honestly, using either the .log or the .out file as the target is ok.

  • The prerequisites contain all the files we want HTCondor to copy to the execution host.  hera_calibration_packages.tar.gz is about 250MB and zen.2458098.44615.HH.uvh5 is about 5GB.  Both are too big for HTCondor's file transfer mechanism.  While this works in testing, we will need to either make them smaller or transfer them some other way (see http://chtc.cs.wisc.edu/file-avail-largedata).

  • In the recipe, I replaced the absolute path to wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh with ./ so that makeflow tells HTCondor to transfer this file to the scratch area on the execution host and run it from there.

  • I removed the absolute path from zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log so it can be written to the scratch area and HTCondor will copy it back from the execution host.

  • Build the python-packages tarball (see python-packages).

    • If it is over 100MB, have it installed in CHTC’s SQUID proxy.  Using xz instead of gz compression reduces the file from 240MB to 178MB but increases the time to uncompress it from 65s to 215s.
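The .mf generator could emit rules of the form shown above directly. The sketch below uses a hypothetical make_rule function that prints one rule from three inputs: the wrapper path, the target log file, and the remaining inputs HTCondor should transfer. It assumes the recipe should run the wrapper by its basename with ./ and redirect all output into the target, as in the rule above. It also assumes a tab-indented recipe line, as in Make.

```shell
# Hypothetical helper: print one makeflow rule.
#   make_rule WRAPPER TARGET [OTHER_INPUTS...]
# WRAPPER may be an absolute path; the recipe runs it as ./<basename> so it
# executes from the scratch area on the execution host.
make_rule() {
    local wrapper="$1" target="$2"
    shift 2
    # target : prerequisites (wrapper first, then every other input to transfer)
    printf '%s:' "${target}"
    printf ' %s' "${wrapper}" "$@"
    printf '\n'
    # recipe: tab-indented, all output redirected into the target
    printf '\t./%s > %s 2>&1\n' "$(basename "${wrapper}")" "${target}"
}
```

Appending the output of `make_rule /home/nu_kscott/hera/test1/wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log do_EXTRACT_AUTOS.sh _common.sh ../share/makeflow_sample/raw_data/zen.2458098.44615.HH.uvh5 hera_calibration_packages.tar.gz extract_autos.py` to idr2.2.mf would reproduce the rule above, with a tab-indented recipe.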


Python Package Size

The size of the python packages file (hera_calibration_packages.tar.gz) is about 250MB.  According to http://chtc.cs.wisc.edu/file-avail-largedata, CHTC would like files to be under 100MB in order to use the HTCondor File Transfer mechanism.  So, we need to either reduce this file's size (by removing packages, using better compression, or both) or ask CHTC to add it to their SQUID web proxy.
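The gz versus xz trade-off noted for the python-packages tarball could be re-measured for any candidate package set with a sketch like this one. compare_compression is a hypothetical helper. It assumes GNU tar with gzip and xz installed, and it times only the unpack step, in whole seconds.

```shell
# Hypothetical helper: pack a directory with gzip and with xz, then print
# "<format> <archive bytes> <seconds to unpack>" for each one.
compare_compression() {
    local dir="$1" work fmt opt archive start
    work=$(mktemp -d)
    for fmt in gz xz; do
        if [ "${fmt}" = gz ]; then opt=z; else opt=J; fi
        archive="${work}/packages.tar.${fmt}"
        # Archive the directory by name so it unpacks as a single top-level dir
        tar -c${opt}f "${archive}" -C "$(dirname "${dir}")" "$(basename "${dir}")"
        mkdir "${work}/unpack"
        start=${SECONDS}
        tar -x${opt}f "${archive}" -C "${work}/unpack"
        echo "${fmt} $(wc -c < "${archive}") $((SECONDS - start))"
        rm -rf "${work}/unpack"
    done
    rm -rf "${work}"
}
```

Running `compare_compression hera_calibration_packages` against an unpacked copy of the packages directory prints two lines, one for gz and one for xz. Each line gives the archive size in bytes and the unpack time in seconds.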


raw_data Size

The sample uvh5 data files I have seen are about 5GB in size.  This is far too large for the HTCondor File Transfer mechanism according to http://chtc.cs.wisc.edu/file-avail-largedata.  Is there a way these files can be split so that each job gets only the data it needs?  If not, they will have to live on the Large Data Staging filesystem, which will limit the pool of available execution hosts.
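The placement decisions in the two size sections above can be summarized as a small rule of thumb. choose_transfer is hypothetical, and it rests on two assumptions. First, it treats 100MB as 100 * 1024 * 1024 bytes, whereas the CHTC page may mean 10^8 bytes. Second, it splits large files between SQUID and Large Data Staging according to how this page treats the shared packages tarball versus the per-job raw data; that split is not taken from the CHTC guidance itself.

```shell
# Hypothetical helper encoding this page's rule of thumb:
#   under 100MB                  -> HTCondor File Transfer
#   larger, same file every job  -> ask CHTC to host it on the SQUID proxy
#   larger, unique per job       -> Large Data Staging filesystem
# Assumption: 100MB is taken as 100 * 1024 * 1024 bytes.
choose_transfer() {
    local file="$1" shared="$2"   # shared: "yes" if every job uses the same file
    local limit=$((100 * 1024 * 1024)) size
    size=$(wc -c < "${file}")
    if [ "${size}" -lt "${limit}" ]; then
        echo "htcondor"
    elif [ "${shared}" = yes ]; then
        echo "squid"
    else
        echo "large-data-staging"
    fi
}
```

Under these assumptions, hera_calibration_packages.tar.gz (about 250MB, identical for every job) maps to squid, while a 5GB zen.*.uvh5 file needed by a single job maps to large-data-staging.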