...

  • The target (zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log) was changed from the .out file to the .log file.  This is common in makefiles, where the target only needs to be a file the recipe produces that later rules can depend on.  The .out file is created by the wrapper_*.sh script in the recipe, so honestly, using either the .log or the .out file is ok.

  • The prerequisites contain all the files we want HTCondor to copy to the execution host.  hera_calibration_packages.tar.gz is about 250MB and zen.2458098.44615.HH.autos.uvh5 is about 5GB.  Both are too big for HTCondor's file transfer mechanism; it works in testing, but we will need to either make them smaller or transfer them some other way (see http://chtc.cs.wisc.edu/file-avail-largedata).

  • I replaced the absolute path to wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh with ./ to cause makeflow to tell HTCondor to transfer this file to the scratch area on the execution host and execute it from that scratch area.

  • I removed the absolute path from zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log so it is written to the scratch area and HTCondor will copy it back from the execution host.  (A sketch of the resulting rule follows this list.)

  • Build the python-packages tarball (see python-packages).

    • If it is over 100MB, have it installed in CHTC's SQUID proxy.  Using xz instead of gz compression reduces the file from 240MB to 178MB but increases the time to uncompress it from 65s to 215s.
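
For reference, here is a minimal sketch of what the edited rule might look like after the changes above.  The target, prerequisite, and wrapper names come from the notes in this list; the exact recipe line (and the redirection of stdout/stderr into the .log file) is an assumption, not a copy of the real makeflow.

    # The wrapper script, the python packages tarball, and the raw data file are
    # all listed as inputs so makeflow tells HTCondor to transfer them to the
    # scratch area on the execution host; the .log target is written there and
    # copied back.
    zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log: ./wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh hera_calibration_packages.tar.gz zen.2458098.44615.HH.autos.uvh5
        ./wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh > zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log 2>&1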


Python Package Size

The size of the python packages file (hera_calibration_packages.tar.gz) is about 250MB.  According to http://chtc.cs.wisc.edu/file-avail-largedata, CHTC would like this to be under 100MB in order to use the HTCondor File Transfer mechanism.  So we either need to reduce the file size by removing packages and/or using better compression, or ask CHTC to add it to their SQUID web proxy.
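
As a rough illustration of the compression trade-off, the tarball could be built with either gzip (-z) or xz (-J).  The python-packages directory name here is an assumption about the layout; the sizes and times are the measurements quoted above.

    # gzip: tarball is about 240MB and uncompresses in roughly 65s
    tar -czf hera_calibration_packages.tar.gz python-packages

    # xz: tarball drops to about 178MB but uncompressing takes roughly 215s
    tar -cJf hera_calibration_packages.tar.xz python-packages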


raw_data Size

The sample uvh5 data files I have seen are about 5GB in size.  This is way too large for the HTCondor File Transfer mechanism according to http://chtc.cs.wisc.edu/file-avail-largedata.  Is there a way these files can be split into just what each individual job needs?  If not, they will have to live on the Large Data Staging filesystem, which will limit the pool of available execution hosts.
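
If the raw file does end up on Large Data Staging, a hedged sketch of how the rule might change is below: the 5GB file is dropped from the input list so HTCondor does not try to transfer it, and the recipe (or the wrapper itself) copies it from the staging filesystem on the execution host.  The /staging/<username> path is an assumption based on the CHTC guide above, not something we have set up.

    # The 5GB raw file is no longer a makeflow input; it is copied from
    # Large Data Staging on the execution host before the wrapper runs.
    zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log: ./wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh hera_calibration_packages.tar.gz
        cp /staging/<username>/zen.2458098.44615.HH.autos.uvh5 . && ./wrapper_zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.sh > zen.2458098.44615.HH.uvh5.EXTRACT_AUTOS.log 2>&1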