...
- Update on the software store for CASA: either on shared Ceph storage or in the admin software storage.
- Staging area for datasets from 100 MB to several TB. This is where we could try keeping the cfcache, assuming doing so doesn't overwhelm the filesystem.
  - /staging/nu_jrobnett
  - Submit file: Requirements = (Target.HasCHTCStaging == true)
  - Quota: 100GB, 100K files
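A minimal sketch of how a job could pull the cfcache out of the staging area. It assumes the job's own wrapper copies the file from /staging into its scratch directory (HTCondor is not asked to transfer it) and that the cache is kept as a single tarball, which also helps stay under the 100K-file quota. The archive name `cfcache.tgz` and the `STAGING_DIR` override are hypothetical.

```shell
#!/bin/bash
# Sketch of an execute-side helper for the /staging area.
# Assumptions (hypothetical): data are kept in /staging as .tgz archives,
# and STAGING_DIR may be overridden (e.g. for local testing).
STAGING_DIR="${STAGING_DIR:-/staging/nu_jrobnett}"

# fetch_from_staging NAME
# Copies NAME (e.g. cfcache.tgz) from the staging area into the current
# (scratch) directory, unpacks it there, and deletes the local copy of the
# archive so it is not sent back if output transfer picks up new files.
fetch_from_staging() {
    local name="$1"
    cp "$STAGING_DIR/$name" "./$name" || return 1
    tar -xzf "./$name" || return 1
    rm -f "./$name"
}
```

In the job's executable this would run as `fetch_from_staging cfcache.tgz` before CASA starts; the submit file still needs the Requirements line above so the job lands on a host that can see /staging.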
- Squid area for input files of 100 MB to 1 GB, or shared software. This is where we could keep casa.tgz and have the execution host retrieve it via HTTP.
  - /squid/nu_jrobnett
  - Only accessible at this path on the submit hosts; execution hosts must retrieve it via HTTP.
  - Submit file: transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/nu_jrobnett/casa.tgz
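Once casa.tgz has been fetched over HTTP into the job's scratch directory, the job still has to unpack it and put CASA on its PATH. A minimal sketch, assuming (hypothetically) that the archive contains a single top-level directory with CASA's `bin/` inside it; `setup_casa_from_tarball` is an illustrative name.

```shell
#!/bin/bash
# Sketch of an execute-side helper for a casa.tgz fetched from the Squid
# proxy via transfer_input_files.
# Assumption (hypothetical): the archive holds one top-level directory
# (e.g. the release name) that contains bin/.

# setup_casa_from_tarball [TARBALL]
# Unpacks TARBALL into ./casa, dropping the top-level directory name so the
# result does not depend on the release name, then puts ./casa/bin first
# on PATH for the rest of the job.
setup_casa_from_tarball() {
    local tarball="${1:-casa.tgz}"
    mkdir -p casa || return 1
    tar -xzf "$tarball" -C casa --strip-components=1 || return 1
    export PATH="$PWD/casa/bin:$PATH"
}
```

Stripping the top-level directory keeps the wrapper unchanged when a new CASA release is packed into casa.tgz.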
- Software area: we can use this for run-time applications. Think of it like /usr/local.
  - /software/nu_jrobnett/casa/casa-pipeline-release-5.6.1-8.el7
  - export PATH=/opt/local/bin:/software/nu_jrobnett/casa/casa-pipeline-release-5.6.1-8.el7/bin:${PATH}
  - Quota: 5GB, 100K files
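If jobs use the install in the software area directly, a small guard makes a job fail with a clear message when that path is not visible on the host it landed on, rather than failing later with "command not found". A sketch only: `use_casa_from_software` and the `CASA_HOME` override are hypothetical; the default path and PATH ordering are taken from the export line above.

```shell
#!/bin/bash
# Sketch: use the CASA install in the software area, failing fast if the
# path is not visible on this host. CASA_HOME is a hypothetical override;
# its default is the path from these notes.
CASA_HOME="${CASA_HOME:-/software/nu_jrobnett/casa/casa-pipeline-release-5.6.1-8.el7}"

use_casa_from_software() {
    if [ ! -d "$CASA_HOME/bin" ]; then
        echo "CASA not found at $CASA_HOME/bin" >&2
        return 1
    fi
    # Same ordering as the export line in the notes.
    export PATH="/opt/local/bin:$CASA_HOME/bin:$PATH"
}
```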
- Transfer mechanism: I asked whether the transfer mechanism works like rsync, and the answer was "sort of." They claim that only files with an mtime newer than the time input transfer (transfer_input_files) finished are transferred back to the submit host. I am not sure I agree. While running a DAG, the files in my working directory (which is listed in both transfer_input_files and transfer_output_files) always seem to have an mtime close to the most recent DAG step, which suggests the entire working directory is copied from the execution host back to the submit host at the end of each step. Perhaps the transfer mechanism only checks the mtime of the files/directories named in transfer_output_files and does not descend into directories.
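One way to test this is to timestamp the start of a DAG step on the submit host and, when the step finishes, list which files in the working directory now have a newer mtime. If files the job never modified appear in the list, the whole directory was copied back. A minimal sketch using only `touch` and `find -newer`; running it from DAGMan PRE/POST scripts and the marker-file location are assumptions. It relies on mtimes, so it would miss a copy that preserved the original timestamps.

```shell
#!/bin/bash
# Sketch of a submit-host check for which files a DAG step rewrote.
# Keep MARKER outside the working directory so it is not transferred itself.

# mark_step_start MARKER: record "now" as the mtime of MARKER
# (e.g. from the step's PRE script).
mark_step_start() {
    touch "$1"
}

# files_rewritten_since MARKER DIR: list regular files under DIR, recursing
# into subdirectories, whose mtime is strictly newer than MARKER's
# (e.g. from the step's POST script).
files_rewritten_since() {
    find "$2" -type f -newer "$1" | sort
}
```

Comparing the output against the files the step actually wrote shows whether transfer back is per-file or whole-directory.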
- Flocking: When we flock to CHTC, between which two points does transfer_input_files copy files? Is it from our submit host to CHTC's execution host?
...