...
- Transfer mechanism: The documentation implies that only files with an mtime newer than the time transfer_input_files finished are transferred back to the submit host. While running a DAG, however, the files in my working directory (which is listed in both transfer_input_files and transfer_output_files) always end up with an mtime around the most recent DAG step, suggesting that the entire working directory is copied from the execution host to the submit host at the end of each DAG step. Does the transfer mechanism only look at the mtime of the files/dirs named in transfer_output_files, without descending into the directories? (A sketch of the setup in question is after this list.)
- Flocking: When we flock to CHTC, what are the two endpoints of the data path for transfer_input_files? Is it our submit host and CHTC's execution host, or is CHTC's submit host involved?
- public_input_files: How is this different from transfer_input_files, and when would one want to use it instead of files or URLs listed in transfer_input_files? (A sketch of how we currently read the two is after this list.)
- How can we make the data path for transfer_input_files to our clients faster, given that we have multiple networks? Currently we assume it uses the 1 Gb Ethernet link, but we also have InfiniBand links. Should we upgrade to 10 Gb Ethernet? Is there a way for Condor to use the IB link just for transferring files, and is that selected by hostname? Other ideas? (One option we would like to discuss is sketched after this list.)
- Issues running single-node OpenMPI jobs: Are there known issues with working directories served via NFS or Lustre, with respect to tmpdir or otherwise, e.g. OpenMPI complaining about its tmpdir being on a network filesystem? (The workaround we are considering is sketched after this list.)
- What does DAGMan do when a node's process returns an error code (like 1)? Is there a way DAGMan can be told to ignore errors? (Two mechanisms we have found are sketched after this list.)
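
For the transfer-mechanism question, a minimal sketch of the submit setup being described, with hypothetical file names (run.sh, workdir); the same directory appears in both transfer lists:

```
# Hypothetical submit file fragment illustrating the setup above.
executable              = run.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# The same working directory is listed as both input and output; the
# question is whether mtimes are compared per file inside workdir or
# only for workdir itself before it is copied back.
transfer_input_files    = workdir
transfer_output_files   = workdir

queue
```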
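For the public_input_files question, a sketch of how we currently read the two submit commands (file names are hypothetical); part of the question is whether this reading is correct:

```
# Hypothetical submit file fragment.
# Ordinary per-job transfer from the submit host, or from a URL:
transfer_input_files = params.ini, http://example.org/dataset.tar.gz

# Inputs that are not sensitive and could presumably be served and
# cached publicly; when would it pay to move a file here instead?
public_input_files   = reference_data.tar.gz
```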
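For the multiple-networks question, one avenue we would like to ask about is the NETWORK_INTERFACE configuration knob; a sketch, assuming the hosts have IP-over-IB addresses (the address below is hypothetical) and assuming the knob steers all of a host's HTCondor traffic rather than file transfer alone:

```
# condor_config.local on a given submit or execute host.
# Bind HTCondor to this host's IP-over-IB address so shadow/starter
# traffic, including file transfer, crosses the IB fabric.
NETWORK_INTERFACE = 10.10.0.5
```

Part of the question is whether anything more targeted exists, i.e. routing only file transfers over IB while the rest of the daemon traffic stays on Ethernet.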
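For the OpenMPI tmpdir complaint, the workaround we are considering (assuming the warning concerns OpenMPI's session directory landing on the shared filesystem, and an OpenMPI release that uses ORTE) is to point it at node-local scratch through the job environment; the paths below are placeholders:

```
# Fragment of a hypothetical submit file for a single-node OpenMPI job.
# Put TMPDIR and OpenMPI's session-directory base on local disk instead
# of the NFS/Lustre-mounted working directory.
environment = "TMPDIR=/tmp OMPI_MCA_orte_tmpdir_base=/tmp"
```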
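For the DAGMan question, our current understanding (to be confirmed) is that a non-zero exit code marks the node as failed and its descendants are not submitted; a sketch of the two mechanisms we have found for tolerating failures, with hypothetical node and file names:

```
# Hypothetical fragment of a .dag file.
JOB  step1  step1.sub
JOB  step2  step2.sub
PARENT step1 CHILD step2

# Option 1: re-run a failing node a few times before declaring it failed.
RETRY step1 3

# Option 2: when a POST script is present, its exit status (not the
# job's) decides whether the node succeeded, so a script that always
# exits 0 effectively tells DAGMan to ignore the job's error code.
SCRIPT POST step2 /bin/true
```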
...