...

    • A good analogy is that Torque does a su - _username_, while HTCondor just does a su _username_.
    • WORKAROUND: setting getenv = True, which is like the -V option to qsub, may help. It doesn't source rc files, but the job does inherit your current environment. This may be a problem if your current environment is not what you want on the cluster node, for example if the cluster node runs a different OS or architecture.
    • ANSWER: HTCondor doesn't execute things with a shell. You could set your executable to /bin/bash and pass the executable you used to have as the arguments. I just changed our stuff to statically set $HOME, and I think that is good enough.
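
The three options above (inheriting the environment, wrapping the job in /bin/bash, and setting $HOME statically) could be sketched as submit description settings. The small Python helper below only renders the text of such a file. The function name, the script name run_job.sh, and the path /home/jdoe are hypothetical, and the sketch assumes the script is reachable on the execute node (shared filesystem or file transfer).

```python
def render_submit(executable, arguments="", inherit_env=False,
                  wrap_in_bash=False, home=None):
    """Return the text of a minimal HTCondor submit description.

    All argument values are illustrative; adapt them to the real job.
    """
    lines = []
    if wrap_in_bash:
        # HTCondor does not start jobs through a shell, so make bash the
        # executable and pass the original program as its argument.
        lines.append("executable = /bin/bash")
        lines.append(f"arguments = {executable} {arguments}".rstrip())
    else:
        lines.append(f"executable = {executable}")
        if arguments:
            lines.append(f"arguments = {arguments}")
    if inherit_env:
        # Similar to qsub -V: copy the submitter's current environment.
        lines.append("getenv = True")
    if home:
        # Statically set HOME for the job (value must not contain spaces here).
        lines.append(f'environment = "HOME={home}"')
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: run_job.sh wrapped in bash with a fixed HOME.
    print(render_submit("run_job.sh", wrap_in_bash=True, home="/home/jdoe"))
```

Whether getenv = True or a static HOME is the better choice depends on how closely the submit host's environment matches the execute node's.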

  • Flocking: Suppose I have two hosts in the same pool. testpost-master is a submit-host, and testpost-serv-1 is both a submit-host and the central-manager. testpost-serv-1 is configured to flock to CHTC but testpost-master is not. Is it possible to submit a job on testpost-master that will flock to CHTC by somehow leveraging testpost-serv-1? In other words, do I have to set up flocking and an external IP on every submit host?
    • ANSWER: there isn't a good way to do this.  So eventually we will need to make testpost-master flock to CHTC and possibly remove the ability of testpost-serv-1 to flock.

  • It seems the transfer mechanism won't transfer symlinks to directories (e.g. data/vlass.ms → /lustre/aoc/...). Is there a way around this?
    • ANSWER: there is no flag to chase symlinks at the moment. Making the top-level dir (e.g. data) itself a symlink may work if transfer_input_files=data/

  • DAG log time stamps: is there a way to differentiate data import/export time from process run time?
    • Look in the job log file, not the DAG log file.
    • 040 (150.000.000) 2020-06-15 13:05:45 Started transferring input files
              Transferring to host: <10.64.10.172:9618?addrs=10.64.10.172-9618&alias=nmpost072.aoc.nrao.edu&noUDP&sock=slot1_1_72656_7984_60>
      ...
      040 (150.000.000) 2020-06-15 13:06:04 Finished transferring input files
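
As a rough illustration of separating transfer time from run time, the sketch below pulls the two input-transfer events out of job log text and subtracts their time stamps. It assumes event header lines look like the excerpt above (event number, job ID in parentheses, date, time, message); the function name and regular expression are illustrative, not part of HTCondor.

```python
import re
from datetime import datetime

# Matches event header lines such as:
# 040 (150.000.000) 2020-06-15 13:05:45 Started transferring input files
EVENT_RE = re.compile(
    r"^\s*(\d{3}) \((\d+\.\d+\.\d+)\) "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$"
)


def input_transfer_seconds(log_text):
    """Return seconds spent transferring input files, or None if an
    event is missing. Assumes a single job in the log text."""
    started = finished = None
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if not match:
            continue  # detail lines such as "Transferring to host: ..."
        _event, _job_id, stamp, message = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if message.startswith("Started transferring input files"):
            started = when
        elif message.startswith("Finished transferring input files"):
            finished = when
    if started is None or finished is None:
        return None
    return (finished - started).total_seconds()


SAMPLE = """\
040 (150.000.000) 2020-06-15 13:05:45 Started transferring input files
040 (150.000.000) 2020-06-15 13:06:04 Finished transferring input files
"""

if __name__ == "__main__":
    print(input_transfer_seconds(SAMPLE))
```

For the two events in the excerpt, the difference is 19 seconds. The same approach could be extended to the output-transfer and execution events if those appear in the job log.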



  • Rank and Preemption: Can we use Rank to set "preferences" without requiring job preemption?

...