...

  • Perhaps use Requirements expressions. Greg will send an example.
    • SOLUTION:
    • DAG:
      • JOB step05 step05.htc
      • #VARS step05 SITE="chtc"
      • #VARS step05 SITE="aws"
    • Submit:
      • +NRAOAttr = "$(SITE)"
      • Requirements = My.NRAOAttr == "chtc" ? PoolName == "CHTC" : PoolName =!= "CHTC"
      • Requirements = My.NRAOAttr == "chtc" ? (Target.HasCHTCStaging == true) : (Target.HasCHTCStaging =!= true)

      • myannex = "krowe-annex"
      • +MayUseAWS = True
      • Requirements = My.NRAOAttr == "aws" ? AnnexName == $(myannex) : AnnexName =!= $(myannex)

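    • I would like to set myannex in the DAG instead, but when I do, the job looks for an AnnexName of "krowe - annex" (note the spaces). DAGMan passes VARS values without their surrounding quotes, so the unquoted krowe-annex is parsed as the ClassAd expression krowe - annex (a subtraction). Defining the macro without quotes and quoting it at the point of use (AnnexName == "$(myannex)", as +NRAOAttr already does with "$(SITE)") should avoid this.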
  • Is there a recommended way to start annexes from a DAG? We have been using PRE scripts, but they sometimes seem to fail.
    • CHTC is working on a BEGIN syntax (provision) that will block a DAG node from starting until the annex is ready.
    • We could have the PRE script not return until the annex is ready.
    • We could also have the job require a specific annex name that create_annex creates.
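The ternary Requirements expressions in the SOLUTION above steer each DAG node by its SITE value. The Python sketch below is only a model of how they evaluate against a slot, assuming standard ClassAd semantics: == compares strings case-insensitively and yields UNDEFINED (modeled here as None) when the slot lacks the attribute, =!= is case-sensitive and never UNDEFINED, and a Requirements result that is not exactly True does not match. The function names are hypothetical; PoolName and AnnexName are the slot attributes used above.

```python
from typing import Optional

# ClassAd UNDEFINED is modeled as None throughout this sketch.


def classad_eq(a: Optional[str], b: str) -> Optional[bool]:
    """Model of ClassAd '==' on strings: case-insensitive, UNDEFINED if a is undefined."""
    if a is None:
        return None
    return a.lower() == b.lower()


def classad_is_not(a: Optional[str], b: str) -> bool:
    """Model of ClassAd '=!=': case-sensitive 'is not identical'; never UNDEFINED."""
    return a != b  # an undefined attribute is never identical to a string


def pool_requirement(nrao_attr: str, pool_name: Optional[str]) -> bool:
    # My.NRAOAttr == "chtc" ? PoolName == "CHTC" : PoolName =!= "CHTC"
    if classad_eq(nrao_attr, "chtc"):
        result = classad_eq(pool_name, "CHTC")
    else:
        result = classad_is_not(pool_name, "CHTC")
    # Requirements match only when the result is exactly True; UNDEFINED does not match.
    return result is True


def annex_requirement(nrao_attr: str, annex_name: Optional[str], my_annex: str) -> bool:
    # My.NRAOAttr == "aws" ? AnnexName == $(myannex) : AnnexName =!= $(myannex)
    if classad_eq(nrao_attr, "aws"):
        result = classad_eq(annex_name, my_annex)
    else:
        result = classad_is_not(annex_name, my_annex)
    return result is True
```

For example, a node with SITE="aws" matches a slot that has no PoolName at all, because =!= treats a missing attribute as simply not identical; a node with SITE="chtc" does not match that slot, because == yields UNDEFINED there.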
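Having the PRE script block until the annex is ready could look roughly like the Python sketch below. It only shows the polling pattern: annex_is_ready is a hypothetical placeholder for whatever readiness check fits (for example, looking for slots that advertise the expected AnnexName), and the 30-minute timeout and 60-second interval are assumptions. Returning a nonzero exit status on timeout makes DAGMan treat the PRE script, and therefore the node, as failed instead of submitting a job that can never match.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout_s: float, interval_s: float,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() until it returns True or timeout_s elapses.

    Returns True if the check succeeded, False on timeout. sleep and clock are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def annex_is_ready() -> bool:
    # HYPOTHETICAL placeholder: replace with a real readiness check for the annex.
    raise NotImplementedError("site-specific annex readiness check")


def main(check: Callable[[], bool] = annex_is_ready) -> int:
    # Assumed limits: wait up to 30 minutes, polling once a minute.
    ready = wait_for(check, timeout_s=30 * 60, interval_s=60)
    # Exit status for DAGMan: 0 lets the node proceed, nonzero fails the PRE script.
    return 0 if ready else 1
```

As a PRE script, the file would end with sys.exit(main()) once annex_is_ready has a real implementation.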

Answered Questions:

  • JOB ID question from Daniel
    • When I submit a job, I get a job ID back. My plan is to hold onto that job ID permanently for tracking. We have had issues in the past with Torque/Maui because the job IDs got recycled later and our internal bookkeeping got mixed up. So my questions are:

       - Are job IDs guaranteed to be unique in HTCondor?
       - How unique are they—are they _globally_ unique or just unique within a particular namespace (such as our cluster or the submit node)?

    • A job ID has the form ClusterId.ProcId.
    • It is unique only within a single schedd.
    • To make it globally unique, combine it with the DNS name of the schedd and the ctime of the schedd's job_queue.log file.
    • We should talk with Daniel about this. They should craft their own ID; it could be seeded with a job ID but should not depend on the job ID alone.
  • Upgrading HTCondor without killing jobs?
    • The schedd can be upgraded and restarted without losing state, provided the restart completes within the timeout.
    • Currently, restarting the execute services kills running jobs. CHTC is working on improving this.
    • The negotiator and collector can be restarted without killing jobs.
    • CHTC works hard to keep releases within a series compatible (8.8.x with 8.8.y, and 8.9.x with 8.9.y).
  • Leaving data on execution host between jobs (data reuse)
    • Todd is working on this now.
  • Ask about installing CASA locally, along with its ancillary data (cfcache)
    • CHTC has a Ceph filesystem that is available to many of their execution hosts (notably the larger ones).
    • There is also a software filesystem where CASA could live. It is used mostly for administrative purposes but might be made available to us.
    • We could download the tarball over HTTP for each job. CHTC uses a proxy server, so the tarball would often be served from its cache.
  • Environment: Is there a way to have condor "log in" when a job starts, thus sourcing /etc/profile and the user's rc files? Currently, not even $HOME is set.
    • A good analogy: Torque does a su - _username_, while HTCondor just does a su _username_.
    • WORKAROUND: Setting getenv = True, which is like the -V option to qsub, may help. It doesn't source rc files, but it does inherit your current environment. This may be a problem if your current environment is not what you want on the cluster node, for example if that node runs a different OS or architecture.
    • ANSWER: HTCondor doesn't run the executable through a shell. You could set your executable to /bin/bash and pass the executable you used to have as its arguments. I just changed our stuff to statically set $HOME, and I think that is good enough.
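The Job ID answers above suggest building a tracking ID that does not rely on ClusterId.ProcId alone. A minimal Python sketch, assuming the ingredients named in the notes (schedd DNS name, ctime of the schedd's job_queue.log, and the job ID); the host#epoch#cluster.proc format and all function names are hypothetical, and reading the ctime requires access to the schedd's spool directory, whose exact path is site-specific.

```python
import os
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrackingId:
    schedd_host: str  # DNS name of the schedd
    queue_epoch: int  # ctime (seconds) of the schedd's job_queue.log
    cluster: int      # ClusterId
    proc: int         # ProcId

    def __str__(self) -> str:
        # Hypothetical serialization: host#epoch#cluster.proc
        return f"{self.schedd_host}#{self.queue_epoch}#{self.cluster}.{self.proc}"


def parse_job_id(job_id: str) -> Tuple[int, int]:
    """Split 'ClusterId.ProcId' (e.g. '1234.0'); raises ValueError if malformed."""
    cluster, proc = job_id.split(".")
    return int(cluster), int(proc)


def queue_log_epoch(path: str) -> int:
    """ctime of the job queue log; the path is site-specific (normally under SPOOL)."""
    return int(os.stat(path).st_ctime)


def make_tracking_id(schedd_host: str, queue_epoch: int, job_id: str) -> TrackingId:
    cluster, proc = parse_job_id(job_id)
    return TrackingId(schedd_host, queue_epoch, cluster, proc)
```

Two schedds that both hand out job 7.0 still produce different tracking IDs, and a recreated queue log changes the epoch component even if cluster numbers repeat on the same host.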
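Downloading the CASA tarball at job start, as suggested in the CASA item above, could be sketched in Python as below. The URL and destination directory are placeholders, not CHTC-provided values. Python's default urllib opener reads http_proxy/https_proxy from the environment, so the download would go through a caching proxy if the execution host sets those variables; whether CHTC does so is an assumption to confirm.

```python
import os
import shutil
import tarfile
import urllib.parse
import urllib.request


def fetch_and_unpack(url: str, dest_dir: str) -> str:
    """Download the tarball at url into dest_dir and extract it there.

    urllib's default opener honors http_proxy/https_proxy from the environment,
    so a site caching proxy is used automatically when those are set.
    Returns the local path of the downloaded tarball.
    """
    os.makedirs(dest_dir, exist_ok=True)
    name = os.path.basename(urllib.parse.urlparse(url).path) or "download.tar.gz"
    tar_path = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url) as resp, open(tar_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    # Mode "r" auto-detects gzip/bzip2/xz compression.
    with tarfile.open(tar_path) as tf:
        if hasattr(tarfile, "data_filter"):
            tf.extractall(dest_dir, filter="data")  # reject unsafe member paths
        else:
            tf.extractall(dest_dir)
    return tar_path
```

A job wrapper would call this once with the (hypothetical) CASA tarball URL and its scratch directory before starting CASA.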
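Running /bin/bash as the executable with the real program as its arguments, as the environment ANSWER above suggests, needs careful quoting in the submit file. The Python sketch below generates those submit lines. It assumes bash -l -c as the way to get a login-style environment and uses the new-style (double-quoted) arguments and environment syntax, in which "" is a literal double quote, single quotes group text containing spaces, and '' is a literal single quote inside them. The program name, its arguments, and any HOME value are hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def condor_quote_arg(arg: str) -> str:
    """Wrap one value in single quotes for HTCondor's new-style quoted syntax.

    Literal double quotes are doubled ("") and literal single quotes are doubled ('').
    """
    return "'" + arg.replace('"', '""').replace("'", "''") + "'"


def bash_login_submit_lines(command: List[str], home: Optional[str] = None) -> Dict[str, str]:
    """Submit commands that run command under a bash login shell on the execute node."""
    script = shlex.join(command)  # POSIX-shell-safe command string for bash -c
    lines = {
        "executable": "/bin/bash",
        "transfer_executable": "False",  # use the execute node's own bash
        "arguments": '"-l -c ' + condor_quote_arg(script) + '"',
    }
    if home is not None:
        # Statically set $HOME, since HTCondor does not set it by default.
        lines["environment"] = '"HOME=' + condor_quote_arg(home) + '"'
    return lines


def render(lines: Dict[str, str]) -> str:
    """Format the commands as 'key = value' lines for a submit description file."""
    return "\n".join(f"{key} = {value}" for key, value in lines.items())
```

If the real program lives in the job's sandbox (for example ./myprog), it still has to be listed in transfer_input_files, because the executable HTCondor transfers is now bash.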

...

  • How can we have the .dag.* files written to a different directory? -usedagdir doesn't help.
    • ANSWER: There isn't a way to tell condor_submit_dag where to put the logs.
  • Is there a for-loop structure or a range mechanism available in DAG files?
    • No.
  • If 8.9.9 requires Globus from EPEL, it may have trouble being installed on a Globus endpoint, because the EPEL version of Globus conflicts with the Globus.org version.
    • I told them about it. I have not tried installing HTCondor 8.9.9, so I am only guessing that it will be a problem.
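Since DAG files have no for-loop or range construct, one workaround is to generate the repetitive JOB and VARS lines with a small script and submit the result with condor_submit_dag. A minimal Python sketch, assuming one shared submit file and one per-node macro; the node-name pattern, the submit file name, and the macro name idx are hypothetical.

```python
from typing import List


def dag_lines(prefix: str, submit_file: str, count: int, start: int = 0) -> List[str]:
    """Emit one JOB line and one VARS line per node, e.g. 'JOB step05 step.htc'."""
    lines = []
    for i in range(start, start + count):
        node = f"{prefix}{i:02d}"
        lines.append(f"JOB {node} {submit_file}")
        # The value reaches the submit file as the macro $(idx).
        lines.append(f'VARS {node} idx="{i}"')
    return lines


def write_dag(path: str, lines: List[str]) -> None:
    """Write the generated lines to a DAG input file."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

The shared submit file can then refer to $(idx), for example to pick an input file, while the generator stays the single place where the range is defined.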


