...

  • What limits are there to transfer_input_files?  I would sometimes get "Failed to transfer files" when the number of files was around 10,000.
    • ANSWER: There is a memory limit because of ClassAds, but in general there isn't a defined limit.
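    • A submit-file sketch of one workaround (hypothetical names, not from the answer above): transferring a directory keeps transfer_input_files to a single entry instead of thousands, which keeps the job ClassAd small.
      # many.sub -- sketch; "inputs" (no trailing slash) transfers the whole directory,
      # while "inputs/" would transfer only its contents into the job's scratch dir
      universe                = vanilla
      executable              = run.sh
      should_transfer_files   = YES
      when_to_transfer_output = ON_EXIT
      transfer_input_files    = inputs
      queue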
  • Is there a way to generate the dag.dot file without having to submit the job?
    • The -no_submit option doesn't create the .dot file.
    • Is adding NOOP to all the JOB commands the right thing to do?  The DAG still gets submitted, but then it quickly ends.
    • ANSWER: You need to submit the DAG; NOOP is the current solution.
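    • A minimal sketch (hypothetical node and file names): with every node marked NOOP, the submitted DAG finishes almost immediately, and DAGMan still writes the .dot file requested by the DOT command.
      # graph.dag
      DOT dag.dot          # ask DAGMan to write a dot description of the graph
      JOB A a.sub NOOP     # NOOP: the node completes without submitting a.sub
      JOB B b.sub NOOP
      JOB C c.sub NOOP
      PARENT A CHILD B C
    • Then condor_submit_dag graph.dag produces dag.dot without running any real work.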
  • Is there a way to start a DAG at a given point?  E.g., if there are 5 steps in the DAG, can you start the job at step 3?
    • Is the answer again to add NOOP to the JOB commands you don't want to run?
    • ANSWER: Splices may work here, but NOOP is certainly a tool to use.
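    • A sketch of the NOOP approach (hypothetical names): mark the steps you want to skip as NOOP, and the DAG effectively starts at step 3 while the dependency chain stays intact.
      # pipeline.dag
      JOB step1 step1.sub NOOP   # completes immediately; nothing submitted
      JOB step2 step2.sub NOOP   # completes immediately; nothing submitted
      JOB step3 step3.sub        # real work starts here
      JOB step4 step4.sub
      JOB step5 step5.sub
      PARENT step1 CHILD step2
      PARENT step2 CHILD step3
      PARENT step3 CHILD step4
      PARENT step4 CHILD step5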
  • I see that CHTC jobs now require a request_disk setting.  How does one launch interactive jobs?
    • ANSWER: This is a bug.
  • For our initial tests, we want to flock jobs to CHTC that transfer about 60GB of input and output.  Eventually we will reduce this significantly, but for now what can we do?
    • Globus or rsync to move data?  If Globus, how do we do so in an automated way (e.g., with no password)?
    • ANSWER: Using the transfer mechanism from our submit server to CHTC's execution host is OK with CHTC because it doesn't interfere with their submit host.  Outbound transfers from their execute hosts are also allowed (scp, rsync, etc.).
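    • A submit-file sketch of that approach (hypothetical names): let HTCondor's own transfer mechanism move the data between our submit server and the execute host.
      # flock.sub
      universe                = vanilla
      executable              = run.sh
      should_transfer_files   = YES
      when_to_transfer_output = ON_EXIT
      transfer_input_files    = big_input.tar.gz    # ~60GB archive (hypothetical)
      transfer_output_files   = results.tar.gz
      request_disk            = 120GB               # scratch room for input + output
      queue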
  • Use the +LongJob = true attribute for 72+ hour jobs.  This is a CHTC-specific knob.
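    • In a submit file this is one extra line (sketch, hypothetical names):
      # long.sub
      universe   = vanilla
      executable = run.sh
      +LongJob   = true    # CHTC-specific; only for jobs expected to exceed 72 hours
      queue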

...

  • How can I find out what hosts are available for given requirements (LongJob, memory, staging)?
    • condor_status -compact -constraint "HasChtcStaging==true" -constraint 'DetectedMemory>500000' -constraint "CanRunLongJobs isnt Undefined"
    • ANSWER: Yes, this is correct, but it doesn't show what other jobs are waiting on the same resources, which is fine.
  • It looks to me like most hosts at CHTC are set up to run long jobs.  The following shows a small list of about 20 hosts, so I assume all others can run long jobs.  Is that correct?
    • condor_status -compact -constraint "CanRunLongJobs is Undefined"
    • LongJob is for jobs on the order of 72+ hours, so it might be best not to set it unless we really need it (e.g., for step23).
    • ANSWER: Yes, that is correct.

...

  • Given a DAG where some steps are to always run at AOC and some are to always run at CHTC, how can we dictate this?  Right now local jobs flock to CHTC if local resources are full.  Can we make local jobs idle instead of flocking?
    • ANSWER: Use PoolNames.  I need to make a testpost PoolName.
    • PoolName = "TESTPOST"
    • STARTD_ATTRS = $(STARTD_ATTRS) PoolName
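    • On the submit side, a sketch of pinning a job to that pool (assumes the TESTPOST name above; =?= evaluates to false rather than undefined on machines that don't advertise PoolName, such as flocked CHTC slots, so the job idles locally instead of flocking):
    • Requirements = (PoolName =?= "TESTPOST")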


  • It seems that when using DAGs, the recommended method is to define variables in the DAG file instead of in the submit files.  This makes sense, as it means only one file, the DAG file, needs to be edited to make changes.  But is there a way to get variables into the DAG file from the command line, the environment, an include file, or something similar?
    • ANSWER: There is an INCLUDE syntax, but there is no command-line or environment-variable way to get variables into a DAG.
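    • A sketch of the INCLUDE route (hypothetical names): keep the changeable values in a small file that can be regenerated or hand-edited, and pull it into the DAG.
      # main.dag
      JOB step1 step1.sub
      INCLUDE vars.dag           # parsed as if its lines appeared right here

      # vars.dag (hand-edited or regenerated before each submission)
      VARS step1 inputdir="/data/run42"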
  • We are starting to run tens of jobs at CHTC requiring 40GB of memory as part of a local DAG.  Are there any options we can set to improve their chances of executing?  What memory footprint (32, 20, 16, or 8GB) would significantly improve their chances?
    • ANSWER: Only use +LongJob if the job needs more than 72 hours, which is the default "walltime".
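    • A sketch of the relevant submit knobs (hypothetical values; the general matchmaking point is that smaller request_memory values can match more machines):
      # bigmem.sub
      universe       = vanilla
      executable     = run.sh
      request_memory = 20GB    # request only what the job actually needs
      # +LongJob = true        # leave unset unless runtime will exceed 72 hours
      queue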

...