...

07/19/21 11:45:01 The DaemonShutdown expression "State == "Unclaimed" && Activity == "Idle" && (MyCurrentTime - EnteredCurrentActivity) > 600" evaluated to TRUE: starting graceful shutdown
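
As a rough illustration of what the logged expression checks, here is a minimal Python sketch of the same condition. The function name and the `idle_limit` parameter are hypothetical, and this is not HTCondor's ClassAd evaluator: for instance, ClassAd `==` compares strings case-insensitively, while Python's `==` does not.

```python
def should_shut_down(state: str, activity: str, my_current_time: int,
                     entered_current_activity: int,
                     idle_limit: int = 600) -> bool:
    """Approximate the logged DaemonShutdown expression.

    True only when the slot is Unclaimed, Idle, and has been in its
    current activity for more than idle_limit seconds (600 in the log).
    """
    return (
        state == "Unclaimed"
        and activity == "Idle"
        and (my_current_time - entered_current_activity) > idle_limit
    )
```

With the 600 second limit from the log line, a slot that has been idle for exactly 600 seconds does not yet trigger shutdown, because the comparison is strictly greater-than.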


testpost-cm-vml krowe >condor_status
Name                              OpSys  Arch    State      Activity  LoadAv  Mem

slot1@testpost001.aoc.nrao.edu    LINUX  X86_64  Unclaimed  Idle      0.000   193
slot1_1@testpost001.aoc.nrao.edu  LINUX  X86_64  Claimed    Busy      0.000
slot1@testpost002.aoc.nrao.edu    LINUX  X86_64  Unclaimed  Idle      0.000   144
slot1_1@testpost002.aoc.nrao.edu  LINUX  X86_64  Claimed    Busy      0.810   49
slot1@testpost003.aoc.nrao.edu    LINUX  X86_64  Unclaimed  Idle      0.000   193

HTCondor and Slurm

NRAO has effectively two use cases:  1) Operations-triggered jobs.  These are well-formulated pipeline jobs, though still fairly monolithic and long-running (many hours to a few days).  2) User-triggered jobs, which are of course not well formulated.  We will be moving the operations jobs to HTCondor, and we plan to move the user-triggered jobs from Torque to Slurm.  There is enough noise in the two job loads that we don't want strict host carve-outs for type 1 and type 2 jobs.  Instead, we anticipate having a set of nodes known only to HTCondor for the bulk of operations and a set of hosts controlled by Slurm for the user-facing jobs.  Periodically, when operations has a large set of jobs, we'd like those jobs to burst onto the Slurm-controlled nodes.  We neither anticipate nor want the Slurm jobs to burst onto the HTCondor set of nodes.
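
The one-way bursting rule described above can be sketched as a small decision function. This is only an illustration of the intended policy, not a configuration for either scheduler. The names `JobKind`, `allowed_pools`, `idle_ops_jobs`, and the `burst_threshold` default are all assumptions made up for the example, since the text does not say how large a "large set" of operations jobs is.

```python
from enum import Enum


class JobKind(Enum):
    OPERATIONS = "operations"  # type 1: pipeline jobs, submitted to HTCondor
    USER = "user"              # type 2: user-triggered jobs, submitted to Slurm


def allowed_pools(kind: JobKind, idle_ops_jobs: int,
                  burst_threshold: int = 100) -> list[str]:
    """Return the node pools a job of the given kind may run on.

    burst_threshold is a hypothetical cutoff for a "large" operations
    backlog; the real trigger is not specified in these notes.
    """
    if kind is JobKind.USER:
        # Slurm jobs never burst onto the HTCondor-only nodes.
        return ["slurm"]
    pools = ["htcondor"]
    if idle_ops_jobs > burst_threshold:
        # Large operations backlog: also allow the Slurm-controlled nodes.
        pools.append("slurm")
    return pools
```

For example, `allowed_pools(JobKind.OPERATIONS, idle_ops_jobs=500)` includes both pools, while `allowed_pools(JobKind.USER, idle_ops_jobs=500)` stays on the Slurm nodes regardless of the operations backlog.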

...