...

  • Reaper: Clean nodes of unwanted files, directories, and processes. I don't think HTCondor will need this.

    • Slurm

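      • The pam_slurm_adopt.so PAM module supposedly tracks and kills errant processes, but it conflicts with systemd and therefore requires some special tweaking (pam_systemd can move adopted processes out of the job's cgroup, so it usually has to be disabled in the sshd PAM stack).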

    • HTCondor
      • Seems to handle /tmp, /var/tmp, and /dev/shm properly because it gives each job its own private version of these directories.
      • It seems to handle errant processes as well.
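A node-level reaper (or a Slurm epilog) mostly comes down to one decision: which processes belong to users who no longer have a job on the node. The following is a minimal sketch of that logic, not any scheduler's built-in mechanism. It assumes UIDs at or below a hypothetical `SYSTEM_UID_MAX` of 999 are system accounts that must never be touched, and that the set of UIDs with jobs on the node is supplied by the site; `Proc`, `read_procs`, and `find_errant_pids` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List, Set

# Hypothetical cutoff: UIDs at or below this are treated as system
# accounts and never reaped (many distributions start users at 1000).
SYSTEM_UID_MAX = 999


@dataclass(frozen=True)
class Proc:
    pid: int
    uid: int
    name: str


def read_procs(proc_root: str = "/proc") -> List[Proc]:
    """Collect (pid, real uid, name) for every process visible in /proc."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
            real_uid = int(fields["Uid"].split()[0])
            procs.append(Proc(int(entry), real_uid, fields["Name"].strip()))
        except (OSError, KeyError, ValueError):
            continue  # process exited or its status is unreadable
    return procs


def find_errant_pids(procs: Iterable[Proc], job_uids: Set[int]) -> List[int]:
    """Return PIDs owned by regular users who have no job on this node."""
    return sorted(
        p.pid
        for p in procs
        if p.uid > SYSTEM_UID_MAX and p.uid not in job_uids
    )


if __name__ == "__main__":
    sample = [
        Proc(1, 0, "systemd"),            # system account: ignored
        Proc(4242, 1001, "train.py"),     # user 1001 still has a job here
        Proc(5151, 1002, "old_solver"),   # user 1002 has no job: errant
    ]
    print(find_errant_pids(sample, job_uids={1001}))  # [5151]
```

The sketch only reports PIDs. Actually signalling them (for example `os.kill` with SIGTERM, then SIGKILL after a grace period) and deciding how to obtain the job owners from Slurm or HTCondor are left to the site.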

...

  • Reaper: Cancel jobs when accounts are closed. This could be a cron job on the Central Manager that lists the owners of all queued jobs and removes the jobs of any user whose account is no longer active.
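The cron job described above could be sketched as follows. This is an illustration under stated assumptions, not a tested site script: it treats an account as "active" if it still resolves through NSS (`pwd.getpwnam`), which a site with a separate "closed" flag would replace, and it queries only the local schedd with `condor_q -allusers -af Owner`. With several submit nodes, the Central Manager would need `condor_q -global` and `condor_rm -name <schedd>` instead. `owners_to_remove`, `account_exists`, and `queued_owners` are hypothetical names, and `main` defaults to a dry run.

```python
import pwd
import shutil
import subprocess
from typing import Callable, Iterable, List


def owners_to_remove(owners: Iterable[str], is_active: Callable[[str], bool]) -> List[str]:
    """Return the distinct job owners whose accounts are no longer active."""
    return sorted({o for o in owners if o and not is_active(o)})


def account_exists(user: str) -> bool:
    # Assumption: an account is "active" if it still resolves via NSS
    # (local files, LDAP, ...). Replace with the site's own check.
    try:
        pwd.getpwnam(user)
        return True
    except KeyError:
        return False


def queued_owners() -> List[str]:
    # One Owner value per job in the local schedd's queue.
    result = subprocess.run(
        ["condor_q", "-allusers", "-af", "Owner"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def main(dry_run: bool = True) -> None:
    for user in owners_to_remove(queued_owners(), account_exists):
        cmd = ["condor_rm", user]  # removes every job owned by this user
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    if shutil.which("condor_q") is None:
        print("condor_q not found; nothing to do")
    else:
        main(dry_run=True)
```

Keeping the decision in a pure function (`owners_to_remove`) makes the policy easy to test separately from the HTCondor commands, and the dry-run default lets the cron output be reviewed before anything is removed.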


  • Node priority: With Torque/Moab we can control the order in which the scheduler picks nodes, which lets us run jobs on the faster nodes by default.

    • Slurm
      • The order of the nodes listed in PartitionName does not matter, but you can assign a Weight to each NodeName. Nodes with the lowest weight are chosen first.
    • HTCondor
      • There isn't a simple ordered node list like pbsnodes in Torque, but NEGOTIATOR_PRE_JOB_RANK can be used to weight machines by CPU, memory, etc. Machines with higher values are preferred.
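To keep something like Torque's ordered node list, one option is to maintain a single fastest-first list and derive both schedulers' settings from it. The sketch below assumes that list is kept by hand. The node names, the Weight step of 10, and matching on the machine ClassAd's `Machine` attribute (the host's full name) are illustrative choices, and real `NodeName` lines also need CPUs, RealMemory, and so on, which are omitted. `slurm_node_lines` and `condor_pre_job_rank` are hypothetical helpers.

```python
from typing import List, Sequence


def slurm_node_lines(nodes_fastest_first: Sequence[str], step: int = 10) -> List[str]:
    """Emit slurm.conf NodeName lines; Slurm allocates lowest Weight first."""
    # Fastest node gets the smallest weight. A step > 1 leaves room to
    # slot new nodes in later without renumbering everything.
    return [
        f"NodeName={name} Weight={(i + 1) * step}"
        for i, name in enumerate(nodes_fastest_first)
    ]


def condor_pre_job_rank(hosts_fastest_first: Sequence[str]) -> str:
    """Build a NEGOTIATOR_PRE_JOB_RANK line; higher rank is preferred."""
    n = len(hosts_fastest_first)
    # Each term is 1 for the matching host and 0 otherwise, so the
    # fastest host scores n and the slowest scores 1.
    terms = [
        f'{n - i} * (Machine =?= "{host}")'
        for i, host in enumerate(hosts_fastest_first)
    ]
    return "NEGOTIATOR_PRE_JOB_RANK = " + (" + ".join(terms) if terms else "0")


if __name__ == "__main__":
    order = ["fast01", "fast02", "slow01"]
    print("\n".join(slurm_node_lines(order)))
    print(condor_pre_job_rank([h + ".example.org" for h in order]))
```

Because NEGOTIATOR_PRE_JOB_RANK is evaluated before the job's own Rank expression, it overrides user preferences. To weight by hardware rather than by host name, the expression could reference machine attributes such as Cpus or Memory instead.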

...