...

  • DONE: Pack Jobs: Place jobs on nodes so that as many nodes as possible are left idle and available for users with large memory and/or large core-count requirements.

    • Slurm
      • Add SchedulerType=sched/backfill to /etc/slurm/slurm.conf on the Management Node
    • HTCondor
      • Add NEGOTIATOR_DEPTH_FIRST = True to /etc/condor/config.d/99-nrao on the Central Manager
    • OpenPBS
      • Packs jobs by default.  To set this explicitly, use smp_cluster_dist: pack in the scheduler configuration (sched_config) under /var/spool/pbs/sched_priv on the central server.
      • qmgr -c 'set server backfill_depth = 10'
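
To make the goal of packing concrete, here is a minimal toy model in Python. It is not how Slurm, HTCondor, or OpenPBS actually place jobs; the function names, the node count, and the job sizes are all hypothetical. It only compares a "pack" policy (fill the fullest node that still fits) with a "spread" policy (use the emptiest node) and counts how many nodes stay completely idle.

```python
def place_jobs(jobs, num_nodes, cores_per_node, pack=True):
    """Toy placement: return the free core count of each node after placing jobs.

    jobs is a list of per-job core requests. This is an illustration only,
    not the algorithm used by any real scheduler.
    """
    free = [cores_per_node] * num_nodes
    for cores in jobs:
        # Nodes that still have room for this job.
        candidates = [i for i, f in enumerate(free) if f >= cores]
        if not candidates:
            continue  # in this toy model the job simply stays queued
        if pack:
            # Pack: choose the fullest node that still fits the job.
            target = min(candidates, key=lambda i: free[i])
        else:
            # Spread: choose the emptiest node.
            target = max(candidates, key=lambda i: free[i])
        free[target] -= cores
    return free


def idle_nodes(free, cores_per_node):
    """Count nodes with no cores in use."""
    return sum(1 for f in free if f == cores_per_node)


# Hypothetical workload: five small jobs on four 16-core nodes.
jobs = [4, 4, 8, 2, 6]
packed = place_jobs(jobs, num_nodes=4, cores_per_node=16, pack=True)
spread = place_jobs(jobs, num_nodes=4, cores_per_node=16, pack=False)

print(idle_nodes(packed, 16))  # 2
print(idle_nodes(spread, 16))  # 0
```

With packing, two whole nodes remain free for a large-memory or large core-count job; with spreading, every node is partially occupied.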


  • DONE: Reservations: The ability to reserve nodes far in the future, for things like CASA classes and SIW, would be very helpful.  It would need to prevent HTCondor from starting jobs on these nodes as the reservation time approaches.

    • Slurm
      • scontrol create reservation starttime=now duration=5 nodes=testpost001 user=root
      • scontrol create reservation starttime=2022-05-03T08:00:00 duration=21-0:0:0 nodes=nmpost[020-030] user=root reservationname=siw2022
      • scontrol show res (The output is hard to read.  Hopefully there is a better way to see all the reservations.)
    • HTCondor
      • There isn't a reservation feature in HTCondor.  Since CHTC makes use of preemption, their nodes can be removed at almost any time without adversely affecting running jobs.  Sadly NRAO cannot really use preemption.
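
Since the default scontrol show res output is hard to scan, one possible workaround is to ask scontrol for one record per line (its -o / --oneliner option) and reformat the result. The sketch below is an assumption-laden example, not a tested site tool: the helper names are hypothetical, the sample record is hand-written for illustration with only a few fields, and the parser assumes that no field value contains spaces.

```python
def parse_reservations(text):
    """Parse one-record-per-line 'Key=Value Key=Value ...' text into dicts."""
    reservations = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and plain messages
        record = {}
        for field in line.split():
            # Split on the first '=' only, so values may contain '='.
            key, sep, value = field.partition("=")
            if sep:
                record[key] = value
        reservations.append(record)
    return reservations


def format_table(reservations,
                 columns=("ReservationName", "StartTime", "EndTime", "Nodes", "Users")):
    """Render the chosen columns as a fixed-width text table."""
    rows = [tuple(columns)]
    rows += [tuple(r.get(c, "") for c in columns) for r in reservations]
    widths = [max(len(row[i]) for row in rows) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


# Hand-written sample record (illustrative, not captured from a real cluster).
SAMPLE = (
    "ReservationName=siw2022 StartTime=2022-05-03T08:00:00 "
    "EndTime=2022-05-24T08:00:00 Duration=21-00:00:00 "
    "Nodes=nmpost[020-030] Users=root State=INACTIVE\n"
)

reservations = parse_reservations(SAMPLE)
print(format_table(reservations))
```

On a real Management Node, the text could come from something like subprocess.run(["scontrol", "-o", "show", "res"], capture_output=True, text=True).stdout instead of SAMPLE.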

...