...
Node priority: With Torque/Moab we can control the order in which the scheduler picks nodes by altering the order of the nodes file. This lets us run jobs on the faster nodes by default.
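For reference, that just means listing the faster nodes first in the Torque nodes file, roughly like this (the path below is the usual default but may differ on our servers; np counts match the core counts of the nodes in the slurm.conf example further down):
# /var/spool/torque/server_priv/nodes -- faster node listed first so it gets picked first
herapost011 np=20
herapost001 np=16
herapost002 np=16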
- Slurm
- I don't know how to set priorities in Slurm the way we do in Torque, where batch jobs get the faster nodes and interactive jobs get the slower nodes. There is a Weight option on NodeName, where the node with the lowest weight is chosen first, but that affects the batch and interactive partitions equally; I need another axis. Actually, this might work at least for hera and hera-jupyter:
NodeName=herapost[001-007] Sockets=2 CoresPerSocket=8 RealMemory=193370 Weight=10
NodeName=herapost011 Sockets=2 CoresPerSocket=10 RealMemory=515790 Weight=1
PartitionName=batch Nodes=herapost[001-007] Default=YES MaxTime=144000 State=UP
PartitionName=hera-jupyter Nodes=ALL MaxTime=144000 State=UP
- The order in which the nodes are defined in slurm.conf has no bearing on which node the scheduler will choose, even though the man page for slurm.conf reads "the order the nodes appear in the configuration file".
- Perhaps I can use some sbatch option in nodescheduler to choose slower nodes first.
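One untested possibility: tag the slower nodes with a Feature in slurm.conf and have nodescheduler pass a matching constraint (newer Slurm releases also have --prefer for a soft preference). interactive.sh below is just a placeholder for whatever nodescheduler actually submits:
NodeName=herapost[001-007] Sockets=2 CoresPerSocket=8 RealMemory=193370 Weight=10 Feature=slow
sbatch --constraint=slow interactive.sh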
- Perhaps use Gres to set a resource like KLARNS for various nodes (Gold 6135, E-5 2400, etc.). The slower the node, the more KLARNS we will assign it. Then, if Slurm assigns jobs to nodes with the most KLARNS, we can use that to select the slowest nodes first. Hinky? You betcha.
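A rough sketch of how that resource could be defined in slurm.conf (the counts are arbitrary, each node would also need a matching gres.conf entry, and whether Slurm can actually be made to prefer the nodes with the most klarns is the open question):
GresTypes=klarn
NodeName=herapost[001-007] Sockets=2 CoresPerSocket=8 RealMemory=193370 Weight=10 Gres=klarn:100
NodeName=herapost011 Sockets=2 CoresPerSocket=10 RealMemory=515790 Weight=1 Gres=klarn:10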
- HTCondor
- There isn't a simple list like pbsnodes in Torque, but there is NEGOTIATOR_PRE_JOB_RANK, which can be used to weight nodes by CPU, memory, etc.
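A hedged sketch of that, assuming we want the negotiator to hand out the slower machines first (KFlops is a standard machine ClassAd attribute; the exact expression would need tuning):
# in the negotiator's configuration, e.g. a file under /etc/condor/config.d/
# rank low-benchmark machines highest so they are matched first
NEGOTIATOR_PRE_JOB_RANK = 0 - KFlops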
- OpenPBS
- Doesn't have a nodes file so I don't know what drives the order of the nodes chosen for jobs.
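One thing to check (untested): OpenPBS/PBS Pro has a node_sort_key scheduler option that can sort vnodes by a per-node priority attribute, which might be the equivalent knob:
# in PBS_HOME/sched_priv/sched_config
node_sort_key: "sort_priority HIGH"    ALL
# then set priorities so the preferred nodes sort first
qmgr -c 'set node herapost011 priority = 100'
qmgr -c 'set node herapost001 priority = 10'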
Node packing:
- Slurm
- Doesn't seem to pack jobs onto one node and then move to the next. The documentation mentions a "best fit algorithm" but never explains what that is.
- SchedulerParameters=pack_serial_at_end puts serial jobs (jobs with only one core) at the end of the node list. E.g. sbatch --cpus-per-task=2 tiny.sh will get put on testpost001 while sbatch --cpus-per-task=1 tiny.sh will get put on testpost004, so that isn't a good solution.
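For reference, tiny.sh in the tests above is just a trivial job script; the actual script isn't reproduced in these notes, but something like this is all it needs to be:
#!/bin/sh
# tiny.sh -- do nothing long enough to see where the job landed
sleep 60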
Reaper: Cancel jobs when accounts are closed.
This could be a cron job on the Central Manager that looks at the owners of all jobs and kills the jobs of any user who is no longer active.
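A minimal sketch of that cron job, assuming an HTCondor pool; the getent check here is just a stand-in for whatever the real "is this account still active" test turns out to be:
#!/bin/sh
# reaper: remove jobs belonging to users whose accounts are closed
for owner in $(condor_q -allusers -autoformat Owner | sort -u); do
    if ! getent passwd "$owner" > /dev/null; then
        condor_rm "$owner"
    fi
done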
...