...
Pack Jobs: Put jobs on nodes efficiently such that as many nodes as possible are left idle and available for users with large memory and/or large core-count requirements.
- Slurm
- Slurm has a sched/backfill plugin that backfills jobs, similar to what Torque/Moab does.
- Add SchedulerType=sched/backfill to /etc/slurm/slurm.conf on the Management Node
- HTCondor
- Add NEGOTIATOR_DEPTH_FIRST = True to /etc/condor/config.d/99-nrao on the Central Manager
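To make the goal concrete, here is a toy model of why filling one node before moving to the next leaves more nodes idle than spreading jobs out. It is only an illustration: place_jobs and idle_nodes are made-up names, the node and job sizes are invented, and neither Slurm nor HTCondor is claimed to place jobs exactly this way.

```python
def place_jobs(job_cores, node_count, cores_per_node, pack=True):
    """Place jobs one at a time and return the free cores left on each node.

    pack=True  -> first-fit: use the first node with enough free cores.
    pack=False -> spread: use the node with the most free cores.
    Jobs that fit nowhere are skipped in this toy model.
    """
    free = [cores_per_node] * node_count
    for cores in job_cores:
        candidates = [i for i, f in enumerate(free) if f >= cores]
        if not candidates:
            continue
        if pack:
            target = candidates[0]
        else:
            target = max(candidates, key=lambda i: free[i])
        free[target] -= cores
    return free


def idle_nodes(free, cores_per_node):
    """Count nodes that have no cores in use."""
    return sum(1 for f in free if f == cores_per_node)


jobs = [4, 4, 4, 4]  # four hypothetical 4-core jobs
packed = place_jobs(jobs, node_count=4, cores_per_node=16, pack=True)
spread = place_jobs(jobs, node_count=4, cores_per_node=16, pack=False)
print(idle_nodes(packed, 16), idle_nodes(spread, 16))  # prints "3 0"
```

With packing, all four jobs land on the first node and three nodes stay free for a large-memory or large core-count job. With spreading, every node is partly used and no whole node is available.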
Reaper: Clean nodes of unwanted files, dirs and procs.
- HTCondor
- Seems to handle /tmp and /var/tmp properly because it gives each job its own private versions of these dirs.
- /dev/shm is not covered by this and is still an issue.
- What about errant processes?
- Slurm
- There is the pam_slurm_adopt.so PAM module, which supposedly tracks and kills errant processes, but it conflicts with systemd and therefore requires some special tweaking.
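For the /dev/shm problem, one possible approach is a small script that lists entries owned by users who have no job on the node. The sketch below only finds candidates and deletes nothing. stale_entries is a made-up name, and how to obtain the set of UIDs with running jobs from the scheduler is left open here, so active_uids is an assumed input.

```python
import os


def stale_entries(directory, active_uids):
    """Return sorted paths in `directory` whose owner is not in `active_uids`.

    `active_uids` is assumed to hold the UIDs that should be left alone,
    for example users with a running job on this node plus root (0).
    """
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                # Look at the entry itself, not at a symlink target.
                owner = entry.stat(follow_symlinks=False).st_uid
            except FileNotFoundError:
                continue  # entry vanished between listing and stat
            if owner not in active_uids:
                stale.append(entry.path)
    return sorted(stale)
```

A dry run could call stale_entries("/dev/shm", active_uids) and print the result for review before any removal is wired in. Only top-level entries are checked, so subdirectories would need their own handling.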
Reaper: Cancel jobs when accounts are closed.
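A rough sketch of how this could work: for each user known to have queued jobs, check whether the account still exists and, if not, build the commands to remove that user's jobs from both schedulers. Treating "closed" as "no passwd entry" is an assumption, as are the function names. scancel --user and condor_rm with a user name are the standard ways to remove all of one user's jobs, but both need to be run with administrator rights on the appropriate host.

```python
import pwd


def account_exists(username):
    """Return True if the name resolves to a passwd entry on this host."""
    try:
        pwd.getpwnam(username)
    except KeyError:
        return False
    return True


def cancel_commands(username):
    """Build (but do not run) the commands that remove a user's jobs."""
    return [
        ["scancel", f"--user={username}"],  # Slurm
        ["condor_rm", username],            # HTCondor
    ]


def commands_for_closed(usernames, exists=account_exists):
    """Return cancel commands for every user whose account is gone.

    `exists` can be swapped out, e.g. for an LDAP lookup (assumption).
    """
    commands = []
    for name in usernames:
        if not exists(name):
            commands.extend(cancel_commands(name))
    return commands
```

The commands are returned as argument lists so they can be logged and reviewed first, then passed to subprocess.run once the behavior is trusted.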
...