
  • MPI: We have some users who run MPI across multiple nodes. It would be nice to keep that as an option.

    • Slurm
      • mpich2
        • PATH=${PATH}:/usr/lib64/mpich/bin salloc --ntasks=8 mpiexec mpiexec.sh
        • PATH=${PATH}:/usr/lib64/mpich/bin salloc --nodes=2 mpiexec mpiexec.sh
      • OpenMPI
        • Use #SBATCH directives to request a number of tasks (cores) and then run mpiexec or mpicasa as normal (see the sketch after this list).
    • HTCondor
      • While there is a parallel universe for HTCondor, I think we will use Slurm for MPI jobs.
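
    A minimal sbatch script sketch of the OpenMPI approach above; the script name, task count, time limit, and ./my_mpi_prog binary are illustrative placeholders, not our actual setup:

      #!/bin/sh
      #SBATCH --ntasks=8          # request 8 cores; Slurm may spread them across nodes
      #SBATCH --time=01:00:00     # wall-clock limit (placeholder value)
      # OpenMPI built with Slurm support reads the allocation from the
      # environment, so mpiexec needs no host list or -np argument.
      mpiexec ./my_mpi_prog

    Submit with "sbatch my_mpi_job.sh"; presumably mpicasa would be substituted for mpiexec the same way.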
  • Cgroups: We will need the kind of protection cgroups provide so that jobs can’t impact other jobs on the same node.

    • Slurm
      • /etc/slurm/cgroup.conf
    • HTCondor
      • Set CGROUP_MEMORY_LIMIT_POLICY = hard in /etc/condor/config.d/99-nrao on the execute nodes.
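
    A sketch of what /etc/slurm/cgroup.conf might contain; these are real cgroup.conf parameters, but the particular settings here are an assumption for illustration, not our proposed config:

      CgroupAutomount=yes       # mount cgroup subsystems if not already mounted
      ConstrainCores=yes        # confine each job to its allocated cores
      ConstrainRAMSpace=yes     # enforce the job's memory request
      ConstrainSwapSpace=yes    # keep jobs from escaping into swap

    For these to take effect, slurm.conf also needs TaskPlugin=task/cgroup.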
  • Pack Jobs: Put jobs on nodes efficiently such that as many nodes as possible are left idle and available for users with large-memory and/or large core-count requirements.

    • Slurm
      • Slurm has a sched/backfill plugin that backfills jobs, similar to Torque/Moab (see the sketch below).
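
    A sketch of the relevant slurm.conf scheduling lines; the parameters are real, but this particular combination is an illustrative assumption:

      SchedulerType=sched/backfill          # backfill small jobs into gaps left by big ones
      SelectType=select/cons_res            # allocate individual cores/memory, not whole nodes
      SelectTypeParameters=CR_Core_Memory   # treat both cores and memory as consumable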
  • Reaper: Clean nodes of unwanted files, dirs and procs.

    • HTCondor
      • Seems to handle /tmp and /var/tmp properly because it uses fake versions of these dirs for each job.
      • But /dev/shm is still an issue.
      • What about errant processes?
    • Slurm
      • There is pam_slurm_adopt.so, which supposedly tracks and kills errant processes, but it conflicts with systemd and therefore requires some special tweaking. See the epilog sketch below for the /dev/shm side.
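
    Neither scheduler cleans /dev/shm for us, so an epilog script is one likely approach. A minimal sketch, assuming the path /etc/slurm/epilog.sh and relying on SLURM_JOB_USER, which slurmd exports to the epilog environment; a production version should first check that the user has no other jobs still running on the node:

      #!/bin/sh
      # Hypothetical epilog, pointed to by Epilog=/etc/slurm/epilog.sh in slurm.conf.
      # Runs as root on each node after a job finishes; removes anything the
      # departing job's user left behind in /dev/shm.
      find /dev/shm -user "$SLURM_JOB_USER" -delete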
  • Reaper: Cancel jobs when accounts are closed.

  • Node priority: With Torque/Moab we can control the order in which the scheduler picks nodes. This allows us to run jobs on the faster nodes by default. Can HTCondor do this?

    • Slurm
      • The order of the nodes in PartitionName is not important, but you can set a Weight on each NodeName. Nodes with the lowest weight are chosen first (see the sketch below).
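
    A sketch of node weighting in slurm.conf; the node names and hardware numbers are made up for illustration, not our nmpost inventory:

      # Lower Weight = scheduled first, so the fast nodes fill up by default.
      NodeName=fast[01-04] CPUs=32 RealMemory=256000 Weight=1
      NodeName=slow[01-04] CPUs=16 RealMemory=128000 Weight=10
      PartitionName=batch Nodes=fast[01-04],slow[01-04] Default=YES State=UP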


https://open-confluence.nrao.edu/download/attachments/40537022/nmpost-slurm.conf?api=v2 is a proposed slurm.conf for our nmpost cluster.