...
MPI: Some of our users run MPI jobs across multiple nodes. It would be nice to keep that as an option.
- Slurm
- mpich2
- PATH=${PATH}:/usr/lib64/mpich/bin salloc --ntasks=8 mpiexec mpiexec.sh
- PATH=${PATH}:/usr/lib64/mpich/bin salloc --nodes=2 mpiexec mpiexec.sh
- OpenMPI
- Use #SBATCH to request a number of tasks (cores) and then run mpiexec or mpicasa as normal.
- HTCondor
- Single-node MPI jobs should work in the Vanilla universe.
- Multi-node MPI jobs require either setting up the Parallel universe or using Slurm instead. Although HTCondor has a Parallel universe, I think we will use Slurm for MPI jobs.
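The #SBATCH approach mentioned for OpenMPI could look roughly like the sketch below. It writes a small batch script and leaves submission to the user. The file name mpi_job.sh, the program ./my_mpi_program, and the OpenMPI path /usr/lib64/openmpi/bin (chosen by analogy with the mpich path above) are assumptions, not names taken from our setup.

```shell
# Write a minimal Slurm batch script for an 8-task MPI job.
# mpi_job.sh and ./my_mpi_program are hypothetical names.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --output=mpi_job.%j.out
# Assumed OpenMPI install location; adjust to match the node image.
export PATH=${PATH}:/usr/lib64/openmpi/bin
mpiexec ./my_mpi_program
EOF
```

Submitting it would then be `sbatch mpi_job.sh`. Slurm decides whether the eight tasks land on one node or several unless --nodes is also given.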
Cgroups: We will need the kind of isolation that cgroups provide so that jobs can’t impact other jobs on the same node.
- Slurm
- /etc/slurm/cgroup.conf
- HTCondor
- Set CGROUP_MEMORY_LIMIT_POLICY = hard in /etc/condor/config.d/99-nrao on the execute nodes.
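For the Slurm side, a cgroup.conf along these lines is one possible starting point. The values are illustrative assumptions rather than our actual settings, and the slurm.conf lines in the comments are what I would expect to need alongside it.

```
# /etc/slurm/cgroup.conf (illustrative values, not the site's actual settings)

# Confine each job to the CPU cores it was allocated.
ConstrainCores=yes
# Enforce the job's memory request as a cgroup limit.
ConstrainRAMSpace=yes
# Also limit the job's swap usage.
ConstrainSwapSpace=yes
# Restrict access to devices (e.g. GPUs) not allocated to the job.
ConstrainDevices=yes

# These settings take effect only if slurm.conf enables the cgroup plugins:
#   ProctrackType=proctrack/cgroup
#   TaskPlugin=task/cgroup
```

With ConstrainRAMSpace=yes, a job that exceeds its memory request is constrained by the kernel rather than being able to starve its neighbors, which is roughly the behavior CGROUP_MEMORY_LIMIT_POLICY = hard gives on the HTCondor side.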
...