...

  • MPI: We have some users who run MPI across multiple nodes. It would be nice to keep that as an option.

    • Slurm
      • MPICH2
        • Request 8 tasks anywhere on the cluster: PATH=${PATH}:/usr/lib64/mpich/bin salloc --ntasks=8 mpiexec mpiexec.sh
        • Request 2 whole nodes: PATH=${PATH}:/usr/lib64/mpich/bin salloc --nodes=2 mpiexec mpiexec.sh
      • OpenMPI
        • Use #SBATCH directives to request a number of tasks (cores) and then run mpiexec or mpicasa as normal; see the sketch below.
    • HTCondor
      • Single-node MPI jobs should work in the Vanilla universe.
        • Multi-node MPI jobs require either the Parallel universe or Slurm. While HTCondor does provide a Parallel universe, I think we will use Slurm for MPI jobs.
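
    A minimal sketch of the OpenMPI-under-Slurm pattern above (the job script, task count, and hello_mpi binary are illustrative assumptions, not settings from our cluster):

        #!/bin/sh
        #SBATCH --ntasks=8          # request 8 tasks (cores); Slurm may place them on one or more nodes
        #SBATCH --time=00:10:00     # wall-clock limit
        # With a Slurm-aware OpenMPI build, mpiexec reads the task layout
        # from the allocation, so no -n flag or host list is needed here.
        mpiexec ./hello_mpi

    Submit with sbatch; for CASA work, mpicasa would take the place of mpiexec on the last line.
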
  • Cgroups: We will need the kind of per-job isolation that cgroups provide so that one job can’t impact other jobs on the same node.

    • Slurm
      • /etc/slurm/cgroup.conf (see the sketch after this list)
    • HTCondor
      • Set CGROUP_MEMORY_LIMIT_POLICY = hard in /etc/condor/config.d/99-nrao on the execute nodes (snippet below).
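
    A minimal /etc/slurm/cgroup.conf along the lines above (these particular settings are an assumption, not final values):

        # /etc/slurm/cgroup.conf
        ConstrainCores=yes       # confine each job to the CPU cores it was allocated
        ConstrainRAMSpace=yes    # turn the job's memory request into an enforced limit

    Note that slurm.conf also has to select the cgroup plugins (ProctrackType=proctrack/cgroup and TaskPlugin=task/cgroup) for these settings to take effect.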
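
    And the corresponding HTCondor setting from the bullet above, as it would appear in /etc/condor/config.d/99-nrao on each execute node:

        # "hard" puts the job's provisioned memory into the cgroup hard limit,
        # so a job that exceeds its request is stopped rather than merely
        # deprioritized under memory pressure (the "soft" behavior).
        CGROUP_MEMORY_LIMIT_POLICY = hard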

...