...
DONE: Ability to run jobs remotely (AWS, CHTC, OSG, etc)
- Slurm
- I don't think we will need this ability with Slurm
- HTCondor
- We have successfully tested both condor_annex to AWS and flocking to CHTC.
- OpenPBS
- I don't think we will need this ability with OpenPBS
DONE: Cgroups: We need protection like that provided by cgroups so that jobs can’t impact other jobs on the same node.
- Slurm
- Configure cgroup constraints in /etc/slurm/cgroup.conf
- HTCondor
- Set CGROUP_MEMORY_LIMIT_POLICY = hard in /etc/condor/config.d/99-nrao on the execute nodes.
- OpenPBS
- qmgr -c "set hook pbs_cgroups enabled = true"
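
For the Slurm item above, the notes only name the file. As a rough sketch, a cgroup.conf that confines each job to its allocated cores and memory might look like the following. The specific settings are an assumption for illustration, not a copy of our actual configuration, and they only take effect if slurm.conf also selects the cgroup plugins.

```
# /etc/slurm/cgroup.conf (illustrative; these values are assumptions)
ConstrainCores=yes       # restrict each job to its allocated CPU cores
ConstrainRAMSpace=yes    # enforce the memory allocated to the job

# /etc/slurm/slurm.conf (needed for the constraints above to apply)
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
```

Slurm reads these files when its daemons start, so the change would typically need a restart of slurmd on the execute nodes before it applies to new jobs.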
...