...
DONE: Access: We would like to prevent users from logging in to nodes unless they have a proper reservation. Right now we restrict access via /etc/security/access.conf and use Torque's pam_pbssimpleauth.so to allow access for any user running a job.
- Slurm
- Has a pam_slurm.so module which does seem to work like the pam_pbssimpleauth.so module.
- HTCondor
- How do we restrict access to condor nodes to only those users with valid jobs running?
- With the restrictions in access.conf, HTCondor can still run jobs as users like krowe2. I think this is because HTCondor doesn't use the login mechanism but just starts shells as the user.
- OpenPBS
- Doesn't come with a PAM module and the Torque PAM module doesn't work with OpenPBS.
- restrict_user and restrict_user_exceptions work in the mom_priv/config file, but there is a maximum of 10 user exceptions. With a PAM module we could make as many exceptions as we like and could use groups and netgroups.
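The access rule described above (allow a login only when the user has a job running on the node) can be sketched as a small shell check. This is only an illustration of the policy, not how pam_slurm.so or pam_pbssimpleauth.so is implemented; the function name has_running_job is hypothetical, and the sketch assumes squeue is on the PATH and that node names match the short hostname.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed only if the given user
# has at least one RUNNING Slurm job on the given node.
# Assumption: node names in Slurm match `hostname -s`.
has_running_job() {
    user=$1
    node=${2:-$(hostname -s)}
    # -h: no header, -u: filter by user, -w: filter by node,
    # -t: filter by job state, -o %i: print only the job ID
    [ -n "$(squeue -h -u "$user" -w "$node" -t RUNNING -o %i)" ]
}
```

For example, `has_running_job krowe2 && echo allowed` would print "allowed" only while krowe2 has a job running on the local node.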
...
DONE: Cgroups: We will need protection like what cgroups provide so that jobs can’t impact other jobs on the same node.
- Slurm
- /etc/slurm/cgroup.conf
- HTCondor
- Set CGROUP_MEMORY_LIMIT_POLICY = hard in /etc/condor/config.d/99-nrao on the execute nodes.
- OpenPBS
- qmgr -c "set hook pbs_cgroups enabled = true"
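For the Slurm entry, which names only the file, a minimal sketch of what /etc/slurm/cgroup.conf might contain is shown below. The two Constrain* settings are assumptions chosen for illustration, not the values actually deployed; the helper name cgroup_conf_example is hypothetical. Slurm would also need ProctrackType=proctrack/cgroup and TaskPlugin=task/cgroup in slurm.conf for these limits to be enforced.

```shell
#!/bin/sh
# Sketch only: print a candidate /etc/slurm/cgroup.conf for review.
# The values below are illustrative assumptions, not the site's settings.
cgroup_conf_example() {
cat <<'EOF'
# Confine each job to the CPU cores it was allocated
ConstrainCores=yes
# Confine each job to the memory it requested
ConstrainRAMSpace=yes
EOF
}

cgroup_conf_example
```

Printing the fragment rather than writing it in place keeps the sketch harmless on a live node; the output can be compared against the existing file before anything is changed.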
DONE: Submit hosts: We may have several hosts that need to be able to submit and delete jobs (wirth, mcilroy, hamilton, etc.).
- Slurm
- Slurm-20 requires systemd, so hosts must be RHEL7 or later.
- HTCondor
...