Currently, the nmpost cluster is a mix of Torque/Moab (nmpost{001..090}) and HTCondor (nmpost{091..120}, devhost{001..002}). Eventually we would like to replace Torque/Moab with Slurm, as we think it can do most of what Torque/Moab does but is free and has regular releases. Torque/Moab development seems to have stalled.
Another option for replacing Torque/Moab, instead of Slurm, is OpenPBS, which seems to be the free version of PBS Pro maintained by Altair Engineering. I haven't used OpenPBS yet, but it may be a simpler transition than Slurm.
- To Do
- Change Slurm so that nodes come up properly after a reboot instead of being marked "unexpectedly rebooted"
- Upgrade testpost-master to RHEL7 so it can run Slurm
- Upgrade nmpost-master to RHEL7 so it can run Slurm
- Configure gibson so that it can flock to CHTC
- Implement a mechanism to keep vlass jobs on vlass nodes, hera jobs on hera nodes, etc.
- DONE
- DONE: Set a PoolName for the testpost and nmpost clusters. E.g. NRAO-NM-PROD and NRAO-NM-TEST. They don't have to be all caps.
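
For the two Slurm items in the To Do list above (nodes marked "unexpectedly rebooted" and keeping vlass/hera jobs on their own nodes), a possible starting point is sketched below. This is only a sketch, assuming these items end up being solved in slurm.conf. The node ranges and partition names are hypothetical and do not reflect which nodes actually belong to vlass or hera, and the NodeName lines leave out the hardware parameters (CPUs, memory, etc.) that a real configuration would carry.

```
# slurm.conf fragment (sketch only, untested on our clusters)

# Let a DOWN node return to service once it registers with a valid
# configuration, whatever the reason it was set DOWN (including a reboot).
ReturnToService=2

# Hypothetical node assignments: tag nodes with a feature per project.
NodeName=nmpost[001-010] Features=vlass
NodeName=nmpost[011-020] Features=hera

# Hypothetical partitions: one per project, limited to that project's nodes.
PartitionName=vlass Nodes=nmpost[001-010]
PartitionName=hera Nodes=nmpost[011-020]
```

With something like this in place, a job could be directed with `sbatch --partition=vlass` or `sbatch --constraint=vlass`. Whether partitions, features, or both are the right fit is still an open question, and the HTCondor nodes would need their own equivalent.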