...
- Create a subset of the testpost cluster that runs only Slurm, for admins to test.
- Install Slurm on testpost-serv-1, testpost-master, and OS image
- Install Slurm reaper on OS image
- Create a small subset of the nmpost cluster that runs only Slurm, for users to test.
- Install Slurm on nmpost-serv-1, nmpost-master, herapost-master, and OS image
- Install Slurm reaper on OS image
- Need at least 4 nodes: batch, interactive, vlass/vlasstest, hera/hera-i
- Identify stakeholders (e.g. operations, DAs, sci-staff, SSA, HERA, observers) and give them a chance to test Slurm and provide feedback
- Implement the useful feedback
- Set a date to transition the remaining cluster to Slurm, ideally before we have to pay for Torque again around June 2022.
- Do another pass on the documentation https://info.nrao.edu/computing/guide/cluster-processing
- Port nodeextendjob to Slurm
- Port nodesfree to Slurm
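The four node types above imply roughly six partitions (vlass/vlasstest and hera/hera-i each sharing a node). A minimal slurm.conf sketch of that layout might look like the following; the node names, counts, and time limits are placeholders, not the real cluster configuration:

```
# Hypothetical partition layout for the Slurm test subset.
# Node lists and MaxTime values are illustrative only.
PartitionName=batch       Nodes=nmpost001 Default=YES MaxTime=14-00:00:00 State=UP
PartitionName=interactive Nodes=nmpost002 MaxTime=2-00:00:00  State=UP
PartitionName=vlass       Nodes=nmpost003 MaxTime=14-00:00:00 State=UP
PartitionName=vlasstest   Nodes=nmpost003 MaxTime=1-00:00:00  State=UP
PartitionName=hera        Nodes=nmpost004 MaxTime=14-00:00:00 State=UP
PartitionName=hera-i      Nodes=nmpost004 MaxTime=2-00:00:00  State=UP
```

Overlapping Nodes= lists let one physical node back both a batch-style and a test/interactive partition, which is how the "at least 4 nodes" minimum can still cover all six queues.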
Launch
- Switch remaining nmpost nodes from Torque/Moab to Slurm.
- Replace the Torque versions of nodescheduler, nodeextendjob, and nodesfree with their Slurm versions
- Publish new documentation https://info.nrao.edu/computing/guide/cluster-processing
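The Slurm port of nodesfree could be a thin wrapper around sinfo. Below is a sketch that assumes nodesfree's job is simply to list idle nodes (the real Torque version may report more detail); the parsing is split out so it can be exercised without a running Slurm controller:

```python
#!/usr/bin/env python3
"""Sketch of a Slurm-based nodesfree: list idle nodes via sinfo.

Assumption: nodesfree only needs to print which nodes are free.
"""
import subprocess


def parse_sinfo_nodes(output: str) -> list[str]:
    """Turn `sinfo -N -h -t idle -o %N` output into a sorted, deduplicated list."""
    return sorted(set(output.split()))


def free_nodes() -> list[str]:
    """Query Slurm for nodes currently in the idle state."""
    out = subprocess.run(
        ["sinfo", "-N", "-h", "-t", "idle", "-o", "%N"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sinfo_nodes(out)


if __name__ == "__main__":
    print("\n".join(free_nodes()))
```

`sinfo -N` gives node-oriented output, `-h` drops the header, and `-t idle` filters to free nodes, so no extra state bookkeeping is needed on our side.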
Clean
- Remove nodefindfphantoms
- Remove nodereboot
Done
- DONE: Set a PoolName for the testpost and nmpost clusters, e.g. NRAO-NM-PROD and NRAO-NM-TEST. They don't have to be all caps.
- DONE: Change Slurm so that nodes return to service properly after a reboot instead of being held DOWN as "unexpectedly rebooted" (ReturnToService=2)
- DONE: Document how to use HTCondor and Slurm with emphasis on transitioning from Torque/Moab
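For reference, the reboot fix from the ReturnToService item above is a one-line slurm.conf setting:

```
# slurm.conf: a DOWN node that registers with a valid configuration
# rejoins service automatically; with the default (0) or 1, a node
# marked "unexpectedly rebooted" needs a manual scontrol to return.
ReturnToService=2
```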
...