...
- DONE: Port nodeextendjob to Slurm: scontrol update jobid=974 timelimit=+7-0:0:0
- DONE: Port nodesfree to Slurm
- DONE: Port nodereboot to Slurm: scontrol reboot ASAP reason=testing testpost001
- DONE: Create a subset of the testpost cluster that only runs Slurm for admins to test.
- Done: Install Slurmctld on testpost-serv-1, testpost-master, and OS image
- Done: Install Slurm reaper on OS image (RHEL-7.8.1.3)
- Done: Make the new testpost-master a Slurm submit host
- Create a small subset of the nmpost cluster that only runs Slurm for users to test.
- Done: Install Slurmctld on nmpost-serv-1, nmpost-master, herapost-master, and OS image
- Done: Install Slurm reaper on OS image (RHEL-7.8.1.3)
- Need at least 3 nodes: batch/interactive, vlass/vlasstest, hera/hera-i
- Done: Make the new nmpost-master a Slurm submit host
- Done: Make the new, disked herapost-master a Slurm submit host.
- Identify stakeholders (e.g. operations, VLASS, DAs, sci-staff, SSA, HERA, observers) and give them the chance to test Slurm and provide feedback
- Implement the useful feedback
- Set a date to transition the remaining cluster to Slurm, possibly before we have to pay for Torque again around June 2022.
- Do another pass on the documentation: https://info.nrao.edu/computing/guide/cluster-processing
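
For reference, the two scontrol invocations noted in the nodeextendjob and nodereboot items can be sketched as small command builders. This is only an illustration, not the actual ported scripts: the function names `extend_job_cmd` and `reboot_node_cmd` are hypothetical, and the job ID, node name, and reason are the example values from the items above.

```python
import shlex


def extend_job_cmd(job_id: int, days: int) -> list[str]:
    """Build the scontrol call that adds `days` days to a job's time limit.

    Hypothetical helper; the real nodeextendjob script may differ.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    # A leading "+" makes the time limit an increment, not an absolute value.
    return ["scontrol", "update", f"jobid={job_id}", f"timelimit=+{days}-0:0:0"]


def reboot_node_cmd(node: str, reason: str) -> list[str]:
    """Build the scontrol call that reboots a node as soon as it is idle.

    Hypothetical helper; the real nodereboot script may differ.
    """
    # ASAP drains the node so no new jobs start before the reboot.
    return ["scontrol", "reboot", "ASAP", f"reason={reason}", node]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    print(shlex.join(extend_job_cmd(974, 7)))
    print(shlex.join(reboot_node_cmd("testpost001", "testing")))
```

Returning argument lists rather than shell strings means the result could be passed directly to `subprocess.run` on a Slurm admin host without shell quoting concerns.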
...