...

I looked at OpenPBS, which appears to be the open-source version of PBS Pro maintained by Altair Engineering. It lacks a few things we need: it does not support setting a job's working directory the way Torque does with -d or -w, and it has no PAM module that lets users log in to a node where they have an active job, which would make nodescheduler very hard to implement. So I don't think OpenPBS is a suitable replacement for Torque/Moab.

To Do

Prep

  • Done: Upgrade testpost-master to RHEL7 so it can run Slurm 122408
  • Upgrade nmpost-master to RHEL7 so it can run Slurm 122408
  • Look at upgrading to the latest version of Slurm

...

  • DONE: Port nodeextendjob to Slurm (scontrol update jobid=974 timelimit=+7-0:0:0)
  • DONE: Port nodesfree to Slurm
  • DONE: Port nodereboot to Slurm (scontrol reboot ASAP reason=testing testpost001)
  • DONE: Create a subset of the testpost cluster that only runs Slurm for admins to test.
    • Done: Install Slurmctld on testpost-serv-1, testpost-master, and OS image
    • Done: Install Slurm reaper on OS image (RHEL-7.8.1.3)
    • Done: Make the new testpost-master a Slurm submit host
  • Create a small subset of the nmpost cluster that only runs Slurm for users to test.
    • Install Slurmctld on nmpost-serv-1, nmpost-master, herapost-master, and OS image
    • Done: Install Slurm reaper on OS image (RHEL-7.8.1.3)
    • Need at least 3 nodes: batch/interactive, vlass/vlasstest, hera/hera-i
    • Make the new nmpost-master a Slurm submit host
    • Make the new, disked herapost-master a Slurm submit host.
  • Identify stakeholders (e.g. operations, VLASS, DAs, sci-staff, SSA, HERA, observers) and give them the chance to test Slurm and provide feedback
  • Implement the useful feedback
  • Set a date to transition the remaining cluster to Slurm, possibly before we have to pay for Torque again around June 2022.
  • Do another pass on the documentation https://info.nrao.edu/computing/guide/cluster-processing
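The nodeextendjob and nodereboot ports above both come down to a single scontrol call. The sketch below is illustrative only: the function names extend_job_cmd, reboot_nodes_cmd and run are hypothetical and are not the actual ported scripts. It builds the two commands as argument lists and prints them, running them only when dry_run is turned off. It assumes scontrol is on the PATH of the submit host and relies on Slurm's documented behavior that a leading "+" on TimeLimit adds to the existing limit.

```python
import shlex
import subprocess
from typing import List


def extend_job_cmd(jobid: int, days: int) -> List[str]:
    """Hypothetical helper: build the scontrol call that adds `days` days to a job.

    The leading '+' asks Slurm to increment the job's current TimeLimit
    instead of replacing it; the value uses Slurm's days-hh:mm:ss form.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return ["scontrol", "update", f"jobid={jobid}", f"timelimit=+{days}-0:0:0"]


def reboot_nodes_cmd(nodes: List[str], reason: str) -> List[str]:
    """Hypothetical helper: build the scontrol call that reboots nodes once idle.

    ASAP drains the nodes so no new jobs start, then reboots them when the
    running jobs finish. Nodes are passed as a comma-separated Slurm node list.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    return ["scontrol", "reboot", "ASAP", f"reason={reason}", ",".join(nodes)]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if scontrol reports a failure.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the two commands from the To Do list without executing them.
    run(extend_job_cmd(974, 7))
    run(reboot_nodes_cmd(["testpost001"], "testing"))
```

Building an argument list instead of a shell string means a reason containing spaces is still passed to scontrol as a single reason=... argument.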

...