...
- Switch remaining nmpost nodes from Torque/Moab to Slurm on nmpost-serv-1
- cd /opt/services/diskless_boot/RHEL-7.8.1.5/nmpost/snapshot
- echo 'SLURMD_OPTIONS="--conf-server nmpost-serv-1"' > TEMPLATE/etc/sysconfig/slurmd
- echo '/etc/sysconfig/slurmd' >> TEMPLATE/files
- rm -f nmpost*/etc/sysconfig/pbs_mom
- for x in nmpost* ; do \cp -f TEMPLATE/etc/sysconfig/slurmd ${x}/etc/sysconfig ; done
- for x in nmpost* ; do \cp -f TEMPLATE/files ${x} ; done
- for x in nmpost* ; do sed -i '/^\/etc\/ssh\/shosts.equiv/d' ${x}/files ; done
- for x in nmpost* ; do sed -i '/^\/etc\/ssh\/ssh_known_hosts/d' ${x}/files ; done
- reboot each node
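The per-node loops above can be folded into a single pass. This is a minimal sketch, not the procedure actually run: `update_snapshots` is a hypothetical helper, and the demo operates on a scratch directory with a made-up node name (`nmpost001`) so it can be tried without touching the real snapshot tree. It assumes GNU sed (for `-i`).

```shell
#!/bin/bash
# Hypothetical helper: apply the snapshot edits from the steps above to every
# nmpost* directory under the given snapshot root. Runs in a subshell so the
# caller's working directory is left unchanged.
update_snapshots() (
    cd "$1" || exit 1
    for x in nmpost* ; do
        [ -d "$x" ] || continue                      # skip if the glob matched nothing
        rm -f "$x/etc/sysconfig/pbs_mom"             # drop the Torque pbs_mom config
        cp -f TEMPLATE/etc/sysconfig/slurmd "$x/etc/sysconfig"
        cp -f TEMPLATE/files "$x"
        # Remove the ssh host-based auth entries from the per-node file list.
        sed -i -e '/^\/etc\/ssh\/shosts.equiv/d' \
               -e '/^\/etc\/ssh\/ssh_known_hosts/d' "$x/files"
    done
)

# Demo on a scratch tree (assumed layout; nmpost001 is an illustrative name).
scratch=$(mktemp -d)
mkdir -p "$scratch/TEMPLATE/etc/sysconfig" "$scratch/nmpost001/etc/sysconfig"
echo 'SLURMD_OPTIONS="--conf-server nmpost-serv-1"' > "$scratch/TEMPLATE/etc/sysconfig/slurmd"
printf '%s\n' /etc/ssh/shosts.equiv /etc/ssh/ssh_known_hosts /etc/sysconfig/slurmd \
    > "$scratch/TEMPLATE/files"
touch "$scratch/nmpost001/etc/sysconfig/pbs_mom"

update_snapshots "$scratch"
cat "$scratch/nmpost001/files"
```

Against the real tree, the call would presumably be `update_snapshots /opt/services/diskless_boot/RHEL-7.8.1.5/nmpost/snapshot`, run after the two TEMPLATE edits above.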
- Replace the Torque nodescheduler, nodeextendjob, and nodesfree with their Slurm versions on zia
- cd /opt/local/stow
- stow -D cluster
- (cd cluster/bin ; rm -f nodescheduler ; ln -s nodescheduler-slurm nodescheduler)
- (cd cluster/bin ; rm -f nodescheduler-test ; ln -s nodescheduler-test-slurm nodescheduler-test)
- (cd cluster/bin ; rm -f nodeextendjob ; ln -s nodeextendjob-slurm nodeextendjob)
- (cd cluster/bin ; rm -f nodesfree ; ln -s nodesfree-slurm nodesfree)
- stow cluster
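The four symlink swaps above follow one pattern, so they can be written as a loop. The sketch below is illustrative only: `swap_to_slurm` is a hypothetical helper, and the check that each `-slurm` target exists is an added safeguard not present in the original steps. The demo uses a scratch directory with empty placeholder files rather than the real `cluster/bin`.

```shell
#!/bin/bash
# Hypothetical helper: repoint each tool name at its "-slurm" variant inside
# the given bin directory. Runs in a subshell so the caller's cwd is unchanged.
swap_to_slurm() (
    cd "$1" || exit 1
    for tool in nodescheduler nodescheduler-test nodeextendjob nodesfree ; do
        # Added safeguard: refuse to leave a dangling link.
        [ -e "${tool}-slurm" ] || { echo "missing ${tool}-slurm" >&2 ; exit 1 ; }
        rm -f "$tool"
        ln -s "${tool}-slurm" "$tool"
    done
)

# Demo on a scratch directory with placeholder Slurm scripts.
bindir=$(mktemp -d)
for tool in nodescheduler nodescheduler-test nodeextendjob nodesfree ; do
    touch "$bindir/${tool}-slurm"
done

swap_to_slurm "$bindir"
readlink "$bindir/nodesfree"    # prints nodesfree-slurm
```

In the real procedure this would sit between `stow -D cluster` and `stow cluster`, called as `swap_to_slurm cluster/bin` from /opt/local/stow.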
...
- Remove nodefindfphantoms
- Remove cancelmanyjobs
- Remove nodereboot and associated cron job on servers
- Remove Torque reaper
- Uninstall Torque from OS image
- Uninstall Torque from nmpost and testpost servers
- Remove snapshot/*/etc/ssh/shosts.equiv
- Remove snapshot/*/etc/ssh/ssh_known_hosts
Done
- DONE: Set a PoolName for the testpost and nmpost clusters, e.g. NRAO-NM-PROD and NRAO-NM-TEST. The names don't have to be all caps.
- DONE: Change Slurm so that nodes return to service properly after a reboot instead of being marked down as "unexpectedly rebooted" (set ReturnToService=2 in slurm.conf)
- DONE: Document how to use HTCondor and Slurm with emphasis on transitioning from Torque/Moab
...