...
Queues: We want to keep the queue functionality of Torque/Moab where, for example, hera jobs go to hera nodes and vlass jobs go to vlass nodes. We would also like vlasstest jobs to go to the vlass nodes with a higher priority, without preempting running jobs.
- Slurm
- Queues are called partitions. At some level they are called partitions in Torque as well.
- Job preemption is disabled by default.
- Allows for simple priority settings in partitions with the default PriorityType=priority/basic plugin.
- E.g. PartitionName=vlass Nodes=testpost[002-004] MaxTime=144000 State=UP Priority=1000
- HTCondor
- HTCondor doesn't have queues or partitions like Torque/Moab or Slurm but there are still ways to do what we need.
- Using separate pools for things like HERA and VLASS is an option, but may be overkill as it would require separate Central Managers.
- Using Requirements (constraints) is another option. For example, HERA nodes could set the following in their configs:
- HERA = True
- STARTD_ATTRS = HERA, $(STARTD_ATTRS)
- and users could set the following in their submit files
- Requirements = (HERA =?= True)
- We could do the same for VLASS/VLASSTEST but I don't know if HTCondor can prioritize VLASS over VLASSTEST the way we do with Moab.
- VLASS = True
- VLASSTEST = True
- STARTD_ATTRS = VLASS, VLASSTEST, $(STARTD_ATTRS)
- then users would set either requirements = (VLASS =?= True) or requirements = (VLASSTEST =?= True)
- I don't know how to simulate the vlass/vlasstest queues. Perhaps by the time we move to HTCondor we won't need vlasstest anymore.
- HTCondor does support accounting groups that may work like queues.
- Can we combine Requirements and Rank in such a way that a job can prefer to run on a VLASS node but will run on VLASSTEST if there are no VLASS nodes available? Will using Rank in this way leave that job susceptible to preemption?
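To make the Requirements-plus-Rank question concrete, here is a toy Python model of the idea. It is not HTCondor's negotiator. It only illustrates the intended semantics: Requirements filters the machines a job may run on, and Rank picks the preferred one among those left. The helper names (`is_true`, `match`, `VLASS_JOB`), the machine names, and the submit-file lines in the comments are assumptions made for illustration. The sketch also assumes VLASS and VLASSTEST are advertised by different nodes. It does not model preemption, so that part of the question stays open.

```python
def is_true(machine, attr):
    """Mimic ClassAd `attr =?= True`: an undefined attribute is simply not True."""
    return machine.get(attr) is True


# Hypothetical submit-file equivalent (an assumption, not taken from the notes):
#   requirements = (VLASS =?= True) || (VLASSTEST =?= True)
#   rank         = (VLASS =?= True)
VLASS_JOB = {
    # Hard constraint: only VLASS or VLASSTEST machines are acceptable.
    "requirements": lambda m: is_true(m, "VLASS") or is_true(m, "VLASSTEST"),
    # Soft preference: VLASS machines score higher than VLASSTEST machines.
    "rank": lambda m: 1.0 if is_true(m, "VLASS") else 0.0,
}


def match(job, machines):
    """Return the name of the highest-ranked idle machine that satisfies
    the job's requirements, or None if no machine qualifies."""
    candidates = [m for m in machines if m["idle"] and job["requirements"](m)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])["name"]


if __name__ == "__main__":
    # Made-up machine ads for the demonstration.
    pool = [
        {"name": "vlasstest01", "idle": True, "VLASSTEST": True},
        {"name": "vlass01", "idle": True, "VLASS": True},
        {"name": "hera01", "idle": True, "HERA": True},
    ]
    print(match(VLASS_JOB, pool))   # VLASS node is preferred while it is idle
    pool[1]["idle"] = False
    print(match(VLASS_JOB, pool))   # falls back to the VLASSTEST node
```

If HTCondor behaves the way this model assumes, a single submit file could cover both cases. That should be checked on a test pool before relying on it.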
Interactive: The ability to assign all or part of a node to a user with shell-level access (nodescheduler, qsub -I, etc.). The minimum granularity is one NUMA node; finer would be useful.
- What is it that we like about nodescheduler over something like qsub -I or srun --pty bash or condor_submit -i?
- It's not tied to any tty, so a user can log in multiple times from multiple places to their reserved node without requiring screen or tmux or vnc. It also means that users aren't all going through nmpost-master.
- Its creation is asynchronous. If the cluster is full you don't wait around for your reservation to start; you get an email message when it is ready.
- It's time limited (e.g. two weeks). We might be able to do the same with a queue/partition setting but could we then extend that reservation?
- We get to define the shape of a reservation (whole node, NUMA node, etc). If we just let people use qsub -I they could reserve all sorts of sizes which may be less efficient. Then again it may be more efficient. But either way it is simpler for our users.
- Slurm
- I don't see how Slurm can reserve NUMA nodes so we will have to just reserve X tasks with Y memory.
- I don't know how to keep Slurm from giving a user multiple portions of the same host. With Moab I used naccesspolicy=uniqueuser. This prevents the ambiguity of which ssh connection goes to which cgroup.
- HTCondor
- Can HTCondor even do this?
- nodevnc
- Given the limitations of Slurm and HTCondor, and that we already recommend users use VNC on their interactive nodes, why don't we just provide a nodevnc script that reserves a node (via Torque, Slurm or HTCondor), starts a VNC server, and then tells the user it is ready and how to connect to it? If someone still needs/wants just simple terminal access, then qsub -I or srun --pty bash or condor_submit -i might suffice.
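As a rough illustration of the Slurm side of a nodevnc script, the Python sketch below only builds an sbatch command line. It does not submit anything. The function name `nodevnc_sbatch_command`, the 14-day default, the example email address, and the choice to run `vncserver -fg` inside the job are all assumptions. The last one presumes a TigerVNC-style vncserver that accepts `-fg`. The idea is that sbatch queues the request asynchronously, `--mail-type=BEGIN` sends the "your node is ready" email, and `--time` enforces the time limit.

```python
import shlex


def nodevnc_sbatch_command(user_email, days=14, partition=None):
    """Build (but do not run) an sbatch command line for a hypothetical
    nodevnc reservation: one whole node, time limited, email on start."""
    cmd = [
        "sbatch",
        "--nodes=1",
        "--exclusive",                  # whole node; a smaller shape would need other options
        f"--time={days}-00:00:00",      # Slurm's days-hours:minutes:seconds format
        "--mail-type=BEGIN",            # email the user when the job starts
        f"--mail-user={user_email}",
    ]
    if partition:
        cmd.append(f"--partition={partition}")
    # Assumption: keeping a VNC server in the foreground keeps the job alive.
    cmd.append("--wrap=vncserver -fg")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command for inspection instead of executing it.
    print(shlex.join(nodevnc_sbatch_command("user@example.org")))
```

A real script would still have to tell the user which host and display to connect to, and would need equivalent paths for Torque and HTCondor.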
...