...
Queues: We want to keep the multiple queue functionality of Torque/Moab where, for example, HERA jobs go to hera nodes and VLASS jobs go to vlass nodes. We would also like to be able to have vlasstest jobs go to the vlass nodes with a higher priority without preempting running jobs.
- Slurm
- Queues are called partitions. At some level they are called partitions in Torque as well.
- Job preemption is disabled by default.
- Allows for simple priority settings in partitions with the default PriorityType=priority/basic plugin.
- E.g. PartitionName=vlass Nodes=testpost[002-004] MaxTime=144000 State=UP Priority=1000
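As a rough sketch of how the vlasstest requirement might look in slurm.conf, the fragment below adds a second partition over the same nodes with a higher Priority. The vlass line is copied from the example above; the vlasstest partition and its Priority=2000 value are assumptions for illustration, not a tested configuration.

```
# slurm.conf fragment (sketch only)
# Never preempt running jobs; set explicitly rather than relying on the default.
PreemptType=preempt/none
PriorityType=priority/basic

# From the notes above.
PartitionName=vlass Nodes=testpost[002-004] MaxTime=144000 State=UP Priority=1000
# Assumed: same nodes, higher partition priority so pending vlasstest jobs are considered first.
PartitionName=vlasstest Nodes=testpost[002-004] MaxTime=144000 State=UP Priority=2000
```

Users would then pick a partition at submit time, e.g. sbatch --partition=vlasstest. Pending jobs in the higher-priority partition are considered first, and with preemption off nothing already running is interrupted.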
- HTCondor
- HTCondor doesn't have queues or partitions like Torque/Moab or Slurm but there are still ways to do what we need.
- Using separate pools for things like HERA and VLASS is an option, but may be overkill as it would require separate Central Managers.
- Requirements or constraints are another option. For example, HERA nodes could set the following in their configs:
- HERA = True
- STARTD_ATTRS = HERA, $(STARTD_ATTRS)
- and users could set the following in their submit files
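- Requirements = (HERA =?= True) or Requirements = (HERA == True). The two differ only when HERA is undefined on a machine: =?= evaluates to False while == evaluates to UNDEFINED. Either result keeps the job from matching that machine, so the difference may not be important here.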
- We could do the same for VLASS/VLASSTEST but I don't know if HTCondor can prioritize VLASS over VLASSTEST the way we do with Moab. We could also do something like this for interactive nodes and nodescheduler if we end up using that.
- VLASS = True
- VLASSTEST = True
- STARTD_ATTRS = VLASS, VLASSTEST, $(STARTD_ATTRS)
- then users would set either requirements = (VLASS =?= True) or requirements = (VLASSTEST =?= True)
- Or, to keep the priority where VLASS jobs run on VLASSTEST nodes only when no VLASS nodes are available, the user could set Rank = (VLASS == 1) together with requirements = (VLASS =?= True). Rank only orders the machines that already satisfy requirements, so the fallback works only if the VLASSTEST nodes also satisfy the requirements expression.
- HTCondor does support accounting groups that may work like queues.
- Because of the design of HTCondor, there isn't a central place to define the order and "queue" of nodes like there is in Torque.
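To make the attribute-based approach described above concrete, here is a minimal sketch of the two halves for the HERA case: the execute-node configuration and a matching submit description file. The attribute lines are taken from the notes; the executable name run_hera.sh is hypothetical, and this has not been tested against a real pool.

```
# --- Execute-node configuration (on each HERA node) ---
# Define a custom attribute and advertise it in the machine ClassAd.
HERA = True
STARTD_ATTRS = HERA, $(STARTD_ATTRS)

# --- Submit description file (user side) ---
# run_hera.sh is a hypothetical placeholder for the user's job script.
executable   = run_hera.sh
# Match only machines that advertise HERA = True.
requirements = (HERA =?= True)
queue
```

After reconfiguring the nodes, something like condor_status -constraint 'HERA =?= True' should list only the HERA machines, which is a quick way to confirm the attribute is being advertised before users rely on it.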
...