...
Interactive: The ability to assign all or part of a node to a user with shell-level access (nodescheduler, qsub -I, etc.). Minimum granularity is per NUMA node; finer would be useful.
- nodescheduler: What is it that we like about nodescheduler over something like qsub -I, srun --pty bash, or condor_submit -i?
- It's not tied to any tty, so a user can log in multiple times from multiple places to their reserved node without requiring something like screen, tmux, or VNC. It also means that users aren't all going through nmpost-master.
- Its creation is asynchronous. If the cluster is full, you don't wait around for your reservation to start; you get an email message when it is ready.
- It's time limited (e.g. two weeks). We might be able to do the same with a queue/partition setting, but could we then extend that reservation?
- We get to define the shape of a reservation (whole node, NUMA node, etc.). If we just let people use qsub -I, they could reserve all sorts of sizes, which may be less efficient. Then again, it may be more efficient. Either way, I think nodescheduler is simpler for our users.
- Slurm
- srun --pty bash: This logs the user into an interactive shell on a node with defaults (1 core, 1 GB memory).
- I don't see how Slurm can reserve NUMA nodes, so we will have to just reserve X tasks with Y memory.
- I don't know how to keep Slurm from giving a user multiple portions of the same host. With Moab I used naccesspolicy=uniqueuser. This prevents the ambiguity of which ssh connection goes to which cgroup.
- HTCondor
- Can HTCondor even do this?
- condor_submit -interactive can be shortened to just condor_submit -i. This logs the user into an interactive shell on a node with defaults (1 core equivalent, 0.5 GB memory).
- Could run a sleep job just like we do with Torque and use condor_ssh_to_job which seems to do X11 properly. We would probably want to make gygax part of the nmpost pool.
- nodevnc
- Given the limitations of Slurm and HTCondor, and that we already recommend users use VNC on their interactive nodes, why don't we just provide a nodevnc script that reserves a node (via Torque, Slurm, or HTCondor), starts a VNC server, and then tells the user it is ready and how to connect to it? If someone still needs/wants just simple terminal access, then qsub -I or srun --pty bash or condor_submit -i might suffice.
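
To make the comparison a bit more concrete, here is a rough Python sketch of how a single wrapper could request the same fixed-shape, time-limited "sleep job" reservation from each scheduler. It only builds the commands and does not submit anything. The `Reservation` class, its default sizes (8 cores, 16 GB), the email address, and the `reserve.sh` script name are all made up for illustration; the scheduler options shown are standard ones, but whether they fit our configuration is untested. Note that HTCondor has no obvious equivalent of "mail me when the job starts", so that part is left out of the HTCondor case.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical fixed 'shape' for an interactive reservation."""
    cores: int = 8                    # assumed size of one NUMA node
    mem_gb: int = 16                  # assumed memory of one NUMA node
    days: int = 14                    # time limit (the two-week example)
    email: str = "user@example.org"   # placeholder address

    @property
    def seconds(self) -> int:
        return self.days * 24 * 3600


def torque_command(r: Reservation, script: str = "reserve.sh") -> list[str]:
    """qsub batch job; '-m b' asks for mail when the job begins.

    'reserve.sh' is a hypothetical script that just sleeps.
    """
    hours = r.days * 24
    resources = f"nodes=1:ppn={r.cores},mem={r.mem_gb}gb,walltime={hours}:00:00"
    return ["qsub", "-l", resources, "-m", "b", "-M", r.email, script]


def slurm_command(r: Reservation) -> list[str]:
    """sbatch returns immediately and mails the user when the job begins."""
    return [
        "sbatch",
        "--nodes=1",                   # keep all tasks on a single host
        f"--ntasks={r.cores}",
        f"--mem={r.mem_gb}G",
        f"--time={r.days}-00:00:00",
        "--mail-type=BEGIN",
        f"--mail-user={r.email}",
        f"--wrap=sleep {r.seconds}",   # the sleep job holds the allocation
    ]


def htcondor_submit_description(r: Reservation) -> str:
    """Submit description for a sleep job; attach later with condor_ssh_to_job."""
    lines = [
        "executable = /bin/sleep",
        f"arguments = {r.seconds}",
        f"request_cpus = {r.cores}",
        f"request_memory = {r.mem_gb * 1024}",  # HTCondor default unit is MiB
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    r = Reservation()
    print(shlex.join(torque_command(r)))
    print(shlex.join(slurm_command(r)))
    print(htcondor_submit_description(r), end="")
```

A nodevnc-style script would presumably replace the plain sleep with something that starts a VNC server and then reports the connection details to the user; that part is not sketched here.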
...