...

Using my Cluster Translation Table at https://staff.nrao.edu/wiki/bin/view/NM/ClusterCommands, here is what I suggest for Slurm.  Notable Slurm differences: Slurm doesn't provide user-level prologue/epilogue scripts, Slurm can't set a umask for a job, and Slurm exports all environment variables to the job by default.

/usr/bin/sudo -u almapipe /usr/bin/sbatch -p batch -N 1 -n 1 --mem=18G -t 12-00:00:00 \
    --export=ALL,CAPSULE_CACHE_DIR=~/.capsule-vatest,CAPO_PROFILE=vatest --export=ALL \
    -D /lustre/naasc/web/almapipe/pipeline/vatest/tmp/ArchiveWorkflowStartupTask_runAlmaBasicRestoreWorkflow_4276995994868118298/ \
    --mail-type=FAIL --mail-user=jgoldste,dlyons,jsheckar \
    -J PrepareWorkingDirectoryJob.vatest.86b484f2-dfda-4f51-ad71-c808066441de \
    -o /lustre/naasc/web/almapipe/pipeline/vatest/tmp/ArchiveWorkflowStartupTask_runAlmaBasicRestoreWorkflow_4276995994868118298/PrepareWorkingDirectoryJob.out.txt \
    -e /lustre/naasc/web/almapipe/pipeline/vatest/tmp/ArchiveWorkflowStartupTask_runAlmaBasicRestoreWorkflow_4276995994868118298/PrepareWorkingDirectoryJob.err.txt \
    /lustre/naasc/web/almapipe/workflows/vatest/bin/job-runner.sh \
    '18 -c edu.nrao.archive.workflow.jobs.PrepareWorkingDirectoryJob -p vatest -w /lustre/naasc/web/almapipe/pipeline/vatest/tmp/ArchiveWorkflowStartupTask_runAlmaBasicRestoreWorkflow_4276995994868118298'
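The missing-umask and environment-export differences noted above can be handled inside the batch script itself. A minimal sketch, assuming a umask of 0027 is the desired value (the value and the `--export` variables shown are hypothetical examples, not our production settings):

```shell
#!/bin/sh
# Hypothetical Slurm batch-script sketch.
# Slurm can't set a job's umask at submit time, so set it as the very
# first thing the job does:
umask 0027

# Slurm exports the submitter's entire environment by default.  If the
# job should see only selected variables, submit with something like
#   sbatch --export=NONE,CAPO_PROFILE=vatest ...
# instead of --export=ALL.

# Print the effective umask so the job log records it:
umask
```

Since Slurm also lacks user-level prologue/epilogue scripts, any per-job setup or teardown we relied on under Torque would similarly have to move into the script (or a wrapper like job-runner.sh).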

...



Replacement options for Torque/Moab (Pros and Cons)



| | Torque | OpenPBS | Slurm | HTCondor |
|---|---|---|---|---|
| Working directory | Yes, both -d and -w | No -d nor -w to set working directory | Yes, -D | |
| Passed args | Yes, -F | No. At least, what the man page describes doesn't work for me. | Yes | |
| Prolog/Epilog | Yes | No user-level prolog/epilog scripts | No user-level prolog/epilog scripts | |
| Array jobs | Yes | Yes | Yes | Uses DAGs instead of array jobs |
| Complex queues | Can handle vlass/vlasstest queues | Can handle vlass/vlasstest queues | Can handle vlass/vlasstest queues, but they are partitions, not queues. Should be fine. | Uses requirements instead of queues but should be sufficient |
| Reservations | Yes | Reservations work differently but may still be useful. Version 2021.1 may do this better. | Yes | No way to reserve nodes for maintenance or special occasions |
| Authorization | Yes, PAM module | No PAM module. The MoM can kill processes not running a job and not owned by up to 10 special users. | Has a PAM module similar to Torque's | |
| Remote jobs | Maybe with Nodus, but I was unimpressed | Presumably with Altair Control | | Yes, to CHTC, OSG, AWS |
| cgroups | Yes, with cpuset | Yes, both cpuset and cpuacct | Yes, with cpuset | Yes, with cpuacct |
| Multiple submit hosts | Yes | Yes | Yes | Yes |
| Pack jobs | Yes | Yes | Yes | Yes |
| Multi-node MPI | Yes | Yes | Yes | Yes, but needs the Parallel Universe |
| Preemption | | Yes, but can be disabled | Yes, but can be disabled | Yes, but can be disabled |
| nodescheduler | Yes, because of cgreg and uniqueuser | No | No | No |
| nodevnc | Yes | | Yes | Yes, but is buggy |
| Cleans up files and processes | No. Will require a reaper script | No. Will require a reaper script | No. Will require a reaper script | Yes |
| Node order | Yes. The nodefile defines the order | | Not really a way to set the order in which the scheduler will give out nodes | Not really a way to set the order in which the scheduler will give out nodes |
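To make the working-directory, passed-args, and array-job rows above concrete, here is a small sketch that maps those Torque qsub options to their Slurm equivalents. The function name and example values are hypothetical; it only covers these three rows, and it prints the translation rather than submitting anything:

```shell
#!/bin/sh
# Hypothetical sketch: translate a few common Torque qsub options to
# their sbatch equivalents, per the comparison table.
torque_to_slurm() {
    # $1 = Torque qsub option, $2 = its value
    case "$1" in
        -d|-w) printf 'sbatch -D %s\n' "$2" ;;        # working directory (chdir)
        -F)    printf 'sbatch <script> %s\n' "$2" ;;  # args go after the script in Slurm
        -t)    printf 'sbatch --array=%s\n' "$2" ;;   # array jobs
        *)     printf 'no direct equivalent\n' ;;
    esac
}

torque_to_slurm -d /lustre/naasc/scratch   # sbatch -D /lustre/naasc/scratch
torque_to_slurm -F "arg1 arg2"             # sbatch <script> arg1 arg2
torque_to_slurm -t 0-9                     # sbatch --array=0-9
```

Note the asymmetry in the passed-args row: Torque needs an explicit -F "arg string", while sbatch simply treats everything after the script name as arguments to it.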




...