Open questions:
HPC Cluster
Could I have access to the HPC cluster, to learn Slurm?
ANSWER: https://chtc.cs.wisc.edu/hpc-overview Lauren will get my account set up. I need to log in to submit2 first, but that's fine.
How does CHTC keep shared directories (/tmp, /var/tmp, /dev/shm) clean with Slurm?
ANSWER: CHTC doesn't do any cleaning of shared directories, but they suggested looking at https://derekweitzel.com/2016/03/22/fedora-copr-slurm-per-job-tmp/ I don't know if this plugin will clean files created by an interactive ssh, but I suspect it won't, because it is a Slurm plugin and ssh'ing to the host is outside of the control of Slurm, except for pam_slurm_adopt, which adds you to the cgroup. So I may still need a reaper script to keep these directories clean.
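A minimal sketch of such a reaper (the 14-day age threshold and running it from cron are my own assumptions, not anything CHTC suggested):
#!/bin/sh
# Hypothetical reaper: run from cron on each execute host to age out files in
# shared scratch directories that have not been accessed in 14 days.
for dir in /tmp /var/tmp /dev/shm; do
    find "$dir" -xdev -mindepth 1 -atime +14 -delete 2>/dev/null
done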
Answered Questions:
...
- JOB ID question from Daniel
When I submit a job, I get a job ID back. My plan is to hold onto that job ID permanently for tracking. We have had issues in the past with Torque/Maui because the job IDs got recycled later and our internal bookkeeping got mixed up. So my questions are:
- Are job IDs guaranteed to be unique in HTCondor?
- How unique are they: are they _globally_ unique or just unique within a particular namespace (such as our cluster or the submit node)?
- ANSWER: A Job ID (ClusterID.ProcID) is only unique within a schedd. A globally unique ID can be built from the Job ID plus the DNS name of the schedd and the ctime of the job_queued.log file.
- We should talk with Daniel about this. They should craft their own ID. It could be seeded with a JobID but should not depend on just it.
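- For example, a sketch of one way to seed such an ID (the Job ID value is a placeholder; condor_q -af and the GlobalJobId attribute are standard HTCondor):
#!/bin/sh
# Hypothetical sketch: record a tracking ID for a just-submitted job that does
# not depend on the bare ClusterId.ProcId.  GlobalJobId combines the schedd
# name, the ClusterId.ProcId, and a timestamp, so it should survive job-ID reuse.
JOBID="12345.0"    # placeholder: the ClusterId.ProcId reported by condor_submit
condor_q -af GlobalJobId "$JOBID"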
- Upgrading HTCondor without killing jobs?
- schedd can be upgraded and restarted without losing state, assuming the restart takes less than the timeout.
- currently restarting execute services will kill jobs. CHTC is working on improving this.
- negotiator and collector can be restarted without killing jobs.
- CHTC works hard to ensure 8.8.x is compatible with 8.8.y or 8.9.x is compatible with 8.9.y.
- Leaving data on execution host between jobs (data reuse)
- Todd is working on this now.
- Ask about installation of CASA locally and ancillary data (cfcache)
- CHTC has a Ceph filesystem that is available to many of their execution hosts (notably the larger ones)
- There is another software filesystem where CASA could live that is more used for admin usage but might be available to us.
- We could download the tarball each time over HTTP. CHTC uses a proxy server so it would often be cached.
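- For example, a submit-file sketch of the HTTP approach (the URL and tarball name are placeholders; transfer_input_files with a URL is standard HTCondor file transfer):
# Fetch the CASA tarball over HTTP so CHTC's proxy can cache it.
# (URL is a placeholder.)
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = http://www.example.com/casa/casa-6.1.0.tar.gz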
- Environment: Is there a way to have condor "login" when a job starts, thus sourcing /etc/profile and the user's rc files? Currently, not even $HOME is set.
- A good analogy is Torque does a su - _username_ while HTCondor just does a su _username_
- WORKAROUND: setting getenv = True, which is like the -V option to qsub, may help. It doesn't source rc files but does inherit your current environment. This may be a problem if your current environment is not what you want on the cluster node; perhaps the cluster node is a different OS or architecture.
- ANSWER: condor doesn't execute things with a shell. You could set your executable to /bin/bash and then have the arguments be the executable you used to have (sketch below). I just changed our stuff to statically set $HOME and I think that is good enough.
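- For example, a submit-file sketch of the /bin/bash approach (the wrapper script path is hypothetical; bash -l is what sources the login files):
# Run the real payload under a login shell and pin HOME explicitly,
# since condor itself does not run a shell or source rc files.
# (/users/krowe/bin/run_job.sh is a hypothetical wrapper script.)
executable  = /bin/bash
transfer_executable = False
arguments   = "-l /users/krowe/bin/run_job.sh"
environment = "HOME=/users/krowe"
queue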
...
We can reproduce this without HTCondor. So this is either being caused by our mpicasa program or the openmpi libraries it uses. Even better, I can reproduce this with a simple shell script executed from two shells at the same time on the same host. Another MPI implementation (mvapich2) didn't show this problem.
#!/bin/sh
# Run this script from two shells at the same time on the same host to reproduce the problem.
export PATH=/usr/lib64/openmpi/bin:${PATH}
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:${LD_LIBRARY_PATH}
mpirun -np 2 /users/krowe/work/doc/all/openmpi/busy/busy
Array Jobs
Does HTCondor support array jobs like Slurm? For example, in Slurm: #SBATCH --array=0-3%2. Or is one supposed to use queue options and DAGMan throttling?
ANSWER: HTCondor does reduce the priority of a user the more jobs they run, so there may be less need for a maxjob or modulus option. But here are some other things to look into (a DAGMan throttling sketch follows the examples below).
- queue from seq 10 5 30 |
- queue item in 1, 2, 3
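A sketch of the --array=0-3%2 analogue using DAGMan throttling (the file names and sleeper.sh are placeholders; JOB, VARS, and condor_submit_dag -maxjobs are standard DAGMan):
# sleeper.sub (sketch): one job, parameterized by a variable passed from the DAG
executable = sleeper.sh
arguments  = $(INDEX)
queue

# sleeper.dag (sketch): four nodes, like --array=0-3
JOB  s0 sleeper.sub
VARS s0 INDEX="0"
JOB  s1 sleeper.sub
VARS s1 INDEX="1"
JOB  s2 sleeper.sub
VARS s2 INDEX="2"
JOB  s3 sleeper.sub
VARS s3 INDEX="3"

# Run at most two nodes at a time, like the %2 throttle
condor_submit_dag -maxjobs 2 sleeper.dag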
Combined Cluster (Slurm and HTCondor)
Slurm starts and stops condor. CHTC does this because their HTCondor can preempt jobs. So when Slurm starts a job it kills the condor startd and any HTCondor jobs will get preempted and probably restarted somewhere else.
Node Priority
Is there a way to set an order in which nodes are picked first, or a weight system? We want certain nodes to be chosen first because they are faster, have less memory, or meet other such criteria.
ANSWER: NEGOTIATOR_PRE_JOB_RANK on the negotiator.
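A config sketch of that knob (NodeWeight is a made-up machine attribute and the values are arbitrary; STARTD_ATTRS and NEGOTIATOR_PRE_JOB_RANK are standard config knobs):
# On each execute host: advertise a weight in the machine ad.
NodeWeight = 10
STARTD_ATTRS = $(STARTD_ATTRS) NodeWeight

# On the negotiator: prefer machines with a higher NodeWeight.
NEGOTIATOR_PRE_JOB_RANK = ifThenElse(isUndefined(NodeWeight), 0, NodeWeight)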