Glidein to Slurm

I have an idea for how to make one OS image that can be used for Torque, HTCondor, and Slurm, so that HTCondor jobs can glide in to the Slurm cluster. First, we use /etc/sysconfig/condor to make HTCondor fail to start if CONDOR_CONFIG points to a non-existent file, and start normally if CONDOR_CONFIG points to an existing file or isn't set at all. Then we change LOCAL_CONFIG_FILE in /etc/condor/condor_config to point to something in /var, which can be modified locally on the diskless host. This allows the Slurm pilot job to create a config file in /var with the DAEMON_SHUTDOWN rules. This gives us an OS that can run glidein jobs, or glide jobs in, or whatever the terminology is.

Now a special pilot job can be submitted to the Slurm cluster that starts an HTCondor startd by running /usr/sbin/condor_master -f, which turns the node into an HTCondor execution host. This works because the OS is configured as an execution host for HTCondor as well as Slurm (and probably Torque), even though it doesn't start HTCondor on boot. When the pilot job starts condor_master, the node announces itself as an execution host to the central manager. HTCondor jobs can now run on this "glidein" node, or whatever it is called. When there are no more HTCondor jobs to run, the startd exits, then the master exits, then the Slurm pilot job does some cleanup and exits, and the node goes back to being a Slurm node.

CONDOR_CONFIG

The condor_startd reads the CONDOR_CONFIG environment variable, if it exists, to find its config file instead of the default /etc/condor/condor_config, and exits with an error if there is a problem reading that file.

https://htcondor.readthedocs.io/en/latest/admin-manual/introduction-to-configuration.html?highlight=condor_config#ordered-evaluation-to-set-the-configuration
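
To make the two cases concrete, here is a minimal shell sketch of the behaviour described above. It is not HTCondor code: resolve_condor_config is a hypothetical helper, and it only models the two locations mentioned on this page (HTCondor itself looks in more places when CONDOR_CONFIG is unset).

```shell
# Hypothetical helper, for illustration only.
# Prints the config file the daemons would be expected to read,
# or fails if CONDOR_CONFIG is set but the file cannot be read.
resolve_condor_config() {
    if [ -n "${CONDOR_CONFIG:-}" ]; then
        if [ -r "${CONDOR_CONFIG}" ]; then
            echo "${CONDOR_CONFIG}"
        else
            # This is the case the OS image relies on to keep HTCondor off.
            echo "cannot read ${CONDOR_CONFIG}" >&2
            return 1
        fi
    else
        # Simplification: only the default path named in the text.
        echo /etc/condor/condor_config
    fi
}
```

With CONDOR_CONFIG=/dontstartcondor the helper fails, which mirrors HTCondor refusing to start; with CONDOR_CONFIG unset it falls back to the default file.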

DAEMON_SHUTDOWN

The condor_startd daemon will shut down gracefully and not be restarted if the expression STARTD.DAEMON_SHUTDOWN evaluates to True. E.g.

STARTD.DAEMON_SHUTDOWN = size(ChildState) == 0 && size(ChildActivity) == 0 && (MyCurrentTime - EnteredCurrentActivity) > 600
MASTER.DAEMON_SHUTDOWN = STARTD_StartTime == 0

https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html

https://htcondor.readthedocs.io/en/latest/classad-attributes/machine-classad-attributes.html

sysconfig

The condor.service unit in systemd reads /etc/sysconfig/condor but does not evaluate it, so adding something like the following to /etc/sysconfig/condor won't work. Besides, it would cause HTCondor to fail if that file didn't exist, which isn't what I want.

CONDOR_CONFIG=$(cat /var/run/condor/config)

But I can use this to keep HTCondor from starting, just like I do with Torque and Slurm. I can set CONDOR_CONFIG=/dontstartcondor in /etc/sysconfig/condor in the OS image and override it with a snapshot. Then stop setting 99-nrao as a snapshot.
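
A rough illustration of why the $(cat ...) line can't work: an environment file is read as plain KEY=VALUE text and is never run through a shell. read_env_literal is a hypothetical helper that mimics this in a simplified way (systemd's real parser also handles quoting and comments).

```shell
# Hypothetical helper, for illustration only.
# Reads KEY=VALUE lines literally: no sourcing, no command substitution.
# $1: environment file, $2: key to look up
read_env_literal() {
    while IFS='=' read -r key value; do
        if [ "${key}" = "$2" ]; then
            # The value is printed exactly as written in the file.
            printf '%s\n' "${value}"
        fi
    done < "$1"
}
```

Given a file containing the line above, looking up CONDOR_CONFIG returns the text $(cat /var/run/condor/config) unchanged, and that literal string is what HTCondor would then try to open as a file name.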

OS image

All three schedulers (Torque, Slurm, HTCondor) will be configured to start via systemd. The files pbs_mom, slurmd, and condor in /etc/sysconfig will be set such that all of these schedulers fail to start on boot.

/etc/sysconfig/pbs_mom: PBS_ARGS="-h"
/etc/sysconfig/slurmd: SLURMD_OPTIONS="-h"
/etc/sysconfig/condor: CONDOR_CONFIG=/etc/condor/condor_off

Where /etc/condor/condor_off is a copy of /etc/condor/condor_config with LOCAL_CONFIG_DIR commented out and START_MASTER = False added.
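
One way to generate condor_off from condor_config, so the copy doesn't have to be edited by hand. This is only a sketch: make_condor_off is a hypothetical helper, and it assumes LOCAL_CONFIG_DIR is set on a single line of its own.

```shell
# Hypothetical helper, for illustration only.
# $1: source config (e.g. /etc/condor/condor_config)
# $2: destination   (e.g. /etc/condor/condor_off)
make_condor_off() {
    # Comment out any LOCAL_CONFIG_DIR assignment, keep everything else.
    sed 's/^[[:space:]]*LOCAL_CONFIG_DIR[[:space:]]*=/#&/' "$1" > "$2"
    # Tell condor_master not to start.
    echo 'START_MASTER = False' >> "$2"
}
```

Re-running it whenever condor_config changes keeps the two files from drifting apart.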

If any of these schedulers should start on boot, the appropriate /etc/sysconfig file (pbs_mom, slurmd, condor) will be altered via a snapshot.

/etc/sysconfig/pbs_mom: PBS_ARGS=""
/etc/sysconfig/slurmd: SLURMD_OPTIONS="--conf-server testpost-serv-1"
/etc/sysconfig/condor: CONDOR_CONFIG=/etc/condor/condor_config

Change the LOCAL_CONFIG_FILE in HTCondor to a file that will contain the configurations needed for a Slurm node to run an HTCondor Pilot job (e.g. STARTD.DAEMON_SHUTDOWN). This file will be created by the Pilot job.

echo 'LOCAL_CONFIG_FILE = /var/run/condor/condor_config.local' >> /etc/condor/condor_config

The alternative was to make a complete copy of condor_config and all its sub-config files into an /etc/condor/glidein-slurm.conf and add the DAEMON_SHUTDOWN ad as well. This seems dangerous to me as now those two config files can drift.

Pilot Job

The Pilot job is submitted to Slurm. It will start HTCondor because, unlike the systemd unit file, calling condor_master manually doesn't check /etc/sysconfig/condor.

echo 'CONDOR_CONFIG=/etc/condor/glidein-slurm.conf' > /var/run/condor/config
# Write the shutdown rules into the file that LOCAL_CONFIG_FILE points at.
echo 'STARTD.DAEMON_SHUTDOWN = size(ChildState) == 0 && size(ChildActivity) == 0 && (MyCurrentTime - EnteredCurrentActivity) > 600' > /var/run/condor/condor_config.local
echo 'MASTER.DAEMON_SHUTDOWN = STARTD_StartTime == 0' >> /var/run/condor/condor_config.local
# Run the master in the foreground so the Slurm job lasts as long as HTCondor does.
/usr/sbin/condor_master -f
# Clean up once the master has exited.
rm -f /var/run/condor/condor_config.local
rm -f /var/run/condor/config
exit
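
A slightly more defensive variant of the same idea, as a sketch only. run_pilot and its two arguments are hypothetical; they exist so the paths can be swapped out for testing. It leaves out the /var/run/condor/config line and hands the master's exit status back to Slurm.

```shell
# Hypothetical helper, for illustration only.
# $1: local config file (default /var/run/condor/condor_config.local)
# $2: condor_master binary (default /usr/sbin/condor_master)
run_pilot() {
    pilot_config=${1:-/var/run/condor/condor_config.local}
    pilot_master=${2:-/usr/sbin/condor_master}

    # Same two shutdown rules as above, written in one go.
    cat > "${pilot_config}" <<'EOF'
STARTD.DAEMON_SHUTDOWN = size(ChildState) == 0 && size(ChildActivity) == 0 && (MyCurrentTime - EnteredCurrentActivity) > 600
MASTER.DAEMON_SHUTDOWN = STARTD_StartTime == 0
EOF

    # Run the master in the foreground and remember how it exited.
    pilot_status=0
    "${pilot_master}" -f || pilot_status=$?

    # Clean up even if the master failed.
    rm -f "${pilot_config}"
    return "${pilot_status}"
}
```

In the pilot script this would be called as run_pilot with no arguments, followed by exit $? so that Slurm records a failed job if condor_master did not exit cleanly.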

Factory

The factory process that watches the clusters and launches Pilot jobs should be a pretty simple cron job:

PILOT_JOB=/lustre/aoc/admin/tmp/krowe/pilot.sh
idle_condor_jobs=$(condor_q -global -allusers -constraint 'JobStatus == 1' -format "%d\n" 'ServerTime - QDate' | sort -nr | head -1)
#krowe Jul 21 2021: when there are no jobs, condor_q -global returns 'All queues are empty'. Let's reset that.
if [ "${idle_condor_jobs}" = "All queues are empty" ] ; then
    idle_condor_jobs=""
fi

# Is there at least one free node in Slurm?
free_slurm_nodes=$(sinfo --states=idle --Format=nodehost --noheader)
# launch one pilot job
if [ -n "${idle_condor_jobs}" ] ; then
    if [ -n "${free_slurm_nodes}" ] ; then
        if [ -f "${PILOT_JOB}" ] ; then
            sbatch --quiet "${PILOT_JOB}"
        fi
    fi
fi
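
The 'All queues are empty' special case could probably be avoided by keeping only numeric lines from the condor_q output. A minimal sketch, with two hypothetical helpers (oldest_idle_age and should_launch_pilot) that are not part of the script above:

```shell
# Hypothetical helpers, for illustration only.

# stdin: output of the condor_q command above (one age in seconds per line,
# or a message such as 'All queues are empty').
# stdout: the largest age, or nothing if there were no numeric lines.
oldest_idle_age() {
    grep -E '^[0-9]+$' | sort -nr | head -1
}

# $1: oldest idle age (may be empty)
# $2: list of idle Slurm nodes (may be empty)
# Succeeds only if there is at least one idle job and one free node.
should_launch_pilot() {
    [ -n "$1" ] && [ -n "$2" ]
}
```

The cron job would then pipe condor_q through oldest_idle_age and call sbatch only if should_launch_pilot succeeds and the pilot script exists.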
