...

CONDOR_CONFIG=$(cat /var/run/condor/config)

Instead, I could add a second EnvironmentFile, like so:

EnvironmentFile=-/etc/sysconfig/condor
EnvironmentFile=-/var/run/condor/config

where /var/run/condor/config sets CONDOR_CONFIG=/etc/condor/condor_config

This also gives me a way to keep HTCondor from starting, just as I do with Torque and Slurm. I can set CONDOR_CONFIG=/dontstartcondor in /etc/sysconfig/condor in the OS image and override it with a snapshot. Then I can stop setting 99-nrao as a snapshot.
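The override order can be sketched in shell. This is only a rough emulation, assuming plain KEY=value lines; systemd's own parser handles quoting and comments that this ignores. The function name resolve_condor_config is hypothetical. It relies on the documented systemd behaviour that when the same variable appears in several EnvironmentFile entries, the later file wins, and that a leading "-" makes a missing file non-fatal.

```shell
#!/bin/sh
# Rough emulation of EnvironmentFile= precedence for CONDOR_CONFIG.
# Assumption: files contain plain KEY=value lines only.
# resolve_condor_config is a hypothetical helper, not part of HTCondor or systemd.
resolve_condor_config() {
    value=""
    for file in "$@"; do
        # Mirrors the leading "-" in the unit file: skip files that are missing.
        if [ -r "$file" ]; then
            line=$(grep '^CONDOR_CONFIG=' "$file" | tail -n 1)
            if [ -n "$line" ]; then
                # A later file overrides whatever an earlier file set.
                value=${line#CONDOR_CONFIG=}
            fi
        fi
    done
    printf '%s\n' "$value"
}
```

Calling resolve_condor_config /etc/sysconfig/condor /var/run/condor/config prints the path from the second file when it exists, and falls back to the value from the first file otherwise.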

...

/etc/sysconfig/pbs_mom: PBS_ARGS="-h"
/etc/sysconfig/slurm: SLURMD_OPTIONS="-h"
/etc/sysconfig/condor: CONDOR_CONFIG=/nosuchfiledontstartcondor

If any of these schedulers should start on boot, the corresponding /etc/sysconfig file (pbs_mom, slurm, condor) will be altered via a snapshot.
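As a minimal sketch, the three defaults above could be written into an OS image like this. The function name disable_schedulers and its image-root argument are hypothetical and only serve the illustration; how the image is actually built is not shown here.

```shell
#!/bin/sh
# Write the three "do not start" defaults under an image root.
# disable_schedulers and its root argument are hypothetical names.
disable_schedulers() {
    root=$1
    mkdir -p "$root/etc/sysconfig"
    printf '%s\n' 'PBS_ARGS="-h"' > "$root/etc/sysconfig/pbs_mom"
    printf '%s\n' 'SLURMD_OPTIONS="-h"' > "$root/etc/sysconfig/slurm"
    printf '%s\n' 'CONDOR_CONFIG=/nosuchfiledontstartcondor' > "$root/etc/sysconfig/condor"
}
```

A snapshot for a node that should run one of the schedulers would then replace only the matching file.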

...

The pilot job submitted to Slurm. It starts HTCondor because, unlike the systemd unit file, calling condor_master manually doesn't read /etc/sysconfig/condor.

echo 'CONDOR_CONFIG=/etc/condor/glidein-slurm.conf' > /var/run/condor/config

echo 'STARTD.DAEMON_SHUTDOWN = size(ChildState) == 0 && size(ChildActivity) == 0 && (MyCurrentTime - EnteredCurrentActivity) > 600' > /var/run/condor/condor_config.local

echo 'MASTER.DAEMON_SHUTDOWN = STARTD_StartTime == 0' >> /var/run/condor/condor_config.local

/usr/sbin/condor_master -f

rm -f /var/run/condor/condor_config.local

rm -f /var/run/condor/config

exit
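One possible variation, sketched below, moves the two rm -f steps into a trap so the files are also removed when condor_master exits with an error or the script receives HUP, INT or TERM. RUN_DIR and MASTER_CMD are hypothetical overrides added so the sketch can be tried without touching /var/run/condor; the file contents are the same as in the pilot script above.

```shell
#!/bin/sh
# Sketch of the pilot steps with cleanup moved into a trap.
# RUN_DIR and MASTER_CMD are hypothetical overrides, not part of the original script.
RUN_DIR=${RUN_DIR:-/var/run/condor}
MASTER_CMD=${MASTER_CMD:-/usr/sbin/condor_master -f}

# The body runs in a subshell "( ... )" so the traps do not leak into the caller.
run_pilot() (
    cleanup() {
        rm -f "$RUN_DIR/condor_config.local" "$RUN_DIR/config"
    }
    trap cleanup EXIT            # runs when the subshell exits, success or failure
    trap 'exit 1' HUP INT TERM   # turn these signals into a normal exit

    echo 'CONDOR_CONFIG=/etc/condor/glidein-slurm.conf' > "$RUN_DIR/config"
    echo 'STARTD.DAEMON_SHUTDOWN = size(ChildState) == 0 && size(ChildActivity) == 0 && (MyCurrentTime - EnteredCurrentActivity) > 600' > "$RUN_DIR/condor_config.local"
    echo 'MASTER.DAEMON_SHUTDOWN = STARTD_StartTime == 0' >> "$RUN_DIR/condor_config.local"

    # Stays in the foreground until the master exits.
    $MASTER_CMD
)
```

Calling run_pilot at the end of the script would replace the explicit rm -f lines. A SIGKILL still bypasses the trap, so stale files remain possible in that case.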

...

PILOT_JOB=/lustre/aoc/admin/tmp/krowe/pilot.sh

# Age in seconds of the oldest idle (JobStatus == 1) HTCondor job; empty if there are none.
idle_condor_jobs=$(condor_q -global -allusers -constraint 'JobStatus == 1' -format "%d\n" 'ServerTime - QDate' | sort -nr | head -1)

#krowe Jul 21 2021: when there are no jobs, condor_q -global returns 'All queues are empty'. Let's reset that.

if [ "${idle_condor_jobs}" = "All queues are empty" ] ; then
    idle_condor_jobs=""
fi


# Is there at least one free node in Slurm?
free_slurm_nodes=$(sinfo --states=idle --Format=nodehost --noheader)

# launch one pilot job
if [ -n "${idle_condor_jobs}" ] ; then
    if [ -n "${free_slurm_nodes}" ] ; then
        if [ -f "${PILOT_JOB}" ] ; then
            sbatch --quiet "${PILOT_JOB}"
        fi
    fi
fi
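The three nested checks above could be collapsed into one function, which also makes the decision easy to try with made-up values. This is a sketch under assumptions: should_launch_pilot is a hypothetical name, and it treats any non-numeric first argument (including 'All queues are empty') as "no idle jobs", which is slightly broader than the explicit string comparison in the script.

```shell
#!/bin/sh
# Decide whether to submit a pilot job, given values gathered elsewhere.
# should_launch_pilot is a hypothetical helper name.
should_launch_pilot() {
    oldest_idle_age=$1   # seconds, from the condor_q query
    free_nodes=$2        # output of the sinfo query
    pilot_script=$3      # path to the pilot job script

    # Assumption: anything that is not a plain non-negative integer means no idle jobs.
    case $oldest_idle_age in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Require at least one free Slurm node and an existing pilot script.
    [ -n "$free_nodes" ] && [ -f "$pilot_script" ]
}
```

It could then be used as: if should_launch_pilot "${idle_condor_jobs}" "${free_slurm_nodes}" "${PILOT_JOB}" ; then sbatch --quiet "${PILOT_JOB}" ; fi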

Problems

Ideas

...