I have an idea for making one OS image that can be used for Torque, HTCondor, and Slurm, so that HTCondor jobs can glide in to the Slurm cluster. First we employ /etc/sysconfig/condor. If this file sets CONDOR_CONFIG to a config file that sets START_MASTER = False, then HTCondor will not start. If it sets CONDOR_CONFIG to /etc/condor/condor_config, or doesn't set it at all, then HTCondor will start normally. Next we change LOCAL_CONFIG_FILE in /etc/condor/condor_config to point to /var/run/condor/condor_config.local, which can be modified locally on the diskless host. This allows the Slurm pilot job to create this config file in /var with the DAEMON_SHUTDOWN rules. This gives us an OS that can run glidein jobs, or whatever the terminology is.
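A minimal sketch of the image-side configuration follows. It is written as a small Python script that renders the contents of each file. The file name /etc/condor/condor_config.nostart is hypothetical. The script also assumes the boot service reads /etc/sysconfig/condor as an environment file.

REQUIRE_LOCAL_CONFIG_FILE = False is an addition that is not part of the plan above. It is included so that a missing local file, which is the normal state outside a pilot job, isn't treated as an error.

```python
"""Render the config fragments for one shared Torque/HTCondor/Slurm image.

Sketch only. NOSTART_CONFIG is a hypothetical file name; CONDOR_CONFIG,
START_MASTER, LOCAL_CONFIG_FILE and REQUIRE_LOCAL_CONFIG_FILE are standard
HTCondor settings.
"""

# Hypothetical config whose only purpose is to stop condor_master at boot.
NOSTART_CONFIG = "/etc/condor/condor_config.nostart"
# The normal config that the pilot job will use.
MAIN_CONFIG = "/etc/condor/condor_config"
# Per-node file the pilot job writes into /var on the diskless host.
LOCAL_CONFIG = "/var/run/condor/condor_config.local"


def sysconfig_condor(start_at_boot: bool) -> str:
    """Contents of /etc/sysconfig/condor (assumed to be read by the boot service)."""
    if start_at_boot:
        # Leaving CONDOR_CONFIG unset falls back to /etc/condor/condor_config.
        return "# CONDOR_CONFIG left unset: HTCondor starts normally\n"
    return f"CONDOR_CONFIG={NOSTART_CONFIG}\n"


def nostart_config() -> str:
    """Contents of NOSTART_CONFIG: condor_master exits as soon as it starts."""
    return "START_MASTER = False\n"


def main_config_additions() -> str:
    """Lines to add to /etc/condor/condor_config in the image."""
    return (
        f"LOCAL_CONFIG_FILE = {LOCAL_CONFIG}\n"
        # The local file only exists while a pilot job is running, so don't
        # treat its absence as an error (e.g. for condor_status on the node).
        "REQUIRE_LOCAL_CONFIG_FILE = False\n"
    )


if __name__ == "__main__":
    for name, text in [
        ("/etc/sysconfig/condor", sysconfig_condor(start_at_boot=False)),
        (NOSTART_CONFIG, nostart_config()),
        (MAIN_CONFIG + " (additions)", main_config_additions()),
    ]:
        print(f"--- {name}\n{text}")
```

Putting START_MASTER in its own small file keeps /etc/condor/condor_config untouched. The boot-time behaviour is then controlled by a single line in /etc/sysconfig/condor.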

Now a special pilot job can be submitted to the Slurm cluster that starts an HTCondor startd, by running /usr/sbin/condor_master -f, and therefore makes the node into an HTCondor execution host. This works because the OS is configured as an execution host for HTCondor as well as Slurm (and probably Torque), even though it doesn't start HTCondor on boot. When the pilot job starts condor_master, which starts condor_startd, the node announces itself as an execution host to the central manager. HTCondor jobs can now run on this "glidein" node. When there are no more HTCondor jobs to run, the DAEMON_SHUTDOWN rules take effect: the startd will exit, then the master will exit, then the Slurm pilot job will do some cleanup and exit. The node then goes back to being just a Slurm node.
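The pilot job itself might look roughly like the following Python sketch. It relies on sbatch accepting any script that starts with a shebang line and reading the #SBATCH lines before the first statement.

Several details are assumptions to check against the HTCondor manual for the installed version:

  • the --exclusive directive
  • the 600-second idle timeout
  • the exact DAEMON_SHUTDOWN expressions, which are modeled on common glidein setups

The helper name run_pilot is hypothetical.

```python
#!/usr/bin/env python3
#SBATCH --exclusive
"""Hypothetical Slurm pilot job that turns this node into an HTCondor execute host.

Sketch only: --exclusive (take the whole node), the idle timeout and the
shutdown expressions are illustrative assumptions, and the script assumes it
runs with whatever privileges the node's HTCondor installation needs.
"""
import os
import subprocess
import sys
from pathlib import Path

MAIN_CONFIG = "/etc/condor/condor_config"
LOCAL_CONFIG = Path("/var/run/condor/condor_config.local")

# Illustrative rules: the startd shuts down after 10 idle, unclaimed minutes;
# the master shuts down once the startd is no longer running.
SHUTDOWN_RULES = (
    'STARTD.DAEMON_SHUTDOWN = State == "Unclaimed" && Activity == "Idle" '
    "&& (MyCurrentTime - EnteredCurrentActivity) > 600\n"
    "MASTER.DAEMON_SHUTDOWN = STARTD_StartTime =?= 0\n"
)


def run_pilot(master_cmd, local_config=LOCAL_CONFIG):
    """Write the local config, run the master in the foreground, clean up."""
    local_config.parent.mkdir(parents=True, exist_ok=True)
    local_config.write_text(SHUTDOWN_RULES)
    # Force the normal config in case a CONDOR_CONFIG pointing at the
    # "no start" file was inherited from the submitting environment.
    env = dict(os.environ, CONDOR_CONFIG=MAIN_CONFIG)
    try:
        # With -f, condor_master stays in the foreground, so this blocks until
        # the master exits (after the shutdown rules have stopped the startd).
        return subprocess.run(master_cmd, env=env).returncode
    finally:
        # Cleanup: remove the local config so the node is just a Slurm node again.
        local_config.unlink(missing_ok=True)


# Only act inside a Slurm job, so importing or test-running this file is harmless.
if __name__ == "__main__" and "SLURM_JOB_ID" in os.environ:
    sys.exit(run_pilot(["/usr/sbin/condor_master", "-f"]))
```

Usage: the script would be submitted with sbatch under any file name. Because the local config is removed in a finally block, the cleanup also happens if condor_master fails to start.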


CONDOR_CONFIG

...