...

  • Reaper: Cleans nodes of unwanted files, dirs, and procs. I don't think HTCondor will need this.

    • Slurm - Needs a reaper script to delete files/dirs and kill processes.

      • If I run vncserver via Torque, my reaper script has to kill a bunch of leftover processes when the job is done. When I run vncserver via Slurm, nothing kills them and those processes remain after the job ends. So we will need some sort of reaper-type script for Slurm.
      • There is pam_slurm_adopt (https://slurm.schedmd.com/pam_slurm_adopt.html), which tracks and kills errant processes, but it conflicts with systemd and therefore requires some special installation instructions.
        • Aha.  I may have it working.  You have to add PrologFlags=contain to both the client and server slurm.conf files.
        • But it doesn't delete files or directories from /tmp, /var/tmp, or /dev/shm when the job ends.
        • I will have to write a reaper script for files/dirs to use in Slurm system epilogs.
        • From reading up on how pam_slurm_adopt works, it will probably never cooperate with systemd (https://github.com/systemd/systemd/issues/13535), so it is a hack and not future-proof. I am unsure how wise it is to start using pam_slurm_adopt in the first place.
      • If we aren't going to use pam_slurm_adopt.so, then the reaper will need to kill procs and delete files/dirs, just like it does with Torque/Moab.
      • Feb. 22, 2021 krowe: I think I have a working reaper script for Slurm (/users/krowe/reaper/slurm/slurm_reaper.py). It needs more testing.
    • HTCondor - Doesn't seem to need files/dirs or procs reaped.
      • Seems to handle /tmp, /var/tmp, and /dev/shm properly because it uses fake versions of these dirs for each job.
      • It seems to handle errant processes as well.
      • There is also condor_preen that cleans condor directories like /var/lib/condor/spool/...
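
For the Slurm case above, the file/dir cleanup that pam_slurm_adopt does not do could look roughly like the sketch below. This is not the contents of slurm_reaper.py; the function name reap_user_files and its parameters are made up for illustration. It assumes the epilog knows the job owner's uid and, as a big simplification, that the user has no other job still running on the node, which a real reaper would have to check before deleting anything.

```python
import os
import stat

# Directories named in the notes above; adjust for the real site.
SCRATCH_DIRS = ("/tmp", "/var/tmp", "/dev/shm")


def reap_user_files(uid, roots=SCRATCH_DIRS, dry_run=True):
    """Remove files and directories owned by uid under each root.

    Returns the paths that were removed or, in a dry run, the candidate
    paths owned by uid. The roots themselves are never removed.
    (Hypothetical helper, not part of Slurm.)
    """
    reaped = []
    for root in roots:
        # Walk bottom-up so a directory's contents are handled before the
        # directory itself. os.walk does not follow symlinked directories.
        for dirpath, dirnames, filenames in os.walk(root, topdown=False):
            for name in filenames + dirnames:
                path = os.path.join(dirpath, name)
                try:
                    info = os.lstat(path)  # lstat: never follow symlinks
                except FileNotFoundError:
                    continue  # vanished while we were walking
                if info.st_uid != uid:
                    continue  # someone else's file: leave it alone
                try:
                    if not dry_run:
                        if stat.S_ISDIR(info.st_mode):
                            os.rmdir(path)  # only succeeds if now empty
                        else:
                            os.remove(path)
                    reaped.append(path)
                except OSError:
                    pass  # non-empty dir or permission problem: skip it
    return reaped
```

In a system epilog the uid would presumably come from the SLURM_JOB_UID environment variable, e.g. reap_user_files(int(os.environ["SLURM_JOB_UID"]), dry_run=False). Defaulting to dry_run=True is deliberate, so a first test on a node only lists what would be deleted.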

...