Open questions:

  • How can you tell which job is associated with an email given the email message doesn't include a working dir or the assigned batch_name?
    • CHTC will look into adding such information to the email condor sends.
  • Bug: James's jobs are all placed on the same core. This showed up in the Last Used Cpu (SMP) column of top -u krowe after I submitted five sleep jobs to the same host.
    • Is this just a side effect of condor using cpuacct instead of cpuset in cgroup?
    • Is this a failure of the Linux kernel to schedule things on separate cores?
    • Is this because cpu.shares is set to 100 instead of 1024?
    • Check whether CPU affinity is set in /proc/self/status (the Cpus_allowed_list field).
    • Is sleep CPU-intensive enough to properly test this? Perhaps submit a while 1 loop instead?
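
One way to test the last two ideas together is a small CPU-bound job that reports its own affinity and last-used core. The following is a minimal sketch, not a confirmed diagnosis tool: the function names are made up for illustration, and it assumes a Linux execute host where /proc/self/status has a Cpus_allowed_list line and field 39 of /proc/self/stat is the last-used processor.

```python
import os


def parse_cpus_allowed(status_text):
    """Return the Cpus_allowed_list value from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return line.split(":", 1)[1].strip()
    return None


def parse_last_cpu(stat_text):
    """Return the 'processor' field (field 39) from /proc/<pid>/stat text."""
    # Field 2 (comm) is parenthesised and may contain spaces,
    # so split on the last ')' before splitting on whitespace.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is rest[36].
    return int(rest[36])


def busy_loop(iterations):
    """Burn CPU (unlike sleep) so the scheduler has real work to place."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def report(iterations=2_000_000):
    """Run the busy loop, then read this process's affinity and last-used CPU."""
    busy_loop(iterations)
    with open("/proc/self/status") as f:
        allowed = parse_cpus_allowed(f.read())
    with open("/proc/self/stat") as f:
        last_cpu = parse_last_cpu(f.read())
    return allowed, last_cpu


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    allowed, last_cpu = report()
    print(f"pid={os.getpid()} allowed={allowed} last_cpu={last_cpu}")
```

Submitting several copies of this to the same host and comparing the printed values should show whether the jobs are restricted to one core (a narrow allowed list) or merely happen to land on the same one.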

...

  • Torque has a command called pbsnodes that not only can offline/drain a node but also keeps a note about it that everyone can see in one place. I know I can use condor_off to drain a node, but is there a central place to keep notes so I can remember a month later why I set a certain node to drain?
    • ANSWER: there is no place to keep such notes but Greg likes the idea and may look into it.
    • May want to use condor_drain instead of condor_off.  condor_off will kill the startd when all jobs finish and it no longer shows up in condor_status.  condor_drain will leave the node in condor_status.
    • condor_drain doesn't work for me because it immediately sets jobs idle instead of letting them run to completion. This is why I use condor_off -startd -peaceful instead.
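
Until there is a built-in place for drain notes, a local log kept next to the drain command is one possible stopgap. The sketch below is only an illustration: the helper names and the JSON-lines notes file are hypothetical, nothing here is part of HTCondor, and passing the host as a positional argument to condor_off -startd -peaceful is an assumption. It builds the command string but does not run it.

```python
import json
import shlex
import time


def drain_command(host):
    """Build the drain command used in the notes above (not executed here)."""
    # The -startd and -peaceful flags come from the notes; the positional
    # host argument is an assumption.
    return ["condor_off", "-startd", "-peaceful", host]


def record_note(path, host, reason, now=None):
    """Append a note about why a host was drained to a shared JSON-lines file."""
    entry = {
        "host": host,
        "reason": reason,
        "time": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now)),
        "command": shlex.join(drain_command(host)),
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def notes_for(path, host):
    """Return every recorded note for the given host, oldest first."""
    with open(path) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if e["host"] == host]
```

Keeping the file on a shared filesystem would give everyone one place to look, roughly like the notes pbsnodes keeps, though nothing ties it to what condor_status reports.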

