...

What happens to running jobs if the submit host reboots? Shadow processes? What if the submit host is replaced with a new server? I think we have shown there is a 2400-second (40-minute) timeout.


Transfer Plugin Upload

I have added my nraorsync_plugin.py to /usr/libexec/condor and added the following to the execution host configuration:

FILETRANSFER_PLUGINS = $(LIBEXEC)/nraorsync_plugin.py, $(FILETRANSFER_PLUGINS)

I am working on a transfer plugin that uses rsync, and I ran into a situation that confounds me. I have the following job:

#!/bin/sh

mkdir newdir

date > newdir/date

/bin/sleep ${1}

...

and the following submit file:

executable = small.sh
arguments = "27"
output = stdout.$(ClusterId).log
error = stderr.$(ClusterId).log
log = condor.$(ClusterId).log

should_transfer_files = YES
transfer_input_files = /users/krowe/.ssh/condor_transfer
transfer_output_files = newdir

...


output_destination = nraorsync://$ENV(PWD)
+WantIOProxy = True

queue

Given this submit file, the input file (.nraorsync_plugin.in) that is fed to my plugin when it is called with the -upload argument contains this:

[ LocalFileName = "/lustre/aoc/admin/tmp/condor/testpost003/execute/dir_29453/_condor_stderr"; Url = "nraorsync:///lustre/aoc/sciops/krowe/plugin/_condor_stderr" ][ LocalFileName = "/lustre/aoc/admin/tmp/condor/testpost003/execute/dir_29453/_condor_stdout"; Url = "nraorsync:///lustre/aoc/sciops/krowe/plugin/_condor_stdout" ][ LocalFileName = "/lustre/aoc/admin/tmp/condor/testpost003/execute/dir_29453/newdir/date"; Url = "nraorsync:///lustre/aoc/sciops/krowe/plugin/newdir/date" ]
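To make the contents of that input file easier to inspect, here is a minimal sketch in Python that splits the text into (local file, destination path) pairs. It is not the code from nraorsync_plugin.py. The function name parse_transfer_ads and the regular expression are hypothetical, and the sketch assumes every ad has exactly the two attributes shown above, in that order, with no escaped quotes in the values. A real plugin would more likely use a proper ClassAd parser.

```python
import re
from urllib.parse import urlparse

# Matches one ad of the form: [ LocalFileName = "..."; Url = "..." ]
# Assumption: attribute order and quoting are exactly as in the sample above.
AD_PATTERN = re.compile(
    r'\[\s*LocalFileName\s*=\s*"([^"]*)"\s*;\s*Url\s*=\s*"([^"]*)"\s*\]'
)


def parse_transfer_ads(text):
    """Return a list of (local_path, destination_path) tuples.

    The destination path is the path component of the Url attribute,
    i.e. the URL with the "nraorsync://" scheme prefix removed.
    """
    pairs = []
    for local_path, url in AD_PATTERN.findall(text):
        pairs.append((local_path, urlparse(url).path))
    return pairs


# Two of the ads from the input file shown above, concatenated as in the file.
sample = (
    '[ LocalFileName = "/lustre/aoc/admin/tmp/condor/testpost003/execute/'
    'dir_29453/_condor_stdout"; Url = "nraorsync:///lustre/aoc/sciops/krowe/'
    'plugin/_condor_stdout" ]'
    '[ LocalFileName = "/lustre/aoc/admin/tmp/condor/testpost003/execute/'
    'dir_29453/newdir/date"; Url = "nraorsync:///lustre/aoc/sciops/krowe/'
    'plugin/newdir/date" ]'
)

for local_path, destination in parse_transfer_ads(sample):
    print(local_path, "->", destination)
```

Listing the pairs this way shows that the input file names the individual file newdir/date rather than the directory newdir that was given in transfer_output_files.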

...