Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

007 (4149.000.000) 08/05 09:22:31 Shadow exception!
Error from slot1_4@testpost003.aoc.nrao.edu: Repeated attempts to transfer output failed for unknown reasons
0 - Run Bytes Sent By Job
1007 - Run Bytes Received By Job

If I create a file like 'outputfile' instead of 'newdir' and transfer that, everything works fine.


Shadow jobs and Lustre

We had some jobs get restarted because they lost contact with their shadow jobs.  I assume this is because the shadow jobs keep the condor.log file open and if that file is on Lustre and Lustre goes down then the shadow job fails to communicate with the job and the job gets killed.   Does that seem accurate to you?

...