...
Current Questions
output_destination and stdout/stderr
It used to be that once you set output_destination = someplugin://, that plugin was responsible for transferring all files, including stdout and stderr. That no longer seems to be the case as of version 23. My nraorsync transfer plugin has code that looks for _condor_stdout and _condor_stderr as arguments, but it never sees them with version 23. The stdout and stderr files are copied back to the submit directory instead of being handed to my plugin.
This is a change. I am not sure whether it affects us adversely, but can it be reverted?
ANSWER: from Greg "After some archeology, it turns out that the change so that a file transfer plugin requesting to transfer the whole sandbox no longer sees stdout/stderr is intentional, and was asked for by several users. The current workaround is to explicitly list the plugin in the stdout/stderr lines of the submit file, e.g."
output = nraorsync://some_location/stdout
error = nraorsync://some_location/stderr
This seems like it should work, but my plugin produces errors. Probably my fault.
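A sketch of how the submit-file lines might fit together under the workaround, assuming the nraorsync plugin is already configured on the execute nodes. some_location is the placeholder from Greg's example, and this exact combination has not been verified here:

```
# Sketch only: some_location is a placeholder, as in Greg's example.
# nraorsync still handles the rest of the sandbox via output_destination...
output_destination = nraorsync://some_location
# ...but as of version 23 stdout/stderr must name the plugin explicitly
output             = nraorsync://some_location/stdout
error              = nraorsync://some_location/stderr
```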
tokens and collector.locate
...
I was going to test this on CHTC, but I can't seem to get an interactive job on CHTC anymore.
DONE: Send Greg the error output and security config.
transfer_output_files change in version 23
My silly nraorsync transfer plugin relies on the user setting transfer_output_files = .job.ad in the submit description file to trigger the transfer of files. Then my nraorsync plugin takes over and looks at +nrao_output_files for the files to copy. But with version 23, this no longer works. I am guessing someone decided that internal files like .job.ad, .machine.ad, _condor_stdout, and _condor_stderr are no longer transferable via transfer_output_files. Is that right? If so, I think I can work around it. Just wanted to know.
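For reference, a sketch of the trigger described above as it would appear in a submit description file. The +nrao_output_files value is a made-up example; the real list format is whatever the nraorsync plugin expects:

```
# Pre-23 trick: request .job.ad so that an output transfer happens at all,
# letting the nraorsync plugin take over
transfer_output_files = .job.ad
# Custom job attribute read by the plugin (example value; format is plugin-defined)
+nrao_output_files = "results.ms logs"
```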
ANSWER: The starter has an exclude list, and .job.ad is probably on it; it may also be accessed earlier or later than before. Greg will see if there is a better, first-class way to trigger transfers.
DONE: We will use condor_transfer, since it needs to be there anyway. https://git.ligo.org/groups/computing/-/epics/30
Installing version 23
I am looking at upgrading from version 10 to 23 LTS. I noticed that y'all have a repo RPM to install condor, but it only installs the Feature Release. It doesn't provide repos for installing the LTS.
https://htcondor.readthedocs.io/en/main/getting-htcondor/from-our-repositories.html
ANSWER: Greg will find it and get back to me.
DONE: https://research.cs.wisc.edu/htcondor/repo/23.0/el8/x86_64/release/
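A sketch of a dnf/yum repository definition pointing at that LTS URL, assuming an EL8 x86_64 host. The file name and repo id are arbitrary, and the location of HTCondor's signing key is not recorded in these notes, so the key has to be imported separately (e.g. with rpm --import) before installing:

```
# /etc/yum.repos.d/htcondor-23.0.repo  (file name and repo id are arbitrary)
[htcondor-23.0-lts]
name=HTCondor 23.0 LTS (EL8 x86_64)
baseurl=https://research.cs.wisc.edu/htcondor/repo/23.0/el8/x86_64/release/
enabled=1
# Needs HTCondor's signing key imported first; its URL is not in these notes
gpgcheck=1
```

With that in place, dnf install condor should pull from the LTS repo. Any Feature Release repo installed by the repo RPM should be disabled or removed, or dnf may prefer its newer packages.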
Virtual memory vs RSS
It looks like the value condor reports as RSS may actually be virtual memory, at least according to Felipe's tests.
ANSWER: On the nmpost cluster, condor runs as root, so it has good access to the cgroup information and reports RSS accurately. On systems using glideins, like PATh and OSG, condor may not have appropriate access to the cgroup, so memory reporting on those clusters may differ from memory reporting on the nmpost cluster. For glidein jobs, condor reports the virtual memory across all the processes in the job.
...
ANSWER: Try setting RESERVED_MEMORY=4096 (in megabytes) instead of SLOT_TYPE_1=95%, and set SLOT_TYPE_1=100% again.
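A sketch of the suggested execute-node configuration. The two values come from the answer; the rest of the slot setup (e.g. partitionable-slot settings) is assumed to stay as it is now:

```
# Hold back a fixed 4096 MB for the OS instead of carving off 5% via SLOT_TYPE_1 = 95%
RESERVED_MEMORY = 4096
# Slot type 1 gets everything that is left after the reservation
SLOT_TYPE_1 = 100%
```

The difference matters on large nodes: on a machine with, say, 512 GB of RAM (an example figure), 95% holds back about 25.6 GB, whereas RESERVED_MEMORY = 4096 holds back only 4 GB. Changes to slot definitions generally need a startd restart rather than just condor_reconfig.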
getenv
Did it change since 10.0? Can we still use getenv in DAGs or regular jobs?
#krowe Nov 5 2024: getenv no longer includes your entire environment as of version 10.7 or so. Instead, it only includes the environment variables you list with the "ENV GET" syntax in the .dag file.
https://git.ligo.org/groups/computing/-/epics/30
ANSWER: Yes this is true. CHTC would like users to stop using getenv=true. There may be a knob to restore the old behavior.
DONE: check out docs and remove getenv=true
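A sketch of what replacing getenv = true in a submit file might look like. This assumes the getenv command's list form (a comma-separated list of variable names) available in recent versions; the variable names are only examples:

```
# Discouraged by CHTC:
# getenv = true

# Import only the variables the job actually needs (names are examples)
getenv = PATH, PYTHONPATH

# Alternative: set them explicitly; $ENV() is expanded by condor_submit
# environment = "PATH=$ENV(PATH)"
```

For DAGs, the per-node equivalent per the note above is to list the variables with ENV GET in the .dag file.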
...