condor_submit -i job.htc

Runs a job interactively for debugging.  I've used this to get interactive resources but never to debug jobs.


Access Points and Execute Points

They are using "Access Point" where I use "Submit Host"

Also "Execute Point" where I use "Execute Host"

I will update my slides.


transfer_output_remaps

transfer_output_remaps = "count.Dracula.txt=output/count.Dracula.txt"
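A minimal sketch of a submit file around that line, to show where the remap sits.  Everything except the transfer_output_remaps line is an assumption for illustration: count.sh, Dracula.txt and the job.* file names are hypothetical, and the output directory is assumed to already exist on the submit side.

```
# Hypothetical job that writes count.Dracula.txt in its scratch directory
executable              = count.sh
arguments               = Dracula.txt
transfer_input_files    = Dracula.txt
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Left side: file name as written by the job.
# Right side: where it lands after transfer back.
transfer_output_remaps  = "count.Dracula.txt=output/count.Dracula.txt"

output = job.out
error  = job.err
log    = job.log
queue
```

Several remaps go in the same quoted string, separated by semicolons.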


condor_tail

Shows the stdout and stderr files of a job.  Only works if should_transfer_files = YES.

condor_tail JobID

condor_tail -stderr JobID


GPUs

HTCondor 9.0 has trouble handling different models of GPUs.  Later versions (9.8+) are needed for a cluster of heterogeneous GPUs.  See John Knoeller's talk.
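A rough sketch of what a submit file for a mixed-GPU pool might look like.  This assumes the require_gpus submit command is available in the installed version (my understanding is that it arrived with the 9.8 heterogeneous GPU support, but check the release notes).  The script name and the capability threshold are made up.

```
# Hypothetical GPU job; gpu_job.sh is a placeholder name
executable   = gpu_job.sh
request_gpus = 1

# Assumed syntax: a constraint evaluated against each GPU's properties.
# The 7.0 threshold is only an example value.
require_gpus = Capability >= 7.0

queue
```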


condor_submit -spool

Can SSA use this with their containers?  Have I already looked at this?



checkpointing

This is a crazy idea, but what about using checkpointing with SSA's workflow?  Right now they have a three-step process (download, process, upload), and every step uses Lustre.  What if we checkpointed after each step?  Would that allow the data to be downloaded directly to local storage instead of Lustre, then processed, then uploaded?  Now that I write it out, I don't see how this is much better than the current process of copying from archive to Lustre to local to Lustre to local to Lustre.  Have to think about it more.

This checkpointing is kind of a trick to get multiple jobs (actually checkpoints of one job) to run on the same host, something we wanted a while ago.

Though probably the best solution is to keep SSA from doing their unnecessary three-step process.
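For thinking about it more, here is a minimal sketch of how the three steps could be wrapped for HTCondor self-checkpointing.  It assumes the submit file sets checkpoint_exit_code = 85 and lists the state file in transfer_checkpoint_files.  The function bodies are placeholders (the real download, process and upload commands are not in these notes), and step.done, work.log and run_next_step are hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper: each invocation runs the next unfinished step.
# STATE_FILE records the last completed step so a restart can resume.
STATE_FILE="${STATE_FILE:-step.done}"
# Must match checkpoint_exit_code in the submit file (assumed to be 85).
CHECKPOINT_EXIT=85

# Placeholder steps; they only log their name.
download() { echo "download" >> work.log; }
process()  { echo "process"  >> work.log; }
upload()   { echo "upload"   >> work.log; }

run_next_step() {
    # "none" means no step has completed yet.
    last=$(cat "$STATE_FILE" 2>/dev/null || echo none)
    case "$last" in
        none)     download && echo download > "$STATE_FILE" && return "$CHECKPOINT_EXIT" ;;
        download) process  && echo process  > "$STATE_FILE" && return "$CHECKPOINT_EXIT" ;;
        process)  upload   && echo upload   > "$STATE_FILE" && return 0 ;;
        *)        return 0 ;;
    esac
}
```

A real job script would end with `run_next_step; exit $?`, so that the first two steps exit with the checkpoint code and the last one exits 0.  If a step fails, its own non-zero status is returned and the state file is not advanced.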


rrsync

From Rafi Rubin:

For security, rsync has a script in the src tree, "rrsync".  You use that in authorized_keys to restrict what rsync can do over ssh.  I usually recommend single-purpose keypairs for that.  You can also just use a standing rsyncd.
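A sketch of what such an authorized_keys entry might look like.  The rrsync path, the exported directory, the key type and the key comment are all assumptions, and the key material is a placeholder, not a real key.  The install location of rrsync varies by distribution.

```
# Single-purpose key: may only run rsync, read-only, under /srv/export.
# "restrict" disables pty, port forwarding, etc. (needs a reasonably recent OpenSSH).
command="/usr/bin/rrsync -ro /srv/export",restrict ssh-ed25519 <public-key-placeholder> rsync-only-key
```

rrsync also accepts -wo for a write-only key.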


GlideinWMS

Is this better than my cheesy factory/pilot scripts?


Job Sets

Introduced in HTCondor 9.4


Extended submit commands

Promotes +commands to first-class submit commands.




