...

This is a crazy idea, but what about using checkpointing with SSA's workflow? Right now they have a three-step process (download, process, upload), all of which use Lustre. What if we ran checkpointing after each step? Would this allow the data to be downloaded directly to local storage instead of Lustre, then processed, then uploaded? Now that I write it out, I don't see how this is much better than the current process of copying from archive to Lustre to local to Lustre to local to Lustre. I have to think about it more.

This checkpointing is kind of a trick to get multiple jobs (actually, checkpoints of one job) to run on the same host, something we wanted a while ago.
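
To make the checkpointing idea concrete, here is a minimal sketch of a single job that walks through the download, process, and upload steps, assuming HTCondor's self-checkpointing support: the submit file sets checkpoint_exit_code, the script records which step comes next in a small checkpoint file, and it exits with that code after each step so HTCondor saves the checkpoint and restarts the job. The three step functions, the marker files they write, and the stage.json file name are hypothetical stand-ins, not SSA's code.

```python
import json
import os

# Must match checkpoint_exit_code in the submit file. 85 is just the value
# used in HTCondor's self-checkpointing examples; any agreed code works.
CHECKPOINT_EXIT_CODE = 85
# Hypothetical checkpoint file recording which step runs next.
CHECKPOINT_FILE = "stage.json"


def download(workdir):
    # Hypothetical stand-in: would copy data from the archive to local disk.
    with open(os.path.join(workdir, "raw.dat"), "w") as f:
        f.write("raw data")


def process(workdir):
    # Hypothetical stand-in: would run calibration on the local copy.
    with open(os.path.join(workdir, "raw.dat")) as f:
        data = f.read()
    with open(os.path.join(workdir, "cal.dat"), "w") as f:
        f.write(data.upper())


def upload(workdir):
    # Hypothetical stand-in: would deliver the calibrated product.
    with open(os.path.join(workdir, "uploaded.txt"), "w") as f:
        f.write("cal.dat")


STAGES = [download, process, upload]


def load_stage(workdir):
    """Return the index of the next step to run (0 if there is no checkpoint yet)."""
    path = os.path.join(workdir, CHECKPOINT_FILE)
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_stage"]


def save_stage(workdir, next_stage):
    """Write the checkpoint atomically so a crash never leaves half a file."""
    path = os.path.join(workdir, CHECKPOINT_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_stage": next_stage}, f)
    os.replace(tmp, path)


def run_one_stage(workdir):
    """Run the next unfinished step and return the exit code the job should use."""
    stage = load_stage(workdir)
    if stage >= len(STAGES):
        return 0  # everything already done
    STAGES[stage](workdir)
    save_stage(workdir, stage + 1)
    # After the last step, exit 0 so the job completes normally; otherwise
    # exit with the checkpoint code so it is checkpointed and restarted.
    return 0 if stage + 1 == len(STAGES) else CHECKPOINT_EXIT_CODE
```

The job's executable would end with sys.exit(run_one_stage(os.getcwd())). Run three times in the same directory, run_one_stage returns 85, 85, then 0, which is the sequence HTCondor would see across the two restarts.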

Let me see if I can explain what I think the process is for SSA's std_calibration, which is a DAG:

  1. fetch - Copies data from someplace (perhaps the archive) to local storage on an nmpost node.
  2. The DAG node ends and the data is returned to Lustre.
  3. envoy - Copies data from Lustre to local storage and runs calibration.
  4. The DAG node ends and the data is returned to Lustre.
  5. convey - Copies data from Lustre to local storage and then delivers it someplace.
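
For reference, the fetch, envoy, convey chain could be expressed in DAGMan's input format roughly as the script below prints it. This is a minimal sketch: the submit-file names are assumptions, since these notes don't give SSA's actual files. JOB declares each node and PARENT ... CHILD makes each step wait for the previous one; because every node is a separate HTCondor job, each of those boundaries is where, per the steps above, the data goes back to Lustre.

```python
# Hypothetical submit-file names; SSA's real ones are not in these notes.
STEPS = [("fetch", "fetch.sub"), ("envoy", "envoy.sub"), ("convey", "convey.sub")]


def make_dag(steps):
    """Return DAGMan input text that runs the given steps strictly in order."""
    # One JOB line per DAG node: JOB <node name> <submit file>.
    lines = [f"JOB {name} {submit}" for name, submit in steps]
    # Chain consecutive nodes so each starts only after the previous one succeeds.
    for (parent, _), (child, _) in zip(steps, steps[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


print(make_dag(STEPS))
```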

Though probably the best solution is to keep SSA from doing their unnecessary three-step process.

...

promote +commands to first-class commands

...

Newer versions of HTCondor allow an admin to turn custom submit commands (say, NRAO_TRANSFER_FILES) into standard commands that no longer need the plus sign.

new htcondor command line

Todd Miller would like folks to test it. The new command is htcondor. It works like aws or globus, where the form is htcondor <noun> <verb>. It is available at CHTC but not at NRAO yet.



condor_adstash

Outputs ClassAd history to destinations such as an Elasticsearch database.
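
As a rough illustration of the kind of transformation involved, the sketch below turns condor_history -long style output (one Attr = value line per attribute, a blank line between ads) into an Elasticsearch _bulk request body. It is not condor_adstash's code: the parser only handles simple literals and keeps anything else as text, and the function names and the htcondor-history index name are hypothetical.

```python
import json


def parse_value(raw):
    """Convert a simple ClassAd literal to a Python value; keep expressions as text."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]  # string literal (escape sequences not handled)
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # unevaluated ClassAd expression


def parse_long_ads(text):
    """Split '-long' output into one dict per ad (ads are separated by blank lines)."""
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                ads.append(current)
                current = {}
            continue
        key, _, raw = line.partition(" = ")
        current[key] = parse_value(raw)
    if current:
        ads.append(current)
    return ads


def to_bulk(ads, index="htcondor-history"):
    """Render ads as Elasticsearch _bulk NDJSON: an action line, then the document."""
    out = []
    for ad in ads:
        action = {"_index": index}
        if "GlobalJobId" in ad:
            action["_id"] = ad["GlobalJobId"]  # stable id, so re-sends overwrite
        out.append(json.dumps({"index": action}))
        out.append(json.dumps(ad))
    return "\n".join(out) + "\n"  # _bulk bodies must end with a newline
```

Keying each document on GlobalJobId means that re-sending the same history updates existing documents instead of creating duplicates.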

...