...

https://htcondor.readthedocs.io/en/main/getting-htcondor/from-our-repositories.html

Output to two places

Some of our pipeline jobs don't set should_transfer_files=YES because they need to transfer some output to an area for Analysts to look at and some other output (possibly a subset) to a different area for the User to look at.  Is there a Condor way to do this?  transfer_output_remaps?

ANSWER: Greg doesn't think there is a Condor way to do this.  We could make a copy of the subset and use transfer_output_remaps on the copy, but that is a bit of a hack.
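
A rough sketch of what that hack might look like in a submit file. The executable, file names, and destination paths are all made up for illustration, and it assumes the job itself creates the copy of the subset before it exits.

```
# Sketch only: run_pipeline.sh, the tar file names, and the /path/to/...
# destinations are hypothetical placeholders.
executable              = run_pipeline.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# The job must write subset_for_user.tar (a copy of the subset) itself
# before exiting; Condor only transfers what it finds in the scratch dir.
transfer_output_files   = full_output.tar, subset_for_user.tar

# Send each file to a different area on the submit side.
transfer_output_remaps  = "full_output.tar = /path/to/analysts/full_output.tar; subset_for_user.tar = /path/to/users/subset_for_user.tar"

queue
```

The cost is that the subset gets written and transferred twice, which is why it is a hack rather than a real solution.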

Pelican?

Felipe is playing with it and we will probably want it at NRAO.

ANSWER: Greg will ask around.


Virtual memory vs RSS

It looks like Condor is reporting RSS, but according to Felipe's tests the reported value may actually be virtual memory.

...

Set up /usr/bin/mail on mcilroy so that it works.  Condor will use this to send mail to root when it encounters an error.  Need to submit a Jira ticket to SSA. (krowe)

...


Resubmitting Jobs

I have an example in

/lustre/aoc/cluster/pipeline/vlass_prod/spool/se_continuum_imaging/VLASS2.1_T10t30.J194602-033000_P161384v1_2020_08_15T01_21_14.433

...

STARTD_ATTRS =  NRAOGLIDEIN



RHEL8 Crashing

We have had many NMT VLASS nodes crash since we upgraded to RHEL8.  I think the nodes were busy when they crashed.  So I changed our SLOT_TYPE_1 from 100% to 95%.  Is this a good idea?

ANSWER: Try setting RESERVED_MEMORY=4096 (units are megabytes) instead of SLOT_TYPE_1=95%, and set SLOT_TYPE_1 back to 100%.
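
As a sketch, the suggested change would look something like this in the execute node's Condor configuration. Only the two settings from the answer are shown; which config file they go in and the rest of the slot definition are not covered here.

```
# Hold 4096 MB of RAM back from Condor so the OS and other processes
# have some headroom (RESERVED_MEMORY is in megabytes).
RESERVED_MEMORY = 4096

# Go back to giving the slot 100% of what Condor sees as available,
# instead of the 95% workaround.
SLOT_TYPE_1 = 100%
```

Running condor_config_val RESERVED_MEMORY on the node should show whether the setting was picked up.  Slot changes probably need the startd restarted rather than just a reconfig.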




...