Date

Goals

Discussion items

Item | Who | Notes
HERA hardware | James

herastore02 129289

  • herastore02 racked, powered, OSed.  Needs /opt. CIS borrowing for NGAS firmware upgrades.
  • 02a racked, disked, firmwared, powered, SASed.  Needs format, mount.
  • Done: 02b racked, firmwared.  Haven't purchased disks yet.
  • Done: 02c racked, firmwared.  Haven't purchased disks yet.
  • Done: 02d racked, firmwared.  Haven't purchased disks yet.

Done: aoc253k-pdu-1 has critical alarms 132028.  During the power outage they replaced the PDU with the spare.

aocoss13 130466 racked, booted. Needs Lustre.  Stolen to repair aocoss04.

More HERA nodes | jrobnett, krowe
  • Done: Connect IB switch to fabric.  Requires some re-arranging of ports.  133166
  • Done: Cards/cables req: 182337, 182338.  Install in new nodes.
  • Done: Boot three 2U nodes with 24 cores each with GPU kits but no GPUs for now
nmngas | jrobnett, krowe
  • nmngas{01..04}c racked, firmwared, powered, SASed.  Needs format, mount.
  • nmngas{01..04}c-mirror racked, firmwared, powered, SASed.  Needs format, mount.
  • Ticket 134766 to format and mount the new volumes.
Order test GPUs | jrobnett

Need to order test GPUs against 114412506.6432

  • req: 182816 approved by dhalstea Oct. 12, 2021
Understand MDS load | jrobnett | Why is the MDS load so high?
Track down shadow exceptions | krowe

Why do some HTCondor data transfers trigger a shadow exception?

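  • Done: One reason is MaxStartups in sshd_config. We set it to 10:30:60, which means sshd begins refusing unauthenticated connections with 30% probability once 10 are pending, with the probability rising linearly up to a hard limit of 60, at which point all are refused.  Perhaps set it to 30:30:100.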
  • fmadsen to confirm
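
To compare the current and proposed MaxStartups values, here is a rough Python sketch of the start:rate:full behaviour as described in sshd_config(5). The function names are hypothetical and the linear interpolation is an assumption based on the man page wording, not a copy of the sshd source.

```python
from typing import Tuple


def parse_max_startups(spec: str) -> Tuple[int, int, int]:
    """Split a MaxStartups value such as '10:30:60' into (start, rate, full)."""
    start, rate, full = (int(part) for part in spec.split(":"))
    return start, rate, full


def drop_probability(spec: str, pending: int) -> float:
    """Estimate the chance that sshd refuses a new connection.

    `pending` is the number of currently unauthenticated connections.
    Assumption: the probability is rate% at `start` and rises linearly
    to 100% at `full`, as the man page describes.
    """
    start, rate, full = parse_max_startups(spec)
    if pending < start:
        return 0.0
    if pending >= full:
        return 1.0
    percent = rate + (100 - rate) * (pending - start) / (full - start)
    return percent / 100.0


if __name__ == "__main__":
    # Compare the current and proposed settings at a few load levels.
    for spec in ("10:30:60", "30:30:100"):
        for pending in (5, 10, 20, 35, 60):
            print(spec, pending, round(drop_probability(spec, pending), 2))
```

With 20 unauthenticated connections pending, the current 10:30:60 setting would already be refusing some new connections, while 30:30:100 would refuse none, which is consistent with a burst of simultaneous transfers hitting the limit.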


Action items

CAS-13342

