...

Item | Who | Notes
HERA hardware | James

Done: herastore01

  • Done: herastore01b. Needs firmware.
  • Done: herastore01c. Needs firmware.
  • Done: herastore01d 127013 racked, disked, powered, SASed, firmwared, formatted, mounted.

herastore02 129289

  • herastore02 racked, powered, OSed. Needs /opt. CIS is borrowing it for NGAS firmware upgrades.
  • 02a racked, disked, firmwared, powered, SASed.  Needs format, mount.
  • Done: 02b racked, firmwared.  Haven't purchased disks yet.
  • Done: 02c racked, firmwared.  Haven't purchased disks yet.
  • Done: 02d racked, firmwared.  Haven't purchased disks yet.

Done: aoc253k-pdu-1 has critical alarms 132028. During the power outage they replaced the PDU with the spare.

aocoss13 130466 racked, booted. Needs Lustre.  Stolen to repair aocoss04.

More HERA nodes | jrobnett, krowe
  • Done: new herapost-master, and the old herapost-master made into a compute node.
  • Done: new IB card/cable for new herapost-master 132576
  • Done: Buy an IB switch for HERA racks.  $13,300 133166
  • Connect the switch to the fabric. Requires some rearranging of ports. 133166
  • Cards/cables req: 182337, 182338. Install in the new nodes.
  • Boot three 2U nodes, 24 cores each, with GPU kits but no GPUs for now.
nmngas | jrobnett, krowe
  • nmngas{01..04}c racked, firmwared, powered, SASed. Needs format, mount.
  • nmngas{01..04}c-mirror racked, firmwared, powered, SASed. Needs format, mount.
  • Done: Ticket 114896 didn't mention formatting or mounting the volumes, so it was closed.
  • krowe submitted ticket 134766 to format and mount the new volumes.
Order test GPUs | jrobnett

Need to order test GPUs against 114412506.6432

  • req: 182816 approved by dhalstea Oct. 12, 2021
Understand MDS load | jrobnett | Why is the MDS load so high?
Track down shadow exceptions | krowe

Why do some HTCondor data transfers trigger a shadow exception?

  • Done: One reason is MaxStartups in sshd_config. We set it to 10:30:60, which means that once there are 10 unauthenticated connections, sshd refuses new connection attempts with a 30% probability, rising linearly until every attempt is refused at 60. Perhaps set it to 30:30:100.
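The MaxStartups trade-off can be checked numerically. Below is a minimal Python sketch of the random early drop rule as the sshd_config(5) man page describes it, comparing the current 10:30:60 value with the proposed 30:30:100. The class and method names are hypothetical, and the sketch uses floating point rather than reproducing sshd's internal arithmetic exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxStartups:
    """Hypothetical model of an sshd_config MaxStartups value (start:rate:full)."""

    start: int  # unauthenticated connections before random early drop begins
    rate: int   # refusal probability, in percent, once `start` is reached
    full: int   # at or above this many, every new attempt is refused

    @classmethod
    def parse(cls, value: str) -> "MaxStartups":
        parts = [int(p) for p in value.split(":")]
        if len(parts) == 1:
            # A single number is a hard limit: no random early drop.
            return cls(start=parts[0], rate=100, full=parts[0])
        if len(parts) != 3:
            raise ValueError(f"expected N or start:rate:full, got {value!r}")
        start, rate, full = parts
        return cls(start=start, rate=rate, full=full)

    def refuse_probability(self, unauthenticated: int) -> float:
        """Chance a new connection attempt is refused with this many pending."""
        if unauthenticated < self.start:
            return 0.0
        if unauthenticated >= self.full:
            return 1.0
        # Linear ramp from rate% at `start` up to 100% at `full`.
        frac = (unauthenticated - self.start) / (self.full - self.start)
        return (self.rate + (100 - self.rate) * frac) / 100


if __name__ == "__main__":
    # Compare the current and proposed settings at a few load levels.
    for setting in ("10:30:60", "30:30:100"):
        ms = MaxStartups.parse(setting)
        probs = [round(ms.refuse_probability(n), 2) for n in (5, 10, 30, 60, 100)]
        print(setting, probs)
```

Under this model, with 30 unauthenticated connections pending, the current setting refuses about 58% of new attempts while the proposed one refuses 30%, so bursts of HTCondor transfers opening many SSH connections at once would be dropped less often.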

...