Done: aoc253k-pdu-1 has critical alarms 132028. During the power outage they replaced the PDU with the spare.
aocoss13 130466 racked, booted. Needs Lustre. Stolen to repair aocoss04.
More HERA nodes
jrobnett, krowe
Done: set up new herapost-master and make old herapost-master a compute node.
Done: new IB card/cable for new herapost-master 132576
Done: Buy an IB switch for HERA racks. $13,300 133166
Connect switch to fabric. Requires some re-arranging of ports. 133166
Cards/cables req: 182337, 182338. Install in new nodes.
Boot three 2U nodes with 24 cores each with GPU kits but no GPUs for now
nmngas
jrobnett, krowe
nmngas{01..04}c racked, firmwared, powered, SASd. Needs format, mount.
nmngas{01..04}c-mirror racked, firmwared, powered, SASd. Needs format, mount.
Done: Ticket 114896 sadly didn't mention formatting or mounting volumes so it was closed.
krowe submitted ticket 134766 to format and mount the new volumes.
Order test GPUs
jrobnett
Need to order test GPUs against 114412506.6432
req: 182816 approved by dhalstea Oct. 12, 2021
Understand MDS load
jrobnett
Why is the MDS load so high?
Track down shadow exceptions
krowe
Why do some HTCondor data transfers trigger a shadow exception?
One reason is MaxStartups in sshd_config. We set it to 10:30:60, which means sshd starts refusing new unauthenticated connections with 30% probability once 10 are pending, rising linearly to 100% at the hard limit of 60.
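To see how quickly refusals ramp up between the two thresholds, here is a rough sketch of the MaxStartups start:rate:full arithmetic as described in the sshd_config man page. The function names (parse_max_startups, drop_percent) are made up for illustration and are not part of OpenSSH; sshd itself does this in integer arithmetic with a random draw per connection, so treat the numbers as approximate.

```python
def parse_max_startups(value: str) -> tuple[int, int, int]:
    """Split a 'start:rate:full' MaxStartups value into three integers."""
    start, rate, full = (int(part) for part in value.split(":"))
    return start, rate, full


def drop_percent(pending: int, value: str = "10:30:60") -> float:
    """Approximate chance (in percent) that sshd refuses a new connection.

    pending is the number of connections that have not yet authenticated.
    Hypothetical helper: a model of the documented behavior, not sshd code.
    """
    start, rate, full = parse_max_startups(value)
    if pending < start:
        return 0.0      # below the first threshold nothing is refused
    if pending >= full:
        return 100.0    # at or above the hard limit everything is refused
    # Linear ramp from `rate` percent at `start` to 100 percent at `full`.
    return rate + (100 - rate) * (pending - start) / (full - start)


if __name__ == "__main__":
    for pending in (5, 10, 20, 35, 59, 60):
        print(f"{pending:3d} pending -> {drop_percent(pending):5.1f}% refused")
```

If many HTCondor transfers open ssh connections to the same host at once, even a modest burst past 10 pending connections gives each new connection a substantial chance of being refused, which would be consistent with the intermittent shadow exceptions.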