Item | Who | Notes |
---|---|---|
HERA hardware | James | herastore02 129289 - herastore02 racked, powered, OSed. Needs /opt. CIS borrowing for NGAS firmware upgrades.
- 02a racked, disked, firmwared, powered, SASed. Needs format, mount.
- Done: 02b racked, firmwared. Haven't purchased disks yet.
- Done: 02c racked, firmwared. Haven't purchased disks yet.
- Done: 02d racked, firmwared. Haven't purchased disks yet.
Done: aoc253k-pdu-1 has critical alarms 132028. During the power outage they replaced the PDU with the spare. aocoss13 130466 racked, booted. Needs Lustre. Stolen to repair aocoss04. |
More HERA nodes | jrobnett, krowe | - Done: Connect IB switch to fabric. Requires some re-arranging of ports. 133166
- Done: Cards/cables req: 182337, 182338. Install in new nodes.
- Done: Boot three 2U nodes with 24 cores each with GPU kits but no GPUs for now
|
nmngas | jrobnett, krowe | - nmngas{01..04}c racked, firmwared, powered, SASed. Needs format, mount.
- nmngas{01..04}c-mirror racked, firmwared, powered, SASed. Needs format, mount.
- Ticket 134766 to format and mount the new volumes.
|
Order test GPUs | jrobnett | Need to order test GPUs against 114412506.6432 - req: 182816 approved by dhalstea Oct. 12, 2021
|
Understand MDS load | jrobnett | Why is the MDS load so high? |
Track down shadow exceptions | krowe | Why do some HTCondor data transfers trigger a shadow exception? - Done: One reason is MaxStartups in sshd_config. We set it to 10:30:60, which means sshd starts refusing unauthenticated connections with a probability of 30% once 10 are pending, rising linearly to a hard limit of 60 where all new connections are refused. Perhaps set it to 30:30:100.
- fmadsen to confirm
|
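
To compare the current MaxStartups value (10:30:60) with the proposed one (30:30:100), the following is a minimal sketch of the start:rate:full rule as described in sshd_config(5). The function names `parse_max_startups` and `drop_percent` are hypothetical, and the integer rounding is an assumption; this is an illustration, not the sshd implementation.

```python
def parse_max_startups(spec):
    """Split a 'start:rate:full' MaxStartups string into three integers."""
    start, rate, full = (int(part) for part in spec.split(":"))
    return start, rate, full


def drop_percent(pending, spec):
    """Approximate chance (in percent) that a new connection is refused.

    pending: number of unauthenticated connections currently open.
    spec:    MaxStartups value such as '10:30:60'.
    """
    start, rate, full = parse_max_startups(spec)
    if pending < start:
        return 0      # below 'start': nothing is refused
    if pending >= full or rate == 100:
        return 100    # at or above 'full': everything is refused
    # Between 'start' and 'full' the chance rises linearly from 'rate' to 100.
    return rate + (100 - rate) * (pending - start) // (full - start)


if __name__ == "__main__":
    for spec in ("10:30:60", "30:30:100"):
        for pending in (5, 10, 35, 60, 100):
            print(f"{spec}: {pending} pending -> {drop_percent(pending, spec)}% refused")
```

Under this model, a burst of 35 simultaneous unauthenticated connections would see roughly 65% of new attempts refused with 10:30:60 and roughly 35% with 30:30:100, so the proposed value reduces refusals but does not remove them for bursts above 30.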