Item | Who | Notes |
--- | --- | --- |
HERA hardware | James | herastore02 129289 135532
- herastore02 racked, powered, OSed, storcli, cards moved. Needs /opt. CIS borrowing for NGAS firmware upgrades. Working storcli. Probably a new ticket.
- 02a racked, disked, firmwared, powered, SASed. Needs format, mount.
- Done: 02b racked, firmwared. Haven't purchased disks yet.
- Done: 02c racked, firmwared. Haven't purchased disks yet.
- Done: 02d racked, firmwared. Haven't purchased disks yet.
- Done: aoc253k-pdu-1 has critical alarms 132028. During the power outage they replaced the PDU with the spare.
- aocoss13 130466 racked, booted. Needs Lustre. Stolen to repair aocoss04.
|
More HERA nodes | jrobnett, krowe | - Done: Connect IB switch to fabric. Requires some re-arranging of ports. 133166
- Done: Cards/cables req: 182337, 182338. Install in new nodes.
- Done: Boot three 2U nodes with 24 cores each with GPU kits but no GPUs for now
|
nmngas | jrobnett, krowe | - nmngas{01..04}c racked, firmwared, powered, SASed. Needs format, mount.
- nmngas{01..04}c-mirror racked, firmwared, powered, SASed. Needs format, mount.
- Ticket 134766 to format and mount the new volumes.
|
Order test GPUs | jrobnett | Need to order test GPUs against 114412506.6432 - req: 182816 approved by dhalstea Oct. 12, 2021
#krowe Oct 25 2021: PO: 374056, $3,505.00, Tesla T4
#krowe Oct 25 2021: PO: 374060, $2,899.00, RTX A5000
#krowe Oct 25 2021: PO: 374065, $1,382.00, RTX A4000
|
Understand MDS load | jrobnett | Why is the MDS load so high? |
Glideins | krowe | 135553 Port RHEL-7.8.1.5 to CV |
Track down shadow exceptions | krowe | Why do some HTCondor data transfers trigger a shadow exception?
- Done: One reason is MaxStartups in sshd_config. We set it to 10:30:60, which means sshd starts refusing new unauthenticated connections with 30% probability once 10 are pending, and refuses all of them at the hard limit of 60. Perhaps set it to 30:30:100.
- fmadsen to confirm
|
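
To compare the current MaxStartups value with the proposed one, here is a minimal sketch of the start:rate:full rule as described in sshd_config(5): nothing is refused below `start`, the refusal chance is `rate` percent at `start`, and it rises linearly to 100% at `full`. The function names are hypothetical and this is only a model of the documented behavior, not sshd's own code.

```python
def parse_max_startups(spec: str) -> tuple[int, int, int]:
    """Split a 'start:rate:full' MaxStartups value into three integers."""
    start, rate, full = (int(part) for part in spec.split(":"))
    return start, rate, full


def drop_probability(pending: int, spec: str) -> float:
    """Percent chance that a new unauthenticated connection is refused.

    Assumes the linear ramp described in sshd_config(5); 'pending' is the
    number of connections that have not yet authenticated.
    """
    start, rate, full = parse_max_startups(spec)
    if pending < start:
        return 0.0      # below the threshold nothing is refused
    if pending >= full:
        return 100.0    # at the hard limit everything is refused
    # Linear ramp from 'rate' percent at 'start' up to 100 percent at 'full'.
    return rate + (100 - rate) * (pending - start) / (full - start)


if __name__ == "__main__":
    # Current setting versus the proposed one from the notes.
    for spec in ("10:30:60", "30:30:100"):
        for pending in (5, 10, 30, 60, 100):
            pct = drop_probability(pending, spec)
            print(f"{spec:>9}  pending={pending:3d}  drop={pct:5.1f}%")
```

Under this model, a burst of 20 simultaneous unauthenticated connections already sees refusals with 10:30:60 but none with 30:30:100, which may explain why concurrent HTCondor transfers occasionally fail.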