Date

03 Aug

Goals

  • Don't die

Discussion items

Item | Who | Notes
HERA hardware | James

herastore01

  • herastore01b Needs firmware
  • herastore01c Needs firmware
  • herastore01d 127013 racked, disked, powered, SASed, firmwared, formatted.  Needs mount.

herastore02 and four shelves 129289

  • herastore02 racked, powered, OSed.  Needs /opt. CIS borrowing for NGAS firmware upgrades.
  • 02a racked, disked, firmwared.  Needs power, SAS, format, mount.
  • Done: 02b racked, firmwared.  Haven't purchased disks yet.
  • Done: 02c racked, firmwared.  Haven't purchased disks yet.
  • Done: 02d racked, firmwared.  Haven't purchased disks yet.

Done: aoc253k-pdu-1 has critical alarms 132028.  During the power outage they replaced the PDU with the spare.

aocoss13 130466 racked, booted. Needs Lustre.  Stolen to repair aocoss04.

Lustre project quotas | jrobnett

Lustre project quotas are still not quite right. 130900

krowe isn't sure Leo's new scripts are doing the right thing.  It looks to me like the script (/lustre/aoc/admin/bin/set_quota_lustre.sh) is setting user quotas for users, sciops, and observers.
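
For reference when reviewing the script, here is a minimal sketch of how a user quota and a project quota differ on the `lfs setquota` command line (`-u` versus `-p`). It only builds the argument lists and does not run anything. The mount point, identifiers, and limits are hypothetical placeholders, and this is not a description of what set_quota_lustre.sh actually does.

```python
import shlex

# Maps the kind of quota to the lfs setquota selector flag.
QUOTA_FLAGS = {"user": "-u", "group": "-g", "project": "-p"}


def setquota_cmd(kind, ident, soft_kb, hard_kb, mountpoint):
    """Build (but do not run) an `lfs setquota` argument list.

    kind:       "user", "group", or "project"
    ident:      user name, group name, or numeric project ID
    soft_kb:    block soft limit (assumed to be in kilobytes)
    hard_kb:    block hard limit (assumed to be in kilobytes)
    mountpoint: Lustre mount point (hypothetical in the examples below)
    """
    flag = QUOTA_FLAGS[kind]
    return ["lfs", "setquota", flag, str(ident),
            "-b", str(soft_kb), "-B", str(hard_kb), mountpoint]


# Hypothetical values, for illustration only.
user_cmd = setquota_cmd("user", "observers", 1000, 2000, "/lustre/aoc")
project_cmd = setquota_cmd("project", 1001, 1000, 2000, "/lustre/aoc")

print(shlex.join(user_cmd))
print(shlex.join(project_cmd))
```

If the script's generated commands carry `-u` for users, sciops, and observers, that would be consistent with the suspicion that it sets user quotas rather than project quotas.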

VLASS RAM swap | krowe | Restore VLASS memory in nodes at NMT  131614

More HERA nodes | jrobnett, krowe
  • Done: new herapost-master and make old herapost-master a compute node.
  • Done: new IB card/cable for new herapost-master 132576
  • Done: Buy an IB switch for HERA racks.  $13,300 133166 Switch req: 182292
  • Connect switch to fabric.  Requires some re-arranging of ports.  133166
  • Cards/cables req: 182337, 182338.  Install in new nodes.
  • Boot three 2U nodes with 24 cores each, with GPU kits but no GPUs for now
nmngas | jrobnett, krowe | 114896
  • nmngas{01..04}c racked, firmwared, powered, SASed.  Needs format, mount.
  • nmngas{01..04}c-mirror racked, firmwared, powered, SASed.  Needs format, mount.
master nodes | jrobnett, krowe | 122408 Upgrade testpost-master and nmpost-master.  testpost-master is done.  nmpost-master scheduled for Sep. 15, 2021.
nmngas update | jrobnett | NGAS replacement
  • Done: Ticket 114896 sadly didn't mention formatting or mounting volumes so it was closed.
  • krowe submitted ticket 134766 to format and mount the new volumes.
VLASS memory usage | jrobnett | We need to investigate memory usage for VLASS SE imaging jobs.
Order test GPUs | jrobnett | Need to order test GPUs against 114412506.6432
HTCondor requirements | krowe | We need to set requirements for cluster nodes.  SSA wants to run on NMT machines.  Is requirements = (HasLustre =!= True) really the best way to do that?  How about two axes (HasLustre and Partition == VLASS)?
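
To make the HTCondor question concrete, here is a rough Python model of how the two candidate expressions would match machines. It is a sketch of ClassAd three-valued logic, not HTCondor code. The machine ads and the Partition value on the Lustre node are hypothetical. The point it illustrates is that `=!=` never evaluates to UNDEFINED, so `HasLustre =!= True` also matches any machine that does not advertise HasLustre at all, while adding a second axis narrows the match to machines that explicitly advertise the partition.

```python
# Rough model of ClassAd comparison semantics; all machine ads are hypothetical.
UNDEFINED = object()  # stands in for an attribute the machine does not advertise


def meta_not_equal(value, literal):
    """Model of `attr =!= literal`: always True or False, never UNDEFINED."""
    if value is UNDEFINED:
        return True
    return not (type(value) is type(literal) and value == literal)


def not_equal(value, literal):
    """Model of `attr != literal`: UNDEFINED when the attribute is missing."""
    if value is UNDEFINED:
        return UNDEFINED
    return value != literal


def equal(value, literal):
    """Model of `attr == literal` (exact match here; a simplification)."""
    if value is UNDEFINED:
        return UNDEFINED
    return value == literal


def logical_and(a, b):
    """Three-valued AND: False wins, then UNDEFINED, else True."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def matches(result):
    """A requirements expression only matches when it evaluates to True."""
    return result is True


def single_axis(ad):
    # requirements = (HasLustre =!= True)
    return meta_not_equal(ad.get("HasLustre", UNDEFINED), True)


def two_axis(ad):
    # requirements = (HasLustre =!= True) && (Partition == "VLASS")
    return logical_and(single_axis(ad),
                       equal(ad.get("Partition", UNDEFINED), "VLASS"))


lustre_node = {"HasLustre": True, "Partition": "OTHER"}  # hypothetical
nmt_node = {"Partition": "VLASS"}                        # no HasLustre advertised
unlabeled_node = {}                                      # advertises neither

for name, ad in [("lustre_node", lustre_node), ("nmt_node", nmt_node),
                 ("unlabeled_node", unlabeled_node)]:
    print(name, matches(single_axis(ad)), matches(two_axis(ad)))
```

In this model the single-axis form matches both the NMT node and the unlabeled node, whereas the two-axis form matches only the node that advertises Partition as VLASS.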

Action items

CAS-13342