Operating System
- Must support CASA
- Will need a patching/updating mechanism
- Try to have one OS that supports both our use and local use?
- Or they could dual boot?
- Or Kubernetes?
Third party software for VLASS
- CASA
- HTCondor
- Slurm?
- Will need a way to maintain the software
Third party software for Local
- Will need a way to maintain software for the local site
Services
- DNS
- DHCP
- SMTP
- NTP
- NFS?
- LDAP? How do we handle accounts?
- ssh
- rsync (nraorsync_plugin.py)
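As a rough aid for firewall and monitoring planning, the services above can be written down with their default ports. This is a minimal sketch in Python, assuming the standard IANA port assignments. NFS and LDAP are included even though they are still open questions here. The names `SERVICE_PORTS`, `tcp_services`, and `tcp_reachable` are hypothetical helpers, not part of any existing tool, and `tcp_reachable` only tests whether a TCP connection opens, not whether the service is healthy.

```python
import socket

# Default ports for the services listed above (standard IANA assignments).
# rsync is shown with its daemon port; if rsync runs over ssh, only port 22 matters.
SERVICE_PORTS = {
    "DNS": (53, "udp/tcp"),
    "DHCP": (67, "udp"),
    "SMTP": (25, "tcp"),
    "NTP": (123, "udp"),
    "NFS": (2049, "tcp"),
    "LDAP": (389, "tcp"),
    "ssh": (22, "tcp"),
    "rsync": (873, "tcp"),
}


def tcp_services():
    """Return the names of services that can be probed with a plain TCP connect."""
    return sorted(name for name, (_, proto) in SERVICE_PORTS.items() if "tcp" in proto)


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_reachable("headnode.example.org", SERVICE_PORTS["ssh"][0])` would probe ssh on a hypothetical head node. The UDP-only services (DHCP, NTP) need a protocol-level check instead.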
Management Access
- PDU
- UPS
- BMC/IPMI
- switch
Maintenance
- replace disk
- replace/reseat DIMM
- replace power supply
- NRAO may handle replacement hardware: drop ship parts to the site, or keep spares ourselves?
Hardware
- Cabinet rack: locking mesh doors front and rear. Width: 19". Height: 42U (most common). Depth: 42" or 48"? Must support at least 2,000 lbs static load.
- https://greatcabinets.com/product/es-ms/
- https://www.apc.com/shop/us/en/products/APC-NetShelter-SX-Server-Rack-Enclosure-42U-Black-1991H-x-600W-x-1070D-mm/P-AR3100
- https://www.apc.com/shop/us/en/products/APC-NetShelter-SX-Server-Rack-Enclosure-42U-Shock-Packaging-2000-lbs-Black-1991H-x-600W-x-1070D-mm/P-AR3100SP
- PDU: one PDU or two PDUs? What plug? What voltage? This may vary across sites.
- UPS: for just the head node and switch?
- rackmount KVM (not remote) and patch cables
- environmental monitoring. Could the PDU do this?
- head node with lots of disk
- test head node at NRAO (either CV or NM)
- Switch: 1Gb/s might be enough. 10Gb/s if price is good.
- GPUs? Do we get 1U nodes with room for 1 or 2 Tesla T4 GPUs, or 2U nodes with room for 1 or 2 regular GPUs?
- 30 1U nodes, 15 2U nodes, or a mix?
- NVMe drives for nodes
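The node-count question can be checked against the 42U cabinet with simple arithmetic. This is a minimal sketch in Python. The sizes assumed for the head node, switch, UPS, and KVM are placeholders, not decisions from these notes, and the PDUs are assumed to mount vertically (0U).

```python
RACK_UNITS = 42  # cabinet height from the rack bullet above

# Assumed sizes in rack units (U) for the non-compute gear; all placeholders.
OVERHEAD_U = {
    "head node": 2,
    "switch": 1,
    "UPS": 2,
    "KVM": 1,
}


def units_free(nodes_1u, nodes_2u, rack_units=RACK_UNITS, overhead=OVERHEAD_U):
    """Return rack units left over; a negative value means the layout does not fit."""
    used = nodes_1u * 1 + nodes_2u * 2 + sum(overhead.values())
    return rack_units - used


# Options from the notes: 30 1U nodes, 15 2U nodes, or a mix (10 + 10 as one example).
for label, n1, n2 in [("30 x 1U", 30, 0), ("15 x 2U", 0, 15), ("10 x 1U + 10 x 2U", 10, 10)]:
    print(f"{label}: {units_free(n1, n2)}U free")
```

Under these assumptions each option uses 30U for compute and leaves 6U free, so the choice would turn on GPU fit and price more than on rack space. Changing the placeholder sizes in `OVERHEAD_U` changes that margin.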
Other
- Since we are making our own little OSG, should we try to leverage OSG for this or not? Or do we want to make each POD a pool and flock?
- How do we handle the 50% workload?
- Should we try to buy as much as we can from one vendor like Dell to simplify things?