Operating System
- Must support CASA
- Will need a patching/updating mechanism
- Try to have one OS that supports both our use and local use?
- Or they could dual boot?
- Or Kubernetes?
Third party software for VLASS
- CASA
- HTCondor
- Slurm?
- Will need a way to maintain the software
Third party software for Local
- Will need a way to maintain software for the local site
Services
- DNS
- DHCP
- SMTP
- NTP
- NFS?
- LDAP? How do we handle accounts?
- ssh
- rsync (nraorsync_plugin.py)
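
A quick way to confirm from a node that the TCP-based services in this list are reachable might look like the sketch below. It is only an illustration: the host names are placeholders, the ports are the conventional defaults (ssh 22, SMTP 25, DNS over TCP 53, LDAP 389, NFSv4 2049), and NTP and DHCP are left out because they run over UDP and would need a protocol-level probe instead.

```python
import socket

# Hypothetical host names; the ports are the conventional defaults.
SERVICES = {
    "ssh": ("head.example.org", 22),
    "smtp": ("mail.example.org", 25),
    "dns": ("ns.example.org", 53),
    "ldap": ("ldap.example.org", 389),
    "nfs": ("head.example.org", 2049),
}


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(services, timeout=3.0):
    """Map each service name to True/False depending on TCP reachability."""
    return {
        name: tcp_reachable(host, port, timeout)
        for name, (host, port) in services.items()
    }
```

Calling `check_services(SERVICES)` returns a dict such as `{"ssh": True, "smtp": False, ...}`, which could feed whatever monitoring we end up using. A successful connect only shows that something is listening, not that the service is configured correctly.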
Management Access
- PDU
- UPS
- BMC/IPMI
- switch
Maintenance
- replace disk
- replace/reseat DIMM
- replace power supply
- NRAO may handle replacement hardware. Drop ship to the site, or keep spares ourselves?
Hardware
- GPUs? Do we get 1U nodes with room for 1 or 2 Tesla T4 GPUs, or 2U nodes with room for 1 or 2 regular GPUs?
- 1Gb/s networking might be enough. 10Gb/s if the price is good.
- 30 1U nodes or 15 2U nodes or mix?
- head node with lots of disk
- test head node at NRAO (either CV or NM)
- one PDU or two PDUs? What plug? What voltage? This may vary across sites.
- UPS for just the head node and switch?
- environmental monitoring. Could the PDU do this?
- rackmount KVM (not remote) and patch cables
- NVMe drives for nodes
- Cabinet rack: locking mesh doors, front and rear. Width: 19". Height: 42U is most common. Depth: 42" or 48"? Rack must support at least 2,000 lbs static load.
- https://greatcabinets.com/product/es-ms/
- https://www.apc.com/shop/us/en/products/APC-NetShelter-SX-Server-Rack-Enclosure-42U-Black-1991H-x-600W-x-1070D-mm/P-AR3100
- https://www.apc.com/shop/us/en/products/APC-NetShelter-SX-Server-Rack-Enclosure-42U-Shock-Packaging-2000-lbs-Black-1991H-x-600W-x-1070D-mm/P-AR3100SP
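
To compare the 30 x 1U, 15 x 2U and mixed options against a 42U cabinet, a rough rack-unit budget can be sketched as below. The heights assumed for the head node, switch, UPS and KVM are guesses for illustration, not quotes from any vendor, and the PDUs are assumed to be vertical (zero-U) units.

```python
RACK_UNITS = 42  # most common cabinet height, per the notes above

# Assumed heights in rack units (U) for the non-compute gear; all hypothetical.
OVERHEAD_U = {
    "head node": 2,
    "switch": 1,
    "UPS": 2,
    "rackmount KVM": 1,
}


def rack_budget(nodes_1u, nodes_2u, overhead=OVERHEAD_U, rack_units=RACK_UNITS):
    """Return (units used, units free) for a mix of 1U and 2U compute nodes."""
    used = nodes_1u * 1 + nodes_2u * 2 + sum(overhead.values())
    return used, rack_units - used


for label, n1, n2 in [("30 x 1U", 30, 0), ("15 x 2U", 0, 15), ("mix", 10, 10)]:
    used, free = rack_budget(n1, n2)
    print(f"{label:8s} uses {used}U, leaves {free}U free")
```

Under these assumptions all three layouts use 36U and leave 6U free, so rack space alone does not decide between 1U and 2U nodes. Weight, power draw and GPU fit would need the same kind of check once real numbers are known.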
Other
- Since we are making our own little OSG, should we try to leverage OSG for this or not? Or do we want to make each POD a pool and flock?
- How do we handle the 50% workload?
- Should we try to buy as much as we can from one vendor like Dell to simplify things?
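
If each POD becomes its own HTCondor pool and we flock between them, the per-pod configuration could be generated along the lines of the sketch below. `FLOCK_TO` and `FLOCK_FROM` are HTCondor configuration knobs; the host names and the helper function are hypothetical. The authorization and security settings that must also admit the remote hosts are not shown.

```python
def flocking_config(remote_central_managers, remote_submit_hosts):
    """Render FLOCK_TO / FLOCK_FROM lines for one pod's HTCondor config."""
    return "\n".join([
        # Central managers of the other pools this pod's schedd may flock to.
        "FLOCK_TO = " + ", ".join(remote_central_managers),
        # Submit hosts in other pools that may flock jobs into this pod.
        "FLOCK_FROM = " + ", ".join(remote_submit_hosts),
    ])


# Hypothetical names for a pod that exchanges jobs with NRAO and one other pod.
print(flocking_config(
    remote_central_managers=["cm.nrao.example.org", "cm.pod2.example.org"],
    remote_submit_hosts=["submit.nrao.example.org", "submit.pod2.example.org"],
))
```

Generating the file from a list of pods would keep the flocking relationships in one place. Whether that beats joining OSG is still the open question above.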