Operating System
- Must support CASA
- Will need a patching/updating mechanism
- Try to have one OS that supports both our use and local use?
- Or they could dual boot?
- Or Kubernetes?
Third party software for VLASS
- CASA
- HTCondor
- Slurm?
- Will need a way to maintain the software
Third party software for Local
- Will need a way to maintain software for the local site
Services
- DNS
- DHCP
- SMTP
- NTP
- NFS?
- LDAP? How do we handle accounts?
- ssh
- rsync (nraorsync_plugin.py)
Management Access
- PDU
- UPS
- BMC/IPMI
- switch
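
As a rough sketch of what scripted BMC/IPMI access could look like, the Python below builds an `ipmitool` command line for querying chassis power state over the network. It assumes `ipmitool` with the `lanplus` interface is the tool we end up using. The helper names, the host name `node01-bmc.example.org`, and the user `admin` are hypothetical placeholders. The password is expected in the `IPMI_PASSWORD` environment variable (read by `ipmitool -E`) rather than on the command line.

```python
import subprocess


def ipmi_command(bmc_host, user, *args):
    """Build an ipmitool command line for a remote BMC.

    -I lanplus selects IPMI v2.0 over LAN; -E makes ipmitool read the
    password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *args]


def power_status(bmc_host, user):
    """Run 'chassis power status' against one BMC and return its output."""
    cmd = ipmi_command(bmc_host, user, "chassis", "power", "status")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Example call (hypothetical host and user, needs ipmitool installed):
#   power_status("node01-bmc.example.org", "admin")
```

The same `ipmi_command` helper could be reused for other subcommands such as `sensor list`, which may also help with the environmental monitoring question.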
Maintenance
- replace disk
- replace/reseat DIMM
- replace power supply
- NRAO may handle replacement hardware. Drop ship parts, or keep spares ourselves?
Hardware
- GPUs? Do we get 1U nodes with room for 1 or 2 Tesla T4 GPUs, or 2U nodes with room for 1 or 2 regular GPUs?
- 1 Gb/s networking might be enough. 10 Gb/s if the price is good.
- 30 1U nodes or 15 2U nodes or mix?
- head node with lots of disk
- test head node at NRAO (either CV or NM)
- one PDU or two PDUs?
- UPS for just the head node and switch?
- environmental monitoring. Could the PDU do this?
- rackmount KVM (not remote) and patch cables
- NVMe drives for nodes
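
To compare the node options above, a small Python sketch can enumerate the possible mixes. It assumes the compute nodes share a fixed budget of 30 rack units, which is implied by "30 1U nodes or 15 2U nodes" but not stated outright. It also assumes every node holds 1 or 2 GPUs. The function names are made up for illustration.

```python
def node_mixes(rack_units=30):
    """List every (count_1u, count_2u) pair that uses exactly rack_units."""
    mixes = []
    for count_2u in range(rack_units // 2 + 1):
        count_1u = rack_units - 2 * count_2u
        mixes.append((count_1u, count_2u))
    return mixes


def gpu_range(count_1u, count_2u, per_node_min=1, per_node_max=2):
    """Return (min, max) total GPUs if every node holds 1 or 2 GPUs."""
    nodes = count_1u + count_2u
    return nodes * per_node_min, nodes * per_node_max
```

Under these assumptions, `node_mixes()` runs from `(30, 0)` through `(10, 10)` to `(0, 15)`. `gpu_range(30, 0)` gives 30 to 60 GPUs and `gpu_range(0, 15)` gives 15 to 30, so the all-1U layout doubles the GPU count at the cost of being limited to T4-class cards.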
Other
Since we are making our own little OSG, should we try to leverage OSG for this or not? Or do we want to make each POD its own HTCondor pool and use flocking?
How do we handle the 50% workload?
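
If we go with the flocking option raised above, the HTCondor side mostly comes down to two configuration macros, `FLOCK_TO` and `FLOCK_FROM`. The Python below is only a sketch that renders those two lines. The function name and all host names are hypothetical, and a real setup would also need matching authorization settings that are not shown here.

```python
def flocking_config(central_managers, submit_hosts):
    """Render the HTCondor flocking macros as config-file text.

    FLOCK_TO goes on the submit host and names the central managers of
    the pools that jobs may flock to. FLOCK_FROM goes on each receiving
    pool and names the submit hosts allowed to flock in.
    """
    submit_side = "FLOCK_TO = " + ", ".join(central_managers)
    pool_side = "FLOCK_FROM = " + ", ".join(submit_hosts)
    return submit_side, pool_side


# Hypothetical host names, for illustration only.
to_line, from_line = flocking_config(
    ["cm.pod1.example.org", "cm.pod2.example.org"],
    ["submit.example.org"],
)
```

Here `to_line` would go in the submit host's configuration and `from_line` in each POD's configuration, assuming each POD runs its own central manager.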