RADIAL Support

Operating System

Must support CASA
Will need a patching/updating mechanism
How to boot diskless OS images
- I am not finding any new sexy software packages to automate PXE+DHCP+TFTP+NFS https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_storage_devices/setting-up-a-remote-diskless-system_managing-storage-devices
- One OS image for both our use and the locals' use, or multiple OS images?
- Use containers (docker, singularity/apptainer, kubernetes, mesos, etc)?
- Ask Greg at CHTC what they use
What Linux distribution to use?
- Can we use Red Hat with our current license? I have looked in JDE and I can't find a recent subscription. Need to ask David.
- Should we buy Red Hat licenses like we did for USNO?
  - USNO is between $10K and $15K per year for 81 licensed nodes. This may not be an EDU license.
  - NRAO used to have a 1,000 host license for Red Hat but I don't know what they have now.
- Do we even want to use Red Hat?
  - Alternatives would be Rocky Linux or AlmaLinux since CentOS is essentially dead
What version do we use: RHEL7 or RHEL8?

Third party software for VLASS

CASA
HTCondor
Will need a way to maintain the software
- stow, rpm, modules, containers?

Third party software for Local

Will need a way to maintain software for the local site

Services

DNS
- What DNS domain will these hosts be in? nrao.edu? local.site? other?
DHCP
SMTP
NTP
NFS
LDAP? How do we handle accounts? I think we will want accounts on at least the head node. The execution nodes could run everything as nobody or as real users. If we want real users on the execute hosts then we should use a directory service which should probably be LDAP. No sense in teaching folks how to use NIS anymore.
- Local accounts only?
ssh
rsync (nraorsync_plugin.py)
NAT so the nodes can download/upload data
TFTP (for OSes and switch)
condor (port 9618) https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToMixFirewallsAndHtCondor
ganglia
nagios

Management Access

PDU
UPS
BMC/IPMI
switch

Maintenance

replace disk (local admin)
replace/reseat DIMM (local admin)
replace power supply (local admin)
NRAO may handle replacement hardware. Drop ship. Spare ourselves?
Patching OS images (NRAO)
Patching third party software like CASA and HTCondor (NRAO)
Altering OS images (NRAO)

Hardware

Cabinet Rack: Doors front and rear locking with mesh. Width: 19". Height: 42U is most common. Depth: 42" is most common. Rack must support at least 2,000 lbs static load
- https://greatcabinets.com/product/es-ms/ This is what we usually get
- https://www.apc.com/shop/us/en/products/APC-NetShelter-SX-Server-Rack-Enclosure-42U-Black-1991H-x-600W-x-1070D-mm/P-AR3100 APC NetShelter SX
- https://www.apc.com/shop/us/en/products/APC-NetShelter-SX-Server-Rack-Enclosure-42U-Shock-Packaging-2000-lbs-Black-1991H-x-600W-x-1070D-mm/P-AR3100SP APC NetShelter SX designed for re-shipping after equipped.
PDU: one PDU or two PDUs? What plug? What voltage? This may vary across sites. What if the site has two power sources?
- https://www.apc.com/shop/us/en/products/APC-Rack-PDU-2G-switched-0U-30A-100V-to-120V-24-NEMA-5-20R-sockets/P-AP8932 24 NEMA 5-20R, 120V, 2.8kW, NEMA L5-30P 1Phase
- https://www.apc.com/shop/us/en/products/APC-Rack-PDU-9000-switched-0U-8-6kW-208V-21-C13-and-C15-3-C19-and-C21-sockets/P-APDU9965 21 C13/C15 and 3 C19/C21, 208V, 8.6kW, NEMA L21-30P 3Phase
- https://www.apc.com/shop/us/en/products/APC-Rack-PDU-9000-switched-0U-17-3kW-208V-42-C13-and-C15-6-C19-and-C21-sockets/P-APDU9967 42 C13/C15 and 6 C19/C21, 208V, 17.3kW, IEC60309 60A 3P+PE 3Phase
- Stagger startups on PDU
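
The kW ratings quoted for the three PDUs above can be roughly reproduced from voltage and amperage. This is a minimal sketch, assuming the common 80% continuous-load derating and a hypothetical 400W draw per node (neither figure comes from these notes); it is meant for comparing options, not for sizing a circuit.

```python
import math

def pdu_capacity_kw(volts, amps, three_phase=False, derate=0.8):
    """Approximate continuous capacity of a PDU in kW.

    derate=0.8 is an assumption (the usual 80% continuous-load rule).
    """
    phase_factor = math.sqrt(3) if three_phase else 1.0
    return volts * amps * phase_factor * derate / 1000.0

def nodes_per_pdu(capacity_kw, node_watts):
    """How many nodes of a given peak draw fit under the PDU capacity."""
    return int(capacity_kw * 1000 // node_watts)

# Voltage and amperage taken from the plug types listed above.
single_phase_30a = pdu_capacity_kw(120, 30)                   # L5-30P option
three_phase_30a = pdu_capacity_kw(208, 30, three_phase=True)  # L21-30P option
three_phase_60a = pdu_capacity_kw(208, 60, three_phase=True)  # IEC60309 60A option

NODE_WATTS = 400  # hypothetical per-node peak draw, not a measured value
for label, kw in [("120V 30A", single_phase_30a),
                  ("208V 30A 3-phase", three_phase_30a),
                  ("208V 60A 3-phase", three_phase_60a)]:
    print(f"{label}: {kw:.2f} kW, about {nodes_per_pdu(kw, NODE_WATTS)} nodes")
```

With a real per-node power figure substituted for NODE_WATTS, the same arithmetic helps answer the one-PDU-or-two question for a 30-node rack.
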
UPS: for just the head node and switch? This may depend on the voltage of the PDUs.
Switch: 10Gb/s.
Environmental Monitoring: Could the PDU do this?
KVM: rackmount, not remote, and patch cables
Ethernet cables:
Power cables: single or Y cables depending on number of PDUs and two power sources.
Head Node: lots of disk. Do the locals have access to this disk space? Maybe not.
- iDRAC: Ask CIS what they recommend
- Memory: at least 32GB of RAM to help cache the OS image. 64GB would be even better.
- Storage mdRAID, ZFS, Btrfs, RAID card? Do we want both boot and data arrays to be the same type?
  - OS/OSimages, RAID1 with or without spare? (3 disks), about 1TB
    - OS: We have been making 40GB partitions for / for over a decade and that looks to still work with RHEL8.
    - Swap: 0 or 8GB at most
    - /export/home/<hostname>: services and diskless_images
  - Working data/software (local and nrao), RAID6 w/spare or RAID7 (9 disks), about 72TB
    - An SE imaging input data size is about 10GB per job
    - We need maybe 20TB+ of total space or more so maybe 60TB/2
    - Carve into two partitions (NRAO data and NRAO software, Local data and Local software) each partition has data and software directories.
- Networking: May need more than one port. One for internal networking to nodes and one for external Internet access.
30 1U nodes or 15 2U nodes or mix? NVMe drives for nodes. Swap drive?
- NVMe for scratch and swap
GPUs: Do we get GPUs? Do we get 1U nodes with room for 1 or 2 Tesla T4 GPUs or 2U nodes with room for 1 or 2 regular GPU?
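
The head-node data array sizing above (9 disks, RAID6 with a spare, about 72TB, split between NRAO and local) can be checked with a little arithmetic. This is a sketch only: the per-disk sizes are assumptions, since the notes do not say whether 72TB is raw or usable. With 12TB disks the usable space is 72TB; with 8TB disks 72TB is the raw total and usable space drops to 48TB.

```python
def raid6_usable_tb(total_disks, disk_tb, hot_spares=1):
    """Usable capacity of a RAID6 array: two disks' worth of parity, plus spares."""
    data_disks = total_disks - hot_spares - 2
    if data_disks < 1:
        raise ValueError("not enough disks for RAID6 with the requested spares")
    return data_disks * disk_tb

def jobs_that_fit(space_tb, job_gb=10):
    """Number of job inputs that fit, using the ~10GB per SE imaging job figure."""
    return int(space_tb * 1000 // job_gb)

# Disk sizes below are hypothetical; only the disk count comes from the notes.
usable_12tb_disks = raid6_usable_tb(9, 12)   # 72 TB usable
usable_8tb_disks = raid6_usable_tb(9, 8)     # 48 TB usable (72 TB raw)

nrao_share_tb = usable_12tb_disks / 2        # two equal partitions
print(usable_12tb_disks, usable_8tb_disks, jobs_that_fit(nrao_share_tb))
```
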

Networking

NRAO side

Submit host needs to be able to establish a connection to the remote head node on port 9618 (HTCondor)
Submit host needs to be able to listen for a connection from the remote head node on port 9618 (HTCondor)
- mcilroy has external IPs (146.88.1.66 for 1Gb/s and 146.88.10.66 for 10Gb/s). Is the container listening?
NRAO needs to be able to establish a connection to the remote head node on port 22 (ssh)

Remote side

Head node establish on port 9618 to nrao.edu. (HTCondor)
Head node listens on port 9618 from nrao.edu. (HTCondor)
Execute node establish on port 9618 to nrao.edu. Execute host can be NATed. (HTCondor if flocking)
Execute node establish on port 22 to gibson.aoc.nrao.edu. Execute host can be NATed. (nraorsync if flocking)
Head node listens on port 22 from nrao.edu (ssh)
Head node establish on port 25 to revere.aoc.nrao.edu (mail)
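
The outbound connections listed above could be verified from the remote side with a small TCP check before any HTCondor configuration is attempted. This is a minimal sketch using only the Python standard library. The host names gibson.aoc.nrao.edu and revere.aoc.nrao.edu come from the list above; SUBMIT_HOST is a placeholder because the notes do not name the submit host, and a successful TCP connect only shows that the firewall path is open, not that the service behind it is configured.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

SUBMIT_HOST = "submit-host.example"  # placeholder, not a real NRAO host name

# (description, host, port) for connections the remote side must be able to open.
OUTBOUND_CHECKS = [
    ("HTCondor to NRAO submit host", SUBMIT_HOST, 9618),
    ("nraorsync from execute node", "gibson.aoc.nrao.edu", 22),
    ("mail from head node", "revere.aoc.nrao.edu", 25),
]

def run_checks(checks, timeout=3.0):
    """Map each check description to True/False."""
    return {label: can_connect(host, port, timeout) for label, host, port in checks}
```

Calling run_checks(OUTBOUND_CHECKS) on the head node or an execute node returns a dictionary of pass/fail results. The inbound requirements (ports 9618 and 22 from nrao.edu) would need the same check run from the NRAO side.
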

Using

Get NRAO jobs on the remote racks. This may depend on how we want to use these remote racks. If we want them to do specific types of jobs then ClassAd options may be the solution. If we want them as overflow for jobs run at NRAO then flocking may be the solution. Perhaps we want both flocking and ClassAd options. Actually flocking may be the best method because I think it doesn't require the execute nodes to have external network access.
- Flocking? What are the networking requirements?
- ClassAd options? I think this will require the execute hosts to have routable IPs because our submit host will talk directly to them and vice-versa. Could CCB help here?
- Other?
Remote HTCondor concerns
- Do we want our jobs to run as an NRAO user like vlapipe or nobody?
- Do we want local jobs to run as the local user, some dedicated user, or nobody?
Need to support 50% workload for NRAO and 50% workload for local. How?
- Could have 15 nodes for us and 15 nodes for them
- What if we do nothing? HTCondor's fair-share algorithm may do the work for us if all our jobs are run as user vlapipe or something like that.
- Use RANK, and therefore preemption. https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigPrioritiesForUsers
- Group Accounting
- User Priority https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToSetUserPriority
Share disk space on head node 50% NRAO and 50% local
- Two partitions: one for NRAO and one for local?
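
If the Group Accounting option above were chosen for the 50/50 workload split, the negotiator settings might look like what the helper below prints. This is a sketch, not a decision: the group names group_nrao and group_local are hypothetical, and the knobs used (GROUP_NAMES, GROUP_QUOTA_DYNAMIC_<group>, GROUP_ACCEPT_SURPLUS) should be checked against the HTCondor version actually deployed.

```python
def render_group_quotas(shares, accept_surplus=True):
    """Render HTCondor group-quota settings from {group_name: fraction_of_pool}."""
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1.0, got {total}")
    lines = ["GROUP_NAMES = " + ", ".join(shares)]
    for name, fraction in shares.items():
        lines.append(f"GROUP_QUOTA_DYNAMIC_{name} = {fraction}")
    # Letting groups take surplus keeps idle slots usable by the other side.
    lines.append(f"GROUP_ACCEPT_SURPLUS = {accept_surplus}")
    return "\n".join(lines)

# Hypothetical group names; the 0.5/0.5 split is the 50% requirement above.
config = render_group_quotas({"group_nrao": 0.5, "group_local": 0.5})
print(config)
```

Jobs would then need to declare their group in the submit description (for example accounting_group = group_nrao) for the quota to apply.
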

Documentation

A projectbook like we did for USNO could be appropriate
Process diagrams (how systems boot, how jobs get started from NRAO and run, how locals start jobs, etc)

Shipping

Drop ship everything to the site and assemble on site. This will require an NRAO person on site to assemble with a pre-built OS disk for the head node.
Ship everything here and assemble then ship a rack-on-pallet
Mix the two. Ship minimal stuff here (head node, switch, couple of nodes, etc) and configure and drop ship most of the nodes to the site.
A person from the remote site could travel to NM or CV to see the test system and get instruction.

Data Path

This is conceptual at this point.

We pre-stage data on head node
We then either submit a job locally and it flocks to the remote site, or we log in to the remote site and submit there.
- Can we use a nifty filesystem to simplify this (Ceph or that LHC fs)?
- This might be a good phase2 problem to solve.
- Is this kinda what nraorsync does?
The remote execute hosts transfer data from the remote head node
The job uploads resulting data to the head node
We retrieve data from the head node
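
A rough feel for the transfer steps above can be had from the figures elsewhere on this page: about 10GB of input per job, a 10Gb/s switch, and 1Gb/s or 10Gb/s external links. The sketch below gives best-case times only; the efficiency parameter is an assumption, and real rsync or HTCondor file transfer throughput will be lower than line rate.

```python
def transfer_seconds(size_gb, link_gbps, efficiency=1.0):
    """Best-case seconds to move size_gb gigabytes over a link_gbps link."""
    return size_gb * 8 / (link_gbps * efficiency)

JOB_INPUT_GB = 10   # per-job input size quoted for SE imaging
NODES = 30          # upper node count under discussion

one_job_10g = transfer_seconds(JOB_INPUT_GB, 10)          # head node to one execute node
one_job_1g = transfer_seconds(JOB_INPUT_GB, 1)            # pre-staging over a 1Gb/s link
all_nodes_10g = transfer_seconds(JOB_INPUT_GB * NODES, 10)  # every node pulling at once

print(one_job_10g, one_job_1g, all_nodes_10g)
```
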

Other

Keep each rack as similar to the other racks as possible.
Test system at NRAO should be one of everything.

Since we are making our own little OSG, should we try to leverage OSG for this or not? Or do we want to make each POD a pool and flock?

Should we try to buy as much as we can from one vendor like Dell to simplify things?

APC sells a packaged rack on a pallet ready for shipping. We could fill this with gear and ship it. Not sure if that is a good idea or not. We will not be able to move the unit into the server room while still on the pallet because no doorway is tall enough. We would have to roll it off the pallet (it comes with a ramp and the rack is on casters) move it into the server room, fill and configure it, roll it out of the server room, roll it back onto the pallet, probably remove the bottom server(s) so we can attach it to the pallet, then re-add the bottom server(s). We could use the double glass doors for this but there is a lip on the transition. We could use the doors in the PRA closet as it has no lip but would require a lot of moving of shelves and stuff.

APC NetShelter SX packaged:
- On Pallet: Height 85.79in (2179mm) Width 43.5in (1105mm)
- On Casters: Height 78.39in (1991mm) Width 23.62in (600mm)
Double Glass doors: Height: 80in (2032mm) (because of the 2in maglock)
NRAO-NM wide server doors: Height: 83in (2133mm) Width: 48in (1187mm)
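
The clearance question can be reduced to a simple comparison of the dimensions above. This is a sketch using the inch values exactly as quoted in these notes, which should be confirmed by measurement before relying on them. No width is given for the double glass doors, so it is treated as unconstrained here (an assumption).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Height and width, in inches, of a rack or a door opening."""
    name: str
    height_in: float
    width_in: float

def fits(load, opening, margin_in=0.0):
    """True if the load passes upright through the opening with the given margin."""
    return (load.height_in + margin_in <= opening.height_in
            and load.width_in + margin_in <= opening.width_in)

RACK_ON_PALLET = Box("APC NetShelter SX on pallet", 85.79, 43.5)
RACK_ON_CASTERS = Box("APC NetShelter SX on casters", 78.39, 23.62)
GLASS_DOORS = Box("Double glass doors", 80.0, float("inf"))  # width not quoted
SERVER_DOORS = Box("NRAO-NM wide server doors", 83.0, 48.0)  # values as quoted above

for load in (RACK_ON_PALLET, RACK_ON_CASTERS):
    for door in (GLASS_DOORS, SERVER_DOORS):
        verdict = "fits" if fits(load, door) else "does not fit"
        print(f"{load.name} {verdict} through {door.name}")
```

The same function can be reused with each remote site's answers to the door width and height question, with a margin_in value added for handling clearance.
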

I could start prototyping now using AWS.

Do we want jobs to flock or do we want to submit jobs on the remote host and have pre-transferred data? Involve SSA and VLASS in this question.

If jobs are submitted from the remote host does that mean SSA will want a container on that remote host?

Site Questions

Voltage in server room (120V or 208V or 240V)
Receptacles in server room (L5-30R or L21-30R or ...)
Single or dual power feeds?
Is power from below or from above?
Door width and height and path to server room.
- Can a rack-on-pallet fit upright? Height: 85.79inches (2179mm) Width: 43.5inches (1105mm)
- Can a rack-on-casters fit upright? Height: 78.39inches (1991mm) Width: 23.62inches (600mm)
- NRAO-NM wide server door Height: 84inches (2108mm) Width: 46.75inches (1219mm)
Firewalls
How are you going to use this?
Do you care if this is in your DNS zone or ours?

Resources

USNO correlator (Mark Wainright)
VLBA Control Computers (William Colburn)
Red Hat maintenance (William Colburn)
Virtual kickstart (William Colburn)
Switch models and ethernet (Jeff Long)
HTCondor best practices (Greg Thain)
OSG (Lauren Michael)
SDSC at UCSD
TACC at UT Austin
- https://www.tacc.utexas.edu/about/directory
IDIA https://www.idia.ac.za/

Timeline

Buy test system as soon as practical (assuming the project is still a go)
- Does Jeff know if this is a go or not?
- Talk to Matthew about where to put this stuff
Buy by July
Receive by Aug
Install by Dec
Running in Jan. 2023
