Operating System

  • Must support CASA
  • Will need a patching/updating mechanism
  • How to boot diskless OS images
  • Which Linux distribution to use?
    • Can we use Red Hat with our current license?  I have looked in JDE and I can't find a recent subscription.  Need to ask David.
    • Should we buy Red Hat licenses like we did for USNO?
      • USNO is between $10K and $15K per year for 81 licensed nodes.  This may not be an EDU license.
      • NRAO used to have a 1,000 host license for Red Hat but I don't know what they have now.
    • Do we even want to use Red Hat?
      • Alternatives would be Rocky Linux or AlmaLinux since CentOS is essentially dead
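
To put the USNO figure in per-node terms, here is a rough sketch of the arithmetic. It assumes the cost scales linearly with node count, which may not match how Red Hat actually prices subscriptions, and the 40-node count in the usage note is purely hypothetical.

```python
# Figures quoted above for USNO: $10K to $15K per year for 81 licensed nodes.
USNO_NODES = 81
USNO_COST_RANGE = (10_000, 15_000)  # USD per year


def per_node_cost(total_cost, nodes):
    """Return the yearly cost per node for a given total."""
    return total_cost / nodes


def estimate(nodes, cost_range=USNO_COST_RANGE, ref_nodes=USNO_NODES):
    """Scale the reference range to another node count.

    Assumption: pricing is linear in the number of nodes.
    """
    low, high = (per_node_cost(c, ref_nodes) * nodes for c in cost_range)
    return round(low), round(high)


low, high = (per_node_cost(c, USNO_NODES) for c in USNO_COST_RANGE)
print(f"Per node: ${low:.0f} to ${high:.0f} per year")
```

For example, `estimate(40)` gives a range of about $4,938 to $7,407 per year for a hypothetical 40-node purchase, if the linear assumption holds.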

Third party software for VLASS

  • CASA
  • HTCondor
  • Will need a way to maintain the software
    • stow, rpm, modules, containers?

Third party software for Local

  • Will need a way to maintain software for the local site

Services

  • DNS
  • DHCP
  • SMTP
  • NTP
  • NFS?
  • LDAP?  How do we handle accounts?  I think we will want accounts on at least the head node.  The execute nodes could run everything as nobody or as real users.  If we want real users on the execute nodes, then we should use a directory service, which should probably be LDAP.  No sense in teaching folks how to use NIS anymore.
    • Local accounts only?
  • ssh
  • rsync (nraorsync_plugin.py)
  • NAT so the nodes can download/upload data
  • TFTP
  • condor (port 9618) https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToMixFirewallsAndHtCondor
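
When sorting out firewall rules with a site, a quick TCP reachability check against the HTCondor port can help. The sketch below uses only the Python standard library. The function name `port_open` and the timeout value are illustrative choices, and a successful connection only shows that the port is reachable, not that HTCondor itself is configured correctly.

```python
import socket

HTCONDOR_PORT = 9618  # port listed above for condor


def port_open(host, port=HTCONDOR_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

Calling `port_open("head-node.example.org")` (a placeholder hostname) from the NRAO side, and the reverse from the remote head node, would show which direction a firewall is blocking.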

Management Access

  • PDU
  • UPS
  • BMC/IPMI
  • switch

Maintenance

  • replace disk (local admin)
  • replace/reseat DIMM (local admin)
  • replace power supply (local admin)
  • NRAO may handle replacement hardware. Drop ship, or keep spares ourselves?
  • Patching OS images (NRAO)
  • Patching third party software like CASA and HTCondor (NRAO)
  • Altering OS images (NRAO)

Hardware


Using

  • Get NRAO jobs on the remote racks.  This may depend on how we want to use these remote racks. If we want them to do specific types of jobs then ClassAd options may be the solution. If we want them as overflow for jobs run at NRAO then flocking may be the solution. Perhaps we want both flocking and ClassAd options.  Actually flocking may be the best method because I think it doesn't require the execute nodes to have external network access.
    • Flocking?  What are the networking requirements?
    • ClassAd options?  I think this will require the execute hosts to have routable IPs because our submit host will talk directly to them and vice versa.  Could CCB help here?
    • Other?
  • Remote HTCondor concerns
    • Do we want our jobs to run as an NRAO user like vlapipe or as nobody?
    • Do we want local jobs to run as the local user, some dedicated user, or nobody?
  • Need to support 50% workload for NRAO and 50% workload for local.  How?
  • Share disk space on head node 50% NRAO and 50% local
    • Two partitions: one for NRAO and one for local?

Documentation

  • A projectbook like we did for USNO could be appropriate
  • Process diagrams (how systems boot, how jobs get started from NRAO and run, how locals start jobs, etc.)


Other

  • Keep each rack as similar to the other racks as possible.
  • Test system at NRAO should be one of everything.

Since we are making our own little OSG, should we try to leverage OSG for this or not?  Or do we want to make each POD a pool and flock?

Should we try to buy as much as we can from one vendor like Dell to simplify things?

APC sells a packaged rack on a pallet ready for shipping.  We could fill this with gear and ship it.  Not sure if that is a good idea or not.


  • Double Glass doors: Height: 80in (2032mm) (because of the 2in maglock)
  • NRAO-NM wide server door Height: 83in (2133mm) Width: 48in (1187mm)



Site Questions

  • Voltage in server room (120V or 208V or 240V)
  • Receptacles in server room (L5-30R or L21-30R or ...)
  • Single or dual power feeds?
  • Is power from below or from above?
  • Door width and height and path to server room.
    • Can a rack-on-pallet fit upright?  Height: 85.79 inches (2179mm) Width: 43.5 inches (1105mm)
    • Can a rack-on-casters fit upright?  Height: 78.39 inches (1991mm) Width: 23.62 inches (600mm)
    • NRAO-NM wide server door Height: 84 inches (2108mm) Width: 46.75 inches (1219mm)
  • Firewalls
  • How are you going to use this?
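
The upright-fit questions above come down to comparing rack height with the door opening. A minimal sketch follows, using the rack heights from this list and the 80in double glass door height noted earlier. The `margin_in` parameter is a hypothetical safety clearance, and width and the path to the server room are not checked here.

```python
# Rack heights quoted in the site questions above (inches).
RACK_HEIGHTS_IN = {"rack-on-pallet": 85.79, "rack-on-casters": 78.39}
# Double glass doors, reduced to 80in by the 2in maglock.
DOUBLE_GLASS_DOOR_HEIGHT_IN = 80.0


def in_to_mm(inches):
    """Convert inches to millimetres."""
    return inches * 25.4


def fits_upright(item_height_in, opening_height_in, margin_in=0.0):
    """Return True if the item, plus an optional margin, clears the opening."""
    return item_height_in + margin_in <= opening_height_in


for name, height in RACK_HEIGHTS_IN.items():
    ok = fits_upright(height, DOUBLE_GLASS_DOOR_HEIGHT_IN)
    verdict = "fits" if ok else "does not fit"
    print(f"{name} ({in_to_mm(height):.0f}mm): {verdict} upright under an 80in opening")
```

With these numbers the rack-on-pallet does not clear an 80in opening upright, while the rack-on-casters does with about 1.6in to spare. Each site's own door measurements can be substituted for the opening height.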


Resources

  • USNO correlator (Mark Wainright)
  • VLBA Control Computers (William Colburn)
  • Red Hat maintenance (William Colburn)
  • Virtual kickstart (William Colburn)
  • Switch models and ethernet (Jeff Long)
  • HTCondor best practices (Greg Thain)
  • OSG (Lauren Michael)
  • SDSC at UCSD
  • TACC at UT Austin
  • IDIA https://www.idia.ac.za/


Timeline

  • June buy
  • Receive by July
  • Pay by Aug
  • Install by Dec
  • Running in Jan. 2023

