...
- A projectbook like we did for USNO could be appropriate
- Process diagrams (how systems boot, how jobs get started from NRAO and run, how remote institutions start jobs, etc.)
Networking
NRAO side
- NRAO -> remote head node on port 22 (ssh)
- Submit Host -> remote head node (condor_collector) on port 9618 (HTCondor) for flocking. The submit host needs to be able to establish this connection.
- Submit Host <- remote head node (condor_negotiator) on port 9618 (HTCondor) for flocking. The submit host needs to be able to listen for this connection.
- mcilroy has external IPs (146.88.1.66 for 1Gb/s and 146.88.10.66 for 10Gb/s). Is the container listening?
- Submit Host <- remote execute hosts (condor_starter) on port 9618 (HTCondor) for flocking
Remote side
- Head node <- nrao.edu on port 22 (ssh)
- Head node -> revere.aoc.nrao.edu on port 25 (smtp)
- Head node -> NRAO Submit Host on port 9618 (HTCondor) for flocking
- Head node <- NRAO Submit Host on port 9618 (HTCondor) for flocking
- Execute node -> NRAO Submit Host on port 9618 (HTCondor) for flocking. Execute host may be NATed.
- Execute node -> gibson.aoc.nrao.edu on port 22 (ssh) for flocking with nraorsync. Execute host may be NATed.
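One way to verify the flows listed above once firewalls are configured is a small TCP reachability check run from each role (submit host, head node, execute node). The following is only a sketch: the hostnames head.remote.example and submit.nrao.example are placeholders and not real hosts, and the names Flow, FLOWS, can_connect, flows_from and check_flows are made up for illustration. It only tests that a TCP connection can be opened; it does not test HTCondor authentication or whether flocking itself works.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str     # role that initiates the TCP connection
    dest_host: str  # host that must accept the connection
    port: int
    purpose: str


# Placeholder hostnames (assumptions, not real hosts).
HEAD = "head.remote.example"
SUBMIT = "submit.nrao.example"

# Flows taken from the NRAO side / Remote side lists, keyed by initiator.
FLOWS = [
    Flow("submit", HEAD, 22, "ssh"),
    Flow("submit", HEAD, 9618, "HTCondor collector, flocking"),
    Flow("head", SUBMIT, 9618, "HTCondor negotiator, flocking"),
    Flow("head", "revere.aoc.nrao.edu", 25, "smtp"),
    Flow("execute", SUBMIT, 9618, "HTCondor starter, flocking"),
    Flow("execute", "gibson.aoc.nrao.edu", 22, "ssh for nraorsync"),
]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def flows_from(source, flows=FLOWS):
    """Return the flows that the given role is expected to initiate."""
    return [f for f in flows if f.source == source]


def check_flows(source, flows=FLOWS, timeout=3.0):
    """Try every flow initiated by `source`; return (flow, ok) pairs."""
    return [(f, can_connect(f.dest_host, f.port, timeout))
            for f in flows_from(source, flows)]
```

Running check_flows("execute") on an execute node, for example, would attempt the two outbound connections that role needs. Because only outbound connections are tested, a NATed execute host can run the check unchanged.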
Services
- DNS
- What DNS domain will these hosts be in? nrao.edu? remote-institution.site? other?
- DHCP
- SMTP
- NTP
- NFS
- LDAP? How do we handle accounts? I think we will want accounts on at least the head node. The execute nodes could run everything as nobody or as real users. If we want real users on the execute hosts, then we should use a directory service, which should probably be LDAP. No sense in teaching folks how to use NIS anymore.
- remote institution accounts only?
- ssh
- rsync (nraorsync_plugin.py)
- NAT so the nodes can download/upload data
- TFTP (for OSes and switch)
- condor (port 9618) https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToMixFirewallsAndHtCondor
- ganglia
- nagios
...