...
- Buy test system as soon as practical (assuming the project is still a go)
- Does Jeff Kern know if this is a go or not?
- Talk to Matthew about where to put this stuff
- May 3, 2022 krowe: talked to Matthew. He will consult with Peter and get back to me. I am thinking 253T.
- Ask Jeff Long to spec a switch
- Buy by July
- Receive by Aug
- Install by Dec
- Running in Jan. 2023
Documentation
- A projectbook like we did for USNO could be appropriate
- Process diagrams (how systems boot, how jobs get started from NRAO and run, how locals start jobs, etc)
Data Path
This is conceptual at this point.
...
- Get NRAO jobs onto the remote racks. The method may depend on how we want to use these racks. If we want them to run specific types of jobs, ClassAd options may be the solution. If we want them as overflow for jobs run at NRAO, flocking may be the solution. Perhaps we want both. Flocking may be the better method because I think it doesn't require the execute nodes to have external network access.
- Flocking? What are the networking requirements?
- ClassAd options? I think this will require the execute hosts to have routable IPs because our submit host will talk directly to them and vice versa. Could CCB help here?
- Other?
- Remote HTCondor concerns
- Do we want our jobs to run as an NRAO user like vlapipe or nobody?
- Do we want local jobs to run as the local user, some dedicated user, or nobody?
- Need to support 50% workload for NRAO and 50% workload for local. How?
- Could have 15 nodes for us and 15 nodes for them
- What if we do nothing? HTCondor's fair-share algorithm may do the work for us if all our jobs are run as user vlapipe or something like that.
- Use RANK, and therefore preemption. https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigPrioritiesForUsers
- Group Accounting
- User Priority https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToSetUserPriority
- Share disk space on head node 50% NRAO and 50% local
- Two partitions: one for NRAO and one for local?
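The "do nothing" option above can be sanity-checked with a little arithmetic. The sketch below assumes a simplified steady-state model in which every user always has idle jobs and each user's share of slots is inversely proportional to their effective user priority. It ignores priority decay, group quotas, and preemption settings, so treat it as a rough illustration rather than a description of what the negotiator will actually do. The 30-slot total comes from the 15 + 15 node split mentioned above; the function name and the local user names are hypothetical.

```python
from typing import Dict


def expected_slots(total_slots: int, effective_priority: Dict[str, float]) -> Dict[str, float]:
    """Split total_slots among users in inverse proportion to effective priority.

    Simplified model (assumption): a numerically lower priority value is
    better, and a user's share is proportional to 1 / priority.
    """
    inverse = {user: 1.0 / prio for user, prio in effective_priority.items()}
    total_inverse = sum(inverse.values())
    return {user: total_slots * inv / total_inverse for user, inv in inverse.items()}


# One NRAO user and one local user with equal priority: an even split.
even = expected_slots(30, {"vlapipe": 1.0, "local": 1.0})

# One NRAO user and four hypothetical local users with equal priority:
# under this model vlapipe gets one fifth of the slots, not half.
skewed = expected_slots(
    30, {"vlapipe": 1.0, "local1": 1.0, "local2": 1.0, "local3": 1.0, "local4": 1.0}
)
```

Under these assumptions, fair share alone gives a 50/50 split only if the local side also submits as a single user. If local jobs run as several individual users, something like group accounting or a fixed node split would be needed to hold NRAO at 50%.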
Networking
NRAO side
- Submit host needs to be able to establish a connection to the remote head node on port 9618 (HTCondor)
- Submit host needs to be able to listen for a connection from the remote head node on port 9618 (HTCondor)
- mcilroy has external IPs (146.88.1.66 for 1Gb/s and 146.88.10.66 for 10Gb/s). Is the container listening?
- NRAO needs to be able to establish a connection to the remote head node on port 22 (ssh)
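The outbound requirements above (ports 9618 and 22 on the remote head node) could be checked from the submit host with a small script once a remote head node exists. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and it tests only whether a TCP connection can be established from NRAO to the remote side; it does not verify the inbound case where the remote head node connects back to the submit host on 9618.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_head_node(host: str, ports: Iterable[int] = (9618, 22)) -> Dict[int, bool]:
    """Check each port on the given host and return {port: reachable}."""
    return {port: can_connect(host, port) for port in ports}
```

For example, `check_head_node("remote-head.example.org")` would return a dictionary keyed by port, where the hostname is a placeholder for the real remote head node.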
...