...

  • Why does na-arc-3 have such poor network performance to the other na-arc nodes?
    • Pinging na-arc-[1,2,4,5] with any payload larger than -s 1490 drops all packets.
    • iperf tests show 10Gb/s between the VM host of na-arc-3 (naasc-vs-3 p5p1.120) and the VM host of na-arc-5 (naasc-vs-5 p2p1.120).  So it isn't a bad card in either of the VM hosts.
    • iptables on na-arc-3 looks different from iptables on na-arc-[2,4,5].  na-arc-1 also looks a bit different.
    • docker_gwbridge interface on na-arc-[1,2,4,5] shows NO_CARRIER but not on na-arc-3.
    • na-arc-3 has a veth10fd1da@if37 interface.  None of the other na-arc-* nodes have a veth interface.
  • Why is na-arc-5 using qdisc pfifo_fast instead of qdisc fq_codel for eth0? (see ip addr)
  • Is putting all the 1Gb/s production docker swarm nodes on the same ASIC on the same Fabric Extender of the cv-nexus switch a good idea?
    • I am thinking it does not matter because it looks like the production docker swarm nodes use the 10Gb/s network which is on cv-nexus9k
  • Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
    • virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
    • When I run ethtool eth0 on na-arc-{1..5} and natest-arc-{1..2} as root, the result is just "Link detected: yes" instead of the full report with speed, while natest-arc-3 shows 100Mb/s.
  • Can we set up a test archive query that uses the "other" docker swarm which in this case would be the production swarm (na-arc-*)?
  • Is putting the production swarm nodes (na-arc-*) on the 10Gb/s network a good idea?  Sure, it makes a fast connection to cvsan, but it adds one more hop to the nangas servers (e.g. na-arc-1 -> cv-nexus9k -> cv-nexus -> nangas14).
  • Why are there VLANs on the VM hosts (e.g. em1.97 on naasc-vs-4)?
    • 2022-08-12 dhart: If you want all of your guest VMs to be on the same subnet as the VM host, then VLAN awareness isn't needed.  However, in most cases we want the flexibility of having VM guests on different networks (from one another and/or the VM host), so the VM host is configured with a trunk interface to the network, which allows any VLAN to be passed to the VM guests housed on that VM host.

    • 2022-08-12 dhart: 10.2.97.x (and 10.2.96.x) = internal VLAN for servers (primarily); 10.2.99.x = internal VLAN for server management; 10.2.120.x = internal VLAN for 10 GE connections
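
The VLAN notes above can be captured as a small lookup for checking which network an address belongs to. The following is a minimal Python sketch, not an existing tool: the /24 prefix lengths are an assumption inferred from the "10.2.97.x" notation, and the function name vlan_for and the example address are hypothetical.

```python
import ipaddress

# Subnet descriptions taken from the 2022-08-12 dhart notes.
# ASSUMPTION: each "10.2.N.x" range is a /24; the notes do not state
# the actual prefix length.
VLANS = {
    "10.2.96.0/24": "internal VLAN for servers (primarily)",
    "10.2.97.0/24": "internal VLAN for servers (primarily)",
    "10.2.99.0/24": "internal VLAN for server management",
    "10.2.120.0/24": "internal VLAN for 10 GE connections",
}


def vlan_for(address):
    """Return the description of the VLAN containing address, or None."""
    ip = ipaddress.ip_address(address)
    for cidr, description in VLANS.items():
        if ip in ipaddress.ip_network(cidr):
            return description
    return None


if __name__ == "__main__":
    # Hypothetical address, used only for illustration.
    print(vlan_for("10.2.120.17"))
```

An interface name such as p5p1.120 or em1.97 follows the same pattern: the suffix after the dot is the VLAN ID carried over the trunk interface.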

To Do

  1. Done: Recreate na-arc-3 so it gets the same performance as other na-arc-* nodes which is apparently at least 10Gb/s. (pmurphy)
    1. 2022-08-11: cloned na-arc-2 and moved the clone to naasc-vs-3 (zbutcher)
    2. 2022-08-11: moved old na-arc-3 to na-arc-3-OLD (thalstea)
    3. 2022-08-11: Renamed the clone to na-arc-3.  We connected it to the swarm successfully, but it had a low connection speed.
    4. 2022-08-11: Changed the model of na-arc-3's vnet5 interface on naasc-vs-3 from rtl8139 to virtio to match all the other na-arc-* nodes.  Performance was still poor.
    5. 2022-08-11: Changed the MTU of na-arc-3 eth0 to 1500.  This differs from all the other na-arc-* nodes, but it was either that or change p5p1.120 and br97 on naasc-vs-3 from 9000 to 1500, which may have impacted other VM guests on that host.  Performance is now reasonable at 7Gb/s.  I was expecting about 9Gb/s, but perhaps the 1500 MTU is affecting performance.
    6. 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
  2. Done: Launch services on production swarm (sbooth)
    1. 2022-08-11: Joined na-arc-3 to the swarm and started services (sbooth)
  3. Test the production docker swarm with a test web interface. (lsharp)
  4. Ask the other ARCs whether they use MTU 9000 on 10Gb/s. (krowe)
    1. JAO uses MTU of 1500
    2. ESO uses two VM hosts running VMware with 10Gb/s and MTU of 1500
  5. Switch the production docker swarm back to MTU 1500 since the test docker swarm uses MTU 1500 and is performing better?
  6. Fix natest-arc-3 so its NIC Model is virtio instead of rtl8139
  7. Upgrade production swarm to meet ALMA requirements (16-core, 32GB)
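
For the MTU items above (1.5, 4 and 5), a rough calculation shows how much the MTU alone can change throughput. This is a minimal Python sketch under stated assumptions: standard Ethernet framing, IPv4, and TCP with no options; the function names are hypothetical, and it models header overhead only, not CPU cost or virtualization overhead.

```python
# Per-frame overhead in bytes.
# ASSUMPTION: standard Ethernet framing, IPv4, TCP without options.
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap
VLAN_TAG = 4                    # 802.1Q tag, if the frame is tagged on the wire
IP_HEADER = 20
TCP_HEADER = 20
ICMP_HEADER = 8


def max_ping_payload(mtu):
    """Largest ICMP echo payload (ping -s) that fits in one unfragmented packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def tcp_goodput_gbps(mtu, line_rate_gbps=10.0, vlan_tagged=False):
    """Upper bound on TCP payload throughput from header overhead alone."""
    payload = mtu - IP_HEADER - TCP_HEADER
    wire_bytes = mtu + ETH_OVERHEAD + (VLAN_TAG if vlan_tagged else 0)
    return line_rate_gbps * payload / wire_bytes


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, max_ping_payload(mtu), round(tcp_goodput_gbps(mtu), 2))
```

Under these assumptions the ceiling is roughly 9.49Gb/s at MTU 1500 and 9.91Gb/s at MTU 9000, so header overhead by itself would not account for a drop to 7Gb/s. The payload sizes (1472 and 8972 bytes) are the values to use with ping -s when checking whether a path really carries a given MTU without fragmentation.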

...