Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Set ethtool -K em1 gro off perminantly on naasc-vs-4 and document it.  How do we do this?
  • Strawman proposal for reassigning VM guests
  • krowe to make tickets for solutions
  • 2022-10-05 krowe: Change the NIC Model on natest-arc-3.  It is currently rtl8139 instead of virtio and is its speed 100Mb/s instead of 1000Mb/s.
    • You can see this with virsh domiflist natest-arc-3 on naasc-vs-5.
    • 2022-10-05 krowe: This should be fixed but after the test swarm is no longer acting as the production swarm.
  • 2022-10-14 krowe: See if the sfctool utility can help helpful with either TCP Retransmissions or dropped packets.


Answers

  • Why does iperf show 10Gb/s between na-arc-5 and na-arc-[1,2,4]?  How is this possible if the default interface on the respective VM Hosts is 1Gb/s?
    • ANSWER: The vnets for the VM guests are tied to the 10Gb/s NICs on the VM hosts not the 1Gb/s NICs.
  • Why do natest-arc-{1..3} have 9 veth* interfaces in ip addr show while na-arc-{1..5} don't have any veth* interfaces?
    • Each container creates a veth* interface.
  • Why does na-arc-3 have such poor network performance to the other na-arc nodes?
    • ping na-arc-[1,2,4,5] with anything larger than -s 1490 drops all packets
    • iperf tests show 10Gb/s between the VM host of na-arc-3 (naasc-vs-3 p5p1.120) and the VM host of na-arc-5 (naasc-vs-5 p2p1.120).  So it isn't a bad card in either of the VM hosts.
    • iptables on na-arc-3 looks different than iptables on na-arc-[2,3,5].  na-arc-1 also looks a bit different.
    • docker_gwbridge interface on na-arc-[1,2,4,5] shows NO_CARRIER but not on na-arc-3.
    • na-arc-3 has a veth10fd1da@if37 interface.  None of the other na-arc-* nodes have a veth interface.
    • Production docker swarm iperf tests measured in Gb/s.


      na-arc-1

      (naasc-vs-4)

      na-arc-2

      (naasc-vs-4)

      na-arc-3

      (naasc-vs-3)

      na-arc-4

      (naasc-vs-4)

      na-arc-5

      (naasc-vs-5)

      na-arc-1
      180.0022010

      na-arc-2

      20
      0.0022010
      na-arc-30.0020.002
      0.0020.002
      na-arc-420190.002

      na-arc-510100.0021010

      There is clearly something wrong with na-arc-3

    • ANSWER: Since there were so many problems with na-arc-3, it was decided to recreate it.  It was recreated from a clone of na-arc-2.
  • Is putting all the 1Gb/s production docker swarm nodes on the same ASIC on the same Fabric Extender of the cv-nexus switch a good idea?
    • I am thinking it does not matter because it looks like the production docker swarm nodes use the 10Gb/s network which is on cv-nexus9k
  • Can we set up a test archive query that uses the "other" docker swarm which in this case would be the production swarm (na-arc-*)?
  • Why are there VLANs on the VM hosts.  e.g. em1.97 on naasc-vs-4?
    • 2022-08-12 dhart: If you want all of your guest VMs to be on the same subnet as the VM host, then VLAN awareness isn't needed.  However, in most cases we want the flexibility of being able to have VM guests on different networks (from one another and/or the VM host) so the VM host is configured with a trunk interface to the network to allow for any VLAN to be passed to the underlying VM guests housed on that VM host machine

    • 2022-08-12 dhart: 10.2.97.x (and 10.2.96.x) = internal VLAN for servers (primarily) 10.2.99.x = internal VLAN for server management
    • 10.2.120.x = internal VLAN for 10 GE connections
  • Where is the main docker config (yaml file)?
  • 2022-09-20 krowe: Why does naasc-vs-2 have APIPA configured networks (169.254.0.0)?  Aren't these usually created only if there are misconfigured network(s)?
    • [root@naasc-vs-2 ~]# netstat -nr
      Kernel IP routing table
      Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
      0.0.0.0         10.2.99.1       0.0.0.0         UG        0 0          0 eno1
      10.2.99.0       0.0.0.0         255.255.255.0   U         0 0          0 eno1
      10.2.120.0      0.0.0.0         255.255.255.0   U         0 0          0 ens1f0np0.120
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0.120
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br97
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br101
      192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
    • 2022-09-28 krowe: APIPA routes are created via /etc/sysconfig/network-scripts/ifup-eth which is installed from the network-scripts RPM.  This RPM is legacy for RHEL8 (naasc-vs-2 is RHEL8.6) and must have been installed specificly.  It is not installed on any other RHEL8 machine I have checked.
  • 2022-09-26 krowe: Can an older solarflare card (Solarflare Communications SFC9020) replace the card in naasc-vs-2 to see if that helps with the TCP Retransmissions? 
  • Why can't I download via na-arc-6?  I don't think it is properly setup yet.
  • Why do I see cv-6509 when tracerouting from na-arc-5 to nangas13 but not on natest-arc-1
    • [root@na-arc-5 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan97.cv.nrao.edu (10.2.97.1)  0.426 ms  0.465 ms  0.523 ms
       2  cv-6509.cv.nrao.edu (10.2.254.5)  0.297 ms  0.277 ms  0.266 ms
       3  nangas13.cv.nrao.edu (10.2.140.33)  0.197 ms  0.144 ms  0.109 ms
       
    • [root@natest-arc-1 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan96.cv.nrao.edu (10.2.96.1)  0.459 ms  0.427 ms  0.402 ms
       2  nangas13.cv.nrao.edu (10.2.140.33)  0.184 ms  0.336 ms  0.311 ms
    • Derek wrote that 10.2.99.1 = CV-NEXUS and 10.2.96.1 = CV-6509
  • 2022-09-28 krowe: Why was the network-scripts RPM installed on naasc-vs-2?  No other RHEL8 machine has this RPM.  Was it because nobody knew how to configure vlans and other complicated networking using NetworkManager, which is the new standard in RHEL8?
    • 2022-10-05 krowe: Yes. RHEL8 makes bridges and vlans really complicated so Tracy installed the network-scripts RPM and configured things the old way.
  • 2022-09-21 krowe: Why are there stuck inventory processes on naasc-vs-2?
    • 2022-10-05 krowe: This is an RHEL8 issue, not a network issue.  All the RHEL8 machines in CV have this problem.
  • Why does naasc-vs-3 have a br120 in state UNKNOWN?  none of the other naasc-vs nodes have a br120.
    • 2022-10-05 krowe: This is because it is easier to create and not use then not create.
  • Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
    • virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
    • When I run ethtool eth0 on nar-arc-{1..5} natest-arc-{1..2} as root, the result is just Link detected: yes instead of the full report with speed while na-arc-3 shows 100Mb/s.
    • 2022-10-05 krowe: This should be fixed but after the test swarm is no longer acting as the production swarm.
    • I think this is just another example of why CV needs good documentation to create VMs

...