...

  • I don't think this is because of broadcast noise on the 10Gb/s network (10.2.120.0/24) as I don't see large dropped packet counts on all naasc-vs hosts.
  • 2022-09-26 krowe: Interestingly, if I watch the number of packets dropped per minute (I wrote a script; a sketch appears after this list) and run it at the same time on all four naasc-vs hosts, I see patterns.  The number of dropped packets each minute is identical between naasc-vs-2 and naasc-vs-4 and hovers around 100.  The number of dropped packets each minute is identical between naasc-vs-3 and naasc-vs-5 and hovers around 2.  This tells me that naasc-vs-2 and naasc-vs-4 are getting the same traffic and dropping it the same way.  What is this traffic?
  • 2022-09-26 krowe: I set na-arc-6, the only guest on naasc-vs-2, to drain in docker swarm to see if that reduced the number of dropped packets seen on naasc-vs-2.  Thinking it was docker swarm creating this traffic.  There was no change in dropped packet rate.  It continued to match naasc-vs-4.
  • 2022-09-26 krowe: On naasc-vs-3 and naasc-vs-5 I see the dropped packet count per minute at about 2, but every 5 or 6 minutes the count increases to 10 or 11.
  • 2022-09-26 krowe: I tried looking at other nodes on the 10Gb 10.2.120.0/24 network but I couldn't login to most of them.  One I could login to is cv-vs-4 and it is also seeing dropped Rx packets on its 10Gb interface at about the same rate as naasc-vs-3 and naasc-vs-5.  This makes me think that these dropped packets have nothing to do with docker swarm.  Perhaps there is just something on that network (some misconfigured Windows box or something) that is throwing bad packets around.  That doesn't explain the difference in the dropped packet rates though.
  • 2022-09-27 krowe: try clearing the ARP cache on the switch?  Perhaps the switch is sending packets to the node for an IP address that is no longer there, e.g. because a container moved.
  • use tcpdump and sort by destination looking for the number of dropped packets per minute.
  • 2022-10-03 krowe: dhart and thalstead inserted a second 10Gb/s card in naasc-vs-2.  This one is supposedly a Solarflare SFN8522 even though Linux detects it as the same model as the original card (Solarflare Communications SFC9220 10/40G Ethernet Controller [1924:0a03]).  Tracy configured this card to be the 10Gb/s NIC of naasc-vs-2 (ens2f0np0).  I don't know why they didn't just remove the original card and insert the new card, thus requiring no configuration changes, but whatever.  I am still seeing about 60 dropped packets per minute, and it still matches the dropped packets on naasc-vs-4.  So the idea that the original card had some hardware flaw (like bad memory or something) is disproven.
  • 2022-10-14 krowe: I may have a line on what is causing the dropped packets.  Several times now when I do tcpdumps on either naasc-vs-2 or naasc-vs-4, tcpdump at the end tells me how many packets were dropped by the interface.  This number matches what my droprate.sh script shows (it gets its information from ifconfig, which is probably what tcpdump does also).  Looking at these tcpdumps, I see a number of LLMNR (that's the protocol) packets that is about twice this number.  Half of these packets are IPV4 and half are IPV6.  They are both going to multicast addresses (224.0.0.252 and ff02::1:2).  This is leading me to think naasc-vs-2 and naasc-vs-4 are dropping packets either because they are from IPV6 addresses or because they are destined for IPV6 multicast addresses.  Not sure which, or even if that is correct.  But so far these sorts of packets fit the number of dropped packets.  I see these same LLMNR packets, both IPV4 and IPV6, on naasc-vs-3 but that host reports almost no dropped packets.  Configuration difference?
    • This page https://access.redhat.com/articles/22304 has a python script that will join a multicast group.
    • I see with netstat -ng that naasc-vs-3 and naasc-vs-5 are in the special multicast group 224.0.0.251 while naasc-vs-2 and naasc-vs-4 are not.  This configuration difference lines up with seeing dropped packets on naasc-vs-2 and naasc-vs-4 but not on naasc-vs-3 and naasc-vs-5 (or at least a lot fewer dropped packets).  But joining this multicast group on naasc-vs-2 didn't change the dropped packet rate.
    • I will ask if the LLMNR protocol on cvsccm can be disabled.  Looking around the internet it seems that this is a pretty old way for Windows to share hostnames and is largely deprecated.
    • 2022-10-17 krowe: Actually, I see a strong correlation between IPV6 packets in tcpdump and dropped packets reported by both my script and tcpdump itself on naasc-vs-2 and naasc-vs-4.  But I don't understand why these two hosts would be dropping IPV6 packets while naasc-vs-3 would not.  If I tcpdump all three hosts, I see the same IPV6 packets on all three (usually LLMNR multicast or DHCP6 broadcast).  And all three hosts have ipv6.disable=1 in /proc/cmdline.
    • 2022-10-17 krowe: I see an even stronger correlation between dropped packets and IPV6 packets on VLAN ID 96.  This could explain the dropped packets as neither naasc-vs-2 nor naasc-vs-4 have VLAN 96 configured while both naasc-vs-3 and naasc-vs-5 do have VLAN 96 configured.  This should be an easy test.  Just configure VLAN 96 on naasc-vs-2 and see if the dropped packet rate goes from about 1 per second to less than 1 per minute.  (A capture sketch for counting these packets appears after this list.)
    • Here is a PCAP file.  tcpdump reported 85 packets dropped by the interface while capturing these packets.  If you use wireshark to look at this file you can see there are 86 IPV6 packets (ipv6.src) and all but one of them were on VLAN 96 (ipv6.src and vlan.id == 96).
    • 2022-10-18 krowe: Tracy created VLAN96 interfaces (p1p1.96, br96) on naasc-vs-2 and the dropped packet rate on naasc-vs-2 is now very similar to that of naasc-vs-3 and naasc-vs-5: about 2 packets per minute as opposed to 1 or more packets per second.  This means naasc-vs-2 was dropping packets that were both IPV6, which is disabled, and VLAN96, which wasn't configured.  I find it strange that it took both features to drop a packet (IPV6 and VLAN96); I would have thought either would be sufficient for the packet to be dropped.  This is good to know for future reference when putting a Linux machine on a trunking port (a config sketch appears after this list).  Doing tcpdumps now, I expect the remaining packets, 1 or 2 per minute, are likely DHCP6 IPV6 packets for VLAN192.
    • We should decide if we want to configure all possible VLANs these hosts may see so that the dropped packet count remains zero or live with an ever increasing dropped packet count and document that this is probably caused by unconfigured VLANs.
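
Below is a minimal sketch of what the per-minute drop-counting script mentioned above might look like (a hypothetical reconstruction; the actual droprate.sh was not captured on this page, and the interface name is a placeholder).  It reads the same kernel counter that ifconfig reports.

      #!/bin/bash
      # droprate.sh (hypothetical reconstruction): print Rx packets dropped per minute.
      IFACE=${1:-ens1f0np0}    # placeholder; pass the host's 10Gb/s interface name
      prev=$(cat /sys/class/net/${IFACE}/statistics/rx_dropped)
      while true; do
          sleep 60
          cur=$(cat /sys/class/net/${IFACE}/statistics/rx_dropped)
          echo "$(date '+%F %T')  ${IFACE}  dropped $((cur - prev)) packets in the last minute"
          prev=$cur
      done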
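
To count the suspect IPV6-on-VLAN96 packets per minute and compare the count against the dropped packet counters, a capture along these lines should work; the interface name is a placeholder and the filter mirrors the wireshark filter described above.

      # Capture 60 seconds of IPV6 traffic tagged with VLAN 96 on the trunk interface,
      # then count the packets.  ens1f0np0 is a placeholder for the host's 10Gb/s NIC.
      tcpdump -i ens1f0np0 -nn -e -G 60 -W 1 -w /tmp/vlan96-ipv6.pcap 'vlan 96 and ip6'
      tcpdump -nn -r /tmp/vlan96-ipv6.pcap | wc -l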
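
For future reference when putting a Linux machine on a trunking port, here is a sketch of what the VLAN96 interfaces (p1p1.96 bridged to br96) might look like in the legacy network-scripts style used on these hosts.  The file contents are an assumption, not a copy of the actual configuration.

      # /etc/sysconfig/network-scripts/ifcfg-p1p1.96  (hypothetical)
      DEVICE=p1p1.96
      VLAN=yes
      ONBOOT=yes
      BOOTPROTO=none
      BRIDGE=br96

      # /etc/sysconfig/network-scripts/ifcfg-br96  (hypothetical)
      DEVICE=br96
      TYPE=Bridge
      ONBOOT=yes
      BOOTPROTO=none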


Comparisons

naasc-vs-2, 3, 4, 5

...

  • Set ethtool -K em1 gro off permanently on naasc-vs-4 and document it.  How do we do this?  (See the sketch after this list.)
  • Strawman proposal for reassigning VM guests (krowe to make tickets for solutions)
  • 2022-10-05 krowe: Change the NIC model on natest-arc-3.  It is currently rtl8139 instead of virtio and its speed is 100Mb/s instead of 1000Mb/s.  (See the virsh sketch after this list.)
    • You can see this with virsh domiflist natest-arc-3 on naasc-vs-5.
    • 2022-10-05 krowe: This should be fixed, but only after the test swarm is no longer acting as the production swarm.
  • Decide if we want to configure all possible VLANs these hosts may see so that the dropped packet count remains zero or live with an ever increasing dropped packet count and document that this is probably caused by unconfigured VLANs.
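
One way to make the GRO setting persistent, assuming NetworkManager is managing em1 on naasc-vs-4, is a dispatcher script like the sketch below (the path and interface name are assumptions).  On hosts still using the legacy network-scripts, an ETHTOOL_OPTS line in ifcfg-em1 is another option.

      #!/bin/bash
      # Hypothetical dispatcher script, e.g. /etc/NetworkManager/dispatcher.d/50-gro-off
      # (make it executable).  Disable generic receive offload whenever em1 comes up.
      if [ "$1" = "em1" ] && [ "$2" = "up" ]; then
          /usr/sbin/ethtool -K em1 gro off
      fi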
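
A sketch of the eventual NIC model fix for natest-arc-3, assuming the standard libvirt workflow on naasc-vs-5 (not to be applied until the test swarm is no longer acting as the production swarm, per the note above).

      # On naasc-vs-5: confirm the current model, then change it to virtio.
      virsh domiflist natest-arc-3    # currently shows rtl8139
      virsh edit natest-arc-3         # change <model type='rtl8139'/> to <model type='virtio'/>
      virsh shutdown natest-arc-3     # guest must be powered off completely...
      virsh start natest-arc-3        # ...then started again for the new model to take effect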


Answers

  • Why does iperf show 10Gb/s between na-arc-5 and na-arc-[1,2,4]?  How is this possible if the default interface on the respective VM Hosts is 1Gb/s?
    • ANSWER: The vnets for the VM guests are tied to the 10Gb/s NICs on the VM hosts not the 1Gb/s NICs.
  • Why do natest-arc-{1..3} have 9 veth* interfaces in ip addr show while na-arc-{1..5} don't have any veth* interfaces?
    • Each container creates a veth* interface.
  • Why does na-arc-3 have such poor network performance to the other na-arc nodes?
    • ping na-arc-[1,2,4,5] with anything larger than -s 1490 drops all packets
    • iperf tests show 10Gb/s between the VM host of na-arc-3 (naasc-vs-3 p5p1.120) and the VM host of na-arc-5 (naasc-vs-5 p2p1.120).  So it isn't a bad card in either of the VM hosts.
    • iptables on na-arc-3 looks different than iptables on na-arc-[2,4,5].  na-arc-1 also looks a bit different.
    • docker_gwbridge interface on na-arc-[1,2,4,5] shows NO_CARRIER but not on na-arc-3.
    • na-arc-3 has a veth10fd1da@if37 interface.  None of the other na-arc-* nodes have a veth interface.
    • Production docker swarm iperf tests, measured in Gb/s (each node's VM host is in parentheses):

                    na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5
                    (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)
      na-arc-1          -           18             0.002        20            10
      na-arc-2         20            -             0.002        20            10
      na-arc-3          0.002        0.002          -            0.002         0.002
      na-arc-4         20           19             0.002         -
      na-arc-5         10           10             0.002        10            10

      There is clearly something wrong with na-arc-3.

    • ANSWER: Since there were so many problems with na-arc-3, it was decided to recreate it.  It was recreated from a clone of na-arc-2.
  • Is putting all the 1Gb/s production docker swarm nodes on the same ASIC on the same Fabric Extender of the cv-nexus switch a good idea?
    • I am thinking it does not matter because it looks like the production docker swarm nodes use the 10Gb/s network which is on cv-nexus9k
  • Can we set up a test archive query that uses the "other" docker swarm which in this case would be the production swarm (na-arc-*)?
  • Why are there VLANs on the VM hosts, e.g. em1.97 on naasc-vs-4?
    • 2022-08-12 dhart: If you want all of your guest VMs to be on the same subnet as the VM host, then VLAN awareness isn't needed.  However, in most cases we want the flexibility of having VM guests on different networks (from one another and/or the VM host), so the VM host is configured with a trunk interface to the network, allowing any VLAN to be passed to the underlying VM guests housed on that VM host machine.

    • 2022-08-12 dhart: 10.2.97.x (and 10.2.96.x) = internal VLAN for servers (primarily)
    • 10.2.99.x = internal VLAN for server management
    • 10.2.120.x = internal VLAN for 10 GE connections
  • Where is the main docker config (yaml file)?
  • 2022-09-20 krowe: Why does naasc-vs-2 have APIPA configured networks (169.254.0.0)?  Aren't these usually created only if there are misconfigured network(s)?
    • [root@naasc-vs-2 ~]# netstat -nr
      Kernel IP routing table
      Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
      0.0.0.0         10.2.99.1       0.0.0.0         UG        0 0          0 eno1
      10.2.99.0       0.0.0.0         255.255.255.0   U         0 0          0 eno1
      10.2.120.0      0.0.0.0         255.255.255.0   U         0 0          0 ens1f0np0.120
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens1f0np0.120
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br97
      169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br101
      192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
    • 2022-09-28 krowe: APIPA routes are created via /etc/sysconfig/network-scripts/ifup-eth which is installed from the network-scripts RPM.  This RPM is legacy for RHEL8 (naasc-vs-2 is RHEL8.6) and must have been installed specifically.  It is not installed on any other RHEL8 machine I have checked.
  • 2022-09-26 krowe: Can an older solarflare card (Solarflare Communications SFC9020) replace the card in naasc-vs-2 to see if that helps with the TCP Retransmissions? 
  • Why can't I download via na-arc-6?  I don't think it is properly setup yet.
  • Why do I see cv-6509 when tracerouting from na-arc-5 to nangas13 but not when tracerouting from natest-arc-1?
    • [root@na-arc-5 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan97.cv.nrao.edu (10.2.97.1)  0.426 ms  0.465 ms  0.523 ms
       2  cv-6509.cv.nrao.edu (10.2.254.5)  0.297 ms  0.277 ms  0.266 ms
       3  nangas13.cv.nrao.edu (10.2.140.33)  0.197 ms  0.144 ms  0.109 ms
       
    • [root@natest-arc-1 ~]# traceroute nangas13
      traceroute to nangas13 (10.2.140.33), 30 hops max, 60 byte packets
       1  cv-6509-vlan96.cv.nrao.edu (10.2.96.1)  0.459 ms  0.427 ms  0.402 ms
       2  nangas13.cv.nrao.edu (10.2.140.33)  0.184 ms  0.336 ms  0.311 ms
    • Derek wrote that 10.2.99.1 = CV-NEXUS and 10.2.96.1 = CV-6509
  • 2022-09-28 krowe: Why was the network-scripts RPM installed on naasc-vs-2?  No other RHEL8 machine has this RPM.  Was it because nobody knew how to configure vlans and other complicated networking using NetworkManager, which is the new standard in RHEL8?
    • 2022-10-05 krowe: Yes. RHEL8 makes bridges and vlans really complicated so Tracy installed the network-scripts RPM and configured things the old way.
  • 2022-09-21 krowe: Why are there stuck inventory processes on naasc-vs-2?
    • 2022-10-05 krowe: This is an RHEL8 issue, not a network issue.  All the RHEL8 machines in CV have this problem.
  • Why does naasc-vs-3 have a br120 in state UNKNOWN?  None of the other naasc-vs nodes have a br120.
    • 2022-10-05 krowe: This is because it is easier to create it and not use it than to not create it at all.
  • Why does natest-arc-3 have ens3 instead of eth0 and why is its speed 100Mb/s?
    • virsh domiflist natest-arc-3 shows the Model as rtl8139 instead of virtio
    • When I run ethtool eth0 on na-arc-{1..5} and natest-arc-{1..2} as root, the result is just Link detected: yes instead of the full report with speed, while natest-arc-3 shows 100Mb/s.
    • 2022-10-05 krowe: This should be fixed, but only after the test swarm is no longer acting as the production swarm.
    • I think this is just another example of why CV needs good documentation on how to create VMs.

...