...

  • 2022-09-07 krowe: Doing tcpdumps of iperf3 tests between ingress_sbox namespaces shows that the TCP iperf3 packets are being turned into UDP packets.  So I ran iperf3 directly between the na-arc nodes (not in the ingress_sbox namespaces); a scripted sweep is sketched after the table.
    • iperf3 -B <LOCAL IP> -c <REMOTE IP> -u -b 2000000000 -t 100
    • Table5: iperf3 UDP to/from hosts (% packet loss)

                    na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5
                    (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)
      na-arc-1
      na-arc-2
      na-arc-3
      na-arc-4
      na-arc-5

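      A sweep along these lines can generate the loss matrix above; the host list and the use of jq on iperf3's JSON output are assumptions, not the exact commands used:

        # From this node, run a 2 Gb/s UDP test to every other na-arc node and
        # print the percent packet loss the receiver reports
        for h in na-arc-1 na-arc-2 na-arc-3 na-arc-4 na-arc-5; do
            [ "$h" = "$(hostname -s)" ] && continue
            loss=$(iperf3 -c "$h" -u -b 2000000000 -t 100 -J | jq '.end.sum.lost_percent')
            echo "$(hostname -s) -> $h: ${loss}% loss"
        done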
  • 2022-09-08 krowe: I have tested the other overlay networks (production_agent_network 10.0.1.0/24 and production_default 10.0.2.0/24) and they perform similarly to the ingress overlay network 10.0.0.0/24.
  • 2022-09-09 krowe: na-arc-6 is now online, served from naasc-vs-2.  Here are the iperf3 tests from ingress_sbox to ingress_sbox (a sketch of running iperf3 inside the namespace follows the table).  When throughput is slow (Kb/s), I see that the congestion window size is reduced from about 1MB to about 2.73KB.


    Table6: iperf3 TCP throughput from/to ingress_sbox (Mb/s)

                  na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5      na-arc-6
                  (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)  (naasc-vs-2)
      na-arc-1        -           3920          2300          4200          3110          3280
      na-arc-2      3950            -           2630          4000          3350          3530
      na-arc-3       0.2           0.3            -            0.2          2720          2810
      na-arc-4      3860          3580          2410            -           3390          3290
      na-arc-5       0.2           0.2          2480           0.2            -           2550
      na-arc-6      0.005         0.005         2790          0.005         3290            -
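    The per-namespace tests are run from inside docker's ingress namespace.  A minimal sketch of one such pair; the nsenter path to the ingress_sbox namespace file is the usual docker location and is an assumption here:

      # on the remote node, listen inside its ingress namespace
      nsenter --net=/var/run/docker/netns/ingress_sbox iperf3 -s
      # on the local node, send from its own ingress namespace toward the remote
      # node's 10.0.0.0/24 ingress address; -i 1 prints per-second intervals whose
      # Cwnd column shows the congestion window collapsing on the slow paths
      nsenter --net=/var/run/docker/netns/ingress_sbox iperf3 -c <REMOTE INGRESS IP> -t 100 -i 1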
  • 2022-09-09 krowe: The ingress network (docker mesh) that I have been testing via the ingress_sbox namespace uses a veth interface (like a pipe) that connects to its corresponding veth interface in another namespace on the same host, which in turn connects through a bridge to a vxlan interface in that second namespace.  vxlan is a tunneling protocol that uses UDP on port 4789, which is why I am seeing my TCP packets turn into UDP packets.  Using tcpdump in the ingress_sbox to watch iperf3 TCP traffic going from na-arc-2 to na-arc-3 looks clean.  Watching traffic going from na-arc-3 to na-arc-2, which is slow (32KB/s), shows lots of TCP Retransmission and TCP Out-Of-Order packets.
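    Both views of that traffic can be watched with something like the following; the NIC placeholder and the nsenter path to docker's ingress namespace are assumptions:

      # on the host: the encapsulated traffic shows up as vxlan UDP on port 4789
      tcpdump -i <NIC> -nn udp port 4789
      # inside the ingress namespace: the decapsulated iperf3 TCP stream (default
      # port 5201), where the retransmissions and out-of-order segments are visible
      nsenter --net=/var/run/docker/netns/ingress_sbox tcpdump -i any -nn tcp port 5201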
  • 2022-09-15 krowe: Even with rx-gro-hw=off on naasc-vs-4, I am still seeing some retransmissions in iperf3 tests.  These are the same TCP Retransmissions seen previously.  On a modern, well-designed network I would expect to see almost no TCP Retransmissions, so this may indicate that there are still improvements to be made.  The number of retransmissions varies over time, from 0 to over a thousand in certain directions, which makes me think something else using the 10Gb network is interfering with my tests.
  • These are 10-second iperf3 TCP tests from the host in the left column to the host in the top row; each cell lists the retransmission count from each repeated run (a sketch of pulling these counts from iperf3 output follows the second table).

    TableXX iperf3 Retransmissions over 10Gb and rx-gro-hw=off

                    naasc-vs-2          naasc-vs-3        naasc-vs-4        naasc-vs-5
                    (10.2.120.107)      (10.2.120.109)    (10.2.120.110)    (10.2.120.112)
      naasc-vs-2          -                0, 0, 0           0, 0, 0         45, 52, 59
      naasc-vs-3    87, 0, 19, 1734           -              0, 0, 0         74, 52, 56
      naasc-vs-4    0, 342, 1147, 363      0, 0, 0              -            83, 51, 50
      naasc-vs-5    494, 0, 1296, 24       0, 0, 0           0, 0, 0             -

    This looks like some sort of misconfiguration on the receiving ends of naasc-vs-2 and naasc-vs-5.

    TableXX iperf3 Retransmissions over 10Gb and rx-gro-hw=off

                  na-arc-1            na-arc-2        na-arc-3      na-arc-4            na-arc-5          na-arc-6
                  (10.2.97.71)        (10.2.97.72)    (10.2.97.73)  (10.2.97.74)        (10.2.97.75)      (10.2.97.76)
      na-arc-1        -               0, 0, 0         0, 0, 0       0, 0, 0             55, 75, 50        323, 501, 538
      na-arc-2    0, 0, 0                 -           0, 0, 0       0, 0, 0             68, 81, 64        768, 1050, 658
      na-arc-3    1692, 1627, 2071   0, 1326, 592        -          1471, 3376, 686     360, 2477, 664    1873, 1872, 2384
      na-arc-4    0, 0, 0             0, 0, 0         0, 0, 0           -               58, 86, 65        4, 9, 38
      na-arc-5    108, 6, 6           6, 6, 6         2, 1, 1       6, 6, 6                 -             1293, 1197, 33
      na-arc-6    106, 0, 28          0, 0, 21        0, 88, 0      7, 0, 28            89, 75, 52            -
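    The per-run counts above can be read from the "Retr" column of iperf3's client summary, or pulled from its JSON output; the use of jq and the field path are assumptions about the JSON layout:

      # three 10-second TCP runs to one peer, printing the total retransmissions
      # iperf3 reports for each run
      for run in 1 2 3; do
          echo "run $run: $(iperf3 -c <REMOTE IP> -t 10 -J | jq '.end.sum_sent.retransmits') retransmits"
      done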
  • I think the large number of retransmissions when transmitting from naasc-vs-* to naasc-vs-2 is the cause of the large number of retransmissions when transmitting from na-arc-* to na-arc-6 (na-arc-6 is served from naasc-vs-2).
  • I don't know what explains the retransmissions when transmitting from na-arc-3 to na-arc-*.
  • I don't think the retransmissions from na-arc-3 to na-arc-* can be attributed to MTU.  eth0 on na-arc-3 has an MTU of 1500 while all the other na-arc nodes use 9000, but that should not cause a problem; if anything, it should be a problem in the other direction.  Also, I tested changing na-arc-6 to 1500 and the retransmissions didn't change.  The lack of retransmissions between na-arc-1, na-arc-2, and na-arc-4 is because they are all on the same VM host (naasc-vs-4).

    • You can use ping to check whether a packet of a given size actually gets through unfragmented.  This is a good way to test MTU sizes (see the sketch below).
      • ping -c 3 -M do -s 1500 na-arc-1
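      Note that -s sets the ICMP payload size, so the on-wire IPv4 packet is 28 bytes larger (20-byte IP header + 8-byte ICMP header).  A sketch for the two MTUs in use here (the target hostnames just follow the note above):
      • ping -c 3 -M do -s 1472 na-arc-3    # largest payload that fits a 1500-byte MTU without fragmenting
      • ping -c 3 -M do -s 8972 na-arc-1    # largest payload that fits a 9000-byte (jumbo-frame) MTU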

Comparisons

naasc-vs-2, 3, 4, 5

Identical

  • 2022-09-02 krowe: sysctl -a | grep br97 across naasc-vs-{3..5} are identical.
  • 2022-09-02 krowe: sysctl -a | grep <vnet> across naasc-vs-{3..5} are identical except for the vnet name (e.g. vnet2, vnet4, etc)
  • 2022-09-02 krowe: sysctl -a across naasc-vs-3 and naasc-vs-5 have no significant differences.
  • 2022-09-06 krowe: CV-NEXUS switch port capabilities for naasc-vs-{3..5} are identical.
  • 2022-09-06 krowe: CV-NEXUS9K switch port capabilities for naasc-vs-{2..5} are identical.
  • 2022-09-06 krowe: ethtool -k <NIC> across naasc-vs-3 and naasc-vs-5 are identical except for the NIC name.
  • 2022-09-07 krowe: iptables -L and iptables -S across naasc-vs-{3..5} are identical.
  • 2022-09-02 krowe: sysctl -a | grep <10Gb NIC> across naasc-vs-3 and naasc-vs-5 are identical except for the name of the NIC.
  • 2022-09-19 krowe: ethtool -g <NIC> across naasc-vs-3 and naasc-vs-5 are identical.
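  A diff over ssh reproduces these comparisons quickly; the host pair and the NIC placeholder below just follow the checks above:
  • diff <(ssh naasc-vs-3 ethtool -k <NIC>) <(ssh naasc-vs-5 ethtool -k <NIC>)
  • diff <(ssh naasc-vs-3 'sysctl -a 2>/dev/null | grep br97') <(ssh naasc-vs-5 'sysctl -a 2>/dev/null | grep br97')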

...