...

  • 2022-09-07 krowe: Doing tcpdumps of iperf3 tests between ingress_sbox namespaces shows the TCP iperf3 packets being NATed into UDP packets, so I ran iperf3 across the na-arc nodes directly (not in the ingress_sbox namespaces); the full server/client invocation is sketched after Table5.
    • iperf3 -B <LOCAL IP> -c <REMOTE IP> -u -b 2000000000 -t 100
    • Table5: iperf3 UDP to/from hosts (% packet loss)

                    na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5
                    (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)
      na-arc-1
      na-arc-2
      na-arc-3
      na-arc-4
      na-arc-5

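    For reference, a sketch of the server/client pair behind Table5 (the <LOCAL IP>/<REMOTE IP> placeholders are as above; iperf3 listens on port 5201 by default):

      # on the remote na-arc node: start the iperf3 server
      iperf3 -s -B <REMOTE IP>
      # on the local na-arc node: 100-second UDP test at 2 Gb/s
      iperf3 -B <LOCAL IP> -c <REMOTE IP> -u -b 2000000000 -t 100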
  • 2022-09-08 krowe: I have tested the other overlay networks (production_agent_network 10.0.1.0/24 and production_default 10.0.2.0/24) and they perform similarly to the ingress overlay network 10.0.0.0/24. The overlay networks can be enumerated as sketched below.
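    A sketch of enumerating and inspecting the overlay networks on a swarm node (network names as in the entry above):

      # list the overlay networks in the swarm
      docker network ls --filter driver=overlay
      # show the subnet and peers of one of them
      docker network inspect production_default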
  • 2022-09-09 krowe: na-arc-6 is now online, served from naasc-vs-2. Here are the iperf3 tests from ingress_sbox to ingress_sbox (how iperf3 is run inside the namespace is sketched after Table6). When throughput is slow (Kb/s), I see the TCP congestion window shrink from about 1MB to about 2.73KB.


    Table6: iperf3 TCP throughput from/to ingress_sbox (Mb/s)

                  na-arc-1      na-arc-2      na-arc-3      na-arc-4      na-arc-5      na-arc-6
                  (naasc-vs-4)  (naasc-vs-4)  (naasc-vs-3)  (naasc-vs-4)  (naasc-vs-5)  (naasc-vs-2)
    na-arc-1        -           3920          2300          4200          3110          3280
    na-arc-2      3950            -           2630          4000          3350          3530
    na-arc-3       0.2           0.3            -            0.2          2720          2810
    na-arc-4      3860          3580          2410            -           3390          3290
    na-arc-5       0.2           0.2          2480           0.2            -           2550
    na-arc-6      0.005         0.005         2790          0.005         3290            -
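    A sketch of how iperf3 can be run inside the ingress_sbox namespace (the netns path below is the usual Docker Swarm location; adjust if your layout differs):

      # server side: run iperf3 inside ingress_sbox
      nsenter --net=/var/run/docker/netns/ingress_sbox iperf3 -s
      # client side, from the other node's ingress_sbox
      nsenter --net=/var/run/docker/netns/ingress_sbox iperf3 -c <REMOTE INGRESS IP>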
  • 2022-09-09 krowe: The ingress network (docker mesh) that I have been testing via the ingress_sbox namespace uses a veth interface (like a pipe) that connects to its corresponding veth interface in another namespace on the same host, which in turn connects over a bridge in that second namespace to a vxlan interface. vxlan is a tunneling protocol that uses UDP on port 4789, which is why my TCP packets appear to turn into UDP packets. Using tcpdump in the ingress_sbox to watch iperf3 TCP traffic going from na-arc-2 to na-arc-3 looks clean. Watching traffic going from na-arc-3 to na-arc-2, which is slow (32KB/s), shows lots of TCP Retransmission and TCP Out-Of-Order packets. Both tcpdump vantage points are sketched after this list.
  • 2022-09-15 krowe: I am seeing retransmissions in iperf3 tests. These are not the same as TCP Retransmissions.
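    A sketch of the two tcpdump vantage points (the <NIC> placeholder is the node's external interface; the eth0 inside ingress_sbox is an assumption and may differ):

      # on the external interface: overlay traffic appears as VXLAN, i.e. UDP on port 4789
      tcpdump -ni <NIC> udp port 4789
      # inside ingress_sbox: the same flow appears as the original TCP stream (iperf3 default port 5201)
      nsenter --net=/var/run/docker/netns/ingress_sbox tcpdump -ni eth0 tcp port 5201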

Comparisons

naasc-vs-3, 4, 5

...

  • 2022-09-02 krowe: sysctl -a | grep <10Gb NIC> output differs between naasc-vs-3/naasc-vs-5 and naasc-vs-4
    • naasc-vs-4 has entries for VLANs 101 and 140 while naasc-vs-3 and naasc-vs-5 have entries for VLANs 192 and 96.
  • 2022-09-02 krowe: Compared sysctl -a on naasc-vs-4 and naasc-vs-5 and found many questionable differences (see the diff sketch after these sub-items)
    • naasc-vs-4: net.iw_cm.default_backlog = 256
      • Is this because the IB modules are loaded?
    • naasc-vs-4: net.rdma_ucm.max_backlog = 1024
      • Is this because the IB modules are loaded?
    • naasc-vs-4: sunrpc.rdma*
      • Is this because the IB modules are loaded?
    • naasc-vs-4: net.netfilter.nf_log.2 = nfnetlink_log
      • nfnetlink is a module for packet mangling. Could this interfere with the docker swarm networking?
    • Though the recorded output rate of naasc-vs-5 is about 500 Mb/s while naasc-vs-{3..4} is about 300Kb/s.
    • And the recorded input rate of naasc-vs-5 is about 500 Mb/s while naasc-vs-{3..4} is about 5 Mb/s.
    • This is very strange, as it seemed naasc-vs-5 was the limiting factor, but the switch ports suggest not. Perhaps this data rate is caused by other VM guests on naasc-vs-5 (helpdesk-prod, naascweb2-prod, cartaweb-prod, natest-arc-3, cobweb2-dev).
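    One way to surface such differences systematically (a sketch; the file names are arbitrary):

      # on each VM host, snapshot the sysctl state
      sysctl -a 2>/dev/null > /tmp/sysctl.$(hostname -s)
      # then compare two hosts side by side
      diff /tmp/sysctl.naasc-vs-4 /tmp/sysctl.naasc-vs-5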
  • 2022-09-06 krowe: ethtool -k <NIC> output for naasc-vs-3/naasc-vs-5 is very different from naasc-vs-4 (naasc-vs-3/naasc-vs-5 shown first in each pair, naasc-vs-4 second; the same diff technique applies, as sketched after this list).
    • hw-tc-offload: off vs hw-tc-offload: on
    • rx-gro-hw: off vs rx-gro-hw: on
    • rx-vlan-offload: off vs rx-vlan-offload: on
    • rx-vlan-stag-hw-parse: off vs rx-vlan-stag-hw-parse: on
    • tcp-segmentation-offload: off vs tcp-segmentation-offload: on
    • tx-gre-csum-segmentation: off vs tx-gre-csum-segmentation: on
    • tx-gre-segmentation: off vs tx-gre-segmentation: on
    • tx-gso-partial: off vs tx-gso-partial: on
    • tx-ipip-segmentation: off vs tx-ipip-segmentation: on
    • tx-sit-segmentation: off vs tx-sit-segmentation: on
    • tx-tcp-segmentation: off vs tx-tcp-segmentation: on
    • tx-udp_tnl-csum-segmentation: off vs tx-udp_tnl-csum-segmentation: on
    • tx-udp_tnl-segmentation: off vs tx-udp_tnl-segmentation: on
    • tx-vlan-offload: off vs tx-vlan-offload: on
    • tx-vlan-stag-hw-insert: off vs tx-vlan-stag-hw-insert: on
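    The same snapshot-and-diff approach works for the offload settings (a sketch, assuming em1 is the 10Gb NIC on each host):

      # snapshot offload settings on each host
      ethtool -k em1 > /tmp/ethtool-k.$(hostname -s)
      # compare; lines differing only in on/off stand out immediately
      diff /tmp/ethtool-k.naasc-vs-3 /tmp/ethtool-k.naasc-vs-4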
  • 2022-09-12 krowe: I found the rx and tx ring buffers for em1 on naasc-vs-4 were 511 while on naasc-vs-2, 3, and 5 they were 1024. I raised them on naasc-vs-4 with ethtool -G em1 rx 1024 tx 1024, but it didn't change iperf3 performance.
  • 2022-09-12 krowe: I found an article suggesting that GRO can make traffic slower when it is enabled. I see that rx-gro-hw is enabled on naasc-vs-4 but disabled on naasc-vs-3 and 5. You can see this with ethtool -k em1 | grep gro. So I disabled it on naasc-vs-4 with ethtool -K em1 gro off, and iperf3 tests now show about 2Gb/s in both directions! The check and fix are spelled out below.
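    The check and fix from this entry, spelled out (a sketch; the persistence step is an assumption about RHEL-style ifcfg network scripts):

      # confirm the current GRO state
      ethtool -k em1 | grep gro
      # disable generic receive offload
      ethtool -K em1 gro off
      # re-run the iperf3 test to confirm
      iperf3 -c <REMOTE IP>
      # persist across reboots (assumption: initscripts honor a leading '-' in ETHTOOL_OPTS)
      echo 'ETHTOOL_OPTS="-K em1 gro off"' >> /etc/sysconfig/network-scripts/ifcfg-em1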
  • 2022-09-15 krowe: The VM hosts have different 10Gb network cards (identified as sketched after this list):
    • naasc-vs-2 uses a Solarflare Communications SFC9220
    • naasc-vs-3 uses a Solarflare Communications SFC9020
    • naasc-vs-4 uses a Broadcom BCM57412 NetXtreme-E
    • naasc-vs-5 uses a Solarflare Communications SFC9020
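    For the record, the card and driver on each host can be read off with (a sketch):

      # PCI view of the network cards
      lspci | grep -i ethernet
      # driver, version, and firmware bound to em1
      ethtool -i em1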



References