...

TL;DR: ethtool -K em1 gro off needs to be set permanently on naasc-vs-4.
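Because the setting has to survive reboots, it helps to be able to check whether it is still in effect. The following is a minimal sketch, not an existing tool: it assumes ethtool is installed, that the interface is em1 as in the TL;DR, and that ethtool -k prints one "feature: on|off" line per feature. The function names are made up for illustration.

```python
import subprocess


def parse_offload_features(ethtool_output):
    """Parse `ethtool -k <iface>` output into {feature_name: "on" | "off"}."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        words = value.split()
        # The "Features for em1:" header has nothing after the colon, so it is skipped.
        # Values such as "off [fixed]" are reduced to their first word.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0]
    return features


def gro_is_off(iface="em1"):
    """Return True if generic-receive-offload is reported as off for iface."""
    result = subprocess.run(
        ["ethtool", "-k", iface], check=True, capture_output=True, text=True
    )
    return parse_offload_features(result.stdout).get("generic-receive-offload") == "off"
```

Calling gro_is_off("em1") on naasc-vs-4 after a reboot would show whether the setting persisted.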

This was first reported on 2022-04-18 and documented in https://ictjira.alma.cl/browse/AES-52. What we have seen, and what has been reported, is that downloads are sometimes incredibly slow (tens of kB/s) and sometimes the transfer is closed with data missing from the download. Other times we see perfectly reasonable download speeds (~10 MB/s). This was reproducible with a command like the following:

...

Some of the NAASC VM hosts show lots of dropped Rx packets.  The rate ranges from 2 to over 100 per minute.  This is really unacceptable on a modern, well-designed network.  While I can't say these dropped packets are indicative of a problem, they could become a problem with increased load and they certainly will make debugging more difficult when there is a problem.  I suggest the reason for these dropped packets be found and resolved.

Further tests show patterns. naasc-vs-2 and naasc-vs-4 report the same dropped packet rate, so it looks like the same packets may be dropped on both. For example, I wrote a simple script to print dropped packets per time interval and ran it at the same time on all four naasc-vs hosts. You can see that naasc-vs-2 and naasc-vs-4 show a similar pattern, while naasc-vs-3 and naasc-vs-5 show a different pattern.

naasc-vs-2  naasc-vs-3  naasc-vs-4  naasc-vs-5
30          0           30          0
22          0           24          0
13          1           11          1
9           0           9           0
8           0           8           0
12          1           12          1
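The script mentioned above is not included on this page. A minimal sketch of the same idea might look like the following. It assumes a Linux host that exposes the cumulative counter at /sys/class/net/<iface>/statistics/rx_dropped. The interface name em1, the 60-second interval, and all function names are illustrative assumptions, not the original script.

```python
import time
from pathlib import Path


def read_rx_dropped(iface, sysfs_root="/sys/class/net"):
    """Read the cumulative count of dropped Rx packets for one interface."""
    return int(Path(sysfs_root, iface, "statistics", "rx_dropped").read_text())


def drops_per_interval(samples):
    """Turn successive cumulative readings into per-interval drop counts."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def watch_rx_dropped(iface="em1", interval_s=60, count=6):
    """Print a timestamp and the number of new Rx drops once per interval."""
    previous = read_rx_dropped(iface)
    for _ in range(count):
        time.sleep(interval_s)
        current = read_rx_dropped(iface)
        print(time.strftime("%H:%M:%S"), current - previous, flush=True)
        previous = current
```

Starting watch_rx_dropped() at the same moment on each host gives one column per host, which can then be lined up side by side as in the table above.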




TCP retransmissions

The newest NAASC VM Host (naasc-vs-2) shows over 100 TCP retransmissions per second when doing iperf3 tests.  Other nodes like naasc-vs-3 and naasc-vs-4 do not show these at all.  While I can't say these TCP retransmissions are indicative of a problem, they could become a problem with increased load and they certainly will make debugging more difficult when there is a problem.  I suggest the reason for these TCP retransmissions be found and resolved.
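To watch the retransmission rate independently of iperf3, the kernel's cumulative RetransSegs counter can be sampled. This is a minimal sketch under the assumption of a Linux host with /proc/net/snmp. The function names are hypothetical, and the counter is host-wide, so retransmissions from unrelated traffic are included as well.

```python
import time
from pathlib import Path


def parse_tcp_counters(snmp_text):
    """Parse the two "Tcp:" lines of /proc/net/snmp into {counter_name: value}."""
    tcp_lines = [
        line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")
    ]
    # The first "Tcp:" line holds the counter names, the second holds the values.
    header, values = tcp_lines[0], tcp_lines[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def retransmissions_per_second(interval_s=1.0, path="/proc/net/snmp"):
    """Sample RetransSegs twice and return the average rate over the interval."""
    before = parse_tcp_counters(Path(path).read_text())["RetransSegs"]
    time.sleep(interval_s)
    after = parse_tcp_counters(Path(path).read_text())["RetransSegs"]
    return (after - before) / interval_s
```

Running retransmissions_per_second() on naasc-vs-2 during an iperf3 test, and again on a host that does not show the problem, gives a direct comparison of the two rates.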

...