This idea proposes that the increased system CPU usage on the MDS is caused by RHEL7 or, more specifically, Lustre-2.10. The 1-year CPU graph of aocmds clearly shows a large increase in system time starting around mid-October 2019. There is a very similar increase in user/system CPU time on nmpost061 through nmpost070.
To test this, we rebooted nmpost071 through nmpost090 back into RHEL6/Lustre-2.5.5 so that vlasstest jobs will run on them. I have started 10 casa jobs, each on a different node in this range, about 20 minutes apart. The system CPU usage on aocmds has not changed significantly because of these jobs, so the result is inconclusive. The proper test of this idea may just be upgrading the Lustre servers to 2.10.
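To put a number on "not changed significantly", one option is to sample the aggregate `cpu` line of `/proc/stat` on the MDS before and during the test jobs and compare the system-time share. The following is only a minimal sketch of that idea, not the tooling that produced the graphs mentioned here; the function names are hypothetical, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, ...).

```python
def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def system_fraction(before, after):
    """Fraction of CPU time spent in system mode between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Only the first eight counters are summed: guest and guest_nice are
    # already included in user and nice, so adding them would double count.
    total = sum(deltas[:8])
    if total <= 0:
        raise ValueError("samples must be taken at different times, in order")
    return deltas[2] / total  # index 2 is the 'system' counter


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (the first line) from /proc/stat."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Calling `read_cpu_line()` once before the jobs start and once while they are running, then passing both lines to `system_fraction`, gives a single value that can be compared against the same measurement taken with no test jobs running.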
After upgrading the Lustre system in NM to 2.10.8, the effects on the MDS seem the same. So it appears this was not caused by a client difference alone.