Lloyd Wood, commenting on an e2e post, was recently asked why UDP has an end-to-end checksum on the packet when it doesn’t do retransmissions, and whether the checksum should be turned off. Lloyd noted UDP “could have the checksum turned off, which proved disastrous for a number of applications, subtly corrupted filing systems which didn’t have higher-level end2end checks”. Lloyd is exactly right here. But why would someone turn off UDP checksums in the first place? It doesn’t seem to make sense, does it?
People often turn off UDP checksums to “buy” more performance, relying instead on the CRC of the Ethernet frame. So this is not a stupid question. It’s a very smart question, and a lot of smart people get fooled by how simple the trade looks. The catch is that the Ethernet CRC only protects a frame on the link between two adjacent devices, so it cannot catch data that gets corrupted inside a switch, router, or host along the way. The performance argument has also weakened: intelligent NIC technologies like SiliconTCP and TOE calculate the checksum as the packet is being received, so there is little left to gain by turning it off.
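To make the trap concrete, here is a minimal Python sketch of a link-level CRC passing while an end-to-end checksum fails. It rests on several assumptions of mine: `zlib.crc32` stands in for the Ethernet frame check sequence, `internet_checksum` is a simplified RFC 1071 style sum over the payload only (real UDP also covers a pseudo-header), and `faulty_switch` is a hypothetical device that flips a bit in its buffer memory and then emits a freshly computed CRC on the way out. Not every switch regenerates the CRC, so treat this as an illustration of the failure mode, not a model of any particular product.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def make_frame(payload: bytes) -> bytes:
    """Append a CRC-32 as a stand-in for the Ethernet frame check sequence."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_crc_ok(frame: bytes) -> bool:
    """Check the trailing CRC-32 the way a receiving NIC would."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs


def faulty_switch(frame: bytes) -> bytes:
    """Hypothetical device: corrupts one bit in memory, then re-CRCs the frame."""
    payload = bytearray(frame[:-4])
    payload[3] ^= 0x04  # single bit flip while the frame sits in the buffer
    return make_frame(bytes(payload))


datagram = b"block 0042: important file data"
udp_sum = internet_checksum(datagram)  # computed once, by the sender

received = faulty_switch(make_frame(datagram))
link_ok = frame_crc_ok(received)
e2e_ok = internet_checksum(received[:-4]) == udp_sum

print(f"link CRC ok: {link_ok}, end-to-end checksum ok: {e2e_ok}")
# prints "link CRC ok: True, end-to-end checksum ok: False"
```

The receiver’s link-level check is perfectly happy, because the CRC it sees was computed after the damage was done. Only the checksum carried from the original sender notices the flipped bit.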
This is a surprisingly common problem in datacenters. Sometimes the culprit is a switch, sometimes a configuration error, sometimes a programming error in the application, and so forth. I most recently experienced it with an overheated Ethernet switch passing video on an internal network. Since we don’t have things like SiliconTCP in commodity switches yet, check that switch if you’re having problems. In the meantime, here are a few little datacenter horror stories to put in your pocket.