Checksum

In a distributed system, while moving data between components, it is possible that the data fetched from a node may arrive corrupted. This corruption can occur because of faults in a storage device, network, software, etc. How can a distributed system ensure data integrity, so that the client receives an error instead of corrupt data?

Calculate a checksum and store it with the data.

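To calculate a checksum, a hash function is applied to the data. Cryptographic hash functions such as MD5, SHA-1, SHA-256, or SHA-512 are common choices, and lighter non-cryptographic functions such as CRC32 are also widely used when only accidental corruption matters. The hash function takes input data of any size and produces a fixed-length value, usually written as a hexadecimal string; this value is called the checksum. Note that MD5 and SHA-1 are no longer considered collision-resistant, so they are fine for catching accidental corruption but not for defending against deliberate tampering.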
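As a minimal sketch of the idea, the snippet below computes a checksum with Python's standard `hashlib` module. The helper name `checksum` and the default of SHA-256 are choices made for this illustration, not something the text prescribes.

```python
import hashlib

def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` under the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

digest = checksum(b"hello world")
print(len(digest))  # 64 hex characters for SHA-256, whatever the input size
```

Changing even a single byte of the input produces a completely different digest, which is what makes the comparison useful for spotting corruption.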

When a system stores some data, it computes a checksum of the data and stores the checksum alongside it. When a client retrieves the data, it recomputes the checksum over what it received and compares the result with the stored checksum. If the two differ, the data is corrupt, and the client can opt to retrieve it from another replica.
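The store-then-verify flow described above can be sketched roughly as follows. Replicas are modeled as plain dictionaries, and `store`, `read_verified`, and `ChecksumMismatch` are hypothetical names invented for this example; a real system would talk to storage nodes over the network.

```python
import hashlib

class ChecksumMismatch(Exception):
    """Raised when no replica returns data matching its stored checksum."""

def store(replica: dict, key: str, data: bytes) -> None:
    # Keep the checksum next to the data it protects.
    replica[key] = (data, hashlib.sha256(data).hexdigest())

def read_verified(replicas: list, key: str) -> bytes:
    # Try each replica in turn; return the first copy whose checksum matches.
    for replica in replicas:
        data, expected = replica[key]
        if hashlib.sha256(data).hexdigest() == expected:
            return data
    raise ChecksumMismatch(key)

primary, backup = {}, {}
for r in (primary, backup):
    store(r, "blob", b"payload")

# Simulate corruption on the primary: the data changes, the stored checksum does not.
primary["blob"] = (b"pay1oad", primary["blob"][1])
print(read_verified([primary, backup], "blob"))  # b'payload'
```

If every replica fails verification, the client gets an error rather than corrupt data, which is the guarantee the pattern is after.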

Checksum use cases

Data Integrity Checks

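Imagine sending a super-secret spy message: you want to make sure it doesn’t get altered during transmission, right? The receiver recomputes the checksum of the message and compares it with the one that was sent along. If they don’t match, something’s fishy. One caveat: a plain checksum only reliably catches accidental changes, because anyone who can alter the message can also recompute the checksum. To detect deliberate tampering, the checksum itself has to be protected, for example with a keyed hash (HMAC) or a digital signature.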
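For the spy-message scenario, where someone may alter the message on purpose, one common approach is a keyed hash (HMAC), which an attacker cannot recompute without the secret key. The sketch below uses Python's standard `hmac` module; the function names `sign` and `verify` and the sample key are assumptions made for illustration.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for `message` under the shared secret `key`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    # compare_digest runs in constant time to avoid leaking where tags differ.
    return hmac.compare_digest(sign(key, message), tag)

key = b"shared-secret"          # hypothetical key known only to sender and receiver
tag = sign(key, b"meet at dawn")
print(verify(key, b"meet at dawn", tag))   # True
print(verify(key, b"meet at dusk", tag))   # False
```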

Error Detection

Ever download a file and it just won’t open? Checksums can help detect if a bit of data got scrambled during a download, helping systems know when they need to try downloading it again.

Data Retrieval and Verification

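When you download software or a file from a website, the site often publishes a checksum value next to it. You can compute the checksum of your downloaded copy and compare it with the published one to verify that what you downloaded is exactly what the creators intended, no nasty surprises hiding inside! This only helps if the published checksum itself comes from a source you trust.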
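Checking a download against a published checksum might look like the sketch below. It assumes the site publishes a SHA-256 hex digest; `file_sha256` and `matches_published` are hypothetical helper names. The file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally, reading `chunk_size` bytes at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    # Published digests are often copied with stray whitespace or uppercase hex.
    return file_sha256(path) == published_hex.strip().lower()
```

A caller would pass the path of the downloaded file and the digest string copied from the website, and refuse to use the file if the function returns `False`.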

Networking

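In networking, Checksums help detect errors in packets of data sent over a network. If the arriving packet’s Checksum doesn’t match the one it was sent with, the packet can be rejected and, depending on the protocol, retransmitted. Short checksums cannot catch every possible error, but they stop the vast majority of corrupted packets from getting through.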
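One widely used packet checksum is the 16-bit ones' complement "Internet checksum" described in RFC 1071 and used by IP, TCP, and UDP headers. The sketch below is a plain, unoptimized rendering of that algorithm; `packet_is_valid` is a hypothetical helper added to show the receiver-side check.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF

def packet_is_valid(payload: bytes, received_checksum: int) -> bool:
    # The receiver recomputes the checksum and rejects the packet on mismatch.
    return internet_checksum(payload) == received_checksum

packet = bytes.fromhex("0001f203f4f5f6f7")   # sample bytes from RFC 1071's worked example
print(hex(internet_checksum(packet)))        # 0x220d
```

Because the result is only 16 bits wide, some multi-bit errors cancel out and go undetected, which is why link layers typically add a stronger CRC on top.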

Secure Storage

In some databases, Checksums help maintain the integrity of the stored data. Periodically, the stored data is checked against the Checksum - if it doesn’t match, the system knows something’s amiss in the storage system!
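A periodic integrity pass of this kind (often called scrubbing) could be sketched as below. The layout is an assumption for illustration: `blocks` maps a block id to a `(data, stored_crc)` pair, and CRC-32 from the standard `zlib` module stands in for whatever checksum a given storage engine actually uses.

```python
import zlib

def scrub(blocks: dict) -> list:
    """Return the ids of blocks whose contents no longer match their stored CRC-32."""
    return [
        block_id
        for block_id, (data, stored_crc) in blocks.items()
        if zlib.crc32(data) != stored_crc
    ]

blocks = {
    "b1": (b"alpha", zlib.crc32(b"alpha")),
    "b2": (b"bravo", zlib.crc32(b"bravo")),
}
# Simulate silent corruption ("bit rot") in block b2.
blocks["b2"] = (b"brav0", blocks["b2"][1])
print(scrub(blocks))  # ['b2']
```

The blocks that fail the check can then be repaired from a healthy replica or flagged for an operator.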

Password Verification

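Some systems store a hash of a password instead of the password itself. When you log in, the system hashes the password you enter and compares the result with the stored hash. If they match, you’re in! The actual password is never stored, which adds a layer of security. A fast general-purpose checksum such as MD5 or plain SHA-256 is not enough here, though: real systems use a salted, deliberately slow password-hashing function such as bcrypt, scrypt, Argon2, or PBKDF2, so that stolen hashes are expensive to brute-force.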
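A hedged sketch of password verification follows. Rather than a bare checksum, it uses salted PBKDF2 from Python's standard library, since fast unsalted hashes are easy to brute-force. The function names and the iteration count are assumptions for this example; production systems should follow current guidance for their chosen algorithm.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor only; tune for real deployments

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); both are stored, the password itself is not."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    # Re-derive the digest with the same salt and compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)
```

At login, the system looks up the stored salt and digest for the account and calls `check_password` with the password the user typed.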

Preventing Accidental Duplicates

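Systems can use Checksums to avoid accidentally storing the same data twice. If two pieces of data have the same Checksum, they are very likely duplicates, so the system can keep a single copy and save storage space. Because different data can occasionally produce the same Checksum, a careful system compares the actual bytes before discarding anything.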
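Checksum-based deduplication can be sketched as a small content-addressed store. `DedupStore` is a hypothetical class written for this example: it keys each blob by its SHA-256 digest and, to be safe against collisions, compares the bytes before treating two blobs as the same.

```python
import hashlib

class DedupStore:
    """Stores each distinct blob once, keyed by its SHA-256 checksum."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        existing = self._blobs.get(key)
        if existing is None:
            self._blobs[key] = data          # first time this content is seen
        elif existing != data:
            # Same checksum, different bytes: a collision, not a duplicate.
            raise ValueError("checksum collision between different contents")
        return key

    def __len__(self) -> int:
        return len(self._blobs)

store = DedupStore()
first = store.put(b"report.pdf contents")
second = store.put(b"report.pdf contents")   # duplicate upload
print(first == second, len(store))           # True 1
```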