I am looking to use Java to get the MD5 checksum of a file. I was really surprised but I haven't been able to find anything that shows how to get the MD5 checksum of a file.
How is it done?
There's an input stream decorator, java.security.DigestInputStream, so that you can compute the digest while using the input stream as you normally would, instead of having to make an extra pass over the data.
MessageDigest md = MessageDigest.getInstance("MD5");
try (InputStream is = Files.newInputStream(Paths.get("file.txt"));
DigestInputStream dis = new DigestInputStream(is, md))
{
/* Read decorated stream (dis) to EOF as normal... */
}
byte[] digest = md.digest();
Use DigestUtils from Apache Commons Codec library:
try (InputStream is = Files.newInputStream(Paths.get("file.zip"))) {
String md5 = org.apache.commons.codec.digest.DigestUtils.md5Hex(is);
}
There's an example at Real's Java-How-to using the MessageDigest class.
Check that page for examples using CRC32 and SHA-1 as well.
import java.io.*;
import java.security.MessageDigest;
public class MD5Checksum {
public static byte[] createChecksum(String filename) throws Exception {
InputStream fis = new FileInputStream(filename);
byte[] buffer = new byte[1024];
MessageDigest complete = MessageDigest.getInstance("MD5");
int numRead;
do {
numRead = fis.read(buffer);
if (numRead > 0) {
complete.update(buffer, 0, numRead);
}
} while (numRead != -1);
fis.close();
return complete.digest();
}
// see this How-to for a faster way to convert
// a byte array to a HEX string
public static String getMD5Checksum(String filename) throws Exception {
byte[] b = createChecksum(filename);
String result = "";
for (int i=0; i < b.length; i++) {
result += Integer.toString( ( b[i] & 0xff ) + 0x100, 16).substring( 1 );
}
return result;
}
public static void main(String args[]) {
try {
System.out.println(getMD5Checksum("apache-tomcat-5.5.17.exe"));
// output :
// 0bb2827c5eacf570b6064e24e0e6653b
// ref :
// http://www.apache.org/dist/
// tomcat/tomcat-5/v5.5.17/bin
// /apache-tomcat-5.5.17.exe.MD5
// 0bb2827c5eacf570b6064e24e0e6653b *apache-tomcat-5.5.17.exe
}
catch (Exception e) {
e.printStackTrace();
}
}
}
The com.google.common.hash API offers:
A unified user-friendly API for all hash functions
Seedable 32- and 128-bit implementations of murmur3
md5(), sha1(), sha256(), sha512() adapters, change only one line of code to switch between these, and murmur.
goodFastHash(int bits), for when you don't care what algorithm you use
General utilities for HashCode instances, like combineOrdered / combineUnordered
Read the User Guide (IO Explained, Hashing Explained).
For your use-case Files.hash() computes and returns the digest value for a file.
For example a sha-1 digest calculation (change SHA-1 to MD5 to get MD5 digest)
HashCode hc = Files.asByteSource(file).hash(Hashing.sha1());
"SHA-1: " + hc.toString();
Note that crc32 is much faster than md5, so use crc32 if you do not need a cryptographically secure checksum. Note also that md5 should not be used to store passwords and the like since it is to easy to brute force, for passwords use bcrypt, scrypt or sha-256 instead.
For long term protection with hashes a Merkle signature scheme adds to the security and The Post Quantum Cryptography Study Group sponsored by the European Commission has recommended use of this cryptography for long term protection against quantum computers (ref).
Note that crc32 has a higher collision rate than the others.
Using nio2 (Java 7+) and no external libraries:
byte[] b = Files.readAllBytes(Paths.get("/path/to/file"));
byte[] hash = MessageDigest.getInstance("MD5").digest(b);
To compare the result with an expected checksum:
String expected = "2252290BC44BEAD16AA1BF89948472E8";
String actual = DatatypeConverter.printHexBinary(hash);
System.out.println(expected.equalsIgnoreCase(actual) ? "MATCH" : "NO MATCH");
Guava now provides a new, consistent hashing API that is much more user-friendly than the various hashing APIs provided in the JDK. See Hashing Explained. For a file, you can get the MD5 sum, CRC32 (with version 14.0+) or many other hashes easily:
HashCode md5 = Files.hash(file, Hashing.md5());
byte[] md5Bytes = md5.asBytes();
String md5Hex = md5.toString();
HashCode crc32 = Files.hash(file, Hashing.crc32());
int crc32Int = crc32.asInt();
// the Checksum API returns a long, but it's padded with 0s for 32-bit CRC
// this is the value you would get if using that API directly
long checksumResult = crc32.padToLong();
Ok. I had to add. One line implementation for those who already have Spring and Apache Commons dependency or are planning to add it:
DigestUtils.md5DigestAsHex(FileUtils.readFileToByteArray(file))
For and Apache commons only option (credit #duleshi):
DigestUtils.md5Hex(FileUtils.readFileToByteArray(file))
Hope this helps someone.
A simple approach with no third party libraries using Java 7
String path = "your complete file path";
MessageDigest md = MessageDigest.getInstance("MD5");
md.update(Files.readAllBytes(Paths.get(path)));
byte[] digest = md.digest();
If you need to print this byte array. Use as below
System.out.println(Arrays.toString(digest));
If you need hex string out of this digest. Use as below
String digestInHex = DatatypeConverter.printHexBinary(digest).toUpperCase();
System.out.println(digestInHex);
where DatatypeConverter is javax.xml.bind.DatatypeConverter
I recently had to do this for just a dynamic string, MessageDigest can represent the hash in numerous ways. To get the signature of the file like you would get with the md5sum command I had to do something like the this:
try {
String s = "TEST STRING";
MessageDigest md5 = MessageDigest.getInstance("MD5");
md5.update(s.getBytes(),0,s.length());
String signature = new BigInteger(1,md5.digest()).toString(16);
System.out.println("Signature: "+signature);
} catch (final NoSuchAlgorithmException e) {
e.printStackTrace();
}
This obviously doesn't answer your question about how to do it specifically for a file, the above answer deals with that quiet nicely. I just spent a lot of time getting the sum to look like most application's display it, and thought you might run into the same trouble.
public static void main(String[] args) throws Exception {
MessageDigest md = MessageDigest.getInstance("MD5");
FileInputStream fis = new FileInputStream("c:\\apache\\cxf.jar");
byte[] dataBytes = new byte[1024];
int nread = 0;
while ((nread = fis.read(dataBytes)) != -1) {
md.update(dataBytes, 0, nread);
};
byte[] mdbytes = md.digest();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < mdbytes.length; i++) {
sb.append(Integer.toString((mdbytes[i] & 0xff) + 0x100, 16).substring(1));
}
System.out.println("Digest(in hex format):: " + sb.toString());
}
Or you may get more info
http://www.asjava.com/core-java/java-md5-example/
We were using code that resembles the code above in a previous post using
...
String signature = new BigInteger(1,md5.digest()).toString(16);
...
However, watch out for using BigInteger.toString() here, as it will truncate leading zeros...
(for an example, try s = "27", checksum should be "02e74f10e0327ad868d138f2b4fdd6f0")
I second the suggestion to use Apache Commons Codec, I replaced our own code with that.
public static String MD5Hash(String toHash) throws RuntimeException {
try{
return String.format("%032x", // produces lower case 32 char wide hexa left-padded with 0
new BigInteger(1, // handles large POSITIVE numbers
MessageDigest.getInstance("MD5").digest(toHash.getBytes())));
}
catch (NoSuchAlgorithmException e) {
// do whatever seems relevant
}
}
Very fast & clean Java-method that doesn't rely on external libraries:
(Simply replace MD5 with SHA-1, SHA-256, SHA-384 or SHA-512 if you want those)
public String calcMD5() throws Exception{
byte[] buffer = new byte[8192];
MessageDigest md = MessageDigest.getInstance("MD5");
DigestInputStream dis = new DigestInputStream(new FileInputStream(new File("Path to file")), md);
try {
while (dis.read(buffer) != -1);
}finally{
dis.close();
}
byte[] bytes = md.digest();
// bytesToHex-method
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Here is a handy variation that makes use of InputStream.transferTo() from Java 9, and OutputStream.nullOutputStream() from Java 11. It requires no external libraries and does not need to load the entire file into memory.
public static String hashFile(String algorithm, File f) throws IOException, NoSuchAlgorithmException {
MessageDigest md = MessageDigest.getInstance(algorithm);
try(BufferedInputStream in = new BufferedInputStream((new FileInputStream(f)));
DigestOutputStream out = new DigestOutputStream(OutputStream.nullOutputStream(), md)) {
in.transferTo(out);
}
String fx = "%0" + (md.getDigestLength()*2) + "x";
return String.format(fx, new BigInteger(1, md.digest()));
}
and
hashFile("SHA-512", Path.of("src", "test", "resources", "some.txt").toFile());
returns
"e30fa2784ba15be37833d569280e2163c6f106506dfb9b07dde67a24bfb90da65c661110cf2c5c6f71185754ee5ae3fd83a5465c92f72abd888b03187229da29"
String checksum = DigestUtils.md5Hex(new FileInputStream(filePath));
Another implementation: Fast MD5 Implementation in Java
String hash = MD5.asHex(MD5.getHash(new File(filename)));
Standard Java Runtime Environment way:
public String checksum(File file) {
try {
InputStream fin = new FileInputStream(file);
java.security.MessageDigest md5er =
MessageDigest.getInstance("MD5");
byte[] buffer = new byte[1024];
int read;
do {
read = fin.read(buffer);
if (read > 0)
md5er.update(buffer, 0, read);
} while (read != -1);
fin.close();
byte[] digest = md5er.digest();
if (digest == null)
return null;
String strDigest = "0x";
for (int i = 0; i < digest.length; i++) {
strDigest += Integer.toString((digest[i] & 0xff)
+ 0x100, 16).substring(1).toUpperCase();
}
return strDigest;
} catch (Exception e) {
return null;
}
}
The result is equal of linux md5sum utility.
Here is a simple function that wraps around Sunil's code so that it takes a File as a parameter. The function does not need any external libraries, but it does require Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.xml.bind.DatatypeConverter;
public class Checksum {
/**
* Generates an MD5 checksum as a String.
* #param file The file that is being checksummed.
* #return Hex string of the checksum value.
* #throws NoSuchAlgorithmException
* #throws IOException
*/
public static String generate(File file) throws NoSuchAlgorithmException,IOException {
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
messageDigest.update(Files.readAllBytes(file.toPath()));
byte[] hash = messageDigest.digest();
return DatatypeConverter.printHexBinary(hash).toUpperCase();
}
public static void main(String argv[]) throws NoSuchAlgorithmException, IOException {
File file = new File("/Users/foo.bar/Documents/file.jar");
String hex = Checksum.generate(file);
System.out.printf("hex=%s\n", hex);
}
}
Example output:
hex=B117DD0C3CBBD009AC4EF65B6D75C97B
If you're using ANT to build, this is dead-simple. Add the following to your build.xml:
<checksum file="${jarFile}" todir="${toDir}"/>
Where jarFile is the JAR you want to generate the MD5 against, and toDir is the directory you want to place the MD5 file.
More info here.
Google guava provides a new API. Find the one below :
public static HashCode hash(File file,
HashFunction hashFunction)
throws IOException
Computes the hash code of the file using hashFunction.
Parameters:
file - the file to read
hashFunction - the hash function to use to hash the data
Returns:
the HashCode of all of the bytes in the file
Throws:
IOException - if an I/O error occurs
Since:
12.0
public static String getMd5OfFile(String filePath)
{
String returnVal = "";
try
{
InputStream input = new FileInputStream(filePath);
byte[] buffer = new byte[1024];
MessageDigest md5Hash = MessageDigest.getInstance("MD5");
int numRead = 0;
while (numRead != -1)
{
numRead = input.read(buffer);
if (numRead > 0)
{
md5Hash.update(buffer, 0, numRead);
}
}
input.close();
byte [] md5Bytes = md5Hash.digest();
for (int i=0; i < md5Bytes.length; i++)
{
returnVal += Integer.toString( ( md5Bytes[i] & 0xff ) + 0x100, 16).substring( 1 );
}
}
catch(Throwable t) {t.printStackTrace();}
return returnVal.toUpperCase();
}
Pulling together ideas from other answers, here's simple code with no third party dependencies (or DatatypeConverter which is longer in the latest JDKs) that generates this as a hex string compatible with output of the md5sum tool:
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
...
static String calculateMD5(String path) throws IOException
{
try {
MessageDigest md = MessageDigest.getInstance("MD5");
md.update(Files.readAllBytes(Paths.get(path)));
return String.format("%032x", new BigInteger(1, md.digest())); // hex, padded to 32 chars
} catch (NoSuchAlgorithmException ex)
{
throw new RuntimeException(ex); // MD5 is always available so this should be impossible
}
}
I have a java code that want converted to php .
public String getSHA1Hash(String input) throws NoSuchAlgorithmException, UnsupportedEncodingException {
String SHA1Hash = null;
MessageDigest md = MessageDigest.getInstance("SHA1");
md.reset();
byte[] buffer = input.getBytes("UTF-8");
md.update(buffer);
byte[] digest = md.digest();
String hexStr = "";
for (int i = 0; i < digest.length; i++) {
hexStr += Integer.toString((digest[i] & 0xff) + 0x100, 16).substring(1);
SHA1Hash = hexStr;
}
return SHA1Hash;
help
PHP has a function sha1() that creates a sha1 hash from the input string. No need to convert the java function and rebuild the logic.
PHP has a native function to hash sha1 strings:
sha1 — Calculate the sha1 hash of a string
Example from Manual:
$str = 'apple';
if (sha1($str) === 'd0be2dc421be4fcd0172e5afceea3970e2f3d940') {
echo "Would you like a green or red apple?";
}
This will give the same output as your Java code would give for "apple".
I have a scenario create like checkin & checkout of documents. Then how to manage and check that if a user checking in a document, is a new checkin or an existing checkin of document. on what file attributes we can differentiate this. I do not want to use lastModifiedTime, Size, or name of the file. Please let me know. Thanks..
when i had such a thing to do, i tried with MD5 hash (in perl). i think this might help:
How can I generate an MD5 hash?
The md5 file encode will help you with this. This is a working example of how to get the MD5 value.
public static void main(String[] args)throws Exception
{
MessageDigest md = MessageDigest.getInstance("MD5");
FileInputStream fis = new FileInputStream("c:\\loging.log");
byte[] dataBytes = new byte[1024];
int nread = 0;
while ((nread = fis.read(dataBytes)) != -1) {
md.update(dataBytes, 0, nread);
};
byte[] mdbytes = md.digest();
//convert the byte to hex format method 1
StringBuffer sb = new StringBuffer();
for (int i = 0; i < mdbytes.length; i++) {
sb.append(Integer.toString((mdbytes[i] & 0xff) + 0x100, 16).substring(1));
}
System.out.println("Digest(in hex format):: " + sb.toString());
//convert the byte to hex format method 2
StringBuffer hexString = new StringBuffer();
for (int i=0;i<mdbytes.length;i++) {
String hex=Integer.toHexString(0xff & mdbytes[i]);
if(hex.length()==1) hexString.append('0');
hexString.append(hex);
}
System.out.println("Digest(in hex format):: " + hexString.toString());
}
Output:
Digest(in hex format):: e72c504dc16c8fcd2fe8c74bb492affa
Digest(in hex format):: e72c504dc16c8fcd2fe8c74bb492affa
What you have to do is to compare the old MD5 value with the new one and if it corresponds, no changes have been made to the file
in the last 5 hours im trying to do something that should be very simple and did it in like 10 minutes in C#, but no luck with Java.
I got a 32 UpperCase and Numeric String (A-Z0-9), I need to convert this String to Dec, and then md5 it.
My problem is that I dont have unsgined bytes so I cant md5 my array :\
Here is what I need to do in python:
salt = words[1].decode("hex")
passwordHash = generatePasswordHash(salt, pw)
generatePasswordHash(salt, password):
m = md5.new()
m.update(salt)
m.update(password)
return m.digest()
and here it is in C# :
public static string GeneratePasswordHash(byte[] a_bSalt, string strData) {
MD5 md5Hasher = MD5.Create();
byte[] a_bCombined = new byte[a_bSalt.Length + strData.Length];
a_bSalt.CopyTo(a_bCombined, 0);
Encoding.Default.GetBytes(strData).CopyTo(a_bCombined, a_bSalt.Length);
byte[] a_bHash = md5Hasher.ComputeHash(a_bCombined);
StringBuilder sbStringifyHash = new StringBuilder();
for (int i = 0; i < a_bHash.Length; i++) {
sbStringifyHash.Append(a_bHash[i].ToString("X2"));
}
return sbStringifyHash.ToString();
}
protected byte[] HashToByteArray(string strHexString) {
byte[] a_bReturn = new byte[strHexString.Length / 2];
for (int i = 0; i < a_bReturn.Length; i++) {
a_bReturn[i] = Convert.ToByte(strHexString.Substring(i * 2, 2), 16);
}
return a_bReturn;
}
I will be very happy to get a help with this :)
To parse a hex string into a byte : (byte) Integer.parseInt(s, 16).
To transform your password string into a byte array, using the default encoding (which I suggest not to do : always specify a specific encoding) : password.getBytes() (or password.getBytes(encoding) for a specific encoding).
To hash a byte array : MessageDigest.getInstance("MD5").digest(byte[]).
To transform a byte to a hex String : See In Java, how do I convert a byte array to a string of hex digits while keeping leading zeros?
I believe that something like the following will work:
// convert your hex string to bytes
BigInteger bigInt = new BigInteger(salt, 16);
byte[] bytes = bigInt.toByteArray();
// get the MD5 digest library
MessageDigest md5Digest = null;
try {
md5Digest = MessageDigest.getInstance("MD5");
} catch (NoSuchAlgorithmException e) {
// error handling here...
}
// by default big integer outputs a 0 sign byte if the first bit is set
if (bigInt.testBit(0)) {
md5Digest.update(bytes, 1, bytes.length - 1);
} else {
md5Digest.update(bytes);
}
// get the digest bytes
byte[] digestBytes = md5Digest.digest();
Here's more ideas for converting a hex string to a byte[] array:
Convert a string representation of a hex dump to a byte array using Java?
You can use unsigned numbers in java with applying bit masks. Take a look details here.
I have a SQL table with usernames and passwords. The passwords are encoded using MessageDigest's digest() method. If I encode a password - let's say "abcdef12" - with MessageDigest's digest() method and then convert it to hexadecimal values, the String is different than if I do the same using PHP's SHA1-method. I'd expect these values to be exactly the same though.
Code that is used to encode the passwords:
MessageDigest md = MessageDigest.getInstance("SHA-1");
byte[] passbyte;
passbyte = "abcdef12".getBytes("UTF-8");
passbyte = md.digest(passbyte);
The conversion of the String to hexadecimal is done using this method:
public static String convertStringToHex(String str) {
char[] chars = str.toCharArray();
StringBuffer hex = new StringBuffer();
for (int i = 0; i < chars.length; i++) {
hex.append(Integer.toHexString((int) chars[i]));
}
return hex.toString();
}
Password: abcdef12
Here's the password as returned by a lot of SHA1-hash online generators and PHP SHA1()-function: d253e3bd69ce1e7ce6074345fd5faa1a3c2e89ef
Here's the password as encoded by MessageDigest: d253e3bd69ce1e7ce674345fd5faa1a3c2e2030ef
Am I forgetting something?
Igor.
Edit: I've found someone with a similar problem: C# SHA-1 vs. PHP SHA-1...Different Results? . The solution was to change encodings.. but I can't change encodings on the server-side since the passwords in that SQL-table are not created by my application.
I use client-side SHA1-encoding using a JavaScript SHA1-class (more precisely: a Google Web Toolkit-class). It works and encodes the string as expected, but apparently using ASCII characters?..
I have the same digest as PHP with my Java SHA-1 hashing function:
public static String computeSha1OfString(final String message)
throws UnsupportedOperationException, NullPointerException {
try {
return computeSha1OfByteArray(message.getBytes(("UTF-8")));
} catch (UnsupportedEncodingException ex) {
throw new UnsupportedOperationException(ex);
}
}
private static String computeSha1OfByteArray(final byte[] message)
throws UnsupportedOperationException {
try {
MessageDigest md = MessageDigest.getInstance("SHA-1");
md.update(message);
byte[] res = md.digest();
return toHexString(res);
} catch (NoSuchAlgorithmException ex) {
throw new UnsupportedOperationException(ex);
}
}
I've added to my unit tests:
String sha1Hash = StringHelper.computeSha1OfString("abcdef12");
assertEquals("d253e3bd69ce1e7ce6074345fd5faa1a3c2e89ef", sha1Hash);
Full source code for the class is on github.
Try this - it is working for me:
MessageDigest md = MessageDigest.getInstance(algorithm);
md.update(original.getBytes());
byte[] digest = md.digest();
StringBuffer sb = new StringBuffer();
for (byte b : digest) {
sb.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
}
return sb.toString();
Regards,
Konki
It has nothing to do with the encodings. The output would be entirely different.
For starters, your function convertStringToHex() doesn't output leading zeros, that is, 07 becomes just 7.
The rest (changing 89 to 2030) is also likely to have something to do with that function. Try looking at the value of passbyte after passbyte = md.digest(passbyte);.
Or try this:
MessageDigest md = MessageDigest.getInstance("SHA-1");
md.update(clearPassword.getBytes("UTF-8"));
return new BigInteger(1 ,md.digest()).toString(16));
Cheers Roy