How SHA-1 hashing works in Java on Android

I searched around for how to hash device identifiers and stumbled on the following code.
I don't really understand what it's doing.
Why do I need to urlEncode the device id ?
Why do I need to hash the bytes, couldn't I just do that on a String ?
Why do I need to convert it to a BigInteger ?
Why do I need to shift bits to get a String with the hashed id ?
Can anyone explain what's going on line by line? I hope this will help other people understand this snippet that's getting passed around in blogs and forums, too.
String hashedId = "";
String deviceId = urlEncode(Secure.getString(context.getContentResolver(), Secure.ANDROID_ID));
try {
MessageDigest digest = MessageDigest.getInstance("SHA-1");
byte bytes[] = digest.digest(deviceId.getBytes());
BigInteger b = new BigInteger(1, bytes);
hashedId = String.format("%0" + (bytes.length << 1) + "x", b);
} catch (NoSuchAlgorithmException e) {
//ignored
}
return hashedId;

Why do I need to urlEncode the device id ?
Why do I need to hash the bytes, couldn't I just do that on a String ?
Most hashing algorithms, including SHA-1, work on binary data as input (i.e. bytes). Strings themselves don't have a specific binary representation; it changes depending on the encoding.
The line of code they provide uses the default encoding, which is a bit fragile. I would prefer to see something like
byte bytes[] = digest.digest(deviceId.getBytes(Charset.forName("UTF-8")));
Why do I need to convert it to a BigInteger ?
This is being used for convenience to help with the conversion to a hexadecimal representation.
Why do I need to shift bits to get a String with the hashed id ?
The format string being used is %0Nx, which causes the output to be zero-padded to N characters. Since it takes two characters to represent a byte in hexadecimal, N is twice the number of bytes, which is what bytes.length << 1 computes.
I don't really understand why you wouldn't just include Guava for Android and use the Hashing builder:
String hash = Hashing.sha1().hashString(deviceId, Charsets.UTF_8).toString();
It's one line and doesn't throw checked exceptions.
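For readers who would rather stay with the standard library, here is a minimal sketch of the snippet from the question with the charset made explicit. The class name Sha1Hex and the method name sha1Hex are made up for illustration, and StandardCharsets assumes Java 7+ (API level 19+ on Android).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original snippet.
class Sha1Hex {
    // Returns the SHA-1 digest of the text as 40 lowercase hex characters.
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            // Explicit charset: the same text yields the same bytes on every platform.
            byte[] bytes = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            // Signum 1 makes BigInteger treat the bytes as an unsigned magnitude.
            BigInteger value = new BigInteger(1, bytes);
            // Two hex digits per byte; the 0 flag restores any leading zeros.
            return String.format("%0" + (bytes.length << 1) + "x", value);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is expected to be present, so this is treated as a programming error.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("abc"));
    }
}
```

Rethrowing instead of swallowing NoSuchAlgorithmException means a missing algorithm cannot silently produce an empty id.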

About the bit-shifting: shifting left by one is equivalent to multiplying by 2. Each byte in the string is represented by 2 hex characters, so the resulting string will be twice as long as the number of bytes in the hash.
This creates a format string of the form %0Nx. For a 20-byte SHA-1 digest that is %040x, which prints an integral value as a zero-padded 40-character string (a 16-byte MD5 digest would give %032x).
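A small sketch of why the padding matters: BigInteger drops leading zero bytes, so without the width the hex string can come out shorter than two characters per byte. The class and method names here are invented for the demonstration.

```java
import java.math.BigInteger;

// Hypothetical demo class showing the effect of the %0Nx width.
class HexPadding {
    // Plain conversion: leading zeros are lost.
    static String unpadded(byte[] bytes) {
        return new BigInteger(1, bytes).toString(16);
    }

    // Width is bytes.length << 1, i.e. two hex digits per byte.
    static String padded(byte[] bytes) {
        return String.format("%0" + (bytes.length << 1) + "x", new BigInteger(1, bytes));
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x0f, (byte) 0xa0};
        System.out.println(unpadded(bytes)); // prints fa0
        System.out.println(padded(bytes));   // prints 000fa0
    }
}
```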

You need to hash the bytes, rather than the String, so that you're hashing the character data rather than the String object, which may have unpredictable internal state for a given sequence of characters.
It's converted to BigInteger so it can be consistently formatted with two hex digits per byte. (This is why the length is multiplied by two with the left shift.)
Basically, the answer to all of your questions is: so that you get reliable, repeatable results, even on different platforms.

You can also use this code:
public class sha1Calculate {
public static void main(String[] args)throws Exception
{
File file = new File("D:\\Android Links.txt");
String outputTxt= "";
String hashcode = null;
try {
FileInputStream input = new FileInputStream(file);
ByteArrayOutputStream output = new ByteArrayOutputStream ();
byte [] buffer = new byte [65536];
int l;
while ((l = input.read (buffer)) > 0)
output.write (buffer, 0, l);
input.close ();
output.close ();
byte [] data = output.toByteArray ();
MessageDigest digest = MessageDigest.getInstance( "SHA-1" );
byte[] bytes = data;
digest.update(bytes, 0, bytes.length);
bytes = digest.digest();
StringBuilder sb = new StringBuilder();
for( byte b : bytes )
{
sb.append( String.format("%02X", b) );
}
System.out.println("Digest(in hex format):: " + sb.toString());
}catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (NoSuchAlgorithmException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
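The code above copies the whole file into memory before hashing it. As an alternative sketch (not the poster's method), the digest can be updated chunk by chunk while reading, so memory use stays constant. The class name StreamSha1 is hypothetical, and the example hashes an in-memory stream so that it runs without any file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes any InputStream without buffering all of it.
class StreamSha1 {
    static String sha1Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        byte[] buffer = new byte[8192];
        int n;
        // read() reports how many bytes it delivered; only those are hashed.
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : digest.digest()) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println("Digest (in hex format): " + sha1Hex(in));
        }
    }
}
```

For a real file, a FileInputStream opened in a try-with-resources block could be passed in the same way.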

Related

Failure encoding files in base64 java

I have this class to encode and decode a file. When I run it with .txt files, it works. But when I run it with .jpg or .doc files, I cannot open the resulting file, or it is not equal to the original. I don't know why this is happening. I based this class on
http://myjeeva.com/convert-image-to-string-and-string-to-image-in-java.html, but I want to change this line
byte imageData[] = new byte[(int) file.length()];
to
byte example[] = new byte[1024];
and read the file as many times as needed. Thanks.
import java.io.*;
import java.util.*;
public class Encode {
// input = input file path, output = output file path, imageDataString = encoded string
String input;
String output;
String imageDataString;
public void setFileInput(String input){
this.input=input;
}
public void setFileOutput(String output){
this.output=output;
}
public String getFileInput(){
return input;
}
public String getFileOutput(){
return output;
}
public String getEncodeString(){
return imageDataString;
}
public String processCode(){
StringBuilder sb= new StringBuilder();
try{
File fileInput= new File( getFileInput() );
FileInputStream imageInFile = new FileInputStream(fileInput);
// I have seen examples where people create a byte[] with the same length as the file. I don't want that, because I will not know how long the file will be.
byte buff[] = new byte[1024];
int r = 0;
while ( ( r = imageInFile.read( buff)) > 0 ) {
String imageData = encodeImage(buff);
sb.append( imageData);
if ( imageInFile.available() <= 0 ) {
break;
}
}
} catch (FileNotFoundException e) {
System.out.println("File not found" + e);
} catch (IOException ioe) {
System.out.println("Exception while reading the file " + ioe);
}
imageDataString = sb.toString();
return imageDataString;
}
public void processDecode(String str) throws IOException{
byte[] imageByteArray = decodeImage(str);
File fileOutput= new File( getFileOutput());
FileOutputStream imageOutFile = new FileOutputStream( fileOutput);
imageOutFile.write(imageByteArray);
imageOutFile.close();
}
public static String encodeImage(byte[] imageByteArray) {
return Base64.getEncoder().withoutPadding().encodeToString( imageByteArray);
}
public static byte[] decodeImage(String imageDataString) {
return Base64.getDecoder().decode( imageDataString);
}
public static void main(String[] args) throws IOException {
Encode a = new Encode();
a.setFileInput( "C://Users//xxx//Desktop//original.doc");
a.setFileOutput("C://Users//xxx//Desktop//original-copied.doc");
a.processCode( );
a.processDecode( a.getEncodeString());
System.out.println("C O P I E D");
}
}
I tried changing
String imageData = encodeImage(buff);
to
String imageData = encodeImage(buff,r);
and changing the method encodeImage to
public static String encodeImage(byte[] imageByteArray, int r) {
byte[] aux = new byte[r];
for ( int i = 0; i < aux.length; i++) {
aux[i] = imageByteArray[i];
if ( aux[i] <= 0 ) {
break;
}
}
return Base64.getDecoder().decode( aux);
}
But I get the error:
Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
You have two problems in your program.
The first, as mentioned by @Joop Eggen, is that you are not handling your input correctly.
In fact, Java does not promise you that even in the middle of the file, you'll be reading the entire 1024 bytes. It could just read 50 bytes, and tell you it read 50 bytes, and then the next time it will read 50 bytes more.
Suppose you read 1024 bytes in the previous round. And now, in the current round, you're only reading 50. Your byte array now contains 50 of the new bytes, and the rest are the old bytes from the previous read!
So you always need to copy the exact number of bytes copied to a new array, and pass that on to your encoding function.
So, to fix this particular problem, you'll need to do something like:
while ( ( r = imageInFile.read( buff)) > 0 ) {
byte[] realBuff = Arrays.copyOf( buff, r );
String imageData = encodeImage(realBuff);
...
}
However, this is not the only problem here. Your real problem is with the Base64 encoding itself.
What Base64 does is take your bytes, break them into 6-bit chunks, and then treat each of those chunks as a number N between 0 and 63. Then it takes the Nth character from its character table to represent that chunk.
But this means it can't just encode a single byte or two bytes. A byte contains 8 bits, which means one chunk of 6 bits and 2 leftover bits. Two bytes have 16 bits: that's 2 chunks of 6 bits and 4 leftover bits.
To solve this problem, Base64 always encodes 3 consecutive bytes. If the input does not divide evenly by three, it adds additional zero bits.
Here is a little program that demonstrates the problem:
package testing;
import java.util.Base64;
public class SimpleTest {
public static void main(String[] args) {
// An array containing six bytes to encode and decode.
byte[] fullArray = { 0b01010101, (byte) 0b11110000, (byte)0b10101010, 0b00001111, (byte)0b11001100, 0b00110011 };
// The same array broken into three chunks of two bytes.
byte[][] threeTwoByteArrays = {
{ 0b01010101, (byte) 0b11110000 },
{ (byte)0b10101010, 0b00001111 },
{ (byte)0b11001100, 0b00110011 }
};
Base64.Encoder encoder = Base64.getEncoder().withoutPadding();
// Encode the full array
String encodedFullArray = encoder.encodeToString(fullArray);
// Encode the three chunks consecutively
StringBuilder encodedStringBuilder = new StringBuilder();
for ( byte [] twoByteArray : threeTwoByteArrays ) {
encodedStringBuilder.append(encoder.encodeToString(twoByteArray));
}
String encodedInChunks = encodedStringBuilder.toString();
System.out.println("Encoded full array: " + encodedFullArray);
System.out.println("Encoded in chunks of two bytes: " + encodedInChunks);
// Now decode the two resulting strings
Base64.Decoder decoder = Base64.getDecoder();
byte[] decodedFromFull = decoder.decode(encodedFullArray);
System.out.println("Byte array decoded from full: " + byteArrayBinaryString(decodedFromFull));
byte[] decodedFromChunked = decoder.decode(encodedInChunks);
System.out.println("Byte array decoded from chunks: " + byteArrayBinaryString(decodedFromChunked));
}
/**
* Convert a byte array to a string representation in binary
*/
public static String byteArrayBinaryString( byte[] bytes ) {
StringBuilder sb = new StringBuilder();
sb.append('[');
for ( byte b : bytes ) {
sb.append(Integer.toBinaryString(Byte.toUnsignedInt(b))).append(',');
}
if ( sb.length() > 1) {
sb.setCharAt(sb.length() - 1, ']');
} else {
sb.append(']');
}
return sb.toString();
}
}
So, imagine my 6-byte array is your image file. And imagine that your buffer is not reading 1024 bytes but 2 bytes each time. This is going to be the output of the encoding:
Encoded full array: VfCqD8wz
Encoded in chunks of two bytes: VfAqg8zDM
As you can see, the encoding of the full array gave us 8 characters. Each group of three bytes is converted into four chunks of 6 bits, which in turn are converted into four characters.
But the encoding of the three two-byte arrays gave you a string of 9 characters. It's a completely different string! Each group of two bytes was extended to three chunks of 6 bits by padding with zeros. And since you asked for no padding, it produces only 3 characters, without the extra = that usually marks when the number of bytes is not divisible by 3.
The output from the part of the program that decodes the 8-character, correct encoded string is fine:
Byte array decoded from full: [1010101,11110000,10101010,1111,11001100,110011]
But the result from attempting to decode the 9-character, incorrect encoded string is:
Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
at java.util.Base64$Decoder.decode0(Base64.java:734)
at java.util.Base64$Decoder.decode(Base64.java:526)
at java.util.Base64$Decoder.decode(Base64.java:549)
at testing.SimpleTest.main(SimpleTest.java:34)
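Not good! A Base64 string normally has a multiple of 4 characters. Without padding, the last group may be 2 or 3 characters long, but never just 1, and 9 characters leaves a last group of exactly 1.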
Since you chose a buffer size of 1024, which is not a multiple of 3, that problem will happen. You need to encode a multiple of 3 bytes each time to produce the proper string. So in fact, you need to create a buffer sized 3072 or something like that.
But because of the first problem, be very careful about what you pass to the encoder, because it can always happen that you read fewer than 3072 bytes. Then, if the number of bytes read is not divisible by three, the same problem will occur.
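One way to combine both fixes, sketched under the assumption that the input is available as an InputStream: keep reading until the buffer is completely full (or the stream ends), and use a buffer whose size is a multiple of 3, so that only the final chunk can need padding. The class ChunkedBase64 and its readFully helper are hypothetical names, and this sketch keeps padding enabled rather than calling withoutPadding().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper illustrating chunk-wise Base64 encoding.
class ChunkedBase64 {
    // Keeps reading until buf is full or the stream ends; returns bytes read.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break; // end of stream
            }
            total += r;
        }
        return total;
    }

    static String encode(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize % 3 != 0) {
            throw new IllegalArgumentException("chunkSize must be a positive multiple of 3");
        }
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = readFully(in, buf)) > 0) {
            // Every chunk except possibly the last holds exactly chunkSize bytes,
            // so only the last one can produce padding.
            sb.append(encoder.encodeToString(Arrays.copyOf(buf, n)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x55, (byte) 0xF0, (byte) 0xAA, 0x0F, (byte) 0xCC, 0x33};
        System.out.println(encode(new ByteArrayInputStream(data), 3)); // prints VfCqD8wz
    }
}
```

With the six bytes from the demonstration program and a chunk size of 3, the chunked result matches the encoding of the full array.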
Look at:
while ( ( r = imageInFile.read( buff)) > 0 ) {
String imageData = encodeImage(buff);
read returns -1 on end-of-file or the actual number of bytes that were read.
So the last buff might not be totally read, and even contain garbage from any prior read. So you need to use r.
As this is an assignment, the rest is up to you.
By the way:
byte[] array = new byte[1024]
is more conventional in Java. The syntax:
byte array[] = ...
was for compatibility with C/C++.

JAVA Md5 returning non-deterministic results

I have written following function to compute Md5 checksum in Java.
class Utils {
public static String md5Hash(String input) {
String result = "";
try {
System.out.println("Input=" + input);
final MessageDigest md = MessageDigest.getInstance("MD5");
md.reset();
md.update(input.getBytes());
result = md.digest().toString();
} catch (Exception ee) {
System.err.println("Error computing MD5 Hash");
}
return result;
}
};
Calling Utils.md5Hash("abcde") multiple times gives different results. My understanding says md5 returns a deterministic and unique checksum for a string. Is that wrong? Else please let me know the bug in my implementation. Thanks
The toString() method of a byte array doesn't return a meaningful string. It returns the type of the array object, followed by the hashCode of the array.
Transform the byte array to a String using Hex or Base64 encoding if you want it printed. Apache commons-codec has methods to do that.
Also, make sure to specify an encoding which supports any kind of character when transforming your string to a byte array. The method you're using uses the platform default encoding, which could fail if, for example, it's latin-1 and you're transforming non-latin-1 characters. UTF-8 is a good choice.
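A minimal sketch of both suggestions using only the standard library instead of commons-codec: the digest bytes are hex-encoded rather than printed with toString(), and the input is converted with an explicit UTF-8 charset. The class name Md5Hex is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical replacement for the Utils.md5Hash method from the question.
class Md5Hex {
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // Encode the bytes themselves; hash.toString() would only print
            // the array's type and identity hash code.
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Both lines print the same 32-character string.
        System.out.println(md5Hex("abcde"));
        System.out.println(md5Hex("abcde"));
    }
}
```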
I did it the following way:
public static String encryptedLoginPassword( String password )
{
String encryptedData="";
try{
MessageDigest algorithm = MessageDigest.getInstance("MD5");
byte[] defaultBytes = password.getBytes();
algorithm.reset();
algorithm.update(defaultBytes);
byte messageDigest[] = algorithm.digest();
StringBuffer hexString = new StringBuffer();
for (int i=0;i<messageDigest.length;i++) {
hexString.append(Integer.toHexString(0xFF & messageDigest[i]));
}
encryptedData=hexString.toString();
}catch(NoSuchAlgorithmException nsae){
}
return encryptedData;
}
In the code given by Dinup Kandel, I had to change this:
for (int i=0;i<messageDigest.length;i++) {
hexString.append(Integer.toHexString(0xFF & messageDigest[i]));
}
into
if ((0xff & messageDigest[i]) < 0x10) {
hexString.append("0"
+ Integer.toHexString((0xFF & messageDigest[i])));
} else {
hexString.append(Integer.toHexString(0xFF & messageDigest[i]));
}
to get my unit tests working.
Note: I used this to verify the correct answer:
echo -n MyTestString | md5sum
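The reason for the change is that Integer.toHexString does not emit a leading zero for byte values below 0x10, so the digest string comes out too short. A short sketch of the difference follows; the names are invented for illustration, and String.format("%02x", ...) is used as an equivalent to the manual check above.

```java
// Hypothetical demo class comparing the two hex conversions.
class HexDigits {
    // Mirrors the original loop: one digit for values below 0x10.
    static String withoutPadding(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(Integer.toHexString(0xFF & b));
        }
        return sb.toString();
    }

    // Always two digits per byte.
    static String withPadding(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x0a, (byte) 0xff, 0x00};
        System.out.println(withoutPadding(bytes)); // prints aff0
        System.out.println(withPadding(bytes));    // prints 0aff00
    }
}
```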

Strings are not equal? Testing out an md5 hashing thing I've been doing, and yes I am using .equals

Here is the code for my class:
public class Md5tester {
private String licenseMd5 = "?jZ2$??f???%?";
public Md5tester(){
System.out.println(isLicensed());
}
public static void main(String[] args){
new Md5tester();
}
public boolean isLicensed(){
File f = new File("C:\\Some\\Random\\Path\\toHash.txt");
if (!f.exists()) {
return false;
}
try {
BufferedReader read = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
//get line from txt
String line = read.readLine();
//output what line is
System.out.println("Line read: " + line);
//get utf-8 bytes from line
byte[] lineBytes = line.getBytes("UTF-8");
//declare messagedigest for hashing
MessageDigest md = MessageDigest.getInstance("MD5");
//hash the bytes of the line read
String hashed = new String(md.digest(lineBytes), "UTF-8");
System.out.println("Hashed as string: " + hashed);
System.out.println("LicenseMd5: " + licenseMd5);
System.out.println("Hashed as bytes: " + hashed.getBytes("UTF-8"));
System.out.println("LicenseMd5 as bytes: " + licenseMd5.getBytes("UTF-8"));
if (hashed.equalsIgnoreCase(licenseMd5)){
return true;
}
else{
return false;
}
} catch (FileNotFoundException e) {
return false;
} catch (IOException e) {
return false;
} catch (NoSuchAlgorithmException e) {
return false;
}
}
}
Here's the output I get:
Line read: Testing
Hashed as string: ?jZ2$??f???%?
LicenseMd5: ?jZ2$??f???%?
Hashed as bytes: [B@5fd1acd3
LicenseMd5 as bytes: [B@3ea981ca
false
I'm hoping someone can clear this up for me, because I have no clue what the issue is.
A byte[] returned by MD5 conversion is an arbitrary byte[], therefore you cannot treat it as a valid representation of String in some encoding.
In particular, ?s in ?jZ2$??f???%? correspond to bytes that cannot be represented in your output encoding. It means that content of your licenseMd5 is already damaged, therefore you cannot compare your MD5 hash with it.
If you want to represent your byte[] as String for further comparison, you need to choose a proper representation for arbitrary byte[]s. For example, you can use Base64 or hex strings.
You can convert byte[] into hex string as follows:
public static String toHex(byte[] in) {
StringBuilder out = new StringBuilder(in.length * 2);
for (byte b: in) {
out.append(String.format("%02X", (byte) b));
}
return out.toString();
}
Also note that byte[] uses the default implementation of toString(). Its result (such as [B@5fd1acd3) is not related to the content of the byte[], therefore it's meaningless in your case.
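To make the point concrete, here is a small sketch showing that arbitrary bytes do not survive a round trip through a UTF-8 String, while a hex string or a direct byte comparison works. The class HashCompare and the sample bytes are made up for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical demo class; the byte values are arbitrary sample data.
class HashCompare {
    static String toHex(byte[] in) {
        StringBuilder out = new StringBuilder(in.length * 2);
        for (byte b : in) {
            out.append(String.format("%02X", b));
        }
        return out.toString();
    }

    // True only if decoding and re-encoding as UTF-8 gives the same bytes back.
    static boolean survivesStringRoundTrip(byte[] raw) {
        byte[] back = new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF, 0x6A, 0x5A}; // 0xFF is never valid in UTF-8
        System.out.println(survivesStringRoundTrip(raw));          // prints false
        System.out.println(toHex(raw));                            // prints FF6A5A
        System.out.println(MessageDigest.isEqual(raw, raw.clone())); // prints true
    }
}
```

Storing the expected hash as a hex string and comparing it with toHex(md.digest(lineBytes)) avoids the lossy conversion entirely.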
The ? symbols in the printed representation of hashed aren't literal question marks, they're unprintable characters.
You get this error when your Java source file is not saved as UTF-8 while you build the string using UTF-8. Try removing the UTF-8 argument: the MD5 will output a different result, and if you copy that into the licenseMd5 string, the comparison returns true.
Another way is to set the file encoding to UTF-8; the encoded string will then also be different.

Converting part of a ByteBuffer to a String

I have a ByteBuffer containing bytes that were derived by String.getBytes(charsetName), where "containing" means that the string comprises the entire sequence of bytes between the ByteBuffer's position() and limit().
What's the best way for me to get the string back? (assuming I know the encoding charset) Is there anything better than the following (which seems a little clunky)
byte[] ba = new byte[bbuf.remaining()];
bbuf.get(ba);
try {
String s = new String(ba, charsetName);
}
catch (UnsupportedEncodingException e) {
/* take appropriate action */
}
String s = Charset.forName(charsetName).decode(bbuf).toString();
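A slightly fuller sketch of the one-liner above. Note that decode() advances the buffer's position, so this version decodes a duplicate to leave the caller's buffer untouched. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper around Charset.decode.
class BufferDecode {
    // Decodes the bytes between position() and limit() without moving them.
    static String decodeRemaining(ByteBuffer bbuf, Charset charset) {
        // duplicate() shares the content but has its own position and limit.
        return charset.decode(bbuf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer bbuf = ByteBuffer.wrap("xxhello".getBytes(StandardCharsets.UTF_8));
        bbuf.position(2); // skip the two leading bytes
        System.out.println(decodeRemaining(bbuf, StandardCharsets.UTF_8)); // prints hello
        System.out.println(bbuf.position()); // prints 2
    }
}
```

Unlike new String(ba, charsetName), this route throws no checked exception once a Charset object is in hand.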

Compressing strings for client/server transport in Java

I work with a proprietary client/server message format that restricts what I can send over the wire. I can't send a serialized object; I have to store the data in the message as a String. The data I am sending are large comma-separated values, and I want to compress the data before I pack it into the message as a String.
I attempted to use Deflater/Inflater to achieve this, but somewhere along the line I am getting stuck.
I am using the two methods below to deflate/inflate. However, passing the result of the compressString() method to decompressString() returns a null result.
public String compressString(String data) {
Deflater deflater = new Deflater();
byte[] target = new byte[100];
try {
deflater.setInput(data.getBytes(UTF8_CHARSET));
deflater.finish();
int deflateLength = deflater.deflate(target);
return new String(target);
} catch (UnsupportedEncodingException e) {
//TODO
}
return data;
}
public String decompressString(String data) {
String result = null;
try {
byte[] input = data.getBytes();
Inflater inflater = new Inflater();
int inputLength = input.length;
inflater.setInput(input, 0, inputLength);
byte[] output = new byte[100];
int resultLength = inflater.inflate(output);
inflater.end();
result = new String(output, 0, resultLength, UTF8_CHARSET);
} catch (DataFormatException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return result;
}
From what I can tell, your current approach is:
1. Convert String to byte array using getBytes("UTF-8").
2. Compress byte array
3. Convert compressed byte array to String using new String(bytes, ..., "UTF-8").
4. Transmit compressed string
5. Receive compressed string
6. Convert compressed string to byte array using getBytes("UTF-8").
7. Decompress byte array
8. Convert decompressed byte array to String using new String(bytes, ..., "UTF-8").
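The problem with this approach is in step 3. When you compress the byte array, you create a sequence of bytes which may no longer be valid UTF-8. Decoding it with new String(...) does not throw; it silently replaces the invalid sequences with the replacement character, so the compressed data is already corrupted before it is transmitted.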
The solution is to use a "bytes to characters" encoding scheme like Base64 to turn the compressed bytes into a transmissible string. In other words, replace step 3 with a call to a Base64 encode function, and step 6 with a call to a Base64 decode function.
Notes:
For small strings, compressing and encoding is likely to actually increase the size of the transmitted string.
If the compacted String is going to be incorporated into a URL, you may want to pick a different encoding to Base64 that avoids characters that need to be URL escaped.
Depending on the nature of the data you are transmitting, you may find that a domain specific compression works better than a generic one. Consider compressing the data before creating the comma-separated string. Consider alternatives to comma-separated strings.
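The suggested approach can be sketched as follows, assuming java.util.Base64 (Java 8+) is available. The class and method names are hypothetical, and the loops around deflate() and inflate() are there because a single fixed 100-byte buffer, as in the question, cannot hold larger inputs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compress to a transmissible String and back.
class CompressedText {
    static String compressToBase64(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        // Base64 turns arbitrary bytes into text that survives String handling.
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    static String decompressFromBase64(String encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getDecoder().decode(encoded));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            // Surface bad input instead of returning null.
            throw new IllegalArgumentException("Not valid compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "a,b,c,a,b,c,a,b,c,a,b,c";
        String wire = compressToBase64(input);
        System.out.println(input.equals(decompressFromBase64(wire))); // prints true
    }
}
```

If the string ends up in a URL, Base64.getUrlEncoder() and Base64.getUrlDecoder() could be swapped in.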
The problem is that you convert compressed bytes to a string, which breaks the data. Your compressString and decompressString should work on byte[]
EDIT: Here is revised version. It works
EDIT2: And about base64. you're sending bytes, not strings. You don't need base64.
public static void main(String[] args) {
String input = "Test input";
byte[] data = new byte[100];
int len = compressString(input, data, data.length);
String output = decompressString(data, len);
if (!input.equals(output)) {
System.out.println("Test failed");
}
System.out.println(input + " " + output);
}
public static int compressString(String data, byte[] output, int len) {
Deflater deflater = new Deflater();
deflater.setInput(data.getBytes(Charset.forName("utf-8")));
deflater.finish();
return deflater.deflate(output, 0, len);
}
public static String decompressString(byte[] input, int len) {
String result = null;
try {
Inflater inflater = new Inflater();
inflater.setInput(input, 0, len);
byte[] output = new byte[100]; // TODO: may overflow for larger inputs, find a better solution
int resultLength = inflater.inflate(output);
inflater.end();
result = new String(output, 0, resultLength, Charset.forName("utf-8"));
} catch (DataFormatException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return result;
}
In my opinion, writing a compression algorithm yourself is difficult, but converting binary data to a string is not. If I were you, I would serialize the object normally, compress it (for example with the zip support that ZipFile provides), and then convert the result to a string using something like Base64 encode/decode.
I actually have Base64 encode/decode functions. If you want, I can post them here.
If you have a piece of code which seems to be silently failing, perhaps you shouldn't catch and swallow Exceptions:
catch (UnsupportedEncodingException e) {
//TODO
}
But the real reason why decompress returns null is that your exception handling doesn't specify what to do with result when you catch an exception, so result is left as null. Are you checking the output to see if any exceptions are occurring?
If I run your decompress() on a badly formatted String, Inflater throws me this DataFormatException:
java.util.zip.DataFormatException: incorrect header check
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:223)
at java.util.zip.Inflater.inflate(Inflater.java:240)
Inflater/Deflater is not the right solution for compressing a string.
I think GZIPInputStream and GZIPOutputStream are the proper tools for compressing the string.
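A minimal sketch of the GZIP stream approach, with hypothetical class and method names. The result is still a byte array, so placing it in a String message would still need a text encoding such as Base64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper built on the java.util.zip GZIP streams.
class GzipText {
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read 'out' only afterwards.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String input = "a,b,c,a,b,c,a,b,c";
        System.out.println(input.equals(gunzip(gzip(input)))); // prints true
    }
}
```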
I was facing a similar issue, which was resolved by Base64-decoding the input.
That is, instead of
data.getBytes(UTF8_CHARSET)
I tried
Base64.decodeBase64(data)
and it worked.
