Original Python code:
import hashlib
return int(hashlib.md5("string").hexdigest(), 16) % 100
My attempt to translate into Java:
import java.security.*;
import java.math.*;
String s = "string";
MessageDigest m = MessageDigest.getInstance("MD5");
m.update(s.getBytes(), 0, s.length());
BigInteger i = BigInteger(1,m.digest());
return i % 100;
What am I doing wrong?
It looks like you haven't tried to compile this code, because it won't compile. BigInteger isn't a primitive type, so the % operator doesn't work on it; use the .mod method instead. The constructor call also needs the new keyword.
Your code, fixed:
String s = "string";
MessageDigest m = MessageDigest.getInstance("MD5");
m.update(s.getBytes(), 0, s.length());
BigInteger i = new BigInteger(1, m.digest());
return i.mod(BigInteger.valueOf(100));
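Since the update covers every byte (offset 0, length s.length()), you can pass the bytes straight to m.digest and make the code one line shorter. This also avoids using s.length() as a byte count, which is only correct when every character encodes to a single byte: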
String s = "string";
MessageDigest m = MessageDigest.getInstance("MD5");
BigInteger i = new BigInteger(1, m.digest(s.getBytes()));
return i.mod(BigInteger.valueOf(100));
And if you really want a one-liner... Warning, highly unreadable:
return new BigInteger(1, MessageDigest.getInstance("MD5").digest( s.getBytes()) ).mod(BigInteger.valueOf(100));
Suggestion
It's always recommended to specify the encoding of the input string while converting to byte stream.
Without specifying an encoding scheme, programming language implementations are free to choose their default format. This can vary across programming languages and even across different versions of the same programming language.
Python
An advantage of Python is its ability to handle arbitrarily large integer values.
import hashlib
data: str = "string"
print(int(hashlib.md5(data.encode('utf-8')).hexdigest(), 16) % 100)
Java
BigInteger is needed to handle values outside the Long.MIN_VALUE to Long.MAX_VALUE range; a 128-bit MD5 digest does not fit in a 64-bit long.
MessageDigest instances are not thread safe, so use them with care (new instances or a ThreadLocal) in multithreaded use cases.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.math.BigInteger;
public class TestMD5 {
    private static final String DIGEST_ALGO = "MD5";

    public static void main(String[] args) throws Exception {
        String data = "string";
        byte[] digest = MessageDigest.getInstance(DIGEST_ALGO).digest(data.getBytes(StandardCharsets.UTF_8));
        int value = new BigInteger(1, digest).mod(BigInteger.valueOf(100)).intValueExact();
        System.out.println(value);
    }
}
Both these solutions will print 81 for the given input string
Note
The implementation can print the same value for the given input string even without specifying the encoding. This should not be relied upon; it works only because this string is within the ASCII range.
Even for the Unicode character set, specifying the encoding is recommended, because the same code point is encoded differently by different schemes such as UTF-8 and UTF-16.
When converting a byte stream to a string, always specify the decoding scheme.
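The effect of the encoding on the digest can be sketched as follows. This is a minimal illustration, not part of the original answer: the class name EncodingDemo and its two helper methods are hypothetical, and the sample text "café" is an assumed input chosen because it contains one non-ASCII character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EncodingDemo {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Hypothetical helper: true if the MD5 digests of the two encodings are identical.
    static boolean sameMd5(String text, Charset first, Charset second) {
        try {
            byte[] a = MessageDigest.getInstance("MD5").digest(text.getBytes(first));
            byte[] b = MessageDigest.getInstance("MD5").digest(text.getBytes(second));
            return MessageDigest.isEqual(a, b);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ascii = "string";
        String accented = "caf\u00e9"; // "café", escaped so the source file stays ASCII-only

        // ASCII-only text encodes to the same bytes in both charsets.
        System.out.println(sameMd5(ascii, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));    // true
        // One non-ASCII character is enough to change the bytes and therefore the digest.
        System.out.println(sameMd5(accented, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)); // false

        System.out.println(encodedLength(accented, StandardCharsets.UTF_8));      // 5
        System.out.println(encodedLength(accented, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(encodedLength(accented, StandardCharsets.UTF_16BE));   // 8
    }
}
```

The Python side has the same property: the digest depends on the bytes produced by encode(), not on the abstract string.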
Related
I searched Google for this information, but the answers I found do not apply to my case.
I have an HEX string like the following:
hexString = '7d940ef9790c31334ac6f116814148b9abe73f32'
Python can convert this string to a binary value using the following function:
unhexlify('7d940ef9790c31334ac6f116814148b9abe73f32')
whose result is:
binString = '}\x94\x0e\xf9y\x0c13J\xc6\xf1\x16\x81AH\xb9\xab\xe7?2'
That is, a string containing the binary information of the original hex string.
I tried to use the .getBytes("encoding") method in java, but I am not able to reproduce this result, and unfortunately this result is critical for my application (I need exactly the same result).
I'm not an encodings pro, so it could easily be me overlooking something.
I need to convert to the same kind of string as "binString" a byte[] array resulting from e.g. a md5 digest, so any insight on how to convert a byte[] to such a string would be most appreciated.
It's not a solution, but it could be helpful:
import java.nio.charset.Charset;
import javax.xml.bind.DatatypeConverter;
public class Main {
    public static void main(String args[])
    {
        String hexString = "7d940ef9790c31334ac6f116814148b9abe73f32";
        byte[] out = toByteArray(hexString);
        String result = new String(out,Charset.forName("UTF-8"));
        System.out.println(result);
    }

    public static byte[] toByteArray(String s) {
        return DatatypeConverter.parseHexBinary(s);
    }
}
The output in my machine is:
}??y13J???AH????2
The ASCII characters print correctly, but the non-ASCII bytes (0x80 and above), which Python shows as \x escapes, come out as ?, because they do not form valid UTF-8 sequences.
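One way to get a Java String whose characters correspond one-to-one to the raw bytes, as Python's binString does, is to decode with ISO-8859-1 instead of UTF-8, since that charset maps every byte value 0x00-0xFF to the code point with the same number. The sketch below assumes that this is the kind of string the asker needs; the class name HexToBinaryString and both method names are hypothetical, and the hex parsing is hand-written to avoid depending on javax.xml.bind.

```java
import java.nio.charset.StandardCharsets;

public class HexToBinaryString {
    // Decode a hex string into raw bytes, two hex digits per byte
    // (the same job as Python's binascii.unhexlify).
    static byte[] unhexlify(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Map each byte to the char with the same numeric value (0x00-0xFF).
    static String toBinaryString(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = unhexlify("7d940ef9790c31334ac6f116814148b9abe73f32");
        String binString = toBinaryString(raw);
        System.out.println(binString.length());        // 20, one char per byte
        System.out.println((int) binString.charAt(1)); // 148, i.e. 0x94
    }
}
```

Printing such a string to a console can still show replacement characters, because the console applies its own encoding; the string's contents are what matter. If the data only needs to be carried around, keeping it as a byte[] avoids the question entirely.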
Given the following Java code:
import java.math.BigInteger;
import java.util.Base64;
...
String myvar64 = "AQAB"; // assume this is much longer..
byte[] myvarB = Base64.getDecoder().decode(myvar64);
BigInteger myvar = new BigInteger(1, myvarB);
how do you convert the code to python?
I'm assuming the Java code is written that way because Java doesn't have bigint literals (I don't know much Java). If so, I'm assuming a simple assignment is sufficient (perhaps with a code comment indicating the original source string):
myvar = 65537 # int("AQAB".decode('base64').encode('hex'), 16)
The expression in the comment is Python 2 syntax; in Python 3 the equivalent is int.from_bytes(base64.b64decode("AQAB"), "big").
I tried the following code:
import java.math.BigInteger;
import org.apache.commons.codec.binary.Base32;
import org.junit.Test;
public class Sandbox
{
    @Test
    public void testSomething() {
        String sInput = "GIYTINZUHAZTMNBX";
        BigInteger bb = new BigInteger(new Base32().decode(sInput));
        System.out.println("number = " + bb);
    }
}
and here's the output:
number = 237025977136523702055991
Using this website to convert to and from base 32, I get a different result than the actual output. Here's the result I expect to see, based on what I got from the website:
expected output = 2147483647
Any idea why this is happening?
Edit:
Forgive me for making it confusing by purposefully attempting to convert 2^31-1.
Using the conversion website I linked to earlier, I changed the input:
String sInput = "GE4DE===";
Expected output:
number = 182
Actual output:
number = 3225650
What you're doing is correct... assuming that the Base32 string comes from Base32-encoding a byte array you get from calling BigInteger.toByteArray().
BigInteger(byte[] val) does not really take an array of arbitrary bytes. It takes the two's-complement byte[] representation of a BigInteger, and it assumes the most significant byte is in val[0].
If it's a base-32 numeral (digits 0-9 and A-V), the X, Y, and Z shouldn't be there. Are you sure it isn't base-36?
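The numbers in the question are consistent with one particular reading: 3225650 is 0x313832, and 0x31, 0x38, 0x32 are the ASCII codes of the characters "182". The sketch below assumes that the website Base32-encoded the decimal text of the number rather than its binary value, so the decoded bytes have to be parsed as text. The class name BigIntegerFromBytes and its method names are hypothetical, and the byte array stands in for the result of new Base32().decode("GE4DE===") so that the example needs no external library.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class BigIntegerFromBytes {
    // Interpret the bytes as a big-endian two's-complement number (what BigInteger(byte[]) does).
    static BigInteger asBinary(byte[] bytes) {
        return new BigInteger(bytes);
    }

    // Interpret the bytes as a big-endian magnitude that is never negative.
    static BigInteger asUnsigned(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    // Interpret the bytes as ASCII decimal digits and parse the resulting text.
    static BigInteger asDecimalText(byte[] bytes) {
        return new BigInteger(new String(bytes, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // Assumed stand-in for the bytes obtained by Base32-decoding "GE4DE===".
        byte[] decoded = {0x31, 0x38, 0x32};

        System.out.println(asBinary(decoded));      // 3225650, the "actual output" in the question
        System.out.println(asDecimalText(decoded)); // 182, the "expected output"

        // The sign behaviour of the one-argument constructor:
        byte[] single = {(byte) 0xFF};
        System.out.println(asBinary(single));   // -1
        System.out.println(asUnsigned(single)); // 255
    }
}
```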
I am new to Java, but I am fluent in C++ and especially C#, and I know how to do XOR encryption in both. The problem is that the algorithm I wrote in Java to implement XOR encryption seems to produce wrong results: the output is usually a bunch of spaces, and I am sure that is wrong. Here is the class:
public final class Encrypter {
    public static String EncryptString(String input, String key)
    {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for(byte b : ibytes)
        {
            if (index == length)
            {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char)val;
            index++;
            index2++;
        }
        return new String(output);
    }

    public static String DecryptString(String input, String key)
    {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for(byte b : ibytes)
        {
            if (index == length)
            {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char)val;
            index++;
            index2++;
        }
        return new String(output);
    }
}
Strings in Java are Unicode - and Unicode strings are not general holders for bytes like ASCII strings can be.
You're taking a string and converting it to bytes without specifying what character encoding you want, so you're getting the platform default encoding - probably US-ASCII, UTF-8 or one of the Windows code pages.
Then you're performing arithmetic/logic operations on these bytes. (I haven't looked at what you're doing here - you say you know the algorithm.)
Finally, you're taking these transformed bytes and trying to turn them back into a string - that is, back into characters. Again, you haven't specified the character encoding (but you'll get the same as you got converting characters to bytes, so that's OK), but, most importantly...
Unless your platform default encoding uses a single byte per character (e.g. US-ASCII), then not all of the byte sequences you will generate represent valid characters.
So, two pieces of advice come from this:
1. Don't use strings as general holders for bytes.
2. Always specify a character encoding when converting between bytes and characters.
In this case, you might have more success if you specifically give US-ASCII as the encoding. EDIT: This last sentence is not true (see comments below). Refer back to point 1 above! Use bytes, not characters, when you want bytes.
If you use non-ASCII strings as keys you'll get pretty strange results. The bytes in the kbytes array will be negative. Sign extension then means that val will come out negative. The cast to char will then produce a character in the U+FF80 to U+FFFF range.
These characters will certainly not be printable, and depending on what you use to check the output you may be shown "box" or some other replacement characters.
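The advice in the answers above (keep the data as bytes, and name the encoding explicitly) can be sketched as follows. This is not the asker's class rewritten line by line: the class name XorBytes and its methods are hypothetical, and Base64 is an assumed choice for turning the XORed bytes into something that can be stored or printed as text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class XorBytes {
    // XOR every data byte with the key, repeating the key as needed.
    // Applying the same key twice restores the original bytes.
    static byte[] xor(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    // Text -> UTF-8 bytes -> XOR -> Base64 text that is safe to print or store.
    static String encrypt(String plain, String key) {
        byte[] cipher = xor(plain.getBytes(StandardCharsets.UTF_8),
                            key.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(cipher);
    }

    // Base64 text -> bytes -> XOR -> UTF-8 text.
    static String decrypt(String base64Cipher, String key) {
        byte[] cipher = Base64.getDecoder().decode(base64Cipher);
        byte[] plain = xor(cipher, key.getBytes(StandardCharsets.UTF_8));
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "secret";
        String encrypted = encrypt("hello world", key);
        System.out.println(encrypted);
        System.out.println(decrypt(encrypted, key)); // hello world
    }
}
```

The XORed bytes never pass through a String directly, so it does not matter whether they happen to form valid characters in any encoding.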
PHP code:
echo hash('sha256', 'jake');
PHP output:
cdf30c6b345276278bedc7bcedd9d5582f5b8e0c1dd858f46ef4ea231f92731d
Java code:
String s = "jake";
MessageDigest md = MessageDigest.getInstance("SHA-256");
md.update(s.getBytes(Charset.forName("UTF-8")));
byte[] hashed = md.digest();
String s2 = "";
for (byte b : hashed) {
s2 += b;
}
System.out.println(s2);
Java output:
-51-1312107528211839-117-19-57-68-19-39-43884791-1141229-4088-12110-12-223531-11011529
I had expected the two to return the same result. Obviously, this is not the case. How can I get the two to match up or is it impossible?
EDIT: I had made a mistake, think I have the answer to the question now anyway.
Well, the very first thing you need to do is use a consistent string encoding. I've no idea what PHP will do, but "jake".getBytes() will use whatever your platform default encoding is for Java. That's a really bad idea. Using UTF-8 would probably be a good start, assuming that PHP copes with Unicode strings to start with. (If it doesn't, you'll need to work out what it is doing and try to make the two consistent.) In Java, use the overload of String.getBytes() which takes a Charset or the one which takes the name of a Charset. (Personally I like to use Guava's Charsets.UTF_8.)
Then persuade PHP to use UTF-8 as well.
Then output the Java result in hex. I very much doubt that the code you've given is the actual code you're running, as otherwise I'd expect output such as "[B@e48e1b". Whatever you're doing to convert the byte array into a string, change it to use hex.
They are printing the same value. Convert your byte[] to a hex string and you'll see CDF30C6B345276278BEDC7BCEDD9D5582F5B8E0C1DD858F46EF4EA231F92731D as the Java output, too:
public void testSomething() throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    md.update("jake".getBytes());
    System.out.println(getHex(md.digest()));
}

static final String HEXES = "0123456789ABCDEF";

public static String getHex( byte [] raw ) {
    if ( raw == null ) {
        return null;
    }
    final StringBuilder hex = new StringBuilder( 2 * raw.length );
    for ( final byte b : raw ) {
        hex.append(HEXES.charAt((b & 0xF0) >> 4))
           .append(HEXES.charAt((b & 0x0F)));
    }
    return hex.toString();
}
You need to convert the digest to a HEX string before printing it out. Example code can be found here.
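A compact version of the hex conversion, producing lowercase output in the same form as PHP's hash(), might look like the sketch below. The class name Sha256Hex and its method names are hypothetical, and UTF-8 is an assumed choice of input encoding; the expected value in the comment is the PHP output quoted in the question.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Hex {
    // Format each byte as two lowercase hex digits.
    // The mask turns the signed byte into an int in the range 0-255.
    static String toHex(byte[] raw) {
        StringBuilder hex = new StringBuilder(2 * raw.length);
        for (byte b : raw) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // SHA-256 of the UTF-8 bytes of the text, as a hex string.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return toHex(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // cdf30c6b345276278bedc7bcedd9d5582f5b8e0c1dd858f46ef4ea231f92731d
        System.out.println(sha256Hex("jake"));
    }
}
```

The question's loop, s2 += b, appends each byte as a signed decimal number, which is why its output starts with -51-13: those are the bytes 0xcd and 0xf3 printed in decimal.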