PHP code:
echo hash('sha256', 'jake');
PHP output:
cdf30c6b345276278bedc7bcedd9d5582f5b8e0c1dd858f46ef4ea231f92731d
Java code:
String s = "jake";
MessageDigest md = MessageDigest.getInstance("SHA-256");
md.update(s.getBytes(Charset.forName("UTF-8")));
byte[] hashed = md.digest();
String s2 = "";
for (byte b : hashed) {
s2 += b;
}
System.out.println(s2);
Java output:
-51-1312107528211839-117-19-57-68-19-39-43884791-1141229-4088-12110-12-223531-11011529
I had expected the two to return the same result. Obviously, this is not the case. How can I get the two to match up or is it impossible?
EDIT: I had made a mistake; I think I have the answer to the question now anyway.
Well, the very first thing you need to do is use a consistent string encoding. I've no idea what PHP will do, but "jake".getBytes() will use whatever your platform's default encoding is for Java. That's a really bad idea. Using UTF-8 would probably be a good start, assuming that PHP copes with Unicode strings to start with. (If it doesn't, you'll need to work out what it is doing and try to make the two consistent.) In Java, use the overload of String.getBytes() which takes a Charset or the one which takes the name of a Charset. (Personally I like to use Guava's Charsets.UTF_8; since Java 7 the JDK's own StandardCharsets.UTF_8 does the same job.)
Then persuade PHP to use UTF-8 as well.
Then output the Java result in hex. I very much doubt that the code you've given is the actual code you're running, as otherwise I'd expect output such as "[B@e48e1b". Whatever you're doing to convert the byte array into a string, change it to use hex.
They are printing the same value. Convert your byte[] to a hex string and you'll see CDF30C6B345276278BEDC7BCEDD9D5582F5B8E0C1DD858F46EF4EA231F92731D as the Java output, too:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Sha256HexTest {
    static final String HEXES = "0123456789ABCDEF";

    public void testSomething() throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // explicit UTF-8 rather than the platform default charset
        md.update("jake".getBytes(StandardCharsets.UTF_8));
        System.out.println(getHex(md.digest()));
    }

    // Two hex digits per byte: high nibble first, then low nibble
    public static String getHex(byte[] raw) {
        if (raw == null) {
            return null;
        }
        final StringBuilder hex = new StringBuilder(2 * raw.length);
        for (final byte b : raw) {
            hex.append(HEXES.charAt((b & 0xF0) >> 4))
               .append(HEXES.charAt(b & 0x0F));
        }
        return hex.toString();
    }
}
You need to convert the digest to a hex string before printing it out.
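For reference, a self-contained sketch of the whole round trip, assuming Java 7+ for StandardCharsets. It emits lowercase hex so the result can be compared directly with the output of PHP's hash('sha256', ...); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PhpStyleSha256 {
    // Lowercase hex of the SHA-256 digest of the UTF-8 bytes of s,
    // intended to match PHP's hash('sha256', $s) for UTF-8 input
    public static String sha256Hex(String s) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java SE platform
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            // & 0xFF turns the signed byte into 0..255 before formatting
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("jake"));
    }
}
```

On Java 17 and later, java.util.HexFormat.of().formatHex(digest) produces the same lowercase hex without the explicit loop.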
Related
I am trying to translate a PHP encoding function into an Android Java method. Because Java's string length function handles UTF-8 strings differently, I have failed to make the translated Java code consistent with the PHP code when converting the second, UTF-8 string str2. The first, ASCII-only string does work.
The original PHP code is:
function myhash_php($string,$key) {
$strLen = strlen($string);
$keyLen = strlen($key);
$j=0 ; $hash = "" ;
for ($i = 0; $i < $strLen; $i++) {
$ordStr = ord(substr($string,$i,1));
if ($j == $keyLen) { $j = 0; }
$ordKey = ord(substr($key,$j,1));
$j++;
$hash .= strrev(base_convert(dechex($ordStr + $ordKey),16,36));
}
return $hash;
}
$str1 = "good friend" ;
$str2 = "好友" ; // strlen($str2) == 6
$key = "iuyhjf476" ;
echo "php encode str1 '". $str1 ."'=".myhash_php($str1, $key)."<br>";
echo "php encode str2 '". $str2 ."'=".myhash_php($str2, $key)."<br>";
PHP output:
php encode str1 'good friend'=s5c6g6o5u3o5m4g4b4z516
php encode str2 '好友'=a9u7m899x6p6
The current translated Java code, which produces the wrong result:
public static String hash_java(String string, String key) {
//Integer strLen = byteLenUTF8(string) ; // consistent with php strlen("好友")==6
//Integer keyLen = byteLenUTF8(key) ; // byteLenUTF8("好友") == 6
Integer strLen = string.length() ; // "好友".length() == 2
Integer keyLen = key.length() ;
int j=0 ;
String hash = "" ;
int ordStr, ordKey ;
for (int i = 0; i < strLen; i++) {
ordStr = ord_java(string.substring(i,i+1)); //string is String, php substr($string,$i,$n) == java string.substring(i, i+n)
// ordStr = ord_java(string[i]); //string is byte[], php substr($string,$i,$n) == java string.substring(i, i+n)
if (j == keyLen) { j = 0; }
ordKey = ord_java(key.substring(j,j+1));
j++;
hash += strrev(base_convert(dechex(ordStr + ordKey),16,36));
}
return hash;
}
// return the ASCII code of the first character of str
public static int ord_java( String str){
return( (int) str.charAt(0) ) ;
}
public static String dechex(int input ) {
String hex = Integer.toHexString(input ) ;
return hex ;
}
public static String strrev(String str){
return new StringBuilder(str).reverse().toString() ;
}
public static String base_convert(String str, int fromBase, int toBase) {
return Integer.toString(Integer.parseInt(str, fromBase), toBase);
}
String str1 = "good friend" ;
String str2 = "好友" ;
String key = "iuyhjf476" ;
Log.d(LogTag,"java encode str1 '"+ str1 +"'="+hash_java(str1, key)) ;
Log.d(LogTag,"java encode str2 '"+ str2 +"'="+hash_java(str2, key)) ;
Java output:
java encode str1 'good friend'=s5c6g6o5u3o5m4g4b4z516
java encode str2 '好友'=arh4ng
The encoded output of UTF-8 str2 in Java method is not correct. How to fix the problem?
Do not use string literals for testing: this is prone to yield unexpected results unless you are fully aware of what you are doing and how the source file is encoded. For UTF-8 you should treat everything as raw bytes and never use a String for en/decoding. Example in PHP:
$test1 = pack( 'H*', '414243' ); // "ABC" in hexadecimal: 2 digits per byte
$test2 = pack( 'H*', 'e5a5bde58f8b' ); // "好友" in hexadecimal, UTF-8 encoded, 3 bytes per character
Example in Java:
byte[] test1 = new byte[] { 0x41, 0x42, 0x43 }; // "ABC"
byte[] test2 = new byte[] { (byte)0xe5, (byte)0xa5, (byte)0xbd, (byte)0xe5, (byte)0x8f, (byte)0x8b }; // "好友"
Only this way can you make sure your test is set up correctly and independent of how the source file is encoded. If your Java file is encoded in UTF-8 and your PHP file in UTF-16LE, you'd fail even worse, simply because you haven't yet separated definition (raw bytes) from assumption (strings based on the text encoding).
(This is also a big misunderstanding when people want to en/decrypt texts: they operate on (any programming language's) String rather than the actual bytes and then wonder why different results occur with a different programming language.)
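A small Java sketch of that advice: the expected UTF-8 bytes are defined explicitly, and the string is written with Unicode escapes, so the check does not depend on how the source file is saved. The class and member names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RawBytesCheck {
    // The two characters from the question, written as Unicode escapes
    public static final String TEXT = "\u597d\u53cb";

    // The same characters defined as raw UTF-8 bytes, 3 bytes per character
    public static final byte[] EXPECTED = {
        (byte) 0xe5, (byte) 0xa5, (byte) 0xbd, (byte) 0xe5, (byte) 0x8f, (byte) 0x8b
    };

    // True if encoding TEXT as UTF-8 yields exactly the expected bytes
    public static boolean matches() {
        return Arrays.equals(TEXT.getBytes(StandardCharsets.UTF_8), EXPECTED);
    }

    public static void main(String[] args) {
        System.out.println(matches()); // true
    }
}
```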
In Java, convert the string to a byte array, using UTF-8 character encoding. Then, apply your encoding algorithm to this byte array instead of the string.
Your PHP program seems to implicitly do the same thing, to treat e.g. the character 好 as three individual byte values, according to UTF-8 encoding.
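A minimal sketch of that approach, assuming the PHP side receives UTF-8 and the key is non-empty. The class and method names (PhpHashPort, hashPhp) are hypothetical; the dechex followed by base_convert(..., 16, 36) pair is collapsed into a single Integer.toString(sum, 36), which yields the same lowercase base-36 digits for these non-negative sums.

```java
import java.nio.charset.StandardCharsets;

public class PhpHashPort {
    // Port of myhash_php that walks the UTF-8 bytes, the way PHP's strlen/substr/ord do.
    // Assumes a non-empty key.
    public static String hashPhp(String string, String key) {
        byte[] s = string.getBytes(StandardCharsets.UTF_8);
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        StringBuilder hash = new StringBuilder();
        int j = 0;
        for (int i = 0; i < s.length; i++) {
            if (j == k.length) {
                j = 0;
            }
            // & 0xFF: Java bytes are signed, PHP's ord() returns 0..255
            int ordStr = s[i] & 0xFF;
            int ordKey = k[j] & 0xFF;
            j++;
            // dechex + base_convert(hex, 16, 36) amounts to writing the sum in base 36
            String base36 = Integer.toString(ordStr + ordKey, 36);
            hash.append(new StringBuilder(base36).reverse().toString()); // strrev
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        String key = "iuyhjf476";
        System.out.println(hashPhp("good friend", key));
        System.out.println(hashPhp("\u597d\u53cb", key)); // the two characters from the question
    }
}
```

Worked through by hand for the values in the question, this produces s5c6g6o5u3o5m4g4b4z516 and a9u7m899x6p6, the same as the PHP output.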
EDIT:
In the comments, you say you receive the string from the user entering it on Android. So, you start with a Java String coming from some UI widget.
And you need that Java String to give the same result that the given PHP function produces when fed the same UTF-8 string. The resulting string only uses ASCII characters, so its character encoding is less problematic (it doesn't matter whether it's e.g. ISO-8859-1 or UTF-8).
The PHP string datatype is ignorant of encoding and just stores a sequence of bytes, so in general it might contain ISO-8859-1 bytes, where one byte represents one character, or UTF-8 byte sequences, where characters often occupy multiple bytes, or any other encoding. The PHP string does not know how its bytes are meant to be interpreted as characters; it just sees and counts bytes.
So what your PHP function treats as "characters" are effectively the bytes of the UTF-8 encoding, and the Java side must emulate this behaviour in its algorithm.
Java has a String data type very different from PHP, not based on byte sequences, but (mainly) seeing a string as a sequence of characters. So, if you work with the characters of the Java String, you'll not see the same sequence of elements that PHP sees.
When Java iterates over a String like "好友", there are two steps, one for each of the two characters (seeing the character's Unicode code point number), while PHP has six steps, one for each byte of the UTF-8 representation, seeing the byte value.
So, to emulate PHP, in Java you have to convert the String to a byte[] array using UTF-8 encoding. This way, one Java byte will correspond to one PHP character.
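The two iteration models can be compared directly. A minimal sketch (hypothetical names), assuming the PHP side receives UTF-8, that lists the value PHP's ord() sees at each of its steps:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PhpView {
    // The values PHP's ord() sees when it walks a UTF-8 string byte by byte
    public static int[] phpOrds(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        int[] ords = new int[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            ords[i] = utf8[i] & 0xFF; // Java bytes are signed; PHP's ord() returns 0..255
        }
        return ords;
    }

    public static void main(String[] args) {
        String s = "\u597d\u53cb"; // the two characters from the question
        System.out.println(s.length());                      // 2 steps over the Java String
        System.out.println(Arrays.toString(phpOrds(s)));     // [229, 165, 189, 229, 143, 139]
    }
}
```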
Remark
By the way, the wording "UTF-8 string" does not make sense in Java.
That is different from PHP where e.g. "Maß" as ISO-8859-1 string (having a length of 3) differs from "Maß" as UTF-8 string (having a length of 4).
In Java, Strings are sequences of characters, and that's the reason why e.g. "好友" has a length of 2, as it's just two characters that happen to come from a non-Latin script. [This is true for most Unicode characters you'll typically encounter, but there are exceptions.] In Java, terms like UTF-8 matter only when converting between strings and byte sequences.
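To see both the "Maß" example and where the exceptions come from, a short sketch comparing String.length(), the code point count and the encoded byte counts; the class and method names are hypothetical. Characters outside the Basic Multilingual Plane, such as the emoji U+1F600, are stored as two chars (a surrogate pair), so length() reports 2 for a single character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class JavaLengths {
    // {UTF-16 code units (String.length()), Unicode code points, UTF-8 bytes}
    public static int[] counts(String s) {
        return new int[] {
            s.length(),
            s.codePointCount(0, s.length()),
            s.getBytes(StandardCharsets.UTF_8).length
        };
    }

    public static void main(String[] args) {
        String mass = "Ma\u00df"; // the word from the text, with sharp s (U+00DF)
        System.out.println(Arrays.toString(counts(mass)));                     // [3, 3, 4]
        System.out.println(mass.getBytes(StandardCharsets.ISO_8859_1).length); // 3
        System.out.println(Arrays.toString(counts("\u597d\u53cb")));           // [2, 2, 6]
        // U+1F600 lies outside the Basic Multilingual Plane: one code point, two chars
        System.out.println(Arrays.toString(counts("\uD83D\uDE00")));           // [2, 1, 4]
    }
}
```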
I searched Google for this, but the answers I found do not apply to my case.
I have a hex string like the following:
hexString = '7d940ef9790c31334ac6f116814148b9abe73f32'
Python can convert this string to a binary value using the following function:
unhexlify('7d940ef9790c31334ac6f116814148b9abe73f32')
whose result is:
binString = '}\x94\x0e\xf9y\x0c13J\xc6\xf1\x16\x81AH\xb9\xab\xe7?2'
That is, a string containing the binary information of the original hex string.
I tried to use the .getBytes("encoding") method in Java, but I am not able to reproduce this result, and unfortunately this result is critical for my application (I need exactly the same result).
I'm not an encodings pro, so it could easily be me overlooking something.
I need to convert a byte[] array, e.g. the result of an MD5 digest, into the same kind of string as binString, so any insight on how to convert a byte[] to such a string would be most appreciated.
It's not a solution, but it could be helpful:
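import java.nio.charset.Charset;
// javax.xml.bind was removed from the JDK in Java 11; newer JDKs need the JAXB API as a separate dependency
import javax.xml.bind.DatatypeConverter;

public class Main {
    public static void main(String[] args)
    {
        String hexString = "7d940ef9790c31334ac6f116814148b9abe73f32";
        byte[] out = toByteArray(hexString);
        // Decoding arbitrary bytes as UTF-8 replaces invalid sequences with U+FFFD, so some bytes are lost
        String result = new String(out, Charset.forName("UTF-8"));
        System.out.println(result);
    }

    public static byte[] toByteArray(String s) {
        return DatatypeConverter.parseHexBinary(s);
    }
}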
The output on my machine is:
}??y13J???AH????2
It prints all the ASCII characters, but the bytes that Python shows as \x escapes come out as ?, because bytes that are not valid UTF-8 are replaced during decoding.
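A possible sketch, assuming what is needed is Python 2's byte-string semantics (one element per byte). In Java the closest equivalent is the byte[] itself; if a String with exactly one char per byte is required, decoding with ISO-8859-1 maps each byte 0x00-0xFF to the char with the same value, so nothing is lost. The helper names are hypothetical, and pythonStyle only approximates Python's repr (it does not special-case newline, tab, quotes or backslash).

```java
import java.nio.charset.StandardCharsets;

public class HexToBinaryString {
    // Two hex digits per byte, like Python's binascii.unhexlify
    public static byte[] unhexlify(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character near index " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // ISO-8859-1 maps byte 0xNN to the char U+00NN, so the String has one char per byte
    // and converts back with getBytes(ISO_8859_1) without loss
    public static String bytesToLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Rough imitation of Python 2's repr of a byte string: printable ASCII as-is,
    // everything else as \xNN
    public static String pythonStyle(String latin1) {
        StringBuilder sb = new StringBuilder();
        for (char c : latin1.toCharArray()) {
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\x%02x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = unhexlify("7d940ef9790c31334ac6f116814148b9abe73f32");
        System.out.println(pythonStyle(bytesToLatin1(raw)));
    }
}
```

For the hex string in the question this prints }\x94\x0e\xf9y\x0c13J\xc6\xf1\x16\x81AH\xb9\xab\xe7?2, the same text Python shows. An MD5 digest byte[] can be passed to bytesToLatin1 in the same way.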
Original Python code:
import hashlib
return int(hashlib.md5("string").hexdigest(), 16) % 100
My attempt to translate into Java:
import java.security.*;
import java.math.*;
String s = "string";
MessageDigest m = MessageDigest.getInstance("MD5");
m.update(s.getBytes(), 0, s.length());
BigInteger i = BigInteger(1,m.digest());
return i % 100;
What am I doing wrong?
It looks like you haven't tried to compile this code, because it doesn't compile: BigInteger isn't a primitive type, so the % operator doesn't work on it, and the constructor call is missing new. You should use the .mod method instead.
Your code, fixed:
String s = "string";
MessageDigest m = MessageDigest.getInstance("MD5");
m.update(s.getBytes(), 0, s.length()); // s.length() counts chars, not bytes; the two only agree for single-byte (e.g. ASCII) text
BigInteger i = new BigInteger(1, m.digest());
return i.mod(BigInteger.valueOf(100));
And since you're updating bytes 0 to s.length(), i.e. the whole array for this ASCII input, you can just pass all the bytes to m.digest and make it one line shorter:
String s = "string";
MessageDigest m = MessageDigest.getInstance("MD5");
BigInteger i = new BigInteger(1, m.digest(s.getBytes()));
return i.mod(BigInteger.valueOf(100));
And if you really want a one-liner... Warning, highly unreadable:
return new BigInteger(1, MessageDigest.getInstance("MD5").digest( s.getBytes()) ).mod(BigInteger.valueOf(100));
Suggestion
It's always recommended to specify the encoding of the input string when converting it to a byte stream.
Without specifying an encoding scheme, programming language implementations are free to choose their default format. This can vary across programming languages and even across different versions of the same programming language.
Python
An advantage of Python is its ability to handle arbitrarily large integer values:
import hashlib
data: str = "string"
print(int(hashlib.md5(data.encode('utf-8')).hexdigest(), 16) % 100)
Java
BigInteger is needed because the 128-bit MD5 value falls outside the Long.MIN_VALUE to Long.MAX_VALUE range.
MessageDigest instances are not thread-safe, so use them with care (new instances or a ThreadLocal) in multithreaded use cases.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.math.BigInteger;
public class TestMD5 {
private static final String DIGEST_ALGO = "MD5";
public static void main(String[] args) throws Exception {
String data = "string";
byte[] digest = MessageDigest.getInstance(DIGEST_ALGO).digest(data.getBytes(StandardCharsets.UTF_8));
int value = new BigInteger(1, digest).mod(BigInteger.valueOf(100)).intValueExact();
System.out.println(value);
}
}
Both of these solutions print 81 for the given input string.
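For the multithreaded case mentioned above, one option is a ThreadLocal<MessageDigest>, sketched here assuming Java 8+ (for ThreadLocal.withInitial and lambdas). The class and method names are hypothetical; MD5 is one of the algorithms every Java SE platform is required to support, so the catch block should never run.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Bucket {
    // One MessageDigest per thread, created lazily on first use in that thread
    private static final ThreadLocal<MessageDigest> MD5 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // every Java SE platform must provide MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    });

    // MD5 of the UTF-8 bytes, read as an unsigned number, modulo buckets
    public static int bucket(String data, int buckets) {
        // digest(byte[]) resets the instance afterwards, so the thread can reuse it
        byte[] digest = MD5.get().digest(data.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest).mod(BigInteger.valueOf(buckets)).intValue();
    }

    public static void main(String[] args) {
        System.out.println(bucket("string", 100));
    }
}
```

Called as bucket("string", 100), it should return the same 81 as the two solutions above.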
Note
The implementation may print the same value for this input string even without specifying the encoding, but only because the string is within the ASCII range; that should not be relied upon.
Even for Unicode text, it is recommended to specify the encoding, as the same code point can be encoded differently by different schemes such as UTF-8 and UTF-16.
When converting a byte stream back to a string, always specify the decoding scheme as well.
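To make the point concrete, a sketch that dumps the bytes of the same one-character string é (U+00E9) in three encodings; hashing these byte arrays would give three different digests. The names are hypothetical, and UTF_16BE is used instead of UTF_16 so that no byte-order mark is added.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameTextDifferentBytes {
    // Space-separated lowercase hex of the bytes of s in the given charset
    public static String hexBytes(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // e with acute accent
        System.out.println(hexBytes(e, StandardCharsets.UTF_8));      // c3 a9
        System.out.println(hexBytes(e, StandardCharsets.UTF_16BE));   // 00 e9
        System.out.println(hexBytes(e, StandardCharsets.ISO_8859_1)); // e9
    }
}
```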
I have an operation where I need to append a byte such as 0x10 to a String in Java. How could I go about doing this?
For example:
String someString = "HELLO WORLD";
byte someByte = 0x10;
In this example, how would I go about appending someByte to someString?
I'm asking because the application I am developing is supposed to send commands to a server. The server accepts base64-encoded commands, decodes them, and parses out bytes that are not necessarily compatible with any ASCII-based encoding in order to perform some special function.
If you want to concatenate the numeric value of a byte to a String, use the Byte wrapper's toString() method (for 0x10 this appends the decimal text "16"), like this:
String someString = "STRING";
byte someByte = 0x10;
someString += Byte.toString(someByte);
If you want the byte to appear in the String as an ASCII character, try this:
public static void main(String[] args) {
String a = "bla";
byte x = 0x21; // Ascii code for '!'
a += (char)x;
System.out.println(a); // Will print out 'bla!'
}
If you want to convert the byte value into its hex representation as a String, take a look at Integer.toHexString (mask negative bytes with & 0xFF first, otherwise they are sign-extended to values like ffffffe9).
If you just want to extend a String literal, then use this one:
System.out.println("Hello World\u0010");
otherwise:
String s1 = "Hello World";
String s2 = s1 + '\u0010';
And no, characters are not bytes, and vice versa. But here the approximation is close enough :-)
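Since the server in the question decodes base64 commands, it may be simpler never to put the control byte into a String at all: build the raw command as a byte[] and base64-encode that. A sketch assuming Java 8+ (java.util.Base64) and ASCII command text; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CommandBuilder {
    // Raw command bytes: ASCII text followed by one control byte
    public static byte[] buildCommand(String text, byte controlByte) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // non-ASCII characters would be replaced by '?' here
        byte[] textBytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(textBytes, 0, textBytes.length);
        out.write(controlByte); // write(int) keeps the low 8 bits, so the byte value is preserved
        return out.toByteArray();
    }

    // Base64 text ready to send to the server
    public static String encodeCommand(String text, byte controlByte) {
        return Base64.getEncoder().encodeToString(buildCommand(text, controlByte));
    }

    public static void main(String[] args) {
        System.out.println(encodeCommand("HELLO WORLD", (byte) 0x10));
    }
}
```

For "HELLO WORLD" followed by 0x10 this yields SEVMTE8gV09STEQQ.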
This is just for error-checking my code, but I would like to convert a single byte from a byte array to a string. Does anyone know how to do this? This is what I have so far:
recBuf = read( 5 );
Log.i( TAG, (String)recBuf[0] );
But of course this doesn't work.
I have googled around a bit but have only found ways to convert an entire byte[] array to a string...
new String( recBuf );
I know I could just do that, and then sift through the string, but it would make my task easier if I knew how to operate this way.
You can make a new byte array with a single byte:
new String(new byte[] { recBuf[0] })
Use the toString method of Byte:
String s = Byte.toString(recBuf[0]);
Try the above; it works.
Example:
byte b=14;
String s=Byte.toString(b );
System.out.println("String value="+ s);
Output:
String value=14
There's a String constructor of the form String(byte[] bytes, int offset, int length). You can always use that for your conversion.
So, for example:
byte[] bite = new byte[]{65,67,68};
for(int index = 0; index < bite.length; index++)
System.out.println(new String(bite, index,1));
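What about converting it to a char? For example:
String.valueOf((char) (buffer[0] & 0xFF))
(Simply writing new String(buffer[0]) would not compile: String has no constructor that takes a single byte.)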
public static String toString (byte value)
Since: API Level 1
Returns a string containing a concise, human-readable description of the specified byte value.
Parameters
value the byte to convert to a string.
Returns
a printable representation of value.
This is how you can convert a single byte to a string; adapt the code to your requirements.
Edit:
How about:
String s1 = "" + recBuf[0]; // hacky, but works: string concatenation gives the decimal value
String s2 = ((Byte) recBuf[0]).toString();
Both produce the same decimal string as Byte.toString(recBuf[0]).
Another alternative is converting the byte to a char and then to a String (the & 0xFF keeps bytes above 0x7F from being sign-extended):
Log.i(TAG, Character.toString((char) (recBuf[0] & 0xFF)));
Or
Log.i(TAG, String.valueOf((char) (recBuf[0] & 0xFF)));
You're assuming a single-byte character encoding (like ASCII), and that would be wrong for many other encodings.
But under that assumption you might just as well use a simple cast to char, like
char yourChar = (char) (yourByte & 0xFF); // mask first: a plain (char) cast sign-extends bytes >= 0x80 to U+FFxx
or, if you really need a String:
String string = String.valueOf((char) (yourByte & 0xFF));
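To compare the options above side by side, a small sketch with hypothetical class and method names. Note that Byte.toString is signed, so bytes above 0x7F come out negative, while decoding one byte with ISO-8859-1 yields the character with the same numeric value.

```java
import java.nio.charset.StandardCharsets;

public class SingleByteToString {
    // Signed decimal, as Byte.toString does: (byte) 0xE9 -> "-23"
    public static String asDecimal(byte b) {
        return Byte.toString(b);
    }

    // Unsigned decimal: (byte) 0xE9 -> "233"
    public static String asUnsignedDecimal(byte b) {
        return Integer.toString(b & 0xFF);
    }

    // Two uppercase hex digits: (byte) 0xE9 -> "E9"
    public static String asHex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    // One character, decoding the byte at index as ISO-8859-1: 0x41 -> "A", 0xE9 -> U+00E9
    public static String asLatin1Char(byte[] buf, int index) {
        return new String(buf, index, 1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] recBuf = { 0x41, (byte) 0xE9 };
        for (int i = 0; i < recBuf.length; i++) {
            System.out.println(asDecimal(recBuf[i]) + " " + asUnsignedDecimal(recBuf[i])
                    + " " + asHex(recBuf[i]) + " " + asLatin1Char(recBuf, i));
        }
    }
}
```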