BaseX XML database code - java

I'm a student of computer science and we have to use BaseX (a pure Java OSS XML database) in one of our courses. While browsing through the code I discovered the following piece of code:
/**
 * Returns a md5 hash.
 * @param pw password string
 * @return hash
 */
public static String md5(final String pw) {
  try {
    final MessageDigest md = MessageDigest.getInstance("MD5");
    md.update(Token.token(pw));
    final TokenBuilder tb = new TokenBuilder();
    for(final byte b : md.digest()) {
      final int h = b >> 4 & 0x0F;
      tb.add((byte) (h + (h > 9 ? 0x57 : 0x30)));
      final int l = b & 0x0F;
      tb.add((byte) (l + (l > 9 ? 0x57 : 0x30)));
    }
    return tb.toString();
  } catch(final Exception ex) {
    Main.notexpected(ex);
    return pw;
  }
}
(source: https://svn.uni-konstanz.de/dbis/basex/trunk/basex/src/main/java/org/basex/util/Token.java)
Just out of interest: what is happening there? Why these byte operations after the MD5? The docstring says it returns an MD5 hash... does it?

I didn't look up the definitions for the classes used, but the byte operations seem to be encoding the returned byte array into a string of hex characters.
for(final byte b : md.digest()) {
  // get the high 4 bits of the current byte
  final int h = b >> 4 & 0x0F;
  // convert into a hex digit (0x30 is '0' while 0x57+10 is 'a')
  tb.add((byte) (h + (h > 9 ? 0x57 : 0x30)));
  // the same for the bottom 4 bits
  final int l = b & 0x0F;
  tb.add((byte) (l + (l > 9 ? 0x57 : 0x30)));
}
This is a great example of why using magic numbers is bad. I, for one, honestly couldn't remember that 0x57+10 is the ASCII/Unicode codepoint for 'a' without checking it in a Python interpreter.
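For comparison, the same encoding can be written without the magic numbers by using Character.forDigit; this is just an illustrative sketch, not the BaseX code:
    // Hex-encodes a digest one 4-bit nibble at a time, using Character.forDigit
    // instead of the 0x57 / 0x30 offsets.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(Character.forDigit((b >> 4) & 0x0F, 16)); // high nibble
            sb.append(Character.forDigit(b & 0x0F, 16));        // low nibble
        }
        return sb.toString();
    }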

I guess Matti is right: md.digest() returns a byte[], and BaseX uses Tokens instead of Strings (hence the TokenBuilder).
So the conversion from md.digest() to String is done by hex-encoding the digest into a Token.
It's not exactly easy to read, but it is quite similar to what Apache Commons does in its Codec library
to get the String value of an MD5 hash.

This is a great example of why using magic numbers is bad.
Well, this is a core method, which isn't supposed to be modified by others – and this looks like the most efficient way to do it. But, true, the documentation could be better. Talking about core methods, it's worthwhile looking at code like Integer.getChars():
http://www.docjar.com/html/api/java/lang/Integer.java.html

Related

How to convert binary string to Java String encoded using UTF-8

In order to send a chunk of bits from a four-word String, I'm getting the byte array from the String and building the bit string.
StringBuilder binaryStr = new StringBuilder();
byte[] bytesFromStr = str.getBytes("UTF-8");
for (int i = 0, l = bytesFromStr.length; i < l; i++) {
    binaryStr.append(Integer.toBinaryString(bytesFromStr[i]));
}
String result = binaryStr.toString();
The problem appears when I want to do the reverse operation: converting a bit string to a Java String encoded using UTF-8.
Please, can someone explain to me the best way to do that?
Thanks in advance!
TL;DR Don't use toBinaryString(). See solution at the end.
Your problem is that Integer.toBinaryString() doesn't return leading zeroes, e.g.
System.out.println(Integer.toBinaryString(1)); // prints: 1
System.out.println(Integer.toBinaryString(10)); // prints: 1010
System.out.println(Integer.toBinaryString(100)); // prints: 1100100
For your purpose, you want to always get 8 bits for each byte.
You also need to prevent negative values from causing errors, e.g.
System.out.println(Integer.toBinaryString((byte)129)); // prints: 11111111111111111111111110000001
Easiest way to accomplish that is like this:
Integer.toBinaryString((b & 0xFF) | 0x100).substring(1)
First, (b & 0xFF) coerces the byte b to int and retains only the lower 8 bits, then | 0x100 sets the 9th bit, e.g. 129 (decimal) becomes 1 1000 0001 (binary, spaces added for clarity). The substring(1) then drops that 9th bit from the resulting string, in effect ensuring that the leading zeroes are in place.
It's better to have that as a helper method:
private static String toBinary(byte b) {
    return Integer.toBinaryString((b & 0xFF) | 0x100).substring(1);
}
In which case your code becomes:
StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
    binaryStr.append(toBinary(b));
String result = binaryStr.toString();
E.g. if str = "Hello World", you get:
0100100001100101011011000110110001101111001000000101011101101111011100100110110001100100
You could of course just do it yourself, without resorting to toBinaryString():
StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
    for (int i = 7; i >= 0; i--)
        binaryStr.append((b >> i) & 1);
String result = binaryStr.toString();
That will probably run faster too.
Thanks @Andreas for your code. I tested it by using your function and then "decoding" back to UTF-8 with this:
StringBuilder revealStr = new StringBuilder();
for (int i = 0; i < result.length(); i += 8) {
    revealStr.append((char) Integer.parseUnsignedInt(result.substring(i, i + 8), 2));
}
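Note that casting each 8-bit group straight to a char, as above, only round-trips single-byte (ASCII) characters. For arbitrary UTF-8 input it is safer to rebuild the byte[] first and decode it in one go; a sketch (the fromBinary name is just illustrative):
    // Sketch: rebuild the original bytes first, then decode them all at once as UTF-8.
    static String fromBinary(String bits) {
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            // every group of 8 characters is one byte of the original UTF-8 encoding
            bytes[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    }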
Thanks to all the folks who helped me.

What is the purpose of adding padding to hex?

Hi, I read a post on how to implement salting and hashing of passwords, and I am stuck on the piece of code below from that post.
private static String toHex(byte[] array)
{
    BigInteger bi = new BigInteger(1, array);
    String hex = bi.toString(16);
    int paddingLength = (array.length * 2) - hex.length();
    if(paddingLength > 0)
        return String.format("%0" + paddingLength + "d", 0) + hex;
    else
        return hex;
}
My question is: why did they calculate paddingLength and prepend the padding to the hex string when paddingLength is greater than zero?
BigInteger(byte[]) interprets the byte array as a two's complement value; this means it has 2^(8*N) possible values for an N-length array (since each byte contains 8 bits).
Meanwhile, a hex string of length M has 16^M possible values (since each character encodes one of 16 values).
The authors want a one-to-one mapping between the byte[] and the String: given a String, you should be able to exactly determine the byte[] it came from. To get that, we have to make sure the string can encode exactly as many values as the byte[]. Plugging in the numbers from above, we get:
(# values for an N-length byte[]) == (# values for an M-length String)
2^(8*N) == 16^M
Let's solve for M in terms of N. The first step is to re-write that right-hand side. If you remember your exponent power rules, a^(b*c) == (a^b)^c. Let's get the base of the exponent on the right to be a 2:
== (2^4)^M
== 2^(4*M)
So we have 2^(8*N) == 2^(4*M). If 2^k == 2^j, that means k == j. So, 8*N == 4*M. Dividing both sides by 4 yields M = 2N.
To tie it back together, remember that N was the length of the byte array, and M was the length of the hex string. We've just figured out that for there to be a one-to-one mapping, M = 2N -- in other words, the hex string should be twice as long as the byte array.
The padding ensures that.
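To see concretely what the padding guards against, take an array with a leading zero byte (values chosen arbitrarily for illustration, reusing the toHex method from the question):
    byte[] array = { 0x00, (byte) 0xAB };                  // two bytes
    String unpadded = new BigInteger(1, array).toString(16);
    System.out.println(unpadded);                          // "ab"   - only 2 characters, the leading 00 is lost
    System.out.println(toHex(array));                      // "00ab" - padding restores array.length * 2 characters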
Because they wanted all the bytes in the array to be represented in the hex string, even if they are leading zero bytes.
It is not the most obvious way to write a toHex method though.
I find something like this much clearer:
private static String toHex(byte[] array) {
    StringBuilder s = new StringBuilder();
    for (byte b : array) {
        s.append(String.format("%02x", b));
    }
    return s.toString();
}

Converting a String-type binary number to bits in Java

I have a question about converting a String-type binary number to bits and writing it to a txt file.
For example, we have a String like "0101011" and want to convert it to the bit pattern "0101011",
then write it to a file on disk.
I would like to know whether there is any way to convert the string to bits.
I was searching on the web and people suggest using a bit array, but I am not sure.
Thanks
Try this:
int value = Integer.parseInt("0101011", 2); // parse base 2
Then the bit pattern in value will correspond to the binary interpretation of the string "0101011". You can then write value out to a file as a byte (assuming the string is no more than 8 binary digits).
EDIT You could also use Byte.parseByte("0101011", 2);. However, byte values in Java are always signed. If you tried to parse an 8-bit value with the 8th bit set (like "10010110", which is 150 decimal), you would get a NumberFormatException because values above +127 do not fit in a byte. If you don't need to handle bit patterns greater than "01111111", then Byte.parseByte works just as well as Integer.parseInt.
Recall, though, that to write a byte to a file, you use OutputStream.write(int), which takes an int (not byte) value—even though it only writes one byte. Might as well go with an int value to start with.
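A small sketch of that write step (the file name is just an example):
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class WriteBits {
        public static void main(String[] args) throws IOException {
            int value = Integer.parseInt("0101011", 2);       // parse base 2 -> 43
            try (FileOutputStream out = new FileOutputStream("bits.bin")) {
                out.write(value);                             // writes only the low 8 bits as a single byte
            }
        }
    }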
You can try the code below to avoid overflow for larger numbers.
long avoidOverflows = Long.parseLong("11000000000000000000000000000000", 2);
int thisShouldBeANegativeNumber = (int) avoidOverflows;
System.out.println("Correct value : " + avoidOverflows + " -> " + "Int value : " + thisShouldBeANegativeNumber);
you can see the output
Correct value : 3221225472 -> Int value : -1073741824
// Converting the String to bytes
byte[] cipherText = "0101011".getBytes();
// Converting the bytes to bits and building a String
StringBuilder sb = new StringBuilder(cipherText.length * Byte.SIZE);
for (int i = 0; i < Byte.SIZE * cipherText.length; i++)
    sb.append((cipherText[i / Byte.SIZE] << i % Byte.SIZE & 0x80) == 0 ? '0' : '1');
// Bit string of the input
System.out.println("Bytecode=" + sb.toString()); // some binary data
// Convert the bits back to characters
String bin = sb.toString();
StringBuilder b = new StringBuilder();
int len = bin.length();
int i = 0;
while (i + 8 <= len) {
    char c = convert(bin.substring(i, i + 8));
    i += 8;
    b.append(c);
}
// String form of the binary data
System.out.println(b.toString());
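The convert(...) helper is not defined in the snippet above; presumably it turns a group of 8 binary digits back into a char. A minimal assumed implementation:
    // Assumed implementation of the convert(...) helper used above:
    // parses a group of 8 binary digits back into one char.
    static char convert(String eightBits) {
        return (char) Integer.parseInt(eightBits, 2);
    }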

Java converting int to hex and back again

I have the following code...
int Val=-32768;
String Hex=Integer.toHexString(Val);
This equates to ffff8000
int FirstAttempt=Integer.parseInt(Hex,16); // Error "Invalid Int"
int SecondAttempt=Integer.decode("0x"+Hex); // Error "Invalid Int"
So, initially, it converts the value -32768 into a hex string ffff8000, but then it can't convert the hex string back into an Integer.
In .Net it works as I'd expect, and returns -32768.
I know that I could write my own little method to convert this myself, but I'm just wondering if I'm missing something, or if this is genuinely a bug?
int val = -32768;
String hex = Integer.toHexString(val);
int parsedResult = (int) Long.parseLong(hex, 16);
System.out.println(parsedResult);
That's how you can do it.
The reason why it doesn't work your way: Integer.parseInt expects a value that fits in a signed int, while toHexString produces an unsigned representation. So if you pass in something higher than 0x7FFFFFFF, an error will be thrown automatically. If you parse it as a long instead, it will still be signed. But when you cast it back to int, it will overflow to the correct value.
It overflows, because the number is negative.
Try this and it will work:
int n = (int) Long.parseLong("ffff8000", 16);
int to Hex :
Integer.toHexString(intValue);
Hex to int :
Integer.valueOf(hexString, 16).intValue();
You may also want to use long instead of int (if the value does not fit the int bounds):
Hex to long:
Long.valueOf(hexString, 16).longValue()
long to Hex
Long.toHexString(longValue)
It's worth mentioning that Java 8 has the methods Integer.parseUnsignedInt and Long.parseUnsignedLong that do what you want, specifically:
Integer.parseUnsignedInt("ffff8000",16) == -32768
The name is a bit confusing: it parses the string as an unsigned value, but the result is stored in a signed int (wrapping around to a negative number), which is exactly what is needed here.
Try using BigInteger class, it works.
int Val=-32768;
String Hex=Integer.toHexString(Val);
//int FirstAttempt=Integer.parseInt(Hex,16); // Error "Invalid Int"
//int SecondAttempt=Integer.decode("0x"+Hex); // Error "Invalid Int"
BigInteger i = new BigInteger(Hex,16);
System.out.println(i.intValue());
Since Integer.toHexString(byte/integer) does not work when you are trying to convert signed bytes (like UTF-16 decoded characters), you have to use:
Integer.toString(byte/integer, 16);
or
String.format("%02X", byte/integer);
For the reverse you can use
Integer.parseInt(hexString, 16);
Java's parseInt method does not accept this kind of "unsigned" hex: if you want to get -32768 back, you should convert the absolute value into hex and then prepend the string with '-'.
Here is the relevant signature from the Integer.java file:
public static int parseInt(String s, int radix)
The description is quite explicit :
* Parses the string argument as a signed integer in the radix
* specified by the second argument. The characters in the string
...
...
* parseInt("0", 10) returns 0
* parseInt("473", 10) returns 473
* parseInt("-0", 10) returns 0
* parseInt("-FF", 16) returns -255
Using Integer.toHexString(...) is a good answer. But I personally prefer to use String.format(...).
Try this sample as a test.
byte[] values = new byte[64];
Arrays.fill(values, (byte) 8); // fills the array with 8s, just for the test
String valuesStr = "";
for (int i = 0; i < values.length; i++)
    valuesStr += String.format("0x%02x", values[i] & 0xff) + " ";
valuesStr = valuesStr.trim(); // trim() returns a new string, so reassign it
Below code would work:
int a=-32768;
String a1=Integer.toHexString(a);
int parsedResult=(int)Long.parseLong(a1,16);
System.out.println("Parsed Value is " +parsedResult);
Hehe, curious. I think this is an "intentional bug", so to speak.
The underlying reason is how the Integer class is written. Basically, parseInt is "optimized" for positive numbers. When it parses the string, it builds the result cumulatively, but negated. Then it flips the sign of the end-result.
Example:
66 = 0x42
parsed like:
4*(-1) = -4
-4 * 16 = -64 (hex 4 parsed)
-64 - 2 = -66 (hex 2 parsed)
return -66 * (-1) = 66
Now, let's look at your example
FFFF8000
15*(-1) = -15 (first F parsed)
-15 * 16 = -240
-240 - 15 = -255 (second F parsed)
-255 * 16 = -4080
-4080 - 15 = -4095 (third F parsed)
-4095 * 16 = -65520
-65520 - 15 = -65535 (fourth F parsed)
-65535 * 16 = -1048560
-1048560 - 8 = -1048568 (8 parsed)
-1048568 * 16 = -16777088
-16777088 - 0 = -16777088 (first 0 parsed)
-16777088 * 16 = -268433408
-268433408 - 0 = -268433408 (second 0 parsed)
Here it blows up, since -268433408 < -Integer.MAX_VALUE / 16 (-134217727).
Attempting to execute the next logical step in the chain (-268433408 * 16)
would cause an integer overflow error.
Edit (addition): in order for parseInt() to also accept "unsigned" hex strings like this one (values above Integer.MAX_VALUE that are meant to wrap around to negative ints), they would have had to implement logic to "wrap" when the cumulative result passes the limit, starting over at the other end of the integer range and continuing from there. Why they did not do this, one would have to ask Josh Bloch or whoever implemented it in the first place. It might just be an optimization.
However,
Hex=Integer.toHexString(Integer.MAX_VALUE);
System.out.println(Hex);
System.out.println(Integer.parseInt(Hex.toUpperCase(), 16));
works just fine, for just this reason. In the source for Integer you can find this comment.
// Accumulating negatively avoids surprises near MAX_VALUE
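To make that comment concrete, here is a simplified sketch of the accumulate-negatively loop (not the actual JDK source, which also handles the sign, other radixes and error details):
    // Simplified version of how parseInt builds the result negatively (positive hex input assumed).
    static int parseHex(String s) {
        int limit = -Integer.MAX_VALUE;
        int multmin = limit / 16;                   // -134217727
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            int digit = Character.digit(s.charAt(i), 16);
            if (result < multmin)                   // the next "* 16" would overflow
                throw new NumberFormatException(s);
            result *= 16;
            if (result < limit + digit)             // the next "- digit" would overflow
                throw new NumberFormatException(s);
            result -= digit;
        }
        return -result;                             // flip the sign at the end
    }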

Which SHA-256 is correct? The Java SHA-256 digest or the Linux commandline tool

When I calculate in Java an SHA-256 of a string with the following method
public static void main(String[] args) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    byte[] hash = md.digest("password".getBytes());
    StringBuffer sb = new StringBuffer();
    for(byte b : hash) {
        sb.append(Integer.toHexString(b & 0xff));
    }
    System.out.println(sb.toString());
}
I get :
5e884898da2847151d0e56f8dc6292773603dd6aabbdd62a11ef721d1542d8
On the command line I do the following (I need the -n so that no trailing newline is added):
echo -n "password" | sha256sum
and get
5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
If we compare these more closely, I find two subtle differences:
5e884898da2847151d0e56f8dc6292773603dd6aabbdd62a11ef721d1542d8
5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
or :
5e884898da28 47151d0e56f8dc6292773603d d6aabbdd62a11ef721d1542d8
5e884898da28 0 47151d0e56f8dc6292773603d 0 d6aabbdd62a11ef721d1542d8
Which of the 2 is correct here?
Result: both are correct, but my Java code was wrong...
I fixed it by using:
StringBuffer sb = new StringBuffer();
for(byte b : hash) {
    sb.append(String.format("%02x", b));
}
Thanks!
I'll take a reasonable guess: both are outputting the same digest, but in your Java code that outputs the byte[] result as a hex string, you are outputting small byte values (less than 16) without a leading 0. So a byte with value 0x0d is being written as "d", not "0d".
The culprit is toHexString. It outputs 4 for the byte value 0x04 and d for 0x0d, whereas sha256sum outputs 04 and 0d. The Java docs for Integer.toHexString() state:
This value is converted to a string of ASCII digits in hexadecimal (base 16) with no extra leading 0s.
The other zeros in the string aren't being affected since they're the second half of the bytes (e.g., 30).
One way to fix it would be to change:
for(byte b : hash) {
    sb.append(Integer.toHexString(b & 0xff));
}
to:
for(byte b : hash) {
    if (b < 16) sb.append("0");
    sb.append(Integer.toHexString(b & 0xff));
}
They're both right - it's your Java code that is at fault, because it is not printing out the leading 0 for a hex value less than 0x10.
You still need "echo -n" to prevent the trailing \n
The one generated by sha256sum seems correct. Your implementation seems to drop those two zeroes.
Using @paxdiablo's idea I had a problem with large byte values, since they appear negative in a Java byte (so b < 16 is true and a spurious "0" gets prepended), so
Instead of:
for(byte b : hash) {
    sb.append(Integer.toHexString(b & 0xff));
}
you could do:
for(byte b : hash) {
    if (b >= 0 && b < 16) { // >= 0 so that a zero byte also gets its leading 0
        sb.append("0");
    }
    sb.append(Integer.toHexString(b & 0xff));
}
And read @Sean Owen's answer.
You can also get the right result using this:
MessageDigest md = MessageDigest.getInstance("SHA-256");
byte[] hash = md.digest("password".getBytes());
BigInteger bI = new BigInteger(1, hash);
System.out.println(bI.toString(16));
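One caveat: BigInteger.toString(16) also drops leading zeros (the same issue the padding question above deals with), so this only matches sha256sum when the first byte of the digest is 0x10 or larger. A padded variant, sketched for a SHA-256 digest (32 bytes, i.e. 64 hex characters):
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    byte[] hash = md.digest("password".getBytes());
    // %064x zero-pads the value to the full 64 hex digits of a 32-byte digest
    System.out.println(String.format("%064x", new BigInteger(1, hash)));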
