BigInteger.toByteArray() returns purposeful leading zeros?

I'm converting BigIntegers to binary, radix-16 and radix-64 encodings and seeing mysterious zero padding at the most significant end. Is this a BigInteger problem that I can work around by stripping the zero padding, or should I be doing something else?
My test code:
String s;
System.out.printf( "%s length %d\n", s = "123456789A", (new BigInteger( s, 16 )).toByteArray().length );
System.out.printf( "%s length %d\n", s = "F23456789A", (new BigInteger( s, 16 )).toByteArray().length );
Produces output:
123456789A length 5
F23456789A length 6
The longer array has a zero byte of padding at the front. Upon inspection of BigInteger.toByteArray() I see:
public byte[] toByteArray() {
int byteLen = bitLength()/8 + 1;
byte[] byteArray = new byte[byteLen];
Now, I can find private int bitLength;, but I can't quite find where bitLength() is defined to figure out exactly why this class does this - connected to sign extension perhaps?

Yes, this is the documented behaviour:
The byte array will be in big-endian byte-order: the most significant byte is in the zeroth element. The array will contain the minimum number of bytes required to represent this BigInteger, including at least one sign bit, which is (ceil((this.bitLength() + 1)/8)).
bitLength() is documented as:
Returns the number of bits in the minimal two's-complement representation of this BigInteger, excluding a sign bit.
So in other words, two values with the same magnitude will always have the same bit length, regardless of sign. Think of a BigInteger as being an unsigned integer and a sign bit - and toByteArray() returns all the data from both parts, which is "the number of bits required for the unsigned integer, and one bit for the sign".
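As a rough illustration of that arithmetic, here is a small sketch using the two values from the question. The class and method names (SignBitDemo, expectedLength) are made up for this example; only bitLength() and toByteArray() come from BigInteger.

```java
import java.math.BigInteger;

class SignBitDemo {
    // Byte count predicted by the documented formula ceil((bitLength + 1) / 8).
    // With integer division this is the same as bitLength / 8 + 1.
    static int expectedLength(BigInteger value) {
        return value.bitLength() / 8 + 1;
    }

    public static void main(String[] args) {
        for (String hex : new String[] {"123456789A", "F23456789A"}) {
            BigInteger value = new BigInteger(hex, 16);
            byte[] bytes = value.toByteArray();
            // F23456789A uses all 40 bits of its 5 bytes, so the sign bit
            // needs a sixth byte, which shows up as a leading 0x00.
            System.out.printf("%s bitLength=%d expected=%d actual=%d firstByte=0x%02X%n",
                    hex, value.bitLength(), expectedLength(value), bytes.length, bytes[0]);
        }
    }
}
```

The first value has a spare high bit in its top byte (0x12), so the sign bit fits there and no extra byte is needed.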

Thanks Jon Skeet for your answer. Here's some code I'm using to convert, very likely it can be optimized.
import java.math.BigInteger;
import java.util.Arrays;
public class UnsignedBigInteger {
    public static byte[] toUnsignedByteArray(BigInteger value) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("value must be a non-negative BigInteger");
        }
        byte[] signedValue = value.toByteArray();
        // Strip the extra sign byte only when it is actually present.
        if (signedValue.length > 1 && signedValue[0] == 0x00) {
            return Arrays.copyOfRange(signedValue, 1, signedValue.length);
        }
        return signedValue;
    }
    public static BigInteger fromUnsignedByteArray(byte[] value) {
        // Signum 1 tells the constructor to treat the bytes as an unsigned magnitude.
        return new BigInteger(1, value);
    }
}

Related

Difference in BigInteger values from a String and a byte array

Can someone please explain the difference between the below two initialisations of BigInteger.
Input:
BigInteger bi1 = new BigInteger("EF", 16);
byte[] ba = new byte[] {(byte)0xEF};
BigInteger bi2 = new BigInteger(ba);
Log.d("BIGINTEGER", "Big Integer1 = " + bi1.toString(16));
Log.d("BIGINTEGER", "Big Integer2 = " + bi2.toString(16));
Output:
Big Integer1 = ef
Big Integer2 = -11
How can I initialise a BigInteger with the value "EF" from a byte array?

From the BigInteger docs
Constructor and Description
BigInteger(byte[] val)
Translates a byte array containing the two's-complement binary
representation of a BigInteger into a BigInteger.
Two's complement is the real reason.
Let's see how...
(byte)0xEF in binary = 11101111
Interpret that as a signed 8-bit value and you get -17 (base 10) or -11 (base 16).
Now take a look at
byte[] ba = new byte[] {0, (byte)0xEF};
This has the (byte)0xEF but prepended by 0, which means the array holds 00000000 11101111, and that gives the correct result when converted.
Why was the previous case different?
Check out 2's complement rules - SO Answer, Mandatory Wikipedia link
Another way of thinking about this:
0xEF in decimal = 239
Range of byte = -128 to 127
239 is outside that range, so the cast overflows.
239 - 128 = 111
Now count 111 steps up from the minimum value, -128 (integer types wrap around like this because of the two's-complement representation): -128 + 111 = -17.
For example: (byte) 129 = -127
(129 - 128 = 1, and one step up from -128 is -127)
Shortcut: if x >= 128 && x < 256, then (byte) x = (x - 128) - 128 = x - 256
Here x = 239, so (byte) x = -17
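The wrap-around can be checked directly with a narrowing cast. This is only a sketch; ByteWrapDemo and wrapped are names invented for the example.

```java
class ByteWrapDemo {
    // Shortcut from the text: for 128 <= x < 256 the byte value is x - 256.
    static int wrapped(int x) {
        return x - 256;
    }

    public static void main(String[] args) {
        int[] samples = {129, 239, 255};
        for (int x : samples) {
            byte b = (byte) x; // the narrowing cast keeps only the low 8 bits
            System.out.println(x + " -> " + b + " (shortcut: " + wrapped(x) + ")");
        }
    }
}
```

For 239 both the cast and the shortcut give -17, which is the -11 (hex) seen in the question.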

Put a leading zero into the byte[]:
byte[] ba = new byte[] {0, (byte)0xEF};

You need to add a zero to the front of the byte[] array:
byte[] ba = new byte[] {0, (byte)0xEF};
BigInteger bi2 = new BigInteger(ba);
Log.d("BIGINTEGER", "Big Integer1 = " + bi1.toString(16));
Log.d("BIGINTEGER", "Big Integer2 = " + bi2.toString(16));
Why?
The reason is related to the language specification:
Decimal literals have a particular property that is not shared by hexadecimal literals: decimal literals are all positive [JLS 3.10.1].
To write a negative decimal constant, you need to use the unary negation operator (-) in combination with a decimal literal.
In this way, you can write any int or long value, whether positive or negative, in decimal form, and negative decimal constants are clearly identifiable by the presence of a minus sign.
Not so for hexadecimal or octal literals.
They can take on both positive and negative values. Hex and octal literals are negative if their high-order bit is set.
Having said that: the int literal 0xEF is 239, but the cast (byte)0xEF keeps only the low 8 bits, whose high-order bit is set, so the resulting byte is actually a negative number (-17).

public BigInteger(byte[] val)
Translates a byte array containing the two's-complement binary representation of a BigInteger into a BigInteger. The input array is assumed to be in big-endian byte-order: the most significant byte is in the zeroth element.
public BigInteger(String val,
int radix)
Translates the String representation of a BigInteger in the specified radix into a BigInteger. [...]
Source: Oracle Java 7 Docs
Your initialization from a byte array does not behave as expected, because 0xEF cast to a byte has the bits {1, 1, 1, 0, 1, 1, 1, 1}.
Converting that to an integer according to the spec quoted above works as follows:
1*2^0 + 1*2^1 + 1*2^2 + 1*2^3 + 0*2^4 + 1*2^5 + 1*2^6 - 1*2^7 = -17 = -0x11
Two's complement causes the highest bit to be subtracted rather than added. So adding a 0 byte to the beginning of the byte array fixes the problem:
byte[] ba = new byte[] {0, (byte)0xEF};
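The weighted sum above can be written out as a short sketch. TwosComplementWeights and decode are hypothetical names used only here; the two BigInteger constructor calls are the ones discussed in the question.

```java
import java.math.BigInteger;

class TwosComplementWeights {
    // Interprets the low 8 bits as two's complement by summing bit weights,
    // giving the top bit the weight -2^7 instead of +2^7.
    static int decode(int bits) {
        int value = 0;
        for (int i = 0; i < 7; i++) {
            value += ((bits >> i) & 1) << i;
        }
        value -= ((bits >> 7) & 1) << 7;
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decode(0xEF));                                // -17
        System.out.println(new BigInteger(new byte[] {(byte) 0xEF}));    // -17
        System.out.println(new BigInteger(new byte[] {0, (byte) 0xEF})); // 239
    }
}
```

With the extra zero byte in front, the subtracted bit is the top bit of 0x00, which is 0, so all eight bits of 0xEF are added.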

Java (BigInteger from byte array)

I'm using the following code to create a BigInteger from a hexadecimal string and print it to the output.
package javaapplication2;
import java.math.BigInteger;
import javax.xml.bind.DatatypeConverter;
public class JavaApplication2 {
public static void main(String[] args) {
// Number in hexadecimal form
String HexString = "e04fd020ea3a6910a2d808002b30309d";
// Conversion from string to byte array
byte[] ByteArray = toByteArray(HexString);
// Creation of BigInteger from byte array
BigInteger BigNumber = new BigInteger(ByteArray);
// Print result
System.out.print(BigNumber + "\n");
}
public static String toHexString(byte[] array) {
return DatatypeConverter.printHexBinary(array);
}
public static byte[] toByteArray(String s) {
return DatatypeConverter.parseHexBinary(s);
}
}
After executing this code I get the following result:
-42120883064304190395265794005525319523
But I expected to see this result:
298161483856634273068108813426242891933
What am I doing wrong?

You're passing in a byte array where the first byte has a top bit that is set - making it negative. From the constructor documentation:
Translates a byte array containing the two's-complement binary representation of a BigInteger into a BigInteger. The input array is assumed to be in big-endian byte-order: the most significant byte is in the zeroth element.
A two's-complement binary representation with a leading set bit is negative.
To get the result you want, you can do any of:
Prefix the hex string with "00" so that you'll always get a top byte of 0
Pass the hex string straight into the BigInteger(String, int) constructor, where the sign is inferred from the presence or absence of "-" at the start of the string. (Obviously you'd pass in 16 as the base.)
Use the BigInteger(int, byte[]) constructor, passing 1 as the signum value
If your real context is that you've already got the byte array, and you were only parsing it from a hex string for test purposes, I'd use the third option. If you've genuinely got a hex string as input, I'd use the second option.
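The three options listed above might look like this side by side. This is a sketch, not the asker's code: hexToBytes is a hypothetical stand-in for the DatatypeConverter call in the question, and it assumes an even-length hex string.

```java
import java.math.BigInteger;

class PositiveFromBytes {
    // Hypothetical helper: parses an even-length hex string into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String hex = "e04fd020ea3a6910a2d808002b30309d";

        BigInteger viaPrefix = new BigInteger(hexToBytes("00" + hex)); // option 1: leading zero byte
        BigInteger viaString = new BigInteger(hex, 16);                // option 2: parse the string
        BigInteger viaSignum = new BigInteger(1, hexToBytes(hex));     // option 3: explicit signum

        System.out.println(viaPrefix);
        System.out.println(viaString);
        System.out.println(viaSignum);
    }
}
```

All three lines should print the positive value the asker expected, while new BigInteger(hexToBytes(hex)) reproduces the negative one.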

try
BigInteger bigInt = new BigInteger(HexString, 16);

Hash a String into fixed bit hash value

I want to hash a word into a fixed-width hash value, say 64 bits or 32 bits (binary).
I used the following code
long murmur_hash= MurmurHash.hash64(word);
Then murmur_hash value is converted into binary by the following function
public static String intToBinary (int n, int numOfBits) {
    String binary = "";
    for (int i = 0; i < numOfBits; ++i) {
        // Read the lowest bit before shifting it out; >>> also handles negative values.
        binary = (n & 1) + binary;
        n >>>= 1;
    }
    return binary;
}
Is there any direct hash method to convert into binary?

Just use this
Integer.toBinaryString(int i)

If you want to convert into a fixed binary string, that is, always get a 64-character long string with zero padding, then you have a couple of options. If you have Apache's StringUtils, you can use:
StringUtils.leftPad( Long.toBinaryString(murmurHash), Long.SIZE, "0" );
If you don't, you can write a padding method yourself:
public static String paddedBinaryFromLong( long val ) {
StringBuilder sb = new StringBuilder( Long.toBinaryString(val));
char[] zeros = new char[Long.SIZE - sb.length()];
Arrays.fill(zeros, '0');
sb.insert(0, zeros);
return sb.toString();
}
This method starts by using the Long.toBinaryString(long) method, which conveniently does the bit conversion for you. The only thing it doesn't do is pad on the left if the value is shorter than 64 characters.
The next step is to create an array of 0 characters with the missing zeros needed to pad to the left.
Finally, we insert that array of zeros at the beginning of our StringBuilder, and we have a 64-character, zero-padded bit string.
Note: there is a difference between using Long.toBinaryString(long) and Long.toString(long,radix). The difference is in negative numbers. In the first, you'll get the full, two's complement value of the number. In the second, you'll get the number with a minus sign:
System.out.println(Long.toString(-15L,2));
result:
-1111
System.out.println(Long.toBinaryString(-15L));
result:
1111111111111111111111111111111111111111111111111111111111110001
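If neither StringUtils nor a hand-written padding loop appeals, the same zero padding can be sketched with String.format. This is just one alternative, assuming the hash is held in a long; PaddedBinary and padded are names invented for the example.

```java
class PaddedBinary {
    // Right-aligns the binary digits in a 64-character field,
    // then turns the padding spaces into zeros.
    static String padded(long val) {
        return String.format("%" + Long.SIZE + "s", Long.toBinaryString(val))
                .replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(padded(5L));
        System.out.println(padded(-15L)); // already 64 digits, so nothing is added
    }
}
```

Because Long.toBinaryString never produces spaces, replacing them afterwards only touches the padding.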

Another way is to use
Integer.toString(i, radix)
You get the string representation of the first argument i in the radix (binary - 2, octal - 8, decimal - 10, hex - 16) specified by the second argument.

parseInt on a string of 8 bits returns a negative value when the first bit is 1

I've got a huge string of bits (with some \n in it too) that I pass as a parameter to a method, which should isolate the bits 8 by 8, and convert them all to bytes using parseInt().
Thing is, every time the substring of 8 bits starts with a 1, the resulting byte is a negative number. For example, the first substring is '10001101', and the resulting byte is -115. I can't seem to figure out why, can someone help? It works fine with other substrings.
Here's my code, if needed :
static String bitsToBytes(String geneString) {
String geneString_temp = "", sub;
for(int i = 0; i < geneString.length(); i = i+8) {
sub = geneString.substring(i, i+8);
if (sub.indexOf("\n") != -1) {
if (sub.indexOf("\n") != geneString.length())
sub = sub.substring(0, sub.indexOf("\n")) + sub.substring(sub.indexOf("\n")+1, sub.length()) + geneString.charAt(i+9);
}
byte octet = (byte) Integer.parseInt(sub, 2);
System.out.println(octet);
geneString_temp = geneString_temp + octet;
}
geneString = geneString_temp + "\n";
return geneString;
}

In Java, byte is a signed type, meaning that when the most significant bit is set to 1, the number is interpreted as negative.
This is precisely what happens when you print your byte here:
System.out.println(octet);
Since PrintStream does not have an overload of println that takes a single byte, the overload that takes an int gets called. Since octet's most significant bit is set to 1, the number gets sign-extended by replicating its sign bit into bits 9..32, resulting in printout of a negative number.

byte is a signed two's complement integer. So this is a normal behavior: the two's complement representation of a negative number has a 1 in the most-significant bit. You could think of it like a sign bit.
If you don't like this, you can use the following idiom:
System.out.println( octet & 0xFF );
This will pass the byte as an int while preventing sign extension. You'll get an output as if it were unsigned.
Java doesn't have unsigned integer types (other than char), so the only other thing you could do is store the numbers in a wider type, e.g. short or int.
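A small sketch of the masking idiom, using the substring from the question. UnsignedByteDemo and unsigned are hypothetical names; Byte.toUnsignedInt is a standard method available from Java 8.

```java
class UnsignedByteDemo {
    // Reads a byte back as a value in 0..255 by masking off the sign-extended bits.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        int parsed = Integer.parseInt("10001101", 2); // 141, which fits in an int
        byte octet = (byte) parsed;                   // the narrowing cast gives -115

        System.out.println(parsed);
        System.out.println(octet);
        System.out.println(unsigned(octet));
        System.out.println(Byte.toUnsignedInt(octet)); // same result as the mask
    }
}
```

The stored bits are identical in every case; only the interpretation at print time changes.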

In Java, all integers are signed, and the most significant bit is the sign bit.

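Integer.parseInt("10001101", 2) itself returns 141; the value only becomes negative in the cast to byte, because byte is signed and its most significant bit is now set. Integer.parseUnsignedInt would not change that. Keep the value in an int, or mask with & 0xFF when reading the byte back.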

What is the purpose to add padding to hex?

Hi, I read a post on how to implement salting and hashing for passwords, and I am stuck on the following code from it.
private static String toHex(byte[] array)
{
BigInteger bi = new BigInteger(1, array);
String hex = bi.toString(16);
int paddingLength = (array.length * 2) - hex.length();
if(paddingLength > 0)
return String.format("%0" + paddingLength + "d", 0) + hex;
else
return hex;
}
My question is: why did they calculate paddingLength and prepend the padding to the hex string when paddingLength is greater than zero?

BigInteger(byte[]) interprets the byte array into a two's complement value; this means that it has 2^(8*N) possible values for an N-length array (since each byte contains 8 bits).
Meanwhile, a hex string of length M has 16^M possible values (since each character encodes one of 16 values).
The authors want a one-to-one mapping between the byte[] and the String: given a String, you should be able to exactly determine the byte[] it came from. To get that, we have to make sure the string can encode exactly as many values as the byte[]. Plugging in the numbers from above, we get:
(# values for an N-length byte[]) == (# values for an M-length String)
2^(8*N) == 16^M
Let's solve for M in terms of N. The first step is to re-write that right-hand side. If you remember your exponent power rules, a^(b*c) == (a^b)^c. Let's get the base of the exponent on the right to be a 2:
== (2^4)^M
== 2^(4*M)
So we have 2^(8*N) == 2^(4*M). If 2^k == 2^j, that means k == j. So, 8*N == 4*M. Dividing both sides by 4 yields M = 2N.
To tie it back together, remember that N was the length of the byte array, and M was the length of the hex string. We've just figured out that for there to be a one-to-one mapping, M = 2N -- in other words, the hex string should be twice as long as the byte array.
The padding ensures that.
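To see the padding at work, here is a sketch that follows the same approach as the toHex method in the question. The class name HexPaddingDemo and the sample bytes are made up for illustration.

```java
import java.math.BigInteger;

class HexPaddingDemo {
    // Same idea as the question's toHex: left-pad the hex string to 2 * N characters.
    static String toHex(byte[] array) {
        String hex = new BigInteger(1, array).toString(16);
        int paddingLength = array.length * 2 - hex.length();
        return paddingLength > 0
                ? String.format("%0" + paddingLength + "d", 0) + hex
                : hex;
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x0A, (byte) 0xFF}; // made-up bytes with leading zeros
        System.out.println(new BigInteger(1, sample).toString(16)); // aff
        System.out.println(toHex(sample));                          // 000aff
    }
}
```

Without the padding, the arrays {0x00, 0x0A, 0xFF} and {0x0A, 0xFF} would both print as "aff", so the original bytes could not be recovered from the string.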

Because they wanted all the bytes in the array to be represented in the hex string, even if they are leading zero bytes.
It is not the most obvious way to write a toHex method though.
I find something like this much clearer:
private static String toHex(byte[] array) {
StringBuilder s = new StringBuilder();
for (byte b : array) {
s.append(String.format("%02x", b));
}
return s.toString();
}
