Hash a String into fixed bit hash value - java

I want to hash a word into a fixed-bit hash value, say 64 bits or 32 bits (in binary).
I used the following code:
long murmur_hash= MurmurHash.hash64(word);
Then the murmur_hash value is converted into binary by the following function:
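public static String intToBinary(long n, int numOfBits) {
StringBuilder binary = new StringBuilder();
for (int i = 0; i < numOfBits; ++i) {
// read the lowest bit first, then shift it out
binary.insert(0, (n & 1) == 0 ? '0' : '1');
n >>>= 1; // unsigned shift, so negative hash values work too
}
return binary.toString();
}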
Is there a direct method to convert the hash into binary?

Just use this (or the Long overload for a 64-bit hash such as murmur_hash):
Integer.toBinaryString(int i)
Long.toBinaryString(long i)
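For reference, here is a small sketch of what these methods return. Neither pads with leading zeros, and a negative value comes back as its full two's complement bit pattern. The murmurHash value below is made up for illustration, not a real MurmurHash output.

```java
public class ToBinaryDemo {
    public static void main(String[] args) {
        // Integer.toBinaryString drops leading zeros...
        System.out.println(Integer.toBinaryString(5));           // 101
        // ...and shows a negative int as its 32-bit two's complement pattern
        System.out.println(Integer.toBinaryString(-1).length()); // 32
        // A 64-bit hash needs the long overload; this value is made up
        long murmurHash = 0xFFL;
        System.out.println(Long.toBinaryString(murmurHash));     // 11111111
        System.out.println(Long.toBinaryString(-1L).length());   // 64
    }
}
```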

If you want to convert into a fixed binary string, that is, always get a 64-character long string with zero padding, then you have a couple of options. If you have Apache's StringUtils, you can use:
StringUtils.leftPad( Long.toBinaryString(murmurHash), Long.SIZE, "0" );
If you don't, you can write a padding method yourself:
import java.util.Arrays;

public static String paddedBinaryFromLong( long val ) {
StringBuilder sb = new StringBuilder( Long.toBinaryString(val));
char[] zeros = new char[Long.SIZE - sb.length()];
Arrays.fill(zeros, '0');
sb.insert(0, zeros);
return sb.toString();
}
This method starts by using the Long.toBinaryString(long) method, which conveniently does the bit conversion for you. The only thing it doesn't do is pad on the left if the value is shorter than 64 characters.
The next step is to create an array of '0' characters, one for each position still missing on the left.
Finally, we insert that array of zeros at the beginning of our StringBuilder, and we have a 64-character, zero-padded bit string.
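The question also mentions 32-bit hashes. A possible generalization, sketched under the assumption that you want the low N bits of the value: mask off everything above bit N so that a sign-extended negative int does not leak 32 extra ones, then left-pad to N characters. The helper name toFixedBinary is hypothetical, and "hello".hashCode() only stands in for a real 32-bit hash.

```java
public class FixedWidthBinary {

    // Hypothetical helper: returns the low `bits` bits of `value` as a
    // zero-padded binary string of exactly `bits` characters (1 <= bits <= 64).
    static String toFixedBinary(long value, int bits) {
        if (bits < 1 || bits > 64) {
            throw new IllegalArgumentException("bits must be between 1 and 64");
        }
        // Keep only the low `bits` bits; an int argument is sign-extended when
        // widened to long, so without the mask -1 would print 64 ones.
        long masked = (bits == 64) ? value : value & ((1L << bits) - 1);
        String raw = Long.toBinaryString(masked);
        StringBuilder sb = new StringBuilder(bits);
        for (int i = raw.length(); i < bits; i++) {
            sb.append('0'); // left padding
        }
        return sb.append(raw).toString();
    }

    public static void main(String[] args) {
        int hash32 = "hello".hashCode(); // stand-in for a real 32-bit hash
        long hash64 = -15L;               // stand-in for a real 64-bit hash
        System.out.println(toFixedBinary(hash32, 32));
        System.out.println(toFixedBinary(hash64, 64));
    }
}
```

For bits = 64 the output is identical to Long.toBinaryString plus left padding; the mask only matters for narrower widths.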
Note: there is a difference between using Long.toBinaryString(long) and Long.toString(long,radix). The difference is in negative numbers. In the first, you'll get the full, two's complement value of the number. In the second, you'll get the number with a minus sign:
System.out.println(Long.toString(-15L,2));
result:
-1111
System.out.println(Long.toBinaryString(-15L));
result:
1111111111111111111111111111111111111111111111111111111111110001
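On Java 8 or later, the same two's complement string is also available through Long.toUnsignedString(long, int), and Long.parseUnsignedLong(String, int) turns the 64-character string back into the original long. A short sketch:

```java
public class SignedVsUnsignedBinary {
    public static void main(String[] args) {
        long v = -15L;
        System.out.println(Long.toString(v, 2));  // -1111
        String bits = Long.toBinaryString(v);     // 64 chars, two's complement
        System.out.println(bits);
        // Java 8+: the general unsigned conversion gives the same string for radix 2
        System.out.println(bits.equals(Long.toUnsignedString(v, 2))); // true
        // Round trip: parseUnsignedLong maps the 64-bit string back to -15
        System.out.println(Long.parseUnsignedLong(bits, 2) == v);     // true
    }
}
```

Long.parseLong would reject that 64-character string with a NumberFormatException because it exceeds Long.MAX_VALUE, which is why the unsigned parser is used for the round trip.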

Another way is using
Integer.toString(i, radix)
which returns the string representation of the first argument i in the radix specified by the second argument (binary 2, octal 8, decimal 10, hex 16).
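A few sample calls, as a sketch. Like Long.toString(long, radix), this gives no zero padding and keeps a minus sign for negative values:

```java
public class RadixDemo {
    public static void main(String[] args) {
        int i = 255;
        System.out.println(Integer.toString(i, 2));   // 11111111
        System.out.println(Integer.toString(i, 8));   // 377
        System.out.println(Integer.toString(i, 10));  // 255
        System.out.println(Integer.toString(i, 16));  // ff
        // Negative values keep a minus sign, unlike Integer.toBinaryString
        System.out.println(Integer.toString(-i, 16)); // -ff
    }
}
```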

Related

java - Enforce 4 digit hex representation of a binary number

Below is a snippet of my Java code.
//converts a binary string to hexadecimal
public static String binaryToHex (String binaryNumber)
{
BigInteger temp = new BigInteger(binaryNumber, 2);
return temp.toString(16).toUpperCase();
}
If I input "0000 1001 0101 0111" (without the spaces) as my String binaryNumber, the return value is 957, but what I want is 0957. How do I make sure the result is padded with zeros when the hex number has fewer than 4 digits?
Thanks.
You can do one of the following:
Manually pad with zeroes
Use String.format()
Manually pad with zeroes
Since you want extra leading zeroes when shorter than 4 digits, use this:
BigInteger temp = new BigInteger(binaryNumber, 2);
String hex = temp.toString(16).toUpperCase();
if (hex.length() < 4)
hex = "000".substring(hex.length() - 1) + hex;
return hex;
Use String.format()
BigInteger temp = new BigInteger(binaryNumber, 2);
return String.format("%04X", temp);
Note, if you're only expecting the value to be 4 hex digits long, then a regular int can hold the value. No need to use BigInteger.
In Java 8, do it by parsing the binary input as an unsigned number:
int temp = Integer.parseUnsignedInt(binaryNumber, 2);
return String.format("%04X", temp);
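Putting both format-based variants into one runnable class, as a sketch (the method name binaryToHexInt is only an illustrative label for the Java 8 variant):

```java
import java.math.BigInteger;

public class BinaryToHex {

    // BigInteger variant: accepts inputs of any length, pads to at least 4 digits
    static String binaryToHex(String binaryNumber) {
        BigInteger temp = new BigInteger(binaryNumber, 2);
        return String.format("%04X", temp);
    }

    // Java 8+ int variant: assumes the input has at most 32 bits
    static String binaryToHexInt(String binaryNumber) {
        int temp = Integer.parseUnsignedInt(binaryNumber, 2);
        return String.format("%04X", temp);
    }

    public static void main(String[] args) {
        System.out.println(binaryToHex("0000100101010111"));    // 0957
        System.out.println(binaryToHexInt("0000100101010111")); // 0957
    }
}
```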
The BigInteger becomes an internal machine representation of whatever value you passed in as a String in binary format. Therefore, the machine does not know how many leading zeros you would like in the output format. Unfortunately, the method toString in BigInteger does not allow any kind of formatting, so you would have to do it manually if you attempt to use the code you showed.
I customized your code a bit to include leading zeros based on input string:
public static void main(String[] args) {
System.out.println(binaryToHex("0000100101010111"));
}
public static String binaryToHex(String binaryNumber) {
BigInteger temp = new BigInteger(binaryNumber, 2);
String hexStr = temp.toString(16).toUpperCase();
int b16inLen = binaryNumber.length()/4;
int b16outLen = hexStr.length();
int b16padding = b16inLen - b16outLen;
for (int i=0; i<b16padding; i++) {
hexStr=('0'+hexStr);
}
return hexStr;
}
Notice that the solution above counts the complete base-16 digits in the input (one per four input bits) and pads by the difference from the number of base-16 digits in the output. A partial group of leading zeros is therefore not counted: '000 1111' is displayed as 'F', while '0000 1111' is displayed as '0F'.
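If you would rather treat a partial leading group as a full hex digit, one option (a sketch, assuming a non-empty input made only of '0' and '1' characters) is to round the digit count up to ceil(length / 4) and let String.format do the padding:

```java
import java.math.BigInteger;

public class BinaryToHexRoundUp {

    // Pads to ceil(bits / 4) hex digits, so a partial leading nibble still counts.
    static String binaryToHex(String binaryNumber) {
        int hexDigits = (binaryNumber.length() + 3) / 4;
        BigInteger temp = new BigInteger(binaryNumber, 2);
        return String.format("%0" + hexDigits + "X", temp);
    }

    public static void main(String[] args) {
        System.out.println(binaryToHex("0001111"));          // 0F
        System.out.println(binaryToHex("00001111"));         // 0F
        System.out.println(binaryToHex("0000100101010111")); // 0957
    }
}
```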

parseInt on a string of 8 bits returns a negative value when the first bit is 1

I've got a huge string of bits (with some \n in it too) that I pass as a parameter to a method, which should isolate the bits 8 by 8, and convert them all to bytes using parseInt().
Thing is, every time the substring of 8 bits starts with a 1, the resulting byte is a negative number. For example, the first substring is '10001101', and the resulting byte is -115. I can't figure out why; can someone help? It works fine with other substrings.
Here's my code, if needed:
static String bitsToBytes(String geneString) {
String geneString_temp = "", sub;
for(int i = 0; i < geneString.length(); i = i+8) {
sub = geneString.substring(i, i+8);
if (sub.indexOf("\n") != -1) {
if (sub.indexOf("\n") != geneString.length())
sub = sub.substring(0, sub.indexOf("\n")) + sub.substring(sub.indexOf("\n")+1, sub.length()) + geneString.charAt(i+9);
}
byte octet = (byte) Integer.parseInt(sub, 2);
System.out.println(octet);
geneString_temp = geneString_temp + octet;
}
geneString = geneString_temp + "\n";
return geneString;
}
In Java, byte is a signed type, meaning that when the most significant bit is set to 1, the number is interpreted as negative.
This is precisely what happens when you print your byte here:
System.out.println(octet);
Since PrintStream does not have an overload of println that takes a single byte, the overload that takes an int gets called. Since octet's most significant bit is set to 1, the number gets sign-extended by replicating its sign bit into bits 9..32, resulting in printout of a negative number.
byte is a signed two's complement integer. So this is a normal behavior: the two's complement representation of a negative number has a 1 in the most-significant bit. You could think of it like a sign bit.
If you don't like this, you can use the following idiom:
System.out.println( octet & 0xFF );
This will pass the byte as an int while preventing sign extension. You'll get an output as if it were unsigned.
Java doesn't have unsigned types, so the only other thing you could do is store the numbers in a wider representation, e.g. short.
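Here is a minimal sketch of the whole conversion with the newline handling simplified: it removes all '\n' characters first and then parses fixed 8-bit chunks, assuming the cleaned string length is a multiple of 8 (any trailing partial chunk is ignored). The method name bitsToBytes mirrors the question but the body is a rewrite, not the original. The bytes stay signed; only the printout uses & 0xFF.

```java
public class BitsToBytes {

    // Strips newlines, then parses each complete 8-bit chunk into a byte.
    static byte[] bitsToBytes(String bits) {
        String clean = bits.replace("\n", "");
        byte[] out = new byte[clean.length() / 8];
        for (int i = 0; i < out.length; i++) {
            // parseInt returns 0..255 here; the (byte) cast maps 128..255 to -128..-1
            out[i] = (byte) Integer.parseInt(clean.substring(8 * i, 8 * i + 8), 2);
        }
        return out;
    }

    public static void main(String[] args) {
        for (byte b : bitsToBytes("10001101\n00000001")) {
            // Same bits printed twice: as a signed byte, then as unsigned via & 0xFF
            System.out.println(b + " " + (b & 0xFF));
        }
    }
}
```

This prints "-115 141" and then "1 1". On Java 8 or later, Byte.toUnsignedInt(b) is equivalent to b & 0xFF.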
In Java, all integers are signed, and the most significant bit is the sign bit.
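parseInt parses a signed int, but an 8-character string such as "10001101" is well within range and yields 141, which is positive. The sign only flips when that int is narrowed by the (byte) cast. parseUnsignedInt makes a difference only for 32-character strings whose first bit is 1, which parseInt rejects.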

What is the purpose to add padding to hex?

Hi, I read a post on how to implement salting and hashing of passwords, and I am stuck on the following code from it:
private static String toHex(byte[] array)
{
BigInteger bi = new BigInteger(1, array);
String hex = bi.toString(16);
int paddingLength = (array.length * 2) - hex.length();
if(paddingLength > 0)
return String.format("%0" + paddingLength + "d", 0) + hex;
else
return hex;
}
My question is: why do they calculate paddingLength and prepend that many zeros to hex when paddingLength is greater than zero?
BigInteger(1, byte[]) interprets the byte array as an unsigned (positive) magnitude; this means that there are 2^(8*N) possible values for an N-length array (since each byte contains 8 bits).
Meanwhile, a hex string of length M has 16^M possible values (since each character encodes one of 16 values).
The authors want a one-to-one mapping between the byte[] and the String: given a String, you should be able to exactly determine the byte[] it came from. To get that, we have to make sure the string can encode exactly as many values as the byte[]. Plugging in the numbers from above, we get:
(# values for an N-length byte[]) == (# values for an M-length String)
2^(8*N) == 16^M
Let's solve for M in terms of N. The first step is to re-write that right-hand side. If you remember your exponent power rules, a^(b*c) == (a^b)^c. Let's get the base of the exponent on the right to be a 2:
== (2^4)^M
== 2^(4*M)
So we have 2^(8*N) == 2^(4*M). If 2^k == 2^j, that means k == j. So, 8*N == 4*M. Dividing both sides by 4 yields M = 2N.
To tie it back together, remember that N was the length of the byte array, and M was the length of the hex string. We've just figured out that for there to be a one-to-one mapping, M = 2N -- in other words, the hex string should be twice as long as the byte array.
The padding ensures that.
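A quick way to see why this matters, as a sketch that reuses the question's toHex unchanged: two arrays that differ only by a leading zero byte produce the same unpadded string, while the padded output stays 2N characters long and unambiguous.

```java
import java.math.BigInteger;

public class WhyPad {

    // The toHex method from the question
    static String toHex(byte[] array) {
        BigInteger bi = new BigInteger(1, array);
        String hex = bi.toString(16);
        int paddingLength = (array.length * 2) - hex.length();
        if (paddingLength > 0)
            return String.format("%0" + paddingLength + "d", 0) + hex;
        else
            return hex;
    }

    public static void main(String[] args) {
        byte[] a = {0x00, 0x0F};
        byte[] b = {0x0F};
        // Without padding both arrays collapse to the same string
        System.out.println(new BigInteger(1, a).toString(16)); // f
        System.out.println(new BigInteger(1, b).toString(16)); // f
        // With padding each result is 2N characters long
        System.out.println(toHex(a)); // 000f
        System.out.println(toHex(b)); // 0f
    }
}
```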
Because they wanted all the bytes in the array to be represented in the hex string, even if they are leading zero bytes.
It is not the most obvious way to write a toHex method though.
I find something like this much clearer:
private static String toHex(byte[] array) {
StringBuilder s = new StringBuilder();
for (byte b : array) {
s.append(String.format("%02x", b));
}
return s.toString();
}
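If you want to avoid calling String.format once per byte, a common alternative is a nibble lookup table. This is a sketch, not the method from the post; it produces the same lowercase, two-characters-per-byte output:

```java
public class HexTable {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Converts each byte to two lowercase hex digits via a lookup table.
    static String toHex(byte[] array) {
        char[] out = new char[array.length * 2];
        for (int i = 0; i < array.length; i++) {
            int v = array[i] & 0xFF;        // treat the byte as unsigned 0..255
            out[2 * i] = HEX[v >>> 4];      // high nibble
            out[2 * i + 1] = HEX[v & 0x0F]; // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] {0x00, 0x0F, (byte) 0xAB})); // 000fab
    }
}
```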

How to check if bit is set in Hex-String?

shifters...
I have to do something that twists my mind.
I'm getting a hex value as a String (for example: "AFFE") and have to decide whether bit 5 of byte one is set.
public boolean isBitSet(String hexValue) {
//enter your code here
return "no idea".equals("no idea");
}
Any hints?
Regards,
Boskop
The simplest way is to convert String to int, and use bit arithmetic:
public boolean isBitSet(String hexValue, int bitNumber) {
int val = Integer.valueOf(hexValue, 16);
// (1 << bitNumber): an int value with only the target bit set to one
// &: bit-wise "AND", non-zero only if that bit is also set in val
return (val & (1 << bitNumber)) != 0;
}
Assuming that byte one is represented by the last two digits and that the string is always 4 characters long, the answer may be:
return (Character.digit(hexValue.charAt(2), 16) & 1) == 1;
As you can see, you don't need to convert the whole string to binary to evaluate the 5th bit (bit index 4, counting from 0): it is the least significant bit of the hex digit in the 3rd character. The character has to be converted to its digit value first, because the character codes of 'A' to 'F' do not have the same low bit as the values 10 to 15.
Now, if the size of the hex string is variable, then you will need something like:
return (Character.digit(hexValue.charAt(hexValue.length() - 2), 16) & 1) == 1;
But since the string can be shorter than 2 characters, this is safer:
return hexValue.length() >= 2 && (Character.digit(hexValue.charAt(hexValue.length() - 2), 16) & 1) == 1;
The correct answer may vary depending on what you consider to be byte 1 and bit 5.
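To make the ambiguity concrete, here is a sketch with explicit assumptions: byte index 0 means the first two hex characters of the string (the most significant byte), and bits are numbered 0 to 7 from the least significant bit of that byte. The method name isBitSetInByte is hypothetical; change the indexing if your data numbers bytes or bits differently.

```java
public class ByteBitCheck {

    // Hypothetical helper. Assumes byteIndex 0 = first two hex characters,
    // bit 0 = least significant bit of that byte.
    static boolean isBitSetInByte(String hexValue, int byteIndex, int bit) {
        String pair = hexValue.substring(2 * byteIndex, 2 * byteIndex + 2);
        int b = Integer.parseInt(pair, 16); // 0..255
        return ((b >> bit) & 1) == 1;
    }

    public static void main(String[] args) {
        // "AFFE": byte 0 is 0xAF (1010 1111), byte 1 is 0xFE (1111 1110)
        System.out.println(isBitSetInByte("AFFE", 0, 5)); // true
        System.out.println(isBitSetInByte("AFFE", 0, 4)); // false
        System.out.println(isBitSetInByte("AFFE", 1, 0)); // false
    }
}
```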
How about this?
int x = Integer.parseInt(hexValue, 16);
String binaryValue = Integer.toBinaryString(x);
Then you can examine the String to check the particular bits you care about.
Use BigInteger and its built-in testBit method:
public static boolean getBit(String hex, int bit) {
BigInteger bigInteger = new BigInteger(hex, 16);
return bigInteger.testBit(bit);
}
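A small sketch comparing the int-based and BigInteger-based checks. One assumption differs from the Integer.valueOf answer: it uses Integer.parseUnsignedInt (Java 8 or later), which also accepts 8-digit values such as "FFFFFFFF" that Integer.valueOf(hex, 16) rejects. The int version is still limited to 32 bits, while BigInteger.testBit has no such limit:

```java
import java.math.BigInteger;

public class HexBitTest {

    // int-based check: only for hex strings of up to 8 digits and bitNumber 0..31
    static boolean isBitSet(String hexValue, int bitNumber) {
        int val = Integer.parseUnsignedInt(hexValue, 16);
        return (val & (1 << bitNumber)) != 0;
    }

    // BigInteger-based check: hex strings of any length
    static boolean getBit(String hex, int bit) {
        return new BigInteger(hex, 16).testBit(bit);
    }

    public static void main(String[] args) {
        // AFFE = 1010 1111 1111 1110, printed from bit 15 down to bit 0
        for (int bit = 15; bit >= 0; bit--) {
            System.out.print(getBit("AFFE", bit) ? '1' : '0');
        }
        System.out.println(); // 1010111111111110
        System.out.println(isBitSet("AFFE", 5) == getBit("AFFE", 5)); // true
        // 17 hex digits: too wide for an int, fine for BigInteger
        System.out.println(getBit("10000000000000000", 64)); // true
    }
}
```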

Left padding integers (non-decimal format) with zeros in Java

The question has been answered for integers printed in decimal format, but I'm looking for an elegant way to do the same with integers in non-decimal format (like binary, octal, hex).
Creation of such Strings is easy:
String intAsString = Integer.toString(12345, 8);
would create a String with the octal representation of the integer value 12345. But how can I format it so that the String has, say, 10 digits, without calculating the number of zeros needed and assembling a new String by hand?
A typical use case would be creating binary numbers with a fixed number of bits (like 16, 32, ...) where one would like to have all digits including leading zeros.
For octal and hex, it's as easy as String.format:
assert String.format("%03x", 16).equals("010");
assert String.format("%03o", 8).equals("010");
With Guava you could just write:
String intAsString = Strings.padStart(Integer.toString(12345, 8), 10, '0');
How about this (standard Java):
private final static String ZEROES = "0000000000";
// ...
String s = Integer.toString(12345, 8);
String intAsString = s.length() <= 10 ? ZEROES.substring(s.length()) + s : s;
For example, printing a hex number with zero padding:
System.out.println(String.format("%08x", 1234));
Will give the following output, with the padding included:
000004d2
Replacing x with o, the octal conversion character, does the same for octal.
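A runnable sketch of the hex and octal forms side by side (X gives uppercase hex digits):

```java
public class PaddedFormat {
    public static void main(String[] args) {
        System.out.println(String.format("%08x", 1234)); // 000004d2
        System.out.println(String.format("%08o", 1234)); // 00002322
        System.out.println(String.format("%08X", 1234)); // 000004D2
    }
}
```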
Here's a more reusable alternative with the help of StringBuilder.
public static String padZero(int number, int radix, int length) {
String string = Integer.toString(number, radix);
if (string.length() >= length) {
return string; // already long enough; avoids a negative index in replace()
}
StringBuilder builder = new StringBuilder().append(String.format("%0" + length + "d", 0));
return builder.replace(length - string.length(), length, string).toString();
}
The Guava example as posted by ColinD is by the way pretty slick.
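Formatter has no binary conversion, so for radix 2 (or any other radix) one option is to format the digits with a space-padded %s and swap the spaces for zeros. This is a sketch using only the standard library; the name padWithZeros is hypothetical, and it assumes a non-negative number:

```java
public class PadRadix {

    // Left-pads Integer.toString(number, radix) with zeros to `length` characters.
    // Assumes number >= 0; a negative value would keep its '-' after the padding.
    static String padWithZeros(int number, int radix, int length) {
        String digits = Integer.toString(number, radix);
        if (digits.length() >= length) {
            return digits; // nothing to pad
        }
        // Formatter does not accept the '0' flag with %s, so pad with spaces first
        return String.format("%" + length + "s", digits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(padWithZeros(12345, 8, 10)); // 0000030071
        System.out.println(padWithZeros(5, 2, 16));     // 0000000000000101
    }
}
```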
