java data types to byte array - java

I have a Java class
public class MsgLayout{
int field1;
String field2;
long field3;
}
I have to write this object as a byte array in a Socket output stream. The three fields (instance variables) have a layout. i.e. field1 must occupy 1 byte, field2 must occupy 4 bytes and field3 must occupy 8 bytes.
ByteBuffer bbf = ByteBuffer.allocate(TOTAL_SIZE);
bbf.put(Integer.toString(this.getField1()).getBytes(), 0, FIELD1_SIZE);
bbf.position(FIELD2_OFFSET);
bbf.put(Long.toString(this.getField2()).getBytes(), 0, FIELD2_SIZE);
bbf.position(FIELD3_OFFSET);
bbf.put(Long.toString(this.getField3()).getBytes(), 0, FIELD3_SIZE);
byte[] msg = bbf.array();
Using the above code, I am trying to fit each field in the byte array according to its desired size, but I am getting an IndexOutOfBoundsException.
In short, the problem is how to fit the fields into the layout-defined sizes. For example: FIELD1_OFFSET = 0, FIELD1_SIZE = 1, FIELD2_OFFSET = 1, FIELD2_SIZE = 4, FIELD3_OFFSET = 5, FIELD3_SIZE = 8.
Now when I convert field1 into String, it does not fit into 1 byte when converted into byte[]. If I do not convert to String, and use putInt(int) it writes 4 bytes into the resulting byte array.

What your code is currently doing is encoding your numeric fields as strings and then outputting the bytes of those characters.
I would suggest using the DataOutputStream class to wrap your socket's output stream and write your binary data like so:
DataOutput output = new DataOutputStream(socketOutputStream);
int field1 = 1;
String field2 = "Hello";
long field3 = 5000000000L;
output.writeByte(field1);
output.writeBytes(field2.substring(0, 4)); // first 4 characters, one byte each
output.writeLong(field3);
There are a couple of assumptions in this code. First, I'm assuming that for field 2 you want 4 characters serialized as a single byte each. If you want a multibyte encoding such as UTF-8, then you need to do something a little different. Second, I'm assuming that field 2 will always have at least 4 characters.
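As a rough sketch of the approach above, the following writes the three fields into the 1 + 4 + 8 byte layout from the question. The class name FixedLayoutDemo, the encode helper, the space-padding of short strings and the choice of ISO-8859-1 are assumptions made for illustration, not part of the original code; a ByteArrayOutputStream stands in for the socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FixedLayoutDemo {
    static final int FIELD2_SIZE = 4;

    // Encodes the fields as 1 byte, 4 bytes and 8 bytes (13 bytes in total).
    static byte[] encode(int field1, String field2, long field3) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(field1); // writes only the low-order 8 bits

            // Assumption: truncate or space-pad field2 to exactly 4 single-byte characters.
            byte[] text = field2.getBytes(StandardCharsets.ISO_8859_1);
            byte[] fixed = new byte[FIELD2_SIZE];
            Arrays.fill(fixed, (byte) ' ');
            System.arraycopy(text, 0, fixed, 0, Math.min(text.length, FIELD2_SIZE));
            out.write(fixed);

            out.writeLong(field3); // 8 bytes, big-endian
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] msg = encode(1, "Hello", 5000000000L);
        System.out.println(msg.length); // 13
    }
}
```

With a real socket you would wrap socket.getOutputStream() instead of the in-memory buffer; the padding rule is a design choice you would have to agree on with the receiver.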

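field1 may only have one byte of data, but its string representation will be one or more characters (e.g. "0", "63", "127"). Each character in the String is in fact a char (a two-byte value) in memory, and getBytes() then encodes every digit as at least one byte. So I would expect one byte of data to inflate to one to three bytes (more with a minus sign or a multibyte charset) when it goes through a byte->String->byte[] conversion, which is why it no longer fits into a 1-byte field.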
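To make the size difference concrete, here is a small sketch comparing the decimal-text form of a value with its raw one-byte form. The class and method names are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class TextVersusBinary {
    // Number of bytes used when the value is written as decimal text.
    static int textLength(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII).length;
    }

    // The same value stored as one raw byte (only lossless for -128..127).
    static byte[] binary(int value) {
        return ByteBuffer.allocate(1).put((byte) value).array();
    }

    public static void main(String[] args) {
        System.out.println(textLength(127));    // 3
        System.out.println(binary(127).length); // 1
    }
}
```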

Related

Converting a byte array with range of -128 to 127 to String array

I have a function for hashing passwords, that returns a byte[] with entries using the full range of the byte datatype from -128 to 127. I have tried to convert the byte[] to a String using new String(byte_array, StandardCharsets.UTF_8);. This does return a String - however it can not properly encode negative numbers - hence it encodes them to a "�" character. When comparing two of those characters using: new String(new byte[]{-1}, StandardCharsets.UTF_8).equals(new String(new byte[]{-2}, StandardCharsets.UTF_8)) it turns out the String representation for all negative numbers is equal as the expression above returns true. While this doesn't fully ruin my hashing functionality as the hash of the same expression will still always yield the same result, this is obviously not what I want as it increases the chance of two different inputs yielding the same output drastically.
Is there some easy fix for this, or any alternative idea for how to convert the byte[] to a String? For context, I want to write the String to a file and later read it back to compare it to other hashes.
Edit: After a bit of trying around with the tips from the comments my solution is to convert the byte[] to a char[] and add 128 to every value. The char array can then easily be converted to a String or be written to a file directly (byteHash is the byte[]):
char[] charHash = new char[byteHash.length];
for(int i = 0; i < byteHash.length; i++){
charHash[i] = (char) (byteHash[i]+128);
}
return new String(charHash);
I do not really like the solution but it works.
The appropriate solution to this is to use an encoding like hexadecimal (https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HexFormat.html) or Base64 (https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html) to convert an arbitrary byte sequence to a string reversibly.
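A minimal sketch of the Base64 variant, assuming the hash is simply a byte[] that has to survive a round trip through text. The class and helper names are hypothetical; only java.util.Base64 (available since Java 8) is used.

```java
import java.util.Arrays;
import java.util.Base64;

class HashTextDemo {
    // Encodes arbitrary bytes, including negative values, as printable text.
    static String toText(byte[] hash) {
        return Base64.getEncoder().encodeToString(hash);
    }

    // Recovers exactly the bytes that were encoded.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] hash = {-1, -2, 0, 127, -128};
        String stored = toText(hash);
        System.out.println(stored);
        System.out.println(Arrays.equals(hash, fromText(stored))); // true
    }
}
```

Unlike the UTF-8 decoding in the question, different byte values always produce different text here, so no two hashes collapse into the same string.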

Convert byte array to string with equivalent number of bytes

Is it possible to convert a byte array to a string but where the length of the string is exactly the same length as the number of bytes in the array? If I use the following:
byte[] data; // Fill it with data
data.toString();
The length of the string is different than the length of the array. I believe that this is because Java and/or Android takes some kind of default encoding into account. The values in the array can be negative as well. Theoretically it should be possible to convert any byte to some character. I guess I need to figure out how to specify an encoding that generates a fixed single byte width for each character.
EDIT:
I tried the following but it didn't work:
byte[] textArray; // Fill this with some text.
String textString = new String(textArray, "ASCII");
textArray = textString.getBytes("ASCII"); // textArray ends up with different data.
You can use the String constructor String(byte[] data) to create a string from the byte array. If you want to specify the charset as well, you can use String(byte[] data, Charset charset) constructor.
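Try your code sample with ISO-8859-1 in place of ASCII. US-ASCII and ISO-8859-1 are both charsets that every Java platform is required to support, and both are single-byte encodings, but only ISO-8859-1 maps all 256 byte values to characters. US-ASCII covers only 0 to 127, with the caveat that anything outside the character set is silently replaced (with a replacement character when decoding, with '?' when encoding) rather than preserved.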
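Here is a small sketch of the ISO-8859-1 round trip, assuming the goal is one character per byte with no data loss. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // One char per byte: every byte value 0x00..0xFF maps to U+0000..U+00FF.
    static String toLatin1String(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    static byte[] fromLatin1String(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value, negative ones included.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String text = toLatin1String(all);
        System.out.println(text.length());                               // 256
        System.out.println(Arrays.equals(all, fromLatin1String(text)));  // true
    }
}
```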
This should work fine!
public static byte[] stringToByteArray(String pStringValue){
int length= pStringValue.length();
byte[] bytes = new byte[length];
for(int index=0; index<length; index++){
char ch= pStringValue.charAt(index);
bytes[index]= (byte)ch;
}
return bytes;
}
You can also use stringValue.getBytes(), which returns a byte array encoded with the platform's default charset; since JDK 1.6 there is also an overload that takes a Charset.
If a null string may be passed in, you need to handle that, either by letting the NullPointerException propagate or by checking for null inside the method itself.

How to convert from ByteBuffer to Integer and String?

I converted an int to a byte array using ByteBuffer's putInt() method. How do I do the opposite? So convert those bytes to an int?
Furthermore, I converted a string to an array of bytes using the String's getBytes() method. How do I convert it the other way round? The bytesArray.getString() does not return a readable string. I get things like BF#DDAD
You can use the ByteBuffer.getInt method, specifying the offset at which the integer occurs, to convert a series of bytes into an integer. Alternatively, if you happen to know the byte ordering, you can use bitwise operators to explicitly reconstruct the 32-bit integer from its 8-bit octets.
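To convert an array of bytes into a String, you can use the String(byte[]) constructor to construct a new String out of the byte array. It decodes with the platform's default charset, just as getBytes() without arguments encodes with it, so if you passed a charset when encoding, pass the same one when decoding. For example:
byte[] bytes = /* ... get array of bytes ... */;
String fromBytes = new String(bytes);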
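Both directions can be sketched as follows. This assumes the int was written with ByteBuffer's default big-endian order and that the text was encoded as UTF-8; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Reads a big-endian int from the first four bytes via ByteBuffer.
    static int intFromBuffer(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // The same reconstruction with bitwise operators (big-endian order assumed).
    static int intFromBits(byte[] b) {
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    // Decodes with the same charset that was used for getBytes().
    static String text(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] intBytes = ByteBuffer.allocate(4).putInt(-123456).array();
        System.out.println(intFromBuffer(intBytes)); // -123456
        System.out.println(intFromBits(intBytes));   // -123456

        byte[] textBytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(text(textBytes));         // hello
    }
}
```

Calling toString() on the array itself only prints the array's type and identity hash, which is probably where output like the one in the question comes from.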

Bytes of a string in Java

In Java, if I have a String x, how can I calculate the number of bytes in that string?
A string is a list of characters (i.e. code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.
That said, you can turn the string into a byte array and then look at its size as follows:
// The input string for this test
final String string = "Hello World";
// Check length, in characters
System.out.println(string.length()); // prints "11"
// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"
final byte[] utf16Bytes= string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"
final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"
final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"
final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"
So you see, even a simple "ASCII" string can have different number of bytes in its representation, depending which encoding is used. Use whichever character set you're interested in for your case, as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:
final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms
// Check length, in characters
System.out.println(interesting.length()); // prints "4"
// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"
final byte[] utf16Bytes= interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"
final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"
final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")
final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")
(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)
If you're running with 64-bit references:
sizeof(string) =
8 + // object header used by the VM
8 + // 64-bit reference to char array (value)
8 + string.length() * 2 + // character array itself (object header + 16-bit chars)
4 + // offset integer
4 + // count integer
4 // cached hash code
In other words:
sizeof(string) = 36 + string.length() * 2
On a 32-bit VM or a 64-bit VM with compressed OOPs (-XX:+UseCompressedOops), the references are 4 bytes. So the total would be:
sizeof(string) = 32 + string.length() * 2
This does not take into account the references to the string object.
The pedantic answer (though not necessarily the most useful one, depending on what you want to do with the result) is:
string.length() * 2
Java strings are conceptually sequences of UTF-16 code units, 2 bytes each, and String.length() measures the length in UTF-16 code units, so this is equivalent to:
final byte[] utf16Bytes= string.getBytes("UTF-16BE");
System.out.println(utf16Bytes.length);
And this will tell you the size of the string's UTF-16 representation, in bytes (newer JVMs may store some strings more compactly internally).
Note: "UTF-16" will give a different result from "UTF-16BE" as the former encoding will insert a BOM, adding 2 bytes to the length of the array.
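The BOM difference is easy to check with a short sketch. The class and helper names below are made up for illustration; the charsets come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16LengthDemo {
    // Number of bytes the string occupies in the given encoding.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "Hello World"; // 11 UTF-16 code units
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE)); // 22
        System.out.println(encodedLength(s, StandardCharsets.UTF_16));   // 24 (2-byte BOM added)
    }
}
```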
According to How to convert Strings to and from UTF8 byte arrays in Java:
String s = "some text here";
byte[] b = s.getBytes("UTF-8");
System.out.println(b.length);
A String instance allocates a certain amount of bytes in memory. Maybe you're looking at something like sizeof("Hello World") which would return the number of bytes allocated by the datastructure itself?
In Java, there's usually no need for a sizeof function, because we never allocate memory by hand to store a data structure. We can have a look at the String.java file for a rough estimate, and we see some 'int', some references and a char[]. The Java language specification defines that a char ranges from 0 to 65535, so two bytes are sufficient to keep a single char in memory. But a JVM does not have to store one char in 2 bytes; it only has to guarantee that the implementation of char can hold values of the defined range.
So sizeof really does not make any sense in Java. But, assuming that we have a large String and one char allocates two bytes, then the memory footprint of a String object is at least 2 * str.length() in bytes.
There's a method called getBytes(). Use it wisely.
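Try this (Bytes is not a JDK class; this presumably refers to a utility such as HBase's org.apache.hadoop.hbase.util.Bytes):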
Bytes.toBytes(x).length
Assuming you declared and initialized x before
To avoid try catch, use:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
System.out.println(b.length);
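Try this using Apache Commons Lang (note that this reports the size of the Java-serialized form, not of the encoded characters):
String src = "Hello"; // This will work with any serializable object
System.out.println(
"Object Size:" + SerializationUtils.serialize((Serializable) src).length);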

Comparison of byte arrays

I try to compare 2 byte arrays.
Byte array 1 is an array with the last 3 bytes of a sha1 hash:
private static byte[] sha1SsidGetBytes(byte[] sha1)
{
return new byte[] {sha1[17], sha1[18], sha1[19]};
}
Byte array 2 is an array that I fill with 3 bytes coming from an hexadecimal string:
private static byte[] ssidGetBytes(String ssid)
{
BigInteger ssidBigInt = new BigInteger(ssid, 16);
return ssidBigInt.toByteArray();
}
How is it possible that this comparison:
if (Arrays.equals(ssidBytes, sha1SsidGetBytes(snSha1)))
{
}
works most of the time, but sometimes does not? Is it byte order?
E.g. for "6451E6" (hex string) it works fine, for "ABED74" it does not...
The problem is pretty obvious if you try this:
BigInteger b1 = new BigInteger("6451E6", 16);
BigInteger b2 = new BigInteger("ABED74", 16);
System.out.println(b1.toByteArray().length);
System.out.println(b2.toByteArray().length);
Specifically, ABED74 creates a BigInteger whose byte array is 4 bytes long--so of course it's not going to be equal to any three byte array.
The straightforward fix is to change the return statement in ssidGetBytes from
return ssidBigInt.toByteArray();
to
byte[] ba = ssidBigInt.toByteArray();
return new byte[] { ba[ba.length - 3], ba[ba.length - 2], ba[ba.length - 1] };
Your approach of parsing a hex string via BigInteger is flawed, basically. For example, new BigInteger("ABED74", 16).toByteArray() returns an array of 4 bytes, not three, because a leading zero byte is added to keep the value positive. While you could hack around this, you're fundamentally not trying to do anything involving BigInteger values... you're just trying to parse hex.
I suggest you use the Apache Codec library to do the parsing:
byte[] array = (byte[]) new Hex().decode(text);
(The API for Apache Codec leaves something to be desired, but it does work.)
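If you would rather not add a dependency, a plain-JDK parser is short. The following is only a sketch: HexParseDemo and parseHex are hypothetical names, and it assumes the input is an even-length string of hex digits with no prefix or separators.

```java
import java.util.Arrays;

class HexParseDemo {
    // Converts each pair of hex digits into exactly one byte, so "ABED74" yields 3 bytes.
    static byte[] parseHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((high << 4) | low);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parseHex("6451E6").length); // 3
        System.out.println(parseHex("ABED74").length); // 3
        System.out.println(Arrays.toString(parseHex("ABED74"))); // [-85, -19, 116]
    }
}
```

Because no sign handling is involved, the result always has half as many bytes as the string has digits, which makes it directly comparable with the three bytes taken from the SHA-1 hash.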
From the javadoc's (emphasis mine):
http://download.oracle.com/javase/1.5.0/docs/api/java/math/BigInteger.html#toByteArray%28%29
Returns a byte array containing the two's-complement representation of this BigInteger. The byte array will be in big-endian byte-order: the most significant byte is in the zeroth element. The array will contain the minimum number of bytes required to represent this BigInteger, including at least one sign bit, which is (ceil((this.bitLength() + 1)/8)). (This representation is compatible with the (byte[]) constructor.)
There are a lot of computations going on inside the BigInteger(String, int radix) constructor that you are using, which does not guarantee the constructed BigInteger will produce a byte array (via its toByteArray() method) comparable to the result of a String's getBytes() encoding.
The output of toByteArray() is intended to be used (mostly) as input to the (byte[]) constructor of BigInteger. It makes no guarantee for uses other than those.
Look at it like this: the output of toByteArray() is the byte representation of the BigInteger object and everything in it, including internal attributes like magnitude. Those attributes do not exist in the input String, but are computed during construction of the BigInteger object.
That will be incompatible to the byte representation of the input String which only carries the initial numeric value with which to create a BigInteger.
