This is my very first post on Stackoverflow so please go easy on me :)
I am looking for a more processor-efficient way of performing byte array conversions to primitive types in Java.
The byte array contains a data stream which is made up of many different primitive types. For the purpose of this question, let's just assume that it contains INTs and, for brevity, ignore endianness.
In C, I can extract an INT very efficiently via pointers: IE
int *value;
unsigned char data[MAX_LEN_DATA];
value = (int *)&data[10];
To perform the same in Java, is it true that I need to compute the value? I.e., something like:
int value;
byte[] data;
value = ((data[10] & 0xFF) << 24) | ((data[11] & 0xFF) << 16) | ((data[12] & 0xFF) << 8) | (data[13] & 0xFF);
Is there a more processor-efficient method such as the C example or does Java utilise CPU barrel shifting (Intel & AMD in my case) in which case, my question becomes superfluous ... but maybe useful for others :)
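For what it's worth, here is a minimal sketch of the two usual ways to do this in Java: a shift-and-mask expression and an absolute ByteBuffer.getInt(index) read. The class and method names are made up for illustration, and big-endian order is assumed. Each byte is masked with 0xFF before shifting, because a Java byte is signed and an unmasked value of 0x80 or above would sign-extend into the higher bits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IntFromBytes {
    // Manual big-endian decode; masking each byte prevents sign extension
    // from leaking into the higher bits of the result.
    static int readIntManual(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 24)
             | ((data[offset + 1] & 0xFF) << 16)
             | ((data[offset + 2] & 0xFF) << 8)
             |  (data[offset + 3] & 0xFF);
    }

    // Absolute read through a ByteBuffer view; wrap() does not copy the array.
    static int readIntBuffer(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN).getInt(offset);
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        data[12] = 0x01;
        data[13] = (byte) 0x80;                          // bytes 10..13 are 00 00 01 80
        System.out.println(readIntManual(data, 10));     // 384
        System.out.println(readIntBuffer(data, 10));     // 384

        // Same expression without masks: the 0x80 byte sign-extends to 0xFFFFFF80.
        int unmasked = (data[10] << 24) | (data[11] << 16) | (data[12] << 8) | data[13];
        System.out.println(unmasked);                    // -128
    }
}
```

Whether either form is faster than the other depends on the JIT; measuring on the target JVM is the only reliable way to tell.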
Related
I'm reading a file format that specifies some types are unsigned integers and shorts. When I read the values, I get them as a byte array. The best route to turning them into shorts/ints/longs I've seen is something like this:
ByteBuffer wrapped = ByteBuffer.wrap(byteArray);
int x = wrapped.getInt();
That looks like it could easily overflow for unsigned ints. Is there a better way to handle this scenario?
Update: I should mention that I'm using Groovy, so I absolutely don't care if I have to use a BigInteger or something like that. I just want the maximum safety on keeping the value intact.
A 32bit value, signed or unsigned, can always be stored losslessly in an int*. This means that you never have to worry about putting unsigned values in signed types from a data safety point of view.
The same is true for 8bit values in bytes, 16bit values in shorts and 64bit values in longs.
Once you've read an unsigned value into the corresponding signed type, you can promote it to a larger signed type to work with the intended value more easily:
Integer.toUnsignedLong(int)
Short.toUnsignedInt(short)
Byte.toUnsignedInt(byte)
Since there's no primitive type larger than long, you can either go via BigInteger, or use the convenience methods on Long to do unsigned operations:
new BigInteger(Long.toUnsignedString(long))
Long.divideUnsigned(long,long) and friends
* This is thanks to the JVM requiring integer types to be two's complement.
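A short sketch of those calls in action, assuming Java 8 or later (where the unsigned helper methods were added). The class name and the toUnsignedBigInteger helper are illustrative, not part of the JDK.

```java
import java.math.BigInteger;

final class UnsignedPromotion {
    // Widen a 64-bit pattern to a non-negative BigInteger by going through
    // its unsigned decimal string.
    static BigInteger toUnsignedBigInteger(long bits) {
        return new BigInteger(Long.toUnsignedString(bits));
    }

    public static void main(String[] args) {
        int i = 0xFFFFFFFF;        // -1 when read as signed
        short s = (short) 0xFFFF;  // -1 when read as signed
        byte b = (byte) 0xFF;      // -1 when read as signed
        long l = -1L;

        System.out.println(Integer.toUnsignedLong(i));  // 4294967295
        System.out.println(Short.toUnsignedInt(s));     // 65535
        System.out.println(Byte.toUnsignedInt(b));      // 255
        System.out.println(toUnsignedBigInteger(l));    // 18446744073709551615
        System.out.println(Long.divideUnsigned(l, 2));  // 9223372036854775807
    }
}
```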
To hold an unsigned int/short/byte, you need to use the next "bigger" type, i.e. long/int/short. If you already hold the value in the signed type that can overflow, the conversion can be done by doing the following:
int unsignedVal = byteVal & 0xff
If you just cast them, the sign bit is extended into the wider type and you will still end up with the negative value.
If you have to handle unsigned longs you need to "switch" to java.math.BigInteger.
Unsigned primitives are a pain in Java.
There's no clean way of handling them, except using larger types with more bits, and taking care to avoid automatic sign extension when casting.
In your case, you can do something like this:
ByteBuffer wrapped = ByteBuffer.wrap(byteArray);
int signedInt = wrapped.getInt();
long unsigned = signedInt & 0xffffffffL;
I usually write the required conversion(s) in a utility class someplace, since they're easy to get wrong. If you copy & paste that one-liner conversion everywhere, eventually one will be wrong.
Note that if you need unsigned longs, the only larger type is BigInteger.
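Such a utility class might look roughly like the sketch below. The class name Unsigned and the method names u8/u16/u32 are hypothetical; the point is only that each mask lives in exactly one place.

```java
import java.nio.ByteBuffer;

final class Unsigned {
    private Unsigned() {}

    // Each helper widens to the next larger type and masks off the sign extension.
    static int u8(byte b)   { return b & 0xFF; }
    static int u16(short s) { return s & 0xFFFF; }
    static long u32(int i)  { return i & 0xFFFFFFFFL; }  // the L suffix keeps the mask a long

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        int signed = ByteBuffer.wrap(raw).getInt();
        System.out.println(signed);         // -2
        System.out.println((long) signed);  // -2 (a plain cast sign-extends)
        System.out.println(u32(signed));    // 4294967294
    }
}
```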
If you need anything more than simple conversions, I suggest using Guava since it has some nice classes for dealing with unsigned types. See documentation here.
I am receiving some numerical data from a Java client via a socket connection on a C++ server. When I receive 4-byte int data, all I need to do is call the ntohl() function or reverse the byte order to convert it to a C++ int. However, I'm having trouble converting the long data type from Java. No matter what I tried, I could not recover the correct value. I used LONG64, ULONG64 and int64_t as well, and none of them worked.
For example, when I send long s = 1 from Java, on C++ side I did:
int64_t size;
recv(client, (char *)&size, sizeof(int64_t), 0);
if I do
size = ntohl(size)
Then size will become 0 whatever the original long value is in Java !
If I don't do ntohl() conversion, then size = 72057594037927936 for s = 1
I have hardly found any useful information on this topic and I would appreciate any suggestion.
The value 72057594037927936 is 0x0100000000000000 in Hex. As you may have guessed, that's simply backwards byte ordering, the 1 is in front instead of back.
ntohl() is 32-bit, so it is throwing out those top four bytes (the first 8 hex digits), giving you zero. You could possibly use htonll instead, but that isn't quite right. The best thing is to reverse the order of the bytes yourself.
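#include <algorithm>  // std::reverse
#include <cstdint>    // int64_t

int64_t size;
recv(client, (char *)&size, sizeof(size), 0);
// Reverse the 8 bytes in place (the right fix when the host is little-endian).
char *start = (char *)&size, *end = start + sizeof(size);
std::reverse(start, end);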
There are a ton of ways of reversing the bytes, and a ton of ways of dealing with little/big endian problems in general.
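Since the data comes from a Java client, the numbers above can be reproduced on the Java side. This is a small sketch, assuming the client writes the long in big-endian order (as DataOutputStream.writeLong does) and the server is a little-endian machine; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LongByteOrder {
    // Big-endian (network order) encoding of a long.
    static byte[] toNetworkOrder(long value) {
        return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.BIG_ENDIAN).putLong(value).array();
    }

    // What a little-endian host sees if it copies those 8 bytes straight
    // into a 64-bit integer without swapping.
    static long readAsLittleEndian(byte[] wire) {
        return ByteBuffer.wrap(wire).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        byte[] wire = toNetworkOrder(1L);
        long misread = readAsLittleEndian(wire);
        System.out.println(misread);                     // 72057594037927936
        System.out.println(Long.toHexString(misread));   // 100000000000000
        System.out.println((int) misread);               // 0 (the low 32 bits are all zero)
        System.out.println(Long.reverseBytes(misread));  // 1
    }
}
```

The third line is why a 32-bit swap returns 0: only the low half of the value survives the truncation, and that half is empty.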
Is it possible to index a Java array based on a byte?
i.e. something like
array[byte b] = x;
I have a very performance-critical application which reads b (in the code above) from a file, and I don't want the overhead of converting this to an int. What is the best way to achieve this? Is there a performance-decrease as a result of using this method of indexing rather than an int?
With many thanks,
Froskoy.
There's no overhead for "converting this to an int." At the Java bytecode level, all bytes are already ints.
In any event, doing array indexing will automatically upcast to an int anyway. None of these things will improve performance, and many will decrease performance. Just leave your code using an int.
The JVM specification, section 2.11.1:
Note that most instructions in Table 2.2 do not have forms for the integral types byte, char, and short. None have forms for the boolean type. Compilers encode loads of literal values of types byte and short using Java virtual machine instructions that sign-extend those values to values of type int at compile-time or runtime. Loads of literal values of types boolean and char are encoded using instructions that zero-extend the literal to a value of type int at compile-time or runtime. Likewise, loads from arrays of values of type boolean, byte, short, and char are encoded using Java virtual machine instructions that sign-extend or zero-extend the values to values of type int. Thus, most operations on values of actual types boolean, byte, char, and short are correctly performed by instructions operating on values of computational type int.
Since byte is a signed type in Java, you have to mask b down to its low 8 bits anyway if you expect to read values greater than 0x7F from the file:
byte b;
byte[] a = new byte[256];
a[b & 0xFF] = x;
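As a concrete illustration of the masked index, here is a minimal byte-frequency counter. The class and method names are made up for the example.

```java
final class ByteHistogram {
    // Count occurrences of each byte value; the mask maps -128..127 onto 0..255,
    // so the index is never negative.
    static int[] histogram(byte[] input) {
        int[] counts = new int[256];
        for (byte b : input) {
            counts[b & 0xFF]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] input = {0x01, (byte) 0xFF, (byte) 0xFF, 0x7F};
        int[] counts = histogram(input);
        System.out.println(counts[0xFF]);  // 2
        System.out.println(counts[0x01]);  // 1
    }
}
```

Without the mask, the 0xFF bytes would index element -1 and throw ArrayIndexOutOfBoundsException.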
No; array indices are non-negative integers (JLS 10.4), but byte indices will be promoted.
No, there is no performance decrease, because as soon as you read the byte it ends up in a CPU register. Those registers always work with machine words, which means that the byte is always "converted" to an int (or a long, if you are on a 64-bit machine).
So, simply read your byte like this:
int b = (in.readByte() & 0xFF);
If your application is that performance critical, you should be optimizing elsewhere.
I have a program that I made in Python to find specific tags in TIFF IFDs and return the values. It was just a proof-of-concept thing in Python, and now I need to move the functionality to Java. I think I can just use the String(byteArray[]) constructor for the ASCII data types, but I still need to get unsigned short (2 byte) and unsigned long (4 byte) values. I don't need to write them back to the file or modify them; all I need to do is get a Java Integer or Long object from them. This is easy in Python with the struct and mmap modules. Does anyone know of a similar way in Java? I looked at the DataInput class, but the readUnsignedLong method reads 8 bytes.
DataInputStream allows you to read shorts and ints. You should mask them with the appropriate bit mask (0xFFFF for a short, 0xFFFFFFFFL for a 32-bit value; note the L suffix, which makes the mask a long) in order to account for the difference between signed/unsigned types.
e.g.
// omits error handling
FileInputStream fis = ...;
DataInputStream stream = new DataInputStream(fis);
int short_value = 0xFFFF & stream.readShort();
long long_value = 0xFFFFFFFFL & stream.readInt();
If you're sure that the data won't be towards the high end of the 2 byte field, or 4 byte field, you can forego the bit masking. Otherwise, you need to use a wider data type to account for the fact that unsigned values hold a larger range of values than their signed counterparts.
I looked at the DataInput class, but the readUnsignedLong method reads 8 bytes.
Java does not have unsigned types. It takes 4 bytes to make an int, and 8 bytes to make a long, unsigned or otherwise.
If you don't want to use DataInput, you can read the bytes into byte arrays (byte[]) and use a ByteBuffer to turn those byte values into ints and longs with left padding. See ByteBuffer#getInt() and ByteBuffer#getLong().
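A rough sketch of the ByteBuffer route, with the sign-extension masks added so the 2-byte and 4-byte fields come out as non-negative values. TIFF files declare their byte order in the header, so the order is set explicitly on the buffer; the sample bytes, class name and method names here are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnsignedFields {
    // Read a 2-byte unsigned value at an absolute offset, widened to int.
    static int readU16(ByteBuffer buf, int offset) {
        return buf.getShort(offset) & 0xFFFF;
    }

    // Read a 4-byte unsigned value at an absolute offset, widened to long.
    static long readU32(ByteBuffer buf, int offset) {
        return buf.getInt(offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Made-up sample: a 16-bit 0xFFFE followed by a 32-bit 0xFFFFFFFD, little-endian.
        byte[] raw = {(byte) 0xFE, (byte) 0xFF, (byte) 0xFD, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(readU16(buf, 0));  // 65534
        System.out.println(readU32(buf, 2));  // 4294967293
    }
}
```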
DataInput would be the preferred method. You can use readUnsignedShort for the two byte values. For the 4 byte values you'll have to use this workaround...
long l = dis.readInt() & 0xffffffffL;
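Put together, that could look like the sketch below. It reads from an in-memory array so it runs on its own; the class and method names are hypothetical, and DataInputStream always reads big-endian, so this only fits data stored in that order.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DataInputUnsigned {
    // Reads one unsigned 16-bit field followed by one unsigned 32-bit field.
    static long[] readFields(byte[] raw) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(raw))) {
            int u16 = dis.readUnsignedShort();       // already a non-negative int
            long u32 = dis.readInt() & 0xFFFFFFFFL;  // the L suffix makes the mask a long
            return new long[] {u16, u32};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFD};
        long[] fields = readFields(raw);
        System.out.println(fields[0]);  // 65534
        System.out.println(fields[1]);  // 4294967293

        // Pitfall: without the L suffix the mask is the int -1 and changes nothing.
        long wrong = 0xFFFFFFFF & -3;
        System.out.println(wrong);      // -3
    }
}
```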
You could use Javolution's Struct class which provides structure to regions of data. You set up a wrapper and then use the wrapper to access the data. Simples. Java really needs this super-useful class in its default classpath TBQH.
The Preon library is good for creating structs in Java. I have tried Javolution's Struct, but it was not helpful in my case. It is open source and a very good library.
What will be equivalent of this in Java?
for (i = (sizeof(num)*8-1); i; i--)
num is given number, not array. I want to reverse bits in integer.
Java does not have sizeof. Arrays have the length property, and many collections have size() and similar things like that, but a linguistic sizeof for any arbitrary object is both not supported and not needed.
Related questions
Is there any sizeof-like method in Java?
In Java, what is the best way to determine the size of an object?
Getting bits of an integer in LSB-MSB order
To get the bits of an integer from its least significant bit to its most significant bit, you can do something like this:
int num = 0xABCD1234;
System.out.println(Integer.toBinaryString(num));
for (int i = 0; i < Integer.SIZE; i++) {
    System.out.print((num >> i) & 1);
}
This prints:
10101011110011010001001000110100 // MSB-LSB order from toBinaryString
00101100010010001011001111010101 // LSB-MSB order from the loop
So in this specific case, the sizeof * 8 translates to Integer.SIZE, "the number of bits used to represent an int value in two's complement binary form". In Java, this is fixed at 32.
JLS 4.2.1 Integral types and values
For int, from -2147483648 to 2147483647, inclusive
This loop is likely iterating over an array in reverse order. In this case, it is an array of 'num' objects, and there are 8 elements in the array (the '-1' is necessary because an array of 8 elements has valid indices 0...7).
To do that in Java, the equivalent would be:
for(int i = array.length-1; i >= 0; --i)
in C/C++, the sizeof operator tells you how many bytes a variable or type takes on the current target platform. That is to say, it depends on the target platform, and therefore there is a keyword for discovering it.
Java targets a virtual machine, and the size of types is constant.
If num is an int, it is 4 bytes (32-bits). If it is long, it is 8 bytes (64 bits).
Furthermore, you cannot treat a variable as an array of bytes. You have to use bitwise operators (shifts, and, or etc) to manipulate the bits in a primitive like an int or long.
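For example, where C code might cast an int's address to a byte pointer, a Java version has to pull the bytes out with shifts. A minimal sketch with made-up names:

```java
final class IntBytes {
    // Split an int into four bytes, most significant first, using unsigned shifts.
    static byte[] toBytes(int value) {
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (byte b : toBytes(0xABCD1234)) {
            sb.append(String.format("%02X ", b & 0xFF));  // mask so each byte prints as 00..FF
        }
        System.out.println(sb.toString().trim());  // AB CD 12 34
    }
}
```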
There isn't a direct equivalent. The sizeof returns the size of a type or the type of the expression in bytes, and this information is not available in Java.
It's not required as the sizes in bytes of the built-in types are fixed, lengths of arrays are obtained using the .length pseudo-field, and memory for objects is allocated using new, so the object size is not required.
If you tell us what the type of num is, then it can be translated.
In addition to polygenelubricants' answer, there's another way to reverse the bits of an integer in Java:
int reversed = Integer.reverse(input);
Easy!
It's worth checking out the source code for Integer.reverse; it's rather nifty (and extremely scary).
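For comparison, a plain loop-based reversal, which is probably closer to what the original C loop was doing. This is only a sketch (the class and method names are invented), checked against Integer.reverse:

```java
final class BitReverse {
    // Move the bits of num, lowest first, into result from the low end,
    // so the first bit taken ends up as the highest bit of the result.
    static int reverseBits(int num) {
        int result = 0;
        for (int i = 0; i < Integer.SIZE; i++) {
            result = (result << 1) | (num & 1);
            num >>>= 1;  // unsigned shift, so no sign bits are dragged in
        }
        return result;
    }

    public static void main(String[] args) {
        int num = 0xABCD1234;
        System.out.println(Integer.toHexString(reverseBits(num)));     // 2c48b3d5
        System.out.println(reverseBits(num) == Integer.reverse(num));  // true
    }
}
```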