How to read unsigned values from files


I am trying to read binary data (Doom WAD files), which contain a lot of unsigned short and unsigned byte values.
At the moment I read the file into a byte[], wrap a ByteBuffer with little-endian order around it and access the values by bbuf.getShort() etc. respectively.
Converting those to, e.g., 2D coordinates is not a problem, because in the end it won't matter whether they range from -128 to 127 or from 0 to 255. More often, though, the short values are used as array indices and the short/byte values as flags, so I need a fast way to treat them as unsigned types.
I know, Java doesn't have unsigned types "for sake of simplicity...".
Can you make any suggestions?

To store an unsigned int you need a long. You then need to keep only the low 32 bits, that is, clear the upper 32 bits that sign extension fills in. You can use the following trick to do it.
final long UNSIGNED_INT_BITS = 0xffffffffL;
int a = -3;
long b = UNSIGNED_INT_BITS & a;
System.out.println(a);
System.out.println(b);
System.out.println(Long.toHexString(UNSIGNED_INT_BITS));
Output:
-3
4294967293
ffffffff

If all else fails, you could always store them internally as ints and make sure you do proper conversion when reading/writing.
(Read as byte/short, cast to int, add 2^bits if negative. Just truncate to 8/16 bits when writing.)
Hardly the most elegant solution, I admit.
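
For the ByteBuffer setup described in the question, the "read as byte/short, widen to int" idea can be sketched roughly as follows. The class name, the helper names and the sample bytes are made up for illustration; they are not taken from a real WAD file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnsignedReads {
    // Reads one byte at the buffer's position and returns it as 0..255.
    static int readUnsignedByte(ByteBuffer buf) {
        return buf.get() & 0xFF;
    }

    // Reads two bytes at the buffer's position and returns them as 0..65535.
    static int readUnsignedShort(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    // Reads four bytes and returns them as 0..4294967295, held in a long.
    static long readUnsignedInt(ByteBuffer buf) {
        return buf.getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one byte, one short, one int.
        byte[] data = {
            (byte) 0xFF,
            (byte) 0xFE, (byte) 0xFF,
            (byte) 0xFD, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF
        };
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(readUnsignedByte(buf));  // 255
        System.out.println(readUnsignedShort(buf)); // 65534 (bytes FE FF, little-endian)
        System.out.println(readUnsignedInt(buf));   // 4294967293 (bytes FD FF FF FF)
    }
}
```

The results are plain ints (or a long for the 32-bit case), so they can be used directly as array indices or tested against flag masks.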

If you need to interpret a 0xFF byte as 255, do the following:
int n = b & 0xFF;

Related

java - Why is 0x000F stored as unsigned?

I was reading through examples trying to understand how to convert signed bytes to unsigned integer counter parts.
The most popular method that I have come across is:
a & 0xFF
Where a is the signed byte.
My question is why is 0xFF stored as unsigned? Are all hex values stored as unsigned? If so why?
And how does "and"-ing turn off the sign bit in the sign integer?
It would be great if someone could break down the process step by step.

You probably saw this in code that converted a byte to an integer, where they wanted to treat the byte as an unsigned value in the range 0-255. It does not apply to integers in general. If you want to force an int to be non-negative, you can do:
int unsignedA = a & 0x7FFFFFFF;
This will ensure that unsignedA is positive - but it does that by chopping off the high bit, so for example if a was -1, then unsignedA is Integer.MAX_VALUE.
There is no way to turn a 32-bit signed Java integer into a 32-bit unsigned Java integer because there is no datatype in Java for a 32-bit unsigned integer. The only unsigned integral datatype in Java is 16 bits long: char.
If you want to store a 32-bit unsigned integral value in Java, you need to store it in a long:
long unsignedA = a & 0xFFFFFFFFL;

To elaborate on Erwin's answer about converting a byte to an integer: In Java, byte is a signed integer type. That means it has values in the range -128 to 127. If you say:
byte a;
int b;
a = -64;
b = a;
The language will preserve the value; that is, it will set b to -64.
But if you really want to convert your byte to a value from 0 to 255 (which I guess you call the "unsigned counterpart" of the byte value), you can use a & 0xFF. Here's what happens:
Java does not do arithmetic directly on byte or short types. So when it sees a & 0xFF, it converts both sides to an int. The hex value of a, which is a byte, looks like
a = C0
When it's converted to a 32-bit integer, the value (-64) has to be preserved, so that means the 32-bit integer has to have 1 bits in the upper 24 bits. Thus:
a = C0
(int)a = FFFFFFC0
But then you "and" it with 0xFF:
a = C0
(int)a = FFFFFFC0
& 000000FF
--------
a & FF = 000000C0
And the result is an integer in the range 0 to 255.
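
The same steps can be printed out to check them. This is only a small sketch; the class and method names are invented for the example.

```java
final class SignExtensionDemo {
    // Widening a byte to int copies the sign bit into the upper 24 bits.
    static String widenedHex(byte a) {
        return Integer.toHexString(a);
    }

    // Masking with 0xFF clears those upper 24 bits again.
    static String maskedHex(byte a) {
        return Integer.toHexString(a & 0xFF);
    }

    public static void main(String[] args) {
        byte a = -64; // bit pattern C0
        System.out.println(widenedHex(a)); // ffffffc0
        System.out.println(maskedHex(a));  // c0
        System.out.println(a & 0xFF);      // 192
    }
}
```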

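In Java, decimal literals (1, 42, etc.) are positive unless you explicitly write a minus sign in front of them; it's how we intuitively write numbers. Small hex literals such as 0x2A or 0xFF are positive too: 0xFF is simply the int 255. (A hex int literal with the top bit set, such as 0xFFFFFFFF, does denote a negative int.)
This previous question answers your question about converting to unsigned: Understanding Java unsigned numbers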

Java how to parse uint8 in java?

I have a uint8 (unsigned 8 bit integer) coming in from a UDP packet. Java only uses signed primitives. How do I parse this data structure correctly with java?

Simply read it as a byte and then convert it to an int.
byte in = udppacket.getByte(0); // whatever goes here
int uint8 = in & 0xFF;
The bitmask is needed because otherwise values with the most significant bit set to 1 are converted to a negative int. Example:
This: 10000000
Will result in: 11111111111111111111111110000000
So when you afterwards apply the bitmask 0xFF to it, the leading 1's are getting cancelled out. For your information: 0xFF == 0b11111111
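
As a rough sketch, assuming the packet arrives as a java.net.DatagramPacket, the byte can be pulled out of the payload array and masked like this. Uint8Parser and uint8At are hypothetical names used only for this example.

```java
import java.net.DatagramPacket;

final class Uint8Parser {
    // Returns the payload byte at the given index as a value in 0..255.
    static int uint8At(DatagramPacket packet, int index) {
        byte raw = packet.getData()[packet.getOffset() + index];
        return raw & 0xFF; // Byte.toUnsignedInt(raw) gives the same result on Java 8+
    }

    public static void main(String[] args) {
        // Hypothetical two-byte payload.
        byte[] payload = { (byte) 0x80, 0x7F };
        DatagramPacket packet = new DatagramPacket(payload, payload.length);
        System.out.println(uint8At(packet, 0)); // 128
        System.out.println(uint8At(packet, 1)); // 127
    }
}
```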

0xFF & number will treat the number as unsigned byte. But the resultant type is int

You can store 8 bits in a byte. If you really need to convert it to an unsigned value (and often you don't), you can use a mask:
byte b = ...
int u = b & 0xFF; // unsigned 0 .. 255 value

You can do something like this:
int value = eightBits & 0xff;
The & operator (like all integer operators in Java) up-casts eightBits to an int (by sign-extending the sign bit). Since this would turn values greater than 0x7f into negative int values, you need to then mask off all but the lowest 8 bits.

You could simply parse it into a short or an int, which have enough range to hold all the values of an unsigned byte.

Declaring an unsigned int in Java

Is there a way to declare an unsigned int in Java?
Or the question may be framed as this as well:
What is the Java equivalent of unsigned?
Just to give you the context: I was looking at Java's implementation of String.hashCode(). I wanted to test the possibility of collisions if the integer were a 32-bit unsigned int.

Java does not have a datatype for unsigned integers.
You can define a long instead of an int if you need to store large values.
You can also use a signed integer as if it were unsigned. The benefit of two's complement representation is that most operations (such as addition, subtraction, multiplication, and left shift) are identical on a binary level for signed and unsigned integers. A few operations (division, right shift, comparison, and casting), however, are different. As of Java SE 8, new methods in the Integer class allow you to fully use the int data type to perform unsigned arithmetic:
In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32-1. Use the Integer class to use int data type as an unsigned integer. Static methods like compareUnsigned, divideUnsigned etc have been added to the Integer class to support the arithmetic operations for unsigned integers.
Note that int variables are still signed when declared but unsigned arithmetic is now possible by using those methods in the Integer class.
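
A short sketch of a few of those Integer methods (Java 8 or later). The values are arbitrary and the class name is invented for the example.

```java
final class UnsignedIntOps {
    public static void main(String[] args) {
        int a = -3; // bit pattern 0xFFFFFFFD, i.e. 4294967293 when read as unsigned
        int b = 2;

        System.out.println(Integer.toUnsignedString(a));            // 4294967293
        System.out.println(Integer.toUnsignedLong(a));              // 4294967293
        System.out.println(Integer.compareUnsigned(a, b) > 0);      // true
        System.out.println(Integer.divideUnsigned(a, b));           // 2147483646
        System.out.println(Integer.remainderUnsigned(a, b));        // 1
        System.out.println(Integer.parseUnsignedInt("4294967293")); // -3
    }
}
```

Addition, subtraction and multiplication need no special methods, because they produce the same bits for signed and unsigned operands.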

Whether a value in an int is signed or unsigned depends on how the bits are interpreted - Java interprets bits as a signed value (it doesn't have unsigned primitives).
If you have an int that you want to interpret as an unsigned value (e.g. you read an int from a DataInputStream that you know should be interpreted as an unsigned value) then you can do the following trick.
int fourBytesIJustRead = someObject.getInt();
long unsignedValue = fourBytesIJustRead & 0xffffffffL;
Note, that it is important that the hex literal is a long literal, not an int literal - hence the 'L' at the end.
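
To see why the 'L' matters, here is a small comparison. The class and method names are made up for the illustration.

```java
final class MaskLiteralDemo {
    // The value is widened to long first, then the upper 32 bits are cleared.
    static long withLongMask(int value) {
        return value & 0xffffffffL;
    }

    // int & int: 0xffffffff is the int -1, so the mask changes nothing,
    // and the still-negative int is then widened to long.
    static long withIntMask(int value) {
        return value & 0xffffffff;
    }

    public static void main(String[] args) {
        System.out.println(withLongMask(-1)); // 4294967295
        System.out.println(withIntMask(-1));  // -1
    }
}
```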

We needed unsigned numbers to model MySQL's unsigned TINYINT, SMALLINT, INT, BIGINT in jOOQ, which is why we have created jOOU, a minimalistic library offering wrapper types for unsigned integer numbers in Java. Example:
import static org.joou.Unsigned.*;
// and then...
UByte b = ubyte(1);
UShort s = ushort(1);
UInteger i = uint(1);
ULong l = ulong(1);
All of these types extend java.lang.Number and can be converted into higher-order primitive types and BigInteger. Hope this helps.
(Disclaimer: I work for the company behind these libraries)

For unsigned numbers you can use these classes from Guava library:
UnsignedInteger
UnsignedLong
They support various operations:
plus
minus
times
mod
dividedBy
The thing that seems to be missing at the moment is bit shift operators. If you need those, you can use BigInteger from the Java standard library.

Perhaps this is what you meant?
long getUnsigned(int signed) {
return signed >= 0 ? signed : 2 * (long) Integer.MAX_VALUE + 2 + signed;
}
getUnsigned(0) → 0
getUnsigned(1) → 1
getUnsigned(Integer.MAX_VALUE) → 2147483647
getUnsigned(Integer.MIN_VALUE) → 2147483648
getUnsigned(Integer.MIN_VALUE + 1) → 2147483649

Use char for 16 bit unsigned integers.
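
A minimal sketch of that idea, with an invented helper name: casting a short to char keeps the same 16 bits, and char widens to int without sign extension.

```java
final class CharAsUnsignedShort {
    // Reinterprets the 16 bits of a short as an unsigned value in 0..65535.
    static int toUnsigned16(short s) {
        char c = (char) s; // same 16 bits, now an unsigned type
        return c;          // char widens to int by zero extension
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned16((short) -1));     // 65535
        System.out.println(toUnsigned16((short) 0x8000)); // 32768
        System.out.println(toUnsigned16((short) 1234));   // 1234
    }
}
```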

There are good answers here, but I don't see any demonstrations of bitwise operations. As Visser (the currently accepted answer) says, Java integers are signed by default (Java 8 added methods for unsigned arithmetic, but I have never used them). Without further ado, let's do it...
RFC 868 Example
What happens if you need to write an unsigned integer to IO? Practical example is when you want to output the time according to RFC 868. This requires a 32-bit, big-endian, unsigned integer that encodes the number of seconds since 12:00 A.M. January 1, 1900. How would you encode this?
Make your own unsigned 32-bit integer like this:
Declare a byte array of 4 bytes (32 bits)
byte[] my32BitUnsignedInteger = new byte[4]; // represents the time (s)
This initializes the array to zeros, see Are byte arrays initialised to zero in Java?. Now you have to fill each byte in the array with information in big-endian order (or little-endian if you want to wreak havoc). Assuming you have a long containing the time (long integers are 64 bits long in Java) called secondsSince1900 (which only uses the lower 32 bits, and you've handled the fact that Date references 12:00 A.M. January 1, 1970), you can use a bitwise AND to extract bits from it and shift those bits down into the lowest 8 positions, which are the ones that survive the cast to byte, in big-endian order.
my32BitUnsignedInteger[0] = (byte) ((secondsSince1900 & 0x00000000FF000000L) >> 24); // most significant byte first: extract bits 24-31, then shift them down to the lowest 8 bits so they survive the cast to byte
my32BitUnsignedInteger[1] = (byte) ((secondsSince1900 & 0x0000000000FF0000L) >> 16);
my32BitUnsignedInteger[2] = (byte) ((secondsSince1900 & 0x000000000000FF00L) >> 8);
my32BitUnsignedInteger[3] = (byte) (secondsSince1900 & 0x00000000000000FFL); // no shift needed
Our my32BitUnsignedInteger is now equivalent to an unsigned 32-bit, big-endian integer that adheres to the RFC 868 standard. Yes, the long datatype is signed, but we ignored that fact because we assumed that secondsSince1900 only used the lower 32 bits. Because the long is cast to a byte, all bits above the lowest 8 (everything except the last two hex digits) are ignored.
Source referenced: Java Network Programming, 4th Edition.
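
As an alternative to the manual shifts, the same four big-endian bytes can probably be produced more compactly with java.nio.ByteBuffer, which is big-endian by default. This is a sketch under the same assumption that only the lower 32 bits of the long are used; the class name, method names and the sample value are invented for the example.

```java
import java.nio.ByteBuffer;

final class Rfc868Encoding {
    // Encodes the low 32 bits of the value as 4 big-endian bytes.
    static byte[] encode(long secondsSince1900) {
        // The narrowing cast keeps exactly the low 32 bits.
        return ByteBuffer.allocate(4).putInt((int) secondsSince1900).array();
    }

    // Decodes 4 big-endian bytes back into an unsigned 32-bit value.
    static long decode(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long seconds = 3_000_000_000L; // hypothetical value above Integer.MAX_VALUE
        byte[] encoded = encode(seconds);
        System.out.println(decode(encoded)); // 3000000000
    }
}
```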

It seems that you can handle the signing problem by doing a "logical AND" on the values before you use them:
Example (Value of byte[] header[0] is 0x86 ):
System.out.println("Integer "+(int)header[0]+" = "+((int)header[0]&0xff));
Result:
Integer -122 = 134

Just made this piece of code, which converts "this.altura" from a negative to a positive number. Hope this helps someone in need.
if(this.altura < 0){
String aux = Integer.toString(this.altura);
char aux2[] = aux.toCharArray();
aux = "";
for(int con = 1; con < aux2.length; con++){
aux += aux2[con];
}
this.altura = Integer.parseInt(aux);
System.out.println("New Value: " + this.altura);
}

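You can use the Math.abs(number) function. It returns the absolute value, which is non-negative for every int except Integer.MIN_VALUE (Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE).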

Sign(+/-) error in byte in Java byte setting operations

I am declaring in Java
public byte[] orbits = new byte[38];
Now if I am doing
orbits[24] = (byte)0xFF;
orbits[24] should get populated with 11111111, i.e. FF (in hexadecimal), but instead it's getting populated with -1.
This operation works perfectly in C++:
char orbits[38];
orbits[24] = (char)0xFF;
How to replicate the similar situation in Java using byte?
Thanks

Well, it just happens that -1 is 0xFF. Everything is correct. byte stores values from -128 to 127 using two's complement.
In Java there are no unsigned types. If you want to use bit patterns, then use byte. 0xFF and -1 are the same thing in this situation. If you want to use numbers, that is, 0xFF is actually 255 and not -1, then you need to use a bigger type, like short.
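
A quick way to convince yourself that the stored bits are what you expect is to print them. The bits helper below is made up for this sketch.

```java
final class ByteBitPattern {
    // Returns the 8 bits of a byte as a zero-padded binary string.
    static String bits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        byte[] orbits = new byte[38];
        orbits[24] = (byte) 0xFF;
        System.out.println(orbits[24]);                // -1
        System.out.println(bits(orbits[24]));          // 11111111
        System.out.println(orbits[24] & 0xFF);         // 255
        System.out.println(orbits[24] == (byte) 0xFF); // true
    }
}
```

Only the printed interpretation differs; the byte that ends up in the array (and in any file or socket you write it to) is the same 0xFF.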

When casting a small integer type to a wider one, is it safe to rely on &ing with a mask to remove the sign?

I have code that stores values in the range 0..255 in a Java byte to save space in large data collections (10^9 records spread over a couple hundred arrays).
Without additional measures on recovery, the larger values are interpreted as being negative (because the Java integer types use two's complement representation).
I got this helpful hint from starblue in response to a related question, and I'm wondering if this technique is safe to rely on:
int iOriginal = 128, iRestore;
byte bStore = (byte) iOriginal; // reading this value directly would yield -128
iRestore = 0xff & bStore;

Yes, it's safe, indeed it's the most effective way of converting a byte into an (effectively) unsigned integer.
The byte half of the and operation will be sign-extended to an int, i.e. whatever was in bit 7 will be expanded into bits 8-31.
Masking off the bottom eight bits (i.e. & 0xff) then gives you an int that has zero in every bit from 8 - 31, and must therefore be in the range 0 ... 255.
See a related answer I gave here.
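
If you want to check it rather than take it on trust, a brute-force round trip over all 256 values is cheap. The class and method names below are invented for this sketch.

```java
final class ByteRoundTrip {
    // Narrowing cast: keeps the low 8 bits of the int.
    static byte store(int value) {
        return (byte) value;
    }

    // Sign-extends to int, then clears bits 8-31 again.
    static int restore(byte stored) {
        return stored & 0xFF;
    }

    // Checks that every value in 0..255 survives store followed by restore.
    static boolean roundTripsForAllValues() {
        for (int i = 0; i <= 255; i++) {
            if (restore(store(i)) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(store(128));               // -128
        System.out.println(restore(store(128)));      // 128
        System.out.println(roundTripsForAllValues()); // true
    }
}
```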
