Perform bitwise OR without sign extension in Java - java

In Java, I have the following code:
byte lo = 0x88;
byte hi = 0x0;
short x = 0;
x |= lo;
x |= (short) ((short) hi << 8); // FIXME double cast ??
return x;
After the first operation, the value in the short is 0xff88. I understand this is because Java does not support unsigned numbers.
How can I force it to ignore the sign? I'm not using this data directly in Java code so it shouldn't matter, right?

The sign is appearing when the byte is converted to an int prior to performing the bitwise operation. The conversion must sign extend the byte value because byte is a signed type in Java. The conversion must occur because bitwise operations are either operations on int values or long values.
There are two solutions:
Mask the top bits, either after the conversion to an int, or after performing the bitwise OR; e.g.
x |= (lo & 0xff);
x |= (hi & 0xff) << 8;
(It turns out that the masking is unnecessary for hi ... because of what you do later. However, the chances are that the JIT compiler will optimize away any unnecessary masking instructions. And if it doesn't, the overhead is so small that it it is unlikely to matter.)
Use Byte.toUnsignedInt(byte) to convert the byte to an int.

Related

Negative of UInt8

I have this code in Objective-C and I want to write it in Java. However there is the statement (-(crc & 1)) that gives me problems. After some googeling it seems that the negative of an unsigned is generally not well defined. I didn't quite understand it though. Maybe somebody who knows CRC can tell me what the equivalent in Java would be.
I need exactly an equivalent method as a device with this Objective-C code checks my calculated crc against its self calculated.
+ (UInt32) crc32:(UInt32) crc ofByte: (UInt8) byte
{
int8_t i;
crc = crc ^ byte;
for (i = 7; i >= 0; i--)
{
crc = (crc >> 1) ^ (0xedb88320ul & (-(crc & 1)));
}
return(crc);
}
The solved code with help of comments. Seems like I searched the mistake in the wrong place.
private void addByteToCRC(byte oneByte) {
crc ^= oneByte & 0xFF;
for (int i = 7; i >= 0; i--)
{
crc = (crc >>> 1) ^ (0xedb88320 & (-(crc & 1)));
}
}
Both checksums start of with a crc value of 0xFFFFFFFF.
The -(crc & 1) works fine, and is not your problem. What it does is to copy the low bit of crc to all of the bits, since in two's complement representation, โ€“1 is all ones.
There are two things you need to fix for Java, which does not have unsigned integers. First is that the right shift needs to be a logical right shift, >>>. That avoids copying the sign bit when shifting down, and instead shifts in a zero bit. Second, you need to guard against a character being signed-extended to an int, so byte (which will need a different name in Java) needs to be anded with 0xff. So:
crc ^= octet & 0xff;
and
crc = (crc >>> 1) ^ (0xedb88320ul & (-(crc & 1)));
There are some differences between (Objective-)C and Java, but at a guess you may have identified the wrong problem. Java does not have unsigned integer types. However it does have bitwise operators which treat integers as just strings of bits, and later versions of Java do have unsigned operators which when applied to integers operate on the underlying bit patterns as though they represented unsigned values.
The expression crc & 1 will yield a result of 0 or 1 depending on whether crc was even or odd.
The expression -(crc & 1) therefore evaluates to 0 or -1. The bit pattern of 0 is all zeros and for -1 it is all ones - the latter is true in (Objective-)C regardless of the underlying representing of integers as 1's or 2's complement, see Is it safe to use -1 to set all bits to true? for an explanation.
The expression 0xedb88320ul & (-(crc & 1)) therefore evaluates to 0 or 0xedb88320.
The expression crc >> 1 is the logical, or unsigned, shift right in (Objective-)C as src is defined as unsigned (right shifts for signed integers are "implementation defined" in (Objective-)C).
The expression x ^ 0 for any x evaluates to x.
Put that all together and your whole expression (crc >> 1) ^ (0xedb88320ul & (-(crc & 1))) conditionally xors crc >> 1 with 0xedb88320 base don whether crc is odd or even.
Now you just need to reproduce the same behaviour in Java.
Java uses 32-bit 2's complement integers for int (JLS: Primitive Types and Values) and provides bitwise operators which treat them as just bit-strings.
The hexadecimal constant comes just 0xedb88320 and has int (32-bit) type.
The expression -(crc & 1) therefore evaluates as in (Objective-)C.
However the expression crc >> 1 will not evaluate the same as in (Objective-)C. In Java the >> is an arithmetic shift right, which replicates the sign-bit during the shift. This choice stems from Java using signed types. Fortunately Java also provides the >>> operator to perform a logical, or unsigned, right-shift.
So my guess is you may have identified the wrong problem in your conversion and you need to use the >>> operator.
HTH

How to shift 32 bit int by 32 (yet again)

Ok, so I know that typically left- and right- shifts are well defined only for values 0..31. I was thinking how to best extend this to include 32, which simplifies some algorithms. I came up with:
int32 << n & (n-32) >> 5
Which seems to work. Question is, is it guaranteed to work on any architecture (C, C++, Java), and can it be done more effectively?
In Java it's guaranteed to work if those variables are of type int, since >> in Java does an arithmetic right shift and shifting more than 31 also has defined behavior. But beware of operator precedence
int lshift(int x, int n)
{
return (x << n) & ((n-32) >> 5);
}
This will work for shift count up to 32. But it can be modified to include any int values with shift counts larger than 31 return 0
return (x << n) & ((n-32) >> 31);
However in C and C++ the size of int type and the behavior of >> operator is implementation defined. Most (if not all) modern implementations implement it as arithmetic shift for signed types though. Besides, the behavior of shifting more than the variable width is undefined. Worse yet, signed overflow invokes UB so even a left shift by 31 is also UB (until C++14). Therefore to get a well defined output you need to
Use a unsigned fixed-width type like uint32_t (so x << 31 isn't UB)
Use a compiler that emits arithmetic right shift instruction for >> and use a signed type for n, or implement the arithmetic shift yourself
Mask the shift amount to limit it to 5 bits for int32_t
The result would be
uint32_t lshift(uint32_t x, int32_t n)
{
return (x << (n & 0x1F)) & ((n-32) >> 31);
}
If the architecture supports conditional instructions like x86 or ARM then the following way may be faster
return n < 32 ? x << n : 0;
On a 64-bit platform you can made this even simpler by shifting in a 64-bit type and then mask. Some 32-bit platforms like ARM does support shifting by 32 so this method is also efficient
return ((uint64_t)x << (n & 0x3F)) & 0xFFFFFFFFU;
You can see the output assembly here. I don't see how it can be improved further

Which real use cases exist for arithmetic right bit shifting?

I stumbled upon a question that asks whether you ever had to use bit shifting in real projects. I have used bit shifts quite extensively in many projects, however, I never had to use arithmetic bit shifting, i.e., bit shifting where the left operand could be negative and the sign bit should be shifted in instead of zeros. For example, in Java, you would do arithmetic bit shifting with the >> operator (while >>> would perform a logical shift). After thinking a lot, I came to the conclusion that I have never used the >> with a possibly negative left operand.
As stated in this answer arithmetic shifting is even implementation defined in C++, so โ€“ in contrast to Java โ€“ there is not even a standardized operator in C++ for performing arithmetic shifting. The answer also states an interesting problem with shifting negative numbers that I was not even aware of:
+63 >> 1 = +31 (integral part of quotient E1/2E2)
00111111 >> 1 = 00011111
-63 >> 1 = -32
11000001 >> 1 = 11100000
So -63>>1 yields -32 which is obvious when looking at the bits, but maybe not what most programmers would anticipate on first sight. Even more surprising (but again obvious when looking at the bits) is that -1>>1 is -1, not 0.
So, what are concrete use cases for arithmetic right shifting of possibly negative values?
Perhaps the best known is the branchless absolute value:
int m = x >> 31;
int abs = x + m ^ m;
Which uses an arithmetic shift to copy the signbit to all bits. Most uses of arithmetic shift that I've encountered were of that form. Of course an arithmetic shift is not required for this, you could replace all occurrences of x >> 31 (where x is an int) by -(x >>> 31).
The value 31 comes from the size of int in bits, which is 32 by definition in Java. So shifting right by 31 shifts out all bits except the signbit, which (since it's an arithmetic shift) is copied to those 31 bits, leaving a copy of the signbit in every position.
It has come in handy for me before, in the creation of masks that were then used in '&' or '|' operators when manipulating bit fields, either for bitwise data packing or bitwise graphics.
I don't have a handy code sample, but I do recall using that technique many years ago in black-and-white graphics to zoom in (by extending a bit, either 1 or 0). For a 3x zoom, '0' would become '000' and '1' would become '111' without having to know the initial value of the bit. The bit to be expanded would be placed in the high order position, then an arithmetic right shift would extend it, regardless of whether it was 0 or 1. A logical shift, either left or right, always brings in zeros to fill vacated bit positions. In this case the sign bit was the key to the solution.
Here's an example of a function that will find the least power of two greater than or equal to the input. There are other solutions to this problem that are probably faster, namly any hardware oriented solution or just a series of right shifts and ORs. This solution uses arithmetic shift to perform a binary search.
unsigned ClosestPowerOfTwo(unsigned num) {
int mask = 0xFFFF0000;
mask = (num & mask) ? (mask << 8) : (mask >> 8);
mask = (num & mask) ? (mask << 4) : (mask >> 4);
mask = (num & mask) ? (mask << 2) : (mask >> 2);
mask = (num & mask) ? (mask << 1) : (mask >> 1);
mask = (num & mask) ? mask : (mask >> 1);
return (num & mask) ? -mask : -(mask << 1);
}
Indeed logical right shift is much more commonly used. However there are many operations that require an arithmetic shift (or are solved much more elegantly with an arithmetic shift)
Sign extension:
Most of the time you only deal with the available types in C and the compiler will automatically sign extend when casting/promoting a narrower type to a wider one (like short to int) so you may not notice it, but under the hood a left-then-right shift is used if the architecture doesn't have an instruction for sign extension. For "odd" number of bits you'll have to do the sign extension manually so this would be much more common. For example if a 10-bit pixel or ADC value is read into the top bits of a 16-bit register: value >> 6 will move the bits to the lower 10 bit positions and sign extend to preserve the value. If they're read into the low 10 bits with the top 6 bits being zero you'll use value << 6 >> 6 to sign extend the value to work with it
You also need signed extension when working with signed bit fields
struct bitfield {
int x: 15;
int y: 12;
int z: 5;
};
int f(bitfield b) {
return (b.x/8 + b.y/5) * b.z;
}
Demo on Godbolt. The shifts are generated by the compiler but usually you don't use bitfields (as they're not portable) and operate on raw integer values instead so you'll need to do arithmetic shifts yourself to extract the fields
Another example: sign-extend a pointer to make a canonical address in x86-64. This is used to store additional data in the pointer: char* pointer = (char*)((intptr_t)address << 16 >> 16). You can think of this as a 48-bit bitfield at the bottom
V8 engine's SMI optimization stores the value in the top 31 bits so it needs a right shift to restore the signed integer
Round signed division properly when converting to a multiplication, for example x/12 will be optimized to x*43691 >> 19 with some additional rounding. Of course you'll never do this in normal scalar code because the compiler already does this for you but sometimes you may need to vectorize the code or make some related libraries then you'll need to calculate the rounding yourself with arithmetic shift. You can see how compilers round the division results in the output assembly for bitfield above
Saturated shift or shifts larger than bit width, i.e. the value becomes zero when the shift count >= bit width
uint32_t lsh_saturated(uint32_t x, int32_t n) // returns 0 if n == 32
{
return (x << (n & 0x1F)) & ((n-32) >> 5);
}
uint32_t lsh(uint32_t x, int32_t n) // returns 0 if n >= 32
{
return (x << (n & 0x1F)) & ((n-32) >> 31);
}
Bit mask, useful in various cases like branchless selection (i.e. muxer). You can see lots of ways to conditionally do something on the famous bithacks page. Most of them are done by generating a mask of all ones or all zeros. The mask is usually calculated by propagating the sign bit of a subtraction like this (x - y) >> 31 (for 32-bit ints). Of course it can be changed to -(unsigned(x - y) >> 31) but that requires 2's complement and needs more operations. Here's the way to get the min and max of two integers without branching:
min = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1)));
max = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1)));
Another example is m = m & -((signed)(m - d) >> s); in Compute modulus division by (1 << s) - 1 in parallel without a division operator
I am not too sure what you mean. BUt i'm going to speculate that you want to use the bit shift as an arithmetic function.
One interesting thing i have seen is this property of binary numbers.
int n = 4;
int k = 1;
n = n << k; // is the same as n = n * 2^k
//now n = (4 * 2) i.e. 8
n = n >> k; // is the same as n = n / 2^k
//now n = (8 / 2) i.e. 4
hope that helps.
But yes you want to be careful of negative numbers
i would mask and then turn it back accordingly
In C when writing device drivers, bit shift operators are used extensively since bits are used as switches that need to be turned on and off. Bit shift allow one to easily and correctly target the right switch.
Many hashing and cryptographic functions make use of bit shift. Take a look at Mercenne Twister.
Lastly, it is sometimes useful to use bitfields to contain state information. Bit manipulation functions including bit shift are useful for these things.

Java how to parse uint8 in java?

I have a uint8 (unsigned 8 bit integer) coming in from a UDP packet. Java only uses signed primitives. How do I parse this data structure correctly with java?
Simply read it as as a byte and then convert to an int.
byte in = udppacket.getByte(0); // whatever goes here
int uint8 = in & 0xFF;
The bitmask is needed, because otherwise, values with bit 8 set to 1 will be converted to a negative int. Example:
This: 10000000
Will result in: 11111111111111111111111110000000
So when you afterwards apply the bitmask 0xFF to it, the leading 1's are getting cancelled out. For your information: 0xFF == 0b11111111
0xFF & number will treat the number as unsigned byte. But the resultant type is int
You can store 8-bit in a byte If you really need to converted it to an unsigned value (and often you don't) you can use a mask
byte b = ...
int u = b & 0xFF; // unsigned 0 .. 255 value
You can do something like this:
int value = eightBits & 0xff;
The & operator (like all integer operators in Java) up-casts eightBits to an int (by sign-extending the sign bit). Since this would turn values greater than 0x7f into negative int values, you need to then mask off all but the lowest 8 bits.
You could simply parse it into a short or an int, which have enough range to hold all the values of an unsigned byte.

How to convert a (short) sample in a soundfile to array of bytes

In converting from short to a byte array I found the following solution on the web but could not quite understand the logic involved.
//buffer is an array of bytes, bytes[]
buffer[position] = (byte)(sample & 0xff);
buffer[position+1] = (byte)((sample >> 8) & 0xff);
Can someone tell me why 0xff (256) is being anded to the sample which is a short?
This code probably comes from C code (or was written by a C programmer who don't parse Java as well as erickson does). This is because in Java a cast from a type with more information to a type with less information will discard the higher order bits and thus the & 0xff is unnecessary in both cases.
A short has 16 bits, two bytes. So it needs to take two slots in the byte array, because if we just casted a short into a byte one byte would be lost.
So, the code you have does
1110001100001111 sample
0000000011111111 0xff
0000000000001111 sample & 0xff => first byte`
Then, displaces the sample to get the second byte
0000000011100011 sample >> 8
0000000011111111 0xff
0000000011100011 (sample >> 8 ) & 0xff => second byte
Which can be a lot better written as erickson shows below (hopefully it'll be above soon).
The other answers have some good information, but unfortunately, they both promote an incorrect idea about the cast to byte.
The & 0xFF is unnecessary in both cases in your original code.
A narrowing cast discards the high-order bits that don't fit into the narrower type. In fact, the & 0xFF actually first causes the short to be promoted to an int with the most significant 24 bits cleared, which is then chopped down and stuffed in a byte by the cast. See the Java Language Specification Section ยง5.1.3 for details.
buffer[position] = (byte) sample;
buffer[position+1] = (byte) (sample >>> 8);
Also, note my use of the right shift with zero extension, rather than sign extension. In this case, since you're immediately casting the result of the shift to a byte, it doesn't matter. In general, however, the operators give different results, and you should be deliberate in what you choose.
It makes sure there's no overflow; specifically, the first line there is taking the LSByte of "sample" and masking OUT your upper 8 bits, giving you only values in the range of 0-255; the second line there is taking the MSByte of "sample" (by performing the right-shift) and doing the same thing. It shouldn't be necessary on the second line, since the right-shift by 8 should drop out the 8 least significant bits, but it does make the code a little bit more symmetric.
I would assume that this is because since sample is a short (2 bytes) any values in the range of 256-6553 would be interpreted as 255 by the byte conversion.
buffer[position] = (byte)(sample & 0xff);
buffer[position+1] = (byte)((sample >> 8) & 0xff);
must be :
buffer[position] = (byte)((sample >> 8) & 0xff);
buffer[position+1] = (byte)(sample & 0xff);

Categories