Sign extension, bit shifting in JAVA. Help understanding a C-code bit

I have the following C-code (from FFMPEG):
static inline av_const int sign_extend(int val, unsigned bits)
{
unsigned shift = 8 * sizeof(int) - bits;
union { unsigned u; int s; } v = { (unsigned) val << shift };
return v.s >> shift;
}
I'm trying to reproduce this in Java, but I have difficulty understanding it. No matter how I toss the bits around, I don't get very close.
The val parameter takes an unsigned byte value as an int.
The bits parameter is 4.
If val is 255 and bits is 4, it returns -1. I can't reproduce this in Java. Sorry for such a fuzzy question, but can you help me understand this code?
The big picture is that I'm trying to encode EA ADPCM audio in Java. In FFmpeg:
https://gitorious.org/ffmpeg/ffmpeg/source/c60caa5769b89ab7dc4aa41a21f87d5ee177bd30:libavcodec/adpcm.c#L981

Strictly speaking, the result of running this code with this input is implementation-defined, because a signed right shift in C is fully defined only in circumstances that this scenario does not meet. From the C99 standard:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
(Emphasis mine)
But let's assume that your implementation defines signed right shift to extend the sign, meaning that the space on the left will be filled with ones if the sign bit is set and zeroes otherwise; the ffmpeg code clearly expects this to be the case. The following is happening: shift has the value of 28 (assuming 32-bit integers). In binary notation:
00000000 00000000 00000000 11111111 = val
11110000 00000000 00000000 00000000 = (unsigned) val << shift
Note that when (unsigned) val << shift is interpreted as a signed integer, as the code proceeds to do (assuming two's complement representation, as virtually all of today's computers use; see footnote 1), the sign bit of that integer is set, so a signed shift to the right fills up with ones from the left, and we get
11110000 00000000 00000000 00000000 = v.s
11111111 11111111 11111111 11111111 = v.s >> shift
...and in two's complement representation, that is -1.
In Java, this trick works the same way -- except better, because there the behavior is actually guaranteed. Simply:
public static int sign_extend(int val, int bits) {
int shift = 32 - bits; // int always has 32 bits in Java
int s = val << shift;
return s >> shift;
}
Or, if you prefer:
public static int sign_extend(int val, int bits) {
int shift = 32 - bits;
return val << shift >> shift;
}
1 Strictly speaking, this conversion does not have a well-defined value in the C standard either, for historical reasons. There used to be computers that used different representations, and the same bit pattern with a set sign bit has a completely different meaning in (for example) signed magnitude representation.
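The Java version above can be checked against the values from the question with a small test program. This is only a minimal sketch: the class name SignExtendDemo is made up for illustration, and it assumes bits is between 1 and 32.

```java
// Hypothetical demo class; only the shift trick itself comes from the answer above.
class SignExtendDemo {
    // Interprets the lowest `bits` bits of val as a signed two's complement number.
    // Assumes 1 <= bits <= 32.
    static int signExtend(int val, int bits) {
        int shift = 32 - bits;          // Java's int is always 32 bits wide
        return (val << shift) >> shift; // << moves the field to the top, >> copies its sign bit back down
    }

    public static void main(String[] args) {
        System.out.println(signExtend(255, 4)); // low nibble 1111 -> -1
        System.out.println(signExtend(7, 4));   // low nibble 0111 -> 7
        System.out.println(signExtend(8, 4));   // low nibble 1000 -> -8
    }
}
```

Only the lowest 4 bits of 255 take part in the result, which is why the answer is -1 and not 255.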

The reason the code looks so odd is that C is full of undefined behaviours that are well defined in Java. For example, in C, left-shifting a signed integer so that the sign bit changes is undefined behaviour, and at that point the program can do anything, whatever the compiler causes it to do: crash, print 42, make true = false. Anything can happen, and the compiler has still compiled it correctly.
Now the code uses a trick to shift the integer left: it casts val to unsigned, for which the left shift is well defined, and stores the result in a union that lays its members on top of each other, so that an unsigned and a signed integer occupy the same bytes. Reading the signed member then reinterprets those bits, and the value is shifted back using a signed shift. The code assumes that the right shift of a negative number produces a properly sign-extended negative number, which is also not guaranteed by the standard, but libraries of this kind usually have a configuration utility that can refuse compilation on such an esoteric platform. Likewise, this code assumes that CHAR_BIT is 8; C only guarantees that a char is at least 8 bits wide.
In Java, you do not need anything like a union to accomplish this; instead you do:
static int signExtend(int val, int bits) {
int shift = 32 - bits; // fixed size
int v = val << shift;
return v >> shift;
}
In Java the width of an int is always 32 bits; << works the same for signed and unsigned values, and shifting a bit into the sign position is not undefined behaviour; >> is the signed (sign-extending) right shift, while >>> is the unsigned (zero-filling) one.
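To see why the choice between >> and >>> matters here, a small sketch that shifts back with each operator may help. The class and method names are hypothetical and exist only for this comparison.

```java
// Hypothetical demo class comparing the two right-shift operators on the way back.
class ShiftKindsDemo {
    // Shift back with >>: the sign bit is copied in, so the field is sign-extended.
    static int shiftBackSigned(int val, int bits) {
        int shift = 32 - bits;
        return (val << shift) >> shift;
    }

    // Shift back with >>>: zeros are shifted in, so the field is merely masked.
    static int shiftBackUnsigned(int val, int bits) {
        int shift = 32 - bits;
        return (val << shift) >>> shift;
    }

    public static void main(String[] args) {
        System.out.println(shiftBackSigned(255, 4));   // -1
        System.out.println(shiftBackUnsigned(255, 4)); // 15
    }
}
```

If a Java port returns 15 where the C code returns -1, an accidental >>> (or a mask such as & 0xF) is a likely cause.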

Given this code:
static inline av_const int sign_extend(int val, unsigned bits)
{
unsigned shift = 8 * sizeof(int) - bits;
union { unsigned u; int s; } v = { (unsigned) val << shift };
return v.s >> shift;
}
The 'static' modifier says the function is not visible outside the current file.
The 'inline' modifier is a request to the compiler to place the code inline wherever the function is called, rather than having a separate function with the associated call/return code sequences.
'sign_extend' is the name of the function.
In C, a right shift of a signed value will propagate the sign bit on virtually every compiler (strictly, the result for negative values is implementation-defined).
In C, a right shift of an unsigned value will zero fill.
It looks like your Java is doing the zero fill.
Regarding this line:
unsigned shift = 8 * sizeof(int) - bits;
On a 32-bit machine, an integer is 32 bits and sizeof(int) is 4,
so the variable 'shift' will contain (8*4)-bits.
Regarding this line:
union { unsigned u; int s; } v = { (unsigned) val << shift };
A left shift of an unsigned value shifts the bits left,
with the upper bits being dropped into the bit bucket
and the lower bits being zero filled.
Regarding this line:
return v.s >> shift;
This shifts the bits back to their original position,
while propagating the (new) sign bit.
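Since the question passes an unsigned byte value with bits set to 4, one plausible use is pulling two signed 4-bit samples out of one byte. The sketch below illustrates that under stated assumptions: the class name, the helper names and the high/low nibble order are all made up here and are not taken from the FFmpeg source.

```java
// Hypothetical helper: splits one unsigned byte value (0..255) into two signed 4-bit values.
class NibbleDemo {
    static int signExtend(int val, int bits) {
        int shift = 32 - bits;
        return (val << shift) >> shift;
    }

    // Upper four bits of the byte, interpreted as a signed value in -8..7.
    static int highNibble(int unsignedByte) {
        return signExtend(unsignedByte >> 4, 4);
    }

    // Lower four bits of the byte, interpreted as a signed value in -8..7.
    static int lowNibble(int unsignedByte) {
        return signExtend(unsignedByte, 4);
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF1;
        int unsignedByte = raw & 0xFF;                 // Java bytes are signed, so mask first
        System.out.println(highNibble(unsignedByte));  // 1111 -> -1
        System.out.println(lowNibble(unsignedByte));   // 0001 -> 1
    }
}
```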

Related

Non cryptographic hashing in Java using unsigned integers

I am looking for a hashing function that can be used for non-cryptographic purposes in Java. The challenge is that most hashing functions return signed integer values (-,0,+) that cannot be used as identifiers in every context (for example, negative integers cannot be used in URLs). One solution I came up with is to take a 32-bit signed int, convert it to a 32-bit unsigned int and store it in a long. This works pretty well. However, 32 bits of random information make hash collisions too frequent in our setup. One way of solving this is to use a 64-bit hashing function (the same SipHash works fine) and convert that signed integer to unsigned by shifting it one position to the right so that the MSB is 0. I was trying to achieve that with the Java >> operator, but the results do not make sense.
//Using Guava
private final static HashFunction hashFunction = Hashing.sipHash24();
private static int getRandomInt() {
return hashFunction.newHasher().putLong(rnd.nextLong()).hash().asInt();
}
private static long getRandomLong(){
return hashFunction.newHasher().putLong(rnd.nextLong()).hash().asLong();
}
Bitshifting:
System.out.println(Long.toBinaryString(-2147483648L >> 1));
1111111111111111111111111111111111000000000000000000000000000000
What am I missing and how could I have a 62 bit unsigned integer hash value stored in a 64 bit int (long) in Java?
UPDATE1:
After doing some research I finally found a way to correctly display the effect of >>> on a Long value:
System.out.println(
String.format("%64s", Long.toBinaryString(-2147483648L))
.replace(' ', '0'));
System.out.println(
String.format("%64s", Long.toBinaryString(-2147483648L >>> 1))
.replace(' ', '0'));
1111111111111111111111111111111110000000000000000000000000000000
0111111111111111111111111111111111000000000000000000000000000000

a >> b
shifts a to the right by b bits. On the left it repeats the bit that was already there (sign extending!).
Examples:
101010 >> 1 = 110101
010101 >> 1 = 001010
a >>> b
also shifts a to the right by b bits, but doesn't sign extend. It always adds in zeros on the left:
101010 >>> 1 = 010101
010101 >>> 1 = 001010
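The same difference can be checked directly in Java. The following is a minimal sketch, not a recommendation for a particular hashing scheme: the class and method names are hypothetical, and toNonNegative simply applies >>> 1 to a 64-bit value as the question proposes.

```java
// Hypothetical demo class for >> versus >>> on int and long values.
class RightShiftDemo {
    // Pads the binary form of a long to all 64 digits so leading zeros stay visible.
    static String bits64(long v) {
        return String.format("%64s", Long.toBinaryString(v)).replace(' ', '0');
    }

    // Drops the lowest bit and leaves 0 in the top bit, so the result is never negative.
    static long toNonNegative(long hash) {
        return hash >>> 1;
    }

    public static void main(String[] args) {
        System.out.println(-8 >> 1);   // -4
        System.out.println(-8 >>> 1);  // 2147483644
        System.out.println(8 >> 1);    // 4
        System.out.println(8 >>> 1);   // 4
        System.out.println(bits64(toNonNegative(-2147483648L)));
    }
}
```

For non-negative inputs the two operators agree; they differ only when the top bit is set.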

Storing unsigned long value in two 16bit register

I want to store an unsigned long value in two 16-bit registers. For example, if I have a long value (-2,147,483,648 to 2,147,483,647), I use a formula like:
v[0] = myValue % 65536
v[1] = myValue / 65536
To get the value back from the registers:
outVal = reg0 + (reg1 * 65536)
But how do I do this for an unsigned long, whose range is 0 to 4,294,967,295?

As commenter harold pointed out already, your formula doesn't even work correctly for negative numbers.
You should do it bitwise instead of using math to avoid surprises (and speed things up in case the compiler didn't optimize it for you already).
Splitting:
v[0] = myValue & 0xFFFF
v[1] = myValue >> 16 // this implicitly cuts off the lower 16 bits
// by shifting them away into the nirvana
Joining:
outVal = reg0 | (reg1 << 16)
This now applies to both signed and unsigned (provided that all your variables have the same "sign type").
Legend, in case your language (which you didn't specify) uses different operators:
& is bitwise AND, | is bitwise OR, << and >> are bitwise shifting left/right (SHL/SHR), 0x marks a hexadecimal literal (you could use 65535 instead of 0xFFFF, but I think the hex literal makes it clearer where this magic number comes from).
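The question does not name a language, so as an illustration here is how the split and join might look in Java. This is a sketch under assumptions: the unsigned 32-bit value is held in a long (Java has no unsigned int type), each register is modelled as an int in the range 0..65535, and the class and method names are invented.

```java
// Hypothetical demo: an unsigned 32-bit value stored in two 16-bit "registers".
class RegisterSplitDemo {
    // Returns {low word, high word}, each in the range 0..65535.
    static int[] split(long value) {
        int low = (int) (value & 0xFFFF);
        int high = (int) ((value >>> 16) & 0xFFFF);
        return new int[] { low, high };
    }

    // Rebuilds the unsigned 32-bit value; widening to long first keeps bit 31 from becoming a sign bit.
    static long join(int low, int high) {
        return ((long) (high & 0xFFFF) << 16) | (low & 0xFFFF);
    }

    public static void main(String[] args) {
        int[] regs = split(4294967295L);
        System.out.println(regs[0] + " " + regs[1]); // 65535 65535
        System.out.println(join(regs[0], regs[1]));  // 4294967295
    }
}
```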

Narrowing from int to short [duplicate]

I am working on narrowing and tried the following code:
int i = 131072;
short s = (short)i;
System.out.println(s); // prints 0
This narrowing outputs 0. I am not able to understand the logic behind it.

131072 int is 00000000 00000010 00000000 00000000 in binary.
When you cast it to short, only the lowest 16 bits remain - 00000000 00000000.

When you cast a primitive to a smaller primitive the top bits are dropped.
Written another way you can see what is happening.
int i = 0x20000;
short s = (short) (i & 0xFFFF);
Note: the lower 16 bits of your integer are all zero, so the answer is 0.
In binary, casting to (short) keeps only the lower 16 bits (shown in parentheses):
00000000 00000010 (00000000 00000000)
If you were to cast a longer number, it would still take the lower bits in each case. Note: the & in each case is redundant and only to help clarity.
long l = 0x0FEDCBA987654321L;
// i = 0x87654321
int i = (int) (l & 0xFFFFFFFFL);
// c = \u4321
char c = (char) (l & 0xFFFF);
// s = 0x4321
short s = (short) (l & 0xFFFF);
// b = 0x21
byte b = (byte) (l & 0xFF);

Primitive values (int, short,...) are stored as binary values. int uses more bits than short. When you try to downcast you're cutting away bits which truncates (and potentially ruins) the value.

This is not a downcast (which refers to objects); it's a narrowing cast, or truncation. When you perform such a cast, you just copy the two least significant bytes of the int to your short. If the integer is non-negative and smaller than 2^15, you'd just be dropping bytes containing zeroes, so it would just work.
This is not the case here, however. If you examine the binary representation of 131072 you'll see it's 100000000000000000. So, the two least significant bytes are clearly 0, which is exactly what you're getting.
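A few more inputs make the truncation rule easier to see. This is a small sketch with a made-up class name; the cast is the only operation involved.

```java
// Hypothetical demo class for narrowing int to short.
class NarrowingDemo {
    // Keeps only the lowest 16 bits of i and reinterprets them as a signed short.
    static short toShort(int i) {
        return (short) i;
    }

    public static void main(String[] args) {
        System.out.println(toShort(131072));  // 0x20000: low 16 bits are all zero -> 0
        System.out.println(toShort(32767));   // fits in a short -> 32767
        System.out.println(toShort(32768));   // bit 15 becomes the sign bit -> -32768
        System.out.println(toShort(0x12345)); // low 16 bits are 0x2345 -> 9029
    }
}
```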

BigInteger unsigned left or right shift

I am reimplementing a function using BigInteger in place of int. Now there is a step
h = n >>> log2n--
But I am facing trouble here. In original code h, n, log2n all are int type, if I set h, n, and log2n to BigInteger what will be the equivalent expression of the above code? How do I perform an unsigned right shift (>>>) in BigInteger?
Edit:
The code block is :
int log2n = 31 - Integer.numberOfLeadingZeros(n);
int h = 0, shift = 0, high = 1;
while (h != n)
{
shift += h;
h = n >>> log2n--;
int len = high;
high = (h & 1) == 1 ? h : h - 1;
len = (high - len) / 2;
if (len > 0)
{
p = p.multiply(product(len));
r = r.multiply(p);
}
}

Quoting from the Java docs:
The unsigned right shift operator
(>>>) is omitted, as this operation
makes little sense in combination with
the "infinite word size" abstraction
provided by this class.
A 32-bit integer representation of -1 is (in binary)
11111111 11111111 11111111 11111111
If you use the signed right-shift operator (>>) on this, you'll get
11111111 11111111 11111111 11111111
i.e. the same thing. If you use the unsigned right-shift operator on this, shifting by 1, you'll get
01111111 11111111 11111111 11111111.
But BigInteger has an unlimited length. The representation of -1 in a BigInteger is theoretically
11111111 111... infinite 1s here..... 11111111
The unsigned right-shift operator would imply that you were putting a 0 at the leftmost point - which is at infinity. Since this makes little sense, the operator is omitted.
As regards your actual code, what you need to do now depends on what the surrounding code is doing and why an unsigned shift was chosen for the original code. Something like
n.negate().shiftRight(log2n)
might work, but it all depends on the circumstances.

I finally found a solution, it's awful, but it works:
public BigInteger srl(BigInteger l, int width, int shiftBy) {
if (l.signum() >= 0)
return l.shiftRight(shiftBy);
BigInteger opener = BigInteger.ONE.shiftLeft(width + 1);
BigInteger opened = l.subtract(opener);
BigInteger mask = opener.subtract(BigInteger.ONE).shiftRight(shiftBy + 1);
BigInteger res = opened.shiftRight(shiftBy).and(mask);
return res;
}
The case that your integer is positive is trivial, as shiftRight will return the correct result anyway. But for negative numbers this gets tricky. The negate version mentioned earlier does not work as -1 in BigInteger negated is 1. Shift it and you have 0. But you need to know what the width of your BigInteger is. You then basically force the BigInteger to have at least width+1 bits by subtracting an opener. Then you perform the shifting, and mask away the extra bit that you introduced. It doesn't really matter what opener you use, as long as it doesn't alter the lower bits.
How the opener works:
The BigInteger implementation only stores the highest 0 position for negative numbers. A -3 is represented as:
1111_1111_1111_1111_1101
But only some bits are stored; I marked the others as X.
XXXX_XXXX_XXXX_XXXX_XX01
Shifting to the right does nothing, as there are always 1s coming from the left. So the idea is to subtract a 1 to generate a 0 outside of the width that you are interested in. Assuming you care about the lowest twelve bits:
XXXX_XXXX_XXXX_XXXX_XX01
- 0001_0000_0000_0000
========================
XXXX_XXX0_1111_1111_1101
This forced the generation of real 1s. You then shift right by, let's say, 5.
XXXX_XXX0_1111_1111_1101
>>5 XXXX_XXX0_1111_111
And then mask it:
XXXX_XXX0_1111_111
0000_0000_1111_111
And therewith receive the correct result:
0000_0000_1111_111
So the introduction of the zero forced the BigInteger implementation to update the stored 0 position to a width that is higher than the one you are interested in and forced the creation of stored 1s.
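For comparison, a different and shorter way to get a fixed-width unsigned shift is to mask the value down to the width first and then shift. This is an alternative sketch, not the code from the answer above; the class name is invented, and unlike the answer's srl it also truncates non-negative inputs that are wider than width bits.

```java
import java.math.BigInteger;

// Hypothetical alternative: mask first, then shift.
class BigIntegerSrlDemo {
    // Logical right shift of l, treating it as a `width`-bit two's complement word.
    static BigInteger srl(BigInteger l, int width, int shiftBy) {
        // `width` one bits; and() with this mask yields the non-negative width-bit pattern of l
        BigInteger mask = BigInteger.ONE.shiftLeft(width).subtract(BigInteger.ONE);
        return l.and(mask).shiftRight(shiftBy);
    }

    public static void main(String[] args) {
        System.out.println(srl(BigInteger.valueOf(-1), 32, 1)); // 2147483647, same as -1 >>> 1 on an int
        System.out.println(srl(BigInteger.valueOf(20), 32, 2)); // 5
    }
}
```

The mask works because BigInteger.and treats negative numbers as infinitely sign-extended two's complement values, so the result of the and is always non-negative.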

The BigInteger class has the following operations
BigInteger shiftLeft(int n)
BigInteger shiftRight(int n)

Behaviour of unsigned right shift applied to byte variable

Consider the following snippet of Java code:
byte b=(byte) 0xf1;
byte c=(byte)(b>>4);
byte d=(byte) (b>>>4);
output:
c=0xff
d=0xff
expected output:
c=0x0f
How? b in binary is 1111 0001; after an unsigned right shift it should be 0000 1111, hence 0x0f. Why is it 0xff?

The problem is that all arguments are first promoted to int before the shift operation takes place:
byte b = (byte) 0xf1;
b is signed, so its value is -15.
byte c = (byte) (b >> 4);
b is first sign-extended to the integer -15 = 0xfffffff1, then shifted right to 0xffffffff and truncated to 0xff by the cast to byte.
byte d = (byte) (b >>> 4);
b is first sign-extended to the integer -15 = 0xfffffff1, then shifted right to 0x0fffffff and truncated to 0xff by the cast to byte.
You can do (b & 0xff) >>> 4 to get the desired effect.
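The three variants can be compared side by side. A minimal sketch with hypothetical class and method names follows; Byte.toUnsignedInt is available from Java 8 on and is equivalent to b & 0xff.

```java
// Hypothetical demo class for shifting a byte to the right.
class ByteShiftDemo {
    // b is promoted to int with sign extension, so ones are shifted in.
    static byte signedShift(byte b, int n) {
        return (byte) (b >> n);
    }

    // >>> zero-fills the 32-bit int, but the cast back to byte keeps bits that are still ones.
    static byte unsignedShiftNaive(byte b, int n) {
        return (byte) (b >>> n);
    }

    // Masking first removes the sign-extended upper 24 bits before shifting.
    static byte unsignedShiftMasked(byte b, int n) {
        return (byte) ((b & 0xff) >>> n);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xf1;
        System.out.println(signedShift(b, 4));           // -1 (0xff)
        System.out.println(unsignedShiftNaive(b, 4));    // -1 (0xff)
        System.out.println(unsignedShiftMasked(b, 4));   // 15 (0x0f)
        System.out.println(Byte.toUnsignedInt(b) >>> 4); // 15
    }
}
```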

I'd guess that b is sign extended to int before shifting.
So this might work as expected:
(byte)((0x000000FF & b)>>4)

According to Bitwise and Bit Shift Operators:
The unsigned right shift operator ">>>" shifts a zero into the leftmost position, while the leftmost position after ">>" depends on sign extension.
So with b >> 4 you transform 1111 0001 to 1111 1111 (b is negative, so 1s are shifted in from the left), which is 0xff.

Java tries to skimp on having explicit support for unsigned basic types by defining the two different shift operators instead.
The question talks about unsigned right shift, but the example does both (signed and unsigned) and shows the value of the signed shift (>>).
Your calculations would be right for unsigned shift (>>>).

The byte operand is promoted to an int before the shift.
See https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.19
Unary numeric promotion (§5.6.1) is performed on each operand separately. (Binary numeric promotion (§5.6.2) is not performed on the operands.)
And https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.6.1
Otherwise, if the operand is of compile-time type byte, short, or char, it is promoted to a value of type int by a widening primitive conversion (§5.1.2).

byte b = (byte) 0xf1;
byte d;
if (b < 0)
d = (byte) ((byte) ((byte) (b >> 1) & (byte) 0x7F) >>> 3); // one shift, clear the sign bit, then the remaining 3 shifts
else
d = (byte) (b >>> 4);
First, check the value:
If the value is negative, make one right shift, then & 0x7F, which makes it positive. Then you can do the rest of the right shift (4-1=3) easily.
If the value is positive, do the whole right shift with >>4 or >>>4. It makes no difference to the result and causes no problem with the right shift.
