Is there a way in Java to use unsigned numbers like in (My)SQL?
For example: I want to use an 8-bit variable (byte) with a range like: 0 ... 256; instead of -128 ... 127.
No, Java doesn't have any unsigned primitive types apart from char (which has values 0-65535, effectively). It's a pain (particularly for byte), but that's the way it is.
Usually you either stick with the same size, and overflow into negatives for the "high" numbers, or use the wider type (e.g. short for byte) and cope with the extra memory requirements.
You can use a class to simulate an unsigned number. For example
public class UInt8 implements Comparable<UInt8>,Serializable
{
public static final short MAX_VALUE=255;
public static final short MIN_VALUE=0;
private short storage;//internal storage in a int 16
public UInt8(short value)
{
if(value<MIN_VALUE || value>MAX_VALUE) throw new IllegalArgumentException();
this.storage=value;
}
public byte toByte()
{
//play with the shift operator ! <<
}
//etc...
}
You can mostly use signed numbers as if they were unsigned. Most operations stay the same, some need to be modified. See this post.
Internally, you shouldn't be using the smaller values--just use int. As I understand it, using smaller units does nothing but slow things down. It doesn't save memory because internally Java uses the system's word size for all storage (it won't pack words).
However if you use a smaller size storage unit, it has to mask them or range check or something for every operation.
ever notice that char (any operation) char yields an int? They just really don't expect you to use these other types.
The exceptions are arrays (which I believe will get packed) and I/O where you might find using a smaller type useful... but masking will work as well.
Nope, you can't change that. If you need something larger than 127 choose something larger than a byte.
If you need to optimize your storage (e.g. large matrix) you can u can code bigger positive numbers with negatives numbers, so to save space. Then, you have to shift the number value to get the actual value when needed. For instance, I want to manipulate short positive numbers only. Here how this is possible in Java:
short n = 32767;
n = (short) (n + 10);
System.out.println(n);
int m = (int) (n>=0?n:n+65536);
System.out.println(m);
So when a short integer exceeds range, it becomes negative. Yet, at least you can store this number in 16 bits, and restore its correct value by adding shift value (number of different values that can be coded). The value should be restored in a larger type (int in our case). This may not be very convenient, but I find it's so in my case.
I'm quite new to Java and to programming.
Yet, I encountered the same situation recently the need of unsigned values.
It took me around two weeks to code everything I had in mind, but I'm a total noob, so you could spend much less.
The general idea is to create interface, I have named it: UnsignedNumber<Base, Shifted> and to extend Number.class whilst implementing an abstract AbstractUnsigned<Base, Shifted, Impl extends AbstractUnsigned<Base, Shifted, Impl>> class.
So, Base parameterized type represents the base type, Shifted represents actual Java type. Impl is a shortcut for Implementation of this abstract class.
Most of the time consumed boilerplate of Java 8 Lambdas and internal private classes and safety procedures. The important thing was to achieve the behavior of unsigned when mathematical operation like subtraction or negative addition spawns the zero limit: to overflow the upper signed limit backwards.
Finally, it took another couple of days to code factories and implementation sub classes.
So far I have know:
UByte and MUByte
UShort and MUShort
UInt and MUInt
... Etc.
They are descendants of AbstractUnsigned:
UByte or MUByte extend AbstractUnsigned<Byte, Short, UByte> or AbstractUnsigned<Byte, Short, MUByte>
UShort or MUShort extend AbstractUnsigned<Short, Integer, UShort> or AbstractUnsigned<Short, Integer, MUShort>
...etc.
The general idea is to take unsigned upper limit as shifted (casted) type and code transposition of negative values as they were to come not from zero, but the unsigned upper limit.
UPDATE:
(Thanks to Ajeans kind and polite directions)
/**
* Adds value to the current number and returns either
* new or this {#linkplain UnsignedNumber} instance based on
* {#linkplain #isImmutable()}
*
* #param value value to add to the current value
* #return new or same instance
* #see #isImmutable()
*/
public Impl plus(N value) {
return updater(number.plus(convert(value)));
}
This is an externally accessible method of AbstractUnsigned<N, Shifted, Impl> (or as it was said before AbstractUnsigned<Base, Shifted, Impl>);
Now, to the under-the-hood work:
private Impl updater(Shifted invalidated){
if(mutable){
number.setShifted(invalidated);
return caster.apply(this);
} else {
return shiftedConstructor.apply(invalidated);
}
}
In the above private method mutable is a private final boolean of an AbstractUnsigned. number is one of the internal private classes which takes care of transforming Base to Shifted and vice versa.
What matters in correspondence with previous 'what I did last summer part'
is two internal objects: caster and shiftedConstructor:
final private Function<UnsignedNumber<N, Shifted>, Impl> caster;
final private Function<Shifted, Impl> shiftedConstructor;
These are the parameterized functions to cast N (or Base) to Shifted or to create a new Impl instance if current implementation instance of the AbstractUnsigned<> is immutable.
Shifted plus(Shifted value){
return spawnBelowZero.apply(summing.apply(shifted, value));
}
In this fragment is shown the adding method of the number object. The idea was to always use Shifted internally, because it is uncertain when the positive limits of 'original' type will be spawned. shifted is an internal parameterized field which bears the value of the whole AbstractUnsigned<>. The other two Function<> derivative objects are given below:
final private BinaryOperator<Shifted> summing;
final private UnaryOperator<Shifted> spawnBelowZero;
The former performs addition of two Shifted values. And the latter performs spawning below zero transposition.
And now an example from one of the factory boilerplates 'hell' for AbstractUnsigned<Byte, Short> specifically for the mentioned before spawnBelowZero UnaryOperator<Shifted>:
...,
v-> v >= 0
? v
: (short) (Math.abs(Byte.MIN_VALUE) + Byte.MAX_VALUE + 2 + v),
...
if Shifted v is positive nothing really happens and the original value is being returned. Otherwise: there's a need to calculate the upper limit of the Base type which is Byte and add up to that value negative v. If, let's say, v == -8 then Math.abs(Byte.MIN_VALUE) will produce 128 and Byte.MAX_VALUE will produce 127 which gives 255 + 1 to get the original upper limit which was cut of by the sign bit, as I got that, and the so desirable 256 is in the place. But the very first negative value is actually that 256 that's why +1 again or +2 in total. Finally, 255 + 2 + v which is -8 gives 255 + 2 + (-8) and 249
Or in a more visual way:
0 1 2 3 ... 245 246 247 248 249 250 251 252 253 254 255 256
-8 -7 -6 -5 -4 -3 -2 -1
And to finalize all that: this definitely does not ease your work or saves memory bytes, but you have a pretty much desirable behaviour when it is needed. And you can use that behaviour pretty much with any other Number.class subclasses. AbstractUnsigned being subclass of Number.class itself provides all the convenience methods and constants
similar to other 'native' Number.class subclasses, including MIN_VALUE and MAX_VALUE and a lot more, for example, I coded convenience method for mutable subclasses called makeDivisibileBy(Number n) which performs the simplest operation of value - (value % n).
My initial endeavour here was to show that even a noob, such as I am, can code it. My initial endeavour when I was coding that class was to get conveniently versatile tool for constant using.
Related
In the Random class, define a nextByte method that returns a value of the primitive type
byte. The values returned in a sequence of calls should be uniformly distributed over all the
possible values in the type.
In the Random class, define a nextInt method that returns a value of the primitive type
int. The values returned in a sequence of calls should be uniformly distributed over all the possible
values in the type.
(Hint: Java requires implementations to use the twos-complement representation for integers.
Figure out how to calculate a random twos-complement representation from four random byte
values using Java’s shift operators.)
Hi I was able to do part 3 and now I need to use 3. to solve 4. but I do not know what to do. I was thinking of using nextByte to make an array of 4 bytes then would I take twos complement of each so I wouldn't have negative numbers and then I would put them together into one int.
byte[] bytes = {42,-15,-7, 8} Suppose nextByte returns this bytes.
Then I would take the twos complement of each which i think would be {42, 241, 249, 8}. Is this what it would look like and why doesn't this code work:
public static int twosComplement(int input_value, int num_bits){
int mask = (int) Math.pow(2, (num_bits - 1));
return -(input_value & mask) + (input_value & ~mask);
}
Then I would use the following to put all four bytes into an int, would this work:
int i= (bytes[0]<<24)&0xff000000|
(bytes[1]<<16)&0x00ff0000|
(bytes[2]<< 8)&0x0000ff00|
(bytes[3]<< 0)&0x000000ff;
Please be as specific as possible.
The assignment says that Java already uses two's complement integers. This is a useful property that simplifies the rest of the code: it guarantees that if you group together 32 random bits (or in general however many bits your desired output type has), then this covers all possible values exactly once and there are no invalid patterns.
That might not be true of some other integer representations, which might only have 2³²-1 different values (leaving an invalid pattern that you would have to avoid) or have 2³² valid patterns but both a "positive" and a "negative" zero, which would cause a random bit pattern to have a biased "interpreted value" (with zero occurring twice as often as it should).
So that it not something for you to do, it is a convenient property for you to use to keep the code simple. Actually you already used it. This code:
int i= (bytes[0]<<24)&0xff000000|
(bytes[1]<<16)&0x00ff0000|
(bytes[2]<< 8)&0x0000ff00|
(bytes[3]<< 0)&0x000000ff;
Works properly thanks to those properties. By the way it can be simplified a bit: after shifting left by 24, there is no more issue with sign-extension, all the extended bits have been shifted out. And shifting left by 0 is obviously a no-op. So (bytes[0]<<24)&0xff000000 can be written as (bytes[0]<<24), and (bytes[3]<< 0)&0x000000ff as bytes[3]&0xff. But you can keep it as it was, with the nice regular structure.
The twosComplement function is not necessary.
Regarding my previous Question, Why do == comparisons with Integer.valueOf(String) give different results for 127 and 128? , we know that Integer class has a cache which stores values between -128 and 127.
Just wondering, why between -128 and 127?
Integer.valueOf() documentation stated that it "caching frequently requested values" . But does values between -128 and 127 are frequently requested for real? I thought frequently requested values are very subjective.
Is there any possible reason behind this?
From the documentation also stated: "..and may cache other values outside of this range."
How is this can be achieved?
Just wondering, why between -128 and 127?
A larger range of integers may be cached, but at least those between -128 and 127 must be cached because it is mandated by the Java Language Specification (emphasis mine):
If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
The rationale for this requirement is explained in the same paragraph:
Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. [...]
This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.
How can I cache other values outside of this range.?
You can use the -XX:AutoBoxCacheMax JVM option, which is not really documented in the list of available Hotspot JVM Options. However it is mentioned in the comments inside the Integer class around line 590:
The size of the cache may be controlled by the -XX:AutoBoxCacheMax=<size> option.
Note that this is implementation specific and may or may not be available on other JVMs.
-128 to 127 is the default size. But javadoc also says that the size of the Integer cache may be controlled by the -XX:AutoBoxCacheMax=<size> option. Note that it sets only high value, low value is always -128. This feature was introduced in 1.6.
As for why -128 to 127 - this is byte value range and it is natural to use it for a very small cache.
The reason for caching small integers, if that's what you're asking, is that many algorithms use small integers in their calculations, so avoiding the object-creation overhead for these values tends to be worthwhile.
The question then becomes which Integers to cache. Again, speaking in general, the frequency with which constant values are used tends to decrease as the absolute value of the constant increases -- everyone spends a lot of time using the values 1 or 2 or 10, relatively few few use the value 109 very intensively; fewer will have performance depend on how quickly one can obtain an Integer for 722.. Java chose to allocate 256 slots spanning the range of a signed byte value. This decision may have been informed by analyzing programs in existence at the time, but is just as likely to have been a purely arbitrary one. It's a reasonable amount of space to invest, it can be accessed rapidly (mask to find out if the value's in the cache's range, then a quick table lookup to access the cache), and it will definitely cover the most common cases.
In other words, I think the answer to your question is "it isn't as subjective as you thought, but the exact bounds are largely a rule-of-thumb decision ... and experiemental evidence has been that it was good enough."
Max high integer value that can be cached can be configured through system property i.e java.lang.Integer.IntegerCache.high(-XX:AutoBoxCacheMax) . The cache is implemented using an array.
private static class IntegerCache {
static final int high;
static final Integer cache[];
static {
final int low = -128;
// high value may be configured by property
int h = 127;
if (integerCacheHighPropValue != null) {
// Use Long.decode here to avoid invoking methods that
// require Integer's autoboxing cache to be initialized
int i = Long.decode(integerCacheHighPropValue).intValue();
i = Math.max(i, 127);
// Maximum array size is Integer.MAX_VALUE
h = Math.min(i, Integer.MAX_VALUE - -low);
}
high = h;
cache = new Integer[(high - low) + 1];
int j = low;
for(int k = 0; k < cache.length; k++)
cache[k] = new Integer(j++);
}
private IntegerCache() {}
}
When you encounter with Integer class and always boxed within the range -128 to 127 it's always better to convert the Integer object into int value as below.
<Your Integer Object>.intValue()
I am trying to write a bitwise calculator in java, something that you could input an expression such as ~101 and it would give back 10 however when i run this code
import java.util.Scanner;
public class Test
{
public static void main(String[] args)
{
Integer a = Integer.valueOf("101", 2);
System.out.println(Integer.toString(~a,2));
}
}
it outputs -110 why?
You are assuming that 101 is three bits long. Java doesn't support variable length bit operations, it operates on a whole int of bits, so ~ will be the not of a 32 bit long "101".
--- Edited after being asked "How can I fix this?" ---
That's a really good question, but the answer is a mix of "you can't" and "you can achieve the same thing by different means".
You can't fix the ~ operator, as it does what it does. It would sort of be like asking to fix + to only add the 1's place. Just not going to happen.
You can achieve the desired operation, but you need a bit more "stuff" to get it going. First you must have something (another int) that specifies the bits of interest. This is typically called a bit mask.
int mask = 0x00000007; // just the last 3 bits.
int masked_inverse = (~value) & mask;
Note that what we did was really invert 32 bits, then zeroed out 29 of those bits; because, they were set to zero in the mask, which means "we don't care about them". This can also be imagined as leveraging the & operator such that we say "if set and we care about it, set it".
Now you will still have 32 bits, but only the lower 3 will be inverted. If you want a 3 bit data structure, then that's a different story. Java (and most languages) just don't support such things directly. So, you might be tempted to add another type to Java to support that. Java adds types via a class mechanism, but the built-in types are not changeable. This means you could write a class to represent a 3 bit data structure, but it will have to handle ints internally as 32 bit fields.
Fortunately for you, someone has already done this. It is part of the standard Java library, and is called a BitSet.
BitSet threeBits = new BitSet(3);
threeBits.set(2); // set bit index 2
threeBits.set(0); // set bit index 0
threeBits.flip(0,3);
However, such bit manipulations have a different feel to them due to the constraints of the Class / Object system in Java, which follows from defining classes as the only way to add new types in Java.
If a = ...0000101 (bin) = 5 (dec)
~a = ~...0000101(bin) = ...1111010(bin)
and Java uses "Two's complement" form to represent negative numbers so
~a = -6 (dec)
Now difference between Integer.toBinaryString(number) and Integer.toString(number, 2) for negative number is that
toBinaryString returns String in "Two's complement" form but
toString(number, 2) calculates binary form as if number was positive and add "minus" mark if argument was negative.
So toString(number, 2) for ~a = -6 will
calculate binary value for 6 -> 0000110,
trim leading zeros -> 110,
add minus mark -> -110.
101 in integer is actually represented as 00000000000000000000000000000101 negate this and you get 11111111111111111111111111111010 - this is -6.
The toString() method interprets its argument as a signed value.
To demonstrate binary operations its better to use Integer.toBinaryString(). It interprets its argument as unsigned, so that ~101 is output as 11111111111111111111111111111010.
If you want fewer bits of output you can mask the result with &.
Just to elaborate on Edwin's answer a bit - if you're looking to create a variable length mask to develop the bits of interest, you might want some helper functions:
/**
* Negate a number, specifying the bits of interest.
*
* Negating 52 with an interest of 6 would result in 11 (from 110100 to 001011).
* Negating 0 with an interest of 32 would result in -1 (equivalent to ~0).
*
* #param number the number to negate.
* #param bitsOfInterest the bits we're interested in limiting ourself to (32 maximum).
* #return the negated number.
*/
public int negate(int number, int bitsOfInterest) {
int negated = ~number;
int mask = ~0 >>> (32 - bitsOfInterest);
logger.info("Mask for negation is [" + Integer.toBinaryString(mask) + "]");
return negated & mask;
}
/**
* Negate a number, assuming we're interesting in negation of all 31 bits (exluding the sign).
*
* Negating 32 in this case would result in ({#link Integer#MAX_VALUE} - 32).
*
* #param number the number to negate.
* #return the negated number.
*/
public int negate(int number) {
return negate(number, 31);
}
I would like to know which one is the best way to work with binary numbers in java.
I need a way to create an array of binary numbers and do some calculations with them.
For example, I would like to X-or the values or multiply matrix of binary numbers.
Problem solved:
Thanks very much for all the info.
I think for my case I'm going to use the BitSet mentioned by #Jarrod Roberson
In Java edition 7, you can simply use binary numbers by declaring ints and preceding your numbers with 0b or 0B:
int x=0b101;
int y=0b110;
int z=x+y;
System.out.println(x + "+" + y + "=" + z);
//5+6=11
/*
* If you want to output in binary format, use Integer.toBinaryString()
*/
System.out.println(Integer.toBinaryString(x) + "+" + Integer.toBinaryString(y)
+ "=" + Integer.toBinaryString(z));
//101+110=1011
What you are probably looking for is the BitSet class.
This class implements a vector of bits that grows as needed. Each
component of the bit set has a boolean value. The bits of a BitSet are
indexed by nonnegative integers. Individual indexed bits can be
examined, set, or cleared. One BitSet may be used to modify the
contents of another BitSet through logical AND, logical inclusive OR,
and logical exclusive OR operations.
By default, all bits in the set initially have the value false.
Every bit set has a current size, which is the number of bits of space
currently in use by the bit set. Note that the size is related to the
implementation of a bit set, so it may change with implementation. The
length of a bit set relates to logical length of a bit set and is
defined independently of implementation.
Unless otherwise noted, passing a null parameter to any of the methods
in a BitSet will result in a NullPointerException.
There's a difference between the number itself and
it's representation in the language. For instance, "0xD" (radix 16), "13" (radix 10), "015" (radix 8) and "b1101" (radix 2) are four
different representations referring to the same number.
That said, you can use the "int" primitive data type in the Java language to represent any binary number (as well as any number in any radix), but only in Java 7 you are able to use a binary literal as you were previously able to use the octal (0) and hexa (0x) literals to represent those numbers, if I understood correctly your question.
You can store them as byte arrays, then access the bits individually. Then to XOR them you can merely XOR the bytes (it is a bitwise operation).
Of course it doesn't have to be a byte array (could be an array of int types or whatever you want), since everything is stored in binary in the end.
I've never seen a computer that uses anything but binary numbers.
The XOR operator in Java is ^. For example, 5 ^ 3 = 6. The default radix for most number-to-string conversions is 10, but there are several methods which allow you to specify another base, like 2:
System.out.println(Integer.toString(5 ^ 3, 2));
If you are using Java 7, you can use binary literals in your source code (in addition to the decimal, hexadecimal, and octal forms previously supported).
Example code:
int a = 255;
byte b = (byte) a;
int c = b & 0xff; // Here be dragons
System.out.println(a);
System.out.println(b);
System.out.println(c);
So we start with an integer value of 255, convert it to a byte (becoming -1) and then converting it back to an int by using a magic formula. The expected output is:
255
-1
255
I'm wondering if this a & 0xff is the most elegant way to to this conversion. checkstyle for example complains about using a magic number at this place and it's not a good idea to ignore this value for this check because in other places 255 may really be a magic number which should be avoided. And it's quite annoying to define a constant for stuff like this on my own. So I wonder if there is a standard method in JRE which does this conversion instead? Or maybe an already defined constant with the highest unsigned byte value (similar to Byte.MAX_VALUE which is the highest signed value)
So to keep the question short: How can I convert a byte to an int without using a magic number?
Ok, so far the following possibilities were mentioned:
Keep using & 0xff and ignore the magic number 255 in checkstyle. Disadvantage: Other places which may use this number in some other scope (not bit operations) are not checked then, too. Advantage: Short and easy to read.
Define my own constant for it and then use code like & SomeConsts.MAX_UNSIGNED_BYTE_VALUE. Disadvantage: If I need it in different classes then I have to define my own constant class just for this darn constant. Advantage: No magic numbers here.
Do some clever math like b & ((1 << Byte.SIZE) - 1). The compiler output is most likely the same because it gets optimized to a constant value. Disadvantage: Pretty much code, difficult to read. Advantage: As long as 1 is not defined as magic number (checkstyle ignores it by default) we have no magic number here and we don't need to define custom constants. And when bytes are redefined to be 16 bit some day (Just kidding) then it still works because then Byte.SIZE will be 16 and not 8.
Are there more ideas? Maybe some other clever bit-wise operation which is shorter then the one above and only uses numbers like 0 and 1?
This is the standard way to do that transformation. If you want to get rid of the checkstyle complaints, try defining a constant, it could help:
public final static int MASK = 0xff;
BTW - keep in mind, that it is still a custom conversion. byte is a signed datatype so a byte can never hold the value 255. A byte can store the bit pattern 1111 1111 but this represents the integer value -1.
So in fact you're doing bit operations - and bit operations always require some magic numbers.
BTW-2 : Yes, there is a Byte.MAX_VALUE constant but this is - because byte is signed - defined as 27-1 (= 127). So it won't help in your case. You need a byte constant for -1.
Ignore checkstyle. 0xFF is not a magic number. If you define a constant for it, the constant is a magic constant, which is much less understandable than 0xFF itself. Every programmer educated in the recent centuries should be more familiar with 0xFF than with his girlfriend, if any.
should we write code like this?
for(int i = Math.ZERO; ... )
Guava to the rescue.
com.google.common.primitives.UnsignedBytes.toInt
Java 8 provides Byte.toUnsignedInt and Byte.toUnsignedLong (probably for really big bytes) methods:
byte b = (byte)255;
int c = Byte.toUnsignedInt(b); // 255
long asLong = Byte.toUnsignedLong(b); // 255
I wrote a method for this like
public static int unsigned(byte x) {
return int (x & 0xFF);
}
which is overloaded for short and int parameters, too (where int gets extended to long).
Instead of 0xFF you could use Byte.MAX_VALUE+Byte.MAX_VALUE+1 to keep FindBug shut, but I'd consider it to be an obfuscation. And it's too easy to get it wrong (s. previous versions).