Bit data type in Java

Bit data type in Java - java

I am writing a program, in Java, which need bit processing, and manipulations. But I could not find any "bit" level primitive datatype. Is there any way in Java to define "bit" datatype?
In C/C++, we can tell compiler how many bits to allocate for variables storage as
struct bit {
unsigned int value : 1; // 1 bit to store value
};
How can I do same thing in Java?

boolean or Boolean with true or false (or Boolean.TRUE / Boolean.FALSE). Both of which can be used with autoboxing and unboxing. There is also a BitSet for handling a vector of bits.

Depends on what you mean by "bit".
If you want single bits use a boolean and true/false.
If you want multi-bit fields you'll have to use an integer type (byte, short, int or long) and do the masking yourself. There's also BitSet for an OO implementation, but it's likely to be slower than masking for small (<64 bits) sets, although better for readability.

Related

Why isn't BigInteger a primitive

If you use BigInteger (or BigDecimal) and want to perform arithmetic on them, you have to use the methods add or subtract, for example. This may sound fine until you realize that this
i += d + p + y;
would be written like this for a BigInteger:
i = i.add(d.add(p.add(y)));
As you can see it is a little easier to read the first line. This could be solved if Java allowed operator overloading but it doesn't, so this begs the question:
Why isn't BigInteger a primitive type so it can take advantage of the same operators as other primitive types?

That's because BigInteger is not, in fact, anything that is close to being a primitive. It is implemented using an array and some additional fields, and the various operations include complex operations. For example, here is the implementation of add:
public BigInteger add(BigInteger val) {
if (val.signum == 0)
return this;
if (signum == 0)
return val;
if (val.signum == signum)
return new BigInteger(add(mag, val.mag), signum);
int cmp = compareMagnitude(val);
if (cmp == 0)
return ZERO;
int[] resultMag = (cmp > 0 ? subtract(mag, val.mag)
: subtract(val.mag, mag));
resultMag = trustedStripLeadingZeroInts(resultMag);
return new BigInteger(resultMag, cmp == signum ? 1 : -1);
}
Primitives in Java are types that are usually implemented directly by the CPU of the host machine. For example, every modern computer has a machine-language instruction for integer addition. Therefore it can also have very simple byte code in the JVM.
A complex type like BigInteger cannot usually be handled that way, and it cannot be translated into simple byte code. It cannot be a primitive.
So your question might be "Why no operator overloading in Java". Well, that's part of the language philosophy.
And why not make an exception, like for String? Because it's not just one operator that is the exception. You need to make an exception for the operators *, /, +,-, <<, ^ and so on. And you'll still have some operations in the object itself (like pow which is not represented by an operator in Java), which for primitives are handled by speciality classes (like Math).

Fundamentally, because the informal meaning of "primitive" is that it's data that can be handled directly with a single CPU instruction. In other words, they are primitives because they fit in a 32 or 64 bits word, which is the data architecture that your CPU works with, so they can explicitely be stored in the registers.
And thus your CPU can make the following operation:
ADD REGISTER_3 REGISTER_2 REGISTER_1 ;;; REGISTER_3 = REGISTER_1 + REGISTER_2
A BigInteger which can occupy an arbitrarily large amount of memory can't be stored in a single REGISTER and will need to perform multiple instructions to make a simple sum.
This is why they couldn't possibly be a primitive type, and now they actually are objects with methods and fields, a much more complex structure than simple primitive types.
Note: The reason why I called this informal is because ultimately the Java designers could define a "Java primitive type" as anything they wanted, they own the word, however this is vaguely the agreed use of the word.

int and boolean and char aren't primitives so that you can take advantage of operators like + and /. They are primitives for historical reasons, the biggest of which is performance.
In Java, primitives are defined as just those things that are not full-fledged Objects. Why create these unusual structures (and then re-implement them as proper objects, like Integer, later on)? Primarily for performance: operations on Objects were (and are) slower than operations on primitive types. (As other answers mention, hardware support made these operations faster, but I'd disagree that hardware support is an "essential property" of primitives.)
So some types received "special treatment" (and were implemented as primitives), and others didn't. Think of it this way: if even the wildly-popular String is not a primitive type, why would BigInteger be?

It's because primitive types have a size limit. For instance int is 32 bits and long is 64 bits. So if you create a variable of type int the JVM allocates 32 bits of memory on the stack for it. But as for BigInteger, it "theoretically" has no size limit. Meaning it can grow arbitrarily in size. Because of this, there is no way to know its size and allocate a fixed block of memory on the stack for it. Therefore it is allocated on the heap where the JVM can always increase the size if needed.

Primitive types are normally historic types defined by processor architecture. Which is why byte is 8-bit, short is 16-bit, int is 32-bit and long is 64-bit. Maybe when there's more 128-bit architectures, an extra primitive will be created...but I can't see there being enough drive for this...

Java - indexing into array with a byte

Is it possible to index a Java array based on a byte?
i.e. something like
array[byte b] = x;
I have a very performance-critical application which reads b (in the code above) from a file, and I don't want the overhead of converting this to an int. What is the best way to achieve this? Is there a performance-decrease as a result of using this method of indexing rather than an int?
With many thanks,
Froskoy.

There's no overhead for "converting this to an int." At the Java bytecode level, all bytes are already ints.
In any event, doing array indexing will automatically upcast to an int anyway. None of these things will improve performance, and many will decrease performance. Just leave your code using an int.
The JVM specification, section 2.11.1:
Note that most instructions in Table 2.2 do not have forms for the integral types byte, char, and short. None have forms for the boolean type. Compilers encode loads of literal values of types byte and short using Java virtual machine instructions that sign-extend those values to values of type int at compile-time or runtime. Loads of literal values of types boolean and char are encoded using instructions that zero-extend the literal to a value of type int at compile-time or runtime. Likewise, loads from arrays of values of type boolean, byte, short, and char are encoded using Java virtual machine instructions that sign-extend or zero-extend the values to values of type int. Thus, most operations on values of actual types boolean, byte, char, and short are correctly performed by instructions operating on values of computational type int.

As all integer types in java are signed you have anyway to mask out 8 bits of b's value provided you do expect to read from the file values greater than 0x7F:
byte b;
byte a[256];
a [b & 0xFF] = x;

No; array indices are non-negative integers (JLS 10.4), but byte indices will be promoted.

No, there is no performance decrease, because on the moment you read the byte, you store it in a CPU register sometime. Those registers always works with WORDs, which means that the byte is always "converted" to an int (or a long, if you are on a 64 bit machine).
So, simply read your byte like this:
int b = (in.readByte() & 0xFF);
If your application is that performance critical, you should be optimizing elsewhere.

Why HashMap internal work helper variables are int, which can be byte datatype

HashMap internally has its own static final variables for its working.
static final int DEFAULT_INITIAL_CAPACITY = 16;
Why can't they use byte datatype instead of using int since the value is too small.

They could, but it would be a micro-optimization, and the tradeoff would be less readable and maintainable code (Premature optimization, anyone?).
This is a static final variable, so it's allocated only once per classloader. I'd say we can spare those 3 (I'm guessing here) bytes.

I think this is because the capacity for a Map is expressed in terms of an int. When you try to work with a byte and an int, because of promotion rules, the byte will anyways be converted to an int. The default capacity is expressed in terms of an int to maybe avoid those needless promotions.

Using byte or short for variables and constants instead of int is a premature optimization that has next to no effect.
Most arithmetic and logical instructions of the JVM work only with int, long, float and double, other data types have to be cast to (usually) ints in order for these instructions to be executed on them.
The default type of number literals is int for integral and double for floating point numbers. Using byte, short and float types can thus cause some subtle programming bugs and generally worsens code readability.
A little example from the Java Puzzlers book:
public static void main(String[] args) {
for (byte b = Byte.MIN_VALUE; b < Byte.MAX_VALUE; b++) {
if (b == 0x90)
System.out.print("Joy!");
}
}
This program doesn't print Joy!, because the hex value 0x90 is implicitly promoted to an int with the value 144. Since bytes in Java are signed (which itself is very inconvenient), the variable b is never assigned to this value (Byte.MAX_VALUE = 127) and therefore, the condition is never satisfied.
All in all, the reduction of the memory footprint is simply too small (next to none) to justify such micro-optimisation. Generally, explicit numeric types of different size are not necessary and suitable for higher level programming. I personally think that only case where smaller numeric types are acceptable are byte arrays.

The byte values still taking the same space in the JVM and it will also need to be converted to int to the practical purposes explicitly or implicitly, including array sizes, indexes, etc.

Converting from a byte to an int(as it needs to be anint` in any case) would make the code slower if anything. The cost of memory is pretty trivial in the overall scheme of things.
Given the default could be any int value, I think int makes sense.

A lot of data can be represented as a series of Bytes.
Int is the default data type that most users will use when counting or workign with whole numbers.
the issue with using Byte is that the compiler will not recognize it for type conversion.
anytime you tried
int variablename = bytevariable;
it wouldnt complete the assignment however
double variablename = intVariable;
would work.

Objective-C and Java Primitive Data Types

I need to convert a piece of code from Objective-C to Java, but I have a problem understanding the Primitive Types in Objective-C. So I had their data types in my Objective-C code :
UInt64, Uint32, UInt8 ,
which are unsigned integers (as I understand from internet). So my question is, can I use Java primitive types like byte (8bit) - instead of UInt8, int (32bit) - instead of UInt32, and long (64bit) - instead of UInt64.

Unfortunately, it isn't a straight translation and without knowing more about your program, its hard to suggest what the "right" approach is.
If your UInt8 values really range from 0-255, you may have to use Java signed int to be able to hold the entire range.
If you are dealing with byte streams or memory layouts and really need to use just a single byte of memory, than you could try byte, but you may have to test and handle cases to handle when the high-bit is set (value > 127). Ditto with the other unsigned types.
Ideally, if your code just kind of "defaulted" to the unsigned types, but really the signed versions would have worked fine too (i.e. the ranges of your values never equal or exceed 2^7, 2^15, or 2^31 respectively), then you may be fine with the "straight" translation to byte, int, and long.

Yes, those are the correctly sized data types to use in Java. Make sure you take into account that Java does not have unsigned types and the trick is to use the next largest size. 64 bit unsigned arithmetic requires special consideration.

Why are there no byte or short literals in Java?

I can create a literal long by appending an L to the value; why can't I create a literal short or byte in some similar way? Why do I need to use an int literal with a cast?
And if the answer is "Because there was no short literal in C", then why are there no short literals in C?
This doesn't actually affect my life in any meaningful way; it's easy enough to write (short) 0 instead of 0S or something. But the inconsistency makes me curious; it's one of those things that bother you when you're up late at night. Someone at some point made a design decision to make it possible to enter literals for some of the primitive types, but not for all of them. Why?

In C, int at least was meant to have the "natural" word size of the CPU and long was probably meant to be the "larger natural" word size (not sure in that last part, but it would also explain why int and long have the same size on x86).
Now, my guess is: for int and long, there's a natural representation that fits exactly into the machine's registers. On most CPUs however, the smaller types byte and short would have to be padded to an int anyway before being used. If that's the case, you can as well have a cast.

I suspect it's a case of "don't add anything to the language unless it really adds value" - and it was seen as adding sufficiently little value to not be worth it. As you've said, it's easy to get round, and frankly it's rarely necessary anyway (only for disambiguation).
The same is true in C#, and I've never particularly missed it in either language. What I do miss in Java is an unsigned byte type :)

Another reason might be that the JVM doesn't know about short and byte. All calculations and storing is done with ints, longs, floats and doubles inside the JVM.

There are several things to consider.
1) As discussed above the JVM has no notion of byte or short types. Generally these types are not used in computation at the JVM level; so one can think there would be less use of these literals.
2) For initialization of byte and short variables, if the int expression is constant and in the allowed range of the type it is implicitly cast to the target type.
3) One can always cast the literal, ex (short)10

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Bit data type in Java - java

boolean or Boolean with true or false (or Boolean.TRUE / Boolean.FALSE). Both of which can be used with autoboxing and unboxing. There is also a BitSet for handling a vector of bits.

Related

Why isn't BigInteger a primitive

Java - indexing into array with a byte

Why HashMap internal work helper variables are int, which can be byte datatype

Objective-C and Java Primitive Data Types

Why are there no byte or short literals in Java?

Categories

Resources