If you use BigInteger (or BigDecimal) and want to perform arithmetic on them, you have to use the methods add or subtract, for example. This may sound fine until you realize that this
i += d + p + y;
would be written like this for a BigInteger:
i = i.add(d.add(p.add(y)));
As you can see, the first line is a little easier to read. This could be solved if Java allowed operator overloading, but it doesn't, which raises the question:
Why isn't BigInteger a primitive type so it can take advantage of the same operators as other primitive types?
That's because BigInteger is not, in fact, anything close to a primitive. It is implemented using an array and some additional fields, and even a basic operation involves quite a bit of logic. For example, here is the implementation of add:
public BigInteger add(BigInteger val) {
    if (val.signum == 0)
        return this;
    if (signum == 0)
        return val;
    if (val.signum == signum)
        return new BigInteger(add(mag, val.mag), signum);

    int cmp = compareMagnitude(val);
    if (cmp == 0)
        return ZERO;
    int[] resultMag = (cmp > 0 ? subtract(mag, val.mag)
                               : subtract(val.mag, mag));
    resultMag = trustedStripLeadingZeroInts(resultMag);
    return new BigInteger(resultMag, cmp == signum ? 1 : -1);
}
Primitives in Java are types that are usually implemented directly by the CPU of the host machine. For example, every modern computer has a machine-language instruction for integer addition, which is why integer addition can also be expressed as a very simple bytecode instruction in the JVM.
A complex type like BigInteger cannot usually be handled that way and cannot be translated into simple bytecode, so it cannot be a primitive.
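To make the contrast concrete, here is a small sketch (the class and method names are made up for the illustration); the comments describe the bytecode javac produces, which you can confirm with javap -c:

import java.math.BigInteger;

class AddDemo {
    int addPrimitives(int a, int b) {
        return a + b;       // the addition is a single iadd bytecode instruction
    }

    BigInteger addBig(BigInteger a, BigInteger b) {
        return a.add(b);    // the addition is an invokevirtual call into BigInteger.add
    }
}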
So your question might really be "Why is there no operator overloading in Java?" Well, that's part of the language philosophy.
And why not make an exception, as was done for String? Because it's not just one operator that would be the exception. You would need to make exceptions for the operators *, /, +, -, <<, ^ and so on. And you would still have some operations on the object itself (like pow, which is not represented by an operator in Java), which for primitives are handled by utility classes (like Math).
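A small sketch of that difference in practice (a toy example, not from the JDK):

import java.math.BigInteger;

public class PowDemo {
    public static void main(String[] args) {
        // primitives: operators for + - * /, a utility class (Math) for the rest
        double viaMath = Math.pow(2, 10);                        // 1024.0

        // BigInteger: everything, including pow, is a method on the object itself
        BigInteger viaMethod = BigInteger.valueOf(2).pow(10);    // 1024

        System.out.println(viaMath + " " + viaMethod);
    }
}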
Fundamentally, because the informal meaning of "primitive" is that it's data that can be handled directly with a single CPU instruction. In other words, they are primitives because they fit in a 32- or 64-bit word, which is the data architecture your CPU works with, so they can be stored directly in registers.
And thus your CPU can perform the following operation:
ADD REGISTER_3 REGISTER_2 REGISTER_1 ;;; REGISTER_3 = REGISTER_1 + REGISTER_2
A BigInteger, which can occupy an arbitrarily large amount of memory, can't be stored in a single register, and even a simple sum requires multiple instructions.
This is why BigInteger couldn't possibly be a primitive type, and why it is instead an object with methods and fields, a much more complex structure than a simple primitive type.
Note: The reason I called this informal is that ultimately the Java designers could define a "Java primitive type" as anything they wanted; they own the term. However, this is roughly the agreed use of the word.
int, boolean and char aren't primitives so that you can take advantage of operators like + and /; they are primitives for historical reasons, the biggest of which is performance.
In Java, primitives are defined as just those things that are not full-fledged Objects. Why create these unusual structures (and then re-implement them as proper objects, like Integer, later on)? Primarily for performance: operations on Objects were (and are) slower than operations on primitive types. (As other answers mention, hardware support made these operations faster, but I'd disagree that hardware support is an "essential property" of primitives.)
So some types received "special treatment" (and were implemented as primitives), and others didn't. Think of it this way: if even the wildly-popular String is not a primitive type, why would BigInteger be?
It's because primitive types have a size limit. For instance int is 32 bits and long is 64 bits. So if you create a variable of type int the JVM allocates 32 bits of memory on the stack for it. But as for BigInteger, it "theoretically" has no size limit. Meaning it can grow arbitrarily in size. Because of this, there is no way to know its size and allocate a fixed block of memory on the stack for it. Therefore it is allocated on the heap where the JVM can always increase the size if needed.
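A minimal sketch of that difference: a long silently wraps around at 64 bits, while a BigInteger just keeps growing:

import java.math.BigInteger;

public class SizeDemo {
    public static void main(String[] args) {
        long wrapped = Long.MAX_VALUE + 1;   // overflows and wraps to Long.MIN_VALUE
        BigInteger grown = BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);

        System.out.println(wrapped);   // -9223372036854775808
        System.out.println(grown);     //  9223372036854775808
    }
}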
Primitive types are essentially historic types defined by processor architecture, which is why byte is 8-bit, short is 16-bit, int is 32-bit and long is 64-bit. Maybe when there are more 128-bit architectures, an extra primitive will be created... but I can't see there being enough demand for it...
Related
While going through the ways of converting primitive arrays to Streams, I found that char[] are not supported while other primitive array types are supported. Any particular reason to leave them out in the stream?
Of course, the answer is "because that's what the designers decided". There is no technical reason why CharStream could not exist.
If you want justification, you usually need to turn to the OpenJDK mailing list*. The JDK's documentation is not in the habit of justifying why anything is the way it is.
Someone asked
Using IntStream to represent char/byte stream is a little
inconvenient. Should we add CharStream and ByteStream as well?
The reply from Brian Goetz (Java Language Architect) says
Short answer: no.
It is not worth another 100K+ of JDK footprint each for these forms
which are used almost never. And if we added those, someone would
demand short, float, or boolean.
Put another way, if people insisted we had all the primitive
specializations, we would have no primitive specializations. Which
would be worse than the status quo.
Source
He also says the same elsewhere
If you want to deal with them as chars, you can downcast them to
chars easily enough. Doesn't seem like an important enough use case
to have a whole 'nother set of streams. (Same with Short, Byte,
Float).
Source
TL;DR: Not worth the maintenance cost.
*In case you're curious, the google query I used was
site:http://mail.openjdk.java.net/ charstream
As Eran said, it's not the only one missing.
A BooleanStream would be useless, a ByteStream (if it existed) can be handled as an InputStream or converted to IntStream (as can short), and float can be handled as a DoubleStream.
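The same trick works for chars, as the Goetz quote above suggests: stream them as ints and cast back where you actually need a char. A hedged sketch:

import java.nio.CharBuffer;
import java.util.stream.Collectors;

public class CharAsIntStreamDemo {
    public static void main(String[] args) {
        // String.chars() is an IntStream; cast back to char when needed
        String upper = "hello".chars()
                              .map(Character::toUpperCase)
                              .mapToObj(c -> String.valueOf((char) c))
                              .collect(Collectors.joining());
        System.out.println(upper);   // HELLO

        // a char[] can be wrapped in a CharBuffer to get the same IntStream view
        char[] letters = {'a', 'b', 'c'};
        CharBuffer.wrap(letters).chars().forEach(c -> System.out.print((char) c));
    }
}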
As char cannot represent all characters anyway (see the linked question), a char stream would be a bit of a legacy stream. Admittedly, most people don't have to deal with code points, so it can seem strange; after all, people use String.charAt() without thinking "this doesn't actually work in all cases".
So some things were left out because they weren't deemed that important. As said by JB Nizet in the linked question:
The designers explicitly chose to avoid the explosion of classes and
methods by limiting the primitive streams to 3 types, since the other
types (char, short, float) can be represented by their larger
equivalent (int, double) without any significant performance penalty.
The reason a BooleanStream would be useless is that you only have 2 values, which limits the operations a lot. There are no mathematical operations to perform, and how often are you working with lots of boolean values anyway?
As can be seen from the comments, a BooleanStream is not needed. If it were, there would be plenty of real use cases instead of theoretical situations, a use case dating back to Java 1.4, and a fallacious comparison to a while loop.
It's not only char arrays that are not supported.
There are only 3 types of primitive streams - IntStream, LongStream and DoubleStream.
As a result, Arrays has methods that convert int[], long[] and double[] to the corresponding primitive streams.
There are no corresponding methods for boolean[], byte[], short[], char[] and float[], since these primitive types have no corresponding primitive streams.
char is a dependent part of String, storing UTF-16 values. A Unicode symbol, a code point, is sometimes represented as a surrogate pair of chars, so any simple solution based on chars covers only part of the Unicode domain.
There was a time when char had a right to exist as a type in its own right, but nowadays it is better to work with code points, i.e. an IntStream. A stream of chars could not handle surrogate pairs straightforwardly.
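A small sketch of the difference, using a character outside the Basic Multilingual Plane (the emoji is just an arbitrary example):

public class CodePointDemo {
    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";   // "a😀b": the emoji is stored as a surrogate pair

        System.out.println(s.length());               // 4 - counts UTF-16 chars
        System.out.println(s.chars().count());        // 4 - the emoji appears as two values
        System.out.println(s.codePoints().count());   // 3 - one value per Unicode code point
    }
}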
The other, more prosaic reason is that the JVM's "processor" model uses an int as its smallest "register", keeping booleans, bytes, shorts and also chars in such an int-sized storage location. To avoid bloating the Java class library unnecessarily, the designers refrained from adding every possible copy variant.
In the far future we may expect primitive types to be allowed as generic type parameters, giving a List<int>. Then we might also see a Stream<char>.
For the moment, it is better to avoid char, and perhaps to use java.text.Normalizer to obtain a unique canonical form of code points / Unicode strings.
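For example, a small sketch of what Normalizer buys you (the strings are arbitrary): the composed and decomposed spellings of "é" only compare equal after normalization:

import java.text.Normalizer;

public class NormalizeDemo {
    public static void main(String[] args) {
        String composed = "\u00E9";      // é as a single code point
        String decomposed = "e\u0301";   // e followed by a combining acute accent

        System.out.println(composed.equals(decomposed));   // false

        String left = Normalizer.normalize(composed, Normalizer.Form.NFC);
        String right = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(left.equals(right));             // true
    }
}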
I am writing a program in Java that needs bit-level processing and manipulation, but I could not find any "bit"-level primitive data type. Is there any way in Java to define a "bit" data type?
In C/C++, we can tell the compiler how many bits to allocate for a variable's storage, like this:
struct bit {
    unsigned int value : 1; // 1 bit to store value
};
How can I do the same thing in Java?
Use boolean or Boolean with true or false (or Boolean.TRUE / Boolean.FALSE), both of which work with autoboxing and unboxing. There is also a BitSet for handling a vector of bits.
Depends on what you mean by "bit".
If you want single bits use a boolean and true/false.
If you want multi-bit fields you'll have to use an integer type (byte, short, int or long) and do the masking yourself. There's also BitSet for an OO implementation, but it's likely to be slower than masking for small (<64 bits) sets, although better for readability.
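A minimal sketch of both approaches (the flag positions are arbitrary, purely for illustration):

import java.util.BitSet;

public class BitDemo {
    public static void main(String[] args) {
        // manual masking in an int: treat bit 0 as a single-bit flag
        int flags = 0;
        flags |= 1;                          // set bit 0
        boolean set = (flags & 1) != 0;      // test bit 0
        flags &= ~1;                         // clear bit 0

        // BitSet: the object-oriented alternative, grows as needed
        BitSet bits = new BitSet();
        bits.set(0);
        System.out.println(set + " " + bits.get(0));   // true true
    }
}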
Can someone explain why arithmetic operations on integral types in Java always produce "int" or "long" results?
I think it's worth pointing out that this (arithmetic operations on integers producing integers) is a feature of many, many programming languages, not only Java.
Many of those programming languages were invented before Java, many after Java, so I think that arguments that it is a hang-over from the days when hardware was less capable are wide of the mark. This feature of language design is about making languages type-safe. There are very good reasons for separating integers and floating-point numbers in programming languages, and for making the programmer responsible for identifying when and how conversions from type to type take place.
Check this out: http://www.particle.kth.se/~lindsey/JavaCourse/Book/Part1/Java/Chapter02/operators.html#ArithOps
It explains how the type of the return value is determined by the types of the operands. Essentially:
the arithmetic operators require a numeric type
if both operands are integral types, the result will be the wider of the two, but at least an int (so int + long = long)
if the type of either operand is a floating-point number then a floating-point number will be returned
if either operand is a double then a double will be returned, otherwise a float
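A few lines that show these rules in action (a toy example):

public class PromotionDemo {
    public static void main(String[] args) {
        int i = 1;
        long l = 2L;
        float f = 3.0f;
        double d = 4.0;

        long a = i + l;      // int + long     -> long
        float b = i + f;     // int + float    -> float
        double c = f + d;    // float + double -> double
        // int bad = i + l;  // would not compile: possible lossy conversion from long to int

        System.out.println(a + " " + b + " " + c);
    }
}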
If you need to control the types, then you'll need to cast the operands to the appropriate types. For example, int * int could be too big for an int, so you may need to do:
long result = myInt * (long) anotherInt;
Likewise for really large or really tiny floats resulting from arithmetic operations.
Because the basic integer arithmetic operators are only defined either between int and int or between long and long. In all other cases types are automatically widened to suit. There are no doubt some abstruse paragraphs in the Java Language Specification explaining exactly what happens.
Dummy answer: because this is how the Java Language Specification defines them:
4.2.2. Integer Operations
The Java programming language provides a number of operators that act on integral values:
[...]
The numerical operators, which result in a value of type int or long:
Do you mean why you don't get a double or BigInteger result? Historical accident and efficiency reasons, mostly. Detecting overflow from + or * and handling the result (from Integer.MAX_VALUE * Integer.MAX_VALUE, say) means generating lots of exception-detection code that will almost never be triggered but always needs to be executed. Much easier to define addition or multiplication modulo 2^32 (or 2^64) and not worry about it. Same for division with a fractional remainder.
This was certainly the case long ago with C. It is less of an issue today with superscalar processors and lots of bits to play with. But people got used to it, so it remains in Java today. Use Python 3 if you want your arithmetic autoconverted to a type that can hold the result.
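In Java itself you can at least opt in to that detection since Java 8 via the exact-arithmetic helpers in Math; a small sketch:

public class OverflowDemo {
    public static void main(String[] args) {
        int wrapped = Integer.MAX_VALUE * 2;   // silently wraps around to -2

        try {
            Math.multiplyExact(Integer.MAX_VALUE, 2);   // opt-in overflow detection
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
        System.out.println(wrapped);
    }
}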
The reason is kind of the same as why we have primitive types in Java at all -- it allows writing efficient code. You may argue that it also makes less efficient but correct code much uglier; you'd be about right. Keep in mind that the design choice was made around 1995.
HashMap internally has its own static final constants that it works with, for example:
static final int DEFAULT_INITIAL_CAPACITY = 16;
Why can't they use the byte data type instead of int, since the value is so small?
They could, but it would be a micro-optimization, and the tradeoff would be less readable and maintainable code (Premature optimization, anyone?).
This is a static final variable, so it's allocated only once per classloader. I'd say we can spare those 3 (I'm guessing here) bytes.
I think this is because the capacity for a Map is expressed in terms of an int. When you try to work with a byte and an int, because of promotion rules, the byte will anyways be converted to an int. The default capacity is expressed in terms of an int to maybe avoid those needless promotions.
Using byte or short for variables and constants instead of int is a premature optimization that has next to no effect.
Most arithmetic and logical instructions of the JVM work only with int, long, float and double, other data types have to be cast to (usually) ints in order for these instructions to be executed on them.
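You can see this directly in the language: arithmetic on two bytes already yields an int, so the result has to be cast back explicitly. A short sketch:

public class ByteArithmeticDemo {
    public static void main(String[] args) {
        byte b1 = 10;
        byte b2 = 20;

        // byte sum = b1 + b2;        // does not compile: b1 + b2 is an int
        byte sum = (byte) (b1 + b2);  // explicit cast back down to byte
        int asInt = b1 + b2;          // fine: the result is already an int

        System.out.println(sum + " " + asInt);
    }
}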
The default type of number literals is int for integral and double for floating point numbers. Using byte, short and float types can thus cause some subtle programming bugs and generally worsens code readability.
A little example from the Java Puzzlers book:
public static void main(String[] args) {
    for (byte b = Byte.MIN_VALUE; b < Byte.MAX_VALUE; b++) {
        if (b == 0x90)
            System.out.print("Joy!");
    }
}
This program doesn't print Joy!, because the hex literal 0x90 is an int with the value 144, and b is promoted to an int for the comparison. Since bytes in Java are signed (which is itself rather inconvenient), b can never hold that value (Byte.MAX_VALUE is 127), and therefore the condition is never satisfied.
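If the comparison is really intended, the fix is to make the literal a byte as well, for example:

public static void main(String[] args) {
    for (byte b = Byte.MIN_VALUE; b < Byte.MAX_VALUE; b++) {
        if (b == (byte) 0x90)       // (byte) 0x90 is -112, so both sides compare as -112
            System.out.print("Joy!");
    }
}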
All in all, the reduction of the memory footprint is simply too small (next to none) to justify such micro-optimisation. Generally, explicit numeric types of different sizes are neither necessary nor well suited to higher-level programming. I personally think the only case where smaller numeric types are appropriate is byte arrays.
The byte values would still take the same space in the JVM, and they would also need to be converted to int for practical purposes, explicitly or implicitly, including array sizes, indexes, etc.
Converting from a byte to an int (as it needs to be an int in any case) would, if anything, make the code slower. The cost of the memory is pretty trivial in the overall scheme of things.
Given the default could be any int value, I think int makes sense.
A lot of data can be represented as a series of bytes.
int is the default data type that most users will use when counting or working with whole numbers.
The issue with using byte is that the compiler will not convert back down to it implicitly: any time you tried
byte variableName = intVariable;
it wouldn't compile without an explicit cast, whereas
double variableName = intVariable;
would work.
I can create a literal long by appending an L to the value; why can't I create a literal short or byte in some similar way? Why do I need to use an int literal with a cast?
And if the answer is "Because there was no short literal in C", then why are there no short literals in C?
This doesn't actually affect my life in any meaningful way; it's easy enough to write (short) 0 instead of 0S or something. But the inconsistency makes me curious; it's one of those things that bother you when you're up late at night. Someone at some point made a design decision to make it possible to enter literals for some of the primitive types, but not for all of them. Why?
In C, int at least was meant to have the "natural" word size of the CPU, and long was probably meant to be the "larger natural" word size (not sure about that last part, but it would also explain why int and long have the same size on x86).
Now, my guess is: for int and long, there's a natural representation that fits exactly into the machine's registers. On most CPUs, however, the smaller types byte and short would have to be padded to an int anyway before being used. If that's the case, you might as well have a cast.
I suspect it's a case of "don't add anything to the language unless it really adds value" - and it was seen as adding sufficiently little value to not be worth it. As you've said, it's easy to get round, and frankly it's rarely necessary anyway (only for disambiguation).
The same is true in C#, and I've never particularly missed it in either language. What I do miss in Java is an unsigned byte type :)
Another reason might be that the JVM doesn't know about short and byte. All calculation and storage inside the JVM is done with ints, longs, floats and doubles.
There are several things to consider.
1) As discussed above, the JVM has no real notion of byte or short types. Generally these types are not used in computation at the JVM level, so one can see there would be little use for such literals.
2) For initialization of byte and short variables, if the int expression is constant and within the allowed range of the type, it is implicitly narrowed to the target type.
3) One can always cast the literal, e.g. (short) 10.
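A short sketch of points 2) and 3):

public class ShortLiteralDemo {
    public static void main(String[] args) {
        short a = 10;           // fine: a constant int expression in range is implicitly narrowed
        byte b = 2 + 3;         // fine for the same reason
        // short c = 40000;     // does not compile: 40000 does not fit in a short
        short d = (short) 10;   // an explicit cast always works

        System.out.println(a + " " + b + " " + d);
    }
}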