Why are there no byte or short literals in Java?

I can create a literal long by appending an L to the value; why can't I create a literal short or byte in some similar way? Why do I need to use an int literal with a cast?
And if the answer is "Because there was no short literal in C", then why are there no short literals in C?
This doesn't actually affect my life in any meaningful way; it's easy enough to write (short) 0 instead of 0S or something. But the inconsistency makes me curious; it's one of those things that bother you when you're up late at night. Someone at some point made a design decision to make it possible to enter literals for some of the primitive types, but not for all of them. Why?

In C, int at least was meant to have the "natural" word size of the CPU, and long was probably meant to be the "larger natural" word size (not sure about that last part, but it would also explain why int and long have the same size on 32-bit x86).
Now, my guess is: for int and long, there's a natural representation that fits exactly into the machine's registers. On most CPUs, however, the smaller types byte and short would have to be padded to an int anyway before being used. If that's the case, you might as well use a cast.

I suspect it's a case of "don't add anything to the language unless it really adds value" - and it was seen as adding sufficiently little value to not be worth it. As you've said, it's easy to get round, and frankly it's rarely necessary anyway (only for disambiguation).
The same is true in C#, and I've never particularly missed it in either language. What I do miss in Java is an unsigned byte type :)

Another reason might be that the JVM has no arithmetic instructions for short and byte. All calculation and storage inside the JVM is done with ints, longs, floats and doubles.

There are several things to consider.
1) As discussed above, the JVM has no arithmetic instructions for byte or short. These types are generally not used in computation at the JVM level, so there would presumably be little use for such literals.
2) For initialization of byte and short variables, if the int expression is constant and in the allowed range of the type it is implicitly cast to the target type.
3) One can always cast the literal, e.g. (short) 10, as the sketch below shows.

Related

Why the need for "F" and "L" suffixes at the end of long and float data types?

Why are the "F" and "L" suffixes needed when declaring a long or float? According to the documentation:
An integer literal is of type long if it ends with the letter L or l; otherwise it is of type int.
A floating-point literal is of type float if it ends with the letter F or f; otherwise its type is double.
So, from that, obviously the compiler is treating the values as either an int data type or a double data type, by default. That doesn't quite explain things for me.
I dug a bit deeper and found a discussion where a user describes how converting a 64-bit double into a 32-bit float would result in data loss, and the designers didn't want to make assumptions.
Questions I still have:
Why would the compiler allow one to write byte myByte = 100;, automatically converting 100, an int as described above, into a byte, but not allow long myLong = 3_000_000_000;? Why will it not auto-convert 3_000_000_000 into a long, despite it being well within the range of a long?
As discussed above, when designing Java, the designers chose not to allow a double to be assigned to a float because of the data loss. While this may be true for a value that is outside the range of a float, obviously something like 3.14 is small enough for a float. So then, why does the compiler throw an error on the assignment float myFloat = 3.14;?
Ultimately, I'm failing to fully understand why the suffixes are needed, the rules surrounding automatic casting (if that's what's happening under the hood), etc.
I know this topic has been discussed before, but the answers given only raise more questions, so I am deciding to create a new post.
In answer to your specific questions:
The problem with long myLong = 3_000_000_000; is that 3_000_000_000 is not a legal int literal because 3,000,000,000 does not fit into 4 bytes. The fact that you want to promote it to a long in order to initialize myLong is irrelevant. (Yes, the language designers could have designed the language so that in this context 3_000_000_000 could have been parsed as a long, but they didn't, probably to keep the language simpler and to avoid ambiguities in other contexts.)
The problem with 3.14 is not a matter of range but of loss of precision. In particular, while 3.14 terminates in base 10 representation, it does not have a finite representation in binary floating point. So converting from a double to a float (in order to initialize myFloat) would involve truncating significant, non-zero bits of the representation. (But just to be clear: Java considers every narrowing conversion from double to float to be lossy, regardless of the actual values involved. So float myFloat = 3.0; would also fail. However, float myFloat = 3; succeeds because conversion from an int value to a float is considered a widening conversion.)
In both cases, the right thing to do is to indicate exactly to the compiler what you are trying to do by appending the appropriate suffix to the numeric literal.
Why would the compiler allow one to write byte myByte = 100;, and the compiler automatically convers 100, an int as described above, into a byte, but the compiler won't allow the long myLong = 3_000_000_000;?
Because the spec says so. Note that byte myByte = 100; does work, yes, but that is a special case, explicitly mentioned in the Java Language Specification; ordinarily, 100 as a literal in a .java file is always interpreted as an int first, and never silently converts itself to a byte, except in two cases, both explicitly mentioned in the JLS: the cast is 'implied' in compound assignment (someByte += anyNumber; always works and implies the cast; again, why? Because the spec says so), and the same explicit exception is made when declaring a variable: byte b = 100;, assuming the int literal is in fact in byte range (-128 to +127).
The JLS does not make an explicit rule that such concepts are applied in a long x = veryLargeLiteral;. And that is where your quest really ought to end. The spec says so. End of story.
If you'd like to ask "surely whoever added this, or rather failed to add this explicit case to the JLS, had their reasons for it, and those reasons are more technical and merit-based than 'they thought of it in a dream' or 'they flipped a coin'", then we get to a pure guess (because you'd have to ask them, probably James Gosling, about why they made a decision 25 years ago):
Because it would be considerably more complex to implement for the javac codebase.
Right now literals are first considered as an int, and only then, much later in the process, if the code is structured such that the JLS says no cast is needed, can they be 'downcast'. With the long scenario this does not work: once you try to treat 3_000_000_000 as an int, you have already lost the game, because it does not fit; the parser would need to create some sort of bizarre 'Schrödinger's cat' style node, which represents 3_000_000_000 accurately but nevertheless downstream gets turned into a parse error UNLESS it is used in an explicit scenario where the silently-treat-as-long part is allowed. That's certainly possible, but slightly more complex.
Presumably the same argument explains why, in 25 years, Java has not seen an update here. It could happen at some point, but I doubt it will have high priority.
As discussed above, when designing Java, the designers won't allow a double to be assigned to a float because of the data loss.
This really isn't related at all. long -> int is lossy, but double -> float mostly isn't (it's floating point: you lose a little every time you do anything with these values, but that's baked into the contract when you use them at all, so it should not stop you).
obviously something like 3.14 is small-enough for a float.
Long and int are easy: ints go from about -2 billion to about +2 billion and longs go a lot further. But float/double is not like that. They represent roughly the same range (which is HUGE; 300+ digit numbers are fine), but their accuracy goes down as you get away from 0, and for floats it goes down a lot faster. Almost every number, including 3.14, cannot be perfectly represented by either float or double, so we're just arguing about how much error is acceptable. Thus, Java does not as a rule silently convert stuff to a float, because, hey, you picked double, presumably for a reason, so you need to explicitly tell the compiler: "Yup, I get it, I want you to convert and I will accept the potential loss; it is what I want", because once the compiler starts guessing at what you meant, that is an excellent source of hard-to-find bugs. Java has loads of places where it is designed like this. Contrast this with languages like JavaScript or PHP, where tons of code is legal even if it is bizarre and seems to make no sense, because the compiler will just try to guess at what you wanted.
Java is much better than that - it draws a line: once your code is sufficiently weird that the odds that javac knows what you wanted drop below a threshold, Java will actively refuse to take a wild stab in the dark at what you meant and will just flat out refuse and ask you to be more clear about it. In a 20-year coding career I cannot stress enough how useful that is :)
I know this topic has been discussed before, but the answers given only raise more questions, so I am deciding to create a new post.
And yet you asked the same question again instead of the 'more questions' this raised. Shouldn't you have asked about those?
First, we need to understand how declaration happens in Java. Java is a statically-typed language: once we declare a variable, we can't change its data type afterwards. Let's look at an example:
long myLong = 3_000_000_000;
Integer literals are "int" by default; the integral types differ only in size (byte < short < int < long).
When we declare the variable we are telling Java that "myLong" should be of type long (like int, but larger). Then we try to assign the literal "3_000_000_000", which is an int, BUT int's max value is 2,147,483,647, so our literal is too big. That's why we should add "L" or "l" to the end of the literal. After adding "L", the literal is a long and can be assigned to the declared long "myLong": long myLong = 3_000_000_000L;
int myInt = 300L; => (an error will appear)
In this example our literal (300L) is a long. As mentioned before, long is bigger than the other integral types, so it can't be assigned to an int without a cast. When we delete the "L" from the end of the literal, "300" is an int again and the assignment works.
Here is another example, for float and double:
float myFloat = 5.5; (error)
float myFloat = 5.5F; (correct version)
Floating-point literals are "double" by default, and double is wider than float. myFloat is declared as "float", but 5.5 is a double, so an error appears saying we can't assign it. That is why we should add "F" or "f" to the end of 5.5. We can use "D" or "d" for doubles, but it's optional, since there is no wider floating-point type than double.
Hope it's clear :)

Why are char[] the only arrays not supported by Arrays.stream()?

While going through the ways of converting primitive arrays to Streams, I found that char[] are not supported while other primitive array types are supported. Any particular reason to leave them out in the stream?
Of course, the answer is "because that's what the designers decided". There is no technical reason why CharStream could not exist.
If you want justification, you usually need to turn to the OpenJDK mailing list*. The JDK's documentation is not in the habit of justifying why anything is the way it is.
Someone asked
Using IntStream to represent char/byte stream is a little
inconvenient. Should we add CharStream and ByteStream as well?
The reply from Brian Goetz (Java Language Architect) says
Short answer: no.
It is not worth another 100K+ of JDK footprint each for these forms
which are used almost never. And if we added those, someone would
demand short, float, or boolean.
Put another way, if people insisted we had all the primitive
specializations, we would have no primitive specializations. Which
would be worse than the status quo.
Source
He also says the same elsewhere
If you want to deal with them as chars, you can downcast them to
chars easily enough. Doesn't seem like an important enough use case
to have a whole 'nother set of streams. (Same with Short, Byte,
Float).
Source
TL;DR: Not worth the maintenance cost.
*In case you're curious, the google query I used was
site:http://mail.openjdk.java.net/ charstream
As Eran said, it's not the only one missing.
A BooleanStream would be useless, a ByteStream (if it existed) can be handled as an InputStream or converted to IntStream (as can short), and float can be handled as a DoubleStream.
As char cannot represent all characters anyway (see the linked question), a char stream would be a bit of a legacy stream. Admittedly, most people don't have to deal with code points, so it can seem strange; after all, you use String.charAt() without thinking "this doesn't actually work in all cases".
So some things were left out because they weren't deemed that important. As said by JB Nizet in the linked question:
The designers explicitly chose to avoid the explosion of classes and
methods by limiting the primitive streams to 3 types, since the other
types (char, short, float) can be represented by their larger
equivalent (int, double) without any significant performance penalty.
The reason a BooleanStream would be useless is that you only have two values, and that limits the operations a lot. There are no mathematical operations to do, and how often are you working with lots of boolean values anyway?
As can be seen from the comments, a BooleanStream is not needed; if it were, people would offer a lot of actual use cases, rather than theoretical situations, a single use case going back to Java 1.4, and a fallacious comparison to a while loop.
It's not only char arrays that are not supported.
There are only 3 types of primitive streams - IntStream, LongStream and DoubleStream.
As a result, Arrays has methods that convert int[], long[] and double[] to the corresponding primitive streams.
There are no corresponding methods for boolean[], byte[], short[], char[] and float[], since these primitive types have no corresponding primitive streams.
char is a dependent part of String, storing UTF-16 values. A Unicode symbol, a code point, is sometimes a surrogate pair of chars, so any simple solution with chars only covers part of the Unicode domain.
There was a time when char had its own right to be a public type, but nowadays it is better to use code points, i.e. an IntStream; a stream of chars could not straightforwardly handle surrogate pairs.
The other, more prosaic reason is that the JVM's "processor" model uses an int as its smallest "register": booleans, bytes, shorts and also chars are kept in such an int-sized storage location. To avoid bloating Java's classes, the designers refrained from adding every possible copy variant.
In the far future we might expect primitive types to be allowed as generic type parameters, providing a List<int>; then we might see a Stream<char>.
For the moment, better to avoid char, and maybe use java.text.Normalizer for a unique canonical form of code points / Unicode strings.

Why isn't BigInteger a primitive

If you use BigInteger (or BigDecimal) and want to perform arithmetic on them, you have to use the methods add or subtract, for example. This may sound fine until you realize that this
i += d + p + y;
would be written like this for a BigInteger:
i = i.add(d.add(p.add(y)));
As you can see, the first line is a little easier to read. This could be solved if Java allowed operator overloading, but it doesn't, which raises the question:
Why isn't BigInteger a primitive type so it can take advantage of the same operators as other primitive types?
That's because BigInteger is not, in fact, anything close to being a primitive. It is implemented using an array and some additional fields, and its operations are far from single instructions. For example, here is the implementation of add:
public BigInteger add(BigInteger val) {
    if (val.signum == 0)
        return this;
    if (signum == 0)
        return val;
    if (val.signum == signum)
        return new BigInteger(add(mag, val.mag), signum);

    int cmp = compareMagnitude(val);
    if (cmp == 0)
        return ZERO;
    int[] resultMag = (cmp > 0 ? subtract(mag, val.mag)
                               : subtract(val.mag, mag));
    resultMag = trustedStripLeadingZeroInts(resultMag);
    return new BigInteger(resultMag, cmp == signum ? 1 : -1);
}
Primitives in Java are types that are usually implemented directly by the CPU of the host machine. For example, every modern computer has a machine-language instruction for integer addition. Therefore it can also have very simple byte code in the JVM.
A complex type like BigInteger cannot usually be handled that way, and it cannot be translated into simple byte code. It cannot be a primitive.
So your question might be "Why no operator overloading in Java". Well, that's part of the language philosophy.
And why not make an exception, as was done for String? Because it's not just one operator that would be the exception: you would need exceptions for the operators *, /, +, -, <<, ^ and so on. And you'd still have some operations in the object itself (like pow, which is not represented by an operator in Java), which for primitives are handled by specialty classes (like Math).
Fundamentally, because the informal meaning of "primitive" is that it's data that can be handled directly with a single CPU instruction. In other words, they are primitives because they fit in a 32 or 64 bit word, which is the data architecture that your CPU works with, so they can explicitly be stored in the registers.
And thus your CPU can make the following operation:
ADD REGISTER_3 REGISTER_2 REGISTER_1 ;;; REGISTER_3 = REGISTER_1 + REGISTER_2
A BigInteger, which can occupy an arbitrarily large amount of memory, can't be stored in a single REGISTER and needs multiple instructions to perform even a simple sum.
This is why it couldn't possibly be a primitive type; instead, BigIntegers are objects with methods and fields, a much more complex structure than simple primitive types.
Note: The reason why I called this informal is because ultimately the Java designers could define a "Java primitive type" as anything they wanted, they own the word, however this is vaguely the agreed use of the word.
int and boolean and char aren't primitives so that you can take advantage of operators like + and /; they are primitives for historical reasons, the biggest of which is performance.
In Java, primitives are defined as just those things that are not full-fledged Objects. Why create these unusual structures (and then re-implement them as proper objects, like Integer, later on)? Primarily for performance: operations on Objects were (and are) slower than operations on primitive types. (As other answers mention, hardware support made these operations faster, but I'd disagree that hardware support is an "essential property" of primitives.)
So some types received "special treatment" (and were implemented as primitives), and others didn't. Think of it this way: if even the wildly-popular String is not a primitive type, why would BigInteger be?
It's because primitive types have a size limit. For instance, int is 32 bits and long is 64 bits, so if you create a variable of type int the JVM can allocate 32 bits of memory on the stack for it. But BigInteger "theoretically" has no size limit: it can grow arbitrarily large, so there is no way to know its size and allocate a fixed block of memory on the stack for it. Therefore it is allocated on the heap, where the JVM can always increase the size if needed.
Primitive types are normally historic types defined by processor architecture, which is why byte is 8-bit, short is 16-bit, int is 32-bit and long is 64-bit. Maybe when there are more 128-bit architectures, an extra primitive will be created... but I can't see there being enough drive for this...

Why would you need unsigned types in Java?

I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know how is this a problem? I have been programming in Java for 10 years more or less and never had issues with it. Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
When converting the numbers to other bit representation. Between 8, 16 and 32 bit integer types you can use bitmasks if needed.
When converting numbers to decimal format, usually to Strings.
Interoperating with non-Java systems through API's or protocols. Again the data is just bits, so I don't see the problem here.
Using the numbers as memory or other offsets. With 32-bit ints this might be a problem for very large offsets.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those. What am I missing? What are the actual benefits of having unsigned types in a programming language and how would having those make Java better?
Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Why not? Is "applying a bitwise AND with 0xFF" actually part of what your code is trying to represent? If not, why should it have to be part of how you write it? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this :(
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it's a shame that unsigned types aren't used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It's very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Java's lack of unsigned data types also stands against it. Yes, you can work around it, but it's not ideal and you'll be using code that doesn't really reflect the underlying data correctly.
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim "the data is just bits, so I don't see the problem here" - but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those.
The only times you would need to consider those operations is when you were naturally working with values of two different types - one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider:
// This should be considered unsigned - so a value of -1 is "really" 65535
short length = /* some value */;
// This is really signed
short foo = /* some value */;
boolean result = foo < length;
Suppose foo is 100 and length is -1. What's the logical result? The value of length represents 65535, so logically foo is smaller than it. But you'd probably go along with the code above and get the wrong result.
Of course they don't even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and wouldn't be a problem if you had unsigned types in the language.
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it's gone from java.sun.com now), including:
Ooh, good question... I'm going to say that the strangest thing about the Java platform is that the byte type is signed. I've never heard an explanation for this. It's quite counterintuitive and causes all sorts of errors.
If you like, yes, everything is ones and zeroes. However, your hardware arithmetic and logic unit doesn't work that way. If you want to store your bits in a signed integer value but perform operations that are not natural to signed integers, you will usually waste both storage space and processing time.
An unsigned integer type stores twice as many non-negative values in the same space as the corresponding signed integer type. So if you want to take into Java any data commonly used in a language with unsigned values, such as a POSIX date value (unsigned number of seconds) that is normally used with C, then in general you will need to use a wider integer type than C would use. If you are processing many such values, again you will waste both storage space and fetch-execute time.
The times I have used unsigned data types have been when I read in large blocks of data that correspond to images, or worked with openGL. I personally prefer unsigned if I know something will never be negative, as a "safety feature" of sorts.
Unsigned types are useful for bit-by-bit comparisons, and I'm pretty sure they are used extensively in graphics.

Why does char to int casting work but not char to Integer in Java

I was working on a problem when I encountered this.
(int)input.charAt(i) //works
(Integer)input.charAt(i) // Does not work
// input being a string
The first thought I have is that primitives are treated differently, and that is why this is not working. But then I find it difficult to understand why they would have an Integer wrapper class in the first place.
Edit:
What are the advantages of having wrapper classes then? Is it just to avoid having primitives around and to be more OO in design? I'm finding it difficult to understand how they are helpful. A new doubt altogether.
You're right that primitives are treated differently. The following would work:
(Integer)(int)input.charAt(i);
The difference is that when the argument is an int, (Integer) boxes the integer. It's not actually a cast even though it looks like it. But if the argument is a char, then it would be a cast attempt; but primitives can't be cast to objects and therefore it doesn't work. What you can do is to first cast the char to int - this cast is okay since both are primitive types - and then the int can be boxed.
Of course, char -> Integer boxing could have been made to work. "Why not?" is a good question. Probably there would have been little use for such a feature, especially when the same thing can be achieved by being a little more explicit. (Should char -> Long work too, then? And char -> Short? chars are 16-bit, so that would be the most straightforward.)
Answer to edit: the advantage of wrapper classes is that wrapped primitives can be treated like objects: stored in a List<Integer>, for example. List<int> would not work, because int is not an object. So maybe even more relevant question would be, what are primitive non-objects doing in an OO language? The answer is in performance: primitives are faster and take less memory. The use case determines whether the convenience of objects or the performance of primitives is more important.
Because Integer is an Object and char is not; you can't cast a non-Object value to an Object.
In fact, you cannot cast an Object to an Object of another class that is not in the hierarchy of that Object.
E.g. you cannot do this:
Integer g = (Integer) s;
where s is an object of type String.
Now, why does char to int work? Because every character is represented as Unicode in Java, so you can think of char as a smaller version of int behind the scenes (int is 32 bits and char is 16 bits).
