It looks like all Java containers, buffers, arrays, etc. can only be indexed by int. In C++, I can index by unsigned long, for example.
What is the solution for this in Java? I can surely create my own class that uses lots of int-indexable buffers and accesses the right one, but is there a better and simpler way?
According to the Java Language Specification:
10.4 Array Access
Arrays must be indexed by int values; short, byte, or char values may also be used as index values because they are subjected to unary numeric promotion (§5.6) and become int values.
An attempt to access an array component with a long index value results in a compile-time error.
And from my own perspective, since an array's length is an int, no array can have an index beyond Integer.MAX_VALUE. So indexing a plain array via a long wouldn't be necessary.
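For completeness, here is a minimal sketch of the "own class" idea from the question: a byte store addressed by a long index and backed by several ordinary int-indexed arrays. The class name LongIndexedBytes and the chunk size are assumptions made up for illustration, not a standard API.

```java
// Hypothetical helper: a byte store addressable by a long index,
// backed by several ordinary int-indexed arrays ("chunks").
final class LongIndexedBytes {
    private static final int CHUNK_BITS = 20;              // assumed chunk size: 1 MiB
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final byte[][] chunks;
    private final long length;

    LongIndexedBytes(long length) {
        if (length < 0) throw new IllegalArgumentException("negative length");
        this.length = length;
        // Round up so that a partially filled last chunk is still allocated.
        int chunkCount = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new byte[chunkCount][];
        long remaining = length;
        for (int i = 0; i < chunkCount; i++) {
            chunks[i] = new byte[(int) Math.min(CHUNK_SIZE, remaining)];
            remaining -= chunks[i].length;
        }
    }

    long length() { return length; }

    byte get(long index) {
        checkIndex(index);
        // High bits select the chunk, low bits select the slot inside it.
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    public static void main(String[] args) {
        LongIndexedBytes data = new LongIndexedBytes(3_000_000L);
        data.set(2_999_999L, (byte) 42);
        System.out.println(data.get(2_999_999L)); // prints 42
    }
}
```

Using a power-of-two chunk size keeps the index split down to a shift and a mask. The sketch does not guard against lengths so large that the chunk count itself overflows an int.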
Is it possible to index a Java array based on a byte?
i.e. something like
array[byte b] = x;
I have a very performance-critical application which reads b (in the code above) from a file, and I don't want the overhead of converting this to an int. What is the best way to achieve this? Is there a performance-decrease as a result of using this method of indexing rather than an int?
There's no overhead for "converting this to an int." At the Java bytecode level, all bytes are already ints.
In any event, doing array indexing will automatically upcast to an int anyway. None of these things will improve performance, and many will decrease performance. Just leave your code using an int.
The JVM specification, section 2.11.1:
Note that most instructions in Table 2.2 do not have forms for the integral types byte, char, and short. None have forms for the boolean type. Compilers encode loads of literal values of types byte and short using Java virtual machine instructions that sign-extend those values to values of type int at compile-time or runtime. Loads of literal values of types boolean and char are encoded using instructions that zero-extend the literal to a value of type int at compile-time or runtime. Likewise, loads from arrays of values of type boolean, byte, short, and char are encoded using Java virtual machine instructions that sign-extend or zero-extend the values to values of type int. Thus, most operations on values of actual types boolean, byte, char, and short are correctly performed by instructions operating on values of computational type int.
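To see the sign-extension and zero-extension described in that passage from the Java side, a small sketch may help. The class and method names are made up for illustration.

```java
class WideningDemo {
    // byte -> int is a widening conversion that sign-extends.
    static int widen(byte b) { return b; }

    // char -> int is a widening conversion that zero-extends.
    static int widen(char c) { return c; }

    public static void main(String[] args) {
        System.out.println(widen((byte) 0xFF));   // prints -1
        System.out.println(widen((char) 0xFFFF)); // prints 65535
    }
}
```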
As all integer types in Java (other than char) are signed, you have to mask out the low 8 bits of b's value anyway if you expect to read values greater than 0x7F from the file:
byte b;
byte[] a = new byte[256];
a[b & 0xFF] = x;
No; array indices are non-negative integers (JLS 10.4), but byte indices will be promoted.
No, there is no performance decrease, because once you read the byte, it ends up in a CPU register at some point. Those registers always work with whole words, which means that the byte is always "converted" to an int (or a long, if you are on a 64-bit machine).
So, simply read your byte like this:
int b = (in.readByte() & 0xFF);
If your application is that performance critical, you should be optimizing elsewhere.
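Putting the masking advice together, here is a small sketch that uses bytes as array indices. The histogram method is a made-up example, and Byte.toUnsignedInt assumes Java 8 or later.

```java
class UnsignedByteIndex {
    // Counts how often each byte value (0..255) occurs in the input.
    static int[] histogram(byte[] input) {
        int[] counts = new int[256];
        for (byte b : input) {
            // b is promoted to int (sign-extended), then masked into 0..255.
            counts[b & 0xFF]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] input = { 0x01, (byte) 0x90, (byte) 0xFF, (byte) 0x90 };
        int[] counts = histogram(input);
        System.out.println(counts[0x90]);                    // prints 2
        System.out.println(Byte.toUnsignedInt((byte) 0x90)); // prints 144
    }
}
```

Without the mask, (byte) 0x90 would be promoted to -112 and the array access would throw an ArrayIndexOutOfBoundsException.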
HashMap internally has its own static final variables for its working.
static final int DEFAULT_INITIAL_CAPACITY = 16;
Why didn't they use the byte datatype instead of int, since the value is so small?
They could, but it would be a micro-optimization, and the tradeoff would be less readable and maintainable code (Premature optimization, anyone?).
This is a static final variable, so it's allocated only once per classloader. I'd say we can spare those 3 (I'm guessing here) bytes.
I think this is because the capacity for a Map is expressed in terms of an int. When you try to work with a byte and an int, the promotion rules mean the byte will be converted to an int anyway. The default capacity is expressed as an int, perhaps to avoid those needless promotions.
Using byte or short for variables and constants instead of int is a premature optimization that has next to no effect.
Most arithmetic and logical instructions of the JVM work only with int, long, float and double, other data types have to be cast to (usually) ints in order for these instructions to be executed on them.
The default type of number literals is int for integral and double for floating point numbers. Using byte, short and float types can thus cause some subtle programming bugs and generally worsens code readability.
A little example from the Java Puzzlers book:
public static void main(String[] args) {
for (byte b = Byte.MIN_VALUE; b < Byte.MAX_VALUE; b++) {
if (b == 0x90)
System.out.print("Joy!");
}
}
This program doesn't print Joy!, because the hex value 0x90 is implicitly promoted to an int with the value 144. Since bytes in Java are signed (which itself is very inconvenient), the variable b is never assigned to this value (Byte.MAX_VALUE = 127) and therefore, the condition is never satisfied.
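One way to make the comparison behave as intended is to cast the literal to byte, so both sides hold the same signed value before promotion. A sketch, with a made-up class name and an int loop counter so that Byte.MAX_VALUE is covered too:

```java
class JoyPuzzle {
    // Counts how many byte values compare equal to 0x90,
    // with or without casting the literal to byte first.
    static int countMatches(boolean castLiteral) {
        int matches = 0;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            boolean hit = castLiteral
                    ? (b == (byte) 0x90)  // (byte) 0x90 is -112, which b can hold
                    : (b == 0x90);        // 0x90 is the int 144, out of byte range
            if (hit) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(false)); // prints 0
        System.out.println(countMatches(true));  // prints 1
    }
}
```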
All in all, the reduction of the memory footprint is simply too small (next to none) to justify such micro-optimisation. Generally, explicit numeric types of different sizes are neither necessary nor suitable for higher-level programming. I personally think that the only case where smaller numeric types are acceptable is byte arrays.
The byte values still take the same space in the JVM, and they will also need to be converted to int for practical purposes, explicitly or implicitly, including array sizes, indexes, etc.
Converting from a byte to an int (as it needs to be an int in any case) would make the code slower, if anything. The cost of memory is pretty trivial in the overall scheme of things.
Given the default could be any int value, I think int makes sense.
A lot of data can be represented as a series of Bytes.
int is the default data type that most users will use when counting or working with whole numbers.
The issue with using byte is that the compiler will not implicitly narrow an int to it.
Any time you tried
byte variablename = intVariable;
it wouldn't compile without a cast; however,
double variablename = intVariable;
would work.
I had a doubt about assigning or casting to the proper data type.
byte a=3; //compiles
byte b=5; //compiles
byte c=a+b; //does not compile; reported as possible loss of precision
The first two statements compile even though we are assigning an int literal to a byte. But what about the third statement? I am doing the same thing as above, and the value of a+b is within the range of byte. Why is there such an error?
The general rule is that you can not use assignment to narrow an integer to a byte, because that is an unsafe narrowing conversion (most ints don't fit in a byte). Specifically, none of the allowed assignment conversions may narrow.
However, there is an exception specifically for this case:
A narrowing primitive conversion may
be used if the type of the variable is
byte, short, or char, and the value of
the constant expression is
representable in the type of the
variable.
This applies to a and b. The type of both variables is byte, and the values of both constant expressions clearly fit in a byte.
When you do the + operation on two bytes, they get implicitly converted to an int, so the result is an int as well. Therefore, you need another cast.
( The literal assignments in the first statement have nothing to do with it. )
I believe that addition performs binary numeric promotion, so a and b are being promoted to ints in the third statement.
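A quick way to check that a + b really has type int is to box the result and look at its class. This is only a sketch, and the class and method names are made up.

```java
class ByteSum {
    // Boxing the sum reveals its static type: int boxes to Integer.
    static String typeOfSum(byte a, byte b) {
        Object boxed = a + b;
        return boxed.getClass().getSimpleName();
    }

    // The explicit cast narrows the int result back to byte (it may wrap around).
    static byte add(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        System.out.println(typeOfSum((byte) 3, (byte) 5)); // prints Integer
        System.out.println(add((byte) 3, (byte) 5));       // prints 8
        System.out.println(add((byte) 100, (byte) 100));   // prints -56
    }
}
```

The last line shows why the compiler insists on the cast: 200 does not fit in a byte, so the narrowing silently wraps.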
I have a question about the primitive type short in Java. I am using JDK 1.6.
If I have the following:
short a = 2;
short b = 3;
short c = a + b;
the compiler does not want to compile - it says that it "cannot convert from int to short" and suggests that I make a cast to short, so this:
short c = (short) (a + b);
really works. But my question is: why do I need to cast? The values of a and b are in the range of short, which is [-32,768, 32,767].
I also need to cast when I want to perform the operations -, *, / (I haven't checked for others).
If I do the same for primitive type int, I do not need to cast aa+bb to int. The following works fine:
int aa = 2;
int bb = 3;
int cc = aa +bb;
I discovered this while designing a class where I needed to add two variables of type short, and the compiler wanted me to make a cast. If I do this with two variables of type int, I don't need to cast.
A small remark: the same thing also happens with the primitive type byte. So, this works:
byte a = 2;
byte b = 3;
byte c = (byte) (a + b);
but this does not:
byte a = 2;
byte b = 3;
byte c = a + b;
For long, float, double, and int, there is no need to cast. Only for short and byte values.
As explained in the C# reference for short (the same holds for other language compilers, such as Java's):
There is a predefined implicit conversion from short to int, long, float, double, or decimal.
You cannot implicitly convert nonliteral numeric types of larger storage size to short (see Integral Types Table for the storage sizes of integral types). Consider, for example, the following two short variables x and y:
short x = 5, y = 12;
The following assignment statement will produce a compilation error, because the arithmetic expression on the right-hand side of the assignment operator evaluates to int by default.
short z = x + y; // Error: no conversion from int to short
To fix this problem, use a cast:
short z = (short)(x + y); // OK: explicit conversion
It is possible though to use the following statements, where the destination variable has the same storage size or a larger storage size:
int m = x + y;
long n = x + y;
A good follow-up question is:
"why arithmetic expression on the right-hand side of the assignment operator evaluates to int by default" ?
A first answer can be found in:
Classifying and Formally Verifying Integer Constant Folding
The Java language specification defines exactly how integer numbers are represented and how integer arithmetic expressions are to be evaluated. This is an important property of Java as this programming language has been designed to be used in distributed applications on the Internet. A Java program is required to produce the same result independently of the target machine executing it.
In contrast, C (and the majority of widely-used imperative and
object-oriented programming languages) is more sloppy and leaves many important characteristics open. The intention behind this inaccurate language
specification is clear. The same C programs are supposed to run on a 16-bit,
32-bit, or even 64-bit architecture by instantiating the integer arithmetics of
the source programs with the arithmetic operations built-in in the target processor. This leads to much more efficient code because it can use the available
machine operations directly. As long as the integer computations deal only
with numbers being “sufficiently small”, no inconsistencies will arise.
In this sense, the C integer arithmetic is a placeholder which is not defined exactly
by the programming language specification but is only completely instantiated by determining the target machine.
Java precisely defines how integers are represented and how integer arithmetic is to be computed.
Java Integers
--------------------------
Signed | Unsigned
--------------------------
long (64-bit) |
int (32-bit) |
short (16-bit) | char (16-bit)
byte (8-bit) |
Char is the only unsigned integer type. Its values represent Unicode characters, from \u0000 to \uffff, i.e. from 0 to 2^16 − 1.
If an integer operator has an operand of type long, then the other operand is also converted to type long. Otherwise the operation is performed on operands of type int, if necessary shorter operands are converted into int. The conversion rules are exactly specified.
[From Electronic Notes in Theoretical Computer Science 82 No. 2 (2003)
Blesner-Blech-COCV 2003: Sabine GLESNER, Jan Olaf BLECH,
Fakultät für Informatik,
Universität Karlsruhe
Karlsruhe, Germany]
EDIT: Okay, now we know it's Java...
Section 4.2.2 of the Java Language Specification states:
The Java programming language provides
a number of operators that act on
integral values:
[...]
The numerical operators, which result
in a value of type int or long:
[...]The additive operators + and
- (§15.18)
In other words, it's like C# - the addition operator (when applied to integral types) only ever results in int or long, which is why you need to cast to assign to a short variable.
Original answer (C#)
In C# (you haven't specified the language, so I'm guessing), the only addition operators on primitive types are:
int operator +(int x, int y);
uint operator +(uint x, uint y);
long operator +(long x, long y);
ulong operator +(ulong x, ulong y);
float operator +(float x, float y);
double operator +(double x, double y);
These are in the C# 3.0 spec, section 7.7.4. In addition, decimal addition is defined:
decimal operator +(decimal x, decimal y);
(Enumeration addition, string concatenation and delegate combination are also defined there.)
As you can see, there's no short operator +(short x, short y) operator - so both operands are implicitly converted to int, and the int form is used. That means the result is an expression of type "int", hence the need to cast.
In C# and Java, the arithmetic expression on the right-hand side of the assignment evaluates to int by default. That's why you need to cast back to a short: there is no implicit conversion from int to short, for obvious reasons.
Given that the "why int by default" question hasn't been answered ...
First, "default" is not really the right term (although close enough). As noted by VonC, an expression composed of ints and longs will have a long result. And an operation consisting of ints/longs and doubles will have a double result. The compiler promotes the terms of an expression to whatever type provides a greater range and/or precision in the result (floating point types are presumed to have greater range and precision than integral, although you do lose precision converting large longs to double).
One caveat is that this promotion happens only for the terms that need it. So in the following example, the subexpression 5/4 uses only integral values and is performed using integer math, even though the overall expression involves a double. The result isn't what you might expect...
(5/4) * 1000.0
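To make that concrete, here is a small sketch comparing the expression above with a variant in which one operand of the division is a double. The names are made up for illustration.

```java
class IntegerDivision {
    // 5 / 4 is evaluated with int arithmetic first, giving 1.
    static double truncated() {
        return (5 / 4) * 1000.0;
    }

    // Making one operand a double promotes the division itself.
    static double promoted() {
        return (5 / 4.0) * 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(truncated()); // prints 1000.0
        System.out.println(promoted());  // prints 1250.0
    }
}
```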
OK, so why are byte and short promoted to int? Without any references to back me up, it's due to practicality: there are a limited number of bytecodes.
"Bytecode," as its name implies, uses a single byte to specify an operation. For example iadd, which adds two ints. Currently, 205 opcodes are defined, and integer math takes 18 for each type (ie, 36 total between integer and long), not counting conversion operators.
If short and byte each got their own set of opcodes, you'd be at 241, limiting the ability of the JVM to expand. As I said, no references to back me up on this, but I suspect that Gosling et al said "how often do people actually use shorts?" On the other hand, promoting byte to int leads to this not-so-wonderful effect (the expected answer is 48, the actual is -16):
byte x = (byte)0xC0;
System.out.println(x >> 2);
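If the byte is meant to be treated as an unsigned value, masking it before the shift avoids the sign extension. A minimal sketch with made-up names:

```java
class ByteShift {
    // The byte is sign-extended to int before the shift.
    static int signedShift(byte x) {
        return x >> 2;
    }

    // Masking first keeps only the low 8 bits, so the value is treated as 0..255.
    static int unsignedShift(byte x) {
        return (x & 0xFF) >> 2;
    }

    public static void main(String[] args) {
        byte x = (byte) 0xC0;                 // bit pattern 1100_0000, value -64
        System.out.println(signedShift(x));   // prints -16
        System.out.println(unsignedShift(x)); // prints 48
    }
}
```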
What language are you using?
Many C based languages have a rule that any mathematical expression is performed in size int or larger. Because of this, once you add two shorts the result is of type int. This causes the need for a cast.
Java always uses at least 32-bit values for calculations. This is due to the 32-bit architecture that was common in 1995, when Java was introduced. The register size in the CPU was 32 bits, and the arithmetic logic unit accepted two numbers of the length of a CPU register. So the CPUs were optimized for such values.
This is the reason why all data types that support arithmetic operations and have fewer than 32 bits are converted to int (32 bits) as soon as you use them for calculations.
So, to sum up, it was mainly due to performance considerations and is kept nowadays for compatibility.
In java, every numeric expression like:
anyPrimitive zas = 1;
anyPrimitive bar = 3;
?? x = zas + bar
x will always be at least an int, or a long if one of the operands was a long.
But there are some quirks, though:
byte a = 1; // 1 is an int, but it won't compile if you use a variable
a += 2; // the shortcut works even when 2 is an int
a++; // the post and pre increment operator work
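The reason the shortcut compiles is that a compound assignment includes an implicit cast back to the type of the left-hand side, so a += 2 behaves like a = (byte) (a + 2). A small sketch of that, including the silent wraparound it allows (the names are made up):

```java
class CompoundAssignment {
    static byte addInPlace(byte start, int delta) {
        byte a = start;
        a += delta; // compiles: equivalent to a = (byte) (a + delta)
        return a;
    }

    public static void main(String[] args) {
        System.out.println(addInPlace((byte) 1, 2));   // prints 3
        System.out.println(addInPlace((byte) 127, 1)); // prints -128
    }
}
```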
As far as I can see, nobody has mentioned the use of final for this. If you modify your last example and define variables a and b as final
variables, then the compiler is assured that their sum, 5, can be assigned to a
variable of type byte without any loss of precision. In this case, the compiler is happy
to assign the sum of a and b to c. Here's the modified code:
final byte a = 2;
final byte b = 3;
byte c = a + b;
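Wrapped in a method so it can be compiled and run, the idea looks roughly like this. The class name is made up. Note that it only works because the constant sum fits in a byte.

```java
class ConstantFolding {
    static byte sumOfConstants() {
        final byte a = 2;
        final byte b = 3;
        // a and b are constant variables, so a + b is a constant expression
        // with value 5, which the compiler may narrow to byte implicitly.
        byte c = a + b;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(sumOfConstants()); // prints 5
    }
}
```

With constants such as 100 and 100 the sum (200) would not be representable in a byte, and the assignment would be rejected again.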
Any data type smaller than "int" (except boolean) is implicitly converted to "int" in arithmetic.
In your case:
short a = 2;
short b = 3;
short c = a + b;
The operands of (a+b) are implicitly converted to int, so the result is an int. You are then assigning it to a "short", which is why you get the error.
short, byte, char: for all of these we get the same error.
I'd like to add something that hasn't been pointed out. Java doesn't take into account the values you have given the variables (2 and 3) in...
short a = 2;
short b = 3;
short c = a + b;
So as far as Java knows, you could have done this...
short a = 32767;
short b = 32767;
short c = a + b;
Which would be outside the range of short. Java promotes the result to an int because it's "possible" that the result will be more than a short can hold but not more than an int. Int was chosen as a "default" because basically most people won't be hard-coding values above 2,147,483,647 or below -2,147,483,648.
If two values have different data types, then Java will automatically promote one of the values to the larger of the two data types. In your case, smaller data types such as byte, short and char will be "promoted" to int any time they are used with a binary arithmetic operator. This is still true if neither operand is an int.
short x = 10;
short y = 20;
short z = x + y; // this will be a compiler error; a cast is required to solve it
short z = (short) (x + y); // this will yield 30
short z = (short) x + y; // this will be a compiler error: the cast applies only to x
Remember that casting is a unary operator; thus, by casting a larger value into a smaller data type, you are effectively telling the compiler to ignore the default behaviour.
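Here is a compilable sketch of the same point, with made-up names: the parenthesised cast narrows the whole sum, whereas the unparenthesised cast binds only to x and leaves an int result.

```java
class CastPrecedence {
    // Parentheses make the cast apply to the whole int sum.
    static short sum(short x, short y) {
        return (short) (x + y);
    }

    // Without parentheses the cast applies to x only; x + y is then an int,
    // which is fine here only because the return type is int.
    static int castBindsTighter(short x, short y) {
        return (short) x + y;
    }

    public static void main(String[] args) {
        System.out.println(sum((short) 10, (short) 20));              // prints 30
        System.out.println(sum((short) 32767, (short) 1));            // prints -32768
        System.out.println(castBindsTighter((short) 32767, (short) 1)); // prints 32768
    }
}
```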