Java: Why is a constant pool maintained only for String values?

My question is about Java interning and constant pools.
Java maintains a constant pool for java.lang.String in order to use JVM memory cleverly, and to make that possible java.lang.String is immutable. So why doesn't Java maintain constant pools of other immutable types, such as Long, Integer, Character, and Short? Wouldn't that save memory too?
I am aware that Integers are pooled for the value range [-128, 127], though I do not understand the reason for choosing this range.
Here is some test code I wrote to check the pooling of other immutable types.
public class PoolTest {
    public static void main(String... args) {
        // Pooling of Integer values in [-128, 127]
        Integer x = 127, y = 127;
        System.out.println("Integer: " + (x == y)); // prints true
        x = 129;
        y = 129;
        System.out.println("Integer: " + (x == y)); // prints false

        // Pooling of Short values in [-128, 127]
        Short i = 127, j = 127;
        System.out.println("Short: " + (i == j)); // prints true
        i = 128;
        j = 128;
        System.out.println("Short: " + (i == j)); // prints false

        // Long values are pooled in [-128, 127] as well
        Long k = 10L, l = 10L;
        System.out.println("Long: " + (k == l)); // prints true
        k = 128L;
        l = 128L;
        System.out.println("Long: " + (k == l)); // prints false
    }
}

The purpose of a constant pool is to reduce the memory overhead required by keeping multiple copies of constants. In the case of Strings, the JVM is inherently required to keep some object around for each individually distinguishable constant, and the Java spec basically says that the JVM should deduplicate String objects when class loading. The ability to manually place Strings in the pool via intern is inexpensive and allows programmers to identify particular values (such as properties) that are going to be around for the life of the program and tell the JVM to put them out of the way of normal garbage collection.
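As a small illustration of manual interning (a sketch; the literal values are arbitrary):

String literal = "config.path";              // placed in the pool at class loading
String computed = new String("config.path"); // a distinct heap object
System.out.println(literal == computed);          // false: two different objects
System.out.println(literal == computed.intern()); // true: intern() returns the pooled instance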
Pooling numeric constants, on the other hand, doesn't make a lot of sense, for a few reasons:
Most particular numbers aren't ever used in a given program's code.
When numbers are used in code, embedding them in the code as immediate opcode values is less expensive in terms of memory than trying to pool them. Note that even the empty String carries around a char[], an int for its length, and another for its hashCode. For a number, by contrast, a maximum of eight immediate bytes is required (see the sketch after this list).
As of recent Java versions, Byte, Short, and Integer objects from -128 to 127 (0 to 127 for Character) are precached for performance reasons, not to save memory. This range was presumably chosen because it is the range of a signed byte and covers a large number of common uses, while it would be impractical to try to precache a very large number of values.
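Here is the sketch referenced above: a rough look at how javac embeds int literals directly into bytecode rather than pooling them (opcode names are what javap -c reports; the variable names are arbitrary):

public class LiteralDemo {
    public static void main(String[] args) {
        int a = 5;      // compiles to iconst_5: the value lives in a one-byte opcode
        int b = 100;    // fits in a byte: bipush 100
        int c = 20000;  // fits in a short: sipush 20000
        int d = 100000; // too big for sipush: ldc, an entry in the class constant pool
        System.out.println(a + b + c + d);
    }
}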
As a note, keep in mind that the rules about interning were made long before the introduction of autoboxing and generic types in Java 5, which significantly expanded how much the wrapper classes were casually used. This increase in use led Sun to add those common values to a constant pool.

Well, because String objects are immutable, it's safe for multiple references to "share" the same String object.
public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "str1";
        String two = "str1";
        System.out.println(one.equals(two));
        System.out.println(one == two);
    }
}
// Output
true
true
In such a case, there is really no need to make two instances of an identical String object. If a String object could be changed, as a StringBuffer can be changed, we would be forced to create two separate objects. But, as we know that String objects cannot change, we can safely share a String object among the two String references, one and two. This is done through the String literal pool.
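A minimal sketch of why this sharing is safe: any apparent "modification" of a String produces a new object, leaving other references untouched.

String one = "str1";
String two = one;                // both refer to the same pooled object
two = two.concat("-changed");    // creates a brand-new String; the pooled one is untouched
System.out.println(one);         // str1
System.out.println(one == two);  // false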

Related

Equality not working while comparing Integer and int [duplicate]

This question already has answers here:
Why is 128==128 false but 127==127 is true when comparing Integer wrappers in Java?
Why does the Integer == operator not work for 128 and higher values? Can someone explain this situation?
This is my Java environment:
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
Sample Code:
Integer a;
Integer b;
a = 129;
b = 129;

for (int i = 0; i < 200; i++) {
    a = i;
    b = i;

    if (a != b) {
        System.out.println("Value:" + i + " - Different values");
    } else {
        System.out.println("Value:" + i + " - Same values");
    }
}
Some part of console output:
Value:124 - Same values
Value:125 - Same values
Value:126 - Same values
Value:127 - Same values
Value:128 - Different values
Value:129 - Different values
Value:130 - Different values
Value:131 - Different values
Value:132 - Different values
Check out the source code of Integer. You can see the caching of values there.
The caching happens only if you use Integer.valueOf(int), not if you use new Integer(int). The autoboxing you are relying on uses Integer.valueOf.
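A quick sketch of that difference (note that the new Integer(int) constructor is deprecated since Java 9):

Integer cached1 = Integer.valueOf(100);
Integer cached2 = Integer.valueOf(100);
System.out.println(cached1 == cached2); // true: both references come from IntegerCache

Integer fresh1 = new Integer(100);      // always allocates, bypassing the cache
Integer fresh2 = new Integer(100);
System.out.println(fresh1 == fresh2);   // false: two distinct objects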
According to the JLS, you can always count on the fact that for values between -128 and 127, you get the identical Integer objects after autoboxing, and on some implementations you might get identical objects even for higher values.
Actually in Java 7 (and I think in newer versions of Java 6), the implementation of the IntegerCache class has changed, and the upper bound is no longer hardcoded, but it is configurable via the property "java.lang.Integer.IntegerCache.high", so if you run your program with the VM parameter -Djava.lang.Integer.IntegerCache.high=1000, you get "Same values" for all values.
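For example, assuming a class named CacheDemo (the name is illustrative), extending the cache changes the observable behavior:

// Run with: java -Djava.lang.Integer.IntegerCache.high=1000 CacheDemo
public class CacheDemo {
    public static void main(String[] args) {
        Integer a = 500, b = 500;
        System.out.println(a == b); // true with the extended cache; false by default
    }
}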
But the JLS still guarantees it only up to 127:
Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly.
For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.
This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all characters and shorts, as well as integers and longs in the range of -32K - +32K.
Integer is a wrapper class for int.
Integer != Integer compares the actual object references, whereas int != int compares the values.
As already stated, values -128 to 127 are cached, so the same objects are returned for those.
If outside that range, separate objects will be created so the reference will be different.
To fix it:
Make the types int or
Cast the types to int or
Use .equals()
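All three remedies in one sketch:

Integer a = 200, b = 200;
System.out.println(a == b);             // false: 200 is outside the cache

int c = a, d = b;                       // make the types int
System.out.println(c == d);             // true: primitive value comparison

System.out.println((int) a == (int) b); // true: cast the types to int
System.out.println(a.equals(b));        // true: use .equals()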
According to the Java Language Specification:
If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
JLS Boxing Conversions
Refer to this article for more information on int caching
The Integer object has an internal cache mechanism:
private static class IntegerCache {
    static final int high;
    static final Integer cache[];

    static {
        final int low = -128;

        // high value may be configured by property
        int h = 127;
        if (integerCacheHighPropValue != null) {
            // Use Long.decode here to avoid invoking methods that
            // require Integer's autoboxing cache to be initialized
            int i = Long.decode(integerCacheHighPropValue).intValue();
            i = Math.max(i, 127);
            // Maximum array size is Integer.MAX_VALUE
            h = Math.min(i, Integer.MAX_VALUE - -low);
        }
        high = h;

        cache = new Integer[(high - low) + 1];
        int j = low;
        for (int k = 0; k < cache.length; k++)
            cache[k] = new Integer(j++);
    }

    private IntegerCache() {}
}
Also see valueOf method:
public static Integer valueOf(int i) {
    if (i >= -128 && i <= IntegerCache.high)
        return IntegerCache.cache[i + 128];
    else
        return new Integer(i);
}
This is why you should use valueOf instead of new Integer. Autoboxing uses this cache.
Also see this post: https://effective-java.com/2010/01/java-performance-tuning-with-maximizing-integer-valueofint/
Using == is not a good idea, use equals to compare the values.
Use .equals() instead of ==.
Integer values are only cached for numbers between -128 and 127, because they are used most often.
if (a.equals(b)) { ... }
Depending on how you get your Integer instances, it may not work for any value:
System.out.println(new Integer(1) == new Integer(1));
prints
false
This is because the == operator applied to reference-typed operands has nothing to do with the value those operands represent.
It's because of the Integer class's implementation logic: it keeps pre-built objects for numbers up to 127. For example, you can check the OpenJDK source at http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/Integer.java (search for cache[]).
Basically, objects shouldn't be compared using == at all, with the exception of enums.
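A sketch of that exception (a hypothetical enum): enum constants are singletons, so == is safe for them.

enum Color { RED, GREEN, BLUE }

public class EnumCompare {
    public static void main(String[] args) {
        Color a = Color.RED;
        Color b = Color.valueOf("RED");
        System.out.println(a == b); // true: there is only one RED instance
    }
}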

Primitive int vs Integer type [duplicate]

I just saw code similar to this:
public class Scratch
{
    public static void main(String[] args)
    {
        Integer a = 1000, b = 1000;
        System.out.println(a == b);

        Integer c = 100, d = 100;
        System.out.println(c == d);
    }
}
When run, this block of code will print out:
false
true
I understand why the first is false: because the two objects are separate objects, so the == compares the references. But I can't figure out, why is the second statement returning true? Is there some strange autoboxing rule that kicks in when an Integer's value is in a certain range? What's going on here?
The true line is actually guaranteed by the language specification. From section 5.1.7:
If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
The discussion goes on, suggesting that although your second line of output is guaranteed, the first isn't (see the last paragraph quoted below):
Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly.

For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all characters and shorts, as well as integers and longs in the range of -32K - +32K.
public class Scratch
{
    public static void main(String[] args)
    {
        Integer a = 1000, b = 1000; //1
        System.out.println(a == b);

        Integer c = 100, d = 100; //2
        System.out.println(c == d);
    }
}
Output:
false
true
Yep, the first output is produced by comparing references: a and b are two different references. At point 1, two separate objects are actually created, which is similar to:
Integer a = new Integer(1000);
Integer b = new Integer(1000);
The second output is produced because the JVM tries to save memory when the Integer falls in a certain range (from -128 to 127). At point 2, no new object of type Integer is created for d. Instead of creating a new object for the Integer reference variable d, it is assigned the previously created object referenced by c. All of this is done by the JVM.
These memory-saving rules are not only for Integer. For memory-saving purposes, two instances of the following wrapper objects (when created through boxing) will always be == when their primitive values are the same, as the sketch after this list demonstrates:
Boolean
Byte
Character from \u0000 to \u007f (7f is 127 in decimal)
Short and Integer from -128 to 127
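Here is the sketch referenced above, checking each rule (Long is not on the guaranteed list, but OpenJDK's Long.valueOf caches the same range):

Boolean b1 = true, b2 = true;
Byte y1 = 100, y2 = 100;
Character c1 = 'a', c2 = 'a';   // 'a' is \u0061, within \u0000..\u007f
Short s1 = 127, s2 = 127;
Long l1 = -128L, l2 = -128L;

System.out.println(b1 == b2);   // true
System.out.println(y1 == y2);   // true: every Byte value is cached
System.out.println(c1 == c2);   // true
System.out.println(s1 == s2);   // true
System.out.println(l1 == l2);   // true on OpenJDK, but not guaranteed by the JLS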
Integer objects in some range (I think maybe -128 through 127) get cached and re-used. Integers outside that range get a new object each time.
Integer Cache is a feature that was introduced in Java 5, basically for:
Saving memory space
Improving performance
Integer number1 = 127;
Integer number2 = 127;
System.out.println("number1 == number2: " + (number1 == number2));
OUTPUT: true
Integer number1 = 128;
Integer number2 = 128;
System.out.println("number1 == number2: " + (number1 == number2));
OUTPUT: false
HOW?
Actually, when we assign a value to an Integer object, autoboxing happens behind the scenes.
Integer object = 100;
is actually calling the Integer.valueOf() function:
Integer object = Integer.valueOf(100);
Nitty-gritty details of valueOf(int):
public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
Description:
This method will always cache values in the range -128 to 127, inclusive, and may cache other values outside of this range.
When a value within the range -128 to 127 is requested, it returns a constant memory location every time. However, when we need a value greater than 127 (or less than -128),
return new Integer(i);
returns a new reference every time we instantiate an object.
Furthermore, the == operator in Java compares two memory references, not values.
Object1, located at, say, address 1000, contains the value 6.
Object2, located at, say, address 1020, contains the value 6.
Object1 == Object2 is false, as they have different memory locations even though they contain the same values.
Yes, there is a strange autoboxing rule that kicks in when the values are in a certain range. When you assign a constant to an Object variable, nothing in the language definition says a new object must be created. It may reuse an existing object from cache.
In fact, the JVM will usually store a cache of small Integers for this purpose, as well as values such as Boolean.TRUE and Boolean.FALSE.
My guess is that Java keeps a cache of small integers that are already 'boxed' because they are so very common and it saves a heck of a lot of time to re-use an existing object than to create a new one.
That is an interesting point.
The book Effective Java suggests always overriding equals for your own classes, and also using the equals method to check equality for two object instances of a Java class.
public class Scratch
{
    public static void main(String[] args)
    {
        Integer a = 1000, b = 1000;
        System.out.println(a.equals(b));

        Integer c = 100, d = 100;
        System.out.println(c.equals(d));
    }
}
returns:
true
true
Direct assignment of an int literal to an Integer reference is an example of auto-boxing, where the literal value to object conversion code is handled by the compiler.
So during the compilation phase, the compiler converts Integer a = 1000, b = 1000; to Integer a = Integer.valueOf(1000), b = Integer.valueOf(1000);.
So it is the Integer.valueOf() method which actually gives us the Integer objects, and if we look at its source code we can clearly see that it caches Integer objects in the range -128 to 127 (inclusive).
/**
 * Returns an {@code Integer} instance representing the specified
 * {@code int} value. If a new {@code Integer} instance is not
 * required, this method should generally be used in preference to
 * the constructor {@link #Integer(int)}, as this method is likely
 * to yield significantly better space and time performance by
 * caching frequently requested values.
 *
 * This method will always cache values in the range -128 to 127,
 * inclusive, and may cache other values outside of this range.
 *
 * @param i an {@code int} value.
 * @return an {@code Integer} instance representing {@code i}.
 * @since 1.5
 */
public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
So instead of creating and returning new Integer objects, the Integer.valueOf() method returns Integer objects from the internal IntegerCache if the passed int is between -128 and 127 (inclusive).
Java caches these integer objects because this range of integers gets used a lot in day to day programming which indirectly saves some memory.
The cache is initialized on the first usage when the class gets loaded into memory because of the static block. The max range of the cache can be controlled by the -XX:AutoBoxCacheMax JVM option.
This caching behaviour is not applicable to Integer objects only; similar to Integer.IntegerCache, we also have ByteCache, ShortCache, LongCache, and CharacterCache for Byte, Short, Long, and Character respectively.
You can read more on my article Java Integer Cache - Why Integer.valueOf(127) == Integer.valueOf(127) Is True.
In Java, boxing caching works in the range between -128 and 127 for an Integer. When you are using numbers in this range, you can compare them with the == operator. For Integer objects outside the range, you have to use equals.
If we check the source code of the Integer class, we can find the source of the valueOf method just like this:
public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
This explains why Integer objects in the range from -128 (IntegerCache.low) to 127 (IntegerCache.high) are the same referenced objects during autoboxing. And we can see there is a class IntegerCache that takes care of the Integer cache array; it is a private static inner class of the Integer class.
There is another interesting example that may help to understand this weird situation:
// Requires: import java.lang.reflect.Field;
public static void main(String[] args) throws ReflectiveOperationException {
    Class cache = Integer.class.getDeclaredClasses()[0]; // the private IntegerCache class
    Field myCache = cache.getDeclaredField("cache");
    myCache.setAccessible(true);

    Integer[] newCache = (Integer[]) myCache.get(cache);
    newCache[132] = newCache[133]; // the cache slot for 4 now holds the object for 5

    Integer a = 2;
    Integer b = a + a;
    System.out.printf("%d + %d = %d", a, a, b); // The output is: 2 + 2 = 5
}
In Java 5, a new feature was introduced to save memory and improve performance when handling Integer objects. Integer objects are cached internally and reused via the same referenced objects.
This is applicable to Integer values in the range between -128 and +127.
This Integer caching works only on autoboxing. Integer objects will not be cached when they are built using the constructor.
For more detail, please go through the link below:
Integer Cache in Detail
The Integer class contains a cache of values between -128 and 127, as required by JLS 5.1.7 (Boxing Conversion). So when you use == to check the equality of two Integers in this range, you get the same cached value, and if you compare two Integers outside this range, you get two different objects.
You can increase the cache upper bound by changing the JVM parameters:
-XX:AutoBoxCacheMax=<cache_max_value>
or
-Djava.lang.Integer.IntegerCache.high=<cache_max_value>
See the inner IntegerCache class:
/**
 * Cache to support the object identity semantics of autoboxing for values
 * between -128 and 127 (inclusive) as required by JLS.
 *
 * The cache is initialized on first usage. The size of the cache
 * may be controlled by the {@code -XX:AutoBoxCacheMax=<size>} option.
 * During VM initialization, java.lang.Integer.IntegerCache.high property
 * may be set and saved in the private system properties in the
 * sun.misc.VM class.
 */
private static class IntegerCache {
    static final int low = -128;
    static final int high;
    static final Integer cache[];

    static {
        // high value may be configured by property
        int h = 127;
        String integerCacheHighPropValue =
            sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
        if (integerCacheHighPropValue != null) {
            try {
                int i = parseInt(integerCacheHighPropValue);
                i = Math.max(i, 127);
                // Maximum array size is Integer.MAX_VALUE
                h = Math.min(i, Integer.MAX_VALUE - (-low) - 1);
            } catch (NumberFormatException nfe) {
                // If the property cannot be parsed into an int, ignore it.
            }
        }
        high = h;

        cache = new Integer[(high - low) + 1];
        int j = low;
        for (int k = 0; k < cache.length; k++)
            cache[k] = new Integer(j++);

        // range [-128, 127] must be interned (JLS7 5.1.7)
        assert IntegerCache.high >= 127;
    }

    private IntegerCache() {}
}

Is the performance/memory benefit of short nullified by downcasting?

I'm writing a large scale application where I'm trying to conserve as much memory as possible as well as boost performance. As such, when I have a field that I know is only going to have values from 0 - 10 or from -100 - 100, I try to use the short data type instead of int.
What this means for the rest of the code, though, is that all over the place when I call these functions, I have to downcast simple ints into shorts. For example:
Method Signature
public void coordinates(short x, short y) ...
Method Call
obj.coordinates((short) 1, (short) 2);
It's like that all throughout my code because the literals are treated as ints and aren't being automatically downcast or typed based on the function parameters.
As such, is any performance or memory gain actually significant once this downcasting occurs? Or is the conversion process so efficient that I can still pick up some gains?
There is no performance benefit of using short versus int on 32-bit platforms, in all but the case of short[] versus int[] - and even then the cons usually outweigh the pros.
Assuming you're running on either x64, x86 or ARM-32:
When in use, 16-bit SHORTs are stored in integer registers which are either 32-bit or 64-bits long, just the same as ints. I.e. when the short is in use, you gain no memory or performance benefit versus an int.
When on the stack, 16-bit SHORTs are stored in 32-bit or 64-bit "slots" in order to keep the stack aligned (just like ints). You gain no performance or memory benefit from using SHORTs versus INTs for local variables.
When being passed as parameters, SHORTs are auto-widened to 32-bit or 64-bit when they are pushed on the stack (unlike ints, which are just pushed). Your code here is actually slightly less performant and has a slightly bigger (code) memory footprint than if you used ints.
When storing global (static) variables, these variables are automatically expanded to take up 32-bit or 64-bit slots to guarantee alignment of pointers (references). This means you get no performance or memory benefit for using SHORTs versus INTs for global (static) variables.
When storing fields, these live in a structure in heap memory that maps to the layout of the class. In this class, fields are automatically padded to 32-bit or 64-bit to maintain the alignment of fields on the heap. You get no performance or memory benefit by using SHORTs for fields versus INTs.
The only benefit you'll ever see for using SHORTs versus INTs is in the case where you allocate an array of them. In this case, an array of N shorts is roughly half as long as an array of N ints.
Other than the cache benefit of having values packed closely together when doing complex but localized math within a large array of shorts, you'll never see a benefit for using SHORTs versus INTs.
In ALL other cases, such as shorts being used for fields, globals, parameters, and locals, other than the number of bits it can store, there is no difference between a SHORT and an INT.
My advice, as always, is that before making your code more difficult to read and more artificially restricted, you try BENCHMARKING it to see where the memory and CPU bottlenecks are, and then tackle those.
I strongly suspect that if you ever come across the case where your app is suffering from use of ints rather than shorts, then you'll have long since ditched Java for a less memory/CPU hungry runtime anyway, so doing all of this work upfront is wasted effort.
As far as I can see, the casts per se should have no runtime costs (whether using short instead of int actually improves performance is debatable, and depends on the specifics of your application).
Consider the following:
public class Main {
    public static void f(short x, short y) {
    }

    public static void main(String args[]) {
        final short x = 1;
        final short y = 2;
        f(x, y);
        f((short) 1, (short) 2);
    }
}
The last two lines of main() compile to:
// f(x, y)
4: iconst_1
5: iconst_2
6: invokestatic #21 // Method f:(SS)V
// f((short)1, (short)2);
9: iconst_1
10: iconst_2
11: invokestatic #21 // Method f:(SS)V
As you can see, they are identical. The casts happen at compile time.
The type casting from int literal to short occurs at compile time and has no runtime performance impact.
You need a way to check the effect of your type choices on the memory use. If short vs. int in a given situation is going to gain performance through lower memory footprint, the effect on memory should be measurable.
Here is a simple method for measuring the amount of memory in use:
private static long inUseMemory() {
    Runtime rt = Runtime.getRuntime();
    rt.gc();
    final long memory = rt.totalMemory() - rt.freeMemory();
    return memory;
}
I am also including an example of a program using that method to check memory use in some common situations. The memory increase for allocating an array of a million shorts confirms that short arrays use two bytes per element. The memory increases for the various object arrays indicate that changing the type of one or two fields makes little difference.
Here is the output from one run. YMMV.
Before short[1000000] allocation: In use: 162608 Change 162608
After short[1000000] allocation: In use: 2162808 Change 2000200
After TwoShorts[1000000] allocation: In use: 34266200 Change 32103392
After NoShorts[1000000] allocation: In use: 58162560 Change 23896360
After TwoInts[1000000] allocation: In use: 90265920 Change 32103360
Dummy to keep arrays live -378899459
The rest of this answer is the program source:
public class Test {
    private static int BIG = 1000000;
    private static long oldMemory = 0;

    public static void main(String[] args) {
        short[] megaShort;
        NoShorts[] megaNoShorts;
        TwoShorts[] megaTwoShorts;
        TwoInts[] megaTwoInts;

        System.out.println("Before short[" + BIG + "] allocation: " + memoryReport());
        megaShort = new short[BIG];
        System.out.println("After short[" + BIG + "] allocation: " + memoryReport());

        megaTwoShorts = new TwoShorts[BIG];
        for (int i = 0; i < BIG; i++) {
            megaTwoShorts[i] = new TwoShorts();
        }
        System.out.println("After TwoShorts[" + BIG + "] allocation: " + memoryReport());

        megaNoShorts = new NoShorts[BIG];
        for (int i = 0; i < BIG; i++) {
            megaNoShorts[i] = new NoShorts();
        }
        System.out.println("After NoShorts[" + BIG + "] allocation: " + memoryReport());

        megaTwoInts = new TwoInts[BIG];
        for (int i = 0; i < BIG; i++) {
            megaTwoInts[i] = new TwoInts();
        }
        System.out.println("After TwoInts[" + BIG + "] allocation: " + memoryReport());

        System.out.println("Dummy to keep arrays live "
                + (megaShort[0] + megaTwoShorts[0].hashCode()
                + megaNoShorts[0].hashCode() + megaTwoInts[0].hashCode()));
    }

    private static long inUseMemory() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        final long memory = rt.totalMemory() - rt.freeMemory();
        return memory;
    }

    private static String memoryReport() {
        long newMemory = inUseMemory();
        String result = "In use: " + newMemory + " Change " + (newMemory - oldMemory);
        oldMemory = newMemory;
        return result;
    }
}

class NoShorts {
    //char a, b, c;
}

class TwoShorts {
    //char a, b, c;
    short s, t;
}

class TwoInts {
    //char a, b, c;
    int s, t;
}
First I want to confirm the memory savings, as I saw some doubts raised. Per the documentation of short in the tutorial here: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
short: The short data type is a 16-bit signed two's complement integer. It has a minimum value of -32,768 and a maximum value of 32,767 (inclusive). As with byte, the same guidelines apply: you can use a short to save memory in large arrays, in situations where the memory savings actually matters.
By using short you do save memory in large arrays (hopefully that is your case), hence it's a good idea to use it.
Now to your question:
Is the performance/memory benefit of short nullified by downcasting?
The short answer is no. The downcast from int to short happens at compile time itself, so there is no runtime performance impact; and since you are saving memory, it may result in better performance in memory-threshold scenarios.

Switch to BigInteger if necessary

I am reading a text file which contains numbers in the range [1, 10^100]. I am then performing a sequence of arithmetic operations on each number. I would like to use a BigInteger only if the number is out of the int/long range. One approach would be to count how many digits there are in the string and switch to BigInteger if there are too many. Otherwise I'd just use primitive arithmetic as it is faster. Is there a better way?
Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small? This way we would not have to worry about overflows.
I suspect the decision to use primitive values for integers and reals (done for performance reasons) made that option not possible. Note that Python and Ruby both do what you ask.
In this case it may be more work to handle the smaller special case than it is worth (you need some custom class to handle the two cases), and you should just use BigInteger.
Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small?
Because that is a higher level programming behavior than what Java currently is. The language is not even aware of the BigInteger class and what it does (i.e. it's not in JLS). It's only aware of Integer (among other things) for boxing and unboxing purposes.
Speaking of boxing/unboxing, an int is a primitive type; BigInteger is a reference type. You can't have a variable that can hold values of both types.
You could read the values into BigIntegers, and then convert them to longs if they're small enough.
private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

private static List<BigInteger> readAndProcess(BufferedReader rd) throws IOException {
    List<BigInteger> result = new ArrayList<BigInteger>();
    for (String line; (line = rd.readLine()) != null; ) {
        BigInteger bignum = new BigInteger(line);
        if (bignum.compareTo(LONG_MAX) > 0) // doesn't fit in a long
            result.add(bignumCalculation(bignum));
        else
            result.add(BigInteger.valueOf(primitiveCalculation(bignum.longValue())));
    }
    return result;
}

private static BigInteger bignumCalculation(BigInteger value) {
    // perform the calculation
}

private static long primitiveCalculation(long value) {
    // perform the calculation
}
(You could make the return value a List<Number> and have it a mixed collection of BigInteger and Long objects, but that wouldn't look very nice and wouldn't improve performance by a lot.)
The performance may be better if a large amount of the numbers in the file are small enough to fit in a long (depending on the complexity of calculation). There's still risk for overflow depending on what you do in primitiveCalculation, and you've now repeated the code, (at least) doubling the bug potential, so you'll have to decide if the performance gain really is worth it.
If your code is anything like my example, though, you'd probably have more to gain by parallelizing the code so the calculations and the I/O aren't performed on the same thread - you'd have to do some pretty heavy calculations for an architecture like that to be CPU-bound.
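A minimal sketch of that separation, reusing the hypothetical bignumCalculation helper from above: the reading thread only parses and submits work, while a pool of workers does the arithmetic.

// Assumes: import java.util.*; import java.util.concurrent.*;
// import java.io.*; import java.math.BigInteger; and bignumCalculation above.
private static List<BigInteger> readAndProcessParallel(BufferedReader rd)
        throws IOException, InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
    List<Future<BigInteger>> futures = new ArrayList<>();
    for (String line; (line = rd.readLine()) != null; ) {
        BigInteger bignum = new BigInteger(line);                  // I/O thread parses...
        futures.add(pool.submit(() -> bignumCalculation(bignum))); // ...workers compute
    }
    pool.shutdown();
    List<BigInteger> result = new ArrayList<>();
    for (Future<BigInteger> f : futures) result.add(f.get());     // collect in input order
    return result;
}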
The impact of using BigDecimals when something smaller will suffice is surprisingly, err, big: Running the following code
public static class MyLong {
    private long l;
    public MyLong(long l) { this.l = l; }
    public void add(MyLong l2) { l += l2.l; }
}

public static void main(String[] args) throws Exception {
    // generate lots of random numbers
    long ls[] = new long[100000];
    BigDecimal bds[] = new BigDecimal[100000];
    MyLong mls[] = new MyLong[100000];
    Random r = new Random();
    for (int i = 0; i < ls.length; i++) {
        long n = r.nextLong();
        ls[i] = n;
        bds[i] = new BigDecimal(n);
        mls[i] = new MyLong(n);
    }

    // time with longs & Bigints
    long t0 = System.currentTimeMillis();
    for (int j = 0; j < 1000; j++) for (int i = 0; i < ls.length - 1; i++) {
        ls[i] += ls[i + 1];
    }
    long t1 = Math.max(t0 + 1, System.currentTimeMillis());
    for (int j = 0; j < 1000; j++) for (int i = 0; i < ls.length - 1; i++) {
        bds[i].add(bds[i + 1]);
    }
    long t2 = System.currentTimeMillis();
    for (int j = 0; j < 1000; j++) for (int i = 0; i < ls.length - 1; i++) {
        mls[i].add(mls[i + 1]);
    }
    long t3 = System.currentTimeMillis();

    // compare times
    t3 -= t2;
    t2 -= t1;
    t1 -= t0;
    DecimalFormat df = new DecimalFormat("0.00");
    System.err.println("long: " + t1 + "ms, bigd: " + t2 + "ms, x"
            + df.format(t2 * 1.0 / t1) + " more, mylong: " + t3 + "ms, x"
            + df.format(t3 * 1.0 / t1) + " more");
}
produces, on my system, this output:
long: 375ms, bigd: 6296ms, x16.79 more, mylong: 516ms, x1.38 more
The MyLong class is there only to look at the effects of boxing, to compare against what you would get with a custom BigOrLong class.
Java is Fast--really really Fast. It's only 2-4x slower than c and sometimes as fast or a tad faster where most other languages are 10x (python) to 100x (ruby) slower than C/Java. (Fortran is also hella-fast, by the way)
Part of this is because it doesn't do things like switch number types for you. It could, but currently it can inline an operation like a*5 in just a few bytes; imagine the hoops it would have to go through if a were an object. It would at least be a dynamic call to a's multiply method, which would be hundreds or thousands of times slower than when a was simply an integer value.
Java probably could, these days, actually use JIT compiling to optimize the call better and inline it at runtime, but even then very few library calls support BigInteger/BigDecimal, so there would need to be a LOT of new native support; it would be a completely new language.
Also imagine how switching from int to BigInteger instead of long would make debugging video games crazy-hard! (Yeah, every time we move to the right side of the screen the game slows down by 50x, the code is all the same! How is this possible?!??)
Would it have been possible? Yes. But there are many problems with it.
Consider, for instance, that Java stores references to BigInteger, which is actually allocated on the heap, but stores int values directly. The difference can be made clear in C:
int i;
BigInt* bi;
Now, to automatically go from a literal to a reference, one would necessarily have to annotate the literal somehow. For instance, if the highest bit of the int was set, then the other bits could be used as a table lookup of some sort to retrieve the proper reference. That also means you'd get a BigInt** bi whenever it overflowed into that.
Of course, that's the bit usually used for sign, and hardware instructions pretty much depend on it. Worse still, if we do that, then the hardware won't be able to detect overflow and set the flags to indicate it. As a result, each operation would have to be accompanied by some test to see if an overflow has happened or will happen (depending on when it can be detected).
All that would add a lot of overhead to basic integer arithmetic, which would in practice negate any benefits you had to begin with. In other words, it is faster to assume BigInt than it is to try to use int and detect overflow conditions while at the same time juggling with the reference/literal problem.
So, to get any real advantage, one would have to use more space to represent ints. So instead of storing 32 bits in the stack, in the objects, or anywhere else we use them, we store 64 bits, for example, and use the additional 32 bits to control whether we want a reference or a literal. That could work, but there's an obvious problem with it -- space usage. :-) We might see more of it with 64 bits hardware, though.
Now, you might ask why not just 40 bits (32 bits + 1 byte) instead of 64? Basically, on modern hardware it is preferable to store stuff in 32 bits increments for performance reasons, so we'll be padding 40 bits to 64 bits anyway.
EDIT
Let's consider how one could go about doing this in C#. Now, I have no programming experience with C#, so I can't write the code to do it, but I expect I can give an overview.
The idea is to create a struct for it. It should look roughly like this:
public struct MixedInt
{
    private int i;
    private System.Numerics.BigInteger bi;

    public MixedInt(string s)
    {
        bi = System.Numerics.BigInteger.Parse(s);
        i = 0;
        if (bi <= int.MaxValue && bi >= int.MinValue)
        {
            i = (int) bi;
            bi = 0;
        }
    }

    // Define all required operations
}
So, if the number is in the integer range we use int, otherwise we use BigInteger. The operations have to ensure transition from one to another as required/possible. From the client point of view, this is transparent. It's just one type MixedInt, and the class takes care of using whatever fits better.
Note, however, that this kind of optimization may well be part of C#'s BigInteger already, given its implementation as a struct.
If Java had something like C#'s struct, we could do something like this in Java as well.
Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small?
This is one of the advantage of dynamic typing, but Java is statically typed and prevents this.
In a dynamically typed language, when two Integers summed together would produce an overflow, the system is free to return, say, a Long. Because dynamically typed languages rely on duck typing, that's fine. The same cannot happen in a statically typed language; it would break the type system.
EDIT
Given that my answer and comment were not clear, here I try to provide more details on why I think that static typing is the main issue:
1) the very fact that we speak of primitive types is a static typing issue; we wouldn't care in a dynamically typed language.
2) with primitive types, the result of the overflow cannot be converted to a type other than int, because that would not be correct with respect to static typing:
int i = Integer.MAX_VALUE + 1; // -2147483648
3) with reference types, it's the same except that we have autoboxing. Still, the addition could not return, say, a BigInteger, because it would not match the static type system (a BigInteger cannot be cast to Integer).
Integer j = new Integer( Integer.MAX_VALUE ) + 1; // -2147483648
4) what could be done is to subclass, say, Number and implement a type UnboundedNumeric that optimizes the representation internally (representation independence), as sketched after this list:
UnboundedNum k = new UnboundedNum( Integer.MAX_VALUE ).add( 1 ); // 2147483648
Still, it's not really the answer to the original question.
5) with dynamic typing, something like
var d = new Integer( Integer.MAX_VALUE ) + 1; // 2147483648
would return a Long which is ok.
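Here is the sketch referenced in point 4: a minimal Java take on such an UnboundedNum type (the name and design are hypothetical; overflow detection uses Math.addExact, available since Java 8).

import java.math.BigInteger;

// Holds a long until an operation overflows, then silently switches to BigInteger.
final class UnboundedNum {
    private final long small;
    private final BigInteger big; // null while the value still fits in a long

    UnboundedNum(long v) { small = v; big = null; }
    private UnboundedNum(BigInteger v) { small = 0; big = v; }

    UnboundedNum add(long other) {
        if (big != null) return new UnboundedNum(big.add(BigInteger.valueOf(other)));
        try {
            return new UnboundedNum(Math.addExact(small, other)); // throws on overflow
        } catch (ArithmeticException overflow) {
            return new UnboundedNum(
                    BigInteger.valueOf(small).add(BigInteger.valueOf(other)));
        }
    }

    @Override public String toString() {
        return big != null ? big.toString() : Long.toString(small);
    }
}

// new UnboundedNum(Integer.MAX_VALUE).add(1) prints 2147483648: no silent overflow.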
