Java int memory usage

While I was thinking over the memory usage of various types, I became a bit confused about how Java uses memory for integers when they are passed to a method.
Say, I had the following code:
public static void main(String[] args) {
    int i = 4;
    addUp(i);
}

public static int addUp(int i) {
    if (i == 0) return 0;
    else return addUp(i - 1);
}
In this example, I am wondering whether my logic is correct:
I have allocated memory for the integer i = 4. Then I pass it to a method. However, since primitives are not passed by reference in Java, inside addUp(i == 4) another integer i = 4 is created. Afterwards there are the calls addUp(i == 3), addUp(i == 2), addUp(i == 1), addUp(i == 0), and each time, since the value is not referenced, a new i value is allocated in memory.
So for a single "int i" value, I have used memory for 6 integer values.
However, if I were to always pass it through an array:
public static void main(String[] args) {
    int[] i = {4};
    // int tempI = i[0];
    addUp(i);
}

public static int addUp(int[] i) {
    if (i[0] == 0) return 0;
    else {
        i[0] = i[0] - 1;
        return addUp(i);
    }
}
Since I create an integer array of size 1 and then pass that to addUp, which will again be passed along as addUp(i[0] == 3), addUp(i[0] == 2), addUp(i[0] == 1), addUp(i[0] == 0), I have only had to use one integer array's worth of memory, and hence this is far more memory efficient. In addition, if I store the initial value of i[0] in an int beforehand, I still have my "original" value.
Then this leads me to the question, why do people pass primitives like int in Java methods? Isn't it far more memory efficient to just pass the array values of those primitives? Or is the first example somehow still just O(1) memory?
And on top of this question, I'm also wondering about the memory difference between int[] and int, especially for a size of 1. Thank you in advance. I was simply wondering about being more memory efficient with Java, and this came to my head.
Thanks for all the answers! I'm now just quickly wondering: if I were to "analyze" the big-O memory of each version, would they both be considered O(1), or would that be wrong to assume?

What you are missing here: the int values in your example go on the stack, not on the heap.
And it is much less overhead to deal with fixed size primitive values existing on the stack - compared to objects on the heap!
In other words: using a "pointer" means that you have to create a new object on the heap. All objects live on the heap; there is no stack for arrays! And objects become subject to garbage collection immediately after you stop using them. Stack frames, on the other hand, come and go as you invoke methods!
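To make that concrete, here is a small sketch (not from the answer, just an illustration of the distinction):
void example() {
    int i = 4;       // the value itself lives in this method's stack frame
    int[] a = {4};   // the reference 'a' lives on the stack, but the array object
                     // is allocated on the heap and later needs garbage collection
}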
Beyond that: keep in mind that the abstractions that programming languages provide to us are created to help us write code that is easy to read, understand and maintain. Your approach is basically to do some sort of fine-tuning that leads to more complicated code. And that is not how Java solves such problems.
Meaning: with Java, the real "performance magic" happens at runtime, when the just-in-time compiler kicks in! You see, the JIT can inline calls to small methods when the method is invoked "often enough". And then it becomes even more important to keep data "close" together. As in: when data lives on the heap, you might have to access memory to get a value. Whereas items living on the stack - might still be "close" (as in: in the processor cache). So your little idea to optimize memory usage could actually slow down program execution by orders of magnitude. Because even today, there are orders of magnitude between accessing the processor cache and reading main memory.
Long story short: avoid getting into such "micro-tuning" for either performance or memory usage: the JVM is optimized for the "normal, typical" use cases. Your attempts to introduce clever work-arounds can therefore easily result in "less good" results.
So - when you worry about performance: do what everybody else is doing. And if you really care - then learn how the JVM works. As it turns out, even my knowledge is slightly outdated - the comments imply that a JIT can inline objects on the stack. In that sense: focus on writing clean, elegant code that solves the problem in a straightforward way!
Finally: this is subject to change at some point. There are ideas to introduce true value objects to Java, which would basically live on the stack, not the heap. But don't expect that to happen before Java 10. Or 11. Or ... (I think this would be relevant here).

Several things:
First thing, and this is splitting hairs: when you pass an int in Java you are allocating 4 bytes onto the stack, whereas when you pass an array (because it is a reference) you are actually allocating 8 bytes (assuming an x64 architecture) onto the stack, plus the additional 4 bytes that store the int on the heap.
More importantly, the data that lives in the array is allocated on the heap, whereas the reference to the array itself is allocated onto the stack; when passing an integer there is no heap allocation required, as the primitive is only allocated on the stack. Over time, reducing the heap allocations will mean that the garbage collector has fewer things to clean up, whereas the cleanup of stack frames is trivial and doesn't require additional processing.
However, this is all moot (imho) because in practice when you have complicated collections of variables and objects you are likely going to end up grouping them together into a class. In general, you should be writing to promote readability and maintainability rather than trying to squeeze every last drop of performance out of the JVM. The JVM is pretty quick as it is, and there is always Moore's Law as a backstop.
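For example, instead of threading several loose primitives (or one-element arrays) through every call, you would typically group them into a small class (a hypothetical sketch; the class is made up):
// Hypothetical sketch: related values grouped into one small class.
class Rectangle {
    final int width;
    final int height;

    Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
    }

    int area() {
        return width * height;
    }
}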
It would be difficult to analyze the Big-O for each, because in order to get a true picture you would have to factor in the behavior of the garbage collector, and that behavior is highly dependent on both the JVM itself and any runtime (JIT) optimizations that the JVM has made to your code.
Please remember Donald Knuth's wise words that "premature optimization is the root of all evil".
Write code that avoids micro-tuning, code that promotes readability and maintainability will fare better over the long run.

If your assumption is that arguments passed to functions necessarily consume memory (which is false, by the way), then note that in your second example, which passes an array, a copy of the reference to the array is made. That reference may actually be larger than an int; it's unlikely to be smaller.

Whether these methods take O(1) or O(N) depends on the compiler. (Here N is the value of i or i[0], depending.) If the compiler uses tail-recursion optimization then the stack space for the parameters, local variables, and return address can be reused and the implementation will then be O(1) for space. Absent tail-recursion optimization the space complexity is the same as the time complexity, O(N).
Basically tail-recursion optimization amounts (in this case) to the compiler rewriting your code as
public static int addUp(int i) {
    while (i != 0) i = i - 1;
    return 0;
}

or

public static int addUp(int[] i) {
    while (i[0] != 0) i[0] = i[0] - 1;
    return 0;
}
A good optimizer might further optimize away the loops.
As far as I know, no Java compilers implement tail-recursion optimization at present, but there is no technical reason that it can't be done in many cases.
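One way to observe the space difference in practice (an illustrative sketch; the method names are made up, and the exact threshold depends on your stack size):
public class StackDemo {
    // Recursive form from the question: one stack frame per call, O(N) space.
    static int addUpRecursive(int i) {
        if (i == 0) return 0;
        else return addUpRecursive(i - 1);
    }

    // Loop rewrite from the answer above: O(1) space.
    static int addUpLoop(int i) {
        while (i != 0) i = i - 1;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(addUpLoop(10_000_000));      // prints 0
        System.out.println(addUpRecursive(10_000_000)); // usually throws StackOverflowError
    }
}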

Actually, when you pass an array as a parameter to a method, a reference to this array is passed under the hood. The array itself is stored on the heap. And the reference can be 4 or 8 bytes in size (depending on CPU architecture, JVM implementation, etc.; what's more, the JLS doesn't say anything about how big a reference is in memory).
On the other hand, a primitive int value always consumes only 4 bytes and resides on the stack.

When you pass an array, the content of the array may be modified by the method that receives the array. When you pass int primitives, those primitives may not be modified by the method that receives them. That's why sometimes you may use primitives and sometimes arrays.
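A minimal illustration of that difference (the method names are just for this sketch):
static void bumpPrimitive(int n) {
    n = n + 1;            // changes only the local copy
}

static void bumpArray(int[] arr) {
    arr[0] = arr[0] + 1;  // changes the array the caller also sees
}

public static void main(String[] args) {
    int n = 4;
    int[] arr = {4};
    bumpPrimitive(n);
    bumpArray(arr);
    System.out.println(n);      // still 4
    System.out.println(arr[0]); // now 5
}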
Also in general, in Java programming you tend to favor readability and let this kind of memory optimizations be done by the JIT compiler.

The int array reference actually takes up more space in the stack frames than an int primitive (8 bytes vs 4). You're actually using more space.
But I think the primary reason people prefer the first way is because it's clearer and more legible.
People actually do do things a lot closer to the second when more ints are involved.


Helping the JVM with stack allocation by using separate objects

I have a bottleneck method which attempts to add points (as x-y pairs) to a HashSet. The common case is that the set already contains the point in which case nothing happens. Should I use a separate point for adding from the one I use for checking if the set already contains it? It seems this would allow the JVM to allocate the checking-point on stack. Thus in the common case, this will require no heap allocation.
Ex. I'm considering changing
HashSet<Point> set;

public void addPoint(int x, int y) {
    if (set.add(new Point(x, y))) {
        // Do some stuff
    }
}

to

HashSet<Point> set;

public void addPoint(int x, int y) {
    if (!set.contains(new Point(x, y))) {
        set.add(new Point(x, y));
        // Do some stuff
    }
}
Is there a profiler which will tell me whether objects are allocated on heap or stack?
EDIT: To clarify why I think the second might be faster, in the first case the object may or may not be added to the collection, so it's not non-escaping and cannot be optimized. In the second case, the first object allocated is clearly non-escaping so it can be optimized by the JVM and put on stack. The second allocation only occurs in the rare case where it's not already contained.
Marko Topolnik properly answered your question; the space allocated for the first new Point may or may not be immediately freed and it is probably foolish to bank on it happening. But I want to expand on why you're currently in a deep state of sin:
You're trying to optimise this the wrong way.
You've identified object creation to be the bottleneck here. I'm going to assume that you're right about this. You're hoping that, if you create fewer objects, the code will run faster. That might be true, but it will never run very fast as you've designed it.
Every object in Java has a pretty fat header (16 bytes; an 8-byte "mark word" full of bit fields and an 8-byte pointer to the class type) and, depending on what's happened in your program thus far, possibly another pretty fat trailer. Your HashSet isn't storing just the contents of your objects; it's storing pointers to those fat-headers-followed-by-contents. (Actually, it's storing pointers to Entry classes that themselves store pointers to Points. Two levels of indirection there.)
A HashSet lookup, then, figures out which bucket it needs to look at and then chases one pointer per thing in the bucket to do the comparison. (As one great big chain in series.) There probably aren't very many of these objects, but they almost certainly aren't stored close together, making your cache angry. Note that object allocation in Java is extremely cheap - you just increment a pointer - and that this pointer chasing is quite probably a bigger source of slowness than the allocations.
Java doesn't provide any abstraction like C++'s templates, so the only real way to make this fast and still provide the Set abstraction is to copy HashSet's code, change all of the data structures to represent your objects inline, modify the methods to work with the new data structures, and, if you're still worried, make copies of the relevant methods that take a list of parameters corresponding to object contents (i.e. contains(int, int)) that do the right thing without constructing a new object.
This approach is error-prone and time-consuming, but it's unfortunately often necessary when working on Java projects where performance matters. Take a look at the Trove library Marko mentioned and see if you can use it instead; Trove did exactly this for the primitive types.
With that out of the way, a monomorphic call site is one where only one method is called. Hotspot aggressively inlines calls from monomorphic call sites. You'll notice that HashSet.contains punts to HashMap.containsKey. You'd better pray for HashMap.containsKey to be inlined since you need the hashCode call and equals calls inside to be monomorphic. You can verify that your code is being compiled nicely by using the -XX:+PrintAssembly option and poring over the output, but it's probably not---and even if it is, it's probably still slow because of what a HashSet is.
As soon as you have written new Point(x,y), you are creating a new object. It may happen not to be placed on the heap, but that's just a bet you can lose. For example, the contains call should be inlined for the escape analysis to work, or at least it should be a monomorphic call site. All this means that you are optimizing against a quite erratic performance model.
If you want to avoid allocation the solid way, you can use Trove library's TLongHashSet and have your (int,int) pairs encoded as single long values.
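A sketch of that encoding (the packing helper is a made-up name; the TLongHashSet usage assumes Trove 3 and is only outlined in comments):
// Pack an (x, y) pair into one long: x in the high 32 bits, y in the low 32 bits.
static long pack(int x, int y) {
    return ((long) x << 32) | (y & 0xFFFFFFFFL);
}

// Rough usage with Trove's TLongHashSet (gnu.trove.set.hash.TLongHashSet in Trove 3):
// TLongHashSet set = new TLongHashSet();
// if (set.add(pack(x, y))) {
//     // the point was new - do some stuff, with no Point allocation at all
// }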

What is the reason for BitSet's size() method?

Is there a use case for the size() method on the java.util.BitSet class?
I mean - the JavaDoc clearly says it's implementation dependent; it returns the size of the internal long[] storage in bits. From what it says, one could conclude that you won't be able to set a bit with a higher index than size(), but that's not true - the BitSet can grow automatically:
BitSet myBitSet = new BitSet();
System.out.println(myBitSet.size()); // prints "64"
myBitSet.set(768);
System.out.println(myBitSet.size()); // prints "832"
In every single encounter with BitSet I have had in my life, I always wanted to use length() since that one returns the logical size of the BitSet:
BitSet myBitSet = new BitSet();
System.out.println(myBitSet.length()); // prints "0"
myBitSet.set(768);
System.out.println(myBitSet.length()); // prints "769"
Even though I have been programming Java for the last 6 years, the two methods are always highly confusing for me. I often mix them up and use the wrong one accidentally, because in my head, I think of BitSet as a clever Set<boolean>, where I'd use size().
It's like if ArrayList had length() returning the number of elements and size() returning the size of the underlying array.
Now, is there any use case for the size() method I am missing? Is it useful in any way? Has anyone ever used it for anything? Might it be important for some manual bit twiddling or something similar?
EDIT (after some more research)
I realized BitSet was introduced in Java 1.0 while the Collections framework with most of the classes we use was introduced in Java 1.2. So basically it seems to me that size() is kept because of legacy reasons and there's no real use for it. The new Collection classes don't have such methods, while some of the old ones (Vector, for example) do.
I realized BitSet was introduced in Java 1.0 while the Collections framework with most of the classes we use was introduced in Java 1.2.
Correct.
So basically it seems to me that size() is kept because of legacy reasons and there's no real use for it.
Yes, pretty much.
The other "size" method is length() which gives you the largest index at which a bit is set. From a logical perspective, length() is more useful than size() ... but length() was only introduced in Java 1.2.
The only (hypothetical) use-case I can think of where size() might be better than length() is when:
you are trying to establish a "fence post" for an iteration of the bits in the set, and
it is highly likely that you will stop iterating well before the end, and
it doesn't matter if you go a little bit beyond the last bit that is set.
In that case, size() is arguably better than length() because it is a cheaper call. (Look at the source code ...) But that's pretty marginal.
(I guess, another use-case along similar lines is when you are creating a new BitSet and preallocating it based on the size() of an existing BitSet. Again, the difference is marginal.)
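For instance, that second use-case would look something like this (a small sketch):
BitSet original = new BitSet();
original.set(768);

// Preallocate the copy so it never has to grow while being filled.
BitSet copy = new BitSet(original.size());
copy.or(original);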
But you are right about compatibility. It is clear that they could not either get rid of size() or change its semantics without creating compatibility problems. So they presumably decided to leave it alone. (Indeed, they didn't even see the need to deprecate it. The "harm" in having a not-particularly-useful method in the API is minimal.)
If the size method hadn't been designed by the Java creators to be public, it would still undoubtedly exist as a private method/field. So we are discussing its accessibility and maybe its naming.
Java 1.0 took a lot of inspiration, not just the procedural syntax, from C/C++. In the C++ standard library, the counterparts to BitSet's length and size also exist. They are called there size and capacity, respectively. There is rarely any hard reason to use capacity in C++, and even less so in a garbage collected language such as Java, but having the method accessible is still arguably useful. I will explain in Java terms.
Tell me, what is the maximum number of machine instructions ever needed for executing a BitSet operation such as set? One would like to answer "just a handful", but this is only true if that particular operation does not result in reallocation of the whole underlying array. Theoretically, the reallocations turn a constant time algorithm into a linear time one.
Does this theoretical difference have much practical impact? Rarely. The array usually doesn't grow too often. However, whenever you have an algorithm operating over a gradually growing BitSet with an approximately known final size, you will save on reallocations if you pass the final size already to the BitSet's constructor. In some very special circumstances this may even have a noticeable effect, in most circumstances it does not hurt.
set then has constant time complexity - calling it cannot ever block the application for too long.
if just one extremely large BitSet instance is using up all your available memory (by design), swapping may start noticeably later, depending on how your JVM implements the growth operation (with or without an extra copy).
Now imagine that you operate on many BitSets, all of which have been allocated with a target size. You are constructing one BitSet instance from another and you want the new one to share the old one's target size, as you know you will be using them side by side. Having the size method public makes this easier to implement cleanly.
size() is the number of 0s and 1s the BitSet currently has room for, which has to be a multiple of 64. You could use cardinality() for the number of 1s.
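To make the three numbers concrete (a small sketch; the printed values assume the standard long[]-backed implementation):
BitSet bits = new BitSet();
bits.set(3);
bits.set(768);

System.out.println(bits.size());        // 832 - capacity in bits, always a multiple of 64
System.out.println(bits.length());      // 769 - index of the highest set bit, plus one
System.out.println(bits.cardinality()); // 2   - number of bits that are set to 1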
One of the main reasons I think it may be useful is when we need to extend the BitSet class and override the length method. In that case, size() is useful. Below is how length could return a value that depends on the size method:
protected Set<Integer> bitset;

public int length() {
    int returnValue = 0;
    // Make sure the set is not empty
    // Get the maximum value + 1
    if (bitset.size() > 0) {
        Integer max = Collections.max(bitset);
        returnValue = max.intValue() + 1;
    }
    return returnValue;
}

Are arrays of 'structs' theoretically possible in Java?

There are cases when one needs a memory-efficient way to store lots of objects. To do that in Java you are forced to use several primitive arrays (see below why) or a big byte array, which produces a bit of CPU overhead for converting.
Example: you have a class Point { float x; float y; }. Now you want to store N points in an array, which would take at least N * 8 bytes for the floats and N * 4 bytes for the references on a 32-bit JVM. So at least 1/3 is garbage (not counting the normal object overhead here). But if you stored this in two float arrays, all would be fine.
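The two-float-array workaround looks roughly like this (a hedged sketch; the class and method names are made up):
// Hypothetical "structure of arrays" layout: N points as two parallel float arrays,
// with no per-point object header and no per-element reference.
class PointBuffer {
    final float[] xs;
    final float[] ys;

    PointBuffer(int n) {
        xs = new float[n];
        ys = new float[n];
    }

    void set(int i, float x, float y) {
        xs[i] = x;
        ys[i] = y;
    }

    float x(int i) { return xs[i]; }
    float y(int i) { return ys[i]; }
}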
My question: Why does Java not optimize the memory usage for arrays of references? I mean why not directly embed the object in the array like it is done in C++?
E.g. marking the class Point final should be sufficient for the JVM to see the maximum length of the data for the Point class. Or where would this be against the specification? Also, this would save a lot of memory when handling large n-dimensional matrices, etc.
Update:
I would like to know whether the JVM could theoretically optimize it (e.g. behind the scenes) and under which conditions - not whether I can force the JVM to do it somehow. I think the second point of the conclusions is the reason it cannot be done easily, if at all.
Conclusions what the JVM would need to know:
The class needs to be final to let the JVM guess the length of one array entry
The array needs to be read only. Of course you can change the values like Point p = arr[i]; p.setX(i) but you cannot write to the array via inlineArr[i] = new Point(). Or the JVM would have to introduce copy semantics which would be against the "Java way". See aroth's answer
How to initialize the array (calling the default constructor or leaving the members initialized to their default values)
Java doesn't provide a way to do this because it's not a language-level choice to make. C, C++, and the like expose ways to do this because they are system-level programming languages where you are expected to know system-level features and make decisions based on the specific architecture that you are using.
In Java, you are targeting the JVM. The JVM doesn't specify whether or not this is permissible (I'm making an assumption that this is true; I haven't combed the JLS thoroughly to prove that I'm right here). The idea is that when you write Java code, you trust the JIT to make intelligent decisions. That is where the reference types could be folded into an array or the like. So the "Java way" here would be that you cannot specify if it happens or not, but if the JIT can make that optimization and improve performance it could and should.
I am not sure whether this optimization in particular is implemented, but I do know that similar ones are: for example, objects allocated with new are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that the object is method-local it can allocate the fields of the object on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.
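A small example of the kind of code that can benefit (whether the allocation is actually elided depends on the JVM version and JIT settings, so treat this as illustrative):
// 'p' never escapes this method, so HotSpot's escape analysis may
// scalar-replace it: the fields end up in registers or stack slots,
// and no Point object is allocated on the heap at all.
static float manhattanDistance(float xv, float yv) {
    Point p = new Point();  // Point from the question: { float x; float y; }
    p.x = xv;
    p.y = yv;
    return Math.abs(p.x) + Math.abs(p.y);
}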
Update for updated question
If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers) but you should be able to work around them. For null references, the JVM could convince itself that there will never be null elements, or keep a bit vector as mentioned previously. Both of these techniques would likely be predicated on escape analysis showing that the array reference never leaves the method, as I can see the bookkeeping becoming tricky if you try to e.g. store it in an object field.
The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).
So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.
Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:
Point p = new Point(0, 0);
Point[] compressedA = {p}; //assuming 'p' is "optimally" stored as {0,0}
Point[] compressedB = {p}; //assuming 'p' is "optimally" stored as {0,0}
compressedA[0].setX(5);
compressedB[0].setX(1);
System.out.println(p.x);
System.out.println(compressedA[0].x);
System.out.println(compressedB[0].x);
...you would get:
0
5
1
...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
Isn't this tantamount to providing trivial classes such as the following?
class Fixed {
    float[] hiddenArr;

    Point pointArray(int position) {
        return new Point(hiddenArr[position * 2], hiddenArr[position * 2 + 1]);
    }
}
Also, it's possible to implement this without making the programmer explicitly state that they'd like it; the JVM is already aware of "value types" (POD types in C++): ones with only other plain-old-data types inside them. I believe HotSpot uses this information during stack elision; no reason it couldn't do it for arrays too?

Why is boxing a primitive value-type in .NET uncached, unlike Java?

Consider:
int a = 42;
// Reference equality on two boxed ints with the same value
Console.WriteLine( (object)a == (object)a ); // False
// Same thing - listed only for clarity
Console.WriteLine(ReferenceEquals(a, a)); // False
Clearly, each boxing instruction allocates a separate instance of a boxed Int32, which is why reference-equality between them fails. This page appears to indicate that this is specified behaviour:
The box instruction converts the 'raw' (unboxed) value type into an object reference (type O). This is accomplished by creating a new object and copying the data from the value type into the newly allocated object.
But why does this have to be the case?
Is there any compelling reason why the CLR does not choose to hold a "cache" of boxed Int32s, or even stronger, common values for all primitive value-types (which are all immutable)? I know Java has something like this.
In the days before generics, wouldn't it have helped out a lot with reducing the memory requirements as well as GC workload for a large ArrayList consisting mainly of small integers? I'm also sure that there exist several modern .NET applications that do use generics, but for whatever reason (reflection, interface assignments etc.), run up large boxing allocations that could be massively reduced with (what appears to be) a simple optimization.
So what's the reason? Some performance implication I haven't considered (I doubt if testing that the item is in the cache etc. will result in a net performance loss, but what do I know)? Implementation difficulties? Issues with unsafe code? Breaking backwards compatibility (I can't think of any good reason why a well-written program should rely on the existing behaviour)? Or something else?
EDIT: What I was really suggesting was a static cache of "commonly-occurring" primitives, much like what Java does. For an example implementation, see Jon Skeet's answer. I understand that doing this for arbitrary, possibly mutable, value-types or dynamically "memoizing" instances at run-time is a completely different matter.
EDIT: Changed title for clarity.
One reason which I find compelling is consistency. As you say, Java does cache boxed values in a certain range... which means it's all too easy to write code which works for a while:
// Passes in all my tests. Shame it fails if they're > 127...
if (value1 == value2) {
    // Do something
}
I've been bitten by this - admittedly in a test rather than production code, fortunately, but it's still nasty to have something which changes behaviour significantly outside a given range.
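For reference, this is the Java behaviour being described (the JLS only guarantees caching for values in -128..127; outside that range the result of == on boxed values is implementation-dependent):
Integer a = 127, b = 127;
System.out.println(a == b); // true - both references come from the Integer cache

Integer c = 128, d = 128;
System.out.println(c == d); // false on a default JVM - outside the guaranteed cache range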
Don't forget that any conditional behaviour also incurs a cost on all boxing operations - so in cases where it wouldn't use the cache, you'd actually find that it was slower (because it would first have to check whether or not to use the cache).
If you really want to write your own caching box operation, of course, you can do so:
public static class Int32Extensions
{
    private static readonly object[] BoxedIntegers = CreateCache();

    private static object[] CreateCache()
    {
        object[] ret = new object[256];
        for (int i = -128; i < 128; i++)
        {
            ret[i + 128] = i;
        }
        return ret;
    }

    public static object Box(this int i)
    {
        return (i >= -128 && i < 128) ? BoxedIntegers[i + 128] : (object) i;
    }
}
Then use it like this:
object y = 100.Box();
object z = 100.Box();
if (y == z)
{
// Cache is working
}
I can't claim to be able to read minds, but here are a couple of factors:
1) caching the value types can make for unpredictability - comparing two boxed values that are equal could be true or false depending on cache hits and implementation. Ouch!
2) The lifetime of a boxed value type is most likely short - so how long do you hold the value in cache? Now you either have a lot of cached values that will no longer be used, or you need to make the GC implementation more complicated to track the lifetime of cached value types.
With these downsides, what is the potential win? A smaller memory footprint in an application that does a lot of long-lived boxing of equal value types. Since this win is something that is going to affect a small number of applications and can be worked around by changing code, I'm going to agree with the C# spec writers' decisions here.
Boxed value objects are not necessarily immutable. It is possible to change the value in a boxed value type, such as through an interface.
So if boxing a value type always returned the same instance based on the same original value, it would create references which may not be appropriate (for example, two different value type instances which happen to have the same value end up with the same reference even though they should not).
public interface IBoxed
{
    int X { get; set; }
    int Y { get; set; }
}

public struct BoxMe : IBoxed
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static void Test()
{
    BoxMe original = new BoxMe()
    {
        X = 1,
        Y = 2
    };
    object boxed1 = (object) original;
    object boxed2 = (object) original;
    ((IBoxed) boxed1).X = 3;
    ((IBoxed) boxed1).Y = 4;
    Console.WriteLine("original.X = " + original.X);
    Console.WriteLine("original.Y = " + original.Y);
    Console.WriteLine("boxed1.X = " + ((IBoxed) boxed1).X);
    Console.WriteLine("boxed1.Y = " + ((IBoxed) boxed1).Y);
    Console.WriteLine("boxed2.X = " + ((IBoxed) boxed2).X);
    Console.WriteLine("boxed2.Y = " + ((IBoxed) boxed2).Y);
}
Produces this output:
original.X = 1
original.Y = 2
boxed1.X = 3
boxed1.Y = 4
boxed2.X = 1
boxed2.Y = 2
If boxing didn't create a new instance, then boxed1 and boxed2 would have the same values, which would be inappropriate if they were created from different original value type instance.
There's an easy explanation for this: un/boxing is fast. It needed to be back in the .NET 1.x days. After the JIT compiler generates the machine code for it, there's but a handful of CPU instructions generated for it, all inline without method calls. Not counting corner cases like nullable types and large structs.
The effort of looking up a cached value would greatly diminish the speed of this code.
I wouldn't think a run-time-filled cache would be a good idea, but I would think it might be reasonable on 64-bit systems to define ~8 billion of the 64 quintillion possible object-reference values as being integer or float literals, and on any system to pre-box all primitive literals. Testing whether the upper 31 bits of a reference hold some particular value should probably be cheaper than a memory reference.
Adding to the answers already listed is the fact that in .NET, at least with the normal garbage collector, object references are internally stored as direct pointers. This means that when a garbage collection is performed the system has to update every single reference to every object that gets moved, but it also means that "main-line" operation can be very fast. If object references were sometimes direct pointers and sometimes something else, this would require extra code every time an object is dereferenced. Since object dereferencing is one of the most common operations during the execution of a .NET program, even a 5% slowdown here would be devastating unless it was matched by an awesome speedup elsewhere. It's possible, for example, that a "64-bit compact" model, in which each object reference was a 32-bit index into an object table, might offer better performance than the existing model in which each reference is a 64-bit direct pointer. Dereferencing operations would require an extra table lookup, which would be bad, but object references would be smaller, thus allowing more of them to be stored in the cache at once. In some circumstances, that could be a major performance win (maybe often enough to be worthwhile - maybe not). It's unclear, though, that allowing an object reference to sometimes be a direct memory pointer and sometimes be something else would really offer much advantage.

Declare an object inside or outside a loop?

Is there any performance penalty for the following code snippet?
for (int i = 0; i < someValue; i++)
{
    Object o = someList.get(i);
    o.doSomething();
}

Or does this code actually make more sense?

Object o;
for (int i = 0; i < someValue; i++)
{
    o = someList.get(i);
    o.doSomething();
}
If in byte code these two are totally equivalent then obviously the first method looks better in terms of style, but I want to make sure this is the case.
In today's compilers, no. I declare objects in the smallest scope I can, because it's a lot more readable for the next guy.
To quote Knuth, who may be quoting Hoare:
Premature optimization is the root of all evil.
Whether the compiler will produce marginally faster code by defining the variable outside the loop is debatable, and I imagine it won't. I would guess it'll produce identical bytecode.
Compare this with the number of errors you'll likely prevent by correctly-scoping your variable using in-loop declaration...
There's no performance penalty for declaring the Object o within the loop.
The compiler generates very similar bytecode and makes the correct optimizations.
See the article Myth - Defining loop variables inside the loop is bad for performance for a similar example.
You can disassemble the code with javap -c and check what the compiler actually emits. On my setup (java 1.5/mac compiled with eclipse), the bytecode for the loop is identical.
The first code is better as it restricts the scope of the o variable to the for block. From a performance perspective, it might not have any effect in Java, but it might in lower-level compilers. They might put the variable in a register if you use the first.
In fact, some people might think that if the compiler is dumb, the second snippet is better in terms of performance. This is what an instructor told me at college, and I laughed at him for this suggestion! Basically, compilers allocate memory on the stack for the local variables of a method just once at the start of the method (by adjusting the stack pointer) and release it at the end of the method (again by adjusting the stack pointer, assuming it's not C++ or it doesn't have any destructors to be called). So all stack-based local variables in a method are allocated at once, no matter where they are declared and how much memory they require. Actually, if the compiler is dumb, there is no difference in terms of performance, but if it's smart enough, the first code can actually be better as it'll help the compiler understand the scope and the lifetime of the variable! By the way, if it's really smart, there should be absolutely no difference in performance, as it infers the actual scope.
Construction of an object using new is totally different from just declaring it, of course.
I think readability is more important that performance and from a readability standpoint, the first code is definitely better.
I've got to admit I don't know java. But are these two equivalent? Are the object lifetimes the same? In the first example, I assume (not knowing java) that o will be eligible for garbage collection immediately the loop terminates.
But in the second example surely o won't be eligible for garbage collection until the outer scope (not shown) is exited?
Don't prematurely optimize. Better than either of these is:
for (Object o : someList) {
    o.doSomething();
}
because it eliminates boilerplate and clarifies intent.
Unless you are working on embedded systems, in which case all bets are off. Otherwise, don't try to outsmart the JVM.
I've always thought that most compilers these days are smart enough to do the latter option. Assuming that's the case, I would say the first one does look nicer as well. If the loop gets very large, there's no need to look all around for where o is declared.
These have different semantics. Which is more meaningful?
Reusing an object for "performance reasons" is often wrong.
The question is: what does the object "mean"? Why are you creating it? What does it represent? Objects must parallel real-world things. Things are created, undergo state changes, and report their states for reasons.
What are those reasons? How does your object model and reflect those reasons?
To get at the heart of this question... [Note that non-JVM implementations may do things differently if allowed by the JLS...]
First, keep in mind that the local variable "o" in the example is a pointer, not an actual object.
All local variables are allocated on the runtime stack in 4-byte slots. doubles and longs require two slots; other primitives and pointers take one. (Even booleans take a full slot)
A fixed runtime-stack size must be created for each method invocation. This size is determined by the maximum local variable "slots" needed at any given spot in the method.
In the above example, both versions of the code require the same maximum number of local variables for the method.
In both cases, the same bytecode will be generated, updating the same slot in the runtime stack.
In other words, no performance penalty at all.
HOWEVER, depending on the rest of the code in the method, the "declaration outside the loop" version might actually require a larger runtime stack allocation. For example, compare
for (...) { Object o = ... }
for (...) { Object o = ... }
with
Object o;
for (...) { /* loop 1 */ }
for (...) { Object x =...; }
In the first example, both loops require the same runtime stack allocation.
In the second example, because "o" lives past the loop, "x" requires an additional runtime stack slot.
Hope this helps,
-- Scott
In both cases the type info for the object o is determined at compile time. In the second instance, o is seen as being global to the for loop, and in the first instance, the clever Java compiler knows that o will have to be available for as long as the loop lasts and hence will optimise the code in such a way that there won't be any respecification of o's type in each iteration.
Hence, in both cases, specification of o's type will be done once, which means the only performance difference would be in the scope of o. Obviously, a narrower scope always enhances performance; therefore, to answer your question: no, there is no performance penalty for the first code snippet; actually, this snippet is more optimised than the second.
In the second snippet, o is being given unnecessary scope which, besides being a performance issue, can also be a security issue.
The first makes far more sense. It keeps the variable in the scope that it is used in, and prevents values assigned in one iteration being used in a later iteration; this is more defensive.
The former is sometimes said to be more efficient but any reasonable compiler should be able to optimise it to be exactly the same as the latter.
As someone who maintains more code than I write:
Version 1 is much preferred - keeping scope as local as possible is essential for understanding. It's also easier to refactor this sort of code.
As discussed above, I doubt this would make any difference in efficiency. In fact, I would argue that if the scope is more local, a compiler may be able to do more with it!
When using multiple threads (if you're running 50+), I found this to be a very effective way of handling ghost thread problems:
Object one;
Object two;
Object three;
Object four;
Object five;
try {
    for (int i = 0; i < someValue; i++) {
        Object o = someList.get(i);
        o.doSomething();
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    one = null;
    two = null;
    three = null;
    four = null;
    five = null;
    System.gc();
}
The answer depends partly on what the constructor does and what happens with the object after the loop, since that determines to a large extent how the code is optimized.
If the object is large or complex, absolutely declare it outside the loop. Otherwise, the people telling you not to prematurely optimize are right.
I actually have in front of me code which looks like this:
for (int i = offset; i < offset + length; i++) {
    char append = (char) (data[i] & 0xFF);
    buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
    char append = (char) (data[i] & 0xFF);
    buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
    char append = (char) (data[i] & 0xFF);
    buffer.append(append);
}
So, relying on compiler abilities, I can assume there would be only one stack allocation for i and one for append. Then everything would be fine except the duplicated code.
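The duplication, at least, is easy to factor out (a hedged sketch; it assumes buffer is a StringBuilder and data is a byte[], and the helper name is made up):
// Hypothetical helper: the three identical loops collapse into one call each.
private static void appendMaskedChars(StringBuilder buffer, byte[] data, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
        buffer.append((char) (data[i] & 0xFF));
    }
}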
As a side note, Java applications are known to be slow. I have never tried profiling in Java, but I guess the performance hit comes mostly from memory allocation management.
