Best practice for java object initialization

Best practice for java object initialization - java

Are there any practical differences between these approaches? (memory, GC, performance, etc?)
while...{
Object o=new Object();
...
o=new Object();
...
}
and
Object o;
while...{
o=new Object();
...
o=new Object();
...
}

From Effective Java 2nd Edition:
The most powerful technique for minimizing the scope of a local variable
is to declare it where it is first used. If a variable is declared before it is used, it’s
just clutter—one more thing to distract the reader who is trying to figure out what
the program does. By the time the variable is used, the reader might not remember
the variable’s type or initial value.
Declaring a local variable prematurely can cause its scope not only to extend
too early, but also to end too late. The scope of a local variable extends from the
point where it is declared to the end of the enclosing block. If a variable is
declared outside of the block in which it is used, it remains visible after the program
exits that block. If a variable is used accidentally before or after its region of
intended use, the consequences can be disastrous.
In other words, the difference in performance (CPU, memory) are irrelevant in your case. What is far more important is the semantics and correctness of the program, which is better in your first code example.

In your first example, o will go out of scope after your while loop finishes.
Now, If you don't actually use o outside of the while loop (even if you load the object it references into a different structure) this is fine, but you won't be able to access o outside of the loop
also, and this is just being nitpicky, but neither of those will compile, because you declare Object o twice.

I think you need to trade off between object reuse and the eligibility for garbage collection + readability.
The minimum scope always increase readability & minimize error-proneness.
Again if the creation of some object is too costly(like Thread, Database Connection), the reuse should be considered. They are not generally created inside loop and are cached in pool.
That's why connection pooling & Thread Pool are so popular.

In case of Option 1, Object will be eligible for GC once while loops finishes, whereas in option 2 Object will last until method end.

Related

Should variables be declared inside the loop or outside the loop in java [duplicate]

This question already has answers here:
Declaring variables inside or outside of a loop
(20 answers)
Closed 8 years ago.
I know similar question has been asked many times previously but I am still not convinced about when objects become eligible for GC and which approach is more efficient.
Approach one:
for (Item item : items) {
MyObject myObject = new MyObject();
//use myObject.
}
Approach Two:
MyObject myObject = null;
for (Item item : items) {
myObject = new MyObject();
//use myObject.
}
I understand: "By minimizing the scope of local variables, you increase the readability and maintainability of your code and reduce the likelihood of error". (Joshua Bloch).
But How about performance/memory consumption? In Java Objects are Garbage collected when there is no reference left to the object. If there are e.g. 100000 items then 100000 objects will be created. In Approach One each object will have a reference (myObject) to it so they are not eligible for GC?
Where as in Approach Two with every loop iteration you are removing reference from the object created in previous iteration. so surely objects start becoming eligible after the first loop iteration.
Or is it a trade off between performance and code readability & maintainability?
What have I misunderstood?
Note:
Assuming I care about performance and myObject is not needed after the loop.
Thanks In Advance

If there are e.g. 100000 items then 100000 objects will be created in Approach One and each object will have a reference (myObject) to it so they are not eligible for GC?
No, from Garbage Collector's point of view both the approaches work the same i.e. no memory is leaked. With approach two, as soon as the following statement runs
myObject = new MyObject();
the previous MyObject that was being referenced becomes an orphan (unless while using that Object you passed it around, say, to another method where that reference was saved) and is eligible for garbage collection.
The difference is that once the loop runs out you would have the last instance of MyObject still reachable through the myObject reference originally created outside the loop.
Does GC know when references go out of scope during the loop execution or it can only know at the end of method?
First of all there's only one reference, not references. It's the objects that are getting unreferenced in the loop. Secondly, the garbage collection doesn't kick in spontaneously. So forget the loop, it may not even happen when the method exits.
Notice that I said, orphan objects become eligible for gc, not that they get collected immediately. Garbage collection never happens in real time, it happens in phases. In the mark phase, all the objects that are not reachable through a live thread anymore are marked for deletion. Then in the sweep phase, memory is reclaimed and additionally compacted much like defragmenting a hard drive. So, it works more like a batch rather than piecemeal operations.
GC isn't bothered about scopes or methods as such. It only looks for unreferenced objects and it does so when it feels like doing it. You can't force it. The only thing that you can be sure of is that GC would run if the JVM is running out of memory but you can't pin exactly when it would do so.
But, all this does not mean that GC can't kick in while the method executes or even while the loop is running. If you had, say, a Message Processor that processed 10,000 messages every 10 mins or so and then slept in between i.e. the bean waits within the loop, does 10,000 iterations and then waits again; GC would definitely kick into action to reclaim memory even though the method hasn't run to completion yet.

You have misunderstood when objects become eligible for GC - they do this when they are no longer reachable from an active thread. In this context that means:
When the only reference to them goes out of scope (approach 1).
When the only reference to them is assigned another value (approach 2).
So, the instance of MyObject would be eligible for GC at the end of each loop iteration whichever approach was used. The difference (theoretically) between the two approaches is that the JVM would have to allocate memory for a new object reference each iteration in approach 1 but not in approach 2. However, this assumes the Java compiler and/or Just-In-Time compiler is not smart to optimise approach 1 to actually act like approach 2.
In any case, I would go for the more readable and less error prone approach 1 on the grounds that:
The performance overhead for a single object reference allocation is tiny.
It will probably get optimised away anyway.

In both approaches objects will get Garbage collected.
In Approach 1: As and when for loop exits , all the local variable inside for loop get Garbage collected , as the loop ends.
In Approach 2 : As when new new reference is assigned to myObject variable the earlier has no proper reference .So that earlier get garbage collected and so on until loop runs.
So in both approaches there is no performance bottle neck.

I would not expect declaring the variable inside a block to have a detrimental impact on performance.
At least notionally the JVM allocates the stack frame at the start of the method and destroys it at the end. By implication will have the cumulative size to accommodate all the local variables.
See section 2.6 in here:
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html
That is consistent with other languages such as C where resizing the stack frame as the function/method executes is an overhead with no apparent return.
So wherever you declare it shouldn't make a difference.
Indeed declaring variables in blocks may help the compiler realize that the effective size of the stack frame can be smaller:
void foo() {
int x=6;
int y=7;
int z=8;
//.....
}
Versus
void bar() {
{
int x=6;
//....
}
{
int y=7;
//....
}
{
int z=8;
//....
}
}
Notice that bar() clearly only needs one local variable not 3.
Though making the stack frame smaller is unlikely to have any real influence on performance!
However when a reference goes out of the scope may make the object it references available for garbage collection. You would otherwise need to set references to null which is an untidy and unnecessary bother (and tinsy weenie overhead).
Without question you should declare variables inside a loop if (and only if) you don't need to access them outside the loop.
IMHO blocked statements (like bar has above) are under used.
If a method proceeds in stages you can protect the later stages from variable pollution using blocks.
With suitable (short) comments it can often be more readable (and efficient) way of structuring code than breaking it down it lost of private methods.
I have a chunky algorithm (Hashlife) where making earlier artifacts available for garbage collection during the method can make the difference between getting to the end and getting OutOfMemoryError.

Declaration of variables within a loop

I have a very basic question about scoping rules. When you declare a variable within a loop, say:
while ( /*some condition*/ )
{
int a = 0;
//Remaining operations
}
Is a new int variable declared in every iteration of the loop? Or is it that a is destroyed at the end of every iteration and freshly created again? How does the compiler in Java or C++ understand and implement this?

You have to differentiate between the logical level and the implementation level.
From a logical point of view, the variable is not really 'created' or 'destroyed', but that's how you could probably imagine it. The variable is simply declared within some scope, so it's guaranteed to exist (you can assign to it and read its value), it's initialized at the beginning of the block (so it has the value 0) and it isn't visible outside the code block. Thats what the language definition says. In C++, if you omit the initialization (i.e. the =0 part), the language doesn't make any assumption about what the value is (so the compiler is free to "reuse" the memory location). In Java, afair, the initialization is implicit, so a will also be set to zero, if you omit the initialization.
At implementation level, the compiler is more or less free to do whatever it wants, as long as it fulfills the above specifications. So in practise, it will most likely reserve some space on the stack and use this same memory for every iteration to store the value of a. Since you've used an initializer, the value 0 will be written to this location at the beginning of every loop. Note, that if a isn't used within the scope, the compiler is also free to simply optimize it away. Or, if possible, it can assign it to a CPU register.
However, theoretically, a compiler could also reserve a "new" memory location for a in every iteration and clear all of them at the end of the loop (although this could result in StackOverflow (!) for long loops...). Or use garbage-collected dynamic memory allocation (which would result in poor performance...).

I find it easier to think about a as being the same variable that gets repeatedly created and destroyed.

Basically, a is a local variable, which is initialized every iteration in the loop with the value 0, and then destroyed, and so on, until the loop is finished when it is ultimately destroyed
Note:
while(//Some Condition)
would comment out the right parenthesis, and therefore the code would not run anyway
Correct this to:
while(/* some condition */)

It's declared only in source code. In bytecode it simply uses a local variable on the stack which will be initialized with 0 at every iteration. The difference with declaration outside the loop is that when it is inside loop JVM will reuse the variable which a occupied.

a create and destroyed after each and every iteration.

Is repeatedly instantiating an anonymous class wasteful?

I had a remark about a piece of code in the style of:
Iterable<String> upperCaseNames = Iterables.transform(
lowerCaseNames, new Function<String, String>() {
public String apply(String input) {
return input.toUpperCase();
}
});
The person said that every time I go through this code, I instantiate this anonymous Function class, and that I should rather have a single instance in, say, a static variable:
static Function<String, String> toUpperCaseFn =
new Function<String, String>() {
public String apply(String input) {
return input.toUpperCase();
}
};
...
Iterable<String> upperCaseNames =
Iterables.transform(lowerCaseNames, toUpperCaseFn);
On a very superficial level, this somehow makes sense; instantiating a class multiple times has to waste memory or something, right?
On the other hand, people instantiate anonymous classes in middle of the code like there's no tomorrow, and it would be trivial for the compiler to optimize this away.
Is this a valid concern?

Fun fact about Hot Spot JVM optimizations, if you instantiate an object that isn't passed outside of the current method, the JVM will perform optimizations at the bytecode level.
Usually, stack allocation is associated with languages that expose the memory model, like C++. You don't have to delete stack variables in C++ because they're automatically deallocated when the scope is exited. This is contrary to heap allocation, which requires you to delete the pointer when you're done with it.
In the Hot Spot JVM, the bytecode is analyzed to decide if an object can "escape" the thread. There are three levels of escape:
No escape - the object is only used within the method/scope it is created, and the object can't be accessed outside the thread.
Local/Arg escape - the object is returned by the method that creates it or passed to a method that it calls, but none of those methods will put that object somewhere that it can be accessed outside of the thread.
Global escape - the object is put somewhere that it can be accessed in another thread.
This basically is analogous to the questions, 1) do I pass it to another method or return it, and 2) do I associate it with something attached to a GC root like a ClassLoader or something stored in a static field?
In your particular case, the anonymous object will be tagged as "local escape", which only means that any locks (read: use of synchronized) on the object will be optimized away. (Why synchronize on something that won't ever be used in another thread?) This is different from "no escape", which will do allocation on the stack. It's important to note that this "allocation" isn't the same as heap allocation. What it really does is allocates space on the stack for all the variables inside the non-escaping object. If you have 3 fields, int, String, and MyObject inside the no-escape object, then three stack variables will be allocated: an int, a String reference, and a MyObject reference – the MyObject instance itself will still be stored in heap unless it is also analyzed to have "no escape". The object allocation is then optimized away and constructors/methods will run using the local stack variables instead of heap variables.
That being said, it sounds like premature optimization to me. Unless the code is later proven to be slow and is causing performance problems, you shouldn't do anything to reduce its readability. To me, this code is pretty readable, I'd leave it alone. This is totally subjective, of course, but "performance" is not a good reason to change code unless it has something to do with its actual running time. Usually, premature optimization results in code that's harder to maintain with minimal performance benefits.
Java 8+ and Lambdas
If allocating anonymous instances still bothers you, I recommend switching to using Lambdas for single abstract method (SAM) types. Lambda evaluation is performed using invokedynamic, and the implementation ends up creating only a single instance of a Lambda on the first invocation. More details can be found in my answer here and this answer here. For non-SAM types, you will still need to allocate an anonymous instance. The performance impact here will be negligible in most use cases, but IMO, it's more readable this way.
References
Escape analysis (wikipedia.org)
HotSpot escape analysis 14 | 11 | 8 (oracle.com)
What is a 'SAM type' in Java? (stackoverflow.com)
Why are Java 8 lambdas invoked using invokedynamic? (stackoverflow.com)

Short answer: No - don't worry.
Long answer: it depends how frequently you're instantiating it. If in a frequently-called tight loop, maybe - though note that when the function is applied it calls String.toUpperCase() once for every item in an Iterable - each call presumably creates a new String, which will create far more GC churn.
"Premature optimization is the root of all evil" - Knuth

Found this thread: Java anonymous class efficiency implications , you may find it interesting
Did some micro-benchmarking. The micro-benchmark was a comparison between: instantiating an (static inner) class per loop iteration, instantiating a (static inner) class once and using it in the loop, and the two similar ones but with anonymous classes. For the micro benchmarking the compiler seemed to extract the anonymous class out of loops and as predicted, promoted the anonymous class to an inner class of the caller. This meant all four methods were indistinguishable in speed. I also compared it to an outside class and again, same speed. The one with anonymous classes probably took ~128 bits of space more
You can check out my micro-benchmark at http://jdmaguire.ca/Code/Comparing.java & http://jdmaguire.ca/Code/OutsideComp.java. I ran this on various values for wordLen, sortTimes, and listLen. As well, the JVM is slow to warm-up so I shuffled the method calls around. Please don't judge me for the awful non-commented code. I program better than that in RL. And Microbenching marking is almost as evil and useless as premature optimization.

to ensure a java method is thread safe

is it enough to use only local variables and no instance variables. Thus only using memory on the stack (per thread).
But what happens when you create a new MyObject that is local to the method. Doesn't the new object get created on the heap ? Is it thread safe becuase the reference to it is local (thread safe) ?

It is thread safe because if it is only referenced by variables in that particular method (it is, as you said, a local variable), then no other threads can possibly have a reference to the object, and therefore cannot change it.
Imagine you and I are pirates (threads). You go and bury your booty (the object) on an island (the heap), keeping a map to it (the reference). I happen to use the same island for burying my booty, but unless you give me your map, or I go digging all over the island (which isn't allowed on the island of Java), I can't mess with your stash.

Your new MyObject is thread-safe because each call to the method will create its own local instance on the heap. None of the calls refer to a common method; if there are N calls, that means N instances of MyObject on the heap. When the method exits, each instance is eligible for GC as long as you don't return it to the caller.

Well, let me ask you a question: does limiting your method to local variables mean your method can't share a resource with another thread? If not, then obviously this isn't sufficient for thread safety in general.
If you're worried about whether another thread can modify an object you created in another thread, then the only thing you need to worry about is never leaking a reference to that object out of the thread. If you achieve that, your object will be in the heap, but no other thread will be able to reference it so it doesn't matter.
Edit
Regarding my first statement, here's a method with no instance variables:
public void methodA() {
File f = new File("/tmp/file");
//...
}
This doesn't mean there can't be a shared resource between two threads :-).

Threre's no way to other threads to access such object reference. But if that object is not thread-safe, then the overall thread-safety is compromised.
Consider for example that MyObject is a HashMap.
The argument that if it's in the heap, it's not thread-safe, is not valid. The heap is not accessible via pointer arithmetic, so it doesn't affect where the object is actually stored (besides ThreadLocal's).

Assigning "null" to objects in every application after their use

Do you always assign null to an object after its scope has been reached?
Or do you rely on the JVM for garbage collection?
Do you do it for all sort of applications regardless of their length?
If so, is it always a good practice?

It's not necessary to explicitly mark objects as null unless you have a very specific reason. Furthermore, I've never seen an application that marks all objects as null when they are no longer needed. The main benefit of garbage collection is the intrinsic memory management.

no, don't do that, except for specific cases such as static fields or when you know a variable/field lives a lot longer than the code referencing it
yes, but with a working knowledge of your VM's limits (and how to cause blocks of memory to be held accidentally)
n/a

I declare almost all of my variables as "final". I also make my methods small and declare most variables local to methods.
Since they are final I cannot assign them null after use... but that is fine since the methods are small the objects are eligible for garbage collection once they return. Since most of the variables are local there is less chance of accidentally holding onto a reference for longer than needed (memory leak).

Assignin null to a variable does not implicitly mean it will be garbage collected right away. In fact it most likely won't be. Whether you practice setting variables to null is usually only cosmetic (with the exception of static variables)

We don't practice this assigning "null". If a variable's scope has reached it's end it should already be ready for GC. There may be some edge cases in which the scope lasts for a while longer due to a long running operation in which case it might make sense to set it to null, but I would imagine they would be rare.
It also goes without saying that if the variable is an object's member variable or a static variable and hence never really goes out of scope then setting it to null to GC is mandatory.

Garbage collection is not as magical as you might expect. As long as an object is referenced from any reachable object it simply can't be collected. So it might be absolutely necessary to null a reference in order to avoid memory leaks. I don't say you should do this always, but always when it's necessary.

As the others have mentioned, it's not usually necessary.
Not only that, but it clutters up your code and increases the data someone needs to read and understand when revisiting your code.

Assigning is not done to objects, it is done to variables, and it means that this variable then holds a reference to some object. Assigning NULL to a variable is not a way to destroy an object, it just clears one reference. If the variable you are clearing will leave its scope afterwards anyway, assigning NULL is just useless noise, because that happens on leaving scope in any case.

The one time I tend to use this practice is if I need to transform a large Collection in some early part of a method.
For example:
public void foo() {
List<? extends Trade> trades = loadTrades();
Map<Date, List<? extends Trade>> tradesByDate = groupTradesByDate(trades);
trades = null; // trades no longer required.
// Apply business logic to tradesByDate map.
}
Obviously I could reduce the need for this by refactoring this into another method: Map<Date, List<? extends Trade>>> loadTradesAndGroupByDate() so it really depends on circumstances / clarity of code.

I only assign a reference to null when:
The code really lies in a memory-critical part.
The reference has a wide scope (and must be reused later). If it is not the case I just declare it in the smallest possible code block. It will be available for collection automatically.
That means that I only use this technique in iterative process where I use the reference to store incoming huge collection of objects. After processing, I do not need the collection any more but I want to reuse the reference for the next collection.
In that case (and only in that case), I then call System.gc() to give a hint to the Garbage Collector. I monitored this technique through heap visualizer and it works very well for big collections (more then 500Mb of data).

When using the .Net I don't think there's a need to set the object to null. Just let the garbage collection happen.

- Do you always assign null to an object after its scope has been reached?
No
- Or do you rely on the JVM for garbage collection?
Yes
- Do you do it for all sort of applications regardless of their length?
Yes
- If so, is it always a good practice?
N/A

I assume you're asking this question because you've seen code with variables being assigned to null at the point where they will never be accessed again.
I dislike this style, but another programmer used it extensively, and said he was taught to do so at a programming course at his university. The reasoning he gave is that it would prevent undetectable bugs if he tried to reuse the variable later on, instead of indeterminate behavior, he'd get a null pointer exception.
So if you're prone to using variables where you shouldn't be using variables, it might make your code more easy to debug.

There was a class of memory leak bugs that happened regardless of whether I set the reference to null - if the library I was using was written in a language like C without memory management, then simply setting the object to null would not necessarily free the memory. We had to call the object's close() method to release the memory (which, of course, we couldn't do after setting it to null.)
It thus seems to me that the de facto method of memory management in java is to rely on the garbage collector unless the object/library you're using has a close() method (or something similar.)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.