Is repeatedly instantiating an anonymous class wasteful? - java

I had a remark about a piece of code in the style of:
Iterable<String> upperCaseNames = Iterables.transform(
lowerCaseNames, new Function<String, String>() {
public String apply(String input) {
return input.toUpperCase();
}
});
The person said that every time I go through this code, I instantiate this anonymous Function class, and that I should rather have a single instance in, say, a static variable:
static Function<String, String> toUpperCaseFn =
new Function<String, String>() {
public String apply(String input) {
return input.toUpperCase();
}
};
...
Iterable<String> upperCaseNames =
Iterables.transform(lowerCaseNames, toUpperCaseFn);
On a very superficial level, this somehow makes sense; instantiating a class multiple times has to waste memory or something, right?
On the other hand, people instantiate anonymous classes in middle of the code like there's no tomorrow, and it would be trivial for the compiler to optimize this away.
Is this a valid concern?

Fun fact about Hot Spot JVM optimizations, if you instantiate an object that isn't passed outside of the current method, the JVM will perform optimizations at the bytecode level.
Usually, stack allocation is associated with languages that expose the memory model, like C++. You don't have to delete stack variables in C++ because they're automatically deallocated when the scope is exited. This is contrary to heap allocation, which requires you to delete the pointer when you're done with it.
In the Hot Spot JVM, the bytecode is analyzed to decide if an object can "escape" the thread. There are three levels of escape:
No escape - the object is only used within the method/scope it is created, and the object can't be accessed outside the thread.
Local/Arg escape - the object is returned by the method that creates it or passed to a method that it calls, but none of those methods will put that object somewhere that it can be accessed outside of the thread.
Global escape - the object is put somewhere that it can be accessed in another thread.
This basically is analogous to the questions, 1) do I pass it to another method or return it, and 2) do I associate it with something attached to a GC root like a ClassLoader or something stored in a static field?
In your particular case, the anonymous object will be tagged as "local escape", which only means that any locks (read: use of synchronized) on the object will be optimized away. (Why synchronize on something that won't ever be used in another thread?) This is different from "no escape", which will do allocation on the stack. It's important to note that this "allocation" isn't the same as heap allocation. What it really does is allocates space on the stack for all the variables inside the non-escaping object. If you have 3 fields, int, String, and MyObject inside the no-escape object, then three stack variables will be allocated: an int, a String reference, and a MyObject reference – the MyObject instance itself will still be stored in heap unless it is also analyzed to have "no escape". The object allocation is then optimized away and constructors/methods will run using the local stack variables instead of heap variables.
That being said, it sounds like premature optimization to me. Unless the code is later proven to be slow and is causing performance problems, you shouldn't do anything to reduce its readability. To me, this code is pretty readable, I'd leave it alone. This is totally subjective, of course, but "performance" is not a good reason to change code unless it has something to do with its actual running time. Usually, premature optimization results in code that's harder to maintain with minimal performance benefits.
Java 8+ and Lambdas
If allocating anonymous instances still bothers you, I recommend switching to using Lambdas for single abstract method (SAM) types. Lambda evaluation is performed using invokedynamic, and the implementation ends up creating only a single instance of a Lambda on the first invocation. More details can be found in my answer here and this answer here. For non-SAM types, you will still need to allocate an anonymous instance. The performance impact here will be negligible in most use cases, but IMO, it's more readable this way.
References
Escape analysis (wikipedia.org)
HotSpot escape analysis 14 | 11 | 8 (oracle.com)
What is a 'SAM type' in Java? (stackoverflow.com)
Why are Java 8 lambdas invoked using invokedynamic? (stackoverflow.com)

Short answer: No - don't worry.
Long answer: it depends how frequently you're instantiating it. If in a frequently-called tight loop, maybe - though note that when the function is applied it calls String.toUpperCase() once for every item in an Iterable - each call presumably creates a new String, which will create far more GC churn.
"Premature optimization is the root of all evil" - Knuth

Found this thread: Java anonymous class efficiency implications , you may find it interesting
Did some micro-benchmarking. The micro-benchmark was a comparison between: instantiating an (static inner) class per loop iteration, instantiating a (static inner) class once and using it in the loop, and the two similar ones but with anonymous classes. For the micro benchmarking the compiler seemed to extract the anonymous class out of loops and as predicted, promoted the anonymous class to an inner class of the caller. This meant all four methods were indistinguishable in speed. I also compared it to an outside class and again, same speed. The one with anonymous classes probably took ~128 bits of space more
You can check out my micro-benchmark at http://jdmaguire.ca/Code/Comparing.java & http://jdmaguire.ca/Code/OutsideComp.java. I ran this on various values for wordLen, sortTimes, and listLen. As well, the JVM is slow to warm-up so I shuffled the method calls around. Please don't judge me for the awful non-commented code. I program better than that in RL. And Microbenching marking is almost as evil and useless as premature optimization.

Related

Java method param: cached result vs direct call

I had hard time describing what I mean exactly in the topic so let's stick to the example.
Given a hypothetical set of methods
Object firstObject();
Object secondObject();
void myMethod(Object o1, Object o2);
Is there a any difference between the following pieces of code, other than style?
// Code 1.
myMethod(firstObject(), secondObject());
// Code 2.
Object o1 = firstObject();
Object o2 = secondObject();
myMethod(o1,o2);
I am asking mostly due to the fact that I came across some really long lines of code due to Code 1. type of style and I am wondering if there are really any benefits of having it written like that.
If you are asking if there is any overhead to refactoring a method call result to a variable then answer is no.
One significant pro for option 1 is better readability and ability to easily use a debugger.
Here's an example of the same question in a bit more in depth manner: Does chaining methods vs making temporary variables in Java impact memory allocation?
Important part is:
To answer your question, this will make no difference in terms of object allocation and therefore GC impact.

Creating a variable instead of multiple getter usage - which is better for overall performance? [duplicate]

In the following piece of code we make a call listType.getDescription() twice:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
if (listType.getDescription() != null)
{
children.add(new SelectItem( listType.getId() , listType.getDescription()));
}
}
I would tend to refactor the code to use a single variable:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
String description = listType.getDescription();
if (description != null)
{
children.add(new SelectItem(listType.getId() ,description));
}
}
My understanding is the JVM is somehow optimized for the original code and especially nesting calls like children.add(new SelectItem(listType.getId(), listType.getDescription()));.
Comparing the two options, which one is the preferred method and why? That is in terms of memory footprint, performance, readability/ease, and others that don't come to my mind right now.
When does the latter code snippet become more advantageous over the former, that is, is there any (approximate) number of listType.getDescription() calls when using a temp local variable becomes more desirable, as listType.getDescription() always requires some stack operations to store the this object?
I'd nearly always prefer the local variable solution.
Memory footprint
A single local variable costs 4 or 8 bytes. It's a reference and there's no recursion, so let's ignore it.
Performance
If this is a simple getter, the JVM can memoize it itself, so there's no difference. If it's a expensive call which can't be optimized, memoizing manually makes it faster.
Readability
Follow the DRY principle. In your case it hardly matters as the local variable name is character-wise as about as long as the method call, but for anything more complicated, it's readability as you don't have to find the 10 differences between the two expressions. If you know they're the same, so make it clear using the local variable.
Correctness
Imagine your SelectItem does not accept nulls and your program is multithreaded. The value of listType.getDescription() can change in the meantime and you're toasted.
Debugging
Having a local variable containing an interesting value is an advantage.
The only thing to win by omitting the local variable is saving one line. So I'd do it only in cases when it really doesn't matter:
very short expression
no possible concurrent modification
simple private final getter
I think the way number two is definitely better because it improves readability and maintainability of your code which is the most important thing here. This kind of micro-optimization won't really help you in anything unless you writing an application where every millisecond is important.
I'm not sure either is preferred. What I would prefer is clearly readable code over performant code, especially when that performance gain is negligible. In this case I suspect there's next to no noticeable difference (especially given the JVM's optimisations and code-rewriting capabilities)
In the context of imperative languages, the value returned by a function call cannot be memoized (See http://en.m.wikipedia.org/wiki/Memoization) because there is no guarantee that the function has no side effect. Accordingly, your strategy does indeed avoid a function call at the expense of allocating a temporary variable to store a reference to the value returned by the function call.
In addition to being slightly more efficient (which does not really matter unless the function is called many times in a loop), I would opt for your style due to better code readability.
I agree on everything. About the readability I'd like to add something:
I see lots of programmers doing things like:
if (item.getFirst().getSecond().getThird().getForth() == 1 ||
item.getFirst().getSecond().getThird().getForth() == 2 ||
item.getFirst().getSecond().getThird().getForth() == 3)
Or even worse:
item.getFirst().getSecond().getThird().setForth(item2.getFirst().getSecond().getThird().getForth())
If you are calling the same chain of 10 getters several times, please, use an intermediate variable. It's just much easier to read and debug
I would agree with the local variable approach for readability only if the local variable's name is self-documenting. Calling it "description" wouldn't be enough (which description?). Calling it "selectableListTypeDescription" would make it clear. I would throw in that the incremented variable in the for loop should be named "selectableListType" (especially if the "listTypeManager" has accessors for other ListTypes).
The other reason would be if there's no guarantee this is single-threaded or your list is immutable.

How do GC work in middle of scope

I have code like:
public void foo()
{
Object x = new LongObject();
doSomething(x);
//More Code
// x is never used again
// x = null helps GB??
Object x2 = new LongObject();
doSomething(x2);
}
I would like that memory alocated by x could be free by GC if it's needed. But I don't know if set to null is necesary or compiler do it.
In point of fact, the JIT does liveness analysis on references (which at bytecode level are stored as slots in the current frame). If a reference is never again read from, its slot can be reused, and the JIT will know that. It is completely possible for an object to be garbage collected while a variable that refers to it is still in lexical scope, so long as the compiler and JIT are able to prove that the variable will never again be dereferenced.
The point is: scope is a construct of the language, and specifies what a name like x means at any point in the text of the program code that it occurs. Lifetime is a property of objects, and the JIT and GC manage that -- often in non-obvious ways.
Remember that the JIT can recompile your code while it's running, and will optimize your code as it sees what happens when it executes. Unless you're really certain you know what you're doing, don't try to outsmart the JIT. Write code that is correct and let the JIT do its job, and only worry about it if you have evidence that the JIT hasn't done its job well enough.
To answer your questions literally, the compiler (speaking of source code to bytecode compiler) never inserts null assignments, but still, assigning a variable to null is not necessary—usually.
As this answer explains, scope is a compile time thing and formally, an object is eligible to garbage collection, if it can not “be accessed in any potential continuing computation from any live thread”. But there is no guaranty about which eligible object will be identified by a particular implementation. As the linked answer also explains, JIT compiled code will only keep references to objects which will be subsequently accessed. This may go even further than you expect, allow garbage collection of objects that look like being in use in the source code, as runtime optimization may transform the code and reduce actual memory accesses.
But in interpreted mode, the analysis will not go so far and there might be object references in the current stack frame preventing the collection of the referent, despite the variable is not being used afterwards or even out of scope in the source code. There is no guaranty that switching from interpreted to compiled code while the method is executed is capable of getting rid of such a dangling references. It’s even unlikely that the hotspot optimizer considers compiling foo() when the actual heavy computation happens within doSomething.
Still, this is rarely an issue. Running interpreted happens only during the initialization or first time execution and even if these objects are large, there’s rarely a problem if such an object gets collected a bit later than it could. An average application consists of millions of objects.
However, if you ever think there could be an issue, you can easily fix this, without assigning null to the variable. Limit the scope:
public void foo()
{
{
Object x = new LongObject();
doSomething(x);
//More Code
}
{
Object x2 = new LongObject();
doSomething(x2);
}
}
Other than assigning null, limiting the scope of variables to the actual use is improving the source code quality, even in cases where it has no impact on the compiled code. While the scope is purely a source code thing, it can have an impact on the bytecode though. In the code above, compilers will reuse the location of x within the stack frame to store x2, so no dangling reference to the first LongObject exists during the second doSomething execution.
As said, this is rarely needed for memory management and improving source code quality should be driving you decisions, not attempts to help the garbage collector.

Memory Effecient Object creation in Java

I have a basic doubt regarding object creation in java.Suppose i have two classes as follows
Class B{
public int value=100;
}
Class A{
public B getB(){
return new B();
}
public void accessValue(){
//accessing the value without storing object B
System.out.println("value is :"+getB().value);
//accessing the value by storing object B in variable b
B b=getB();
System.out.println("value is :"+b.value);
}
}
My question is,does storing the object and accessing the value make any difference in terms of memory or both are same?
They are both equivalent, since you are instantiating B both times. The first way is just a shorter version of the second.
Following piece of code is using an anonymous object. which can't be reused later in code.
//accessing the value without storing object B
System.out.println("value is :"+getB().value);
Below code uses the object by assigning it to a reference.
//accessing the value by storing object B in variable b
B b=getB();
System.out.println("value is :"+b.value);
Memory and performance wise it's NOT much difference except that in later version stack frame has an extra pointer.
It is the same. This way: B b=getB(); just keeps your code more readable. Keep in mind, that object must be stored somewhere in memory anyway.
If you never reuse the B-object after this part, the first option with an anonymous object is probably neater:
the second option would need an additional store/load command (as Hot Licks mentioned) if it isn't optimized by the compiler
possibly first storing the object in a variable creates slight overhead for the garbage collector as opposed to an anonymous object, but that's more of a "look into that" than a definitive statement of me
If you do want to access a B a second time, storing one in its own variable is faster.
EDIT: ah, both points already mentioned above while I was typing.
You will not be able to say the difference without looking at the generated machine code. It could be that the JIT puts the local variable "b" onto the stack. More likely however that the JIT will optimize b away. Depends on the JRE and JIT you are using. In any case, the difference is minor and only significant in extremely special cases.
Actually there is no difference in the second instance you are just giving the new object reference to b.
So code wise you cannot achieve the println if you use version 1, as you dont have any reference as you have in the second case unless you keep creating new object for every method call.
In that case the difference, if any, would not be worth mentioning. In the second case an extra bytecode or two would probably be generated, if the compiler didn't optimize them away, but any decent JIT would almost certainly optimize the two cases to the identical machine code.
And, in any event, the cost of an extra store/load would be inconsequential for 99.9% of applications (and swamped in this case by the new operation).
Details: If you look at the bytecodes, in the first case the getB method is called and returns a value on the top of the stack. Then a getfield references value and places that on the top of the stack. Then a StringBuilder append is done to begin building the println parameter list.
In the second case there is an extra astore and aload (pointer store/load) after the getB invocation, and the setup for the StringBuilder is stuck between the two, so that side-effects occur in the order specified in the source. If there were no side-effects to worry about the compiler might have chosen to do the very slightly more efficient dupe and astore sequence. In any case, a decent JIT would recognize that b is never used again and optimize away the store.

Does using final for variables in Java improve garbage collection?

Today my colleagues and me have a discussion about the usage of the final keyword in Java to improve the garbage collection.
For example, if you write a method like:
public Double doCalc(final Double value)
{
final Double maxWeight = 1000.0;
final Double totalWeight = maxWeight * value;
return totalWeight;
}
Declaring the variables in the method final would help the garbage collection to clean up the memory from the unused variables in the method after the method exits.
Is this true?
Here's a slightly different example, one with final reference-type fields rather than final value-type local variables:
public class MyClass {
public final MyOtherObject obj;
}
Every time you create an instance of MyClass, you'll be creating an outgoing reference to a MyOtherObject instance, and the GC will have to follow that link to look for live objects.
The JVM uses a mark-sweep GC algorithm, which has to examine all the live refereces in the GC "root" locations (like all the objects in the current call stack). Each live object is "marked" as being alive, and any object referred to by a live object is also marked as being alive.
After the completion of the mark phase, the GC sweeps through the heap, freeing memory for all unmarked objects (and compacting the memory for the remaining live objects).
Also, it's important to recognize that the Java heap memory is partitioned into a "young generation" and an "old generation". All objects are initially allocated in the young generation (sometimes referred to as "the nursery"). Since most objects are short-lived, the GC is more aggressive about freeing recent garbage from the young generation. If an object survives a collection cycle of the young generation, it gets moved into the old generation (sometimes referred to as the "tenured generation"), which is processed less frequently.
So, off the top of my head, I'm going to say "no, the 'final' modifer doesn't help the GC reduce its workload".
In my opinion, the best strategy for optimizing your memory-management in Java is to eliminate spurious references as quickly as possible. You could do that by assigning "null" to an object reference as soon as you're done using it.
Or, better yet, minimize the size of each declaration scope. For example, if you declare an object at the beginning of a 1000-line method, and if the object stays alive until the close of that method's scope (the last closing curly brace), then the object might stay alive for much longer that actually necessary.
If you use small methods, with only a dozen or so lines of code, then the objects declared within that method will fall out of scope more quickly, and the GC will be able to do most of its work within the much-more-efficient young generation. You don't want objects being moved into the older generation unless absolutely necessary.
Declaring a local variable final will not affect garbage collection, it only means you can not modify the variable. Your example above should not compile as you are modifying the variable totalWeight which has been marked final. On the other hand, declaring a primitive (double instead of Double) final will allows that variable to be inlined into the calling code, so that could cause some memory and performance improvement. This is used when you have a number of public static final Strings in a class.
In general, the compiler and runtime will optimize where it can. It is best to write the code appropriately and not try to be too tricky. Use final when you do not want the variable to be modified. Assume that any easy optimizations will be performed by the compiler, and if you are worried about performance or memory use, use a profiler to determine the real problem.
No, it is emphatically not true.
Remember that final does not mean constant, it just means you can't change the reference.
final MyObject o = new MyObject();
o.setValue("foo"); // Works just fine
o = new MyObject(); // Doesn't work.
There may be some small optimisation based around the knowledge that the JVM will never have to modify the reference (such as not having check to see if it has changed) but it would be so minor as to not worry about.
Final should be thought of as useful meta-data to the developer and not as a compiler optimisation.
Some points to clear up:
Nulling out reference should not help GC. If it did, it would indicate that your variables are over scoped. One exception is the case of object nepotism.
There is no on-stack allocation as of yet in Java.
Declaring a variable final means you can't (under normal conditions) assign a new value to that variable. Since final says nothing about scope, it doesn't say anything about it's effect on GC.
Well, I don't know about the use of the "final" modifier in this case, or its effect on the GC.
But I can tell you this: your use of Boxed values rather than primitives (e.g., Double instead of double) will allocate those objects on the heap rather than the stack, and will produce unnecessary garbage that the GC will have to clean up.
I only use boxed primitives when required by an existing API, or when I need nullable primatives.
Final variables cannot be changed after initial assignment (enforced by the compiler).
This does not change the behaviour of the garbage collection as such. Only thing is that these variables cannot be nulled when not being used any more (which may help the garbage collection in memory tight situations).
You should know that final allows the compiler to make assumptions about what to optimize. Inlining code and not including code known not to be reachable.
final boolean debug = false;
......
if (debug) {
System.out.println("DEBUG INFO!");
}
The println will not be included in the byte code.
There is a not so well known corner case with generational garbage collectors. (For a brief description read the answer by benjismith for a deeper insight read the articles at the end).
The idea in generational GCs is that most of the time only young generations need to be considered. The root location is scanned for references, and then the young generation objects are scanned. During this more frequent sweeps no object in the old generation are checked.
Now, the problem comes from the fact that an object is not allowed to have references to younger objects. When a long lived (old generation) object gets a reference to a new object, that reference must be explicitly tracked by the garbage collector (see article from IBM on the hotspot JVM collector), actually affecting the GC performance.
The reason why an old object cannot refer to a younger one is that, as the old object is not checked in minor collections, if the only reference to the object is kept in the old object, it will not get marked, and would be wrongly deallocated during the sweep stage.
Of course, as pointed by many, the final keyword does not reallly affect the garbage collector, but it does guarantee that the reference will never be changed into a younger object if this object survives the minor collections and makes it to the older heap.
Articles:
IBM on garbage collection: history, in the hotspot JVM and performance. These may no longer be fully valid, as it dates back in 2003/04, but they give some easy to read insight into GCs.
Sun on Tuning garbage collection
GC acts on unreachable refs. This has nothing to do with "final", which is merely an assertion of one-time assignment. Is it possible that some VM's GC can make use of "final"? I don't see how or why.
final on local variables and parameters makes no difference to the class files produced, so cannot affect runtime performance. If a class has no subclasses, HotSpot treats that class as if it is final anyway (it can undo later if a class that breaks that assumption is loaded). I believe final on methods is much the same as classes. final on static field may allow the variable to be interpreted as a "compile-time constant" and optimisation to be done by javac on that basis. final on fields allows the JVM some freedom to ignore happens-before relations.
There seems to be a lot of answers that are wandering conjectures. The truth is, there is no final modifier for local variables at the bytecode level. The virtual machine will never know that your local variables were defined as final or not.
The answer to your question is an emphatic no.
All method and variable can be overridden bydefault in subclasses.If we want to save the subclasses from overridig the members of superclass,we can declare them as final using the keyword final.
For e.g-
final int a=10;
final void display(){......}
Making a method final ensures that the functionality defined in the superclass will never be changed anyway. Similarly the value of a final variable can never be changed. Final variables behaves like class variables.
Strictly speaking about instance fields, final might improve performance slightly if a particular GC wants to exploit that. When a concurrent GC happens (that means that your application is still running, while GC is in progress), see this for a broader explanation, GCs have to employ certain barriers when writes and/or reads are done. The link I gave you pretty much explains that, but to make it really short: when a GC does some concurrent work, all read and writes to the heap (while that GC is in progress), are "intercepted" and applied later in time; so that the concurrent GC phase can finish it's work.
For final instance fields, since they can not be modified (unless reflection), these barriers can be omitted. And this is not just pure theory.
Shenandoah GC has them in practice (though not for long), and you can do, for example:
-XX:+UnlockExperimentalVMOptions
-XX:+UseShenandoahGC
-XX:+ShenandoahOptimizeInstanceFinals
And there will be optimizations in the GC algorithm that will make it slightly faster. This is because there will be no barriers intercepting final, since no one should modify them, ever. Not even via reflection or JNI.
The only thing that I can think of is that the compiler might optimize away the final variables and inline them as constants into the code, thus you end up with no memory allocated.
absolutely, as long as make object's life shorter which yield great benefit of memory management, recently we examined export functionality having instance variables on one test and another test having method level local variable. during load testing, JVM throws outofmemoryerror on first test and JVM got halted. but in second test, successfully able to get the report due to better memory management.
The only time I prefer declaring local variables as final is when:
I have to make them final so that they can be shared with some anonymous class (for example: creating daemon thread and let it access some value from enclosing method)
I want to make them final (for example: some value that shouldn't/doesn't get overridden by mistake)
Does they help in fast garbage collection?
AFAIK a object becomes a candidate of GC collection if it has zero strong references to it and in that case as well there is no guarantee that they will be immediately garbage collected . In general, a strong reference is said to die when it goes out of scope or user explicitly reassign it to null reference, thus, declaring them final means that reference will continue to exists till the method exists (unless its scope is explicitly narrowed down to a specific inner block {}) because you can't reassign final variables (i.e. can't reassign to null). So I think w.r.t Garbage Collection 'final' may introduce a unwanted possible delay so one must be little careful in defining there scope as that controls when they will become candidate for GC.

Categories