I have a basic question about object creation in Java. Suppose I have two classes as follows:
class B {
    public int value = 100;
}
class A {
    public B getB() {
        return new B();
    }
    public void accessValue() {
        // accessing the value without storing object B
        System.out.println("value is :" + getB().value);
        // accessing the value by storing object B in variable b
        B b = getB();
        System.out.println("value is :" + b.value);
    }
}
My question is: does storing the object before accessing the value make any difference in terms of memory, or are both the same?
They are both equivalent, since you are instantiating B both times. The first way is just a shorter version of the second.
The following piece of code uses an anonymous object, which can't be reused later in the code.
//accessing the value without storing object B
System.out.println("value is :"+getB().value);
The code below uses the object by assigning it to a reference.
//accessing the value by storing object B in variable b
B b=getB();
System.out.println("value is :"+b.value);
Memory- and performance-wise there is NOT much difference, except that in the latter version the stack frame holds an extra pointer.
It is the same. This way: B b=getB(); just keeps your code more readable. Keep in mind, that object must be stored somewhere in memory anyway.
If you never reuse the B-object after this part, the first option with an anonymous object is probably neater:
The second option would need an additional store/load instruction (as Hot Licks mentioned) if it isn't optimized away by the compiler.
Storing the object in a variable first possibly creates slight overhead for the garbage collector compared with an anonymous object, but that's more of a "look into that" than a definitive statement from me.
If you do want to access a B a second time, storing one in its own variable is faster.
EDIT: ah, both points already mentioned above while I was typing.
You will not be able to say the difference without looking at the generated machine code. It could be that the JIT puts the local variable "b" onto the stack. More likely however that the JIT will optimize b away. Depends on the JRE and JIT you are using. In any case, the difference is minor and only significant in extremely special cases.
Actually there is no difference; in the second case you are just assigning the new object's reference to b.
Code-wise, with version 1 you cannot use the same object again after the println, because you hold no reference to it as you do in the second case; you would have to create a new object with every method call.
In that case the difference, if any, would not be worth mentioning. In the second case an extra bytecode or two would probably be generated, if the compiler didn't optimize them away, but any decent JIT would almost certainly optimize the two cases to the identical machine code.
And, in any event, the cost of an extra store/load would be inconsequential for 99.9% of applications (and swamped in this case by the new operation).
Details: If you look at the bytecodes, in the first case the getB method is called and returns a value on the top of the stack. Then a getfield references value and places that on the top of the stack. Then a StringBuilder append is done to begin building the println parameter list.
In the second case there is an extra astore and aload (pointer store/load) after the getB invocation, and the setup for the StringBuilder is stuck between the two, so that side-effects occur in the order specified in the source. If there were no side-effects to worry about the compiler might have chosen to do the very slightly more efficient dupe and astore sequence. In any case, a decent JIT would recognize that b is never used again and optimize away the store.
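For anyone who wants to inspect the bytecode described above, here is a minimal sketch of the two forms. The class name AccessDemo and its helper methods are hypothetical, not from the question. Compiling it and running javap -c AccessDemo should show the extra store/load pair in the second method, though the exact instructions depend on the compiler version (newer compilers may not use StringBuilder for concatenation at all).

```java
class B {
    public int value = 100;
}

class AccessDemo {
    static B getB() {
        return new B();
    }

    // Form 1: use the returned object directly, without naming it.
    static String withoutVariable() {
        return "value is :" + getB().value;
    }

    // Form 2: store the reference in a local variable first.
    static String withVariable() {
        B b = getB();
        return "value is :" + b.value;
    }

    public static void main(String[] args) {
        // Both calls allocate exactly one B and read the same field.
        System.out.println(withoutVariable());
        System.out.println(withVariable());
    }
}
```

Either way one B is allocated per call; the variable only gives the reference a name.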
Related
In Java, does it cost memory to declare a class level instance variable without initializing it?
For example: Does int i; use any memory if I don't initialize it with i = 5;?
Details:
I have a huge super-class that many different (not different enough to have their own super classes) sub-classes extend. Some sub-classes don't use every single primitive declared by the super-class. Can I simply keep such primitives as uninitialized and only initialize them in necessary sub-classes to save memory?
All members defined in your classes have default values, even if you don't initialize them explicitly, so they do use memory.
For example, every int will be initialized by default to 0, and will occupy 4 bytes.
For class members :
int i;
is the same as :
int i = 0;
Here's what the JLS says about instance variables :
If a class T has a field a that is an instance variable, then a new instance variable a is created and initialized to a default value (§4.12.5) as part of each newly created object of class T or of any class that is a subclass of T (§8.1.4). The instance variable effectively ceases to exist when the object of which it is a field is no longer referenced, after any necessary finalization of the object (§12.6) has been completed.
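A small sketch of the default-value rule quoted above. The class name Defaults and its fields are made up for illustration; none of the fields is ever assigned, yet each one exists in every instance and holds its default.

```java
class Defaults {
    int i;          // never assigned: defaults to 0
    long l;         // defaults to 0L
    boolean flag;   // defaults to false
    String s;       // reference field: defaults to null

    public static void main(String[] args) {
        Defaults d = new Defaults();
        // Reading the fields is legal because they were default-initialized.
        System.out.println(d.i + " " + d.l + " " + d.flag + " " + d.s);
    }
}
```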
Yes, memory is allocated even though you are not assigning any value to it.
int i;
That takes 32 bits of memory, whether you use it or not.
Some sub-classes don't use every single primitive declared by the super-Class. Can I simply keep such primitives as uninitialized and only initialize them in necessary sub-classes to save memory?
Again, no matter where you initialize it, the memory is allocated.
The only thing you need to do is find the unused primitives and remove them.
Edit:
One more point: unlike primitives, references default to null, and the reference itself still takes up memory:
4 bytes on a 32-bit JVM
8 bytes on a 64-bit JVM
The original question talks about class level variables and the answer is that they do use space, but it's interesting to look at method scoped ones too.
Let's take a small example:
public class MemTest {
public void doSomething() {
long i = 0; // Line 3
if(System.currentTimeMillis() > 0) {
i = System.currentTimeMillis();
System.out.println(i);
}
System.out.println(i);
}
}
If we look at the bytecode generated:
L0
LINENUMBER 3 L0
LCONST_0
LSTORE 1
Ok, as expected we assign a value at line 3 in the code, now if we change line 3 to (and remove the second println due to a compiler error):
long i; // Line 3
... and check the bytecode then nothing is generated for line 3. So, the answer is that no memory is used at this point. In fact, the LSTORE occurs only on line 5 when we assign to the variable. So, declaring an unassigned method variable does not use any memory and, in fact, doesn't generate any bytecode. It's equivalent to making the declaration where you first assign to it.
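The point about method-scoped variables can be tried with a sketch like the following. LocalDemo and firstAssignedLater are hypothetical names. The declaration on its own stores nothing; the compiler only requires that every path assigns the variable before it is read.

```java
class LocalDemo {
    static long firstAssignedLater(boolean condition) {
        long i;               // declaration only: no value is stored here
        if (condition) {
            i = 42L;          // first store on this path
        } else {
            i = -1L;          // every path must assign before i is read
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(firstAssignedLater(true));
        System.out.println(firstAssignedLater(false));
    }
}
```

Removing the else branch makes the return statement a compile error, which is the same error that forced the second println to be removed in the example above.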
Yes. Class-level variables are assigned their default values even if you don't initialize them.
In this case your int variables will be set to 0 and will occupy 4 bytes each.
Neither the Java Language Specification nor the Java Virtual Machine Specification specifies the answer to this because it's an implementation detail. In fact, JVMS §2.7 specifically says:
Representation of Objects
The Java Virtual Machine does not mandate any particular internal structure for objects.
In theory, a conformant virtual machine could implement objects which have a lot of fields using set of bit flags to mark which fields have been set to non-default values. Initially no fields would be allocated, the flag bits would be all 0, and the object would be small. When a field is first set, the corresponding flag bit would be set to 1 and the object would be resized to make space for it. [The garbage collector already provides the necessary machinery for momentarily pausing running code in order to relocate live objects around the heap, which would be necessary for resizing them.]
In practice, this is not a good idea because even if it saves memory it is complicated and slow. Access to fields would require temporarily locking the object to prevent corruption due to multithreading; then reading the current flag bits; and if the field exists then counting the set bits to calculate the current offset of the wanted field relative to the base of the object; then reading the field; and finally unlocking the object.
So, no general-purpose Java virtual machine does anything like this. Some objects with an exorbitant number of fields might benefit from it, but even they couldn't rely on it, because they might need to run on the common virtual machines which don't do that.
A flat layout which allocates space for all fields when an object is first instantiated is simple and fast, so that is the standard. Programmers assume that objects are allocated that way and thus design their programs accordingly to best take advantage of it. Likewise, virtual machine designers optimize to make that usage fast.
Ultimately the flat layout of fields is a convention, not a rule, although you can rely on it anyway.
In Java, when you declare a class attribute such as String str;, you are declaring a reference to an object, but it does not point to any object until you assign a value to it with str = value;. But as you may guess, the reference itself consumes some memory even when it doesn't point to anything.
I had a remark about a piece of code in the style of:
Iterable<String> upperCaseNames = Iterables.transform(
lowerCaseNames, new Function<String, String>() {
public String apply(String input) {
return input.toUpperCase();
}
});
The person said that every time I go through this code, I instantiate this anonymous Function class, and that I should rather have a single instance in, say, a static variable:
static Function<String, String> toUpperCaseFn =
new Function<String, String>() {
public String apply(String input) {
return input.toUpperCase();
}
};
...
Iterable<String> upperCaseNames =
Iterables.transform(lowerCaseNames, toUpperCaseFn);
On a very superficial level, this somehow makes sense; instantiating a class multiple times has to waste memory or something, right?
On the other hand, people instantiate anonymous classes in middle of the code like there's no tomorrow, and it would be trivial for the compiler to optimize this away.
Is this a valid concern?
Fun fact about Hot Spot JVM optimizations, if you instantiate an object that isn't passed outside of the current method, the JVM will perform optimizations at the bytecode level.
Usually, stack allocation is associated with languages that expose the memory model, like C++. You don't have to delete stack variables in C++ because they're automatically deallocated when the scope is exited. This is contrary to heap allocation, which requires you to delete the pointer when you're done with it.
In the Hot Spot JVM, the bytecode is analyzed to decide if an object can "escape" the thread. There are three levels of escape:
No escape - the object is only used within the method/scope it is created, and the object can't be accessed outside the thread.
Local/Arg escape - the object is returned by the method that creates it or passed to a method that it calls, but none of those methods will put that object somewhere that it can be accessed outside of the thread.
Global escape - the object is put somewhere that it can be accessed in another thread.
This basically is analogous to the questions, 1) do I pass it to another method or return it, and 2) do I associate it with something attached to a GC root like a ClassLoader or something stored in a static field?
In your particular case, the anonymous object will be tagged as "local escape", which only means that any locks (read: use of synchronized) on the object will be optimized away. (Why synchronize on something that won't ever be used in another thread?) This is different from "no escape", which will do allocation on the stack. It's important to note that this "allocation" isn't the same as heap allocation. What it really does is allocates space on the stack for all the variables inside the non-escaping object. If you have 3 fields, int, String, and MyObject inside the no-escape object, then three stack variables will be allocated: an int, a String reference, and a MyObject reference – the MyObject instance itself will still be stored in heap unless it is also analyzed to have "no escape". The object allocation is then optimized away and constructors/methods will run using the local stack variables instead of heap variables.
That being said, it sounds like premature optimization to me. Unless the code is later proven to be slow and is causing performance problems, you shouldn't do anything to reduce its readability. To me, this code is pretty readable, I'd leave it alone. This is totally subjective, of course, but "performance" is not a good reason to change code unless it has something to do with its actual running time. Usually, premature optimization results in code that's harder to maintain with minimal performance benefits.
Java 8+ and Lambdas
If allocating anonymous instances still bothers you, I recommend switching to using Lambdas for single abstract method (SAM) types. Lambda evaluation is performed using invokedynamic, and the implementation ends up creating only a single instance of a Lambda on the first invocation. More details can be found in my answer here and this answer here. For non-SAM types, you will still need to allocate an anonymous instance. The performance impact here will be negligible in most use cases, but IMO, it's more readable this way.
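As a rough illustration of the lambda suggestion, here is a sketch that uses java.util.function.Function and the Stream API in place of Guava's Iterables.transform. That substitution is an assumption made only to keep the example free of dependencies, and the names UpperCaseDemo and TO_UPPER are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class UpperCaseDemo {
    // A lambda for a single-abstract-method type, held in a constant.
    static final Function<String, String> TO_UPPER = input -> input.toUpperCase();

    // Applies the function to every element and collects the results.
    static List<String> upperCase(List<String> lowerCaseNames) {
        return lowerCaseNames.stream().map(TO_UPPER).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperCase(Arrays.asList("alice", "bob")));
    }
}
```

Writing the lambda inline at the call site reads just as well; the constant is shown only to mirror the static-field variant from the question.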
References
Escape analysis (wikipedia.org)
HotSpot escape analysis 14 | 11 | 8 (oracle.com)
What is a 'SAM type' in Java? (stackoverflow.com)
Why are Java 8 lambdas invoked using invokedynamic? (stackoverflow.com)
Short answer: No - don't worry.
Long answer: it depends how frequently you're instantiating it. If in a frequently-called tight loop, maybe - though note that when the function is applied it calls String.toUpperCase() once for every item in an Iterable - each call presumably creates a new String, which will create far more GC churn.
"Premature optimization is the root of all evil" - Knuth
Found this thread: Java anonymous class efficiency implications , you may find it interesting
Did some micro-benchmarking. The micro-benchmark was a comparison between: instantiating a (static inner) class per loop iteration, instantiating a (static inner) class once and using it in the loop, and the two similar ones but with anonymous classes. In the micro-benchmark the compiler seemed to extract the anonymous class out of the loops and, as predicted, promoted the anonymous class to an inner class of the caller. This meant all four methods were indistinguishable in speed. I also compared it to an outside class and again, same speed. The one with anonymous classes probably took ~128 bits of space more.
You can check out my micro-benchmark at http://jdmaguire.ca/Code/Comparing.java & http://jdmaguire.ca/Code/OutsideComp.java. I ran this on various values for wordLen, sortTimes, and listLen. As well, the JVM is slow to warm up so I shuffled the method calls around. Please don't judge me for the awful non-commented code. I program better than that in RL. And micro-benchmarking is almost as evil and useless as premature optimization.
Consider the following two calls to the same method in Java:
1) doSomething(new Object[]{"something"}) ;
2)
Object[] obj = {"something"} ;
doSomething(obj);
Which one is more efficient in terms of memory and time? I would say 1) is better in both. The reason is that the second option requires us to create another variable (extra memory) and then assign the value to that variable (extra time). Any comments?
Just to clarify: the object will be created only once. I am talking about the extra variable being used to hold the address of the newly created object.
Both are the same in terms of time and memory. The extra assignment can be optimized away by the compiler.
A difference is that the second version gives you an opportunity to give a useful name to your variable, which can make the code more clear.
The second call allows you to reuse the object in the calling method, but the first one does not.
It has no effect on memory, as the passed object is created anyway.
You should always consider what is simpler and clearer first. You should only consider performance when you know you have a problem because you measured it in a profiler or micro-benchmark.
The best option is likely to be to use varargs
doSomething("something");
void doSomething(String... args) { }
Note: not only is this the simplest, but it is also potentially the fastest, as the JIT can eliminate the String[] created.
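A runnable sketch of the varargs suggestion. VarargsDemo is a hypothetical class name, and doSomething here simply returns the number of arguments so there is something to observe. The compiler builds the String[] at each call site, so no explicit array creation or extra variable is needed.

```java
class VarargsDemo {
    // Varargs parameter: callers may pass zero or more strings, or an existing array.
    static int doSomething(String... args) {
        return args.length;
    }

    public static void main(String[] args) {
        System.out.println(doSomething("something"));             // one argument
        System.out.println(doSomething(new String[] {"a", "b"})); // explicit array still works
        System.out.println(doSomething());                        // empty array
    }
}
```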
For example, if you want to reverse a string, there are two ways.
First:
String a = "StackOverFlow";
a = new StringBuffer(a).reverse().toString();
and the second is:
String a = "StackOverFlow";
StringBuffer b = new StringBuffer(a);
a = b.reverse().toString();
About the code above, I have two questions:
1) In the first snippet, does Java create a "dummy object" StringBuffer in memory before doing the reverse and converting to String?
2) In the code above, is the first version better optimized than the second because it lets the GC work more effectively? (This is the main question I want to ask.)
Both snippets will create the same number of objects. The only difference is the number of local variables. This probably won't even change how many values are on the stack etc - it's just that in the case of the second version, there's a name for one of the stack slots (b).
It's very important that you differentiate between objects and variables. It's also important to write the most readable code you can first, rather than trying to micro-optimize. Once you've got clear, working code you should measure to see whether it's fast enough to meet your requirements. If it isn't, you should profile it to work out where you can make changes most effectively, and optimize that section, then remeasure, etc.
The first way will create a very real, not at all a "dummy object" for the StringBuffer.
Unless there are other references to b below the last line of your code, the optimizer has enough information to let the environment garbage-collect b as soon as it's done with toString
The fact that there is no variable for b does not make the object created by new less real. The compiler will probably optimize both snippets into identical bytecode, too.
StringBuffer b is not a dummy object; it is a reference, basically just a pointer that resides on the stack and is very small memory-wise. So not only does it make no difference in performance (GC has nothing to do with this example), but the Java compiler will probably remove it altogether (unless it's used in other places in the code).
In answer to your first question, yes, Java will create a StringBuffer object. It works pretty much the way you think it does.
To your second question, I'm pretty sure that the Java compiler will take care of that for you. The compiler is not without its faults but I think in a simple example like this it will optimize the byte code.
Just a tip though, in Java Strings are immutable. This means they cannot be changed. So when you assign a new value to a String Java will carve out a piece of memory, put the new String value in it, and redirect the variable to the new memory space. After that the garbage collector should come by and clear out the old string.
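The two snippets from the question can be put side by side in a short sketch. ReverseDemo and its method names are hypothetical. Both methods create one StringBuffer and one new String; the only difference is whether the buffer's reference has a name.

```java
class ReverseDemo {
    // Version 1: the StringBuffer is created and used without a named variable.
    static String reverseInline(String a) {
        return new StringBuffer(a).reverse().toString();
    }

    // Version 2: the same StringBuffer, with its reference stored in b.
    static String reverseWithVariable(String a) {
        StringBuffer b = new StringBuffer(a);
        return b.reverse().toString();
    }

    public static void main(String[] args) {
        String a = "StackOverFlow";
        System.out.println(reverseInline(a));
        System.out.println(reverseWithVariable(a));
        // The original string is untouched because Strings are immutable.
        System.out.println(a);
    }
}
```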
Is there any performance penalty for the following code snippet?
for (int i = 0; i < someValue; i++)
{
    Object o = someList.get(i);
    o.doSomething();
}
Or does this code actually make more sense?
Object o;
for (int i = 0; i < someValue; i++)
{
    o = someList.get(i);
    o.doSomething();
}
If in byte code these two are totally equivalent then obviously the first method looks better in terms of style, but I want to make sure this is the case.
In today's compilers, no. I declare objects in the smallest scope I can, because it's a lot more readable for the next guy.
To quote Knuth, who may be quoting Hoare:
Premature optimization is the root of all evil.
Whether the compiler will produce marginally faster code by defining the variable outside the loop is debatable, and I imagine it won't. I would guess it'll produce identical bytecode.
Compare this with the number of errors you'll likely prevent by correctly-scoping your variable using in-loop declaration...
There's no performance penalty for declaring the Object o within the loop.
The compiler generates very similar bytecode and makes the correct optimizations.
See the article Myth - Defining loop variables inside the loop is bad for performance for a similar example.
You can disassemble the code with javap -c and check what the compiler actually emits. On my setup (java 1.5/mac compiled with eclipse), the bytecode for the loop is identical.
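A sketch that can be used for that comparison follows. LoopScopeDemo and its two methods are hypothetical, and String.length() stands in for the question's doSomething() so the code compiles. After compiling, javap -c LoopScopeDemo prints the bytecode of both methods for a side-by-side check; whether they match exactly depends on the compiler used.

```java
import java.util.Arrays;
import java.util.List;

class LoopScopeDemo {
    // Variable declared inside the loop body.
    static int declaredInside(List<String> someList) {
        int total = 0;
        for (int i = 0; i < someList.size(); i++) {
            String o = someList.get(i);
            total += o.length();
        }
        return total;
    }

    // Variable declared once, outside the loop.
    static int declaredOutside(List<String> someList) {
        int total = 0;
        String o;
        for (int i = 0; i < someList.size(); i++) {
            o = someList.get(i);
            total += o.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> someList = Arrays.asList("ab", "cde", "f");
        System.out.println(declaredInside(someList));
        System.out.println(declaredOutside(someList));
    }
}
```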
The first code is better as it restricts scope of o variable to the for block. From a performance perspective, it might not have any effects in Java, but it might have in lower level compilers. They might put the variable in a register if you do the first.
In fact, some people might think that if the compiler is dumb, the second snippet is better in terms of performance. This is what an instructor told me at college, and I laughed at him for the suggestion! Basically, compilers allocate memory on the stack for the local variables of a method just once at the start of the method (by adjusting the stack pointer) and release it at the end of the method (again by adjusting the stack pointer, assuming it's not C++ or it doesn't have any destructors to be called). So all stack-based local variables in a method are allocated at once, no matter where they are declared and how much memory they require. Actually, if the compiler is dumb, there is no difference in terms of performance, but if it's smart enough, the first code can actually be better as it'll help the compiler understand the scope and the lifetime of the variable! By the way, if it's really smart, there should be absolutely no difference in performance as it infers the actual scope.
Construction of a object using new is totally different from just declaring it, of course.
I think readability is more important than performance, and from a readability standpoint the first code is definitely better.
I've got to admit I don't know Java. But are these two equivalent? Are the object lifetimes the same? In the first example, I assume (not knowing Java) that o will be eligible for garbage collection as soon as the loop terminates.
But in the second example surely o won't be eligible for garbage collection until the outer scope (not shown) is exited?
Don't prematurely optimize. Better than either of these is:
for(Object o : someList) {
o.doSomething();
}
because it eliminates boilerplate and clarifies intent.
Unless you are working on embedded systems, in which case all bets are off. Otherwise, don't try to outsmart the JVM.
I've always thought that most compilers these days are smart enough to do the latter option. Assuming that's the case, I would say the first one does look nicer as well. If the loop gets very large, there's no need to look all around for where o is declared.
These have different semantics. Which is more meaningful?
Reusing an object for "performance reasons" is often wrong.
The question is what does the object "mean"? Why are you creating it? What does it represent? Objects must parallel real-world things. Things are created, undergo state changes, and report their states for reasons.
What are those reasons? How does your object model and reflect those reasons?
To get at the heart of this question... [Note that non-JVM implementations may do things differently if allowed by the JLS...]
First, keep in mind that the local variable "o" in the example is a pointer, not an actual object.
All local variables are allocated on the runtime stack in 4-byte slots. doubles and longs require two slots; other primitives and pointers take one. (Even booleans take a full slot)
A fixed runtime-stack size must be created for each method invocation. This size is determined by the maximum local variable "slots" needed at any given spot in the method.
In the above example, both versions of the code require the same maximum number of local variables for the method.
In both cases, the same bytecode will be generated, updating the same slot in the runtime stack.
In other words, no performance penalty at all.
HOWEVER, depending on the rest of the code in the method, the "declaration outside the loop" version might actually require a larger runtime stack allocation. For example, compare
for (...) { Object o = ... }
for (...) { Object o = ... }
with
Object o;
for (...) { /* loop 1 */ }
for (...) { Object x =...; }
In the first example, both loops require the same runtime stack allocation.
In the second example, because "o" lives past the loop, "x" requires an additional runtime stack slot.
Hope this helps,
-- Scott
In both cases the type info for the object o is determined at compile time. In the second instance, o is seen as being global to the for loop, and in the first instance, the clever Java compiler knows that o will have to be available for as long as the loop lasts and hence will optimise the code in such a way that there won't be any respecification of o's type in each iteration.
Hence, in both cases, specification of o's type will be done once which means the only performance difference would be in the scope of o. Obviously, a narrower scope always enhances performance, therefore to answer your question: no, there is no performance penalty for the first code snip; actually, this code snip is more optimised than the second.
In the second snip, o is being given unnecessary scope which, besides being a performance issue, can be also a security issue.
The first makes far more sense. It keeps the variable in the scope in which it is used and prevents values assigned in one iteration from being used in a later iteration; this is more defensive.
The former is sometimes said to be more efficient but any reasonable compiler should be able to optimise it to be exactly the same as the latter.
Speaking as someone who maintains more code than he writes:
Version 1 is much preferred - keeping scope as local as possible is essential for understanding. It's also easier to refactor this sort of code.
As discussed above - I doubt this would make any difference in efficiency. In fact I would argue that if the scope is more local a compiler may be able to do more with it!
When using multiple threads (if you're running 50+), I found this to be a very effective way of handling ghost thread problems:
Object one;
Object two;
Object three;
Object four;
Object five;
try {
    for (int i = 0; i < someValue; i++)
    {
        Object o = someList.get(i);
        o.doSomething();
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // drop the references and request a collection
    one = null;
    two = null;
    three = null;
    four = null;
    five = null;
    System.gc();
}
The answer depends partly on what the constructor does and what happens with the object after the loop, since that determines to a large extent how the code is optimized.
If the object is large or complex, absolutely declare it outside the loop. Otherwise, the people telling you not to prematurely optimize are right.
I actually have some code in front of me that looks like this:
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
So, relying on compiler abilities, I can assume there would be only one stack allocation for i and one for append. Then everything would be fine except the duplicated code.
As a side note, Java applications are known to be slow. I have never tried profiling in Java, but I guess the performance hit comes mostly from memory allocation management.