Externalizing local variables in recursive method in Java - java

I've got a recursive method which has local String variables:
private void recursiveUpdate(int id){
String selectQuery="Select ...";
String updateQuery="Update or rollback ...";
...
for(int childID: children)
recursiveUpdate(childID);
}
Is there any reason to externalize local String variables like this:
private static final String selectQuery="Select ...";
private static final String updateQuery="Update or rollback ...";
private void recursiveUpdate(int id){
...
for(int childID: children)
recursiveUpdate(childID);
}

From a technical point of view, the difference between the two should be negligible, since in either case you'd always use the same string instances. If you are parsing those queries on every call, you might consider externalizing that as well (e.g. by using prepared statements).
From a development point of view, I'd probably externalize the queries to separate them from the call logic.
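The claim that both variants use the same string instances can be checked with a minimal sketch. It assumes the queries are plain string literals, as in the question; the class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveLiteralDemo {
    // Externalized constant, as in the second variant of the question.
    private static final String STATIC_QUERY = "Select ...";

    // Each invocation has its own local variable, but the variable only
    // holds a reference to the literal; no new String is created per call.
    static void recurse(int depth, List<String> seen) {
        String localQuery = "Select ...";
        seen.add(localQuery);
        if (depth > 0) {
            recurse(depth - 1, seen);
        }
    }

    // Returns true if every invocation saw the very same String object
    // that the static constant refers to (identity, not just equality).
    static boolean allSameInstance(int depth) {
        List<String> seen = new ArrayList<>();
        recurse(depth, seen);
        for (String s : seen) {
            if (s != STATIC_QUERY) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allSameInstance(100)); // prints "true"
    }
}
```

Identical string literals are interned, so the recursion adds one reference per stack frame but no extra String objects.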

In the former case, you are relying on the compiler to recognize that those strings are unchanging across all calls so it doesn't need to give a fresh copy of each variable to each invocation of recursiveUpdate, whereas in the latter case, there is no question about it.

Yes. You'd want to externalize the variables. If they are left as local variables, then depending on the size of the call stack you could quickly accumulate many string objects and lower efficiency. It is even worse if you edit the strings inside the recursive method. Also, as far as I can see you will not be making changes to the strings, so if they are used as a 'reference' it would be better to externalize them.

Typically you should be concerned with the size of your stack memory when making recursive calls. The stack tells the CPU where to jump when the method completes. It contains your method parameters and the return location.
Objects instantiated within the body of the method are saved on the heap. In this case, I think the compiler will figure out that these are constants and save them to static memory. The heap is much larger and is more likely to survive the recursion when objects are instantiated, so I wouldn't worry about it. By moving the objects out, you'll save a little space in your heap.
IMO, it's best to move the variables out if the values are always the same (non-dynamic). This way, if they ever change, you can find them easily.

Related

java - creating distinct variables vs 1 variable for multiple values? [closed]

I have some code which is like:
String country = null;
country = getEuropeanCountry(); //Germany
//after few lines of code
country = getAsianCountry(); //Japan
//after few more lines
country = getNorthAmericanCountry(); //Canada
/*and the code follows by assigning a different country value to the same variable "country"*/
I have this kind of usage in most of my code.
For some reason, my application throws "Error java.lang.OutOfMemoryError: GC overhead limit exceeded".
So I tried with VM argument: -XX:-UseGCOverheadLimit
Then my app ran successfully but I noticed that it is consuming more memory (I had to set -Xmx to 5g or 6g; otherwise I get: out of memory error).
I checked my app and there are no memory leaks. But most of my code is similar to what I posted above.
Can anyone tell me if it is beneficial for memory management if I refactor the above code to:
String europeanCountry = getEuropeanCountry(); //Germany
//after few lines of code
String asianCountry = getAsianCountry(); //Japan
//after few more lines
String northAmericanCountry = getNorthAmericanCountry(); //Canada
/*and the code follows by assigning a different country value to a different String variable*/
I can't use collections. I mean, in general, which way is better to use heap space and garbage collector efficiently?
For the question "I mean, in general, which way is better to use heap space and garbage collector efficiently?"
Let's look at the String implementation, e.g. in jdk8: https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/lang/String.java
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence {
/** The value is used for character storage. */
private final char value[];
So it is a final character array: it cannot be reassigned or changed. The String is generated on the heap in your method and is never changed; only a reference (a name) is created.
To be sure, let's also look at the copy constructor of String (used when doing something like "newString = new String(otherString)"):
public String(String original) {
this.value = original.value;
this.hash = original.hash;
}
In that case, too, no additional character storage is allocated: the new String shares the same single final char array on the heap.
So you can assign a String to another reference (give it an additional name), but it is always the same unique String generated in your method, and no new space on the heap is allocated.
So, comparing the two approaches as a first approximation:
String europeanCountry = getEuropeanCountry();
String asianCountry = getAsianCountry();
and
String country = null;
country = getEuropeanCountry();
country = getAsianCountry();
Both will formally create the same number of Strings on the heap, as each String is always generated inside the same methods. The variables are only references to them.
The only difference is that reusing the variable, as in the second case, formally allows the earlier String to be garbage collected sooner (at the moment the reference to it is removed by reusing the variable).
So with the second approach (reusing) you may get a smaller memory footprint for a negligible time.
I call this a first approximation because it is only true if there is no other reference to the String and no optimization takes place.
However, in your code above the variables don't exit scope and are never used. The compiler will detect this and no variable will be assigned at all. Depending on what the methods do, they may be inlined and not called at all. So what the methods you call look like makes a difference. Depending on how complex they are, the space on the heap is allocated or not.
The other way round also holds: if you use the variable and the runtime detects that you are likely to call the method again for the same value, the value will be kept on the heap and not freed, even if there is formally no reference and it could formally be garbage collected. So again it is the method call, not the assignment, that makes the difference.
Also the obvious: if the methods don't just generate the Strings but pull them from somewhere (a container) or store them somewhere, that other reference is what keeps the space on the heap allocated, and your assignment makes no difference at all to the heap: it is the same final char array on the heap.
With that in mind, the problem you are facing is most probably not the assignment of Strings but the design of your code. It must be a far more complex scenario in which references are kept longer.
So much for your question.
For your problem I would look out:
for containers
where variables are generated
for frequent use, that is, calling the methods very frequently for a lot of different values, as in such a case the values are kept in memory for the next assumed call.
for code where it is not easy to follow the flow of the data. The compiler optimizes by analyzing the flow. If you can't follow it, the compiler is more likely to be unable to follow it either.
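The point that a variable is only a reference, so that reusing one variable or declaring several allocates the same number of Strings, can be illustrated with a small sketch. The two getter methods are hypothetical stand-ins for the ones in the question; here each call simply allocates one new String.

```java
class ReferenceReuseDemo {
    // Hypothetical stand-ins for the question's methods.
    static String getEuropeanCountry() { return new String("Germany"); }
    static String getAsianCountry() { return new String("Japan"); }

    // Assigning to another variable copies the reference, not the String.
    static boolean assignmentCopiesOnlyTheReference() {
        String european = getEuropeanCountry();
        String country = european; // second name for the same object
        return country == european;
    }

    // Reusing a variable drops the old reference; the old String becomes
    // eligible for garbage collection if nothing else refers to it.
    static boolean reusedVariablePointsAtLatestObject() {
        String asian = getAsianCountry();
        String country = getEuropeanCountry();
        country = asian; // the "Germany" String is now unreachable from here
        return country == asian;
    }

    public static void main(String[] args) {
        System.out.println(assignmentCopiesOnlyTheReference());   // prints "true"
        System.out.println(reusedVariablePointsAtLatestObject()); // prints "true"
    }
}
```

Either way two Strings are allocated by the two method calls; the choice of variable names only affects how long each one stays reachable.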
Assuming the lines of code shown are from a single method (let me know if that's not the case), there are at least 3 issues I can point out with the code:
the method seems too large. Prefer writing methods that are as concise as possible and do just "one thing" and do it well.
too much state change, as in the 1st example where you set the variable 'country' to 3 different method return values.
consider using polymorphism rather than repeating code in an if-else fashion for fetching the country.
finally, it's not clear how the country values are used in the methods.

Is it wise to declare a String as final if I use it many times?

I have a method repeatedMethod like this:
public static void repeatedMethod() {
// something else
anotherMethod("myString");
// something else
}
public static void anotherMethod(String str) {
//something that doesn't change the value of str
}
and I call the repeatedMethod many times.
I would like to ask if it is wise to declare myString as static final outside that method like this:
public static final String s = "myString";
public void repeatedMethod() {
anotherMethod(s);
}
I think that when I do anotherMethod("myString"), a new instance of String is created. And since I do that many times, many instances of String are created.
Therefore, it might be better to create only one instance of String outside the repeatedMethod and use only that one every time.
What you are doing is right but for the wrong reason.
When you do anotherMethod("myString"), no new instance of String is actually going to be created: it might reuse a String from the String constant pool (refer to this question).
However, factoring common String values as constants (i.e. as private static final) is a good practice: if that constant ever needs to change, you only need to change it in one place of the source code (for example, if tomorrow, "myString" needs to become "myString2", you only have one modification to make)
String literals of the same text are identical, there won't be excessive object creation as you fear.
But it's good to put that string literal in a static final variable (a constant) with a descriptive name that documents the purpose of that string. It's generally a recommended practice to extract string literals to constants with good names.
This is especially true when the same string literal appears in more than one place in the code (a "magic string"), in which case it's strongly recommended to extract to a constant.
No, you don't need to do that. There is a "constant pool" in the JVM: every inline string (e.g. "myString") is implicitly treated as a constant, and every identical inline string is put in the constant pool just once.
For example, for
String i="test",j="test";
there will be just one instance of the constant "test" in the constant pool.
Also refer to
http://www.thejavageek.com/2013/06/19/the-string-constant-pool/
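A minimal sketch of the constant-pool behaviour described above; the class and method names are made up for illustration.

```java
class StringPoolDemo {
    // Two identical literals refer to one pooled String object.
    static boolean literalsShareOneInstance() {
        String i = "test", j = "test";
        return i == j;
    }

    // Only an explicit "new String(...)" creates a separate object,
    // which is still equal in content to the literal.
    static boolean newStringIsDistinctObject() {
        String literal = "test";
        String copy = new String("test");
        return copy != literal && copy.equals(literal);
    }

    // intern() maps an equal String back to the pooled instance.
    static boolean internReturnsPooledInstance() {
        return new String("test").intern() == "test";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneInstance());    // prints "true"
        System.out.println(newStringIsDistinctObject());   // prints "true"
        System.out.println(internReturnsPooledInstance()); // prints "true"
    }
}
```

So calling anotherMethod("myString") repeatedly passes the same pooled object each time; the constant is worth introducing for readability, not for memory.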
Optimize for clarity before worrying about performance. String literals are only created once, ever (unless the literal is unloaded). Performance is usually less important, and in this case it is irrelevant.
You should define a constant instead of repeating the same String, to make it clear that these strings don't just happen to be the same but must be the same. Otherwise, if someone maintaining the code later has to modify one String, they cannot tell whether all of them should be changed or not.
BTW, when you optimize for clarity you are also often optimizing for performance. The JIT looks for common patterns, and if you try to outsmart the optimizer you are more likely to confuse it, resulting in less optimal code.

Helping the JVM with stack allocation by using separate objects

I have a bottleneck method which attempts to add points (as x-y pairs) to a HashSet. The common case is that the set already contains the point in which case nothing happens. Should I use a separate point for adding from the one I use for checking if the set already contains it? It seems this would allow the JVM to allocate the checking-point on stack. Thus in the common case, this will require no heap allocation.
Ex. I'm considering changing
HashSet<Point> set;
public void addPoint(int x, int y) {
if(set.add(new Point(x,y))) {
//Do some stuff
}
}
to
HashSet<Point> set;
public void addPoint(int x, int y){
if(!set.contains(new Point(x,y))) {
set.add(new Point(x,y));
//Do some stuff
}
}
Is there a profiler which will tell me whether objects are allocated on heap or stack?
EDIT: To clarify why I think the second might be faster, in the first case the object may or may not be added to the collection, so it's not non-escaping and cannot be optimized. In the second case, the first object allocated is clearly non-escaping so it can be optimized by the JVM and put on stack. The second allocation only occurs in the rare case where it's not already contained.
Marko Topolnik properly answered your question; the space allocated for the first new Point may or may not be immediately freed and it is probably foolish to bank on it happening. But I want to expand on why you're currently in a deep state of sin:
You're trying to optimise this the wrong way.
You've identified object creation to be the bottleneck here. I'm going to assume that you're right about this. You're hoping that, if you create fewer objects, the code will run faster. That might be true, but it will never run very fast as you've designed it.
Every object in Java has a pretty fat header (16 bytes; an 8-byte "mark word" full of bit fields and an 8-byte pointer to the class type) and, depending on what's happened in your program thus far, possibly another pretty fat trailer. Your HashSet isn't storing just the contents of your objects; it's storing pointers to those fat-headers-followed-by-contents. (Actually, it's storing pointers to Entry classes that themselves store pointers to Points. Two levels of indirection there.)
A HashSet lookup, then, figures out which bucket it needs to look at and then chases one pointer per thing in the bucket to do the comparison. (As one great big chain in series.) There probably aren't very many of these objects, but they almost certainly aren't stored close together, making your cache angry. Note that object allocation in Java is extremely cheap (you just increment a pointer), so this pointer chasing is quite probably a bigger source of slowness than the allocation itself.
Java doesn't provide any abstraction like C++'s templates, so the only real way to make this fast and still provide the Set abstraction is to copy HashSet's code, change all of the data structures to represent your objects inline, modify the methods to work with the new data structures, and, if you're still worried, make copies of the relevant methods that take a list of parameters corresponding to object contents (i.e. contains(int, int)) that do the right thing without constructing a new object.
This approach is error-prone and time-consuming, but it's necessary unfortunately often when working on Java projects where performance matters. Take a look at the Trove library Marko mentioned and see if you can use it instead; Trove did exactly this for the primitive types.
With that out of the way, a monomorphic call site is one where only one method is called. Hotspot aggressively inlines calls from monomorphic call sites. You'll notice that HashSet.contains punts to HashMap.containsKey. You'd better pray for HashMap.containsKey to be inlined since you need the hashCode call and equals calls inside to be monomorphic. You can verify that your code is being compiled nicely by using the -XX:+PrintAssembly option and poring over the output, but it's probably not---and even if it is, it's probably still slow because of what a HashSet is.
As soon as you have written new Point(x,y), you are creating a new object. It may happen not to be placed on the heap, but that's just a bet you can lose. For example, the contains call should be inlined for the escape analysis to work, or at least it should be a monomorphic call site. All this means that you are optimizing against a quite erratic performance model.
If you want to avoid allocation the solid way, you can use Trove library's TLongHashSet and have your (int,int) pairs encoded as single long values.
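The encoding of an (int,int) pair as a single long can be sketched as follows. This is an illustration of the packing only: it uses java.util.HashSet<Long> as a stand-in so that the example is self-contained, which still boxes each long; a primitive long set such as the one from Trove mentioned above would avoid that. The class name and helper methods are made up.

```java
import java.util.HashSet;
import java.util.Set;

class PackedPointSet {
    // Stand-in for a primitive long set; HashSet<Long> still boxes values.
    private final Set<Long> set = new HashSet<>();

    // x goes into the high 32 bits, y into the low 32 bits.
    // The mask stops a negative y from sign-extending over x.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int unpackX(long packed) { return (int) (packed >> 32); }

    static int unpackY(long packed) { return (int) packed; }

    // Returns true only if the point was not already present.
    boolean addPoint(int x, int y) { return set.add(pack(x, y)); }

    boolean contains(int x, int y) { return set.contains(pack(x, y)); }

    public static void main(String[] args) {
        PackedPointSet points = new PackedPointSet();
        System.out.println(points.addPoint(3, -4));  // prints "true"
        System.out.println(points.addPoint(3, -4));  // prints "false"
        System.out.println(unpackY(pack(3, -4)));    // prints "-4"
    }
}
```

With a primitive-keyed set, both the lookup and the insert work on the packed long directly, so no Point object is needed in the common "already present" case.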

Passing big objects references instead of small objects to methods have any differences in processing or memory consumption?

I have a coding dilemma, and I don't know if there's a pattern or practice that deals with it. Whenever I have to pass some values to a method, most times I try to pass only the needed objects, instead of passing the objects which are being composed by them.
I was discussing with a friend about how Java manages heap and memory stuff and we didn't get anywhere.
Let me give two examples:
//Example 1:
private void doSomething(String s, Car car, boolean isReal){...}
...
String s = myBigObject.getLabels().getMainName();
Car car = myBigObject.getCar();
boolean isReal = myBigObject.isRealCar();
doSomething(s, car, isReal);
//Example 2 - having in mind that BigObject is a really big object and I'll only use those 3 attributes:
private void doSomething(BigObject bigObject){...}
...
doSomething(myBigObject);
In the 2nd example, it seems to me memory will be kind of wasted, passing a big object without really needing it.
Since Java passes only references to objects (and copies them, making it technically pass-by-value), there is no memory overhead for passing "big objects". Your Example 1 actually uses a little more memory.
However, there may still be good reason to do it that way: it removes a dependency and allows you to call doSomething on data that is not part of a BigObject. This may or may not be an advantage. If it gets called a lot with BigObject parameters, you'd have a lot of duplicate code extracting those values, which would not be good.
Note also that you don't have to assign return values to a local variable to pass them. You can also do it like this:
doSomething(myBigObject.getLabels().getMainName(),
myBigObject.getCar(),
myBigObject.isRealCar());
You're already only passing a reference to BigObject, not a full copy of BigObject. Java passes references by value.
Arguably, you're spending more memory the first way, not less, since you're now passing two references and a boolean instead of a single reference.
Java uses pass-by-value. Whenever we pass an object to a method, we do not pass all the values stored inside the object; we pass only the bits (something like ab06789c) that hold the address at which the object is stored in heap memory. So you are using more memory in the first case than in the second. Refer to "JAVA pass-by-reference or pass-by-memory".
All references are the same size, so how could it use more memory? It doesn't.
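The answers above agree that only a reference is copied into the method. A small sketch makes that observable; BigObject here is a hypothetical class with an arbitrary large array to stand in for "a really big object".

```java
class PassByValueDemo {
    // Hypothetical "big" object: the array is only there to make it large.
    static final class BigObject {
        final int[] payload = new int[1_000_000];
        String label = "original";
    }

    // The parameter is a copy of the caller's reference, so returning it
    // hands back the very same object; the payload is never copied.
    static BigObject identity(BigObject o) {
        return o;
    }

    // Changing the object through the copied reference is visible to the caller.
    static void mutate(BigObject o) {
        o.label = "mutated";
    }

    // Reassigning the parameter only changes the local copy of the reference.
    static void reassign(BigObject o) {
        o = new BigObject();
        o.label = "replaced";
    }

    public static void main(String[] args) {
        BigObject big = new BigObject();
        System.out.println(identity(big) == big); // prints "true"
        reassign(big);
        System.out.println(big.label);            // prints "original"
        mutate(big);
        System.out.println(big.label);            // prints "mutated"
    }
}
```

The cost of the call is the same whether the object behind the reference is tiny or huge, so the choice between the two signatures is about coupling, not memory.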

java - is initializing a temporary variable for simple getters better or not?

A very unimportant question about Java performance, but it made me wonder today.
Say I have simple getter:
public Object getSomething() {
return this.member;
}
Now, say I need the result of getSomething() twice (or more) in some function/algorithm. My question: is there any difference in either calling getSomething() twice (or more) or in declaring a temporary, local variable and use this variable from then on?
That is, either
public void algo() {
Object o = getSomething();
... use o ...
}
or
public void algo() {
... call getSomething() multiple times ...
}
I tend to mix both options, for no specific reason. I know it doesn't matter, but I am just wondering.
Thanks!
Technically, it's faster to not call the method multiple times, however this might not always be the case. The JVM might optimize the method calls to be inline and you won't see the difference at all. In any case, the difference is negligible.
However, it's probably safer to always use a getter. What if the value of the state changes between your calls? If you want to use a consistent version, then you can save the value from the first call. Otherwise, you probably want to always use the getter.
In any case, you shouldn't base this decision on performance because it's so negligible. I would pick one and stick with it consistently. I would recommend always going through your getters/setters.
Getters and setters are about encapsulation and abstraction. When you decide to invoke the getter multiple times, you are making assumptions about the inner workings of that class. For example that it does no expensive calculations, or that the value is not changed by other threads.
I'd argue that it's better to call the getter once and store its result in a temporary variable, thus allowing you to freely refactor the implementing class.
As an anecdote, I was once bitten by a change where a getter returned an array, but the implementing class was changed from an array property to using a list and doing the conversion in the getter.
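The anecdote can be reproduced with a small sketch. Inventory is a hypothetical class whose getter converts an internal list to an array on every call, and the counter exists only to make the hidden work visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GetterCacheDemo {
    // Hypothetical class: the getter looks trivial but copies on each call.
    static final class Inventory {
        private final List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c"));
        private int conversions = 0;

        String[] getItems() {
            conversions++;
            return items.toArray(new String[0]);
        }

        int conversions() { return conversions; }
    }

    // Calls the getter in the loop condition and in the loop body.
    static int countViaRepeatedGetter(Inventory inv) {
        int n = 0;
        for (int i = 0; i < inv.getItems().length; i++) {
            if (!inv.getItems()[i].isEmpty()) n++;
        }
        return n;
    }

    // Calls the getter once and works on the local variable.
    static int countViaLocal(Inventory inv) {
        String[] items = inv.getItems();
        int n = 0;
        for (int i = 0; i < items.length; i++) {
            if (!items[i].isEmpty()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Inventory a = new Inventory();
        countViaRepeatedGetter(a);
        System.out.println(a.conversions()); // prints "7"

        Inventory b = new Inventory();
        countViaLocal(b);
        System.out.println(b.conversions()); // prints "1"
    }
}
```

Both loops return the same count, but the first one performs a list-to-array conversion on every getter call, which is exactly the kind of cost a caller cannot see from the getter's signature.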
The compiler should optimize either one to be basically the same code.
