Reclaim first reference of Immutable String - java

I see many Q&As about immutable Strings saying that the JVM actually creates a new reference when we do the following:
String text = "apple";
text = "orange"; // a new reference is created
My question is, what happens to the previous reference "apple"? Since Java garbage collection is automatic, does that mean there is no intentional way to reclaim the memory?
EDIT:
The reason I am asking this question is that I would like to know how I should handle String variables in future.
Do String literals get cleared by the GC? If not, wouldn't the pool grow so large that the program eventually runs out of memory? Consider a program that receives different string values from a textbox on the UI: each different value that the user enters would be added to the pool.

There is no way to intentionally reclaim the memory even with System.gc() (which is just a suggestion to the JVM).
Even when garbage collection runs, "apple" won't necessarily be reclaimed.
According to JLS 3.10.5, string literals are interned in a string pool and thus never garbage collected.
Quoting:
A string literal is a reference to an instance of class String (§4.3.1, §4.3.3).
Moreover, a string literal always refers to the same instance of class
String.
This is because string literals - or, more generally, strings
that are the values of constant expressions (§15.28) - are "interned"
so as to share unique instances, using the method String.intern.
EDIT
According to this answer, even interned Strings can be garbage collected.

No, you can't force the GC to run. One important thing to realize is that the "apple" String won't be destroyed. It was declared as a String literal, so it will go into the String pool.

There is no way to explicitly reclaim a completely dereferenced object. You can call System.gc(), but that's merely a suggestion to perform a GC, not a guarantee that one will be performed.
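On the worry about text-box input filling the pool: strings built at run time are ordinary heap objects, and they only enter the pool if something calls intern() on them. A minimal sketch, assuming a hypothetical helper fromUserInput that stands in for reading a UI field:

```java
class RuntimeStringDemo {
    // Hypothetical stand-in for text arriving at run time (e.g. from a text box).
    static String fromUserInput(char[] typed) {
        return new String(typed);
    }

    public static void main(String[] args) {
        String literal = "apple";
        String typed = fromUserInput(new char[] {'a', 'p', 'p', 'l', 'e'});

        System.out.println(literal.equals(typed));      // true: same contents
        System.out.println(literal == typed);           // false: typed was never pooled
        System.out.println(literal == typed.intern());  // true: intern() yields the pooled literal
    }
}
```

So each value the user types is just another object that becomes collectable once nothing references it.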

Related

Are string objects and their literals garbage collected?

I have some questions revolving around the garbage collection of string objects and literals and the string pool.
Setup
Looking at a code snippet, such as:
// (I am using this constructor on purpose)
String text = new String("hello");
we create two string objects:
"hello" creates one and puts it into the string pool
new String(...) creates another, stored on the heap
Garbage collection
Now, if text falls out of scope and nobody references the object anymore, it can be garbage collected, right?
But what about the literal in the pool? If it is not referenced by anyone anymore, can it be garbage collected as well? If not, why?
When we create a String via the new operator, the JVM will create a new object at run time and store it in the heap space reserved for the JVM.
To be more specific, it will NOT be in the String Pool, which is a specialized part of the (heap) memory.
String text = new String("hello");
As soon as there is no more reference to the object it is eligible for GC.
In contrast, the following would be stored in the string pool:
String a = "hello";
When we call a similar line again:
String b = "hello";
The same object will be used from the String Pool, and it will never be eligible for GC.
As to why:
To reduce the memory needed to hold all the String literals (and the
interned Strings), since these literals have a good chance of being
used many times over.
The specification does not mandate a behavior. All it requires, is that all string literals (and string-typed compile-time constants in general) expressing the same string, evaluate to the same object at runtime.
JLS §3.10.5:
At run time, a string literal is a reference to an instance of class String (§4.3.3) that denotes the string represented by the string literal.
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.29) - are "interned" so as to share unique instances, as if by execution of the method String.intern (§12.5).
It's also repeated in JLS §15.29:
Constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern
This implies that each Java implementation maintains a pool at runtime which can be used to look up the canonical instance of the string. But the pool doesn’t have to hinder garbage collection. If no other reference to the object exists, the string instance could be garbage collected, as the fact that a new string instance has to be constructed when necessary, is unobservable.
Note that when you add strings to the pool manually, by invoking intern(), the string instances may indeed get garbage collected when otherwise being unreachable.
But in practice, common implementations such as the HotSpot JVM associate a reference from the code location to the string instance after the first execution, so the object is referenced by the code containing the string literal or compile-time constant. So the object associated with the string literal can only get garbage collected when the class itself gets garbage collected. This is only possible when its defining class loader and, in turn, all other classes defined by this loader are unreachable too.
For the application class loader, this is impossible. Class unloading can only happen for additional class loaders created at runtime. Then, the string instances created for compile-time constants within classes loaded by such a class loader may get garbage collected, if they do not match constants in other code.
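The point that the rule covers string-typed compile-time constants in general, not only bare literals, can be illustrated with a small sketch. The class and member names here are made up for the example:

```java
class ConstantExpressionDemo {
    // A static final String initialised from a literal is a compile-time constant.
    static final String PREFIX = "hel";

    // PREFIX + "lo" is a constant expression, so it denotes the shared "hello" instance.
    static boolean constantConcatIsShared() {
        return (PREFIX + "lo") == "hello";
    }

    // part is an ordinary parameter, so the concatenation happens at run time
    // and yields a newly created String object.
    static boolean runtimeConcatIsShared(String part) {
        return (part + "lo") == "hello";
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsShared());      // true
        System.out.println(runtimeConcatIsShared("hel"));  // false
    }
}
```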

Java String Pool with String constructor and the intern function

I learned about the Java String Pool recently, and there are a few things that I don't quite understand.
When using the assignment operator, a new String will be created in the String Pool if it doesn't exist there already.
String a = "foo"; // Creates a new string in the String Pool
String b = "foo"; // Refers to the already existing string in the String Pool
When using the String constructor, I understand that regardless of the String Pool's state, a new string will be created in the heap, outside of the String Pool.
String c = new String("foo"); // Creates a new string in the heap
I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.
String d = new String("bar"); // Creates a new string in the String Pool and in the heap
I didn't find any further information about this, but I would like to know if that's true.
If that is indeed true, then - why? Why does java create this duplicate string? It seems completely redundant to me since the strings in java are immutable.
Another thing that I would like to know is how the .intern() function of the String class works: Does it just return a pointer to the string in the String Pool?
And finally, in the following code:
String s = new String("Hello");
s = s.intern();
Will the garbage collector delete the string that is outside the String Pool from the heap?
You wrote
String c = new String("foo"); // Creates a new string in the heap
I read somewhere that even when using the constructor, the String Pool is being used. It
will insert the string into the String Pool and into the heap.
That’s somewhat correct, but you have to read the code correctly. Your code contains two String instances. First, you have the string literal "foo" that evaluates to a String instance, the one that will be inserted into the pool. Then, you are creating a new String instance explicitly, using new String(…) calling the String(String) constructor. Since the explicitly created object can’t have the same identity as an object that existed prior to its creation, two String instances must exist.
Why does java create this duplicate string? It seems completely redundant to me since the strings in java are immutable.
Well it does so, because you told it so. In theory, this construction could get optimized, skipping the intermediate step that you can’t perceive anyway. But the first assumption for a program’s behavior should be that it does precisely what you have written.
You could ask why there’s a constructor that allows such a pointless operation. In fact, this has been asked before and this answer addresses this. In short, it’s mostly a historical design mistake, but this constructor has been used in practice for other technical reasons; some do not apply anymore. Still, it can’t be removed without breaking compatibility.
String s = new String("Hello");
s = s.intern();
Will the garbage collector delete the string that is outside the String Pool from the heap?
Since the intern() call will evaluate to the instance that had been created for "Hello" and is distinct from the instance created via new String(…), the latter will definitely be unreachable after the second assignment to s. Of course, this doesn’t say whether the garbage collector will reclaim the string’s memory only that it is allowed to do so. But keep in mind that the majority of the heap occupation will be the array that holds the character data, which will be shared between the two string instances (unless you use a very outdated JVM). This array will still be in use as long as either of the two strings is in use. Recent JVMs even have the String Deduplication feature that may cause other strings of the same contents in the JVM use this array (to allow collection of their formerly used array). So the lifetime of the array is entirely unpredictable.
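A short sketch of the two instances described above, using only identity comparisons (the class and method names are illustrative, not from the question):

```java
class InternReassignDemo {
    // The explicitly constructed object is distinct from the literal's instance.
    static boolean copyIsLiteral() {
        String s = new String("Hello");
        return s == "Hello";
    }

    // After s = s.intern(), s refers to the instance created for the literal;
    // the object made by new String(...) is no longer referenced by s.
    static boolean internedCopyIsLiteral() {
        String s = new String("Hello");
        s = s.intern();
        return s == "Hello";
    }

    public static void main(String[] args) {
        System.out.println(copyIsLiteral());          // false
        System.out.println(internedCopyIsLiteral());  // true
    }
}
```

Whether and when the unreferenced copy is actually reclaimed remains up to the collector.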
Q: I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap. [] I didn't find any further information about this, but I would like to know if that's true.
It is NOT true. A string created with new is not placed in the string pool ... unless something explicitly calls intern() on it.
Q: Why does java create this duplicate string?
Because the JLS specifies that every new generates a new object. It would be counter-intuitive if it didn't (IMO).
The fact that it is nearly always a bad idea to use new String(String) is not a good reason to make new behave differently in this case. The real answer is that programmers should learn not to write that ... except in the extremely rare cases where it is necessary to do so.
Q: Another thing that I would like to know is how the intern() function of the String class works: Does it just return a pointer to the string in the String Pool?
The intern method always returns a pointer to a string in the string pool. That string may or may not be the string you called intern() on.
There have been different ways that the string pool was implemented.
In the original scheme, interned strings were held in a special heap called the PermGen heap. In that scheme, if the string you were interning was not already in the pool, then a new string would be allocated in PermGen space, and the intern method would return that.
In the current scheme, interned strings are held in the normal heap, and the string pool is just a (private) data structure. When the string being interned is not in the pool, it is simply linked into the data structure. A new string does not need to be allocated.
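The current scheme can be observed from plain Java code. This is only a sketch: it assumes a Java 7 or later JVM, and it assumes the generated contents (a hypothetical "pool-demo-" prefix plus a timestamp) are not already in the pool.

```java
class InternIdentityDemo {
    // Builds a string at run time whose contents are very unlikely to be pooled already.
    static String freshString() {
        return "pool-demo-" + System.nanoTime();
    }

    // When the contents are not pooled yet, intern() links this very object into the pool.
    static boolean internReturnsSameObject(String fresh) {
        return fresh.intern() == fresh;
    }

    // Once an equal string is pooled, interning a copy returns the pooled one, not the copy.
    static boolean internOfCopyReturnsPooled(String pooled) {
        String copy = new String(pooled);
        return copy.intern() == pooled && copy.intern() != copy;
    }

    public static void main(String[] args) {
        String fresh = freshString();
        System.out.println(internReturnsSameObject(fresh));    // true under the assumptions above
        System.out.println(internOfCopyReturnsPooled(fresh));  // true: fresh is now the pooled instance
    }
}
```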
Q: Will the garbage collector delete the string that is outside the String Pool from the heap?
The rule is the same for all Java objects, no matter how they were created, and irrespective of where (in which "space" or "heap" in the JVM) they reside.
If an object is not reachable from the running application, then it is eligible for deletion by the garbage collector.
That doesn't mean that an unreachable object will be garbage collected in any particular run of the GC. (Or indeed ever ... in some circumstances.)
The above rule equally applies to the String objects that correspond to string literals. If it ever becomes possible that a literal can never be used again, then it may be garbage collected.
That doesn't normally happen. The JVM keeps a hidden reference to each string literal object in a private data structure associated with the class that defined it. Since classes normally exist for the lifetime of the JVM, their string literal objects remain reachable. (Which makes sense ... since the application may need to use them.)
However, if a class is loaded using a dynamically created classloader, and that classloader becomes unreachable, then so will all of its classes. So it is actually possible for a string literal object to become unreachable. If it does, it may be garbage collected.
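One way to watch the reachability rule at work is with java.lang.ref.WeakReference, which does not itself keep its target alive. The sketch below assumes a HotSpot-style JVM where the defining class keeps its literals reachable. System.gc() is only a hint, so the result for the copy is deliberately not stated.

```java
import java.lang.ref.WeakReference;

class LiteralReachabilityDemo {
    // The literal stays strongly reachable through its defining class,
    // so the weak reference should still resolve after a GC hint.
    static boolean literalSurvivesGcHint() {
        WeakReference<String> weak = new WeakReference<>("hello");
        System.gc(); // a suggestion only; the JVM may ignore it
        return weak.get() != null;
    }

    public static void main(String[] args) {
        String copy = new String("hello");
        WeakReference<String> weakCopy = new WeakReference<>(copy);
        copy = null; // the copy is now only weakly reachable

        System.out.println("literal still reachable: " + literalSurvivesGcHint());
        // May print true or false: an unreachable object is eligible, not guaranteed, to be collected.
        System.out.println("copy cleared: " + (weakCopy.get() == null));
    }
}
```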

Java: How exactly String intern() and StringPool works?

According to Javadoc about String.intern():
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
I have a few questions about this.
When a new String object (not using a string literal but using new() operator) is created like:
String str = new String("Test");
Question: I am aware that a new object will be created in the heap. But will it also put the String Test into the string pool during object creation? If yes, then why is the reference not returned directly from the string pool? If no, why not put the string directly in the pool, now that the string pool has been moved out of PermGen and is in regular heap space (i.e. there is no space constraint apart from the heap space limit)? There are some posts which state that the String is inserted in the pool as soon as the object is created, whereas other posts contradict this.
Once we call String.intern() on a String object (as literals are already interned), what happens to the space allocated to the object? Is it reclaimed at the same moment, or does it wait for the next GC cycle?
Accepted answer to another question on SO, states that String intern should be used when you need speed since you can compare strings by reference (== is faster than equals).
Question: I am aware that when using String.intern() it returns reference to the string already present in the StringPool. But this requires a full scan lookup on the StringPool which can be an expensive operation in itself. So is this speed achieved during string comparison justifiable? If so, why?
I have looked at below sources:
JavaDoc
SO Question ques1, ques2, ques3
http://java-performance.info/string-intern-in-java-6-7-8/
And other misc sources from SO and outside world
All string literals are interned: the compiler records them in the class file, and the JVM interns them when the class is loaded and the literal is resolved. Using a string literal with the single-argument constructor taking a string is a bit of an abuse of that constructor, hence you are likely to get two of them (but maybe there is a special compiler case for this, I can't say for sure). As of Java 8 the implementation of the constructor (for OpenJDK) is this:
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
So there is no special treatment on this side. If you know the literal, don't use this constructor.
I don't think there is any special GC semantics for Strings. It will get collected once it's unreachable and deemed collection worthy by the GC as any other object.
Don't ever use == for comparing strings. The first step in the default equals method for Strings is exactly that identity check, so if this is your dominant case (you know you are working with interned strings most of the time), you are only paying the overhead of a method call, which is tiny. The potential for future bugs that you add by using == is too big a risk for a gain that is minuscule.
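To make the bug risk concrete, here is a hedged sketch with a hypothetical role check. The method names are invented for illustration.

```java
class RoleCheckDemo {
    // Fragile: true only if the caller happens to pass the pooled instance.
    static boolean isAdminByIdentity(String role) {
        return role == "admin";
    }

    // Robust: compares contents, and is null-safe because the literal is the receiver.
    static boolean isAdminByEquals(String role) {
        return "admin".equals(role);
    }

    public static void main(String[] args) {
        String fromLiteral = "admin";
        String builtAtRuntime = new StringBuilder("ad").append("min").toString();

        System.out.println(isAdminByIdentity(fromLiteral));     // true
        System.out.println(isAdminByIdentity(builtAtRuntime));  // false: same text, different object
        System.out.println(isAdminByEquals(builtAtRuntime));    // true
    }
}
```

The identity version works until a value arrives from a file, a socket, or a StringBuilder, and then it silently stops matching.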

Will a substring of another string prevent the parent string from being garbage collected?

String sample1 = "ToBeGarbageCollected";
String sample2 = sample1.substring(0, 1);
sample1 = null;
I know substring internally will keep a reference for original String.
But by explicitly setting sample1 to null, will sample1 and sample2 be available for garbage collection?
I remember seeing somewhere that if a parent object is explicitly set to null, all child values are available for garbage collection. Does this hold for the above?
I am just curious whether this is a parent-child relationship scenario. If not, will this cause sample1 or sample2 to be available for garbage collection?
String sample1 = "ToBeGarbageCollected";
String sample2 = new String(sample1.substring(0, 1));
sample1 = null;
The first thing to say is that garbage collection doesn't happen immediately. So assigning null to anything does not / cannot cause garbage collection. What it may do is cause an object to become unreachable ... and that will make it a potential candidate for garbage collection in a future GC run.
Now to your specific examples.
Important Note: the following only applies to older JVMs; i.e. Java 7 update 5 and earlier. In Java 7 update 6, they changed String.substring() so that the target string and resulting substring DO NOT share the backing array. This eliminates the potential storage leak issue with substring.
The substring method doesn't put a reference to the original String in the new String. What it actually does is save a reference to the original String's backing array; i.e. the array that holds the characters.
But having said that, assigning null to sample1 is not sufficient to make the state of the entire original string unreachable. The original String's entire backing array will remain reachable ... and that means it won't be a candidate for garbage collection.
But there is another complication. You set sample1 to a String literal, and the String object that represents a String literal is always reachable (unless the entire class gets unloaded!)
But by explicitly defining sample1 as null, will sample1 and sample2 be available for garbage Collection?
The original sample1 object will remain fully reachable, and sample2 will remain reachable unless that variable goes out of scope.
If sample1 had not been a literal and there were no other references to it, then the answer would be different. The sample1 object would be unreachable, but its backing array would still be reachable via sample2.
In your second example, copying the substring causes a new String to be created. And it is guaranteed that the new String won't share the backing array with the original String and the temporary substring. In that case, assigning null is unnecessary.
Will now both sample1 and sample2 be available for garbage Collection?
The answer is the same as above for the case where sample1 is a literal.
If sample1 is not a literal and there are no other references to it, then sample1 and the temporary substring would now be unreachable.
I just want to know where the String constructor would be helpful.
In theory it will be.
In practice it depends on whether the references are still reachable when the GC eventually gets around to looking ... and also on whether the Strings in question are large enough and numerous enough to have a significant impact on memory usage.
And in practice, that precondition is usually NOT satisfied and creating a fresh String like that usually does NOT help.
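A small sketch of the non-literal case discussed above. sample1 is built at run time so that it is not a pooled literal, and the helper name firstLetter is an assumption for the example:

```java
class SubstringDemo {
    // Returns a one-character String; on Java 7u6 and later it has its own backing array.
    static String firstLetter(String s) {
        return s.substring(0, 1);
    }

    public static void main(String[] args) {
        // Built at run time so that sample1 does not refer to a string literal.
        String sample1 = new StringBuilder("ToBeGarbage").append("Collected").toString();
        String sample2 = firstLetter(sample1);

        sample1 = null; // drops one reference; it does not trigger a collection

        // sample2 is a separate object and is unaffected by nulling sample1.
        System.out.println(sample2); // T
    }
}
```

Nulling the variable only removes one path to the original object. Whether that object is then reclaimed depends on whether any other path remains and on when the collector runs.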
Remember that in Java String is immutable. In this case, sample1 will be discarded, but sample2 never pointed to sample1: it pointed to a separately held immutable string in the JVM that was created at the latest when substring was called.
When you set sample1 to null, the memory it pointed to became available for garbage collection (assuming no other strings held the same value and no other variables were pointed at that location). When you use the new keyword (or implicitly do so through the assignment of a primitive) new memory is allocated on the heap (usually; again, strings are immutable and share the same memory). If no pointers (read: any named variables) point to a given location of memory, it is available for garbage collection.
Remember: in any case where there are no references to an object, it becomes available for garbage collection. Objects are not defined by the variable names assigned to them, but rather are locations in memory, and the variable names act as pointers (references) to those objects. Strings are somewhat different because they are immutable, and the JVM may opt not to garbage collect for reasons independent of references to them.

Garbage collection of String literals

I am reading about garbage collection and I am getting confusing search results when I search for garbage collection of String literals.
I need clarification on following points:
If a string is defined as literal at compile time [e.g: String str = "java"] then will it be garbage collected?
If I use the intern method [e.g: String str = new String("java").intern()] then will it be garbage collected? Also, will it be treated differently from the String literal in point 1?
Some places it is mentioned that literals will be garbage collected only when String class will be unloaded? Does it make sense because I don't think String class will ever be unloaded.
If a string is defined as literal at compile time [e.g: String str = "java";] then will it be garbage collected?
Probably not. The code objects will contain one or more references to the String objects that represent the literals. So as long as the code objects are reachable, the String objects will be too.
It is possible for code objects to become unreachable, but only if they were dynamically loaded ... and their classloader is destroyed.
If I use the intern method [e.g: String str = new String("java").intern()] then will it be garbage collected?
The object returned by the intern call will be the same object that represents the "java" string literal. (The "java" literal is interned at class loading time. When you then intern the newly constructed String object in your code snippet, it will lookup and return the previously interned "java" string.)
However, interned strings that are not identical with string literals can be garbage collected once they become unreachable. The PermGen space is garbage collected on all recent HotSpot JVMs. (Prior to Java 8 ... which drops PermGen entirely.)
Also will it be treated differently from string literal in point 1.
No ... because it is the same object as the string literal.
And indeed, once you understand what is going on, it is clear that string literals are not treated specially either. It is just an application of the "reachability" rule ...
Some places it is mentioned that literals will be garbage collected only when String class will be unloaded? Does it make sense because I don't think the String class will ever be unloaded.
You are right. It doesn't make sense. The sources that said that are incorrect. (It would be helpful if you posted a URL so that we can read what they are saying for ourselves ...)
Under normal circumstances, string literals and classes are all allocated into the JVM's permanent generation ("PermGen"), and usually won't ever be collected. Strings that are interned (e.g. mystring.intern()) are stored in a memory pool owned by the String class in permgen, and it was once the case that aggressive interning could cause a space leak because the string pool itself held a reference to every string, even if no other references existed. Apparently this is no longer true, at least as of JDK 1.6 (see, e.g., here).
For more on permgen, this is a decent overview of the topic. (Note: that link goes to a blog associated with a product. I don't have any association with the blog, the company, or the product, but the blog entry is useful and doesn't have much to do with the product.)
The literal string will remain in memory as long as the program is in memory.
str will be garbage collected, but the literal it is created from will not.
That makes perfect sense, since the string class is unloaded when the program is unloaded.
The intern() method checks whether the object is available in the String pool. If the object/literal is available, a reference to it is returned. If the literal is not in the pool, the object is loaded into the perm area (String pool) and a reference to it is returned. We have to use the intern() method judiciously.
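The lookup-or-add behaviour of intern() can be seen with two strings built at run time. A minimal sketch, where the build helper is hypothetical:

```java
class CanonicalInstanceDemo {
    // Assembles a String at run time so that the result is a fresh, unpooled object.
    static String build(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        String first = build("ja", "va");
        String second = build("ja", "va");

        System.out.println(first == second);                   // false: two separate objects
        System.out.println(first.equals(second));              // true: equal contents
        System.out.println(first.intern() == second.intern()); // true: one canonical pooled instance
    }
}
```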
