Let's look at the following code snippet:
String s1 = "Hello";
String s2 = "Hello";
Both variables refer to the same object due to interning. Since strings are immutable, only one object is created and both refer to the same object.
There is also the constant pool, which holds all the constants (integer, string, etc.) that are declared in a class. It is specific to each class.
System.out.println("Hello"); // I believe this Hello is different from above.
Questions:
Does the string pool refer to the pool of constant string objects in the constant pool?
If yes, is the string pool common to the whole application or specific to a class?
My questions are,
Does the string pool refer to the pool of constant string objects in the constant pool?
No.
"Constant pool" refers to a specially formatted collection of bytes in a class file that has meaning to the Java class loader. The "strings" in it are serialized, they are not Java objects. There are also many kinds of constants, not just strings in it.
See JVMS §4.4, The Constant Pool:
Java Virtual Machine instructions do not rely on the run-time layout of classes, interfaces, class instances, or arrays. Instead, instructions refer to symbolic information in the constant_pool table.
In contrast, the "String pool" is used at runtime (not just during class loading), contains only strings, and the "strings" in the string pool are Java objects.
The "string pool" is a thread-safe weak-map from java.lang.String instances to java.lang.String instances used to intern strings.
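As a rough illustration of the map-like behavior described above, the sketch below builds two equal strings at run time and shows that intern() maps both to one canonical instance. The class and method names are made up for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class InternCanonicalDemo {
    // Builds a fresh String at run time, so it is not a compile-time constant.
    static String fresh() {
        return new String(new char[] {'p', 'o', 'o', 'l', 'e', 'd'});
    }

    // Two separately constructed strings are distinct objects.
    static boolean distinctBeforeIntern() {
        return fresh() != fresh();
    }

    // intern() returns the single canonical instance for equal contents.
    static boolean sameAfterIntern() {
        return fresh().intern() == fresh().intern();
    }

    public static void main(String[] args) {
        System.out.println(distinctBeforeIntern()); // true
        System.out.println(sameAfterIntern());      // true
    }
}
```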
JLS §3.10.5, String Literals, says:
A string literal is a reference to an instance of class String (§4.3.1, §4.3.3).
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
There is only one string pool, and all string literals are automatically interned.
Also, there are other pools for autoboxing and such.
The constant pool is where those literals are put for the class.
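The autoboxing caches mentioned above can be observed in a similar way. This is only a sketch with a made-up class name; it relies on the documented guarantee that Integer values from -128 to 127 are cached. Outside that range, identity is not guaranteed, so only equals() is checked there.

```java
// Hypothetical demo class for the Integer autoboxing cache.
class BoxingCacheDemo {
    // Autoboxing a small value goes through Integer.valueOf, which caches -128..127.
    static boolean smallValuesShareInstance() {
        Integer a = 100;
        Integer b = 100;
        return a == b; // reference comparison
    }

    // For arbitrary values, only equals() is a reliable comparison.
    static boolean valuesAreEqual(int n) {
        Integer a = n;
        Integer b = n;
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(smallValuesShareInstance()); // true
        System.out.println(valuesAreEqual(100_000));    // true
    }
}
```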
The constant_pool (all constants, including strings) is a data structure in the class file (outside the JVM).
When a class file is loaded into the JVM, the constant_pool becomes the run-time constant pool (in the general sense). In HotSpot on Java SE 8:
Strings from the constant_pool are stored on the heap, and this is what we call the string pool; https://openjdk.org/jeps/122 https://wiki.openjdk.org/display/HotSpot/Caching+Java+Heap+Objects
The other data from the constant_pool is stored in native memory (Metaspace), and this is what we call the run-time constant pool (in the narrow sense).
Related
I have some questions revolving around the garbage collection of string objects and literals and the string pool.
Setup
Looking at a code snippet, such as:
// (I am using this constructor on purpose)
String text = new String("hello");
we create two string objects:
"hello" creates one and puts it into the string pool
new String(...) creates another, stored on the heap
Garbage collection
Now, if text falls out of scope and nobody references the object anymore, it can be garbage collected, right?
But what about the literal in the pool? If it is not referenced by anyone anymore, can it be garbage collected as well? If not, why?
When we create a String via the new operator, the JVM creates a new object at run time and stores it in the heap space reserved for the JVM.
To be more specific, it will NOT be in the String Pool, which is a specialized part of the (heap) memory.
String text = new String("hello");
As soon as there is no more reference to the object it is eligible for GC.
In contrast, the following would be stored in the string pool:
String a = "hello";
When we call a similar line again:
String b = "hello";
The same object from the String Pool is reused. In practice it remains reachable, and so is not eligible for GC, for as long as the class that contains the literal stays loaded.
As to why:
To reduce the memory needed to hold all the String literals (and the
interned Strings), since these literals have a good chance of being
used many times over.
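The difference between the two forms above can be checked with reference comparisons. The following is a minimal sketch; the class name and the compare() helper are invented for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class comparing a literal with a String created via new.
class NewVersusLiteralDemo {
    static boolean[] compare() {
        String a = "hello";
        String b = "hello";
        String text = new String("hello"); // deliberately uses the constructor
        return new boolean[] {
            a == b,              // both literals refer to the pooled instance
            text == a,           // the constructor always creates a separate object
            text.equals(a),      // the contents are still equal
            text.intern() == a   // intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [true, false, true, true]
    }
}
```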
The specification does not mandate a behavior. All it requires is that all string literals (and string-typed compile-time constants in general) expressing the same string evaluate to the same object at runtime.
JLS §3.10.5:
At run time, a string literal is a reference to an instance of class String (§4.3.3) that denotes the string represented by the string literal.
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.29) - are "interned" so as to share unique instances, as if by execution of the method String.intern (§12.5).
It's also repeated in JLS §15.29:
Constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern
This implies that each Java implementation maintains a pool at runtime which can be used to look up the canonical instance of the string. But the pool doesn’t have to hinder garbage collection. If no other reference to the object exists, the string instance could be garbage collected, as the fact that a new string instance has to be constructed when necessary, is unobservable.
Note that when you add strings to the pool manually, by invoking intern(), the string instances may indeed get garbage collected when otherwise being unreachable.
But in practice, the common implementations, like the HotSpot JVM, associate a reference from the code location to the string instance after the first execution, so the object is referenced by the code containing the string literal or compile-time constant. So, the object associated with the string literal can only get garbage collected when the class itself gets garbage collected. This is only possible when its defining class loader and, in turn, all other classes defined by this loader are unreachable too.
For the application class loader, this is impossible. Class unloading can only happen for additional class loaders created at runtime. Then, the string instances created for compile-time constants within classes loaded by such a class loader may get garbage collected, if they do not match constants in other code.
From reading the Oracle JVM specification:
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html
A run-time constant pool is a per-class or per-interface run-time
representation of the constant_pool table in a class file (§4.4).
I understand that for each class, it has a runtime constant pool (please correct me if I am wrong).
However, what confuses me is this: suppose I have two different classes A and B, and each class has a private String variable, say String value = "abc".
If I compare A.value with B.value using == rather than equals, I get true, which makes me think that "abc" in both A and B is in the same runtime constant pool. Could someone point out where I am wrong?
This is an up-front optimization that the JLS imposes.
From JLS 7, §3.10.5 (formatting mine)
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
However, note that this is only true of String literals and constant expressions. Dynamically constructed strings (e.g. x + y for Strings x and y) are not automatically interned to share the same unique instances. As a result, you will still have to use .equals in general unless you can guarantee that your operands are constant expressions.
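To make the distinction concrete, here is a small sketch (class and method names are hypothetical) contrasting a concatenation that is a constant expression with one that is evaluated at run time.

```java
// Hypothetical demo class: constant expressions versus run-time concatenation.
class ConstantExpressionDemo {
    // x is a final variable initialized with a literal, so x + "c" is a
    // constant expression and is interned like a literal.
    static boolean constantConcat() {
        final String x = "ab";
        return (x + "c") == "abc";
    }

    // Here x is only known at run time, so the concatenation creates a new object.
    static boolean runtimeConcat(String x) {
        return (x + "c") == "abc";
    }

    // Explicit interning brings the run-time result back to the shared instance.
    static boolean runtimeConcatInterned(String x) {
        return (x + "c").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(constantConcat());            // true
        System.out.println(runtimeConcat("ab"));         // false
        System.out.println(runtimeConcatInterned("ab")); // true
    }
}
```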
This is because '==' is comparing references. Objects of both A and B have different String value variables (and so each class' constant pool has a separate entry for it); but they are both initialized to the same value. The compiler/JVM is most likely optimizing for space by having them both point to the same compile-time constant value in the bytecode. The '==' operator is NOT comparing constant pool locations.
Edit: to clear up some confusion, this does NOT mean that "==" can be used for string comparison. All I was saying is that it cannot be used to compare constant pool locations either. It is for one thing and one thing only: comparing whether two references point to the same object. For the literals in the question, == returns true because the JLS requires string literals to be interned (as an astute answerer has said); for strings constructed at run time, it may return false even when the contents are equal.
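The situation from the question can be reproduced with a short sketch. The classes A and B below stand in for the asker's classes, and the getter and the demo class are assumptions added so that the private fields can be compared.

```java
// Hypothetical reproduction of the question: two classes with the same literal.
class CrossClassLiteralDemo {
    // Compares the references held by the two unrelated classes.
    static boolean sameInstance() {
        return new A().getValue() == new B().getValue();
    }

    public static void main(String[] args) {
        System.out.println(sameInstance()); // true
    }
}

class A {
    private String value = "abc"; // entry in A's own constant pool
    String getValue() { return value; }
}

class B {
    private String value = "abc"; // separate entry in B's constant pool
    String getValue() { return value; }
}
```

Each class file has its own constant pool entry for "abc", yet both resolve to the same interned String object at run time.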
I have seen many questions regarding object created using string literal and new keyword like:
How many String objects using new operator
But they don't clarify my doubts.
Case 1: String object using string literal.
It creates one object in string constant pool if it is not present, otherwise, return the reference of this object. This object is implicitly interned.
Case 2: String object using new().
It creates 2 objects, one in string constant pool and another one in heap area. Reference variable refers to the heap area object. For this object we need to call intern method to put this object into string constant pool explicitly.
My question is: if new() already creates one object in string constant pool, then what is the use of calling the intern method on the object which is in the heap area?
Case 2: String object using new(). It creates 2 objects, one in string constant pool and another one in heap area.
Only if you create a new String object by passing it a string literal, like this:
String s = new String("hello");
The literal "hello" will cause an object in the string constant pool to be created. The new String will create a new String object on the heap, with a copy of the content of the object for the literal.
You should never create a String object like that, because it's unnecessary and inefficient.
There are however other reasons why you would want to do new String(...), when the value that you pass to the constructor is not a string literal. For example, the value is data read from a file.
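A sketch of such a legitimate use: decoding raw bytes into a String. To stay self-contained, the byte array below is a stand-in for data read from a file, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: building a String from raw data instead of a literal.
class DecodeBytesDemo {
    // Decodes UTF-8 bytes; the constructor is the natural tool for this.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {104, 101, 108, 108, 111}; // UTF-8 bytes for "hello" (stand-in for file content)
        String s = decode(raw);
        System.out.println(s.equals("hello")); // true: same contents
        System.out.println(s == "hello");      // false: not the pooled literal
    }
}
```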
Case 1: String object using string literal. It creates one object in string constant pool
Correct.
if it is not present
Wrong. It is present.
otherwise, return the reference of this object.
It always returns the reference of the object. No 'otherwise' about it.
This object is implicitly interned.
Not really. It is already interned, because it is a string literal. The compiler and class loader see to that. Not the new operator.
Case 2: String object using new(). It creates 2 objects, one in string constant pool
Not really. It was already there: see above.
and another one in heap area.
Correct.
Reference variable refers to the heap area object. For this object we need to call intern method to put this object into string constant pool explicitly.
Correct.
My question is if new() already creates one object in string constant pool
It doesn't. See above.
When I define a StringBuffer variable with new, this string is not added to the String pool, right?
Now, when I define another StringBuffer but not with new, I define it as StrPrev.append("XXX") suddenly it is (or so says my college teacher). Why is that? What makes this string suddenly become a string-pool string?
When I define a StringBuffer variable with new, this string is not added to the String pool, right?
Creating a StringBuffer does not create a String at all.
Now, when I define another StringBuffer but not with new, I define it as StrPrev.append("XXX") suddenly it is.
This is totally confused:
When you call strBuff.append("XXX") you are NOT defining a new StringBuffer. You are updating the existing StringBuffer that strBuff refers to. Specifically, you are adding extra characters to the end of the buffer.
You only get a new String from the StringBuffer when you call strBuff.toString().
You only add a String to the string pool when you call intern() on the String. And that only adds the string to the pool if there is not already an equal string in the pool.
The String object that represents the literal "XXX" is a member of the string pool. But that happens (i.e. the String is added to the pool) when the class is loaded, not when you execute the append call.
(If your teacher told you that StringBuffer puts strings into the Java string pool, he / she is wrong. But, given your rather garbled description, I suspect that you actually misheard or misunderstood what your teacher really said.)
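The points above can be checked with a few reference comparisons. This is a sketch only; the class and method names are invented, and strBuff mirrors the variable name used in this answer.

```java
// Hypothetical demo of how StringBuffer relates to the string pool.
class StringBufferPoolDemo {
    // append() updates the existing buffer and returns that same buffer.
    static boolean appendReturnsSameBuffer() {
        StringBuffer strBuff = new StringBuffer();
        return strBuff.append("XXX") == strBuff;
    }

    // toString() creates a new String that is equal to, but distinct from, the literal.
    static boolean toStringIsNotPooled() {
        String s = new StringBuffer().append("XXX").toString();
        return s != "XXX" && s.equals("XXX");
    }

    // Only an explicit intern() call yields the pooled instance.
    static boolean internReturnsLiteral() {
        String s = new StringBuffer().append("XXX").toString();
        return s.intern() == "XXX";
    }

    public static void main(String[] args) {
        System.out.println(appendReturnsSameBuffer()); // true
        System.out.println(toStringIsNotPooled());     // true
        System.out.println(internReturnsLiteral());    // true
    }
}
```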
"XXX" in StrPrev.append("XXX") is a string literal that is interned at class loading time (class loading time of the class that contains the code).
"XXX" is not added to the pool by the StringBuffer.
From the JLS section 3.10.5:
Moreover, a string literal always refers to the same instance of class
String. This is because string literals - or, more generally, strings
that are the values of constant expressions (§15.28) - are "interned"
so as to share unique instances
From the JLS section 12.5:
Loading of a class or interface that contains a String literal
(§3.10.5) may create a new String object to represent that literal.
(This might not occur if the same String has previously been interned
(§3.10.5).)
buf.append("XXX") followed by buf.toString(), and then returning the string to the pool. With the pool in place, only one StringBuffer object is ever allocated.
Actually, your teacher is referring to "XXX", which goes to the string pool because all string literals written in a Java program go to the string pool during execution.
Java has a string pool, due to which objects of the String class are immutable.
But my question stands -
What was the need to make String POOL?
Why was the String class not kept like other classes, holding its own values?
Does the JVM need some strings internally, or is this a performance benefit? If so, how?
A pool is possible because the strings are immutable. But the immutability of the String hasn't been decided only because of this pool. Immutability has numerous other benefits. BTW, a Double is also immutable, and there is no pool of Doubles.
The need for the String pool is to reduce the memory needed to hold all the String literals (and the interned Strings) a program uses, since these literals have a good chance of being used many times, in many places of the program. Instead of having thousands of copies of the same String literal, you just have thousands of references to the same String, which reduces the memory usage.
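A rough way to see this saving is to count distinct object identities. In the sketch below (all names hypothetical), a literal evaluated a thousand times yields a single object, while the same text created with new yields a thousand.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo: counting distinct String objects by identity.
class PoolSavingsDemo {
    static String literal() { return "status=OK"; }          // pooled instance
    static String copy() { return new String("status=OK"); } // fresh object each call

    // Collects 'count' strings in an identity-based set and reports its size.
    static int distinctObjects(boolean useLiteral, int count) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < count; i++) {
            seen.add(useLiteral ? literal() : copy());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(true, 1000));  // 1
        System.out.println(distinctObjects(false, 1000)); // 1000
    }
}
```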
Note that the String class is not different from other classes: it holds its own char array. In older JDKs (before Java 7 update 6) it could also share that array with other String instances when substring was called.
When the JVM sees that a new String literal has to be created, it first checks the pool for an identical string; if one is found, no new String is created and the existing String is referenced.
The benefit of making String immutable was security. Read below:
Why String has been made immutable in Java?
Performance is also a reason (assuming you are already aware of the internal String pool maintained for making sure that the same String object is used more than once without having to create/re-claim it those many times), but the main reason why String has been made immutable in Java is 'Security'. Surprised? Let's understand why.
Suppose you need to open a secure file which requires the users to authenticate themselves. Let's say there are two users named 'user1' and 'user2' and they have their own password files 'password1' and 'password2', respectively. Obviously 'user2' should not have access to 'password1' file.
As we know the filenames in Java are specified by using Strings. Even if you create a 'File' object, you pass the name of the file as a String only and that String is maintained inside the File object as one of its members.
Had String been mutable, 'user1' could have logged in using his credentials and then somehow managed to change the name of his password file (a String object) from 'password1' to 'password2' before the JVM actually placed the native OS system call to open the file. This would have allowed 'user1' to open user2's password file. Understandably, it would have resulted in a big security flaw in Java. I understand there are so many 'could have's here, but you would certainly agree that it would have opened a door for developers to mess up the security of many resources, either intentionally or unintentionally.
With Strings being immutable, JVM can be sure that the filename instance member of the corresponding File object would keep pointing to same unchanged "filename" String object. The 'filename' instance member being a 'final' in the File class can anyway not be modified to point to any other String object specifying any other file than the intended one (i.e., the one which was used to create the File object).
What was the need to make String POOL?
When created, a String object is stored in the heap, and the String literal that is passed to the constructor is stored in the SP (String Pool). That's why creating String objects with new is not good practice: it creates two objects.
String str = new String("stackoverflow");
Above, the object referenced by str is saved in the heap, and the String literal from the constructor ("stackoverflow") is stored in the String Pool. That is bad for performance: two objects are created.
The flow: creating a String literal -> the JVM looks for the value in the String Pool to find whether the same value exists -> the value is not found (no object to be returned) -> the String literal is created as a new object (internally, with the new keyword) -> but now it is not sent to the heap; it is sent to the String Pool instead.
The difference lies in where the object created with the new keyword ends up. If it is created by the programmer, the object goes to the heap directly. If it is created internally, it is sent to the String Pool. This is done by the method intern(). intern() is invoked internally when declaring a String literal. This method searches the SP for an identical value, and either returns the reference to the existing String object or adds the object to the SP.
When creating a String object with new, intern() is not invoked and the object is stored in the heap. But you can call intern() on String objects: String str = new String().intern(); now str refers to the object in the SP.
ex:
String s1 = new String("hello").intern();
String s2 = "hello";
System.out.println(s1 == s2); // true , because now s1 is in SP