I have read many articles regarding string interning.
If I create a String object
Method 1
String str= new String("test")
2 Objects are created one in heap and other in string pool.
Method 2 if method 1 is not executed
String str= new String("test").intern()
It will copy the string from the heap to the string pool. How many objects will be created? I guess 3: one in the heap, another in the pool, and one for the "test" literal.
Which ones will be eligible for GC in both cases? I have seen articles that say 2 objects are created, but I am unable to understand why.
Method 3
String s= new String("test")
String s1=s.intern()
It does the same thing, except that s points to the heap object and s1 to the pool object, and none of them are eligible for GC.
Is my understanding correct? I am quite confused about this concept.
If I create a String object
String str= new String("test")
Objects are created one in heap and other in string pool.
A String consists of two objects, the String and the char[]. In some versions of Java it could be a byte[], or in fact a char[] which is later replaced by a byte[]. This means that 4, perhaps 5, objects could be created, unless the String for the string literal already exists, in which case it is 2 for Java 7 update 4+. Before that, the char[] would be shared, so it could be three objects or only 1.
String str= new String("test").intern()
This is exactly the same, except that if this is called often enough the new String could be allocated on the stack, and you might find that only the char[] is created, as this cannot be placed on the stack at the moment. In future this might be optimised away also.
Which ones will be eligible for GC in both cases? I have seen articles that say 2 objects are created, but I am unable to understand why.
The answer is anywhere from 1 to 4 depending on the situation. All of them are eligible for collection unless they are being strongly referenced somewhere.
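To make the reference relationships in Methods 1 to 3 concrete, here is a minimal sketch. It only checks object identity; it does not count the backing char[]/byte[] arrays and cannot show GC eligibility. The class and method names are made up for illustration.

```java
final class InternIdentityDemo {
    // Method 1: new String(...) yields an object distinct from the pooled literal,
    // even though the contents are equal.
    static boolean newStringIsDistinct() {
        String literal = "test";
        String copy = new String("test");
        return copy != literal && copy.equals(literal);
    }

    // Methods 2 and 3: intern() returns the pooled instance, which is the same
    // object the literal "test" refers to, not the heap copy.
    static boolean internReturnsPooled() {
        String s = new String("test");
        String s1 = s.intern();
        return s1 == "test" && s1 != s;
    }

    public static void main(String[] args) {
        System.out.println(newStringIsDistinct()); // true
        System.out.println(internReturnsPooled()); // true
    }
}
```

In Method 2 the heap copy has no reference left once intern() returns, so it becomes unreachable; in Method 3 it stays reachable through s.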
String intern() method:
The most common methods for String comparison are the equals() and equalsIgnoreCase() methods. However, these methods compare the contents character by character, which can be costly for long character sequences. The Java String intern() method can help improve the performance of comparisons between two Strings.
The intern() method, when applied to a String object, returns a reference to the canonical String with the same contents, taken from the pool (the hash set of Strings that Java maintains). Thus, if code calls intern() on several equal String objects, the program can use less memory, because all of them end up sharing one object, and the interned references can be compared with ==.
Keep in mind, that Java automatically interns String literals. This means that the intern() method is to be used on Strings that are constructed with new String().
Example:
JavaStringIntern.java
package com.javacodegeeks.javabasics.string;
public class JavaStringIntern {
public static void main(String[] args) {
String str1 = "JavaCodeGeeks";
String str2 = "JavaCodeGeeks";
String str3 = "JavaCodeGeeks".intern();
String str4 = new String("JavaCodeGeeks");
String str5 = new String("JavaCodeGeeks").intern();
System.out.println("Are str1 and str2 the same: " + (str1 == str2));
System.out.println("Are str1 and str3 the same: " + (str1 == str3));
System.out.println("Are str1 and str4 the same: " + (str1 == str4)); //this should be "false" because str4 is not interned
System.out.println("Are str1 and str4.intern() the same: " + (str1 == str4.intern())); //this should be "true" now
System.out.println("Are str1 and str5 the same: " + (str1 == str5));
}
}
Output:
Are str1 and str2 the same: true
Are str1 and str3 the same: true
Are str1 and str4 the same: false
Are str1 and str4.intern() the same: true
Are str1 and str5 the same: true
Points to note:
1. Interning is automatic for String literals; the intern() method is meant for Strings constructed with new String().
2. Strings (more specifically, String objects) will be garbage collected if they ever become unreachable, like any other Java objects.
3. String literals typically are not candidates for garbage collection. There will be an implicit reference from the object to that literal.
The reason for point 3 is that a literal used inside a method to build a String is reachable for as long as the method could be executed.
Related
What is String Interning in Java, when I should use it, and why?
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern()
Basically doing String.intern() on a series of strings will ensure that all strings having same contents share same memory. So if you have list of names where 'john' appears 1000 times, by interning you ensure only one 'john' is actually allocated memory.
This can be useful to reduce memory requirements of your program. But be aware that the cache is maintained by JVM in permanent memory pool which is usually limited in size compared to heap so you should not use intern if you don't have too many duplicate values.
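The 'john' example can be sketched as follows. This is an illustration under assumptions: the class and helper names are made up, and it counts distinct String objects by identity rather than measuring actual memory use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class InternDedupDemo {
    // Counts distinct objects in the list by identity (==), not by equals().
    static int distinctInstances(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(strings);
        return identities.size();
    }

    // Builds n separate String objects that all contain "john".
    static List<String> duplicates(int n) {
        List<String> names = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            names.add(new String("john")); // deliberately a fresh object each time
        }
        return names;
    }

    // Replaces every element with its interned (pooled) representative.
    static List<String> interned(List<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            result.add(name.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = duplicates(1000);
        System.out.println(distinctInstances(names));           // 1000
        System.out.println(distinctInstances(interned(names))); // 1
    }
}
```

Once the original list is dropped, the 999 redundant objects become unreachable and can be collected.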
More on memory constraints of using intern()
On one hand, it is true that you can remove String duplicates by
internalizing them. The problem is that the internalized strings go to
the Permanent Generation, which is an area of the JVM that is reserved
for non-user objects, like Classes, Methods and other internal JVM
objects. The size of this area is limited, and is usually much smaller
than the heap. Calling intern() on a String has the effect of moving
it out from the heap into the permanent generation, and you risk
running out of PermGen space.
--
From: http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html
From JDK 7 (I mean in HotSpot), something has changed.
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
-- From Java SE 7 Features and Enhancements
Update: Interned strings are stored in main heap from Java 7 onwards. http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes
There are some "catchy interview" questions, such as why you get equals! if you execute the below piece of code.
String s1 = "testString";
String s2 = "testString";
if(s1 == s2) System.out.println("equals!");
If you want to compare Strings you should use equals(). The above will print equals because the testString is already interned for you by the compiler. You can intern the strings yourself using intern method as is shown in previous answers....
JLS
JLS 7 3.10.5 defines it and gives a practical example:
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
Example 3.10.5-1. String Literals
The program consisting of the compilation unit (§7.3):
package testPackage;
class Test {
public static void main(String[] args) {
String hello = "Hello", lo = "lo";
System.out.print((hello == "Hello") + " ");
System.out.print((Other.hello == hello) + " ");
System.out.print((other.Other.hello == hello) + " ");
System.out.print((hello == ("Hel"+"lo")) + " ");
System.out.print((hello == ("Hel"+lo)) + " ");
System.out.println(hello == ("Hel"+lo).intern());
}
}
class Other { static String hello = "Hello"; }
and the compilation unit:
package other;
public class Other { public static String hello = "Hello"; }
produces the output:
true true true true false true
JVMS
JVMS 7 5.1 says that interning is implemented magically and efficiently with a dedicated CONSTANT_String_info struct (unlike most other objects which have more generic representations):
A string literal is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode code points constituting the string literal.
The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String (JLS §3.10.5). In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:
("a" + "b" + "c").intern() == "abc"
To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure.
If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.
Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure; a reference to that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
Bytecode
Let's decompile some OpenJDK 7 bytecode to see interning in action.
If we decompile:
public class StringPool {
public static void main(String[] args) {
String a = "abc";
String b = "abc";
String c = new String("abc");
System.out.println(a);
System.out.println(b);
System.out.println(a == c);
}
}
we have on the constant pool:
#2 = String #32 // abc
[...]
#32 = Utf8 abc
and main:
0: ldc #2 // String abc
2: astore_1
3: ldc #2 // String abc
5: astore_2
6: new #3 // class java/lang/String
9: dup
10: ldc #2 // String abc
12: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
15: astore_3
16: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
19: aload_1
20: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
23: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
26: aload_2
27: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: aload_3
35: if_acmpne 42
38: iconst_1
39: goto 43
42: iconst_0
43: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
Note how:
0 and 3: the same ldc #2 constant is loaded (the literals)
12: a new string instance is created (with #2 as argument)
35: a and c are compared as regular objects with if_acmpne
The representation of constant strings is quite magic on the bytecode:
it has a dedicated CONSTANT_String_info structure, unlike regular objects (e.g. new String)
the struct points to a CONSTANT_Utf8_info Structure that contains the data. That is the only necessary data to represent the string.
and the JVMS quote above seems to say that whenever the Utf8 pointed to is the same, then identical instances are loaded by ldc.
I have done similar tests for fields, and:
static final String s = "abc" points to the constant table through the ConstantValue Attribute
non-final fields don't have that attribute, but can still be initialized with ldc
Conclusion: there is direct bytecode support for the string pool, and the memory representation is efficient.
Bonus: compare that to the Integer pool, which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue).
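For comparison, a small sketch of the Integer cache. Autoboxing goes through an ordinary call to Integer.valueOf, whose documentation guarantees caching only for -128 to 127; anything outside that range may or may not be shared, so no expected output is given for that line. The class name is made up for illustration.

```java
final class IntegerCacheDemo {
    // Values in [-128, 127] are guaranteed to come from the cache.
    static boolean smallValuesShared() {
        Integer a = Integer.valueOf(127);
        Integer b = Integer.valueOf(127);
        return a == b;
    }

    // Outside the guaranteed range, only equals() is a reliable comparison.
    static boolean largeValuesEqual() {
        Integer a = Integer.valueOf(1000);
        Integer b = Integer.valueOf(1000);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(smallValuesShared()); // true
        System.out.println(largeValuesEqual());  // true
        // Identity for 1000 depends on the JVM's cache configuration.
        System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000));
    }
}
```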
Update for Java 8 or plus.
In Java 8, PermGen (Permanent Generation) space is removed and replaced by Metaspace. The String pool lives in the main heap of the JVM, where it had already been moved in Java 7.
Compared with Java 7, the String pool size is increased in the heap. Therefore, you have more space for internalized Strings, but you have less memory for the whole application.
One more thing: as you already know, when comparing 2 (references to) objects in Java, '==' compares the references, while 'equals' compares the contents of the objects.
Let's check this code:
String value1 = "70";
String value2 = "70";
String value3 = new Integer(70).toString();
Result:
value1 == value2 ---> true
value1 == value3 ---> false
value1.equals(value3) ---> true
value1 == value3.intern() ---> true
That's why you should use 'equals' to compare 2 String objects. And that is how intern() is useful.
Since strings are objects and since all objects in Java are always stored only in the heap space, all strings are stored in the heap space. However, Java keeps strings created without using the new keyword in a special area of the heap space, which is called "string pool". Java keeps the strings created using the new keyword in the regular heap space.
The purpose of the string pool is to maintain a set of unique strings. Any time you create a new string without using the new keyword, Java checks whether the same string already exists in the string pool. If it does, Java returns a reference to the same String object and if it does not, Java creates a new String object in the string pool and returns its reference. So, for example, if you use the string "hello" twice in your code as shown below, you will get a reference to the same string. We can actually test this theory out by comparing two different reference variables using the == operator as shown in the following code:
String str1 = "hello";
String str2 = "hello";
System.out.println(str1 == str2); //prints true
String str3 = new String("hello");
String str4 = new String("hello");
System.out.println(str1 == str3); //prints false
System.out.println(str3 == str4); //prints false
The == operator simply checks whether two references point to the same object and returns true if they do. In the above code, str2 gets the reference to the same String object which was created earlier. However, str3 and str4 get references to two entirely different String objects. That is why str1 == str2 returns true but str1 == str3 and str3 == str4 return false.
In fact, when you do new String("hello"); two String objects are created instead of just one if this is the first time the string "hello" is used anywhere in the program: one in the string pool because of the use of a quoted string, and one in the regular heap space because of the use of the new keyword.
String pooling is Java's way of saving program memory by avoiding the creation of multiple String objects containing the same value. It is possible to get a string from the string pool for a string created using the new keyword by using String's intern method. It is called "interning" of string objects. For example,
String str1 = "hello";
String str2 = new String("hello");
String str3 = str2.intern(); //get an interned string obj
System.out.println(str1 == str2); //prints false
System.out.println(str1 == str3); //prints true
OCP Java SE 11 Programmer, Deshmukh
String interning is an optimization technique applied by the compiler. If you have two identical string literals in one compilation unit, the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.
I am from C# background, so i can explain by giving a example from that:
object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;
output of the following comparisons:
Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true
Console.WriteLine(obj == str2); // false !?
Note 1: Objects are compared by reference.
Note 2: typeof(int).Name is evaluated by reflection, so it does not get evaluated at compile time. Here, the literals being compared are known at compile time.
Analysis of the results:
1) true, because they both contain the same literal, so the generated code will have only one object referencing "Int32". See Note 1.
2) true, because the contents of both values are compared, and they are the same.
3) false, because str2 and obj do not refer to the same literal. See Note 2.
Java's intern() method basically checks whether an equal String object is already present in the SCP (string constant pool). If yes, it returns that object; if not, it adds the string to the SCP and returns its reference.
For example:
String s1=new String("abc");
String s2="abc";
String s3="abc";
s1==s2// false, because s1 refers to the object in the heap, while s2 refers to the object in the SCP (the literal "abc" passed to the constructor is also in the SCP, but s1 does not reference it)
s2==s3// true
Now if we intern s1:
s1=s1.intern()
//The JVM checks whether a string with value “abc” is present in the pool. Since there is one, its reference is returned.
Notice that we are calling s1 = s1.intern(), so s1 now refers to the string pool object having value “abc”.
At this point, all three variables refer to the same object in the string pool. Hence s1==s2 now returns true.
Given a heap object reference, if we want the corresponding SCP object reference, we should use the intern() method.
Example :
class InternDemo
{
public static void main(String[] args)
{
String s1=new String("smith");
String s2=s1.intern();
String s3="smith";
System.out.println(s2==s3);//true
}
}
I read "when should we use intern method of string on string constants", but I am still not very clear on String == comparison, including with intern(). I have a couple of examples. Can someone help me understand this better?
String s1 = "abc";
String s2 = "abc";
String s3 = "abcabc";
String s4 = s1 + s2;
System.out.println(s3 == s4); // 1. why false ?
System.out.println(s3 == s4.intern()); // 2. why true ?
System.out.println(s4 == s1 + s2); // 3. why false ?
System.out.println(s4 == (s1 + s2).intern()); // 4. why false ?
System.out.println(s4.intern() == (s1 + s2).intern()); // 5. why true ?
There are quite a lot of answers here which explain that, but let me give you another one.
A string is interned into the String literal pool only in two situations: when a class is loaded and the String was a literal or compile time constant. Otherwise only when you call .intern() on a String. Then a copy of this string is listed in the pool and returned. All other string creations will not be interned. String concatenation (+) is producing new instances as long as it is not a compile time constant expression*.
First of all: never ever use it. If you do not understand it you should not use it. Use .equals(). Interning strings for the sake of comparison might be slower than you think and unnecessarily filling the hashtable. Especially for strings with highly different content.
1. s3 is a string literal from the constant pool and therefore interned. s4 is an expression not producing an interned constant.
2. When you intern s4, it has the same content as s3 and is therefore the same instance.
3. Same as s4: the expression is not a constant.
4. If you intern s1+s2 you get the instance of s3, but s4 is still not s3.
5. If you intern s4, it is the same instance as s3.
Some more questions:
System.out.println(s3 == s3.intern()); // is true
System.out.println(s4 == s4.intern()); // is false
System.out.println(s1 == "abc"); // is true
System.out.println(s1 == new String("abc")); // is false
* Compile time constants can be expressions with literals on both sides of the concatenation (like "a" + "bc") but also final String variables initialized from constants or literals:
final String a = "a";
final String b = "b";
final String ab = a + b;
final String ab2 = "a" + b;
final String ab3 = "a" + new String("b");
System.out.println("ab == ab2 should be true: " + (ab == ab2));
System.out.println("a+b == ab should be true: " + (a+b == ab));
System.out.println("ab == ab3 should be false: " + (ab == ab3));
One thing you have to know is, that Strings are Objects in Java. The variables s1 - s4 do not point directly to the text you stored. It is simply a pointer which says where to find the Text within your RAM.
1. It is false because you compare the pointers, not the actual text. The text is the same, but these two Strings are 2 completely different objects, which means they have different pointers. Try printing s1 and s2 on the console and you will see.
2. It's true, because Java does some optimizing concerning Strings. If the JVM detects that two different Strings share the same text, they will be put in something called the "String Literal Pool". Since s3 and s4 share the same text, they will also share the same slot in the "String Literal Pool". The intern() method gets the reference to the String in the literal pool.
3. Same as 1. You compare two pointers, not the text content.
4. As far as I know, added values do not get stored in the pool.
5. Same as 2. They contain the same text, so they get stored in the String Literal Pool and therefore share the same slot.
To start off with, s1, s2, and s3 are in the intern pool when they are declared, because they are declared by a literal. s4 is not in the intern pool to start off with. This is what the intern pool might look like to start off with:
"abc" (s1, s2)
"abcabc" (s3)
1. s4 does not match s3 because s3 is in the intern pool, but s4 is not.
2. intern() is called on s4, so it looks in the pool for other strings equaling "abcabc" and makes them one object. Therefore, s3 and s4.intern() point to the same object.
3. Again, intern() is not called when adding two strings, so it does not match from the intern() pool.
4. s4 is not in the intern pool so it does not match objects with (s1 + s2).intern().
5. These are both interned, so they both look in the intern pool and find each other.
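The five comparisons from the question can be checked with a small runnable sketch. The class name and the array packaging are made up for illustration; the comparisons themselves are taken from the question.

```java
import java.util.Arrays;

final class ConcatInternDemo {
    // Returns the results of comparisons 1 to 5 from the question, in order.
    static boolean[] results() {
        String s1 = "abc";
        String s2 = "abc";
        String s3 = "abcabc";   // literal, so it is in the pool
        String s4 = s1 + s2;    // runtime concatenation, a new object outside the pool
        return new boolean[] {
            s3 == s4,                          // 1: pooled vs. new object
            s3 == s4.intern(),                 // 2: intern() returns the pooled s3
            s4 == s1 + s2,                     // 3: another new object
            s4 == (s1 + s2).intern(),          // 4: s4 itself is still not the pooled one
            s4.intern() == (s1 + s2).intern()  // 5: both sides are the pooled s3
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(results())); // [false, true, false, false, true]
    }
}
```

Note that s1 and s2 are not final here; if they were, s1 + s2 would be a compile-time constant and the results would differ.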
I am unable to see the difference between the following declarations of Strings in Java.
Suppose I have two strings
String str1="one";
String str2="two";
What is the difference between
String str3=new String(str1+str2);
and
String str3=str1+str2;
In both the above declarations, the content of str3 will be onetwo.
Suppose I create a new string
String str4="onetwo";
Then in none of the above declarations,
if(str4==str3) {
System.out.println("This is not executed");
}
Why are str3 and str4 not referring to the same object?
str1 + str2 for strings that are not compile-time constants will be compiled into
new StringBuilder(str1).append(str2).toString(). The result will not be put into, or taken from, the string pool (where interned strings go).
It is a different story in the case of "foo"+"bar", where the compiler knows which values it is working with, so it can concatenate the strings once at compile time to avoid doing it at runtime. Such a string literal will also be interned.
So String str3 = str1+str2; is same as
String str3 = new StringBuilder(str1).append(str2).toString();
and String str3 = new String(str1+str2); is same as
String str3 = new String(new StringBuilder(str1).append(str2).toString());
Again, strings produced as result of method (like substring, replace, toString) are not interned.
This means you are comparing two different instances (which store same characters) and that is why == returns false.
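A minimal sketch of the situation in the question, assuming str1 and str2 are ordinary (non-final) local variables as shown there. The class name and the names viaConcat and viaNew are made up for illustration.

```java
import java.util.Arrays;

final class ConcatVsLiteralDemo {
    static boolean[] compare() {
        String str1 = "one";
        String str2 = "two";
        String viaConcat = str1 + str2;          // built at runtime, not pooled
        String viaNew = new String(str1 + str2); // an extra copy, also not pooled
        String str4 = "onetwo";                  // literal, taken from the pool
        return new boolean[] {
            str4 == viaConcat,         // different objects
            str4 == viaNew,            // different objects
            str4.equals(viaConcat),    // same characters
            str4 == viaConcat.intern() // intern() hands back the pooled literal
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [false, false, true, true]
    }
}
```

Both declarations from the question behave the same with respect to ==; new String(...) only adds one more copy.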
Java does not have memory of "how this variable got the value", therefore it really does not matter which method you use, if the result is same.
As for comparing: if you compare strings with ==, you are comparing the addresses of the objects in memory, not their values, because String is not a primitive data type. You have to use if(str4.equals(str3))
Because Strings in Java are immutable the compiler will optimize and reuse String literals. Thus
String s1 = "one";
String s2 = "one";
s1 == s2; //true because the compiler will reuse the same String (with the same memory address) for the same string literal
s1 == "o" + "ne"; //true because "Strings computed by constant expressions are computed at compile time and then treated as if they were literals"
String s3 = "o";
s1 == s3 + "ne"; //false because the second string is created a run time and is therefore newly created
for a reference see http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.5
Strings are kind of tricky, because there is some effort to share their representation. Plus they're immutable.
The short answer is: unless you're really working in a low level, you should never compare strings using "==".
Even if it works for you, it will be a nightmare for your teammates to maintain.
For a longer answer and a bit of amusement, try the following:
String s1= "a" + "b";
String s2= "a" + "b";
String s3=new String("a"+"b");
System.out.println(s1==s2);
System.out.println(s3==s2);
You'll notice that s1==s2 due to the compiler's effort to share.
However s2 != s3 because you've explicitly asked for a new string.
You're not likely to do anything very smart with it, because it's immutable.
What is String Interning in Java, when I should use it, and why?
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern()
Basically doing String.intern() on a series of strings will ensure that all strings having same contents share same memory. So if you have list of names where 'john' appears 1000 times, by interning you ensure only one 'john' is actually allocated memory.
This can be useful to reduce memory requirements of your program. But be aware that the cache is maintained by JVM in permanent memory pool which is usually limited in size compared to heap so you should not use intern if you don't have too many duplicate values.
More on memory constraints of using intern()
On one hand, it is true that you can remove String duplicates by
internalizing them. The problem is that the internalized strings go to
the Permanent Generation, which is an area of the JVM that is reserved
for non-user objects, like Classes, Methods and other internal JVM
objects. The size of this area is limited, and is usually much smaller
than the heap. Calling intern() on a String has the effect of moving
it out from the heap into the permanent generation, and you risk
running out of PermGen space.
--
From: http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html
From JDK 7 (I mean in HotSpot), something has changed.
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
-- From Java SE 7 Features and Enhancements
Update: Interned strings are stored in main heap from Java 7 onwards. http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes
There are some "catchy interview" questions, such as why you get equals! if you execute the below piece of code.
String s1 = "testString";
String s2 = "testString";
if(s1 == s2) System.out.println("equals!");
If you want to compare Strings you should use equals(). The above will print equals because the testString is already interned for you by the compiler. You can intern the strings yourself using intern method as is shown in previous answers....
JLS
JLS 7 3.10.5 defines it and gives a practical example:
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
Example 3.10.5-1. String Literals
The program consisting of the compilation unit (§7.3):
package testPackage;
class Test {
public static void main(String[] args) {
String hello = "Hello", lo = "lo";
System.out.print((hello == "Hello") + " ");
System.out.print((Other.hello == hello) + " ");
System.out.print((other.Other.hello == hello) + " ");
System.out.print((hello == ("Hel"+"lo")) + " ");
System.out.print((hello == ("Hel"+lo)) + " ");
System.out.println(hello == ("Hel"+lo).intern());
}
}
class Other { static String hello = "Hello"; }
and the compilation unit:
package other;
public class Other { public static String hello = "Hello"; }
produces the output:
true true true true false true
JVMS
JVMS 7 5.1 says says that interning is implemented magically and efficiently with a dedicated CONSTANT_String_info struct (unlike most other objects which have more generic representations):
A string literal is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode code points constituting the string literal.
The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String (JLS §3.10.5). In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:
("a" + "b" + "c").intern() == "abc"
To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure.
If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.
Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure; a reference to that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
Bytecode
Let's decompile some OpenJDK 7 bytecode to see interning in action.
If we decompile:
public class StringPool {
public static void main(String[] args) {
String a = "abc";
String b = "abc";
String c = new String("abc");
System.out.println(a);
System.out.println(b);
System.out.println(a == c);
}
}
we have on the constant pool:
#2 = String #32 // abc
[...]
#32 = Utf8 abc
and main:
0: ldc #2 // String abc
2: astore_1
3: ldc #2 // String abc
5: astore_2
6: new #3 // class java/lang/String
9: dup
10: ldc #2 // String abc
12: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
15: astore_3
16: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
19: aload_1
20: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
23: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
26: aload_2
27: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: aload_3
35: if_acmpne 42
38: iconst_1
39: goto 43
42: iconst_0
43: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
Note how:
0 and 3: the same ldc #2 constant is loaded (the literals)
12: a new string instance is created (with #2 as argument)
35: a and c are compared as regular objects with if_acmpne
The representation of constant strings is quite magic on the bytecode:
it has a dedicated CONSTANT_String_info structure, unlike regular objects (e.g. new String)
the struct points to a CONSTANT_Utf8_info Structure that contains the data. That is the only necessary data to represent the string.
and the JVMS quote above seems to say that whenever the Utf8 pointed to is the same, then identical instances are loaded by ldc.
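A small sketch to check that reading at run time. The class and method names are made up for illustration. Each nested class is compiled to its own class file with its own constant pool, yet both literals should resolve to the same String instance:

```java
class SharedLiteralDemo {
    // First and Second end up in separate .class files,
    // each with its own CONSTANT_String_info entry for "abc".
    static class First {
        static String literal() { return "abc"; }
    }

    static class Second {
        static String literal() { return "abc"; }
    }

    // Compares the two references, not the contents.
    static boolean sameInstance() {
        return First.literal() == Second.literal();
    }

    public static void main(String[] args) {
        System.out.println(sameInstance()); // prints true
    }
}
```

The == here is deliberate: the point is to compare identity, which is exactly what you would not do in normal code.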
I have done similar tests for fields, and:
static final String s = "abc" points to the constant table through the ConstantValue Attribute
non-final fields don't have that attribute, but can still be initialized with ldc
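Here is a minimal sketch of how those two kinds of fields behave at run time. The class, field and method names are hypothetical. It also shows, as a related assumption not covered by the tests above, that a constant expression such as "a" + "bc" is folded by javac and ends up as the same pooled literal, while a concatenation involving a non-final variable is built at run time:

```java
class ConstantFieldDemo {
    // Compile-time constant: stored through the ConstantValue attribute.
    static final String CONSTANT = "abc";

    // Not final: assigned in the static initializer from the same literal.
    static String nonFinal = "abc";

    // Both fields refer to the single pooled "abc" instance.
    static boolean fieldsShareInstance() {
        return CONSTANT == nonFinal;
    }

    // "a" + "bc" is a constant expression, so it is the literal "abc" after compilation.
    static boolean foldedConcatIsPooled() {
        return ("a" + "bc") == CONSTANT;
    }

    // prefix is not final, so the concatenation creates a new String at run time.
    static boolean runtimeConcatIsPooled() {
        String prefix = "a";
        return (prefix + "bc") == CONSTANT;
    }

    public static void main(String[] args) {
        System.out.println(fieldsShareInstance());   // prints true
        System.out.println(foldedConcatIsPooled());  // prints true
        System.out.println(runtimeConcatIsPooled()); // prints false
    }
}
```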
Conclusion: there is direct bytecode support for the string pool, and the memory representation is efficient.
Bonus: compare that to the Integer pool, which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue).
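For comparison, a short sketch of the Integer cache (class and method names are made up). The cache is a library mechanism: autoboxing compiles to a call to Integer.valueOf, which returns cached instances for values from -128 to 127. Outside that range the result of == depends on JVM settings, so the sketch does not state an expected result for it:

```java
class IntegerCacheDemo {
    // True when two boxings of the same value yield the same object.
    static boolean cachedSame(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        Integer a = 127;
        Integer b = 127; // autoboxing goes through Integer.valueOf
        System.out.println(a == b); // prints true: 127 is inside the guaranteed cache range

        Integer c = 1000;
        Integer d = 1000;
        System.out.println(c.equals(d)); // prints true: equals compares the values
        // c == d is usually false here, but that depends on the cache size the JVM uses.
    }
}
```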
Update for Java 8 and later.
In Java 8, the PermGen (Permanent Generation) space was removed and replaced by Metaspace. The string pool itself had already been moved out of PermGen into the main heap in Java 7.
Since then, the pool is limited by the heap size rather than by the fixed PermGen size. You therefore have more space for interned strings, but they share heap memory with the rest of the application.
One more thing: as you already know, when comparing two object references in Java, '==' compares the references themselves, while 'equals' compares the contents of the objects.
Let's check this code:
String value1 = "70";
String value2 = "70";
String value3 = new Integer(70).toString();
Result:
value1 == value2 ---> true
value1 == value3 ---> false
value1.equals(value3) ---> true
value1 == value3.intern() ---> true
That's why you should use 'equals' to compare two String objects. It also shows how intern() is useful: it returns the pooled instance, so an interned value can be compared with '=='.
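To make the effect of intern() more visible, here is a rough sketch that counts distinct String objects by identity before and after interning. The class, the helper methods and the "order-70" value are all invented for the example:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternDedupDemo {
    // Counts distinct objects by identity (==), not by equals().
    static int distinctInstances(String[] values) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(seen, values);
        return seen.size();
    }

    // Builds n equal strings at run time; each toString() call creates a new object.
    static String[] buildEqualStrings(int n) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = new StringBuilder("order-").append(70).toString();
        }
        return out;
    }

    // Replaces every element with its pooled representative.
    static String[] internAll(String[] values) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i].intern();
        }
        return out;
    }

    public static void main(String[] args) {
        String[] raw = buildEqualStrings(5);
        System.out.println(distinctInstances(raw));            // prints 5
        System.out.println(distinctInstances(internAll(raw))); // prints 1
    }
}
```

After interning, all five references point to one pooled object, which is why '==' works on interned values.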
Since strings are objects and since all objects in Java are always stored only in the heap space, all strings are stored in the heap space. However, Java keeps strings created without using the new keyword in a special area of the heap space, which is called "string pool". Java keeps the strings created using the new keyword in the regular heap space.
The purpose of the string pool is to maintain a set of unique strings. Any time you create a new string without using the new keyword, Java checks whether the same string already exists in the string pool. If it does, Java returns a reference to the same String object and if it does not, Java creates a new String object in the string pool and returns its reference. So, for example, if you use the string "hello" twice in your code as shown below, you will get a reference to the same string. We can actually test this theory out by comparing two different reference variables using the == operator as shown in the following code:
String str1 = "hello";
String str2 = "hello";
System.out.println(str1 == str2); //prints true
String str3 = new String("hello");
String str4 = new String("hello");
System.out.println(str1 == str3); //prints false
System.out.println(str3 == str4); //prints false
The == operator simply checks whether two references point to the same object and returns true if they do. In the above code, str2 gets the reference to the same String object which was created earlier. However, str3 and str4 get references to two entirely different String objects. That is why str1 == str2 returns true but str1 == str3 and str3 == str4 return false.
In fact, when you do new String("hello"); two String objects are created instead of just one if this is the first time the string "hello" is used anywhere in the program: one in the string pool because of the use of a quoted string, and one in the regular heap space because of the use of the new keyword.
String pooling is Java's way of saving program memory by avoiding the creation of multiple String objects containing the same value. It is possible to get a string from the string pool for a string created using the new keyword by using String's intern method. It is called "interning" of string objects. For example,
String str1 = "hello";
String str2 = new String("hello");
String str3 = str2.intern(); //get an interned string obj
System.out.println(str1 == str2); //prints false
System.out.println(str1 == str3); //prints true
OCP Java SE 11 Programmer, Deshmukh
String interning is an optimization technique used by the compiler. If you have two identical string literals in one compilation unit, the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.
I come from a C# background, so I can explain with an example from there:
object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;
Output of the following comparisons:
Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true
Console.WriteLine(obj == str2); // false !?
Note 1: Objects are compared by reference.
Note 2: typeof(int).Name is obtained through reflection, so it is not evaluated at compile time; it is a string created at run time. Which == is used (reference comparison or string comparison) is decided at compile time from the static types of the operands.
Analysis of the results:
1) true because both refer to the same literal, so the generated code has only one object for "Int32". See Note 1.
2) true because both operands are typed as string, so their contents are compared, and the contents are the same.
3) false because obj is typed as object, so the references are compared, and str2 does not refer to the object created for the literal. See Note 2.
Java's intern() method checks whether an equal String object is already present in the SCP (String Constant Pool). If it is, intern() returns that object; if not, it adds the string to the SCP and returns a reference to it.
For example:
String s1=new String("abc");
String s2="abc";
String s3="abc";
s1==s2 // false: new String("abc") creates one object on the heap (referenced by s1) and one in the SCP for the literal (with no explicit reference from s1); s2 refers to the SCP object
s2==s3 // true
Now if we call intern() on s1:
s1=s1.intern();
// The JVM checks whether a string with the value "abc" is present in the pool. Since there is one, its reference is returned.
Notice that we are calling s1 = s1.intern(), so s1 now refers to the string pool object with the value "abc".
At this point, all three references point to the same object in the string pool. Hence s1==s2 now returns true.
If we have a reference to a heap object and want the reference to the corresponding SCP object, we should use the intern() method.
Example:
class InternDemo
{
public static void main(String[] args)
{
String s1=new String("smith");
String s2=s1.intern();
String s3="smith";
System.out.println(s2==s3);//true
}
}
[Figure: intern flow chart]
When initializing a String object there are at least two ways, such as:
String s = "some string";
String s = new String("some string");
What's the difference?
The Java language has special handling for strings; a string literal automatically becomes a String object.
So in the first case, you're initializing the reference s to that String object.
In the second case, you're creating a new String object, passing in a reference to the original String object as a constructor parameter. In other words, you're creating a copy. The reference s is then initialized to refer to that copy.
In the first case, the string is taken from the pool if it already exists there.
In the second case, you explicitly create a new String object.
You can check this with these lines:
String s1 = "blahblah";
String s2 = "blahblah";
String s3 = new String("blahblah");
String s4 = s3.intern();
System.out.println(s1 == s2);
System.out.println(s1 == s3);
System.out.println(s2 == s3);
System.out.println(s1 == s4);
Output:
true
false
false
true
String s = "some string"; assigns s a reference to that value from the string pool (located in PermGen before Java 7, on the heap since then), creating the pool entry if it does not exist.
String s = new String("some string"); creates a new string with the value given to the constructor, with memory allocated on the heap.
The first method is recommended, as it reuses the literal from the string pool.
Semantically, the first one assigns "some string" to s, while the second one assigns a copy of "some string" to s (since "some string" is already a String). I see no practical reasons to do this in 99.9% of cases, thus I would say that in most contexts, the only difference between the two lines is that:
The second line is longer.
The second line might consume more memory than the first one.
As @Anish-Dasappan mentions, the second one will have its value on the heap, whereas the first one will be in the string pool. I'm not sure this is of any interest to the programmer, but I might be missing a clever trick there.