I'm aware of the string pool in the JVM and the difference between literals and String objects. I know that literals are automatically interned, but what is the purpose of this line then:
public static final String PARAMETER = "value".intern();
Whenever I search for this, I find a ton of text that explains the same thing, emphasizing the difference between literals and objects and mentioning that literals are already interned. So I'd like to know the underlying reason for using that tricky line with intern() instead of a plain literal.
The main benefit of that type of code is to prevent compile-time constants from being inlined.
For instance, suppose you have a constant in one class and you have many other classes refer to that constant. Normally if you change the constant value, you'd need to recompile all the classes involved. Using intern() on the field will prevent inlining of the constant and means it would be sufficient to recompile just your constant class. In theory this approach would have inferior performance, although I have no idea how significant that might be.
This SO question covers the same topic and is a useful read.
To prevent compile time inlining
The only use case I can think of is to prevent the field from being treated as a compile-time constant, which prevents inlining.
One reason can be that someone may want to change the value using reflection. Other classes will only see such a change if the string is declared like that, because otherwise they hold an inlined copy.
It also lets you change the class file containing the constant without recompiling its users, as @Duncan explains in his answer. That is a good reason too.
How does that work?
When you declare a String as public static final String CONST = "abc";, it is a compile-time constant and will be inlined into every class that uses it.
class A{
private void foo(){
String newString = CONST + someDynamicValue;
}
}
After compiling this, if you decompile the class you will find
class A{
private void foo(){
String newString = "abc" + someDynamicValue;
}
}
But when you initialize it with .intern() or any other method call, it is no longer a compile-time constant, so the value is fetched from the declaring class at run time every time it is used. Java inlines true constants for efficiency.
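One way to observe the difference at run time, as a rough sketch: reading a true compile-time constant does not trigger initialization of the class that declares it, because the value was already copied into the caller, whereas reading the intern() variant is a real field access. The names below (InlineDemo, Inlined, NotInlined, LOG) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InlineDemo {
    // Records which nested classes actually get initialized.
    static final List<String> LOG = new ArrayList<>();

    static class Inlined {
        static final String VALUE = "abc";           // compile-time constant
        static { LOG.add("Inlined initialized"); }
    }

    static class NotInlined {
        static final String VALUE = "abc".intern();  // method call: not a constant
        static { LOG.add("NotInlined initialized"); }
    }

    static String run() {
        // javac copies "abc" into this method, so Inlined is never touched.
        String a = Inlined.VALUE;
        // This is a real field read, so NotInlined is initialized first.
        String b = NotInlined.VALUE;
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(LOG);   // only NotInlined shows up
    }
}
```

The same mechanism is why a changed Inlined.VALUE would not reach already compiled callers, while a changed NotInlined.VALUE would.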
Example from @Marko's comment:
the SWT library uses a similar trick to stop from inlining int
constants, and the only reason is to be consistent with respect to
upgrading the SWT library
Java automatically interns String literals. This means that in many cases, the == operator appears to work for Strings in the same way that it does for ints or other primitive values.
Since interning is automatic for String literals, the intern() method is only useful on Strings created at run time, for example with new String().
Using your example:
String s1 = "Shoaib";
String s2 = "Shoaib";
String s3 = "Shoaib".intern();
String s4 = new String("Shoaib");
String s5 = new String("Shoaib").intern();
if ( s1 == s2 ){
System.out.println("s1 and s2 are same"); // 1.
}
if ( s1 == s3 ){
System.out.println("s1 and s3 are same" ); // 2.
}
if ( s1 == s4 ){
System.out.println("s1 and s4 are same" ); // 3.
}
if ( s1 == s5 ){
System.out.println("s1 and s5 are same" ); // 4.
}
will print:
s1 and s2 are same
s1 and s3 are same
s1 and s5 are same
Explanation in simple words:
-> Suppose the String pool holds "Shoaib", and its reference is, say, 2020.
-> Any object you create with new String("Shoaib") gets a different reference, say 3030.
-> If you want the pooled reference of "Shoaib" instead of the one from new String("Shoaib"), call intern() on it.
So the "value".intern() you asked about makes no difference as far as interning is concerned, because the literal is already in the pool.
intern() gives you the reference held in the String constant pool.
Your variable may hold a value that is already pooled, in which case it does not need to be allocated again.
intern() checks whether an equal string is already in the String constant pool.
If it is not, the JVM adds it to the pool and returns that reference. Everyone who interns the same value then shares one object, which saves memory.
As far as interning goes, the line
public static final String PARAMETER = "value".intern();
only looks the string up in the string constant pool. Because the literal is already there, the pooled instance is returned and nothing new is added.
Related
I was trying to understand String#intern method. Now it has caused even more confusion.
package com;
public class Main {
public static void main(String[] args) {
String s1 = new String("GFG"); // Line-1
String s2 = s1.concat("GFG"); // Line-2
s2.intern(); // Line-3
String s4 = "GFGGFG"; // Line-4
// s2.intern(); // Line -5
System.out.println(s2 == s4);
}
}
The above code prints true. If I comment out Line-3 and uncomment Line-5, it prints false.
At Line-3 the SCP (string constant pool) is checked and the String is added to it.
But how is s2 equal to s4 in that case?
s2 still references an object on the heap, which in turn points to the SCP constant, right?
Can anyone please explain what is happening? I've gone through different SO questions but still could not understand it.
EDIT
I'm just trying to understand the intern method. I know the difference between == and equals, and that the latter is preferred.
String.intern() returns a canonical representation for the string object.
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
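A small sketch of that last property, using strings that are deliberately built at run time so that none of them starts out as a pooled literal. The class and method names are illustrative only.

```java
class InternEquivalence {
    // True exactly when s and t have equal contents, per the contract above.
    static boolean sameCanonical(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String a = new String("hello");                               // fresh heap object
        String b = new StringBuilder("hel").append("lo").toString(); // another fresh object
        String c = new String("world");

        System.out.println(a == b);               // false: two distinct objects
        System.out.println(sameCanonical(a, b));  // true: equal contents
        System.out.println(sameCanonical(a, c));  // false: different contents
    }
}
```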
In simple words, intern() makes sure that there is exactly one shared copy of a given string's contents in memory (in the string constant pool), and everyone who interns equal contents shares it.
Applying String.intern() to a set of strings ensures that all strings with the same contents share the same memory. For example, if the name 'Amy' appears 100 times, interning ensures that only one 'Amy' is actually kept in memory.
To prove it, we can use the == operator (which compares references) and the equals method (which compares contents).
public class InternExample {
    public static void main(String[] args) {
        String s1 = new String("hello");
        String s2 = "hello";
        String s3 = s1.intern(); // returns the string from the pool, which is the same object as s2
        System.out.println(s1 == s2); // false: the variables point to different instances
        System.out.println(s2 == s3); // true: the variables point to the same instance
        System.out.println(s2.equals(s3)); // true: the contents are the same
    }
}
Output:
false
true
true
Explanation:
Case-1: s1 and s2 have the same content but point to different objects, because new String always creates a new object, so the result is false.
Case-2: s3 is the result of s1.intern(), which returns the pooled instance. That is the same object the literal s2 refers to, so the result is true.
Case-3: s2 and s3 have the same content, so equals returns true.
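The 'Amy' claim above can be checked with a short sketch that counts objects by reference rather than by equals. The helper names (InternDedup, copies, distinctObjects) are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternDedup {
    // Builds n separate String objects with the same contents, optionally interned.
    static List<String> copies(String value, int n, boolean intern) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String copy = new String(value);   // always a distinct object
            result.add(intern ? copy.intern() : copy);
        }
        return result;
    }

    // Counts distinct objects (by reference, not by equals).
    static int distinctObjects(List<String> strings) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(strings);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(copies("Amy", 100, false))); // 100
        System.out.println(distinctObjects(copies("Amy", 100, true)));  // 1
    }
}
```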
It is very simple ... on the surface.
If you had written:
s2 = s2.intern();
Then the position of the call would be irrelevant: it would always yield s2 == s4.
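As a hedged sketch of that assignment variant: once the result of intern() is assigned back, the comparison holds whether the call comes before or after the literal. InternAssignDemo and its method names are invented for this example.

```java
class InternAssignDemo {
    // intern() is called before the literal is evaluated.
    static boolean internBeforeLiteral() {
        String s2 = new String("GFG").concat("GFG");
        s2 = s2.intern();          // s2 now refers to the canonical instance
        String s4 = "GFGGFG";      // the literal resolves to the same instance
        return s2 == s4;
    }

    // intern() is called after the literal is evaluated.
    static boolean internAfterLiteral() {
        String s2 = new String("GFG").concat("GFG");
        String s4 = "GFGGFG";
        s2 = s2.intern();          // returns the instance the literal already uses
        return s2 == s4;
    }

    public static void main(String[] args) {
        System.out.println(internBeforeLiteral()); // true
        System.out.println(internAfterLiteral());  // true
    }
}
```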
What happens without the assignment looks like a miracle: s2 itself ends up being the pooled string.
It is as if the JVM exchanged the string behind s2 under the hood.
(Disassembling with javap -c did not show me anything special.)
As intern is native and slow,
and the JVM is involved, I am not willing to dive further into this esoteric subject of String interning. Anything I said about what is happening would be pure speculation.
But definitely baffling, an interesting issue.
With Java 1.6 the output is false true, but with Java 1.8 it changed to true true.
Can someone explain why this is happening?
My understanding is that intern() returns the string constant pool entry that corresponds to an object on the heap, and creates that entry if it does not exist yet. Please correct me if my understanding is wrong.
public class Intern_String2 {
public static void main(String[] args) {
String s1 = new String("durga"); //object created in heap
String s2 = s1.concat("software");
//object durga software created in heap at runtime
String s3 = s2.intern();
// create durga software object in string constant pool as none exist.
System.out.println(s2==s3);//should be false but print true in 1.8 version.
String s4 = "durgasoftware";
System.out.println(s3==s4);//prints true in both version..
}
}
String.intern() returns the canonical instance of a String. It is allowed to return the very String you called it on (the receiver): that happens when no equal String is in the internal table yet, in which case the receiver becomes the canonical instance. Conversely, if an equal String is already in the internal table, intern() returns that one.
String s2 = "web".concat("sarvar");
String s3 = s2.intern();
System.out.println(s2 == s3); // prints "true"
String s4 = "web".concat("sarvar");
String s5 = s4.intern();
System.out.println(s4 == s5); // prints "false"
I would say that this happens in Java 6 because the String pool was implemented in PermGen. From Java 7 on, String.intern() uses the heap.
See this link for more details...
The JLS does specify what becomes part of the constant pool: string literals and anything returned by String.intern().
It does not really specify when a literal becomes part of it (on first use, or when the class defining the literal is loaded). It also doesn't state what does not become part of it, or what other things might be interned.
So based on your experiment I guess they changed the point at which Strings become part of the constant pool, basically from class loading to first use. That way String.intern() can return "this" while adding it to the constant pool, and the literal resolves to that same instance when it is first used.
If I write this code :
String s = new String("TestString");
I understand how s refers to a string created dynamically. s is not an object in itself, but refers to one.
But I am not able to figure out what this means :
String s = "TestString";
Q1. If it would have been some primitive data type I would understand, but what does this signify for a class type ?
Q2. Is this kind of initialization allowed for user created classes as well ?
Java Level : Beginner
Q1. If it would have been some primitive data type I would understand, but what does this signify for a class type ?
In this case, "TestString" is a string literal. A string literal also serves as a reference to an instance of String. This is per the language specification, §3.10.5. So, in your particular case "TestString" is a reference to an instance of String, and you are assigning that same reference to your variable s.
Now, there are some rather special things about Strings that are referred to by literals. Two string literals with the same value (logically, as strings) always refer to the same instance of String. This is due to the "interning" of string literals.
However, when you say
String s = new String("TestString");
it is still the case that "TestString" refers to an instance of String, in fact to an instance in the string intern pool, but s does not refer to that same instance. Instead, s is initialized with contents equal to "TestString" but refers to a newly created object. That is:
String s = new String("TestString");
String t = "TestString";
System.out.println(s == t);
This will print false.
Q2. Is this kind of initialization allowed for user created classes as well ?
No.
String s = "TestString";
is the normal way to create a String. In fact, when you do:
String s = new String("TestString");
what you're doing is creating a string first and then passing it as an argument to new String(). So the question is not why the first form exists, but why the second one does. The answer is pretty subtle and you will probably never care: the first form uses a String literal, which is shared across the whole VM and is not garbage collected while the class that defines it stays loaded. The second form creates an ordinary object that can be collected. This means there are cases where, for performance reasons, you may want the second form, such as when working with very large strings.
You can read more about it here:
http://kjetilod.blogspot.com.es/2008/09/string-constructor-considered-useless.html
From the Oracle Documentation:
The most direct way to create a string is to write:
String greeting = "Hello world!";
In this case, "Hello world!" is a string literal—a series of
characters in your code that is enclosed in double quotes. Whenever it
encounters a string literal in your code, the compiler creates a
String object with its value—in this case, Hello world!.
As with any other object, you can create String objects by using the
new keyword and a constructor.
Q1. If it would have been some primitive data type I would understand,
but what does this signify for a class type ?
This is a special case. For String literals, the statement String s = "someString" means that s refers to someString, which is stored in the string constant pool. someString is an instance of the String class, but one that is kept in the string literal pool.
The special thing about the String literal pool is this:
String s = "someString";
String s1 = "someString";
Here, s == s1 will return true, as they refer to the same object in the string literal pool.
String s2 = new String("someString");
String s3 = new String("someString");
Here, s2 == s3 will return false, as each string is created as a separate object outside the pool.
You can find a good tutorial regarding strings here
http://www.thejavageek.com/2013/06/19/the-string-constant-pool/
http://www.thejavageek.com/2013/06/17/string-immutability-in-java/
Q2. Is this kind of initialization allowed for user created classes as
well ?
No we can't.
String s1 = new String("string");
String s2 = new String("string");
String s3 = "string";
String s4 = "string";
System.out.println(s1 == s2); //FALSE
System.out.println(s2.equals(s1)); //TRUE
System.out.println(s3 == s4); //TRUE
System.out.println(s3.equals(s4)); //TRUE
What is the difference between the creation of s1 and s3?
Please let me know.
There is only one String class, so why are these two treated differently?
s1 and s2 have different memory addresses, while s3 and s4 have the same memory address.
Why does it depend on the new operator?
The String objects that represent string literals in your Java source code are added to a shared String pool when the classes that define them are loaded (see note 1 below). This ensures that all "copies" of a String literal are actually the same object ... even if the literal appears in multiple classes. That is why s3 == s4 is true.
By contrast, when you new a String, a distinct new String object is created. That is why s1 == s2 is false. (This is a fundamental property of new. It is guaranteed to create and return a new object ... if it completes normally.)
However, in either case, the strings will have the same characters, and that is why equals is returning true.
While it is important to understand what is going on, the real lesson is that the correct way to compare Java strings is to use equals and not ==.
If you want to arrange that your String objects can be tested for equality using ==, you can "intern" them using the String.intern method. However, you have to do this consistently ... and interning is an expensive process in various respects ... so it is generally not a good idea.
1 - Actually, it is a bit more complicated than that. The objects get added to the pool at some time between class loading and first use of the literals. The precise timing is unspecified and JVM implementation dependent. However, it is guaranteed to happen just once, and before any application code sees the String object reference corresponding to the literal.
s1 is a new String object that does not belong to a part of any pooled instance. s3 is an instance of a string that comes from a pool. Lookup java String pool. Take a look at the related intern() method on String.
The concept is not unique to java. String interning is supported in other languages. On that related note, pooling frequently used objects follows the flyweight pattern and is not limited to Strings. Take a look at Integer.valueOf(). Integers have a constant pool of their own too.
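As a quick sketch of the Integer case: Integer.valueOf is documented to cache at least the values from -128 to 127, so repeated calls in that range hand back the same object. Outside that range the outcome depends on the JVM and its settings, so the sketch prints it without relying on it. IntegerCacheDemo is a hypothetical name.

```java
class IntegerCacheDemo {
    // Compares two boxed values by reference, not by numeric value.
    static boolean sameBoxedObject(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedObject(127));   // true: inside the guaranteed cache range
        System.out.println(sameBoxedObject(1000));  // often false, but not guaranteed either way
        // equals() is the reliable comparison for boxed values.
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000))); // true
    }
}
```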
The JVM has an automatic optimisation for string literals. Unless you explicitly create a new String object, a literal whose value already exists in the String pool does not produce a new object. Instead, you are given a reference to the equal String object that already exists.
Essentially, when you use the second option (literals), this is what happens:
Step 1
The first object is created and placed in the String pool.
Step 2
Before the second object is created, the String pool is checked for the value.
If that value already exists, there is no need to create a new object. The pool just returns the reference to the existing String object.
Step 3
Instead of being assigned a new object, the variable is simply given a reference to the object made in step 1. This is to save memory.
This happens because the new operator forces creation of a new instance of String, while in the second case, as String is an immutable class, the JVM provides you with the same String instance for both variables to save memory. As there is no chance that one of these objects will change and cause the second one to change as well (immutable, remember?), this is OK.
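A minimal sketch of why sharing is safe: an operation that "modifies" a String actually returns a new object, so the shared instance is never affected. SharedImmutableDemo and its variable names are assumptions made for the example.

```java
import java.util.Locale;

class SharedImmutableDemo {
    static String[] demo() {
        String first = "SomeName";
        String second = "SomeName";                       // same pooled instance as first
        String changed = first.toUpperCase(Locale.ROOT);  // a new String; first is untouched
        return new String[] { first, second, changed };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] == r[1]); // true: both variables share one object
        System.out.println(r[1]);         // SomeName: unaffected by the call on first
        System.out.println(r[2]);         // SOMENAME
    }
}
```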
A quick and confusing question. If class A and class B both contain this:
String name="SomeName";
and both classes are instantiated, is it true that both instances refer to the same memory location for "name" when we access objA.name or objB.name? Since the value is "SomeName" and String is immutable, do several instances of both classes in the same JVM reuse the same object? I read somewhere online that, unless
String example=new String("something");
is used, the former declaration always creates one copy, which stays in use until all applications using it have terminated and the memory can be reclaimed.
Note: I see several answers. Which one can I count on? Can someone sum it up? Thank you all for your effort, I appreciate it.
Yes, if you create two strings like:
String a = "Hello";
String b = "Hello";
They will be the exact same object. You can test it yourself by doing
System.out.println(a == b);
If they are the same object, then their internal reference to the character array will be exactly the same.
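Now, if you did String c = "Hell" + "o";, it would still be the same reference, because a concatenation of two literals is a compile-time constant that the compiler folds into "Hello". Only when one of the operands is not a constant (for example a non-final variable holding "Hell") is the string built at run time, internally via StringBuilder or a similar mechanism, and then it is a different object.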
There is a lot of good information here.
The relevant section says (note: the following is copied from that web site):
As mentioned, there are two ways to construct a string: implicit construction by assigning a String literal or explicitly creating a String object via the new operator and constructor. For example,
String s1 = "Hello"; // String literal
String s2 = "Hello"; // String literal
String s3 = s1; // same reference
String s4 = new String("Hello"); // String object
String s5 = new String("Hello"); // String object
Java has designed a special mechanism for keeping the String literals - in a so-called string common pool. If two String literals have the same contents, they will share the same storage locations inside the common pool. This approach is adopted to conserve storage for frequently-used strings. On the other hand, String objects created via the new operator are kept in the heap. Each String object in the heap has its own storage just like any other object. There is no sharing of storage in heap even if two String objects have the same contents.
You can use the method equals() of the String class to compare the contents of two Strings. You can use the relational equality operator '==' to compare the references (or pointers) of two objects. Study the following codes:
s1 == s1; // true, same pointer
s1 == s2; // true, s1 and s2 share storage in common pool
s1 == s3; // true, s3 is assigned same pointer as s1
s1.equals(s3); // true, same contents
s1 == s4; // false, different pointers
s1.equals(s4); // true, same contents
s4 == s5; // false, different pointers in heap
s4.equals(s5); // true, same contents
Edit to add: Run this SSCCE to test reference equality between two constant strings in two different classes:
class T {
String string = "Hello";
public static void main(String args[]) {
T t = new T();
T2 t2 = new T2();
System.out.println(t.string == t2.string);
}
}
class T2 {
String string = "Hello";
}
prints out true.
If "something" is literally hard-coded into your source code, then the two variables will point to the same in-memory String object.
Per the Java spec, a string literal (one that's defined as a literal in the byte codes) is "interned", so that any reference to that literal will obtain the exact same pointer, even if the reference is to an identical literal in an entirely separate class.
A string constructed at run time (e.g., new String("abc"), or a concatenation that involves a non-constant variable) will not be interned, and so the pointer will generally be unique. (But note that a concatenation of literals such as "abc" + "xyz" is a compile-time constant expression, which the compiler folds into the single literal "abcxyz", resulting in an interned value.)
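A hedged sketch of the concatenation cases: literal + literal is folded at compile time and ends up interned, while a concatenation that involves a non-final variable is evaluated at run time and produces a new object. ConcatDemo and its method names are illustrative.

```java
class ConcatDemo {
    // Both operands are literals: a constant expression, folded by javac.
    static boolean literalConcatIsPooled() {
        String folded = "abc" + "xyz";
        return folded == "abcxyz";
    }

    // A non-final local is not a constant variable, so this runs at run time.
    static boolean runtimeConcatIsPooled() {
        String left = "abc";
        String built = left + "xyz";
        return built == "abcxyz";
    }

    // A final local initialized with a literal counts as a constant again.
    static boolean finalLocalConcatIsPooled() {
        final String left = "abc";
        String folded = left + "xyz";
        return folded == "abcxyz";
    }

    public static void main(String[] args) {
        System.out.println(literalConcatIsPooled());    // true
        System.out.println(runtimeConcatIsPooled());    // false
        System.out.println(finalLocalConcatIsPooled()); // true
    }
}
```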
the former declaration always creates one copy and it is used until all its applications are terminated for reclaiming memory.
Strings, like other objects, are reclaimed when a GC is performed and there is no strong reference to them. Even interned Strings can be cleaned up when they are no longer used.
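I would add one more detail to all the answers above. In Java, the interning of string literals is required by the language specification (JLS §3.10.5), so it is more than a compiler optimization. In C#, by contrast, it can be relaxed. Either way, it's not good to rely on reference equality for strings in general, because strings built at run time are not interned automatically.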
It may also behave differently in different compilers/VM's implementations