Is string interning done at compile time in Java? [duplicate] - java

This question already has answers here:
When are Java Strings interned?
(2 answers)
Closed 7 years ago.
I am really confused about how string interning works in Java. When I write:
String a = "ABC";
String b = "ABC";
if (a == b)
    System.out.println("Equal");
Does the compiler store the string literal "ABC" into the string constant pool at compile time?
That sounds illogical, because I thought the string constant pool was created by the JVM at runtime, and I don't see how that is possible if it is done at compile time since the Java compiler does not even invoke the JVM.
If it is not done at compile time and it is done at runtime then why does the following return false (taken from this answer)?
// But .substring() is invoked at runtime, generating distinct objects
"test" == "!test".substring(1) // --> false
If it is done at runtime then why can't the JVM figure out that they are the same string?
I am really confused as to how string interning works in Java and where exactly the Java string pool is stored.

The compiler puts the literal strings in the class file (only unique ones; it consolidates all equivalent literals), and the JVM loads those strings into the string pool when the class file is loaded.
If it is done at runtime, then why can't the JVM figure out that they are the same String?
Because the string being returned by .substring has not been interned, and so is a different object than the equivalent "test" string in the string pool. If you interned it, you'd get true:
"test" == "!test".substring(1).intern() // true
Sections §4.4 (The Constant Pool) and §5.3 (Creation and Loading) of the JVM specification look relevant.
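To make the substring/intern point concrete, here is a minimal runnable sketch (the class and method names are illustrative only, not from the original answer). It relies only on documented `String` behaviour: `substring(1)` builds a new object at run time, while `intern()` returns the pooled instance that the literal `"test"` already refers to.

```java
class SubstringInternDemo {
    // substring(1) builds a new String at run time, so it is a different object from the literal
    static boolean substringIsLiteral() {
        return "test" == "!test".substring(1);
    }

    // intern() returns the pooled instance, which is the one the literal "test" refers to
    static boolean internedSubstringIsLiteral() {
        return "test" == "!test".substring(1).intern();
    }

    public static void main(String[] args) {
        System.out.println(substringIsLiteral());                 // false
        System.out.println(internedSubstringIsLiteral());         // true
        System.out.println("test".equals("!test".substring(1))); // true: same contents
    }
}
```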
Just to be clear: The correct way to compare strings in Java is to use the .equals method or similar, not ==. Using == with string instances is usually incorrect. (Unless you're playing with understanding when and how things are interned...)
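As a sketch of what "use .equals or similar" looks like in practice (the class name `CompareStrings` is hypothetical), the example below builds "ABC" at run time with a `StringBuilder`, whose `toString()` always allocates a new String, and compares it both ways. `java.util.Objects.equals` is shown as one "similar" option because it also handles `null`.

```java
import java.util.Objects;

class CompareStrings {
    // Content comparison: true whenever both strings hold the same characters
    static boolean sameText(String a, String b) {
        return Objects.equals(a, b); // null-safe: also true when both are null
    }

    public static void main(String[] args) {
        String literal = "ABC";
        String built = new StringBuilder("AB").append('C').toString(); // new object at run time
        System.out.println(literal == built);        // false: different objects
        System.out.println(literal.equals(built));   // true: same contents
        System.out.println(sameText(null, null));    // true
        System.out.println(sameText(literal, null)); // false
    }
}
```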

I checked the .class file for
String a = "ABC";
String b = "ABC";
and found only one "ABC" in it. That is, javac creates a single constant for identical string literals at compile time.
And if two or more classes have the same "ABC" constant, the JVM will map them to the same object in the string pool.
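A small sketch of the cross-class case (the names `CrossClassLiteralDemo` and `Other` are hypothetical). The nested class compiles to its own `.class` file with its own constant-pool entry for "ABC", yet both literals resolve to the same pooled object, which JLS §3.10.5 guarantees for literals in different classes.

```java
class CrossClassLiteralDemo {
    // A nested class compiles to a separate .class file with its own constant pool
    static class Other {
        static String abc() {
            return "ABC"; // same literal, different class file
        }
    }

    static boolean sameObjectAcrossClasses() {
        String local = "ABC";
        return local == Other.abc(); // both literals resolve to the one pooled instance
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAcrossClasses()); // true
    }
}
```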

Related

Difference between concatenation at run time and compile time in java

public class First
{
    public static void main(String[] args)
    {
        String str1="Hello ",str2="World",str3="Hello World";
        System.out.println(str3==("Hello "+"World")); //Prints true
        System.out.println(str3==("Hello "+str2)); //Prints false
    }
}
The reason for this is given in the JLS:
• Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
• Strings computed by concatenation at run time are newly created and therefore distinct.
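The two JLS rules can be checked directly with a short sketch (class and method names are hypothetical). It shows that a constant expression yields the pooled literal, that every evaluation of a run-time concatenation produces a new, distinct object, and that `intern()` maps the run-time result back to the pooled instance.

```java
class ConcatDemo {
    // Constant expression: javac folds it into the single literal "Hello World"
    static boolean constantConcatIsLiteral() {
        return "Hello World" == ("Hello " + "World");
    }

    // Run-time concatenation: each evaluation creates a new, distinct String
    static boolean runtimeConcatsAreDistinct() {
        String str2 = "World"; // not final, so "Hello " + str2 is not a constant expression
        String x = "Hello " + str2;
        String y = "Hello " + str2;
        return x != y && x.equals(y); // different objects, same contents
    }

    // intern() maps the run-time result back to the pooled literal
    static boolean internedRuntimeConcatIsLiteral() {
        String str2 = "World";
        return "Hello World" == ("Hello " + str2).intern();
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsLiteral());        // true
        System.out.println(runtimeConcatsAreDistinct());      // true
        System.out.println(internedRuntimeConcatIsLiteral()); // true
    }
}
```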
What I want to ask is:
Why do strings computed at run time differ from those computed at compile time?
Is it because of memory allocation, with one allocated in the heap and the other in the String pool, or is there some other reason? Please clarify.
The compiler can't know what str2 contains when you concatenate it with "Hello ", because it would have to execute the code to find out. (Since str2 never changes, the compiler could in principle inline it, but it doesn't: str2 is not declared final, so it is not a constant variable and "Hello "+str2 is not a constant expression.)
Imagine a more complex scenario where str2 is something that a user typed in. Even if the user had typed "World" there was no way the compiler could've known that.
Therefore, it can't perform the comparison str3 == "Hello World" using the same "Hello World" from the constant pool that's assigned to str3 and used in the first comparison.
So the compiler generates the concatenation using StringBuilder (or, since Java 9, an invokedynamic call to StringConcatFactory), which creates another String with the value "Hello World". The identity comparison fails because one object comes from the constant pool and the other was just created.
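A minimal sketch of the "could inline it" remark (the class name is hypothetical): declaring `str2` as `final` and initialising it with a literal makes it a constant variable (JLS §4.12.4), so `"Hello " + str2` becomes a constant expression and resolves to the pooled "Hello World".

```java
class FinalConcatDemo {
    // Ordinary local variable: the concatenation happens at run time
    static boolean withoutFinal() {
        String str2 = "World";
        return "Hello World" == ("Hello " + str2);
    }

    // final + constant initializer = constant variable, so javac folds the concatenation
    static boolean withFinal() {
        final String str2 = "World";
        return "Hello World" == ("Hello " + str2);
    }

    public static void main(String[] args) {
        System.out.println(withoutFinal()); // false
        System.out.println(withFinal());    // true
    }
}
```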
You should use equals when comparing Objects and not the == operator.
Strings are immutable in Java. So when you concatenate two strings at runtime, a third one is created to represent the concatenated value, and == returns false because its operands refer to different String instances.
In the compile-time scenario, due to compiler optimization, the concatenated string already exists, and at runtime both operands of == refer to the same instance. Hence == returns true, because both operands point to the same instance (reference).
The compiler recognizes that constants won't change and if you are using the + operator, will concatenate them in the compiled code.
That's why in the first case the expression is effectively str3=="Hello World". Since the "Hello World" literal is already present in the string pool, both sides point to the same location in the pool, and it prints true.
In the case of str3==("Hello "+str2), the compiler doesn't check that str2 contains "World"; it treats str2 as a variable that can hold any value. At runtime a new String is created, which is a different "Hello World" object from the str3 in the string pool, so it prints false.

String s=new String("Rohit"); Does this statement create an object in the heap only, or does it make an entry in the string pool as well? [duplicate]

This question already has answers here:
Difference between string object and string literal [duplicate]
(13 answers)
What is the difference between "text" and new String("text")?
(13 answers)
Closed 8 years ago.
I attended an interview and was asked this question:
String s=new String("Rohit");
Does this statement create an object in the heap only, or does it make an entry in the string pool as well?
I answered that it does not make an entry in the pool, and that calling .intern() would make an entry in the string pool. The interviewer thought the opposite.
Could you please tell me whether I was wrong, or the interviewer was?
Thanks in advance.
EDIT:
String s1=new String("Rohit");
String s2="Rohit";
String s3=new String("Rohit").intern();
System.out.println(" "+(s2==s3)+" "+(s1==s2)+" "+(s1==s3)+" "+(s2==s3));
Result: true false false true
This makes me think that, without calling intern() on the new String, there is no entry in the pool for this object.
There are several things wrong with what you say he said:
First, doing new String always returns a new string, and never one that is interned.
Second, while it is true that the presence of the string literal "Rohit" might cause a String of that value to be "interned" (what is erroneously referred to as placing in the "string pool" or "string constant pool"), that would be done (if it was done) when the class was loaded, not when the statement was executed.
Third, since there can only ever be one copy of a String with a given value in the interned string table, even loading the class is not guaranteed to add a new entry, since one might already be there.
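The first point can be checked with a short sketch (class and method names are hypothetical). `new String(...)` never hands back the pooled instance, while `intern()` looks up the pool and returns the object the literal already refers to.

```java
class NewStringDemo {
    // new String(...) always allocates a fresh object, never the pooled one
    static boolean newStringIsLiteral() {
        return new String("Rohit") == "Rohit";
    }

    // Two separate new String(...) calls give two separate objects
    static boolean twoNewStringsAreDistinct() {
        return new String("Rohit") != new String("Rohit");
    }

    // intern() looks up the pool and returns the instance the literal refers to
    static boolean internedNewStringIsLiteral() {
        return new String("Rohit").intern() == "Rohit";
    }

    public static void main(String[] args) {
        System.out.println(newStringIsLiteral());         // false
        System.out.println(twoNewStringsAreDistinct());   // true
        System.out.println(internedNewStringIsLiteral()); // true
    }
}
```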
Of course, as is often the case, there may have been some misunderstanding on the part of one or both of you, or the question (or your answer) may have been poorly/unclearly worded.
I agree with @Hot Licks, but the point at which a String literal is loaded changed in Java 7, AFAICS.
In Java 6, String literals are loaded when the class is loaded, but in Java 7 this changed so that they are loaded when the first line that uses the string is executed. You can detect this by looking at the String returned by String.intern(): the first time it is called for a given value it returns the same object, but when it is called again with an equal String it returns the previous object.
StringBuilder sb = new StringBuilder("Hell");
sb.append("o");
String s = sb.toString();
String si = s.intern(); /* same string if not loaded. */
String s2 = "Hello";
System.out.println( System.getProperty("java.version")+" " + (s == si) + " "+(s2 == s));
prints
1.6.0_45 false false
as you expect, but in Java 7+
1.7.0_45 true true
If s2 is loaded first, the interned string si will be the same object as s2, and thus different from s. However, if s2 is loaded afterwards, all the Strings use the same object.

When is the string pool created in Java: at compile time or run time?

I know that when a string already exists in the pool, a new string literal won't be created again.
I also know the difference between the string constant pool and the heap.
I just want to know when the string pool is created for a class, for the example below.
String s="qwerty";
String s1="qwer"+"ty"; // this will be resolved at compile time and no new string literal will be made
String s2=s.substring(1); // will create qwerty at run time
System.out.println(s==s1); // true
System.out.println(s==s2); // false
Since String s1 is resolved at compile time, does that mean the string pool is created at compile time?
The constant pool contains String instances, which are runtime artifacts. Clearly, you cannot create objects before you start the program they are used in. The data specifying which string constants will be created is prepared at compile time and is a part of the class file format.
However, note that the string constants are created at class loading time, not at class initialization time or on their first use. This is a point that people often confuse.
In your example, the difference is not between compile time and runtime, but between creating the string once in the constant pool, and creating it every time a line of code is executed.
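A small sketch of the "once versus every time" distinction (class and method names are hypothetical). A method returning a literal hands back the same pooled object on every call, whereas a method that computes the string with `substring` builds a new object on each call.

```java
class OnceVsEveryTime {
    // The literal resolves to one pooled object; every call returns that same object
    static String fromLiteral() {
        return "qwerty";
    }

    // substring(1) runs on every call and builds a new String each time
    static String computed() {
        return "xqwerty".substring(1);
    }

    public static void main(String[] args) {
        System.out.println(fromLiteral() == fromLiteral());   // true
        System.out.println(computed() == computed());         // false
        System.out.println(computed().equals(fromLiteral())); // true
    }
}
```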
Also note that the string pool has been part of the regular heap for a long time in OpenJDK (even before it became OpenJDK).
Regarding your code:
String s2=s.substring(1); // this creates "werty", not "qwerty", so s==s2 will be false anyway
If you use
String s2=s.substring(0); // returns s itself in OpenJDK (an implementation detail)
then s==s2 will return true.
There is also the intern() method, which looks the string up in the pool, as in this case:
String s2 = new String("qwerty").intern();
Here s==s2 will return true,
but with String s2 = new String("qwerty"); s==s2 will return false.
Also, string literals were stored in the PermGen space before JDK 7; since JDK 7 they have been part of the regular heap.

Why are equal java strings taking the same address? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
String object creation using new and its comparison with intern method
I was playing around with Strings to understand them better, and I noticed something that I can't explain:
String str1 = "whatever";
String str2 = str1;
String str3 = "whatever";
System.out.println(str1==str2); //prints true...that's normal, they point to the same object
System.out.println(str1==str3); //prints true... how is that possible?
How can the last line print true? This means that both str1 and str3 have the same address in memory.
Is this a compiler optimization that was smart enough to detect that both string literals are the same ("whatever") and thus assigned str1 and str3 to the same object? Or am I missing something in the underlying mechanics of strings?
Because Java has a pool of unique interned instances, and String literals are stored in this pool. This means that the first "whatever" string literal is exactly the same String object as the third "whatever" literal.
As the documentation says:
public String intern()
Returns a canonical representation for the string object. A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification.
Returns: a string that has the same contents as this string, but is guaranteed to be from a pool of unique strings.
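The documented property "s.intern() == t.intern() if and only if s.equals(t)" can be exercised with a short sketch (the class and helper names are hypothetical). The two inputs are built in different ways so they start out as distinct objects.

```java
class InternContract {
    // Checks the documented property: s.intern() == t.intern() exactly when s.equals(t)
    static boolean contractHolds(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    public static void main(String[] args) {
        String a = new String("whatever");
        String b = new StringBuilder("what").append("ever").toString();
        System.out.println(a == b);                              // false: two distinct objects
        System.out.println(a.intern() == b.intern());            // true: same pooled instance
        System.out.println(contractHolds(a, b));                 // true
        System.out.println(contractHolds(a, "something else"));  // true (both sides are false)
    }
}
```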
http://www.xyzws.com/Javafaq/what-is-string-literal-pool/3
As the post says:
String allocation, like all object allocation, proves costly in both time and memory. The JVM performs some trickery while instantiating string literals to increase performance and decrease memory overhead. To cut down the number of String objects created in the JVM, the String class keeps a pool of strings. Each time your code creates a string literal, the JVM checks the string literal pool first. If the string already exists in the pool, a reference to the pooled instance is returned. If the string does not exist in the pool, a new String object is instantiated and then placed in the pool.
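The check-then-add behaviour described in that post can be mimicked with a toy pool. This is only a hypothetical sketch (`SimpleInternPool` is not a JDK class, and HotSpot's real string table lives inside the VM); it uses `ConcurrentHashMap.putIfAbsent` so that the first String added for a given value becomes the canonical instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical toy pool for illustration only, not how the JVM implements interning
final class SimpleInternPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Return the canonical instance equal to s, adding s itself if none exists yet
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return (existing != null) ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        SimpleInternPool p = new SimpleInternPool();
        String first = new String("BlaBla");
        String second = new String("BlaBla");
        System.out.println(p.intern(first) == first);  // true: first one becomes canonical
        System.out.println(p.intern(second) == first); // true: later equal strings map to it
        System.out.println(p.size());                  // 1
    }
}
```

A map keyed by the string's contents is the natural choice here because canonicalisation is defined by `equals`, exactly as in the `intern()` contract.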
If you do:
String str1 = new String("BlaBla"); //In the heap!
String str2 = new String("BlaBla"); //In the heap!
then you're explicitly creating a String object through the new operator (and constructor).
In this case you'll have each object pointing to a different storage location.
But if you do:
String str1 = "BlaBla";
String str2 = "BlaBla";
then you're using implicit construction.
Two string literals share the same storage if they have the same value, because Java conserves storage for identical strings (strings that have the same value).
The javac compiler combines String literals which are the same in a given class file.
However, at runtime String literals are combined using the same approach as String.intern(). This means that even Strings in different classes, in different applications running in the same JVM, use the same object.

Using == when comparing objects

Recently in a job interview I was asked this following question (for Java):
Given:
String s1 = "abc";
String s2 = "abc";
What is the return value of
(s1 == s2)
I answered that it would return false, because they are two different objects and == is a memory address comparison rather than a value comparison, and that one would need to use .equals() to compare String objects. I was told, however, that although the .equals() approach was right, the statement nonetheless returns true. Could someone explain why it is true, and why we are still taught in school to use equals()?
String constants are interned by your JVM (this is required by the specification):
All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification
This means that a single object representing the string "abc" is created in the string pool, and both s1 and s2 point to that same interned object.
Java interns both strings. Since they have the same value, only one actual String instance exists in memory, which is why == returns true: both references point to the same instance.
String interning is an optimization technique that minimizes the number of string instances that have to be held in memory. String literals, and strings that are the values of constant expressions, are interned so as to share unique instances. Think flyweight pattern.
Since you are not actually creating new instances of String objects for either one of these, they are sharing the same memory space. If it were
String s1 = new String("abc");
String s2 = new String("abc");
the result would be false.
The reason is that strings are interned in Java. String interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more efficient. The distinct values are stored in a string intern pool.
(From wiki)
You're right that == compares memory addresses. However, when the Java compiler notices that you're using the same string literal multiple times in the same program, it won't create the same string multiple times in memory. Instead, both s1 and s2 in your example will point to the same memory. This is called string interning.
That's why == returns true in this case. However, if you read s2 from a file or user input, the string will not automatically be interned, so it no longer points to the same memory. Therefore == would return false, while equals() returns true. And that's why you shouldn't use ==.
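A sketch of the "read s2 from a file or user input" case. As an assumption for the sake of a self-contained example, a `StringReader` stands in for the real file or console, and the class name `RuntimeInputDemo` is hypothetical. `BufferedReader.readLine()` builds a fresh String, so it is equal to the literal but not the same object.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class RuntimeInputDemo {
    // Simulates reading "abc" from a file or user input (StringReader stands in for the real source)
    static String readAbc() throws IOException {
        try (BufferedReader in = new BufferedReader(new StringReader("abc\n"))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String s1 = "abc";
        String s2 = readAbc();
        System.out.println(s1 == s2);      // false: s2 was built at run time
        System.out.println(s1.equals(s2)); // true: same contents
    }
}
```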
Quick and dirty answer: Java optimizes strings so if it encounters the same string twice it will reuse the same object (which is safe because String is immutable).
However, there is no guarantee.
What usually happens is that it works for a long time, and then you get a nasty bug that takes forever to figure out because someone changed the class loading context and your == no longer works.
You should continue to use equals() when testing string equality. Java makes no guarantees about identity testing for strings unless they are interned.
s1 == s2 is true in your example because the compiler is simply optimizing two literal references in a scope it can predict.
