Why are equal java strings taking the same address? [duplicate] - java

Closed 10 years ago.
Possible Duplicate:
String object creation using new and its comparison with intern method
I was playing around with Strings to understand them better and I noticed something that I can't explain:
String str1 = "whatever";
String str2 = str1;
String str3 = "whatever";
System.out.println(str1==str2); //prints true...that's normal, they point to the same object
System.out.println(str1==str3); //gives true..how's that possible ?
How is the last line giving true? This means that both str1 and str3 have the same address in memory.
Is this a compiler optimization that was smart enough to detect that both string literals are the same ("whatever") and thus assigned str1 and str3 to the same object? Or am I missing something in the underlying mechanics of strings?

Because Java maintains a pool of unique interned String instances, and string literals are stored in this pool. This means that the "whatever" literal assigned to str1 is exactly the same String object as the "whatever" literal assigned to str3.
As the documentation says:
public String intern()
Returns a canonical representation for the
string object. A pool of strings, initially empty, is maintained
privately by the class String.
When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are
interned. String literals are defined in §3.10.5 of the Java Language
Specification
Returns: a string that has the same contents as this string, but is
guaranteed to be from a pool of unique strings.
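To make the quoted contract concrete, here is a minimal sketch. The class and method names are invented for illustration; only String.intern(), equals() and StringBuilder come from the JDK.

```java
// Illustrative sketch; class and method names are hypothetical.
class InternContractDemo {

    // Two occurrences of the same literal resolve to one pooled String object.
    static boolean literalsShareInstance() {
        String str1 = "whatever";
        String str3 = "whatever";
        return str1 == str3;
    }

    // Checks the documented rule: s.intern() == t.intern() exactly when s.equals(t).
    static boolean internMatchesEquals(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    public static void main(String[] args) {
        // Built at run time, so this is a new object rather than the pooled literal.
        String built = new StringBuilder("what").append("ever").toString();

        System.out.println(literalsShareInstance());                // true
        System.out.println(built == "whatever");                    // false
        System.out.println(built.intern() == "whatever");           // true
        System.out.println(internMatchesEquals(built, "whatever")); // true
        System.out.println(internMatchesEquals(built, "other"));    // true
    }
}
```

The == comparisons are there only to expose object identity; ordinary code should still compare strings with equals().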

http://www.xyzws.com/Javafaq/what-is-string-literal-pool/3
As the post says:
String allocation, like all object allocation, proves costly in both time and memory. The JVM performs some trickery while instantiating string literals to increase performance and decrease memory overhead. To cut down the number of String objects created in the JVM, the String class keeps a pool of strings. Each time your code create a string literal, the JVM checks the string literal pool first. If the string already exists in the pool, a reference to the pooled instance returns. If the string does not exist in the pool, a new String object instantiates, then is placed in the pool.

If you do:
String str1 = new String("BlaBla"); //In the heap!
String str2 = new String("BlaBla"); //In the heap!
then you're explicitly creating a String object through new operator (and constructor).
In this case you'll have each object pointing to a different storage location.
But if you do:
String str1 = "BlaBla";
String str2 = "BlaBla";
then you have implicit construction.
Two string literals share the same storage if they have the same value, because Java avoids storing identical strings more than once.
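The two cases above can be checked side by side. This is only a sketch; the class and method names are made up for the example.

```java
// Illustrative sketch; class and method names are hypothetical.
class NewVersusLiteralDemo {

    // Each `new String(...)` yields a distinct object, even when the contents are equal.
    static boolean explicitCopiesAreDistinct() {
        String str1 = new String("BlaBla");
        String str2 = new String("BlaBla");
        return str1 != str2 && str1.equals(str2);
    }

    // Identical literals refer to the same pooled object.
    static boolean literalsAreShared() {
        String str1 = "BlaBla";
        String str2 = "BlaBla";
        return str1 == str2;
    }

    public static void main(String[] args) {
        System.out.println(explicitCopiesAreDistinct()); // true
        System.out.println(literalsAreShared());         // true
    }
}
```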

The javac compiler combines String literals which are the same in a given class file.
However, at runtime, String literals are combined using the same approach as String.intern(). This means that even identical String literals in different classes, in different applications (in the same JVM), will use the same object.
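As a rough illustration of literals being shared across classes, the sketch below puts the same literal in two unrelated classes. All three class names are hypothetical and exist only for this example.

```java
// Illustrative sketch; all class names are hypothetical.
class CrossClassLiteralDemo {

    // Compares the references returned by two unrelated classes.
    static boolean sameObjectAcrossClasses() {
        return FirstHolder.text() == SecondHolder.text();
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAcrossClasses()); // true
    }
}

class FirstHolder {
    static String text() {
        return "shared literal";
    }
}

class SecondHolder {
    static String text() {
        return "shared literal";
    }
}
```

Each class gets its own entry for the literal in its class file, but both entries resolve to the same interned String object at run time.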

Related

Check if a variable is in the string constant pool

In Java when we use literal string when creating a string object, I know that a new object is created in SCP (string constant pool).
Is there a way to check if a variable is in the SCP or in the heap?
First of all, the correct term is the "string pool", not the "string constant pool"; see String pool - do String always exist in constant pool?
Secondly, you are not checking a variable. You are checking the string that some variable contains / refers to. (A variable that contains a reference to the string cannot be "in the SCP". The variable is either on the stack, in the heap (not SCP), or in metaspace.)
Is there a way to check if a variable is in SCP or in heap ?
From Java 7 and later, the string pool is in the (normal) heap. So your question is moot if we interpret it literally.
Prior to Java 7, the way to check if a String is in the string pool was to do this
if (str == str.intern()) {
System.out.println("In the string pool");
}
However, this had the problem that if an equivalent str was not already in the pool, you would have added a copy of str to the pool.
From Java 7 onwards, the above test is no longer reliable. A str.intern() call no longer needs to copy a string to a separate region to add it to the string pool. Therefore the reference to an intern'd string is often identical to the reference to the original (non-interned) string.
char[] chars = new char[]{'a', 'b', 'c'};
String str = new String(chars);
String interned = str.intern();
System.out.println(str == interned);
In fact str == str.intern() can only detect the case where you have a non-interned string with the same content as a string literal.
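Here is a small sketch of the one case the test can still detect. The class name and the isPooledInstance helper are hypothetical, and note that calling the helper has the side effect of interning its argument's contents.

```java
// Illustrative sketch; the class and helper names are hypothetical.
class InternCheckDemo {

    // True when `s` is the very object the pool holds for its contents.
    // Side effect: if no equal string was pooled before, `s` is added to the pool.
    static boolean isPooledInstance(String s) {
        return s == s.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String copy = new String(literal); // same contents, but a different object

        System.out.println(isPooledInstance(literal)); // true
        System.out.println(isPooledInstance(copy));    // false
    }
}
```

The second result is false only because an equal literal was already in the pool; without that literal, the answer would depend on the JVM, as described above.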
Or at least list all string instances in SCP ?
There is no way to do that.
As JB Nizet pointed out, there is really not a lot of point in asking these questions:
You shouldn't be writing code that depends on whether a string is in the string pool or not.
If you are concerned about storage to the level where you would contemplate calling intern for yourself, it is better to make use of the opportunistic string compaction mechanism provided by Java 9+ garbage collectors.

Why concatenation of String object and string literal is created in heap? [duplicate]

This question already has an answer here:
What is the difference between Heap memory and string constant pool in java
(1 answer)
Closed 3 years ago.
I have below Strings
String str1 = "Abc";//created in constant pool
String str2 = "XYZ";//created in constant pool
String str3 = str1 + str2;//created in constant pool
String str4 = new String("PQR");//created in heap
String str5 = str1.concat(str4);//created in heap
String str6 = str1 + str4;//created in heap
Here I don't know why concatenation of Strings, one created in the constant pool, and the other in the heap, results in creating the new String in the heap. I don't know the reason, why does it happen?
There is a bunch of dubious information in the comments, so I will give this a proper answer.
There is actually no such thing as "the constant pool". You won't find this term in the Java Language Specification.
(Perhaps you are getting your terminology confused with the Constant Pool which is the section of a .class file, and the corresponding per-class Runtime Constant Pool ... which is not visible to application programs. These are "specification artifacts" defined by the JVM spec for the purpose of defining the execution model of bytecodes. The spec does not require that they physically exist, though they typically do; e.g. in an Oracle or OpenJDK implementation.)
There is a data structure in a running JVM called the string pool. The string pool is NOT mentioned by name in the JLS, but its existence is implied by string literal properties as specified by the JLS. The string pool is mentioned in the javadocs, and the JVM specification.
The string pool will contain the String objects that represent the values of any string-valued constant expression used in an application. This includes string literals.
The string pool has always been primarily a de-duping mechanism for strings. Applications are able to use this by calling the String.intern method.
The string values in the Constant Pool (see above) are used to create the String objects that the application sees:
A String object is created from the representation.
String.intern is called, returning the corresponding de-duped String object from the string pool.
That string becomes part of the class's Runtime Constant Pool; i.e. the Runtime Constant Pool for a class will include a reference to the String object in the string pool.
This process can happen eagerly or lazily depending on the Java implementation.
The string pool is and has always been stored in the (or a) heap.
Prior to Java 7, string objects in the string pool were allocated in a special heap called the PermGen heap. In the earliest versions of Java it wasn't GC'ed. Then it was GC'ed only occasionally.
In Java 7 (not 8!) the string pool stopped using the PermGen heap and used the regular heap instead.
In Java 8 the PermGen heap was replaced (for some purposes!) by a different storage management mechanism called the Metaspace. Apparently, Metaspace doesn't hold Java objects. Rather, it holds code segments, class descriptors and other JVM internal data structures.
In recent versions of Java (i.e. Java 8 u20 and later) the GC has another mechanism for de-duping strings that survive a given number of GC cycles.
The behavior of strings (i.e. which ones are interned and which ones are not) is determined by the relevant parts of the JLS and the javadocs for the String class.
All of the complexity is irrelevant if you follow one simple rule:
Never use == to compare strings. Always use equals.
Now to deal with your example code:
String str1 = "Abc"; // string pool
String str2 = "XYZ"; // string pool
String str3 = str1 + str2; // not string pool (!!)
String str3a = "Abc" + "XYZ"; // string pool
String str4 = new String("PQR"); // not string pool (but the "PQR" literal is)
String str5 = str1.concat(str4); // not string pool
String str6 = str1 + str4; // not string pool
String str7 = str6.intern(); // string pool
Why?
The values assigned to str1, str2 and str3a are all values of constant expressions; see below.
The value assigned to str3 is not the value of a constant expression according to the JLS.
str4 - the JLS says that new operator always creates a new object and new strings are not automatically interned
str5 - string operations apart from intern do not create objects in the string pool
str6 - ditto - equivalent to a concat call. The JLS also says that + produces a new string (except in the constant expression case).
str7 - the exception: see above. The intern call returns an object in the string pool.
Constant expressions include literals, concatenations involving literals, values of static final String constants, and a few other things. See JLS 15.28 for the complete list, but bear in mind that the string pool only holds string values.
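To illustrate the difference between a constant expression and a run-time concatenation, here is a minimal sketch. The class, method and field names are invented for the example.

```java
// Illustrative sketch; class, method and field names are hypothetical.
class ConstantExpressionDemo {

    static final String PREFIX = "Abc"; // constant variable: static final, initialized with a literal

    // Both operands are constant variables, so the concatenation is a
    // constant expression and its value is interned.
    static boolean constantConcatIsPooled() {
        final String suffix = "XYZ"; // final local initialized with a literal
        return (PREFIX + suffix) == "AbcXYZ";
    }

    // `suffix` is not final, so the concatenation happens at run time
    // and produces a newly created String.
    static boolean runtimeConcatIsNotPooled() {
        String suffix = "XYZ";
        return (PREFIX + suffix) != "AbcXYZ";
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsPooled());   // true
        System.out.println(runtimeConcatIsNotPooled()); // true
    }
}
```

The only difference between the two methods is the final modifier on the local variable, which is what decides whether the operand counts as a constant.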
The precise behavior of intern depends on the Java version. Consider this example:
char[] chars = // create an array of random characters
String s1 = new String(chars);
String s2 = s1.intern();
Let us assume that the random characters do not correspond to any previously interned string.
For older JVMs where interned strings were allocated in PermGen, the intern call in the example will (must) produce a new String object.
For newer JVMs, the intern can add the existing String object to the string pool data structure without having to create a new String object.
In other words, the truth of s1 == s2 depends on the Java version.
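A runnable version of that experiment might look like the sketch below. The class name and the freshString helper are hypothetical; a random UUID stands in for the "array of random characters", on the assumption that it has never been interned before.

```java
import java.util.UUID;

// Illustrative sketch; the class and helper names are hypothetical.
class InternVersionDemo {

    // Builds a string that is assumed (not guaranteed) to be absent from the pool.
    static String freshString() {
        return "fresh-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String s1 = freshString();
        String s2 = s1.intern();

        System.out.println(System.getProperty("java.version"));
        System.out.println(s1.equals(s2)); // true on every version
        System.out.println(s1 == s2);      // depends on the JVM version, as described above
    }
}
```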

Java- Creating String object using new keyword

I know the difference between a String literal and a new String object, and I also know how it works internally. But my question goes a little beyond this. When we create a String object using the new keyword, as in
String str = new String("test");
In this case, we are passing an argument of type String.
My question is: where is this string created? In the heap, in the String constant pool, or somewhere else?
As far as I know, this argument is a string literal, so it should be in the String constant pool. If so, what is the use of the intern method? Does it just link the variable str to the constant pool, given that "test" would already be available there?
Please correct me if I have misunderstood the concept.
The statement String str = new String("test"); creates a string object which gets stored on the heap like any other object. The string literal "test" that is passed as an argument is stored in the string constant pool.
String#intern() checks if a string constant is already available in the string pool. If there is one already it returns it, else it creates a new one and stores it in the pool. See the Javadocs:
Returns a canonical representation for the string object.
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
Starting from JDK7, interned strings are stored on the heap. This is from the release notes of JDK7:
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
Use of intern() :
public static void main(String[] args) {
String s = new String(new char[] { 'a', 'b', 'c' }); // "abc" will not be added to String constants pool.
System.out.println(System.identityHashCode(s));
s = s.intern();// add s to String constants pool
System.out.println(System.identityHashCode(s));
String str1 = new String("hello");
String str2 = "hello";
String str3 = str1.intern();
System.out.println(System.identityHashCode(str1));
System.out.println(System.identityHashCode(str2));
System.out.println(System.identityHashCode(str3));
}
O/P :
1414159026
1569228633 --> OOPs String moved to String constants pool
778966024
1021653256
1021653256 --> "hello" already added to string pool. So intern does not add it again.

Using == when comparing objects

Recently in a job interview I was asked this following question (for Java):
Given:
String s1 = "abc";
String s2 = "abc";
What is the return value of
(s1 == s2)
I answered that it would return false because they are two different objects and == is a memory address comparison rather than a value comparison, and that one would need to use .equals() to compare String objects. I was, however, told that although the .equals() methodology was right, the statement nonetheless returns true. Could someone explain to me why it is true, and why we are still taught in school to use equals()?
String constants are interned by your JVM (this is required by the spec as per here):
All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification
This means that the compiler has already created an object representing the string "abc", and sets both s1 and s2 to point to the same interned object.
Java will intern both strings. Since they both have the same value, only one actual string instance will exist in memory. That's why == will return true: both references point to the same instance.
String interning is an optimization technique to minimize the number of string instances that have to be held in memory. String literals or strings that are the values of constant expressions, are interned so as to share unique instances. Think flyweight pattern.
Since you are not actually creating new instances of String objects for either one of these, they are sharing the same memory space. If it were
String s1 = new String("abc");
String s2 = new String("abc");
the result would be false.
The reason is strings are interned in Java. String interning is a method of storing only one copy of each distinct string value which is immutable. Interning string makes some string processing tasks more efficient. The distinct values are stored in a string intern pool.
(From wiki)
You're right that == uses memory address. However when the java compiler notices that you're using the same string literal multiple times in the same program, it won't create the same string multiple times in memory. Instead both s1 and s2 in your example will point to the same memory. This is called string interning.
So that's why == will return true in this case. However, if you read s2 from a file or user input, the string will not automatically be interned. So now it no longer points to the same memory. Therefore == would now return false, while equals returns true. And that's why you shouldn't use ==.
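A quick way to see this is sketched below. A StringReader stands in for a file or user input, and the class and helper names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative sketch; the class and helper names are hypothetical.
class RuntimeInputDemo {

    // Reads the first line of `source`, simulating text that arrives at run time.
    static String readLineFrom(String source) {
        try (BufferedReader reader = new BufferedReader(new StringReader(source))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = readLineFrom("abc\n");

        System.out.println(s1 == s2);          // false: readLine() built a new String
        System.out.println(s1.equals(s2));     // true: same contents
        System.out.println(s1 == s2.intern()); // true: intern() returns the pooled literal
    }
}
```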
Quick and dirty answer: Java optimizes strings so if it encounters the same string twice it will reuse the same object (which is safe because String is immutable).
However, there is no guarantee.
What usually happens is that it works for a long time, and then you get a nasty bug that takes forever to figure out because someone changed the class loading context and your == no longer works.
You should continue to use equals() when testing string equality. Java makes no guarantees about identity testing for strings unless they are interned.
The reason s1 == s2 is true in your example is that the compiler is simply optimizing two literal references in a scope it can predict.

== operator does not compare references for String [duplicate]

Closed 11 years ago.
Possible Duplicate:
String comparison and String interning in Java
I understand how String equals() method works but was surprised by some results I had with the String == operator.
I would have expected == to compare references as it does for other objects.
However, for distinct String objects (with the same content), == returns true, and it does so even for a static String object (with the same content), which is obviously not at the same memory address.
I guess == has been defined the same as equals to prevent its misuse.
No, == does just compare references. However, I suspect you've been fooled by compile-time constants being interned, so two literals end up referring to the same string object. For example:
String x = "xyz";
String y = "xyz";
System.out.println(x == y); // Guaranteed to print true
StringBuilder builder = new StringBuilder();
String z = builder.append("x").append("yz").toString();
System.out.println(x == z); // Will print false
From section 3.10.5 of the Java language specification:
String literals-or, more generally, strings that are the values of constant expressions (§15.28)-are "interned" so as to share unique instances, using the method String.intern.
The reason it returns true is a memory optimization (not always guaranteed to occur): strings with the same content point to the same memory area to save space. In the case of static objects, they will always point to the same thing (as there is only one of it, because of the static keyword). Again, don't rely on the above; use equals() instead.
One thing I should point out from Jon Skeet is that it is always guaranteed for compile time constants. But again just use equals() as it is clearer to read.
It is due to string intern pooling
See
whats-the-difference-between-equals-and
The == operator always compares references in Java and never contents. What can happen is that once you declare a string literal, this object is sent to the JVM's string pool, and if you reuse the same literal, the same object from the pool is used again. A simple test for this behavior can be seen in the following code snippet:
String a = "a string";
String b = "a string";
System.out.println( a == b ); // will print true
String c = "other string";
String d = new String( "other string" );
System.out.println( c == d ); // will print false
The second case prints false because the variable d was initialized with a directly created String object and not a literal, so it will not go to the String pool.
The string pool is not part of the Java specification, and relying on its behavior is not advised. You should always use equals to compare objects.
I guess == has been defined the same as equals to prevent its misuse
Wrong. What is happening here is that when the compiler sees that you are using the same string in two different places it only stores it in the program's data section once. Read in a string or create it from smaller strings and then compare them.
Edit: Note that when I say "same string" above, I'm referring only to string literals, which the compiler knows at compile time.
