Does string pool store literals or objects? - java

Stack Overflow is full of questions related to different types of String initialization. I understand how String s = "word" differs from String s = new String("word"), so there is no need to 'touch' that topic.
I noticed that different people say that the String pool stores constants/objects/literals.
Constants are understandable: they are final, so they always 'stay' there. And yes, duplicates aren't stored in the string constant pool (SCP).
But I can't understand whether the SCP stores objects or literals. They are totally different concepts: an object is an entity, while a literal is just a value. So what is the correct answer? Does the SCP store objects or literals? I know it can't be both :)

Literals are a chunk of source code that is delimited by ". For example, in the following line of source code:
String s = "Hello World";
"Hello World" is a string literal.
Objects are a useful abstraction for meaningful bits of memory holding data that (when grouped together) represents something, whether it be a Car, Person, or String.
The string pool stores String objects rather than String literals, simply because the string pool does not store source code.
You might hear people say "the string pool stores string literals". They (probably) don't mean that the string pool somehow has the source code "Hello World" in it. They (probably) mean that all the Strings represented by string literals in your source code will get put into the string pool. In fact, the Strings produced by constant expressions in your source code also get added to the string pool automatically.
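The point that the pool holds String objects (including those produced by constant expressions) can be checked with a small sketch. The class and method names here are made up for illustration; the behaviour relies only on JLS §3.10.5 and String.intern.

```java
class PoolStoresObjects {
    /** A literal and a constant expression with the same value refer to one pooled object. */
    static boolean literalAndConstantExpressionShareInstance() {
        String literal = "Hello World";
        String constantExpression = "Hello " + "World"; // folded into one constant by the compiler
        return literal == constantExpression;
    }

    /** A String built at run time is a separate object until it is interned. */
    static boolean runtimeStringIsDistinctUntilInterned() {
        String literal = "Hello World";
        String built = new String(new char[] {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd'});
        return built != literal          // different object
            && built.equals(literal)     // same content
            && built.intern() == literal; // intern() hands back the pooled object
    }

    public static void main(String[] args) {
        System.out.println(literalAndConstantExpressionShareInstance()); // true
        System.out.println(runtimeStringIsDistinctUntilInterned());      // true
    }
}
```

What gets compared with == here is always an object reference, which is why it only makes sense to say that the pool contains objects.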

Strictly speaking, "literal" is not a value; it is a syntactic form. A String literal in Java is a double quote followed by some non-double-quote (or escaped double quote) characters, ending in another double quote. A "literal value" is a value that is created from a source-code literal, as opposed to an evaluated value such as a.concat(b). The core difference is that the literal value can be identified at compilation time, while an evaluated value can only be known during execution. This allows the compiler to store the literal values inside the compiled bytecode. (Since constants initialised by literal values are also known by the compiler at compile time, evaluations that only use constants can also be computed at compile time.)
In colloquial speech one can refer to a literal value as a "literal", but that may be the source of your confusion - a value is a value, whether its origin is a literal, or an evaluation.
I know it can't be both
The distinction between a literal value and an evaluated value is separate from a distinction between an object value and a primitive value. "foo" is a literal String value (and since Strings are objects, it is also an object). 3 is a literal primitive (integer) value. If x is currently 7, then 18 - x evaluates to a non-literal primitive value of 11. If y is currently "world!", then "Hello, " + y evaluates to a non-literal, non-primitive value "Hello, world!".
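A rough sketch of the literal-versus-evaluated distinction, reusing the values from the paragraph above. The class, field and method names are invented for the example; the only assumption is the JLS rule that a static final String initialised from a literal is a constant variable.

```java
class LiteralVersusEvaluated {
    // A constant variable: final, of type String, initialised by a literal.
    static final String CONSTANT_PREFIX = "Hello, ";

    /** Uses only constants, so the compiler computes it and it is treated like a literal. */
    static boolean constantExpressionIsPooled() {
        return (CONSTANT_PREFIX + "world!") == "Hello, world!";
    }

    /** y is only known at run time, so the result is an evaluated, newly created String. */
    static boolean evaluatedValueIsNewObject(String y) {
        String evaluated = "Hello, " + y;
        return evaluated.equals("Hello, world!") && evaluated != "Hello, world!";
    }

    /** The same distinction exists for primitives: 18 is a literal, 18 - x is evaluated. */
    static int evaluatedPrimitive(int x) {
        return 18 - x;
    }

    public static void main(String[] args) {
        System.out.println(constantExpressionIsPooled());        // true
        System.out.println(evaluatedValueIsNewObject("world!")); // true
        System.out.println(evaluatedPrimitive(7));               // 11
    }
}
```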

Nice question. The answer can be found through how String::intern() was implemented. From javadoc:
* When the intern method is invoked, if the pool already contains a
* string equal to this {@code String} object as determined by
* the {@link #equals(Object)} method, then the string from the pool is
* returned. Otherwise, this {@code String} object is added to the
* pool and a reference to this {@code String} object is returned.
So the String pool stores String objects.
We can open the source code to confirm the answer. String::intern() is a native method, and it is implemented by StringTable::intern() in symbolTable.cpp:
oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  unsigned int hashValue = hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop found_string = the_table()->lookup(index, name, len, hashValue);
  // Found
  if (found_string != NULL) {
    ensure_string_alive(found_string);
    return found_string;
  }
  ... ...
  Handle string;
  // try to reuse the string if possible
  if (!string_or_null.is_null()) {
    string = string_or_null;
  } else {
    string = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
  }
  ... ...
  // Grab the StringTable_lock before getting the_table() because it could
  // change at safepoint.
  oop added_or_found;
  {
    MutexLocker ml(StringTable_lock, THREAD);
    // Otherwise, add to symbol to table
    added_or_found = the_table()->basic_add(index, string, name, len,
                                            hashValue, CHECK_NULL);
  }
  ensure_string_alive(added_or_found);
  return added_or_found;
}
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/f3108e56b502/src/share/vm/classfile/symbolTable.cpp
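The lookup-then-add logic of the native code can be mimicked in plain Java. This is only a toy model under simplifying assumptions: ToyStringPool is a hypothetical class, it uses a HashMap where HotSpot uses its own hash table, and it ignores details such as garbage collection of unused entries.

```java
import java.util.HashMap;
import java.util.Map;

class ToyStringPool {
    // Maps a string's content to the one canonical String object with that content.
    private final Map<String, String> table = new HashMap<>();

    /** Returns the pooled object equal to candidate, adding candidate itself if none exists. */
    synchronized String intern(String candidate) {
        String found = table.get(candidate); // lookup by content (hashCode/equals)
        if (found != null) {
            return found;                    // "Found": return the object already in the table
        }
        table.put(candidate, candidate);     // otherwise add this very object
        return candidate;
    }

    public static void main(String[] args) {
        ToyStringPool pool = new ToyStringPool();
        String first = new String(new char[] {'a', 'b'});
        String second = new String(new char[] {'a', 'b'});
        System.out.println(pool.intern(first) == first);  // true: first becomes the pooled object
        System.out.println(pool.intern(second) == first); // true: the pooled object is returned
    }
}
```

Both the keys and the values of the table are object references, which matches the javadoc wording that "this String object is added to the pool".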

Related

In Java, when we print a string literal to the terminal, is this string literal also stored in the string pool?

I am aware that when we initialize a string literal to a variable this literal will be stored in the string pool by the JVM. Consider the piece of code below.
System.out.println("This is a string literal");
Does the string literal within the quotes also be stored in the string pool even if I don't initialize it to a variable?
I will preface this answer by saying that there is little practical use in gaining a deep understanding of the Java string pool. From a practical perspective, you just need to remember two things:
Don't use == to compare strings. Use equals, compareTo, or equivalent methods.
Don't use explicit String.intern calls in your code. If you want to avoid potential problems with duplicate strings, enable the string de-duplication feature that is available in modern Java GCs.
I am aware that when we initialize a string literal either using the 'new' keyword or not, this literal will be stored in the string pool by the JVM.
This is garbled.
Firstly, you don't "initialize" a string literal. You initialize a variable.
String hi = "hello"; // This initializes the variable `hi`.
Secondly you typically don't / shouldn't use a string literal with new.
String hi = new String("hello"); // This is bad. You should write this as above.
The normal use-case for creating a string using new is something like this:
String hi = new String(arrayOfCharacters, offset, count);
In fact, creation and interning of the String object that corresponds to a string literal happens either the first time that the literal is used in an expression or at an earlier time. The precise details (i.e. when it happens) are unspecified and (I understand) version dependent.
The first usage might be in a variable initialization, or it might be in something else; e.g. a method call.
So to your question:
Consider the piece of code below:
System.out.println("This is a string literal");
Does the string literal within the quotes also be stored in the string pool even if I do not initialize it?
Yes, it does. If that was the first time the literal was used, the code above may be the trigger for this to happen. But it could have happened previously; e.g. if the above code was run earlier.
As a followup, you asked:
Why does the String Pool collect string literals which are not stored in a variable and just displayed in the console?
Because the JLS 3.10.5 requires that the String objects which correspond to string literals are interned:
"Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern (§12.5)."
And you asked:
The Presence of the String Pool help optimize the program. By storing literals as such (which is actually not required because it is just to be displayed in the console), isn't it the case that it goes against its whole purpose (which is optimization)?
The original idea for interning and the string pool was to save memory. That made sense 25 years ago when the Java language was designed and originally specified. These days even a low-end Android phone has 1GB of RAM, and interning of string literals to save a few thousand bytes is kind of pointless. Except that the JLS says that this must happen.
But the answer is No, it doesn't go against the (original) purpose. This statement:
System.out.println("This is a string literal");
could be executed many times. You don't want / need to create a new String object for the literal each time that you execute it. The thing is that the JVM doesn't know what is going to happen.
Anyway, the interning must happen because that is what the spec says.
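A small sketch of why this matters when the statement runs many times. The LiteralReuse class and its methods are hypothetical; the behaviour follows from the JLS 3.10.5 rule quoted above.

```java
class LiteralReuse {
    /** Every call evaluates the same literal, so every call yields the same String object. */
    static String message() {
        return "This is a string literal";
    }

    static boolean sameInstanceOnEveryCall() {
        String first = message();
        String second = message();
        return first == second;
    }

    /** intern() returns the pooled object; for a literal that is the literal's own object. */
    static boolean literalIsInPool() {
        String literal = message();
        return literal.intern() == literal;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(message()); // no new String is created per iteration
        }
        System.out.println(sameInstanceOnEveryCall()); // true
        System.out.println(literalIsInPool());         // true
    }
}
```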

Difference between concatenation at run time and compile time in java

public class First
{
    public static void main(String[] args)
    {
        String str1 = "Hello ", str2 = "World", str3 = "Hello World";
        System.out.println(str3 == ("Hello " + "World")); // Prints true
        System.out.println(str3 == ("Hello " + str2));    // Prints false
    }
}
The reason for the above is given in the JLS:
• Strings computed by constant expressions (§15.28) are computed at
compile time and then treated as if they were literals.
• Strings computed by concatenation at run time are newly created and
therefore distinct.
What I wanted to ask is-
Why do strings computed at run time differ from those computed at compile time?
Is it because of memory allocation, with one allocated on the heap and the other in the String pool, or is there some other reason? Please clarify.
The compiler can't know what str2 contains because it would have to execute the code to know the contents of str2 when you are concatenating it with "Hello " (it could make some optimizations and inline it, since it doesn't change, but it doesn't do it).
Imagine a more complex scenario where str2 is something that a user typed in. Even if the user had typed "World" there was no way the compiler could've known that.
Therefore, it can't perform the comparison str3 == "Hello World" using the same "Hello World" from the constant pool that's assigned to str3 and used in the first comparison.
So the compiler will generate the concatenation by using StringBuilder and will end up creating another String with value Hello World, so the identity comparison will fail because one object is the one from the constant pool and the other one is the one that was just created.
You should use equals when comparing Objects and not the == operator.
Strings are immutable in Java. So, when you concatenate two strings, a third one is created at runtime to represent the concatenated value. So using == returns false as both arguments are pointing to different instances of String object.
For the compile-time scenario, due to compiler optimization, the concatenated string is already created, and at runtime both arguments of == are represented by the same instance. Hence, == returns true, as both arguments point to the same instance (reference).
The compiler recognizes that constants won't change and if you are using the + operator, will concatenate them in the compiled code.
That's why, in the first case, the comparison effectively runs as str3 == "Hello World". Since the "Hello World" literal is already present in the string pool, both operands point to the same location in the pool and it prints true.
In the case of str3 == ("Hello " + str2), the compiler doesn't check that str2 contains "World"; it treats str2 as a variable that can hold any value. So at run time a new String object is created, which is a different "Hello World" from the pooled one that str3 refers to, and it prints false.
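The two comparisons from the question can be extended with the two usual ways out: interning the run-time result, or comparing contents with equals. This is a sketch; ConcatenationDemo and compare() are names made up for the example.

```java
class ConcatenationDemo {
    /** Returns the results of four comparisons against the pooled "Hello World". */
    static boolean[] compare() {
        String str2 = "World";
        String str3 = "Hello World";
        return new boolean[] {
            str3 == ("Hello " + "World"),       // constant expression: same pooled object
            str3 == ("Hello " + str2),          // run-time concatenation: newly created object
            str3 == ("Hello " + str2).intern(), // intern() maps it back to the pooled object
            str3.equals("Hello " + str2)        // content comparison ignores identity
        };
    }

    public static void main(String[] args) {
        for (boolean result : compare()) {
            System.out.println(result); // true, false, true, true
        }
    }
}
```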

Why are equal java strings taking the same address? [duplicate]

I was playing around with Strings to understand them more and I noticed something that I can't explain :
String str1 = "whatever";
String str2 = str1;
String str3 = "whatever";
System.out.println(str1==str2); //prints true...that's normal, they point to the same object
System.out.println(str1==str3); //gives true..how's that possible ?
How is the last line giving true ? this means that both str1 and str3 have the same address in memory.
Is this a compiler optimization that was smart enough to detect that both string literals are the same ("whatever") and thus assigned str1 and str3 to the same object ? Or am I missing something in the underlying mechanics of strings ?
Because Java has a pool of unique interned instances, and String literals are stored in this pool. This means that the first "whatever" string literal is exactly the same String object as the third "whatever" literal.
As the Document Says:
public String intern()
Returns a canonical representation for the
string object. A pool of strings, initially empty, is maintained
privately by the class String.
When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are
interned. String literals are defined in §3.10.5 of the Java Language
Specification
Returns: a string that has the same contents as this string, but is
guaranteed to be from a pool of unique strings.
http://www.xyzws.com/Javafaq/what-is-string-literal-pool/3
As the post says:
String allocation, like all object allocation, proves costly in both time and memory. The JVM performs some trickery while instantiating string literals to increase performance and decrease memory overhead. To cut down the number of String objects created in the JVM, the String class keeps a pool of strings. Each time your code create a string literal, the JVM checks the string literal pool first. If the string already exists in the pool, a reference to the pooled instance returns. If the string does not exist in the pool, a new String object instantiates, then is placed in the pool.
If you do:
String str1 = new String("BlaBla"); //In the heap!
String str2 = new String("BlaBla"); //In the heap!
then you're explicitly creating a String object through new operator (and constructor).
In this case you'll have each object pointing to a different storage location.
But if you do:
String str1 = "BlaBla";
String str2 = "BlaBla";
then you've implicit construction.
Two strings literals share the same storage if they have the same values, this is because Java conserves the storage of the same strings! (Strings that have the same value)
The javac compiler combines String literals which are the same in a given class file.
However, at runtime, String literals are combined using the same approach as String.intern(). This means that even equal String literals in different classes, or in different applications running in the same JVM, refer to the same object.
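A minimal sketch of the cross-class case. The nested classes First and Second are hypothetical; they are compiled to separate class files, each with its own constant pool entry for "BlaBla", yet at run time both literals resolve to one object.

```java
class SharedLiteralDemo {
    static class First {
        static String value() { return "BlaBla"; }
    }

    static class Second {
        static String value() { return "BlaBla"; }
    }

    /** Equal literals in different classes refer to the same pooled String. */
    static boolean sameObjectAcrossClasses() {
        return First.value() == Second.value();
    }

    /** Explicit construction with new always yields distinct objects with equal content. */
    static boolean newCreatesDistinctObjects() {
        String a = new String("BlaBla");
        String b = new String("BlaBla");
        return a != b && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAcrossClasses());   // true
        System.out.println(newCreatesDistinctObjects()); // true
    }
}
```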

Is String s = "foobar" atomic?

Is String s = "foobar"; atomic?
Assigning an object reference should be, but I'm not really sure.
Thanks.
Yes. All reference assignments are atomic in java.
Just note that a composite statement like String s = new String("foobar") is not atomic, because it consists of an object creation followed by a separate assignment.
Also note that "assignments to long and double variables may not be atomic", from JLS-17.7
Many great answers have already been given here. Still, I want something more "official" about claims like "All reference assignments are atomic in java", and why String s = "foobar" does not create a new object at runtime. Here is what is written in the Java Language Specification (JLS).
Below are some examples:
String str1 = "foo"; // line 1, atomic
String str2 = "foo" + "bar"; // line 2, atomic
String str3 = str1; // line 3, atomic
String str4 = str1 + str2; // line 4, not atomic
String str5 = new String("foobar"); // line 5, not atomic
Line 1 and line 2 are atomic because:
They are both constant expressions and are computed at compile time. No object construction occurs at runtime.
JLS - 15.28: A constant expression is an expression denoting a value of primitive type or a String that does not complete abruptly and is composed using ... literals of primitive type and literals of type String.
JLS - 3.10.5: Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
Writes to and reads of references are always atomic.
JLS - 17.7: Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
Line 3 is atomic because:
There is only a reference assignment in this line.
JLS - 17.7
Line 4 is not atomic because:
Concatenating two string variables creates a new String object at runtime. Object construction is not atomic.
JLS - 15.18.1 - String Concatenation Operator +: The String object is newly created (§12.5) unless the expression is a constant expression (§15.28).
JLS - 3.10.5 - String literals: Strings computed by concatenation at run time are newly created and therefore distinct.
Line 5 is not atomic because:
A String object is constructed at runtime in this line.
Yes, but if you're worried about race conditions, you should at least be aware of 'synchronized' methods/blocks.
And note that this is not atomic because it contains two operations:
String s = string_a + string_b;
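To make the "two operations" visible, the statement can be written out by hand. This is a sketch of an equivalent decomposition, not a claim about the bytecode javac emits (which varies by Java version); ConcatSteps and concatInSteps are hypothetical names.

```java
class ConcatSteps {
    /** Hand-written equivalent of: String s = a + b; */
    static String concatInSteps(String a, String b) {
        // Step 1: build a new String object. This involves many individual operations.
        String temp = new StringBuilder().append(a).append(b).toString();
        // Step 2: a single reference write. Only this step is atomic under JLS 17.7.
        String s = temp;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(concatInSteps("foo", "bar")); // foobar
    }
}
```

Another thread reading s sees either the old reference or the new one, never a half-written reference, but nothing makes the combination of step 1 and step 2 a single indivisible action.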

== operator does not compare references for String [duplicate]

I understand how String equals() method works but was surprised by some results I had with the String == operator.
I would have expected == to compare references as it does for other objects.
However, for distinct String objects (with the same content), == returns true, and it does so even for a static String object (with the same content), which is obviously not at the same memory address.
I guess == has been defined the same as equals to prevent its misuse
No, == does just compare references. However, I suspect you've been fooled by compile-time constants being interned - so two literals end up referring to the same string object. For example:
String x = "xyz";
String y = "xyz";
System.out.println(x == y); // Guaranteed to print true
StringBuilder builder = new StringBuilder();
String z = builder.append("x").append("yz").toString();
System.out.println(x == z); // Will print false
From section 3.10.5 of the Java language specification:
String literals-or, more generally, strings that are the values of constant expressions (§15.28)-are "interned" so as to share unique instances, using the method String.intern.
The reason it returns true is a memory optimization (which isn't always guaranteed to occur): strings with the same content will point to the same memory area to save space. In the case of static objects they will always point to the same thing (as there is only one of it because of the static keyword). Again, don't rely on the above; use equals() instead.
One thing I should point out from Jon Skeet is that it is always guaranteed for compile time constants. But again just use equals() as it is clearer to read.
It is due to string intern pooling
See
whats-the-difference-between-equals-and
The == operator always compares references in Java and never contents. What can happen is that once you declare a string literal, this object is placed in the JVM's string pool, and if you reuse the same literal, the same pooled object is used again. A simple test for this behavior can be seen in the following code snippet:
String a = "a string";
String b = "a string";
System.out.println( a == b ); // will print true
String c = "other string";
String d = new String( "other string" );
System.out.println( c == d ); // will print false
The second case prints false because the variable d was initialized with a directly created String object and not a literal, so it will not go to the String pool.
The string pool is not part of the Java specification and relying on its behavior is not advised. You should always use equals to compare objects.
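One way to follow the "always use equals" advice, sketched with the standard java.util.Objects.equals helper so that null references are handled as well. ContentComparison and sameContent are names invented for the example.

```java
import java.util.Objects;

class ContentComparison {
    /** Compares by content; also safe when either argument is null. */
    static boolean sameContent(String left, String right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        String c = "other string";
        String d = new String("other string");
        System.out.println(c == d);              // false: different objects
        System.out.println(sameContent(c, d));   // true: same characters
        System.out.println(sameContent(null, c)); // false, and no NullPointerException
    }
}
```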
I guess == has been defined the same as equals to prevent its misuse
Wrong. What is happening here is that when the compiler sees that you are using the same string in two different places it only stores it in the program's data section once. Read in a string or create it from smaller strings and then compare them.
Edit: Note that when I say "same string" above, I'm referring only to string literals, which the compiler knows at compile time.
