When are Java Strings interned? - java

Inspired by the comments on this question, I'm pretty sure that Java Strings are interned at runtime rather than at compile time: classes can be compiled at different times, yet identical literals in them still resolve to the same reference at runtime.
I can't seem to find any evidence to back this up. Can anyone justify this?

The optimization happens (or at least can happen) in both places:
If two references to the same string constant appear in the same class, I'd expect the class file to only contain one constant pool entry. This isn't strictly required in order to ensure that there's only one String object created in the JVM, but it's an obvious optimization to make. This isn't actually interning as such - just constant optimization.
When classes are loaded, the string pool for the class is added to the intern pool. This is "real" interning.
(I have a vague recollection that one of the bits of work for Java 7 around "small jar files" included a single string pool for the whole jar file... but I could be very wrong.)
EDIT: Section 5.1 of the JVM spec, "The Runtime Constant Pool" goes into details of this:
To derive a string literal, the Java
virtual machine examines the sequence
of characters given by the
CONSTANT_String_info structure.
If the method String.intern has
previously been called on an instance
of class String containing a sequence
of Unicode characters identical to
that given by the CONSTANT_String_info
structure, then the result of string
literal derivation is a reference to
that same instance of class String.
Otherwise, a new instance of class
String is created containing the
sequence of Unicode characters given
by the CONSTANT_String_info structure;
that class instance is the result of
string literal derivation. Finally,
the intern method of the new String
instance is invoked.
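The derivation rule quoted above can be observed from ordinary code. Here is a minimal sketch (the class and method names are made up for illustration): a string assembled at runtime is a different object from the literal, but calling intern() on it yields the same instance the literal resolves to.

```java
final class LiteralDerivationDemo {
    // Builds "hello" from chars at runtime, so the result is a fresh object
    // rather than the instance associated with the literal.
    static String buildAtRuntime() {
        return new String(new char[] {'h', 'e', 'l', 'l', 'o'});
    }

    // Compares the fresh object with the literal by reference.
    static boolean freshObjectIsLiteral() {
        return buildAtRuntime() == "hello";
    }

    // Compares the interned object with the literal by reference.
    static boolean internedObjectIsLiteral() {
        return buildAtRuntime().intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(freshObjectIsLiteral());    // false
        System.out.println(internedObjectIsLiteral()); // true
    }
}
```

The reference comparison with == is deliberate here; normal code should compare string contents with equals().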

Runtime.
The JLS and JVM specifications describe how javac compiles source to class files, which contain constant declarations (in the constant pool) and constant usages in code (which javac can inline as primitive or object reference values). For compile-time String constants, the compiler emits CONSTANT_String_info entries in the class file, and the JVM creates and interns the corresponding String instances when those entries are resolved, so that String constants are interned automatically. This is a behavioural requirement from the JLS:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.28
Compile-time constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern.
But these specs have neither the concept nor the definition of any particular String intern pool structures/references/handles whether compile time or runtime. (Of course, in general, the JVM spec does not mandate any particular internal structure for objects: http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.7)
The reason that no intern pool structures are mentioned is because they're handled entirely with the String class. The intern pool is a private static/class-level structure of the String class (unspecified by JLS & JVM specs & javadoc).
Objects are added to the intern pool when String.intern() is called at runtime. The intern pool is used privately by the String class: when code creates new String instances and calls String.intern(), the String class determines whether to reuse existing internal data. Optimisation can be carried out by the JIT compiler, at runtime.
There's no compile-time contribution here, bar the vanilla inlining of constant values.
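To illustrate what "inlining of constant values" means for strings, here is a small sketch (the class, field and method names are hypothetical). A concatenation made only of constants is itself a compile-time constant expression and ends up as a single pooled literal, while a concatenation involving a non-final variable is computed at runtime and produces a new object.

```java
final class ConstantExpressionDemo {
    // A constant variable: final and initialized with a constant expression.
    static final String PREFIX = "some";

    static boolean constantConcatIsPooled() {
        // PREFIX + "text" is a constant expression, so javac folds it
        // into the single constant "sometext".
        return (PREFIX + "text") == "sometext";
    }

    static boolean runtimeConcatIsPooled() {
        String prefix = "some"; // not declared final, so not a constant variable
        // This concatenation happens at runtime and creates a new String.
        return (prefix + "text") == "sometext";
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsPooled()); // true
        System.out.println(runtimeConcatIsPooled());  // false
    }
}
```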

Related

In which memory area do the method area and the string constant pool reside in Java 8?

I have read the Oracle documentation, but it says nothing about the method area and the string constant pool. Where do the method area and the string constant pool reside in memory in JDK 8 and later?
The java language specification does not specify where this lives.
It also doesn't matter. These objects end up being created, and there is no way to access their storage directly.
That's sort of how java works: The spec says what you can and cannot rely on, this gives room to JVM implementations to do whatever they want, so long as they fulfill the contract. "Where in memory..." is a question that in java doesn't matter, you can't manipulate memory directly at all.
Go back to why you think you need to know and find another way; any answer to this question would be specific to some implementation of the JVM, and therefore your code wouldn't be portable. That is, any version update to the JVM, or some alternative JVM implementation such as OpenJ9 rolls along and your code just breaks, probably with a raw core dump. That doesn't sound like a good idea.
In Java 8 and later:
the method area is in metaspace
the string pool is in the regular heap.
This is an implementation detail for Oracle and OpenJDK JVMs. Other implementations may be different. But it really doesn't matter where strings and code are stored. Your application doesn't need to know.
By the way, it is called the "string pool", not the "string constant pool".
All strings are constant in the sense that they are immutable.
String variables that are declared as static final (and are constant in that sense) are not necessarily in the string pool.
Not all strings in the string pool are static final.
Not all strings in the string pool are string literals or other compile-time constant values.
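A short sketch of the last three points, with hypothetical names throughout: a static final field initialized with new refers to an object that is not the pooled one, and strings assembled at runtime can be put into the pool with intern() even though they are neither literals nor static final.

```java
final class StringPoolFacts {
    // static final, but initialized with `new`, so it is not a compile-time
    // constant and the object it refers to is not the pooled instance.
    static final String NOT_POOLED = new String("config");

    static boolean staticFinalIsPooledInstance() {
        // intern() returns the pooled instance (the one for the literal "config").
        return NOT_POOLED == NOT_POOLED.intern();
    }

    // Assembles the content at runtime; no compile-time constant is involved
    // in the result.
    static String assemble(int n) {
        return new StringBuilder("user-").append(n).toString();
    }

    static boolean runtimeStringsShareInternedInstance() {
        String a = assemble(42);
        String b = assemble(42);
        // Two distinct objects with equal content map to one pooled instance.
        return a != b && a.intern() == b.intern();
    }

    public static void main(String[] args) {
        System.out.println(staticFinalIsPooledInstance());         // false
        System.out.println(runtimeStringsShareInternedInstance()); // true
    }
}
```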

How does the JVM lookup the String in the String constant pool? [duplicate]

I want to understand the string pool more deeply. Please help me get to the source class file containing this implementation in Java.
The question is really about finding the source code or implementation of the string pool, in order to dig deeper into the concept and learn about its less obvious aspects. That way we can use strings more efficiently, or think of some other way to implement our own garbage collection in case an application creates a great many literals and string objects.
I am sorry to disappoint you, but the Java string pool is not an actual Java class; it is implemented inside the JVM, i.e. it is written as C++ code.
If you look at the source code of the String class (pretty much all the way down) you see that the intern() method is native.
You will have to go through some JVM code to get more information.
Edit:
Some implementation can be found here (C++ header, C++ implementation). Search for StringTable.
Edit2: As Holger pointed out in the comments, this is not a hard requirement of the JVM implementation. So it is possible to have a JVM that implements the string pool differently, e.g. using an actual Java class. However, all commonly used JVMs I am aware of implement it in the JVM's C++ code.
You can go through this article: Strings, Literally
When a .java file is compiled into a .class file, any String literals
are noted in a special way, just as all constants are. When a class is
loaded (note that loading happens prior to initialization), the JVM
goes through the code for the class and looks for String literals.
When it finds one, it checks to see if an equivalent String is already
referenced from the heap. If not, it creates a String instance on the
heap and stores a reference to that object in the constant table. Once
a reference is made to that String object, any references to that
String literal throughout your program are simply replaced with the
reference to the object referenced from the String Literal Pool.
So, in the example shown above, there would be only one entry in the
String Literal Pool, which would refer to a String object that
contained the word "someString". Both of the local variables, one and
two, would be assigned a reference to that single String object. You
can see that this is true by looking at the output of the above
program. While the equals() method checks to see if the String objects
contain the same data ("someString"), the == operator, when used on
objects, checks for referential equality - that means that it will
return true if and only if the two reference variables refer to the
exact same object. In such a case, the references are equal. From the
above output, you can see that the local variables, one and two, not
only refer to Strings that contain the same data, they refer to the
same object.
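The program that the quoted article refers to as "the example shown above" is not reproduced here. A minimal sketch of what such a program might look like follows; only the variable names one and two and the literal "someString" come from the passage, and the class and method names are assumptions.

```java
final class SomeStringDemo {
    // Returns { contents equal, same object } for two identical literals.
    static boolean[] compare() {
        String one = "someString";
        String two = "someString";
        return new boolean[] { one.equals(two), one == two };
    }

    public static void main(String[] args) {
        boolean[] result = compare();
        System.out.println(result[0]); // true: same character data
        System.out.println(result[1]); // true: same String object
    }
}
```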

Mapping of Constant pool and method Area

I am trying to understand how a class file is loaded into the method area and executed. I am very confused about the constant pool.
When is the constant pool initially created: when the class file is compiled, or when the class is loaded?
How is the bytecode organized in the method area? What does the method table consist of?
Can anyone sketch a picture of the mapping in the method area for a clearer understanding?
Since the literal meaning of “constant pool” is just “pool of constants”, there are different things of that name, which are easy to confuse.
Each class file has a constant pool describing all constants used in that class, which includes constant values but also symbolic references needed for linkage. Some entries fulfill both roles, e.g. class entries may serve as owner declaration for a symbolic reference to a member, needed when accessing a field or invoking a method, but may also be used to get a Class instance, e.g. for a class literal appearing in source code. Since it’s part of the class file, its format is specified within The Java® Virtual Machine Specification, §4 The class File Format, in §4.4. The Constant Pool.
As said by other answers, you can use the command javap -v class.name to inspect the constant pool of a class.
There is a corresponding data structure at runtime, also known as run-time constant pool. Since certain values are represented as runtime objects (e.g. of type String, Class, MethodType, or MethodHandle), and symbolic references must be resolved to the runtime representation of the denoted classes and members, this structure is not the same as the byte sequence found in the class file. But these entries correspond, so that each time, an object is instantiated for a constant or a symbolic reference is resolved, the result can be remembered and reused the next time the same constant entry is accessed.
This doesn’t imply that an implementation must have a 1:1 representation of each class’ constant pool. It’s possible that a specific implementation maps a class’ pool to a shared pool used for all classes of the same class loading context, where each symbolic reference resolves to the same target.
There’s also the string pool, which can be seen as part of the runtime constant pool, holding references to all String instances associated with string constants, to allow resolving all identical string constants of all classes to the same String instance.
When a Java file is compiled, all references to variables and methods are stored in the class's constant pool as a symbolic reference.
Here is a link for your reference : What is the purpose of the Java Constant Pool?
javac creates a constant pool when you compile your source to a .class file. You can see it by running
javap -v MyClass
on your MyClass.class
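That the constant pool already exists in the compiled file can also be checked without javap. The sketch below reads the first fields of a class file (magic, minor_version, major_version, constant_pool_count, in the order given by §4.1 of the JVM specification). The class name ClassFileHeader and its members are hypothetical, and the lookup via getSimpleName() is a simplification that only works for top-level classes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    final int magic;             // 0xCAFEBABE for every valid class file
    final int constantPoolCount; // number of constant pool entries plus one

    ClassFileHeader(int magic, int constantPoolCount) {
        this.magic = magic;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the header of the class file that defines the given top-level class.
    static ClassFileHeader read(Class<?> type) {
        String resource = type.getSimpleName() + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();           // u4 magic
            in.readUnsignedShort();             // u2 minor_version (ignored)
            in.readUnsignedShort();             // u2 major_version (ignored)
            int count = in.readUnsignedShort(); // u2 constant_pool_count
            return new ClassFileHeader(magic, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader header = read(String.class);
        System.out.printf("magic=%08X constant_pool_count=%d%n",
                header.magic, header.constantPoolCount);
    }
}
```

The count differs between classes and JDK versions, so no particular value should be expected for it.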
The Java Virtual Machine has a method area that is shared among all Java Virtual Machine threads.
You can see bytecode of your class file by
'javap -c -v Main'
The method area is logically part of the heap, where the JVM keeps all its information about the class.

How String s ="sometext" works?

In object-oriented languages, a new object is created with the new keyword (since memory allocation in Java is done dynamically).
Even though String is a class, how is its object created without the new keyword?
Even though it uses string pooling I am not able to understand it clearly:
"It is possible to create a user defined class where we can initialize variable directly like String"
The mechanism enabling you to create String objects with string literals is built into the compiler and JVM. It is not available for use with objects of user-defined types.
When you write for the first time
String s = "sometext";
the compiler emits two things:
A constant pool entry with "sometext" in it, and
An instruction that sets s to reference the entry in the constant table.
If you write
String t = "sometext";
in the same class, the compiler will reuse an existing constant for "sometext", rather than creating a new one.
At runtime, the JVM creates a new String object for each entry from the constant table and gives your program access to them. Essentially, the JVM invokes new on your program's behalf and hands it a ready-to-use object.
A similar system is in play when you create instances of primitive wrappers with autoboxing. The common thing, however, is that it requires support from the compiler and is not available for user-defined types.
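A brief sketch of both built-in mechanisms, using hypothetical class and method names. The literal is handed over as a ready-made object, an explicit new always produces a separate object, and autoboxing similarly hands out shared wrapper objects for small values (the JLS guarantees this for int values from -128 to 127).

```java
final class NoNewKeywordDemo {
    static boolean literalIsSameAsExplicitNew() {
        String s = "sometext";             // object supplied by the JVM
        String u = new String("sometext"); // explicit new: a separate object
        return s == u;
    }

    static boolean smallBoxedIntsShareObject() {
        Integer a = 127; // autoboxing, no `new` written by the programmer
        Integer b = 127;
        return a == b;   // reference comparison of the two wrapper objects
    }

    public static void main(String[] args) {
        System.out.println(literalIsSameAsExplicitNew()); // false
        System.out.println(smallBoxedIntsShareObject());  // true
    }
}
```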
In Java, Strings are immutable and optionally pooled ("interned").
"sometext" is just an instance of a String that comes from the static pool.
You cannot create a user-defined String because java.lang.String is a final class, exactly for the reason of immutability (you can share duplicates by pointing them to a single instance).

Automatic interning of string literals

In the source code of
com.sun.org.apache.xerces.internal.impl.XMLScanner at line 183 and 186
183 protected final static String fVersionSymbol = "version".intern();
186 protected final static String fEncodingSymbol = "encoding".intern();
Why are "version" and "encoding" explicitly interned with intern() when they are string literals and would be interned automatically anyway?
I've tracked down the change to revision 318617 in the Apache Xerces SVN Repository (this is the project where this XML parser was initially developed, as the package name suggests).
The relevant part of the commit message is:
Trying to improve the use of symbol tables. Many predefined Strings are
added to symbol tables every time the parser is reset. For small documents,
this would be a significant cost. Now since we call String#intern for Strings
in the symbol table, it's sufficient to use String#intern for those predefined
symbols. This only needs to be performed once.
As you noted, the .intern() should not be necessary (and should have no visible effect) on a conforming JVM implementation.
My guess is that
either the author was not aware of the fact that string literals will always be interned
or it was a conscious decision to ward against a misbehaving JVM implementation
In the second case I'd expect some note of that in a comment or in the commit message, however.
One side effect of that .intern() call is that the initializers are no longer constant expressions, so the fields will not be inlined by other classes referencing them. That ensures that the class XMLScanner is loaded and its field read. I don't think this is relevant here, however.
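The side effect described above can be made visible with a small sketch. All names here are hypothetical, and the two nested classes merely stand in for separately compiled classes such as XMLScanner. Reading a field that is a compile-time constant does not touch the declaring class at all, whereas reading a field initialized with .intern() triggers initialization of the declaring class.

```java
import java.util.ArrayList;
import java.util.List;

final class ConstantInliningDemo {
    // Records which nested classes have run their static initializer.
    static final List<String> INITIALIZED = new ArrayList<>();

    static final class Plain {
        static final String VERSION = "version"; // constant variable, inlined by javac
        static { INITIALIZED.add("Plain"); }
    }

    static final class Interned {
        static final String VERSION = "version".intern(); // not a constant expression
        static { INITIALIZED.add("Interned"); }
    }

    static List<String> readBothFields() {
        String a = Plain.VERSION;    // compiled to a load of the constant "version"
        String b = Interned.VERSION; // a real field read; initializes Interned
        if (a != b) {
            throw new AssertionError("both fields should refer to the pooled instance");
        }
        return INITIALIZED;
    }

    public static void main(String[] args) {
        System.out.println(readBothFields()); // [Interned]
    }
}
```

Both fields still refer to the same pooled object, so the only observable difference is when the declaring class gets initialized.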
I don't believe there's any good reason for that, for the reason you identified: Literals are always automatically interned, as defined by the String class:
All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of the The Java™ Language Specification.
