String pool in Java

Java has a string pool, due to which objects of the String class are immutable.
But my question stands:
Why was the String pool needed?
Why was the String class not kept like other classes, holding its own values?
Does the JVM need some strings internally, or is this a performance benefit? If so, how?

A pool is possible because the strings are immutable. But the immutability of the String hasn't been decided only because of this pool. Immutability has numerous other benefits. BTW, a Double is also immutable, and there is no pool of Doubles.
The need for the String pool is to reduce the memory needed to hold all the String literals (and the interned Strings) a program uses, since these literals have a good chance of being used many times, in many places of the program. Instead of having thousands of copies of the same String literal, you just have thousands of references to the same String, which reduces the memory usage.
Note that the String class is not different from other classes: it holds its own char array. In older JVMs it may also share that array with other String instances when substring is called.
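As a minimal sketch of the sharing described above: the same literal used in two unrelated places resolves to a single String object. The class and method names here are hypothetical and only serve the illustration.

```java
// Hypothetical demo class; names are illustrative only.
class LiteralSharingDemo {
    // Two separate methods that happen to use the same literal.
    static String greetingInOnePlace() {
        return "hello";
    }

    static String greetingElsewhere() {
        return "hello";
    }

    // True when both methods hand back the very same String object.
    static boolean sharesOneInstance() {
        return greetingInOnePlace() == greetingElsewhere();
    }

    public static void main(String[] args) {
        System.out.println(sharesOneInstance()); // prints "true"
    }
}
```

The == comparison is used here only to observe object identity; ordinary code should still compare strings with equals().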

When the compiler sees that a new String literal has to be created, it first checks the pool for an identical string; if one is found, no new String is created and the existing String is referenced.

The benefit of making String immutable was security. Read below:
Why String has been made immutable in Java?
Performance is also a reason (assuming you are already aware of the internal String pool maintained for making sure that the same String object is used more than once without having to create/re-claim it that many times), but the main reason why String has been made immutable in Java is 'Security'. Surprised? Let's understand why.
Suppose you need to open a secure file which requires the users to authenticate themselves. Let's say there are two users named 'user1' and 'user2' and they have their own password files 'password1' and 'password2', respectively. Obviously 'user2' should not have access to 'password1' file.
As we know the filenames in Java are specified by using Strings. Even if you create a 'File' object, you pass the name of the file as a String only and that String is maintained inside the File object as one of its members.
Had String been mutable, 'user1' could have logged in using his credentials and then somehow managed to change the name of his password file (a String object) from 'password1' to 'password2' before the JVM actually places the native OS system call to open the file. This would have allowed 'user1' to open user2's password file. Understandably it would have resulted in a big security flaw in Java. I understand there are so many 'could have's here, but you would certainly agree that it would have opened a door for developers to mess up the security of many resources, either intentionally or unintentionally.
With Strings being immutable, JVM can be sure that the filename instance member of the corresponding File object would keep pointing to same unchanged "filename" String object. The 'filename' instance member being a 'final' in the File class can anyway not be modified to point to any other String object specifying any other file than the intended one (i.e., the one which was used to create the File object).

What was the need to make String POOL?
When created with new, a String object is stored in the heap, and the String literal passed to the constructor is stored in the String Pool (SP). That's why creating Strings with new is not good practice: it creates two objects.
String str = new String("stackoverflow");
Above, the object referenced by str is stored in the heap, and the String literal passed to the constructor, "stackoverflow", is stored in the String Pool. That is bad for performance: two objects are created.
The flow: a String literal is encountered -> the JVM looks in the String Pool for the same value -> the value is not found (so there is no existing object to return) -> the String literal is created as a new object (internally, with the new keyword) -> but instead of going to the ordinary heap, it is placed in the String Pool.
The difference lies in where the object is created with the new keyword. If it is created by the programmer, the object goes directly to the heap. If it is created internally, it goes to the String Pool. This is done by the method intern(), which is invoked internally when a String literal is declared. The method searches the SP for an identical value and either returns the reference to an existing String object or adds the object to the SP.
When creating a String object with new, intern() is not invoked and the object is stored in the heap. But you can call intern() on String objects: String str = new String().intern(); now str refers to the String in the SP.
ex:
String s1 = new String("hello").intern();
String s2 = "hello";
System.out.println(s1 == s2); // true, because s1 now refers to the pooled instance


Java String Pool with String constructor and the intern function

I learned about the Java String Pool recently, and there are a few things that I don't quite understand.
When using the assignment operator, a new String will be created in the String Pool if it doesn't exist there already.
String a = "foo"; // Creates a new string in the String Pool
String b = "foo"; // Refers to the already existing string in the String Pool
When using the String constructor, I understand that regardless of the String Pool's state, a new string will be created in the heap, outside of the String Pool.
String c = new String("foo"); // Creates a new string in the heap
I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.
String d = new String("bar"); // Creates a new string in the String Pool and in the heap
I didn't find any further information about this, but I would like to know if that's true.
If that is indeed true, then - why? Why does java create this duplicate string? It seems completely redundant to me since the strings in java are immutable.
Another thing that I would like to know is how the .intern() function of the String class works: Does it just return a pointer to the string in the String Pool?
And finally, in the following code:
String s = new String("Hello");
s = s.intern();
Will the garbage collector delete the string that is outside the String Pool from the heap?
You wrote
String c = new String("foo"); // Creates a new string in the heap
I read somewhere that even when using the constructor, the String Pool is being used. It
will insert the string into the String Pool and into the heap.
That’s somewhat correct, but you have to read the code correctly. Your code contains two String instances. First, you have the string literal "foo" that evaluates to a String instance, the one that will be inserted into the pool. Then, you are creating a new String instance explicitly, using new String(…) calling the String(String) constructor. Since the explicitly created object can’t have the same identity as an object that existed prior to its creation, two String instances must exist.
Why does java create this duplicate string? It seems completely redundant to me since the strings in java are immutable.
Well it does so, because you told it so. In theory, this construction could get optimized, skipping the intermediate step that you can’t perceive anyway. But the first assumption for a program’s behavior should be that it does precisely what you have written.
You could ask why there’s a constructor that allows such a pointless operation. In fact, this has been asked before and this answer addresses this. In short, it’s mostly a historical design mistake, but this constructor has been used in practice for other technical reasons; some do not apply anymore. Still, it can’t be removed without breaking compatibility.
String s = new String("Hello");
s = s.intern();
Will the garbage collector delete the string that is outside the String Pool from the heap?
Since the intern() call will evaluate to the instance that had been created for "Hello" and is distinct from the instance created via new String(…), the latter will definitely be unreachable after the second assignment to s. Of course, this doesn’t say whether the garbage collector will reclaim the string’s memory, only that it is allowed to do so. But keep in mind that the majority of the heap occupation will be the array that holds the character data, which will be shared between the two string instances (unless you use a very outdated JVM). This array will still be in use as long as either of the two strings is in use. Recent JVMs even have the String Deduplication feature that may cause other strings of the same contents in the JVM to use this array (to allow collection of their formerly used array). So the lifetime of the array is entirely unpredictable.
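The identity relationships described above can be observed directly. The following is a small sketch; the class name is hypothetical, and it says nothing about when (or whether) the garbage collector actually reclaims the discarded instance.

```java
// Hypothetical demo class; names are illustrative only.
class InternReassignDemo {
    // Returns { distinct before intern(), same as the literal after intern() }.
    static boolean[] check() {
        String literal = "Hello";
        String s = new String("Hello");   // a second, distinct String object
        boolean distinctBefore = (s != literal);
        s = s.intern();                   // s now refers to the instance created for the literal
        boolean sameAfter = (s == literal);
        return new boolean[] { distinctBefore, sameAfter };
    }

    public static void main(String[] args) {
        boolean[] result = check();
        System.out.println(result[0]); // prints "true"
        System.out.println(result[1]); // prints "true"
    }
}
```

After the reassignment, nothing in this sketch refers to the object created by new String("Hello"), so it is merely eligible for collection.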
Q: I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap. [...] I didn't find any further information about this, but I would like to know if that's true.
It is NOT true. A string created with new is not placed in the string pool ... unless something explicitly calls intern() on it.
Q: Why does java create this duplicate string?
Because the JLS specifies that every new generates a new object. It would be counter-intuitive if it didn't (IMO).
The fact that it is nearly always a bad idea to use new String(String) is not a good reason to make new behave differently in this case. The real answer is that programmers should learn not to write that ... except in the extremely rare cases where it is necessary to do so.
Q: Another thing that I would like to know is how the intern() function of the String class works: Does it just return a pointer to the string in the String Pool?
The intern method always returns a pointer to a string in the string pool. That string may or may not be the string you called intern() on.
There have been different ways that the string pool was implemented.
In the original scheme, interned strings were held in a special heap called the PermGen heap. In that scheme, if the string you were interning was not already in the pool, then a new string would be allocated in PermGen space, and the intern method would return that.
In the current scheme, interned strings are held in the normal heap, and the string pool is just a (private) data structure. When the string being interned is not in the pool, it is simply linked into the data structure. A new string does not need to be allocated.
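A short sketch of what intern() guarantees regardless of which scheme the JVM uses: two equal but distinct strings always intern to the same canonical instance. Whether that instance is the very object you called intern() on depends on the JVM, so the sketch does not test for it. The class and the build helper are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class InternCanonicalDemo {
    // Run-time concatenation of non-constant operands yields a fresh String each time.
    static String build(String prefix, int n) {
        return prefix + n;
    }

    static boolean sameCanonicalInstance() {
        String a = build("key-", 42);
        String b = build("key-", 42);
        // a and b are equal in content but are different objects;
        // their interned forms are one and the same object.
        return a != b && a.equals(b) && a.intern() == b.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameCanonicalInstance()); // prints "true"
    }
}
```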
Q: Will the garbage collector delete the string that is outside the String Pool from the heap?
The rule is the same for all Java objects, no matter how they were created, and irrespective of where (in which "space" or "heap" in the JVM) they reside.
If an object is not reachable from the running application, then it is eligible for deletion by the garbage collector.
That doesn't mean that an unreachable object will be garbage collected in any particular run of the GC. (Or indeed ever ... in some circumstances.)
The above rule equally applies to the String objects that correspond to string literals. If it ever becomes possible that a literal can never be used again, then it may be garbage collected.
That doesn't normally happen. The JVM keeps hidden references to each string literal object in a private data structure associated with the class that defined it. Since classes normally exist for the lifetime of the JVM, their string literal objects remain reachable. (Which makes sense ... since the application may need to use them.)
However, if a class is loaded using a dynamically created classloader, and that classloader becomes unreachable, then so will all of its classes. So it is actually possible for a string literal object to become unreachable. If it does, it may be garbage collected.

Why String Deduplication when we have String Pool

String De-duplication:
Strings consume a lot of memory in any application. Whenever the garbage collector visits String objects, it takes note of the char arrays. It takes their hash value and stores it alongside a weak reference to the array. As soon as it finds another String which has the same hash code, it compares them char by char. If they match as well, one String will be modified to point to the char array of the second String. The first char array is then no longer referenced and can be garbage collected.
String Pool:
All strings used by the Java program are stored here. If two variables are initialized to the same string value, two strings are not created in memory; there will be only one copy stored in memory and both will point to the same memory location.
So java already takes care of not creating duplicate strings in the heap by checking if the string exists in the string pool. Then what is the purpose of string de-duplication?
If there is a code as follows
String myString_1 = new String("Hello World");
String myString_2 = new String("Hello World");
two strings are created in memory even though they are the same. I cannot think of any scenario other than this where string de-duplication is useful. Obviously I must be missing something. What am I missing?
Thanks In Advance
The string pool applies only to strings added to it explicitly, or used as constants in the application. It does not apply to strings created dynamically during the lifetime of the application. String deduplication, however, applies to all strings.
Compile time vs run time
String pool refers to string constants that are known at compile time.
String deduplication would help you if you happen to retrieve (or construct) the same string a million times at run time, e.g. reading it from a file, a HTTP request or any other way.
String de-duplication enjoys the extra level of indirection built into String:
With a string pool, you are limited to returning the same object for two identical strings
String de-duplication lets you have multiple distinct String objects sharing the same content.
This translates into removing a limitation of de-duplicating on creation: your application could keep creating new String objects with identical content while using very little extra memory, because the content of the strings would be shared. This process can be done on a completely unrelated schedule - for example, in the background, while your application does not need much of the CPU resources. Since the identity of the String object does not change, de-duplication can be completely hidden from your application.
Just to add to the answers above, on older VMs the string pool is not garbage collected (this has changed now, but don't rely on that). It contains strings which are used as constants in the application, and so will always be needed. If you continually put all your strings in the string pool, you might quickly run out of memory. On top of that, de-duplication is a relatively expensive process, which is wasted effort if you know you only need the string for a very short period of time and you have enough memory.
For these reasons, strings are not put in the string pool automatically. You have to do it explicitly by calling string.intern().
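To make the distinction concrete, here is a small sketch showing that a string produced at run time is not in the pool until intern() is called on it. The class and method names are hypothetical, and the char array merely stands in for data read from a file or request. String deduplication, by contrast, is a garbage-collector feature (enabled on G1 with -XX:+UseStringDeduplication) and is not visible from code like this, because object identity never changes.

```java
// Hypothetical demo class; names are illustrative only.
class RuntimeStringDemo {
    // Simulates a value that only becomes known at run time.
    static String fromRuntime() {
        return new String(new char[] { 'a', 'b', 'c' });
    }

    // A run-time string is a different object from the pooled literal.
    static boolean pooledAutomatically() {
        return fromRuntime() == "abc";
    }

    // Explicit interning yields the pooled instance.
    static boolean pooledAfterIntern() {
        return fromRuntime().intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(pooledAutomatically()); // prints "false"
        System.out.println(pooledAfterIntern());   // prints "true"
    }
}
```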
From documentation :
"Initializes a newly created String object so that it represents
the same sequence of characters as the argument; in other words, the
newly created string is a copy of the argument string. Unless an
explicit copy of original is needed, use of this constructor is
unnecessary since Strings are immutable."
So my sense is that this constructor of the String class is not normally needed in the way you have used it above. I guess the constructor is provided merely for the sake of completeness, or for when you do not want to share the copy (which is largely unnecessary now), but other constructors are still useful, such as the one that builds a String object from a char array.
I cannot think of any scenario other than this where string de-duplication is useful.
Well one other (much more) frequent scenario is the use of StringBuilders. In the toString() method of the StringBuilder class, it clearly creates a new instance in memory:
public final class StringBuilder extends AbstractStringBuilder
implements java.io.Serializable, CharSequence
{
...
@Override
public String toString() {
// Create a copy, don't share the array
return new String(value, 0, count);
}
...
}
Same thing for its thread-safe version StringBuffer:
public final class StringBuffer extends AbstractStringBuilder
implements java.io.Serializable, CharSequence
{
...
@Override
public synchronized String toString() {
if (toStringCache == null) {
toStringCache = Arrays.copyOfRange(value, 0, count);
}
return new String(toStringCache, true);
}
...
}
In applications that rely heavily on this, string de-duplication may reduce memory usage.
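The effect of those toString() implementations can be seen in a few lines. This is a sketch with a hypothetical class name: each call produces a separate String object with equal content, which is exactly the kind of duplicate that de-duplication can later merge behind the scenes.

```java
// Hypothetical demo class; names are illustrative only.
class BuilderCopiesDemo {
    static boolean distinctButEqual() {
        StringBuilder sb = new StringBuilder("Hello").append(" World");
        String first = sb.toString();   // allocates a new String
        String second = sb.toString();  // allocates another new String
        return first != second && first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println(distinctButEqual()); // prints "true"
    }
}
```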

Does String have any internal mechanism to check existence of object in String constant pool?

I know that when I do the following:
String s = "abc";
JVM checks for the "abc" object on string constant pool and if not present, it will create the object and its reference will be returned to s variable.
But here I want to clarify one thing: does String have any internal mechanism to check for the existence of such an object?
JVM checks for the "abc" object on string constant pool and if not present, it will create the object and its reference will be returned to s variable.
Wrong. Any string literal is in the constant pool, placed there by the compiler and classloader.
But here I want to clarify one thing: does String have any internal mechanism to check for the existence of such an object?
It has an external mechanism: String.intern(). But that has nothing to do with the code you posted.
You can use String.intern() which the Javadoc says (in part) returns a canonical representation for the string object.
String s = new String("abc").intern();
Or,
String s = "abc".intern();
Does String have any internal mechanism to check for the existence of such an object?
Internally ... obviously yes, because otherwise intern() would not work.
But this functionality is within the JVM native implementation of intern etcetera, and is not exposed in any public java APIs.
I would argue that String.intern() is not a valid test. Sure, if you call intern on a string and that string object is already in the pool, then you will get the same string object back. But that doesn't give you an answer in the case where your string object isn't in the pool. In that case, you can't tell whether the result string was in the pool before your call ... or not. In the latter case, you have added a string to the pool as a side-effect of the intern() call.
Simple test to test the above feature
String one="abc";
String two="abc";
System.out.println(one==two);
Output is
true
Which implies both variables one and two are referencing the same object address.

Is "new String()" immutable as well?

I've been studying Java String for a while. The following questions are based on the below posts
Java String is special
Immutability of String in java
Immutability:
Now, going by the immutability, the String class has been designed so that the values in the common pool can be reused in other places/variables. This holds good if the String was created as
String a = "Hello World!";
However, if I create String like
String b = new String("Hello World!");
why is this immutable as well? (or is it?). Since this has a dedicated heap memory, I should be able to modify this without affecting any other variable. So by design, was there any other reason why String as a whole is considered immutable? Or is my above assumption wrong?
Second thing I wanted to ask was about the common string pool. If I create a string object as
String c = "";
is an empty entry created in the pool?
Is there any post already on these? If so, could someone share the link?
new String() is an expression that produces a String ... and a String is immutable, no matter how it is produced.
(Asking if new String() is mutable or not is nonsensical. It is program code, not a value. But I take it that that is not what you really meant.)
If I create a string object as String c = ""; is an empty entry created in the pool?
Yes; that is, an entry is created for the empty string. There is nothing special about an empty String.
(To be pedantic, the pool entry for "" gets created long before your code is executed. In fact, it is created when your code is loaded ... or possibly even earlier than that.)
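A quick sketch confirming that the empty string behaves like any other literal with respect to the pool. The class name is hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class EmptyStringDemo {
    // Two empty literals refer to the same pooled object.
    static boolean literalsShared() {
        String a = "";
        String b = "";
        return a == b;
    }

    // new always creates a distinct object, even for empty content.
    static boolean newIsDistinct() {
        return new String("") != "";
    }

    // intern() finds the pooled empty string.
    static boolean internFindsPooled() {
        return new String("").intern() == "";
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());    // prints "true"
        System.out.println(newIsDistinct());     // prints "true"
        System.out.println(internFindsPooled()); // prints "true"
    }
}
```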
So, I wanted to know whether the new heap object is immutable as well, ...
Yes it is. But the immutability is a fundamental property of String objects. All String objects.
You see, the String API simply does not provide any methods for changing a String. So (apart from some dangerous and foolish[1] tricks using reflection), you can't mutate a String.
and if so what was the purpose?.
The primary reason that Java String is designed as an immutable class is simplicity. It makes it easier to write correct programs, and read / reason about other people's code if the core string class provides an immutable interface.
An important second reason is that the immutability of String has fundamental implications for the Java security model. But I don't think this was a driver in the original language design ... in Java 1.0 and earlier.
Going by the answer, I gather that other references to the same variable is one of the reasons. Please let me know if I am right in understanding this.
No. It is more fundamental than that. Simply, all String objects are immutable. There is no complicated special case reasoning required to understand this. It just >>is<<.
For the record, if you want a mutable "string-like" object in Java, you can use StringBuilder or StringBuffer. But these are different types to String.
1 - The reason these tricks are (IMO) dangerous and foolish is that they affect the values of strings that are potentially shared by other parts of your application via the string pool. This can cause chaos ... in ways that the next guy maintaining your code has little chance of tracking down.
String is immutable irrespective of how it is instantiated
1) Short answer is yes, new String() is immutable too.
Because every possible "mutating" operation (like replace, toLowerCase, etc.) that you perform on a String does not affect the original String instance and returns you a new instance.
You may check this in Javadoc for String. Each public method of String that is exposed returns a new String instance and does not alter the present instance on which you called the method.
This is very helpful in a multi-threaded environment, as you don't have to think about mutability (someone changing the value) every time you pass or share the String around. String can easily be the most used data type, so the designers have spared us from thinking about mutability every time and saved us a lot of pain.
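A minimal sketch of that behaviour, using a hypothetical class name: operations that look like modifications leave the original object untouched and hand back a result instead.

```java
import java.util.Locale;

// Hypothetical demo class; names are illustrative only.
class NoMutationDemo {
    // Returns { original, upper-cased result, replaced result }.
    static String[] transform() {
        String original = "Hello";
        String upper = original.toUpperCase(Locale.ROOT);
        String replaced = original.replace('l', 'L');
        return new String[] { original, upper, replaced };
    }

    public static void main(String[] args) {
        String[] r = transform();
        System.out.println(r[0]); // prints "Hello"
        System.out.println(r[1]); // prints "HELLO"
        System.out.println(r[2]); // prints "HeLLo"
    }
}
```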
Immutability allowed String pool or caching
It is because of the immutability property that the internal pool of strings was possible: when the same String value is required at some other place, the reference to that immutable String is returned. If String had been mutable, it would not have been possible to share Strings like this to save memory.
String immutability was not because of pooling; immutability has more benefits attached to it.
String interning or pooling is an example of Flyweight Design pattern
2) Yes it will be interned like any other String as a blank String is also as much a String as other String instances.
References:
Immutability benefits of String
The Java libraries are heavily optimized around the constraint that any String object is immutable, regardless of how that object is constructed. Even if you create your b using new, other code that you pass that instance to will treat the value as immutable. This is an example of the Value Object pattern, and all of the advantages (thread-safety, no need to make private copies) apply.
The empty string "" is a legitimate String object just like anything else, it just happens to have no internal contents, and since all compile-time constant strings are interned, I'll virtually guarantee that some runtime library has already caused it to be added to the pool.
1) The immutable part isn't because of the pool; it just makes the pool possible in the first place. Strings are often passed as arguments to other functions or even shared with other threads; Making Strings immutable was a design decision to make reasoning in such situations easier. So yes - Strings in Java are always immutable, no matter how you create them (note that it's possible to have mutable strings in java - just not with the String class).
2) Yes. Probably. I'm not actually 100% sure, but that should be the case.
This isn't strictly an answer to your question, but if behind your question is a wish to have mutable strings that you can manipulate, you should check out the StringBuilder class, which implements many of the exact same methods that String has but also adds methods to change the current contents.
Once you've built your string in such a way that you're content with it, you simply call toString() on it in order to convert it to an ordinary String that you can pass to library routines and other functions that only take Strings.
Also, both StringBuilder and String implements the CharSequence interface, so if you want to write functions in your own code that can use both mutable and immutable strings, you can declare them to take any CharSequence object.
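As a sketch of that last point, a method declared against CharSequence accepts a String and a StringBuilder alike. The countDigits helper and the class name are hypothetical, chosen only for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class CharSequenceDemo {
    // Works for any CharSequence: String, StringBuilder, StringBuffer, ...
    static int countDigits(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isDigit(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("room ");
        sb.append(101);                                 // mutate the builder in place
        System.out.println(countDigits("a1b2"));        // prints "2"
        System.out.println(countDigits(sb));            // prints "3"
        System.out.println(countDigits(sb.toString())); // prints "3"
    }
}
```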
From the Java Oracle documentation:
Strings are constant; their values cannot be changed after they are
created.
And again:
String buffers support mutable strings. Because String
objects are immutable they can be shared.
Generally speaking, all "primitive" (or related) objects are immutable (please excuse my lack of formalism).
Related post on Stack Overflow:
Immutability of Strings in Java
Is Java String immutable?
Is Java "pass-by-reference" or "pass-by-value"?
Is a Java string really immutable?
String is immutable. What exactly is the meaning?
what the reason behind making string immutable in java?
About the object pool: the object pool is a Java optimization which is NOT the reason for immutability either.
That String is immutable means that you cannot change the object itself, no matter how you created it. And as for the second question: yes, it will create an entry.
It's the other way around, actually.
[...] the String class has been designed so that the values in the common pool can be reused in other places/variables.
No, the String class is immutable so that you can safely reference an instance of it, without worrying about it being modified from another part of your program. This is why pooling is possible in the first place.
So, consider this:
// this string literal is interned and referenced by 'a'
String a = "Hello World!";
// creates a new instance by copying characters from 'a'
String b = new String(a);
Now, what happens if you simply create a reference to your newly created b variable?
// 'c' now points to the same instance as 'b'
String c = b;
Imagine that you pass c (or, more specifically, the object it is referencing) to a method on a different thread, and continue working with the same instance on your main thread. And now imagine what would happen if strings were mutable.
Why is this the case anyway?
If nothing else, it's because immutable objects make multithreading much simpler, and usually even faster. If you share a mutable object (that would be any stateful object, with mutable private/public fields or properties) between different threads, you need to take special care to ensure synchronized access (mutexes, semaphores). Even with this, you need special care to ensure atomicity in all your operations. Multithreading is hard.
Regarding the performance implications, note that quite often copying the entire string into a new instance, in order to change even a single character, is actually faster than inducing an expensive context switch due to synchronization constructs needed to ensure thread-safe access. And as you've mentioned, immutability also offers interning possibilities, which means it can actually help reduce memory usage.
It's generally a pretty good idea to make as much stuff immutable as you can.
1) Immutability: for security reasons, a String is immutable whether you create it using new or any other way.
2) Yes, there will be an empty entry in the string pool.
You can better understand the concept using the code
String s1 = new String("Test");
String s2 = new String("Test");
String s3 = "Test";
String s4 = "Test";
System.out.println(s1==s2);//false
System.out.println(s1==s3);//false
System.out.println(s2==s3);//false
System.out.println(s4==s3);//true
Hope this helps with your query. For a better understanding, you can always check the source code of the String class. Link: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java
1 - String is immutable. See this:
Is a Java string really immutable?
So it does not matter which of the many ways you use to create it.
2 - Short answer: yes, there will be an entry for the empty string.
Strings created will always be immutable regardless of how they are created.
Answers to your questions:
The only difference is:
When a string is created like -- {String a = "Hello World!";} then only one object gets created.
And when it is created like -- {String b = new String("Hello World!");} then two objects get created: one because you have used the 'new' keyword, and the other for the string literal itself, which is kept in the pool.
Yes, for sure. There will be an empty entry created in the pool.
String is immutable because it does not provide you with a means to modify it. It is designed to avoid any tampering (it is final, the underlying array is not supposed to be touched ...).
Identically, Integer is immutable, because there is no way to modify it.
It does not matter how you create it.
Immutability is not a feature of new, it is a feature of the class String. It has no mutator methods, so it is immutable.
String A = "Test";
String B = "Test";
Now suppose Strings were mutable: if B called toUpperCase() and that changed the same "Test" object into "TEST", then A would also be "TEST", which is not desirable.
Note that in your example, the reference is changed and not the object it refers to, i.e. b as a reference may be changed and refer to a new object. But that new object is immutable, which means its contents will not be changed "after" calling the constructor.
You can change a string using b=b+"x"; or b=new String(b);, and the content of the variable seems to change, but don't confuse changing a reference (here variable b) with changing the object it refers to (think of pointers in C). The object that the reference points to remains unchanged after its creation.
If you need to change a string by changing the contents of the object (instead of changing the reference), you can use StringBuffer, which is a mutable version of String.
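The difference between reassigning a reference and mutating an object can be sketched as follows. The class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class ReferenceVsObjectDemo {
    // Reassigning b creates a new String; the object a refers to is untouched.
    static String reassignString() {
        String a = "Hello";
        String b = a;      // b and a refer to the same object
        b = b + "x";       // b now refers to a different, new object
        return a;
    }

    // Mutating through one reference is visible through the other.
    static String mutateSharedBuffer() {
        StringBuffer x = new StringBuffer("Hello");
        StringBuffer y = x;  // y and x refer to the same buffer
        y.append("x");       // changes the buffer's contents in place
        return x.toString();
    }

    public static void main(String[] args) {
        System.out.println(reassignString());     // prints "Hello"
        System.out.println(mutateSharedBuffer()); // prints "Hellox"
    }
}
```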

When does StringBuffer add strings to the String Pool?

When I define a StringBuffer variable with new, this string is not added to the String pool, right?
Now, when I define another StringBuffer but not with new, I define it as StrPrev.append("XXX"), suddenly it is (or so says my college teacher). Why is that? What makes this string suddenly become a string-pool string?
When I define a StringBuffer variable with new, this string is not added to the String pool, right?
Creating a StringBuffer does not create a String at all.
Now, when I define another StringBuffer but not with new, I define it as StrPrev.append("XXX") suddenly it is.
This is totally confused:
When you call strBuff.append("XXX") you are NOT defining a new StringBuffer. You are updating the existing StringBuffer that strBuff refers to. Specifically, you are adding extra characters to the end of the buffer.
You only get a new String from the StringBuffer when you call strBuff.toString().
You only add a String to the string pool when you call intern() on the String. And that only adds the string to the pool if there is not already an equal string in the pool.
The String object that represents the literal "XXX" is a member of the string pool. But that happens (i.e. the String is added to the pool) when the class is loaded, not when you execute the append call.
(If your teacher told you that StringBuffer puts strings into the Java string pool, he / she is wrong. But, given your rather garbled description, I suspect that you actually misheard or misunderstood what your teacher really said.)
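The points above can be checked with a short sketch. The class name is hypothetical; strBuff follows the naming used in the answer. The String produced by toString() is not the pooled literal, and only an explicit intern() call yields the pooled instance.

```java
// Hypothetical demo class; names are illustrative only.
class BufferNotPooledDemo {
    static String build() {
        StringBuffer strBuff = new StringBuffer();
        strBuff.append("XXX");      // updates the existing buffer; no new StringBuffer
        return strBuff.toString();  // a new String, outside the pool
    }

    // The built String is not the same object as the literal.
    static boolean resultIsPooled() {
        return build() == "XXX";
    }

    // intern() returns the pooled instance created for the literal.
    static boolean pooledAfterIntern() {
        return build().intern() == "XXX";
    }

    public static void main(String[] args) {
        System.out.println(resultIsPooled());    // prints "false"
        System.out.println(pooledAfterIntern()); // prints "true"
    }
}
```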
"XXX" in StrPrev.append("XXX") is a string literal that is interned at class loading time (class loading time of the class that contains the code).
"XXX" is not added to the pool by the StringBuffer.
From the JLS section 3.10.5:
Moreover, a string literal always refers to the same instance of class
String. This is because string literals - or, more generally, strings
that are the values of constant expressions (§15.28) - are "interned"
so as to share unique instances
From the JLS section 12.5:
Loading of a class or interface that contains a String literal
(§3.10.5) may create a new String object to represent that literal.
(This might not occur if the same String has previously been interned
(§3.10.5).)
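The phrase "values of constant expressions" in the first quote matters in practice. The following sketch, with a hypothetical class name, contrasts a concatenation the compiler can evaluate with one that is computed at run time.

```java
// Hypothetical demo class; names are illustrative only.
class ConstantExpressionDemo {
    // Both operands are literals: a constant expression, interned like a literal.
    static boolean literalConcat() {
        return ("XX" + "X") == "XXX";
    }

    // part is an ordinary variable: concatenated at run time into a new String.
    static boolean runtimeConcat() {
        String part = "XX";
        return (part + "X") == "XXX";
    }

    // A final variable initialized with a constant makes the expression constant again.
    static boolean finalConcat() {
        final String part = "XX";
        return (part + "X") == "XXX";
    }

    public static void main(String[] args) {
        System.out.println(literalConcat()); // prints "true"
        System.out.println(runtimeConcat()); // prints "false"
        System.out.println(finalConcat());   // prints "true"
    }
}
```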
buf.append("XXX") followed by buf.toString(), and then returning the string to the pool. With the pool in place, only one StringBuffer object is ever allocated.
Actually your teacher is referring to "XXX", which goes to the string pool because all string literals written in a Java program go to the string pool during execution...
