Java String.split memory leak?

Java String.split memory leak? - java

I found that using String.substring is known for memory issues related to String.split.
Is there a memory leak in using String.split?
If yes what is the work-around for it?
Following link show correct usage of substring in Java.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513622
One more blog which talk about possible MLK in substring.
http://nflath.com/2009/07/the-dangers-of-stringsubstring/

Update: Behavior has changed in 1.7.0_06: See this article: Changes to String internal representation made in Java 1.7.0_06 at java-performance.info.
As pointed out by #finnw there is indeed kind of a memory leak lurking around when using String.substring. The reason is that String.substring only returns a view of a portion of the given string, i.e., the underlying string is still kept in memory.
To force the creation of a new string, unrelated to the source, you'll have to use the new keyword. I.e., you'll have to do for instance
String[] parts = orig.split(";");
//String mySubstring = parts[i]; // keeps orig from being GC'd
String mySubstring = new String(parts[i]); // creates a new string.
or, perhaps more direct
String mySubstring = new String(orig.split(";")[i]);
I must say that this behavior seems "unnecessary" to me. It should be solvable using weak references or some other technique. (Especially considering that String is already a special class, part of the Java language specification.)

Related

What's the best way to set a StringBuilder object?

I created a StringBuilder object without any value and appended some values afterwards. I later wanted to replace the object with an entirely different string.
Here's the code:
StringBuilder finalVersion = new StringBuilder();
finalVersion.append("6.0")
for (int i = 0; i < list.length(); i++) {
if(/*...*/){
finalVersion.append(".2");
} else {
finalVersion.append(".1");
}
if (/*...*/) {
if (/*...*/) {
finalVersion = new StringBuilder("Invalid parameter"));
}
}
}
What I've done is I created a new object to change the value, but perhaps there is a better approach without using stringBuilder.setLength(0);.
Could someone help?

To solve it I created a new object to change the value but I guess there is a better approach without use sb.setLength(0)
Both of those are good approaches. Another is sb.delete(0, sb.length()).
And when you want to replace the existing string content, sb.replace(0, sb.length(), "new content") is another option.
It really depends what your goal is:
Calling new StringBuffer(...) will give you a freshly allocated object with either the default capacity or a capacity that you specify.
The setLength, delete and replace approaches will recycle the existing object. This has advantages and disadvantages.
On the plus side, you don't allocate a new object1 ... so less garbage.
On the minus side, the string buffer uses the same amount of space as before, whether or not it needs to. Also, if you do this repeatedly, the chances are that the buffer and its backing array will be tenured by the GC, adding to the long-term memory load. You can free the unused capacity by calling sb.trimToSize(), but that is liable to cause a reallocation; i.e. what you were trying to avoid by not using new.
My advice would be to use new unless either the context means that you
can't, or your profiling tells you that new is generating too much garbage.
Looking at the code2, I think that setLength should be marginally faster than delete for emptying a StringBuffer. It gets more complicated when you are replacing the contents with new contents. Now you are comparing
sb.setLength(); sb.append("new content");
versus
sb.replace(0, sb.length(), "new content");
It needs to be measured ... if you care enough3 about performance to be comparing the cases.
1 - Unless the replacement string is large enough that the buffer needs to grow to hold it.
2 - By my reading of various versions of the StringBuilder and AbstractStringBuilder code, the delete method will always call System.arraycopy. However, to understand the performance impact, one would need to benchmark this carefully for different sizes of StringBuilder and across different Java versions.
3 - Actually, if you need to care. Beware the evils of premature optimization.

You can use replace. This method doesn't create a new StringBuilder object.
builder.replace(0, b.length(), "Invalid parameter");
Or, if doing it in two statements is fine with you, you can setLength(0), then append.
But really, you shouldn't worry about creating a new StringBuilder unless you actually encounter performance problems.

Is a Java string really immutable?

We all know that String is immutable in Java, but check the following code:
String s1 = "Hello World";
String s2 = "Hello World";
String s3 = s1.substring(6);
System.out.println(s1); // Hello World
System.out.println(s2); // Hello World
System.out.println(s3); // World
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] value = (char[])field.get(s1);
value[6] = 'J';
value[7] = 'a';
value[8] = 'v';
value[9] = 'a';
value[10] = '!';
System.out.println(s1); // Hello Java!
System.out.println(s2); // Hello Java!
System.out.println(s3); // World
Why does this program operate like this? And why is the value of s1 and s2 changed, but not s3?

String is immutable* but this only means you cannot change it using its public API.
What you are doing here is circumventing the normal API, using reflection. The same way, you can change the values of enums, change the lookup table used in Integer autoboxing etc.
Now, the reason s1 and s2 change value, is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).
The reason s3 does not was actually a bit surprising to me, as I thought it would share the value array (it did in earlier version of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
You can install a SecurityManager, to avoid malicious code to do such things. But keep in mind that some libraries depend on using these kind of reflection tricks (typically ORM tools, AOP libraries etc).
*) I initially wrote that Strings aren't really immutable, just "effective immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
As this topic seems overwhelmingly popular, here's some suggested further reading: Heinz Kabutz's Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection... well... madness.
It covers why this is sometimes useful. And why, most of the time, you should avoid it. :-)

In Java, if two string primitive variables are initialized to the same literal, it assigns the same reference to both variables:
String Test1="Hello World";
String Test2="Hello World";
System.out.println(test1==test2); // true
That is the reason the comparison returns true. The third string is created using substring() which makes a new string instead of pointing to the same.
When you access a string using reflection, you get the actual pointer:
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
So change to this will change the string holding a pointer to it, but as s3 is created with a new string due to substring() it would not change.

You are using reflection to circumvent the immutability of String - it's a form of "attack".
There are lots of examples you can create like this (eg you can even instantiate a Void object too), but it doesn't mean that String is not "immutable".
There are use cases where this type of code may be used to your advantage and be "good coding", such as clearing passwords from memory at the earliest possible moment (before GC).
Depending on the security manager, you may not be able to execute your code.

You are using reflection to access the "implementation details" of string object. Immutability is the feature of the public interface of an object.

Visibility modifiers and final (i.e. immutability) are not a measurement against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.
The second effect you see is that all Strings change while it looks like you only change s1. It is a certain property of Java String literals that they are automatically interned, i.e. cached. Two String literals with the same value will actually be the same object. When you create a String with new it will not be interned automatically and you will not see this effect.
#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable - unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected when a shorter substring created from it still existed.
As of current Java and your current version of the question there is no strange behaviour of #substring.

String immutability is from the interface perspective. You are using reflection to bypass the interface and directly modify the internals of the String instances.
s1 and s2 are both changed because they are both assigned to the same "intern" String instance. You can find out a bit more about that part from this article about string equality and interning. You might be surprised to find out that in your sample code, s1 == s2 returns true!

Which version of Java are you using? From Java 1.7.0_06, Oracle has changed the internal representation of String, especially the substring.
Quoting from Oracle Tunes Java's Internal String Representation:
In the new paradigm, the String offset and count fields have been removed, so substrings no longer share the underlying char [] value.
With this change, it may happen without reflection (???).

There are really two questions here:
Are strings really immutable?
Why is s3 not changed?
To point 1: Except for ROM there is no immutable memory in your computer. Nowadays even ROM is sometimes writable. There is always some code somewhere (whether it's the kernel or native code sidestepping your managed environment) that can write to your memory address. So, in "reality", no they are not absolutely immutable.
To point 2: This is because substring is probably allocating a new string instance, which is likely copying the array. It is possible to implement substring in such a way that it won't do a copy, but that doesn't mean it does. There are tradeoffs involved.
For example, should holding a reference to reallyLargeString.substring(reallyLargeString.length - 2) cause a large amount of memory to be held alive, or only a few bytes?
That depends on how substring is implemented. A deep copy will keep less memory alive, but it will run slightly slower. A shallow copy will keep more memory alive, but it will be faster. Using a deep copy can also reduce heap fragmentation, as the string object and its buffer can be allocated in one block, as opposed to 2 separate heap allocations.
In any case, it looks like your JVM chose to use deep copies for substring calls.

To add to the #haraldK's answer - this is a security hack which could lead to a serious impact in the app.
First thing is a modification to a constant string stored in a String Pool. When string is declared as a String s = "Hello World";, it's being places into a special object pool for further potential reusing. The issue is that compiler will place a reference to the modified version at compile time and once the user modifies the string stored in this pool at runtime, all references in code will point to the modified version. This would result into a following bug:
System.out.println("Hello World");
Will print:
Hello Java!
There was another issue I experienced when I was implementing a heavy computation over such risky strings. There was a bug which happened in like 1 out of 1000000 times during the computation which made the result undeterministic. I was able to find the problem by switching off the JIT - I was always getting the same result with JIT turned off. My guess is that the reason was this String security hack which broke some of the JIT optimization contracts.

According to the concept of pooling, all the String variables containing the same value will point to the same memory address. Therefore s1 and s2, both containing the same value of “Hello World”, will point towards the same memory location (say M1).
On the other hand, s3 contains “World”, hence it will point to a different memory allocation (say M2).
So now what's happening is that the value of S1 is being changed (by using the char [ ] value). So the value at the memory location M1 pointed both by s1 and s2 has been changed.
Hence as a result, memory location M1 has been modified which causes change in the value of s1 and s2.
But the value of location M2 remains unaltered, hence s3 contains the same original value.

The reason s3 does not actually change is because in Java when you do a substring the value character array for a substring is internally copied (using Arrays.copyOfRange()).
s1 and s2 are the same because in Java they both refer to the same interned string. It's by design in Java.

String is immutable, but through reflection you're allowed to change the String class. You've just redefined the String class as mutable in real-time. You could redefine methods to be public or private or static if you wanted.

Strings are created in permanent area of the JVM heap memory. So yes, it's really immutable and cannot be changed after being created.
Because in the JVM, there are three types of heap memory:
1. Young generation
2. Old generation
3. Permanent generation.
When any object are created, it goes into the young generation heap area and PermGen area reserved for String pooling.
Here is more detail you can go and grab more information from:
How Garbage Collection works in Java .

[Disclaimer this is a deliberately opinionated style of answer as I feel a more "don't do this at home kids" answer is warranted]
The sin is the line field.setAccessible(true); which says to violate the public api by allowing access to a private field. Thats a giant security hole which can be locked down by configuring a security manager.
The phenomenon in the question are implementation details which you would never see when not using that dangerous line of code to violate the access modifiers via reflection. Clearly two (normally) immutable strings can share the same char array. Whether a substring shares the same array depends on whether it can and whether the developer thought to share it. Normally these are invisible implementation details which you should not have to know unless you shoot the access modifier through the head with that line of code.
It is simply not a good idea to rely upon such details which cannot be experienced without violating the access modifiers using reflection. The owner of that class only supports the normal public API and is free to make implementation changes in the future.
Having said all that the line of code is really very useful when you have a gun held you your head forcing you to do such dangerous things. Using that back door is usually a code smell that you need to upgrade to better library code where you don't have to sin. Another common use of that dangerous line of code is to write a "voodoo framework" (orm, injection container, ...). Many folks get religious about such frameworks (both for and against them) so I will avoid inviting a flame war by saying nothing other than the vast majority of programmers don't have to go there.

String is immutable in nature Because there is no method to modify String object.
That is the reason They introduced StringBuilder and StringBuffer classes

This is a quick guide to everything
// Character array
char[] chr = {'O', 'K', '!'};
// this is String class
String str1 = new String(chr);
// this is concat
str1 = str1.concat("another string's ");
// this is format
System.out.println(String.format(str1 + " %s ", "string"));
// this is equals
System.out.println(str1.equals("another string"));
//this is split
for(String s: str1.split(" ")){
System.out.println(s);
}
// this is length
System.out.println(str1.length());
//gives an score of the total change in the length
System.out.println(str1.compareTo("OK!another string string's"));
// trim
System.out.println(str1.trim());
// intern
System.out.println(str1.intern());
// character at
System.out.println(str1.charAt(5));
// substring
System.out.println(str1.substring(5, 12));
// to uppercase
System.out.println(str1.toUpperCase());
// to lowerCase
System.out.println(str1.toLowerCase());
// replace
System.out.println(str1.replace("another", "hello"));
// output
// OK!another string's string
// false
// OK!another
// string's
// 20
// 7
// OK!another string's
// OK!another string's
// o
// other s
// OK!ANOTHER STRING'S
// ok!another string's
// OK!hello string's

The "Why" behind PMD's StringInstantiation rule

Along the lines of an existing thread, The “Why” behind PMD's rules, I'm trying to figure out the meaning of one particular PMD rule : String and StringBuffer Rules.StringInstantiation.
This rule states that you shouldn't explicitly instantiate String objects. As per their manual page :
Avoid instantiating String objects; this is usually unnecessary since
they are immutable and can be safely shared.
This rule is defined by the following Java
class:net.sourceforge.pmd.lang.java.rule.strings.StringInstantiationRule
Example(s):
private String bar = new String("bar"); // just do a String bar =
"bar";
http://pmd.sourceforge.net/pmd-5.0.1/rules/java/strings.html
I don't see how this syntax is a problem, other than it being pointless. Does it affect overwhole performance ?
Thanks for any thought.

With String foo = "foo" there will be on instance of "foo" in PermGen space (This is referred to as string interning). If you were to later type String bar = "foo" there would still only be one "foo" in the PermGen space.
Writing String foo = new String( "foo" ) will also create a String object to count against the heap.
Thus, the rule is there to prevent wasting memory.
Cheers,

It shouldn't usually affect performance in any measurable way, but:
private String bar = new String("bar"); // just do a String bar = "bar";
If you execute this line a million times you will have created a million objects
private String bar = "bar"; // just do a String bar = "bar";
If you execute this line a million times you will have created one Object.
There are scenarios where that actually makes a difference.

Does it affect overwhole performance ?
Well, performance and maintenance. Doing something which is pointless makes the reader wonder why the code is there in the first place. When that pointless operation also involves creating new objects (two in this case - a new char[] and a new String) that's another reason to avoid it...
In the past, there has been a reason to call new String(existingString) if the existing string was originally obtained as a small substring of a longer string - or other ways of obtaining a string backed by a large character array. I believe that this is not the case with more recent implementations of Java, but obviously you can still be using an old one. This shouldn't be a problem for constant strings anyway, mind you.
(You could argue that creating a new object allows you to synchronize on it. I would avoiding synchronizing on strings to start with though.)

One difference is the memory footprint:
String a = "abc"; //one object
String b = "abc"; //same object (i.e. a == b) => still one object in memory
String c = new String("abc"); // This is a new object - now 2 objects in memory
To be honest, the only reason I can think of, why one would use the String constructor is in combination with substring, which is a view on the original string. Using the String constructor in that case helps getting rid of the original string if it is not needed any longer.
However, since java 7u6, this is not the case any more so I don't see any reasons to use it any more.

It can be useful, because it creates a new identity, and sometimes object identities are important/crucial to an application. For example, it can be used as an internal sentinel value. There are other valid use cases too, e.g. to avoid constant expression.
If a beginner writes such code, it's very likely a mistake. But that is a very short learning period. It is highly unlikely that any moderately experienced Java programmer would write that by mistake; it must be for a specific purpose. File it under "it looks like a stupid mistake, but it takes efforts to make, so it's probably intended".

It is
pointless
confusing
slightly slower
You should try to write the simplest, clearest code you can. Adding pointless code is bad all round.

What will use more memory

I am working on improving the performance of my app. I am confused about which of the following will use more memory: Here sb is StringBuffer
String strWithLink = sb.toString();
clickHereTextview.setText(
Html.fromHtml(strWithLink.substring(0,strWithLink.indexOf("+"))));
OR
clickHereTextview.setText(
Html.fromHtml(sb.toString().substring(0,sb.toString().indexOf("+"))));

In terms of memory an expression such as
sb.toString().indexOf("+")
has little to no impact as the string will be garbage collected right after evaluation. (To avoid even the temporary memory usage, I would recommend doing
sb.indexOf("+")
instead though.)
However, there's a potential leak involved when you use String.substring. Last time I checked the the substring basically returns a view of the original string, so the original string is still resident in memory.
The workaround is to do
String strWithLink = sb.toString();
... new String(strWithLink.substring(0,strWithLink.indexOf("+"))) ...
^^^^^^^^^^
to detach the wanted string, from the original (potentially large) string. Same applies for String.split as discussed over here:
Java String.split memory leak?

The second will use more memory, because each call to StringBuilder#toString() creates a new String instance.
http://www.docjar.com/html/api/java/lang/StringBuilder.java.html

Analysis
If we look at StringBuilder's OpenJDK sources:
public String toString() {
// Create a copy, don't share the array
return new String(value, 0, count);
}
We see, that it instantiates a whole new String object. It places in the string pool as many new instances as many times you call sb.toString().
Outcome
Use String strWithLink = sb.toString();, reusing it will retrieve the same instance of String from the pool, rather the new one.

Check other people's answers, the second one does take a little bit more memory, but this sounds like you are over optimizing. Keeping your code clear and readable should be the priority. I'd suggest you don't worry so much about such tiny optimizations if readability will suffer.

The less work you do, the more efficient it usually is. In this case, you don't need to call toString at all
clickHereTextview.setText(Html.fromHtml(sb.substring(0, sb.indexOf("+"))));

Creating new objects always take up more memory. However, in your case difference seems insignificant.
Also, in your case, you are creating a local variable which takes heap space.
Whenever there are references in more than one location in your method it good to use
String strWithLink = sb.toString();, as you can use the same strWithLink everywhere . Otherwise, if there is only one reference, its always better to just use sb.toString(); directly.

Java: Different between two ways when using new Object

For example, you want to reverse a string, will there two ways:
first:
String a = "StackOverFlow";
a = new StringBuffer(a).reverse().toString();
and second is:
String a = "StackOverFlow";
StringBuffer b = new StringBuffer(a);
a = b.reverse().toString();
at above code, I have two question:
1) in first code, does java create a "dummy object" StringBuffer in memory before do reverse and change to String.
2) at above code, does first will more optimize than second because It makes GC works more effectively ? (this is a main question I want to ask)

Both snippets will create the same number of objects. The only difference is the number of local variables. This probably won't even change how many values are on the stack etc - it's just that in the case of the second version, there's a name for one of the stack slots (b).
It's very important that you differentiate between objects and variables. It's also important to write the most readable code you can first, rather than trying to micro-optimize. Once you've got clear, working code you should measure to see whether it's fast enough to meet your requirements. If it isn't, you should profile it to work out where you can make changes most effectively, and optimize that section, then remeasure, etc.

The first way will create a very real, not at all a "dummy object" for the StringBuffer.
Unless there are other references to b below the last line of your code, the optimizer has enough information to let the environment garbage-collect b as soon as it's done with toString
The fact that there is no variable for b does not make the object created by new less real. The compiler will probably optimize both snippets into identical bytecode, too.

StringBuffer b is not a dummy object, is a reference; basically just a pointer, that resides in the stack and is very small memory-wise. So not only it makes no difference in performance (GC has nothing to do with this example), but the Java compiler will probably remove it altogether (unless it's used in other places in the code).

In answer to your first question, yes, Java will create a StringBuffer object. It works pretty much the way you think it does.
To your second question, I'm pretty sure that the Java compiler will take care of that for you. The compiler is not without its faults but I think in a simple example like this it will optimize the byte code.
Just a tip though, in Java Strings are immutable. This means they cannot be changed. So when you assign a new value to a String Java will carve out a piece of memory, put the new String value in it, and redirect the variable to the new memory space. After that the garbage collector should come by and clear out the old string.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.