The "Why" behind PMD's StringInstantiation rule - java

Along the lines of an existing thread, The “Why” behind PMD's rules, I'm trying to figure out the meaning of one particular PMD rule : String and StringBuffer Rules.StringInstantiation.
This rule states that you shouldn't explicitly instantiate String objects. As per their manual page :
Avoid instantiating String objects; this is usually unnecessary since
they are immutable and can be safely shared.
This rule is defined by the following Java
class:net.sourceforge.pmd.lang.java.rule.strings.StringInstantiationRule
Example(s):
private String bar = new String("bar"); // just do a String bar =
"bar";
http://pmd.sourceforge.net/pmd-5.0.1/rules/java/strings.html
I don't see how this syntax is a problem, other than it being pointless. Does it affect overwhole performance ?
Thanks for any thought.

With String foo = "foo" there will be on instance of "foo" in PermGen space (This is referred to as string interning). If you were to later type String bar = "foo" there would still only be one "foo" in the PermGen space.
Writing String foo = new String( "foo" ) will also create a String object to count against the heap.
Thus, the rule is there to prevent wasting memory.
Cheers,

It shouldn't usually affect performance in any measurable way, but:
private String bar = new String("bar"); // just do a String bar = "bar";
If you execute this line a million times you will have created a million objects
private String bar = "bar"; // just do a String bar = "bar";
If you execute this line a million times you will have created one Object.
There are scenarios where that actually makes a difference.

Does it affect overwhole performance ?
Well, performance and maintenance. Doing something which is pointless makes the reader wonder why the code is there in the first place. When that pointless operation also involves creating new objects (two in this case - a new char[] and a new String) that's another reason to avoid it...
In the past, there has been a reason to call new String(existingString) if the existing string was originally obtained as a small substring of a longer string - or other ways of obtaining a string backed by a large character array. I believe that this is not the case with more recent implementations of Java, but obviously you can still be using an old one. This shouldn't be a problem for constant strings anyway, mind you.
(You could argue that creating a new object allows you to synchronize on it. I would avoiding synchronizing on strings to start with though.)

One difference is the memory footprint:
String a = "abc"; //one object
String b = "abc"; //same object (i.e. a == b) => still one object in memory
String c = new String("abc"); // This is a new object - now 2 objects in memory
To be honest, the only reason I can think of, why one would use the String constructor is in combination with substring, which is a view on the original string. Using the String constructor in that case helps getting rid of the original string if it is not needed any longer.
However, since java 7u6, this is not the case any more so I don't see any reasons to use it any more.

It can be useful, because it creates a new identity, and sometimes object identities are important/crucial to an application. For example, it can be used as an internal sentinel value. There are other valid use cases too, e.g. to avoid constant expression.
If a beginner writes such code, it's very likely a mistake. But that is a very short learning period. It is highly unlikely that any moderately experienced Java programmer would write that by mistake; it must be for a specific purpose. File it under "it looks like a stupid mistake, but it takes efforts to make, so it's probably intended".

It is
pointless
confusing
slightly slower
You should try to write the simplest, clearest code you can. Adding pointless code is bad all round.

Related

Why String Deduplication when we have String Pool

String De-duplication:
Strings consume a lot of memory in any application.Whenever the garbage collector visits String objects it takes note of the char arrays. It takes their hash value and stores it alongside with a weak reference to the array. As soon as it finds another String which has the same hash code it compares them char by char.If they match as well, one String will be modified and point to the char array of the second String. The first char array then is no longer referenced anymore and can be garbage collected.
String Pool:
All strings used by the java program are stored here. If two variables are initialized to the same string value. Two strings are not created in the memory, there will be only one copy stored in memory and both will point to the same memory location.
So java already takes care of not creating duplicate strings in the heap by checking if the string exists in the string pool. Then what is the purpose of string de-duplication?
If there is a code as follows
String myString_1 = new String("Hello World");
String myString_2 = new String("Hello World");
two strings are created in memory even though they are same. I cannot think of any scenario other than this where string de-duplication is useful. Obviously I must be missing something. What I am I missing?
Thanks In Advance
The string pool applies only to strings added to it explicitly, or used as constants in the application. It does not apply to strings created dynamically during the lifetime of the application. String deduplication, however, applies to all strings.
Compile time vs run time
String pool refers to string constants that are known at compile time.
String deduplication would help you if you happen to retrieve (or construct) the same string a million times at run time, e.g. reading it from a file, a HTTP request or any other way.
String de-duplication enjoys the extra level of indirection built into String:
With a string pool, you are limited to returning the same object for two identical strings
String de-duplication lets you have multiple distinct String objects sharing the same content.
This translates into removing a limitation of de-duplicating on creation: your application could keep creating new String objects with identical content while using very little extra memory, because the content of the strings would be shared. This process can be done on a completely unrelated schedule - for example, in the background, while your application does not need much of the CPU resources. Since the identity of the String object does not change, de-duplication can be completely hidden form your application.
Just to add to the answers above, on older VM's the string pool is not garbage collected (this has changed now, but don't rely on that). It contains strings which are used as constants in the application, and so will always be needed. If you continually put all your strings in the string pool, you might quickly run out of memory. On top of that, de-duplication is a relatively expensive process, if you know you only need the string for a very short period of time, and you have enough memory.
For these reasons, strings are not put in the string pool automatically. You have to do it explicitly by calling string.intern().
From documentation :
"Initializes a newly created String object so that it represents
the same sequence of characters as the argument; in other words, the
newly created string is a copy of the argument string. Unless an
explicit copy of original is needed, use of this constructor is
unnecessary since Strings are immutable."
So my sense says, this constructor in String class is not needed normally like you have used above. I guess that constructor is provided merely for the sake of completeness or if you do not want to share that copy (kind of unnecessary now, refer here what I am talking about) but still other constructors are useful like getting an String object from char array and so on..
I cannot think of any scenario other than this where string de-duplication is useful.
Well one other (much more) frequent scenario is the use of StringBuilders. In the toString() method of the StringBuilder class, it clearly creates a new instance in memory:
public final class StringBuilder extends AbstractStringBuilder
implements java.io.Serializable, CharSequence
{
...
#Override
public String toString() {
// Create a copy, don't share the array
return new String(value, 0, count);
}
...
}
Same thing for its thread-safe version StringBuffer:
public final class StringBuffer extends AbstractStringBuilder
implements java.io.Serializable, CharSequence
{
...
#Override
public synchronized String toString() {
if (toStringCache == null) {
toStringCache = Arrays.copyOfRange(value, 0, count);
}
return new String(toStringCache, true);
}
...
}
In applications that rely heavily on this, string de-duplication may reduce memory usage.

Is "new String()" immutable as well?

I've been studying Java String for a while. The following questions are based on the below posts
Java String is special
Immutability of String in java
Immutability:
Now, going by the immutability, the String class has been designed so that the values in the common pool can be reused in other places/variables. This holds good if the String was created as
String a = "Hello World!";
However, if I create String like
String b = new String("Hello World!");
why is this immutable as well? (or is it?). Since this has a dedicated heap memory, I should be able to modify this without affecting any other variable. So by design, was there any other reason why String as a whole is considered immutable? Or is my above assumption wrong?
Second thing I wanted to ask was about the common string pool. If I create a string object as
String c = "";
is an empty entry created in the pool?
Is there any post already on these? If so, could someone share the link?
new String() is an expression that produces a String ... and a String is immutable, no matter how it is produced.
(Asking if new String() is mutable or not is nonsensical. It is program code, not a value. But I take it that that is not what you really meant.)
If I create a string object as String c = ""; is an empty entry created in the pool?
Yes; that is, an entry is created for the empty string. There is nothing special about an empty String.
(To be pedantic, the pool entry for "" gets created long before your code is executed. In fact, it is created when your code is loaded ... or possibly even earlier than that.)
So, I was wanted to know whether the new heap object is immutable as well, ...
Yes it is. But the immutability is a fundamental property of String objects. All String objects.
You see, the String API simply does not provide any methods for changing a String. So (apart from some dangerous and foolish1 tricks using reflection), you can't mutate a String.
and if so what was the purpose?.
The primary reason that Java String is designed as an immutable class is simplicity. It makes it easier to write correct programs, and read / reason about other people's code if the core string class provides an immutable interface.
An important second reason is that the immutability of String has fundamental implications for the Java security model. But I don't think this was a driver in the original language design ... in Java 1.0 and earlier.
Going by the answer, I gather that other references to the same variable is one of the reasons. Please let me know if I am right in understanding this.
No. It is more fundamental than that. Simply, all String objects are immutable. There is no complicated special case reasoning required to understand this. It just >>is<<.
For the record, if you want a mutable "string-like" object in Java, you can use StringBuilder or StringBuffer. But these are different types to String.
1 - The reason these tricks are (IMO) dangerous and foolish is that they affect the values of strings that are potentially shared by other parts of your application via the string pool. This can cause chaos ... in ways that the next guy maintaining your code has little chance of tracking down.
String is immutable irrespective of how it is instantiated
1) Short answer is yes, new String() is immutable too.
Because every possible mutable operation (like replace,toLowerCase etcetra) that you perform on String does not affect the original String instance and returns you a new instance.
You may check this in Javadoc for String. Each public method of String that is exposed returns a new String instance and does not alter the present instance on which you called the method.
This is very helpful in Multi-threaded environment as you don't have to think about mutability (someone will change the value) every time you pass or share the String around. String can easily be the most used data type, so the designers have blessed us all to not think about mutability everytime and saved us a lot of pain.
Immutability allowed String pool or caching
It is because of immutability property that the internal pool of string was possible, as when same String value is required at some other place then that immutable reference is returned. If String would have been mutable then it would not have been possible to share Strings like this to save memory.
String immutablity was not because of pooling, but immutability has more benefits attached to it.
String interning or pooling is an example of Flyweight Design pattern
2) Yes it will be interned like any other String as a blank String is also as much a String as other String instances.
References:
Immutability benefits of String
The Java libraries are heavily optimized around the constraint that any String object is immutable, regardless of how that object is constructed. Even if you create your b using new, other code that you pass that instance to will treat the value as immutable. This is an example of the Value Object pattern, and all of the advantages (thread-safety, no need to make private copies) apply.
The empty string "" is a legitimate String object just like anything else, it just happens to have no internal contents, and since all compile-time constant strings are interned, I'll virtually guarantee that some runtime library has already caused it to be added to the pool.
1) The immutable part isn't because of the pool; it just makes the pool possible in the first place. Strings are often passed as arguments to other functions or even shared with other threads; Making Strings immutable was a design decision to make reasoning in such situations easier. So yes - Strings in Java are always immutable, no matter how you create them (note that it's possible to have mutable strings in java - just not with the String class).
2) Yes. Probably. I'm not actually 100% sure, but that should be the case.
This isn't strictly an answer to your question, but if behind your question is a wish to have mutable strings that you can manipulate, you should check out the StringBuilder class, which implements many of the exact same methods that String has but also adds methods to change the current contents.
Once you've built your string in such a way that you're content with it, you simply call toString() on it in order to convert it to an ordinary String that you can pass to library routines and other functions that only take Strings.
Also, both StringBuilder and String implements the CharSequence interface, so if you want to write functions in your own code that can use both mutable and immutable strings, you can declare them to take any CharSequence object.
From the Java Oracle documentation:
Strings are constant; their values cannot be changed after they are
created.
And again:
String buffers support mutable strings. Because String
objects are immutable they can be shared.
Generally speaking: "all primitive" (or related) object are immutable (please, accept my lack of formalism).
Related post on Stack Overflow:
Immutability of Strings in Java
Is Java String immutable?
Is Java "pass-by-reference" or "pass-by-value"?
Is a Java string really immutable?
String is immutable. What exactly is the meaning?
what the reason behind making string immutable in java?
About the object pool: object pool is a java optimization which is NOT related to immutable as well.
String is immutable means that you cannot change the object itself no matter how you created it.And as for the second question: yes it will create an entry.
It's the other way around, actually.
[...] the String class has been designed so that the values in the common pool can be reused in other places/variables.
No, the String class is immutable so that you can safely reference an instance of it, without worrying about it being modified from another part of your program. This is why pooling is possible in the first place.
So, consider this:
// this string literal is interned and referenced by 'a'
String a = "Hello World!";
// creates a new instance by copying characters from 'a'
String b = new String(a);
Now, what happens if you simply create a reference to your newly created b variable?
// 'c' now points to the same instance as 'b'
String c = b;
Imagine that you pass c (or, more specifically, the object it is referencing) to a method on a different thread, and continue working with the same instance on your main thread. And now imagine what would happen if strings were mutable.
Why is this the case anyway?
If nothing else, it's because immutable objects make multithreading much simpler, and usually even faster. If you share a mutable object (that would be any stateful object, with mutable private/public fields or properties) between different threads, you need to take special care to ensure synchronized access (mutexes, semaphores). Even with this, you need special care to ensure atomicity in all your operations. Multithreading is hard.
Regarding the performance implications, note that quite often copying the entire string into a new instance, in order to change even a single character, is actually faster than inducing an expensive context switch due to synchronization constructs needed to ensure thread-safe access. And as you've mentioned, immutability also offers interning possibilities, which means it can actually help reduce memory usage.
It's generally a pretty good idea to make as much stuff immutable as your can.
1) Immutability: String will be immutable if you create it using new or other way for security reasons
2) Yes there will be a empty entry in the string pool.
You can better understand the concept using the code
String s1 = new String("Test");
String s2 = new String("Test");
String s3 = "Test";
String s4 = "Test";
System.out.println(s1==s2);//false
System.out.println(s1==s3);//false
System.out.println(s2==s3);//false
System.out.println(s4==s3);//true
Hope this will help your query. You can always check the source code for String class in case of better understanding Link : http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java
1-String is immutable.See this:
Is a Java string really immutable?
So you can create it with many way.
2-Short answer: Yes will be empty.
Strings created will always be immutable regardless of how they are created.
Answers to your questions:
The only difference is:
When string is created like -- {String a = "Hello World!";} then only one object gets created.
And when it is created like -- {String b = new String("Hello World!");} then two objects get created. The first one, because you have used 'new' keyword and the second one because of the String property.
Yes, for sure. There will be an empty entry created in the pool.
String is immutable because it does not provide you with a mean to modify it. It is design to avoid any tampering (it is final, the underlying array is not supposed to be touched ...).
Identically, Integer is immutable, because there is no way to modify it.
It does not matter how you create it.
Immutability is not a feature of new, it is a feature of the class String. It has no mutator methods, so it immutable.
String A = "Test"
String B = "Test"
Now String B called"Test".toUpperCase()which change the same object into"TEST", soAwill also be"TEST"` which is not desirable.
Note that in your example, the reference is changed and not the object it refers to, i.e. b as a reference may be changed and refer to a new object. But that new object is immutable, which means its contents will not be changed "after" calling the constructor.
You can change a string using b=b+"x"; or b=new String(b);, and the content of variable a seem to change, but don't confuse immutability of a reference (here variable b) and the object it is referring to (think of pointers in C). The object that the reference is pointing to will remain unchanged after its creation.
If you need to change a string by changing the contents of the object (instead of changing the reference), you can use StringBuffer, which is a mutable version of String.

Is a Java string really immutable?

We all know that String is immutable in Java, but check the following code:
String s1 = "Hello World";
String s2 = "Hello World";
String s3 = s1.substring(6);
System.out.println(s1); // Hello World
System.out.println(s2); // Hello World
System.out.println(s3); // World
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] value = (char[])field.get(s1);
value[6] = 'J';
value[7] = 'a';
value[8] = 'v';
value[9] = 'a';
value[10] = '!';
System.out.println(s1); // Hello Java!
System.out.println(s2); // Hello Java!
System.out.println(s3); // World
Why does this program operate like this? And why is the value of s1 and s2 changed, but not s3?
String is immutable* but this only means you cannot change it using its public API.
What you are doing here is circumventing the normal API, using reflection. The same way, you can change the values of enums, change the lookup table used in Integer autoboxing etc.
Now, the reason s1 and s2 change value, is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).
The reason s3 does not was actually a bit surprising to me, as I thought it would share the value array (it did in earlier version of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
You can install a SecurityManager, to avoid malicious code to do such things. But keep in mind that some libraries depend on using these kind of reflection tricks (typically ORM tools, AOP libraries etc).
*) I initially wrote that Strings aren't really immutable, just "effective immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
As this topic seems overwhelmingly popular, here's some suggested further reading: Heinz Kabutz's Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection... well... madness.
It covers why this is sometimes useful. And why, most of the time, you should avoid it. :-)
In Java, if two string primitive variables are initialized to the same literal, it assigns the same reference to both variables:
String Test1="Hello World";
String Test2="Hello World";
System.out.println(test1==test2); // true
That is the reason the comparison returns true. The third string is created using substring() which makes a new string instead of pointing to the same.
When you access a string using reflection, you get the actual pointer:
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
So change to this will change the string holding a pointer to it, but as s3 is created with a new string due to substring() it would not change.
You are using reflection to circumvent the immutability of String - it's a form of "attack".
There are lots of examples you can create like this (eg you can even instantiate a Void object too), but it doesn't mean that String is not "immutable".
There are use cases where this type of code may be used to your advantage and be "good coding", such as clearing passwords from memory at the earliest possible moment (before GC).
Depending on the security manager, you may not be able to execute your code.
You are using reflection to access the "implementation details" of string object. Immutability is the feature of the public interface of an object.
Visibility modifiers and final (i.e. immutability) are not a measurement against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.
The second effect you see is that all Strings change while it looks like you only change s1. It is a certain property of Java String literals that they are automatically interned, i.e. cached. Two String literals with the same value will actually be the same object. When you create a String with new it will not be interned automatically and you will not see this effect.
#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable - unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected when a shorter substring created from it still existed.
As of current Java and your current version of the question there is no strange behaviour of #substring.
String immutability is from the interface perspective. You are using reflection to bypass the interface and directly modify the internals of the String instances.
s1 and s2 are both changed because they are both assigned to the same "intern" String instance. You can find out a bit more about that part from this article about string equality and interning. You might be surprised to find out that in your sample code, s1 == s2 returns true!
Which version of Java are you using? From Java 1.7.0_06, Oracle has changed the internal representation of String, especially the substring.
Quoting from Oracle Tunes Java's Internal String Representation:
In the new paradigm, the String offset and count fields have been removed, so substrings no longer share the underlying char [] value.
With this change, it may happen without reflection (???).
There are really two questions here:
Are strings really immutable?
Why is s3 not changed?
To point 1: Except for ROM there is no immutable memory in your computer. Nowadays even ROM is sometimes writable. There is always some code somewhere (whether it's the kernel or native code sidestepping your managed environment) that can write to your memory address. So, in "reality", no they are not absolutely immutable.
To point 2: This is because substring is probably allocating a new string instance, which is likely copying the array. It is possible to implement substring in such a way that it won't do a copy, but that doesn't mean it does. There are tradeoffs involved.
For example, should holding a reference to reallyLargeString.substring(reallyLargeString.length - 2) cause a large amount of memory to be held alive, or only a few bytes?
That depends on how substring is implemented. A deep copy will keep less memory alive, but it will run slightly slower. A shallow copy will keep more memory alive, but it will be faster. Using a deep copy can also reduce heap fragmentation, as the string object and its buffer can be allocated in one block, as opposed to 2 separate heap allocations.
In any case, it looks like your JVM chose to use deep copies for substring calls.
To add to the #haraldK's answer - this is a security hack which could lead to a serious impact in the app.
First thing is a modification to a constant string stored in a String Pool. When string is declared as a String s = "Hello World";, it's being places into a special object pool for further potential reusing. The issue is that compiler will place a reference to the modified version at compile time and once the user modifies the string stored in this pool at runtime, all references in code will point to the modified version. This would result into a following bug:
System.out.println("Hello World");
Will print:
Hello Java!
There was another issue I experienced when I was implementing a heavy computation over such risky strings. There was a bug which happened in like 1 out of 1000000 times during the computation which made the result undeterministic. I was able to find the problem by switching off the JIT - I was always getting the same result with JIT turned off. My guess is that the reason was this String security hack which broke some of the JIT optimization contracts.
According to the concept of pooling, all the String variables containing the same value will point to the same memory address. Therefore s1 and s2, both containing the same value of “Hello World”, will point towards the same memory location (say M1).
On the other hand, s3 contains “World”, hence it will point to a different memory allocation (say M2).
So now what's happening is that the value of S1 is being changed (by using the char [ ] value). So the value at the memory location M1 pointed both by s1 and s2 has been changed.
Hence as a result, memory location M1 has been modified which causes change in the value of s1 and s2.
But the value of location M2 remains unaltered, hence s3 contains the same original value.
The reason s3 does not actually change is because in Java when you do a substring the value character array for a substring is internally copied (using Arrays.copyOfRange()).
s1 and s2 are the same because in Java they both refer to the same interned string. It's by design in Java.
String is immutable, but through reflection you're allowed to change the String class. You've just redefined the String class as mutable in real-time. You could redefine methods to be public or private or static if you wanted.
Strings are created in permanent area of the JVM heap memory. So yes, it's really immutable and cannot be changed after being created.
Because in the JVM, there are three types of heap memory:
1. Young generation
2. Old generation
3. Permanent generation.
When any object are created, it goes into the young generation heap area and PermGen area reserved for String pooling.
Here is more detail you can go and grab more information from:
How Garbage Collection works in Java .
[Disclaimer this is a deliberately opinionated style of answer as I feel a more "don't do this at home kids" answer is warranted]
The sin is the line field.setAccessible(true); which says to violate the public api by allowing access to a private field. Thats a giant security hole which can be locked down by configuring a security manager.
The phenomenon in the question are implementation details which you would never see when not using that dangerous line of code to violate the access modifiers via reflection. Clearly two (normally) immutable strings can share the same char array. Whether a substring shares the same array depends on whether it can and whether the developer thought to share it. Normally these are invisible implementation details which you should not have to know unless you shoot the access modifier through the head with that line of code.
It is simply not a good idea to rely upon such details which cannot be experienced without violating the access modifiers using reflection. The owner of that class only supports the normal public API and is free to make implementation changes in the future.
Having said all that the line of code is really very useful when you have a gun held you your head forcing you to do such dangerous things. Using that back door is usually a code smell that you need to upgrade to better library code where you don't have to sin. Another common use of that dangerous line of code is to write a "voodoo framework" (orm, injection container, ...). Many folks get religious about such frameworks (both for and against them) so I will avoid inviting a flame war by saying nothing other than the vast majority of programmers don't have to go there.
String is immutable in nature Because there is no method to modify String object.
That is the reason They introduced StringBuilder and StringBuffer classes
This is a quick guide to everything
// Character array
char[] chr = {'O', 'K', '!'};
// this is String class
String str1 = new String(chr);
// this is concat
str1 = str1.concat("another string's ");
// this is format
System.out.println(String.format(str1 + " %s ", "string"));
// this is equals
System.out.println(str1.equals("another string"));
//this is split
for(String s: str1.split(" ")){
System.out.println(s);
}
// this is length
System.out.println(str1.length());
//gives an score of the total change in the length
System.out.println(str1.compareTo("OK!another string string's"));
// trim
System.out.println(str1.trim());
// intern
System.out.println(str1.intern());
// character at
System.out.println(str1.charAt(5));
// substring
System.out.println(str1.substring(5, 12));
// to uppercase
System.out.println(str1.toUpperCase());
// to lowerCase
System.out.println(str1.toLowerCase());
// replace
System.out.println(str1.replace("another", "hello"));
// output
// OK!another string's string
// false
// OK!another
// string's
// 20
// 7
// OK!another string's
// OK!another string's
// o
// other s
// OK!ANOTHER STRING'S
// ok!another string's
// OK!hello string's

What will use more memory

I am working on improving the performance of my app. I am confused about which of the following will use more memory: Here sb is StringBuffer
String strWithLink = sb.toString();
clickHereTextview.setText(
Html.fromHtml(strWithLink.substring(0,strWithLink.indexOf("+"))));
OR
clickHereTextview.setText(
Html.fromHtml(sb.toString().substring(0,sb.toString().indexOf("+"))));
In terms of memory an expression such as
sb.toString().indexOf("+")
has little to no impact as the string will be garbage collected right after evaluation. (To avoid even the temporary memory usage, I would recommend doing
sb.indexOf("+")
instead though.)
However, there's a potential leak involved when you use String.substring. Last time I checked the the substring basically returns a view of the original string, so the original string is still resident in memory.
The workaround is to do
String strWithLink = sb.toString();
... new String(strWithLink.substring(0,strWithLink.indexOf("+"))) ...
^^^^^^^^^^
to detach the wanted string, from the original (potentially large) string. Same applies for String.split as discussed over here:
Java String.split memory leak?
The second will use more memory, because each call to StringBuilder#toString() creates a new String instance.
http://www.docjar.com/html/api/java/lang/StringBuilder.java.html
Analysis
If we look at StringBuilder's OpenJDK sources:
public String toString() {
// Create a copy, don't share the array
return new String(value, 0, count);
}
We see, that it instantiates a whole new String object. It places in the string pool as many new instances as many times you call sb.toString().
Outcome
Use String strWithLink = sb.toString();, reusing it will retrieve the same instance of String from the pool, rather the new one.
Check other people's answers, the second one does take a little bit more memory, but this sounds like you are over optimizing. Keeping your code clear and readable should be the priority. I'd suggest you don't worry so much about such tiny optimizations if readability will suffer.
The less work you do, the more efficient it usually is. In this case, you don't need to call toString at all
clickHereTextview.setText(Html.fromHtml(sb.substring(0, sb.indexOf("+"))));
Creating new objects always take up more memory. However, in your case difference seems insignificant.
Also, in your case, you are creating a local variable which takes heap space.
Whenever there are references in more than one location in your method it good to use
String strWithLink = sb.toString();, as you can use the same strWithLink everywhere . Otherwise, if there is only one reference, its always better to just use sb.toString(); directly.

Java: Different between two ways when using new Object

For example, you want to reverse a string, will there two ways:
first:
String a = "StackOverFlow";
a = new StringBuffer(a).reverse().toString();
and second is:
String a = "StackOverFlow";
StringBuffer b = new StringBuffer(a);
a = b.reverse().toString();
at above code, I have two question:
1) in first code, does java create a "dummy object" StringBuffer in memory before do reverse and change to String.
2) at above code, does first will more optimize than second because It makes GC works more effectively ? (this is a main question I want to ask)
Both snippets will create the same number of objects. The only difference is the number of local variables. This probably won't even change how many values are on the stack etc - it's just that in the case of the second version, there's a name for one of the stack slots (b).
It's very important that you differentiate between objects and variables. It's also important to write the most readable code you can first, rather than trying to micro-optimize. Once you've got clear, working code you should measure to see whether it's fast enough to meet your requirements. If it isn't, you should profile it to work out where you can make changes most effectively, and optimize that section, then remeasure, etc.
The first way will create a very real, not at all a "dummy object" for the StringBuffer.
Unless there are other references to b below the last line of your code, the optimizer has enough information to let the environment garbage-collect b as soon as it's done with toString
The fact that there is no variable for b does not make the object created by new less real. The compiler will probably optimize both snippets into identical bytecode, too.
StringBuffer b is not a dummy object, is a reference; basically just a pointer, that resides in the stack and is very small memory-wise. So not only it makes no difference in performance (GC has nothing to do with this example), but the Java compiler will probably remove it altogether (unless it's used in other places in the code).
In answer to your first question, yes, Java will create a StringBuffer object. It works pretty much the way you think it does.
To your second question, I'm pretty sure that the Java compiler will take care of that for you. The compiler is not without its faults but I think in a simple example like this it will optimize the byte code.
Just a tip though, in Java Strings are immutable. This means they cannot be changed. So when you assign a new value to a String Java will carve out a piece of memory, put the new String value in it, and redirect the variable to the new memory space. After that the garbage collector should come by and clear out the old string.

Categories