java performance : string literal - java

I would like to know how string literals relate to Java performance.
For example, does using the statements below many times have any impact on performance? I have thousands of classes, and we use statements like these in many places:
1) buffer.append(",");
2) buffer.append("}");
3) String.append("10,000 times...same lines") // printing same lines in many classes
4) String.someStringMethod("same line many times") // using any String method
Does this have a performance impact in terms of memory management, etc.? Is there a cleaner way?
Thanks

It is really difficult to comment on examples that make no sense. However:
In general there are no particular efficiency concerns with Java String literals.
In general there are no particular efficiency concerns with methods that take String literals as arguments.
String concatenation / building can present efficiency concerns if a particular piece of code is executed often enough. However, if you need to build strings, then there is not a lot you can do about it.
There are one or two things that it is worth taking steps to avoid. The main one is this:
String s = "";
for (/* lots of times */) {
    // do stuff
    s += someOtherString;
}
The problem is that this generates and then discards lots of temporary Strings. The more efficient way to do it is this:
StringBuilder sb = new StringBuilder();
for (/* lots of times */) {
    // do stuff
    sb.append(someOtherString);
}
String s = sb.toString();
However, it is probably only worth while optimizing this kind of thing if the profiler tells you that this particular bit of code is a bottleneck.
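As a rough illustration of the two loops above, here is a runnable sketch that builds the same string both ways. The class name ConcatDemo and its helper methods are made up for this example. It only shows that the two approaches produce the same result; it does not measure anything, so a profiler is still needed to decide whether the change matters in a given program.

```java
import java.util.Arrays;
import java.util.List;

class ConcatDemo {
    // Naive version: each += allocates a new String and copies the old contents.
    static String joinWithPlus(List<String> parts) {
        String s = "";
        for (String part : parts) {
            s += part;
        }
        return s;
    }

    // StringBuilder version: appends into one growable buffer.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", ",", "b", "}");
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```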

Any code you write affects performance, so it's always better to invoke one append() with ",}" instead of 2 appends.
There's no method append in java.lang.String class.
None of the String methods make changes to the String object. Instead, methods like String.substring(), String.concat(), String.replace() create new String objects. This means performance is affected more significantly than if you use StringBuffer.
So generally StringBuffer methods are faster than those of String. However, a newer class called StringBuilder was introduced in Java 5. There's only one difference from StringBuffer - it's not thread-safe. In real-world cases thread management is usually taken care of by higher-level containers, which makes it unnecessary to ensure thread safety in each class. In those cases you're advised to use StringBuilder. That should be the fastest.
To further improve the performance of StringBuilder, you need an estimate of the resulting string length so that you can allocate a StringBuilder of a suitable size. If it's too big you'll waste some memory, but that's usually a minor problem. If it's too small, though, StringBuilder will have to allocate a larger internal character array and copy the existing characters into it to make space. That makes that particular append() invocation slower, and the discarded array becomes extra work for the garbage collector.
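A minimal sketch of presizing, assuming you can estimate the final length in advance. The class PresizedBuilderDemo and the method joinNumbers are hypothetical names used only for illustration. Passing too small an estimate is still correct, because the builder grows as needed; it just costs extra array copies.

```java
class PresizedBuilderDemo {
    // Builds "0,1,2,...,n-1" using a capacity estimate supplied by the caller.
    static String joinNumbers(int n, int estimatedLength) {
        StringBuilder sb = new StringBuilder(estimatedLength);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 10 single digits plus 9 commas is 19 characters.
        System.out.println(joinNumbers(10, 19));
    }
}
```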
Particular methods of String class may be better or faster than those in StringBuffer/StringBuilder, but you have to be more specific with your questions for me to answer that.

Related

Java software design - Looping, object creation VS modifying variables. Memory, performance & reliability comparison

Let's say we are trying to build a document scanner class in java that takes 1 input argument, the log path(eg. C:\document\text1.txt). Which of the following implementations would you prefer based on performance/memory/modularity?
ArrayList<String> fileListArray = new ArrayList<String>();
fileListArray.add("C:\\document\\text1.txt");
fileListArray.add("C:\\document\\text2.txt");
.
.
.
//Implementation A
for (int i = 0, j = fileListArray.size(); i < j; i++) {
    MyDocumentScanner ds = new MyDocumentScanner(fileListArray.get(i));
    ds.scanDocument();
    ds.resultOutput();
}
//Implementation B
MyDocumentScanner ds = new MyDocumentScanner();
for (int i = 0, j = fileListArray.size(); i < j; i++) {
    ds.setDocPath(fileListArray.get(i));
    ds.scanDocument();
    ds.resultOutput();
}
Personally I would prefer A due to its encapsulation, but it seems like more memory usage due to creation of multiple instances. I'm curious if there is an answer to this, or it is another "that depends on the situation/circumstances" dilemma?
Although this is obviously opinion-based, I will try an answer to tell my opinion.
Your approach A is far better. Your document scanner obviously handles a file. That file should be set at construction time and saved in an instance field, so every method can refer to this field. Moreover, the constructor can do some checks on the file reference (null check, existence, ...).
Your approach B has two very serious disadvantages:
After constructing a document scanner, clients could easily call all of the methods. If no file was set before, you must handle that "illegal state" with maybe an IllegalStateException. Thus, this approach increases code and complexity of that class.
There seems to be a series of method calls that a client should or can perform. It's easy to call the file setting method again in the middle of such a series with a completely other file, breaking the whole scan facility. To avoid this, your setter (for the file) should remember whether a file was already set. And that nearly automatically leads to approach A.
Regarding the creation of objects: Modern JVMs are really very fast at creating objects. Usually, there is no measurable performance overhead for that. The processing time (here: the scan) usually is much higher.
If you don't need multiple instances of DocumentScanner to co-exist, I see no point in creating a new instance in each iteration of the loop. It just creates work to the garbage collector, which has to free each of those instances.
If the length of the array is small, it doesn't make much difference which implementation you choose, but for large arrays, implementation B is more efficient, both in terms of memory (fewer instances created that the GC hasn't freed yet) and CPU (less work for the GC).
Are you implementing DocumentScanner or using an existing class?
If the latter, and it was designed for being able to parse multiple documents in a row, you can just reuse the object as in variant B.
However, if you are designing DocumentScanner, I would recommend to design it such that it handles a single document and does not even have a setDocPath method. This leads to less mutable state in that class and thus makes its design much easier. Also using an instance of the class becomes less error-prone.
As for performance, there won't be a measurable difference unless instantiating a DocumentScanner is doing a lot of work (like instantiating many other objects, too). Instantiating and freeing objects in Java is pretty cheap if they are used only for a short time due to the generational garbage collector.
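A sketch of the single-document design recommended in the answers above. The class SingleDocumentScanner and everything inside it are hypothetical: the original post does not show MyDocumentScanner, so the scanning step is reduced to a placeholder. The point is only that the path is fixed and validated in the constructor and there is no setter.

```java
import java.util.Objects;

final class SingleDocumentScanner {
    // Set once in the constructor and never changed afterwards.
    private final String docPath;

    SingleDocumentScanner(String docPath) {
        this.docPath = Objects.requireNonNull(docPath, "docPath must not be null");
        if (docPath.isEmpty()) {
            throw new IllegalArgumentException("docPath must not be empty");
        }
    }

    String getDocPath() {
        return docPath;
    }

    // Placeholder for the real scanning work, which the original post does not show.
    String scanDocument() {
        return "scanned " + docPath;
    }

    public static void main(String[] args) {
        String[] paths = {"C:\\document\\text1.txt", "C:\\document\\text2.txt"};
        for (String path : paths) {
            // One short-lived instance per document, as in Implementation A.
            System.out.println(new SingleDocumentScanner(path).scanDocument());
        }
    }
}
```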

Making a set of changes to a string in Java - best practice approach

Looking for the best practice Java approach for the following problem.
I have a (relatively) long string and a set of (non-overlapping) changes to make to it - lets say the changes have the signature:
change(int startIndex, int endIndex, String replacement);
and an example would be
assert doChange("aaa", new Change(1, 2, "hello")).equals("ahelloa");
My plan is to work backwards through the string (so that earlier indexes are not shifted by later changes), splitting it into three pieces each time and then stitching in the replacement. But I imagine there is a more effective, more Java-like approach... is there an API call I've missed?
The standard Java String is immutable, which makes it unsuitable for extended string-based operations. But there are also the classes StringBuffer and StringBuilder, which represent a mutable string designed for being manipulated. They even have a built-in replace(start, end, str) method which does exactly what you are trying to do.
The main difference between these two classes is that StringBuffer is thread-safe while StringBuilder is not. When you don't have multiple threads accessing the same string, use StringBuilder, because it generally performs faster.
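The backwards approach from the question can be combined with StringBuilder.replace roughly as follows. This is a sketch: the Change class and the applyChanges method are assumed shapes based on the signature in the question, not an existing API. Changes are applied from the highest start index down, so earlier indexes stay valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ChangeDemo {
    // Hypothetical value class mirroring change(startIndex, endIndex, replacement).
    static final class Change {
        final int start;
        final int end; // exclusive, as in StringBuilder.replace
        final String replacement;

        Change(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    // Applies non-overlapping changes, working from the end of the text backwards.
    static String applyChanges(String text, List<Change> changes) {
        List<Change> sorted = new ArrayList<>(changes);
        sorted.sort(Comparator.comparingInt((Change c) -> c.start).reversed());
        StringBuilder sb = new StringBuilder(text);
        for (Change c : sorted) {
            sb.replace(c.start, c.end, c.replacement);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(applyChanges("aaa", Arrays.asList(new Change(1, 2, "hello"))));
    }
}
```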

String allocation of literals

Another question explained to me how concatenation of String literals is evaluated at compile time. In a project I'm working on we handle multi-line Strings of big queries using a StringBuffer. It appends only literals, so it had me wondering whether something similar might happen.
In the following code, will the buffer append its contents at compile time? How would this behave when multiple threads are trying to execute this function?
public static String querySomething(int arg) {
    StringBuffer buffer = new StringBuffer();
    buffer.append("A quite long query");
    buffer.append("that doesn't fit in one line");
    buffer.append("...");
    return buffer.toString();
}
Wouldn't it be better to define the String as a constant since it would be thread safe and we know it can get concatenated at compile time with the plus operator. Something like:
private static final String REALLY_LONG_QUERY1 = "A quite long query that"
        + "doesn't fit in one line"
        + "...";
Wouldn't it be better to define the String as a constant ...
Basically, yes.
... since it would be thread safe and we know it can get concatenated at compile time with the plus operator.
These assertions are both correct.
However, you would not need to worry about thread safety in the version of your code with a StringBuffer either.
The StringBuffer class is thread-safe.
If the StringBuffer instance is only visible to one thread (e.g. the thread calling the method that declares and uses the instance), then the instance is thread confined and does not need to be a thread-safe data structure. (And you could use StringBuilder instead ...)
The primary advantage of the version that uses + concatenation of literals is that it takes zero time at runtime, and causes no allocation of objects ... apart from the one String object that represents the concatenated string constant that is allocated when your class is loaded.
In fact, in many places where people explicitly use StringBuilder or StringBuffer to "optimize" string concatenation, it either has no effect, or actually makes the code slower:
As you noted, the Java compiler evaluates concatenation of literals (using +) at compile time, but it can't do the same thing for explicit StringBuilder.append calls.
In addition, the Java compiler will typically translate non-constant String concatenations (using +) in an expression into equivalent code using StringBuilder.
The only cases where it is worthwhile to use StringBuilder explicitly are when the string building spans multiple statements; e.g. because you are concatenating stuff in a loop.
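The compile-time folding described above can be observed with a small sketch. The class name ConstantFoldingDemo and the query text are made up for this example. It relies on the language rule that constant expressions of type String are interned, so a folded constant and an equal literal are the same object, while a string built at run time is a different object with equal contents.

```java
class ConstantFoldingDemo {
    // A compile-time constant: the compiler folds the three literals into one.
    static final String QUERY = "SELECT * "
            + "FROM some_table "
            + "WHERE id = ?";

    // Both sides are constant expressions, so they refer to the same interned String.
    static boolean foldedAtCompileTime() {
        return QUERY == "SELECT * FROM some_table WHERE id = ?";
    }

    // The appends run on every call and produce a fresh String object.
    static String builtAtRunTime() {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT * ").append("FROM some_table ").append("WHERE id = ?");
        return sb.toString();
    }

    public static void main(String[] args) {
        String built = builtAtRunTime();
        System.out.println(foldedAtCompileTime()); // same object
        System.out.println(built.equals(QUERY));   // same contents
        System.out.println(built == QUERY);        // different object
    }
}
```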
I would prefer the second solution (merely using + operator).
Why? Because:
More readable
More functional (in the functional programming sense, which is both fashionable and efficient today): it avoids useless temporary local variables, and especially mutable ones such as buffer.
In the following code, will the buffer append its contents at compile time?
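No. The append calls are ordinary method calls that run every time the method is invoked; only + concatenation of literals is evaluated at compile time.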
How would this behave when multiple threads are trying to execute this function?
No problems, since each thread would use its own StringBuffer (it is declared inside the method).
Wouldn't it be better to define the String as a constant?
Yes, it would make more sense here.
StringBuffer fits better when you want to build a string whose actual size you don't know at compile time, for example:
public static String querySomething(int arg) {
    StringBuffer buffer = new StringBuffer();
    while (...) {
        buffer.append(someStuff());
    }
    return buffer.toString();
}
In your case, a constant is more suitable.

The "Why" behind PMD's StringInstantiation rule

Along the lines of an existing thread, The “Why” behind PMD's rules, I'm trying to figure out the meaning of one particular PMD rule : String and StringBuffer Rules.StringInstantiation.
This rule states that you shouldn't explicitly instantiate String objects. As per their manual page :
Avoid instantiating String objects; this is usually unnecessary since they are immutable and can be safely shared.
This rule is defined by the following Java class: net.sourceforge.pmd.lang.java.rule.strings.StringInstantiationRule
Example(s):
private String bar = new String("bar"); // just do a String bar = "bar";
http://pmd.sourceforge.net/pmd-5.0.1/rules/java/strings.html
I don't see how this syntax is a problem, other than it being pointless. Does it affect overall performance?
Thanks for any thought.
With String foo = "foo" there will be one instance of "foo" in the string pool (this is referred to as string interning; the pool lived in the PermGen space up to Java 6 and was moved to the main heap in Java 7). If you were to later type String bar = "foo" there would still only be one "foo" in the pool.
Writing String foo = new String( "foo" ) will also create a String object to count against the heap.
Thus, the rule is there to prevent wasting memory.
Cheers,
It shouldn't usually affect performance in any measurable way, but:
private String bar = new String("bar"); // just do a String bar = "bar";
If you execute this line a million times you will have created a million objects
private String bar = "bar";
If you execute this line a million times you will have created one object.
There are scenarios where that actually makes a difference.
Does it affect overall performance?
Well, performance and maintenance. Doing something which is pointless makes the reader wonder why the code is there in the first place. When that pointless operation also involves creating new objects (two in this case - a new char[] and a new String) that's another reason to avoid it...
In the past, there has been a reason to call new String(existingString) if the existing string was originally obtained as a small substring of a longer string - or other ways of obtaining a string backed by a large character array. I believe that this is not the case with more recent implementations of Java, but obviously you can still be using an old one. This shouldn't be a problem for constant strings anyway, mind you.
(You could argue that creating a new object allows you to synchronize on it. I would avoid synchronizing on strings to start with, though.)
One difference is the memory footprint:
String a = "abc"; //one object
String b = "abc"; //same object (i.e. a == b) => still one object in memory
String c = new String("abc"); // This is a new object - now 2 objects in memory
To be honest, the only reason I can think of, why one would use the String constructor is in combination with substring, which is a view on the original string. Using the String constructor in that case helps getting rid of the original string if it is not needed any longer.
However, since Java 7u6 substring copies the characters instead of sharing the original array, so I no longer see any reason to use the constructor.
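The object counts in the answer above can be checked directly with reference comparison. The sketch below uses a made-up class name, StringIdentityDemo; the == comparisons are deliberate here, because the point is object identity rather than content equality.

```java
class StringIdentityDemo {
    // Two identical literals refer to one pooled object.
    static boolean literalsShareOneObject() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // The constructor always creates a distinct object with equal contents.
    static boolean constructorMakesNewObject() {
        String a = "abc";
        String c = new String("abc");
        return a != c && a.equals(c);
    }

    // intern() maps an equal string back to the pooled instance.
    static boolean internReturnsPooledObject() {
        return new String("abc").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());
        System.out.println(constructorMakesNewObject());
        System.out.println(internReturnsPooledObject());
    }
}
```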
It can be useful, because it creates a new identity, and sometimes object identities are important/crucial to an application. For example, it can be used as an internal sentinel value. There are other valid use cases too, e.g. to avoid constant expression.
If a beginner writes such code, it's very likely a mistake. But that is a very short learning period. It is highly unlikely that any moderately experienced Java programmer would write that by mistake; it must be for a specific purpose. File it under "it looks like a stupid mistake, but it takes efforts to make, so it's probably intended".
It is
pointless
confusing
slightly slower
You should try to write the simplest, clearest code you can. Adding pointless code is bad all round.

What will use more memory

I am working on improving the performance of my app. I am confused about which of the following will use more memory: Here sb is StringBuffer
String strWithLink = sb.toString();
clickHereTextview.setText(
Html.fromHtml(strWithLink.substring(0,strWithLink.indexOf("+"))));
OR
clickHereTextview.setText(
Html.fromHtml(sb.toString().substring(0,sb.toString().indexOf("+"))));
In terms of memory an expression such as
sb.toString().indexOf("+")
has little to no impact as the string will be garbage collected right after evaluation. (To avoid even the temporary memory usage, I would recommend doing
sb.indexOf("+")
instead though.)
However, there's a potential leak involved when you use String.substring. Last time I checked, substring basically returned a view of the original string, so the original string stays resident in memory.
The workaround is to do
String strWithLink = sb.toString();
... new String(strWithLink.substring(0,strWithLink.indexOf("+"))) ...
    ^^^^^^^^^^
to detach the wanted string, from the original (potentially large) string. Same applies for String.split as discussed over here:
Java String.split memory leak?
The second will use more memory, because each call to StringBuilder#toString() creates a new String instance.
http://www.docjar.com/html/api/java/lang/StringBuilder.java.html
Analysis
If we look at StringBuilder's OpenJDK sources:
public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}
We see that it instantiates a whole new String object. Each call to sb.toString() therefore allocates another String on the heap (these instances are not added to the string pool).
Outcome
Use String strWithLink = sb.toString(); and reuse that variable. Every use then refers to the same String instance instead of creating a new one.
Check other people's answers, the second one does take a little bit more memory, but this sounds like you are over optimizing. Keeping your code clear and readable should be the priority. I'd suggest you don't worry so much about such tiny optimizations if readability will suffer.
The less work you do, the more efficient it usually is. In this case, you don't need to call toString at all
clickHereTextview.setText(Html.fromHtml(sb.substring(0, sb.indexOf("+"))));
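To make the one-liner above concrete without the Android classes, here is a small sketch that works on the StringBuffer directly. The class BufferSubstringDemo and the method textBeforePlus are hypothetical names, and the fallback for a missing "+" is an added assumption, since the original snippets would throw in that case.

```java
class BufferSubstringDemo {
    // Returns the text before the first "+", without calling toString() first.
    static String textBeforePlus(StringBuffer sb) {
        int plus = sb.indexOf("+");
        // indexOf returns -1 when "+" is absent; fall back to the whole content.
        return plus < 0 ? sb.toString() : sb.substring(0, plus);
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("click here+link");
        System.out.println(textBeforePlus(sb));
    }
}
```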
Creating new objects always takes up more memory. However, in your case the difference seems insignificant.
Also, in your case, you are creating a local variable, but it only holds a reference to the String (on the stack); it does not copy the string itself.
Whenever the string is referenced in more than one place in your method, it is good to use
String strWithLink = sb.toString();, as you can use the same strWithLink everywhere. Otherwise, if there is only one reference, it is fine to just use sb.toString() directly.
