What is the most efficient way to use an instance of DecimalFormat together with a StringBuilder, for example when numbers are appended to a string in a loop? There is format(long number, StringBuffer toAppendTo, FieldPosition pos), but it takes a StringBuffer rather than a StringBuilder, so it is not compatible. There is also formatToCharacterIterator(Object obj), but it both creates an iterator object and does not accept primitive types, so it may also require boxing.
It seems to me that calling format(long number) to produce a String and appending it to the StringBuilder is the easiest option, but creating a String just to append it seems to somewhat defeat the purpose of the StringBuilder. Is there really no other option?
Edit: I decided to take some measurements to see the performance difference between these options. Based on the OpenJDK implementation, all methods seem to end up in either format(long, StringBuffer, FieldPosition) or format(double, StringBuffer, FieldPosition) (except for large BigInteger and BigDecimal values), so when appending only numbers, using a StringBuffer directly should always be faster.
And indeed, using StringBuffer directly is about 20% faster on my machine than going through a StringBuilder and an intermediate String. However, the opposite is true when no number formatting is done and only strings are appended: then StringBuffer is 20% slower. But since formatting a number is about 5 times slower than simply appending a string, StringBuilder only seems to come out ahead when there are significantly more plain appends than formats.
Unless you can establish that the additional String and StringBuffer creation is an actual real-world bottleneck in your application, I suspect you are worrying unnecessarily. It's true that if we could rewrite the history of the JDK, these methods would probably have been defined to take an Appendable. But the authors may well have considered adding a corresponding method when StringBuilder and Appendable were introduced in Java 5 and decided it was not worth it.
Remember that in modern JVMs, creating and discarding temporary objects doesn't carry the performance overhead it once did. In addition, various flavours of NumberFormat, including DecimalFormat, actually provide an internal 'fast path' (see the fastFormat() method) that avoids the StringBuffer creation you mention (though not the temporary String creation). The JVM also generally optimises uncontended synchronisation, which is what you will have here.
If profiling genuinely shows that DecimalFormat.format() is a bottleneck in your application, then you may need to consider implementing a specialised method with the optimisations you require. But I suspect you'll find that it's not actually an issue.
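As a minimal sketch of the two options discussed above (the class and method names are illustrative, and a DecimalFormat with fixed US symbols is assumed so the output does not depend on the default locale), the first method writes every number into one reused StringBuffer through format(long, StringBuffer, FieldPosition), while the second appends the String returned by format(long) to a StringBuilder:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.FieldPosition;
import java.util.Locale;

class DecimalFormatAppend {
    // Fixed symbols so the grouping separator is always ','.
    static DecimalFormat newFormat() {
        return new DecimalFormat("#,##0", DecimalFormatSymbols.getInstance(Locale.US));
    }

    // Option 1: format(long, StringBuffer, FieldPosition) writes each number
    // directly into one reused StringBuffer, with no intermediate String per number.
    static String viaStringBuffer(DecimalFormat df, long[] values) {
        StringBuffer sb = new StringBuffer();
        FieldPosition pos = new FieldPosition(0); // field positions are not needed here
        for (long v : values) {
            df.format(v, sb, pos);
            sb.append(';');
        }
        return sb.toString();
    }

    // Option 2: format(long) returns a temporary String that is then appended.
    static String viaStringBuilder(DecimalFormat df, long[] values) {
        StringBuilder sb = new StringBuilder();
        for (long v : values) {
            sb.append(df.format(v)).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long[] values = {1, 1234, 1234567};
        DecimalFormat df = newFormat();
        System.out.println(viaStringBuffer(df, values));  // 1;1,234;1,234,567;
        System.out.println(viaStringBuilder(df, values)); // 1;1,234;1,234,567;
    }
}
```

Both produce the same text; which one is faster for a given mix of formats and plain appends is best settled by profiling with real data, as the measurements in the question suggest.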
Can I get a concrete explanation of the memory and runtime overhead of the two statements below?
String CONST = "string constant";
StringBuilder sb1 = new StringBuilder();
sb1.append(CONST);
StringBuilder sb2 = new StringBuilder();
sb2.append("string constant");
Does the second one create a String object and add it to the string pool?
Is there any scenario (including many string appends) in which one can be justified as better than the other?
There is no difference in memory or runtime overhead between these two versions.
Use whichever seems more readable or maintainable. If you're reusing the same string constant in many places, the constant is long, or might change, then pulling out a constant might be appropriate.
Regarding runtime overhead, a benchmark of both methods yielded almost identical results.
My tests were done with 10,000,000,000 iterations and the runtime was:
Method 1 - 95109ms (~9.5ns average)
Method 2 - 95002ms (~9.5ns average)
So there is definitely no noticeable difference in performance.
Therefore, as @LouisWasserman said in their answer, just use whichever keeps your code clean and legible.
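The reason there is no difference is that string literals with the same content are interned, so both versions append the very same pooled String object. The small sketch below shows this; note that it assumes CONST is a static final field (in the question it is a local variable), which additionally lets javac inline it as a compile-time constant. The class name is illustrative.

```java
class LiteralVsConstant {
    static final String CONST = "string constant";

    // Version 1: append via the named constant.
    static String viaConstant() {
        return new StringBuilder().append(CONST).toString();
    }

    // Version 2: append the literal directly.
    static String viaLiteral() {
        return new StringBuilder().append("string constant").toString();
    }

    // Identical literals are interned, so both refer to the same pooled object;
    // == is used here deliberately to compare references, not contents.
    static boolean sameInstance() {
        String literal = "string constant";
        return literal == CONST;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());                     // true
        System.out.println(viaConstant().equals(viaLiteral())); // true
    }
}
```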
I need to copy the contents of many large, different Strings into a static char array and use that array frequently in a performance-critical job, so it's important to avoid allocating too much new memory.
For that reason, str.toCharArray() is ruled out, since it allocates a new array for every String.
As we all know, charAt(i) is slower and more cumbersome than array indexing with [i], so I want to use byte[] or char[].
The good news is that there's str.getBytes(srcBegin, srcEnd, dst, dstBegin). The bad news is that it is deprecated.
So how can I accomplish this demanding job?
I believe you want getChars(int, int, char[], int). That will copy the characters into the specified array, and I'd expect it to do it "as efficiently as reasonably possible".
You should avoid converting between text and binary representations unless you really need to. Aside from anything else, that conversion itself is likely to be time-consuming.
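A minimal sketch of the getChars approach, assuming one shared destination array that is reused across strings and only grown when a longer string arrives (the class name and the doubling policy are hypothetical choices, not part of the answer):

```java
class CharCopy {
    // Shared, reused destination array (growth policy is a hypothetical choice).
    private static char[] buffer = new char[16];

    // Copies s into the shared buffer with String.getChars; a new array is
    // allocated only when s is longer than the current buffer.
    static int copyIntoBuffer(String s) {
        int n = s.length();
        if (n > buffer.length) {
            buffer = new char[Math.max(n, buffer.length * 2)];
        }
        s.getChars(0, n, buffer, 0);
        return n; // number of valid chars at the start of buffer
    }

    static char[] buffer() {
        return buffer;
    }

    public static void main(String[] args) {
        int n = copyIntoBuffer("hello");
        char[] b = buffer();
        for (int i = 0; i < n; i++) {
            System.out.print(b[i]); // plain array indexing instead of charAt
        }
        System.out.println();
    }
}
```

Only the first n entries are meaningful after each call; stale characters from a previous, longer string may remain beyond that point.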
A brief inventory:
String holds Unicode text; it can be normalized (java.text.Normalizer).
int[] code points are Unicode symbols.
char[] holds UTF-16 code units (2 bytes per char); some code points need 2 chars, a surrogate pair.
byte[] is for binary data. Holding Unicode text as UTF-8 is relatively compact when it is mostly ASCII or Latin-1.
Processing might be done on a ByteBuffer, CharBuffer or IntBuffer.
When dealing with Asian scripts, int code points are probably the most practical.
Otherwise bytes seem best.
Code points (or chars) also make sense when the Character class is used to classify Unicode blocks and scripts, digits in several scripts, emoji, and so on.
For performance, bytes are usually best because they are often the most compact representation, probably as UTF-8.
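To make the inventory above concrete, here is a small sketch that looks at one String through the three views (UTF-16 chars, code points, UTF-8 bytes) and classifies code points with Character.UnicodeScript; the class name and the sample text are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class TextViews {
    // UTF-16 code units: a code point outside the BMP takes two chars (a surrogate pair).
    static int charCount(String s) {
        return s.length();
    }

    // Unicode code points, regardless of how many chars each one needs.
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    // Size of the same text encoded as UTF-8 (one byte per ASCII character).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Classifies each code point by script using the Character class.
    static Character.UnicodeScript[] scripts(String s) {
        return s.codePoints()
                .mapToObj(Character.UnicodeScript::of)
                .toArray(Character.UnicodeScript[]::new);
    }

    public static void main(String[] args) {
        String s = "a\u4E2D\uD83D\uDE00"; // Latin letter, CJK ideograph, emoji U+1F600
        System.out.println(charCount(s));         // 4
        System.out.println(codePoints(s).length); // 3
        System.out.println(utf8Length(s));        // 8
        System.out.println(java.util.Arrays.toString(scripts(s)));
    }
}
```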
You cannot really avoid memory allocation here. getBytes should be called with a Charset, and almost always some kind of conversion happens. Since newer Java versions (9 and later) can store a String internally as a byte array instead of a char array when its content fits in Latin-1 (ISO-8859-1), even access to an internal char array would not help. New arrays get created either way.
What one can do is use fast ByteBuffers.
Alternatively, for linguistic analysis one can use a database, perhaps a graph database, or at least something that can exploit parallelism.
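For the "fast ByteBuffers" suggestion, one possible sketch (an assumption about how it could be applied, not the answer's own code) reuses a single direct ByteBuffer and a CharsetEncoder, so encoding many strings as UTF-8 does not allocate a new byte[] per string; CharBuffer.wrap only creates a lightweight view over the String. The class name and the "return -1 when it does not fit" policy are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ReusableUtf8Encoder {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final ByteBuffer out;

    ReusableUtf8Encoder(int capacity) {
        out = ByteBuffer.allocateDirect(capacity); // allocated once, reused for every string
    }

    // Encodes s as UTF-8 into the reused buffer and returns the byte count,
    // or -1 if it does not fit or the input is malformed (hypothetical policy).
    int encode(String s) {
        encoder.reset();
        out.clear();
        CharBuffer in = CharBuffer.wrap(s); // read-only view over s, chars are not copied
        CoderResult r = encoder.encode(in, out, true);
        if (!r.isUnderflow()) {
            return -1; // overflow (buffer too small) or malformed input
        }
        r = encoder.flush(out);
        if (!r.isUnderflow()) {
            return -1;
        }
        out.flip(); // position 0, limit = number of bytes written
        return out.remaining();
    }

    // The buffer, positioned for reading the most recently encoded bytes.
    ByteBuffer buffer() {
        return out;
    }

    public static void main(String[] args) {
        ReusableUtf8Encoder enc = new ReusableUtf8Encoder(1024);
        System.out.println(enc.encode("h\u00E9llo")); // 6 ('é' takes two bytes)
    }
}
```

The buffer contents are overwritten by the next call, so a caller must consume them before encoding the next string.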
You are pretty much restricted to the APIs offered by the String class, and that deprecated method is meant to be replaced by getBytes() (or an overload that lets you specify a charset).
In other words, the problem you describe, "many large strings that need to go into arrays", can't be solved easily.
Thus, a distinct non-answer: look at your design. If performance is really critical, then do not create that many large strings up front!
Put differently: if your measurements convince you that you do have a real performance issue, then adapt your design as needed. Maybe, at the point where your strings come in, you can avoid String objects altogether and use something that performs better for you later on.
But of course, that leads to a complex, error-prone solution in which you do a lot of "memory management" yourself. So, as said: measure first. Make sure you have a real problem, and that it actually sits where you think it does.
str.getBytes(srcBegin, srcEnd, dst, dstBegin) is indeed deprecated; the relevant documentation recommends getBytes() instead. If you needed it because sometimes you don't have to convert the entire string, I suppose you could call substring() first, but I'm not sure how much, if at all, that would impact your code's efficiency. Or, if storing the data in a char[] works just as well for you, you can use getChars(int, int, char[], int), which is not deprecated.
Java's String and StringBuilder are limited to a length of Integer.MAX_VALUE. In most use cases this is more than adequate, but I have just encountered one in which I need to handle and return a String longer than 2,684,354,560 characters.
This is required for capturing an incoming stream of characters, in which I do not have control over the size of the stream, nor do I have the option of re-architecting the solution. What I can do at most is replace a method in an existing module, or introduce a new class that replaces String and StringBuilder in that method.
As a temporary workaround to prevent the OutOfMemoryError thrown when the StringBuilder length exceeds Integer.MAX_VALUE, I implemented the following safeAppend():
private void safeAppend(StringBuilder ret, String current) {
    // Appending would exceed Integer.MAX_VALUE: drop leading characters first
    if ((long) ret.length() + current.length() > Integer.MAX_VALUE) {
        String truncateLeadingPart;
        if (current.length() < ret.length()) {
            // drop as many leading characters as are about to be appended
            truncateLeadingPart = ret.substring(current.length());
        } else {
            // drop just enough leading characters to end up at Integer.MAX_VALUE
            int startIndex = (int) ((long) ret.length() + current.length() - Integer.MAX_VALUE);
            truncateLeadingPart = ret.substring(Math.min(ret.length(), startIndex));
        }
        ret.setLength(0);
        ret.append(truncateLeadingPart);
    }
    ret.append(current);
}
This method truncates the leading part and always keeps the trailing 2,147,483,647 characters. However, this workaround proved inadequate for the task at hand, because we cannot afford to lose any data captured from the stream.
What is the recommended approach to implementing a String and StringBuilder that are NOT limited to an int max size?
A long max size would be sufficient. A single LimitlessString class that can be appended to efficiently, like StringBuilder, would also be adequate.
You won't be able to use String, StringBuilder or StringBuffer, as the 32-bit length is baked into their interfaces. The same is unfortunately true of arrays and NIO buffers (there have been proposals to fix this, but nothing at the time of writing).
Obviously streaming or using random file access would be a good solution if that is possible.
You are left with implementing something else. Ropes use a binary tree to represent composition of string parts. More common is to use an array of arrays, or for better GC an array of directly allocated (or memory-mapped file) NIO buffers. Someone remarked a few years ago that this area of Computer Science still has scope for more PhDs.
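A minimal sketch of the "array of arrays" idea, using a list of fixed-size heap char[] chunks rather than NIO buffers (the class name, the power-of-two chunk size and the reduced API of append/length/charAt are illustrative assumptions). The total length is a long, and appending never copies earlier chunks:

```java
import java.util.ArrayList;
import java.util.List;

class ChunkedText {
    private final int chunkBits;
    private final int chunkSize;
    private final List<char[]> chunks = new ArrayList<>();
    private long length; // may exceed Integer.MAX_VALUE

    // chunkSize = 2^chunkBits chars (e.g. 20 gives 1,048,576 chars per chunk).
    ChunkedText(int chunkBits) {
        this.chunkBits = chunkBits;
        this.chunkSize = 1 << chunkBits;
    }

    // Appends s chunk by chunk; earlier chunks are never copied or resized.
    ChunkedText append(String s) {
        int i = 0;
        while (i < s.length()) {
            int offset = (int) (length & (chunkSize - 1));
            if (offset == 0) {
                chunks.add(new char[chunkSize]); // no chunk yet, or the last one is full
            }
            char[] chunk = chunks.get(chunks.size() - 1);
            int count = Math.min(s.length() - i, chunkSize - offset);
            s.getChars(i, i + count, chunk, offset);
            i += count;
            length += count;
        }
        return this;
    }

    long length() {
        return length;
    }

    // Random access with a long index: high bits select the chunk, low bits the position.
    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return chunks.get((int) (index >>> chunkBits))[(int) (index & (chunkSize - 1))];
    }

    public static void main(String[] args) {
        ChunkedText text = new ChunkedText(20);
        text.append("first part, ").append("second part");
        System.out.println(text.length());   // 23
        System.out.println(text.charAt(12)); // s
    }
}
```

This still needs enough heap for all the data; it only removes the int length limit, which is why streaming to a file remains the better option when it is possible.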
Well, if you really, really need to extend String/StringBuilder this way, you have to either create a new class that doesn't extend String/StringBuilder (because they are marked final), or change the JRE binaries to make String/StringBuilder non-final. Either way, both solutions are bad: they will lead to a huge support effort and generate a lot of WTFs in the future.
String and StringBuilder are final classes and cannot be patched. StringWriter would have been better.
Better options would have been:
not using two-byte chars, but bytes (a CharBuffer on top of a ByteBuffer);
compressing (GZIPOutputStream);
(as you did) periodically moving a huge chunk out to a file or similar;
[An aside] newer Java versions can store single-byte (Latin-1) text internally, which does not allow more characters but uses half the memory.
You will also run into buffer resizing when appending, which slows the system down.
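Combining the compression and "move it to a file" suggestions, a minimal sketch (the class and method names are hypothetical, and UTF-8 is assumed as the file encoding) streams every incoming part through a Writer into a gzip-compressed file, so the total size is bounded by disk space rather than by String or array limits:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSpill {
    // Streams each part into a gzip-compressed UTF-8 file instead of a StringBuilder.
    static void writeCompressed(Path file, Iterable<String> parts) throws IOException {
        try (Writer w = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
            for (String part : parts) {
                w.write(part);
            }
        }
    }

    // Reads the file back in fixed-size pieces; a real consumer would process each
    // piece instead of collecting it (collecting is only done for this demo).
    static String readCompressed(Path file) throws IOException {
        try (Reader r = new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    // Round trip through a temporary file, for demonstration only.
    static String roundTrip(Iterable<String> parts) {
        try {
            Path tmp = Files.createTempFile("spill", ".gz");
            try {
                writeCompressed(tmp, parts);
                return readCompressed(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("captured ", "stream ", "data")));
    }
}
```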
I'm writing a routine that takes a string and formats it as quoted-printable, and it has to be as fast as possible. My first attempt copied characters from one StringBuffer to another, encoding and line wrapping along the way. Then I thought it might be quicker to modify the original StringBuffer in place rather than copy all that data, most of which is unchanged. It turns out the inserts are far worse than copying: the second version (with the StringBuffer inserts) was 8 times slower, which makes sense, as it must be moving a lot of memory.
What I was hoping for was some kind of gap buffer data structure, so the inserts wouldn't involve physically moving all the characters in the rest of the StringBuffer.
So, any suggestions for the fastest way to rip through a string, inserting characters every once in a while?
Suggestions to use the standard mimeutils library are not helpful, because I'm also dot-escaping the string so it can be sent to an SMTP server in one shot.
At the end, your gap data structure would have to be transformed into a String, which would need assembling all the chunks in a single array by appending them to a StringBuilder.
So using a StringBuilder directly will be faster. I don't think you'll find a faster technique than that. Make sure to initialize the StringBuilder with a large enough size to avoid copies of the whole buffer once the capacity is exhausted.
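A simplified sketch of that advice applied to quoted-printable encoding (RFC 2045): a single pass that appends to one StringBuilder sized for the worst case up front, so it never has to grow. The class name is illustrative, and the sketch deliberately simplifies: it encodes every byte outside '!'..'~' plus '=' as =XX (including spaces and line breaks) and only inserts soft line breaks to keep lines at 76 characters; dot-escaping is not handled here.

```java
import java.nio.charset.StandardCharsets;

class QuotedPrintable {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();
    private static final int MAX_CONTENT = 75; // 76 including the '=' of a soft line break

    static String encode(byte[] data) {
        // Worst case: 3 chars per byte plus one 3-char soft break per 75 chars.
        StringBuilder out = new StringBuilder(data.length * 3 + (data.length * 3 / MAX_CONTENT + 1) * 3);
        int lineLength = 0;
        for (byte value : data) {
            int b = value & 0xFF;
            boolean literal = b >= 33 && b <= 126 && b != '=';
            int tokenLength = literal ? 1 : 3;
            if (lineLength + tokenLength > MAX_CONTENT) {
                out.append("=\r\n"); // soft line break
                lineLength = 0;
            }
            if (literal) {
                out.append((char) b);
            } else {
                out.append('=').append(HEX[b >> 4]).append(HEX[b & 0xF]);
            }
            lineLength += tokenLength;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] data = "Caf\u00E9 = coffee".getBytes(StandardCharsets.UTF_8);
        System.out.println(encode(data)); // Caf=C3=A9=20=3D=20coffee
    }
}
```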
So, taking the advice of some of the other answers here, I've written many versions of this function to see which goes quickest. For future reference, in case anybody can benefit from what I found:
1) The slowest: StringBuffer.append(), but we knew that.
2) Almost twice as fast: StringBuilder.append(). Locks are very expensive, it seems.
3) Another 20% faster: copying from one char[] to another.
4) And finally, coming in three times faster than even that: a JNI call to the exact same code compiled in C that copies from one char array to another.
You may consider #4 cheating, but cheaters win. It is by far the fastest way to go.
There is a risk of the GetCharArrayElements call causing the Java char array to be copied so it can be handed to the C program, but I can't tell whether that's happening, and it's still wicked fast compared to any Java implementation.
I think a good balance between speed and coding grace would be using Matcher.appendReplacement. Formulate a regex that will catch all insertion points. In a loop you use find, analyze Matcher.group() to see what exactly has matched, and use your program logic to decide what to give to appendReplacement.
In any case, it is important not to copy the text over char by char. You must copy in the largest chunks possible.
The Matcher API is unfortunately tied to StringBuffer (StringBuilder overloads of appendReplacement and appendTail were only added in Java 9), but, as you found, that only steals the final 5% from you.
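A minimal sketch of the find/appendReplacement loop described above, applied to SMTP dot-stuffing (RFC 5321: a line that starts with '.' gets an extra '.'). The class name and pattern are illustrative; StringBuffer is used so the sketch also runs on Java 8. The text between matches is copied in whole chunks by appendReplacement and appendTail:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotStuffing {
    // '.' at the start of any line; MULTILINE makes ^ match after line terminators.
    private static final Pattern LINE_START_DOT = Pattern.compile("^\\.", Pattern.MULTILINE);

    static String dotStuff(String text) {
        Matcher m = LINE_START_DOT.matcher(text);
        StringBuffer out = new StringBuffer(text.length() + 16);
        while (m.find()) {
            // Program logic could inspect m.group() here; every match simply becomes "..".
            m.appendReplacement(out, Matcher.quoteReplacement(".."));
        }
        m.appendTail(out); // copy the rest after the last match in one chunk
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dotStuff(".hidden line\r\nnormal line"));
    }
}
```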
When should we use + for concatenating strings, when is StringBuilder preferred, and when is it suitable to use concat?
I've heard StringBuilder is preferable for concatenation within loops. Why is it so?
Thanks.
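Modern Java compilers convert your + operations into StringBuilder appends. That is, if you write str = str1 + str2 + str3, the compiler will generate code equivalent to the following (since Java 9, javac instead emits an invokedynamic call to StringConcatFactory, per JEP 280, but the effect is similar):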
StringBuilder sb = new StringBuilder();
str = sb.append(str1).append(str2).append(str3).toString();
You can decompile the code using DJ or Cavaj to confirm this :)
So now it's more a matter of choice than of performance whether to use + or StringBuilder :)
However, if the compiler does not do this for you (which may happen if you are using a nonstandard Java SDK), then StringBuilder is surely the way to go, as you avoid lots of unnecessary String objects.
I tend to use StringBuilder on code paths where performance is a concern. Repeated string concatenation within a loop is often a good candidate.
The reason to prefer StringBuilder is that both + and concat create a new object every time you call them (provided the right-hand argument is not empty). This can quickly add up to a lot of objects, almost all of which are completely unnecessary.
As others have pointed out, when you use + multiple times within the same statement, the compiler can often optimize this for you. However, in my experience this argument doesn't apply when the concatenations happen in separate statements. It certainly doesn't help with loops.
Having said all this, I think top priority should be writing clear code. There are some great profiling tools available for Java (I use YourKit), which make it very easy to pinpoint performance bottlenecks and optimize just the bits where it matters.
P.S. I have never needed to use concat.
From Java/J2EE Job Interview Companion:
String
String is immutable: you can’t modify a String object but can replace it by creating a new instance. Creating a new instance is rather expensive.
//Inefficient version using immutable String
String output = "Some text";
int count = 100;
for (int i = 0; i < count; i++) {
    output += i;
}
return output;
The above code would create 100 new String objects, 99 of which would be thrown away immediately. Creating new objects is not efficient.
StringBuffer/StringBuilder
StringBuffer is mutable: use StringBuffer or StringBuilder when you want to modify the contents. StringBuilder was added in Java 5 and it is identical in all respects to StringBuffer except that it is not synchronised, which makes it slightly faster at the cost of not being thread-safe.
//More efficient version using mutable StringBuffer
StringBuffer output = new StringBuffer(200); // "Some text" (9 chars) + 190 digit chars = 199
output.append("Some text");
for (int i = 0; i < count; i++) {
    output.append(i);
}
return output.toString();
The above code creates only two new objects, the StringBuffer and the final String that is returned. StringBuffer expands as needed, which is costly however, so it would be better to initialise the StringBuffer with the correct size from the start as shown.
If all concatenated elements are constants (e.g. "these" + "are" + "constants"), then I'd prefer +, because the compiler will inline the concatenation for you. Otherwise, using StringBuilder is the most effective way.
If you use + with non-constants, the compiler will internally use StringBuilder as well, but debugging becomes hell, because the code that runs is no longer identical to your source code.
My recommendation would be as follows:
+: Use when concatenating 2 or 3 Strings simply to keep your code brief and readable.
StringBuilder: Use when building up complex String output or where performance is a concern.
String.format: You didn't mention this in your question but it is my preferred method for creating Strings as it keeps the code the most readable / concise in my opinion and is particularly useful for log statements.
concat: I don't think I've ever had cause to use this.
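For comparison, a small sketch building the same message with each of the options in the list above, plus a StringBuilder reused across a loop (the class, method names and message text are illustrative; Locale.ROOT is passed to String.format only so the result does not depend on the default locale):

```java
import java.util.Locale;

class ConcatOptions {
    // +: brief and readable for a few pieces in a single expression.
    static String withPlus(String name, int count) {
        return name + " has " + count + " items";
    }

    // StringBuilder: explicit appends, useful when building output step by step.
    static String withBuilder(String name, int count) {
        return new StringBuilder().append(name).append(" has ").append(count).append(" items").toString();
    }

    // String.format: a readable template, at some extra runtime cost.
    static String withFormat(String name, int count) {
        return String.format(Locale.ROOT, "%s has %d items", name, count);
    }

    // concat: only accepts Strings, so the int must be converted first.
    static String withConcat(String name, int count) {
        return name.concat(" has ").concat(String.valueOf(count)).concat(" items");
    }

    // In a loop, reuse one StringBuilder instead of "+=" on a String.
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withFormat("cart", 3)); // cart has 3 items
        System.out.println(joinNumbers(4));        // 0,1,2,3
    }
}
```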
Use StringBuilder if you do a lot of manipulation. Usually a loop is a pretty good indication of this.
The reason for this is that normal concatenation produces lots of intermediate String objects that can't easily be "extended" (i.e. each concatenation produces a copy, which costs memory and CPU time). A StringBuilder, on the other hand, only needs to copy the data in some cases (inserting something in the middle, or resizing because the result becomes too big), so it saves on those copy operations.
Using concat() has no real benefit over using + (it might be ever so slightly faster for a single +, but once you do a.concat(b).concat(c) it will actually be slower than a + b + c).
Use + for single statements and StringBuilder for multiple statements/loops.
The performance gain from the compiler applies to concatenating constants.
Other uses are actually slower than using StringBuilder directly.
There is no problem with using "+", e.g. for creating an exception message, because that does not happen often and the application is already in trouble at that point. Avoid using "+" in loops.
For creating meaningful messages or other parametrized strings (e.g. XPath expressions), use String.format: it is much more readable.
I suggest using concat for concatenating two strings and StringBuilder otherwise; see my explanation in concatenation operator (+) vs concat().