Looking for a high performance alternative to StringBuilder - java

I have a very simple piece of code that iterates over a list of database objects and appends certain properties to a StringBuilder. The result set sometimes exceeds 100K rows, so the number of append operations exceeds 100K as well.
My problem is that I cannot reduce the number of iterations, because I need all the data. But the StringBuilder keeps consuming heap space and eventually throws an OutOfMemoryError.
Has anyone encountered this situation? Is there a solution to this problem, or an alternative to StringBuilder?
It is quite possible that what I am doing is wrong, so even though the code is quite simple I will post it.
StringBuilder endResult = new StringBuilder();
if (dbObjects != null && !dbObjects.isEmpty()) {
for (DBObject dBO : dbObjects) {
endResult.append("Starting to write" + dBO.getOutPut() + "content");
endResult.append(dBO.getResults());
}
output.append("END");
}
Like I said it's quite possible that I will have 100000 results from the DB

You shouldn't do something like this when using StringBuilder:
endResult.append("Starting to write" + dBO.getOutPut() + "content");
The statement above performs string concatenation first, building a temporary String that is then appended. Chain the append() calls instead:
endResult.append("Starting to write").append(dBO.getOutPut()).append("content");
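A minimal sketch of the chained form follows. It assumes a hypothetical `Row` class standing in for the question's `DBObject` (the `getOutPut()` and `getResults()` accessors are guessed from the question's code), and the pre-sizing estimate of 64 characters per row is an assumption too. Unlike the question's code, it appends "END" even for an empty list.

```java
import java.util.Arrays;
import java.util.List;

final class ChainedAppendSketch {

    // Hypothetical stand-in for the question's DBObject.
    static final class Row {
        private final String outPut;
        private final String results;

        Row(String outPut, String results) {
            this.outPut = outPut;
            this.results = results;
        }

        String getOutPut() { return outPut; }
        String getResults() { return results; }
    }

    // Builds the report with chained append() calls, so no temporary
    // String is created for each "a" + b + "c" expression.
    static String buildReport(List<Row> rows) {
        // Rough capacity guess (assumption: about 64 chars per row) to
        // reduce internal array resizing.
        StringBuilder endResult = new StringBuilder(Math.max(16, rows.size() * 64));
        for (Row row : rows) {
            endResult.append("Starting to write")
                     .append(row.getOutPut())
                     .append("content")
                     .append(row.getResults());
        }
        endResult.append("END");
        return endResult.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("A", "1"), new Row("B", "2"));
        System.out.println(buildReport(rows));
    }
}
```

This removes the temporary strings, but the whole result still lives in memory. If the heap is the real limit, writing each row to a Writer as it is produced avoids holding the full text at all.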


Performance - Method concatenates strings using + in a loop

I'm getting this warning on the following code even though I use .append() inside the loop.
for (final FieldError fieldError : result.getFieldErrors()) {
errors = new StringBuilder(errors).append(fieldError.getField()).append(" - ")
.append(getErrorMessageFromProperties(fieldError.getCode())).append("*").toString();
}
How can I fix this?
You can create the StringBuilder outside the for loop and reuse it.
StringBuilder sb=new StringBuilder();
for (final FieldError fieldError : result.getFieldErrors()) {
sb.append(fieldError.getField())
.append(" - ")
.append(getErrorMessageFromProperties(fieldError.getCode()))
.append("*");
}
After appending everything to sb, call toString() once, just after the for loop:
String error = sb.toString();
Each time you want to read a database table, you use a loop. As database tables grow over time, the number of iterations in the loop grows accordingly. So you want to avoid any operation inside the loop that could be costly in terms of performance. Furthermore, this is the kind of defect that you cannot detect during QA or when the application is young, with a test database that has few records.
Avoid string concatenation, creation of objects in memory, etc. in a loop.

How can I speed up my Java text file parser?

I am reading about 600 text files (about 400MB), parsing each file individually, and adding all the terms to a map so I can find the frequency of each word across the 600 files.
My parser function includes the following steps, in order:
find the text between two tags, which is the relevant text to read in each file.
lowercase all the text.
string.split with multiple delimiters.
create an ArrayList with words like "aaa-aa", add them to the array produced by the split above, and remove "aaa" and "aa" from the String[] (I did this because I wanted "-" to be a delimiter, but I also wanted "aaa-aa" to be one word only, not "aaa" and "aa").
take the String[] and map it into a Map = new HashMap ... (word, frequency).
print everything.
It takes about 8 minutes and 48 seconds on a dual-core 2.2GHz machine with 2GB of RAM. I would like advice on how to speed this process up. Should I expect it to be this slow? And, if possible, how can I find out (in NetBeans) which functions take the most time to execute?
unique words found: 398752.
CODE:
File file = new File(dir);
String[] files = file.list();
for (int i = 0; i < files.length; i++) {
BufferedReader br = new BufferedReader(
new InputStreamReader(
new BufferedInputStream(
new FileInputStream(dir + files[i])), encoding));
try {
String line;
while ((line = br.readLine()) != null) {
parsedString = parseString(line); // parse the string
m = stringToMap(parsedString, m);
}
} finally {
br.close();
}
}
EDIT: Check this:
[image: profiler memory screenshot]
I don't know what to conclude.
EDIT: 80% TIME USED WITH THIS FUNCTION
public String [] parseString(String sentence){
// separators; ,:;'"\/<>()[]*~^ºª+&%$ etc..
String[] parts = sentence.toLowerCase().split("[,\\s\\-:\\?\\!\\«\\»\\'\\´\\`\\\"\\.\\\\\\/()<>*º;+&ª%\\[\\]~^]");
Map<String, String> o = new HashMap<String, String>(); // save the hyphened words, aaa-bbb like Map<aaa,bbb>
Pattern pattern = Pattern.compile("(?<![A-Za-zÁÉÍÓÚÀÃÂÊÎÔÛáéíóúàãâêîôû-])[A-Za-zÁÉÍÓÚÀÃÂÊÎÔÛáéíóúàãâêîôû]+-[A-Za-zÁÉÍÓÚÀÃÂÊÎÔÛáéíóúàãâêîôû]+(?![A-Za-z-])");
Matcher matcher = pattern.matcher(sentence);
// Find all matches like this: ("aaa-bb or bbb-cc") and put it to map to later add this words to the original map and discount the single words "aaa-aa" like "aaa" and "aa"
for(int i=0; matcher.find(); i++){
String [] tempo = matcher.group().split("-");
o.put(tempo[0], tempo[1]);
}
//System.out.println("words: " + o);
ArrayList temp = new ArrayList();
temp.addAll(Arrays.asList(parts));
for (Map.Entry<String, String> entry : o.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
temp.add(key+"-"+value);
if(temp.indexOf(key)!=-1){
temp.remove(temp.indexOf(key));
}
if(temp.indexOf(value)!=-1){
temp.remove(temp.indexOf(value));
}
}
String []strArray = new String[temp.size()];
temp.toArray(strArray);
return strArray;
}
600 files, each file about 0.5MB
EDIT 3: The pattern is no longer compiled each time a line is read. [images: updated profiler screenshots]
Be sure to increase your heap size, if you haven't already, using -Xmx. For this app, the impact may be striking.
The parts of your code that are likely to have the largest performance impact are the ones that are executed the most - which are the parts you haven't shown.
Update after memory screenshot
Look at all those Pattern$6 objects in the screenshot. I think you're recompiling the pattern a lot - maybe for every line. That would take a lot of time.
Update 2 - after code added to question.
Yup - two patterns compiled on every line - the explicit one, and also the "-" in the split (much cheaper, of course). I wish they hadn't added split() to String without it taking a compiled pattern as an argument. I see some other things that could be improved, but nothing else like the big compile. Just compile the pattern once, outside this function, maybe as a static class member.
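As a rough sketch of "compile once, as a static class member": the class name `PrecompiledPatterns` and the shortened delimiter set are illustrative assumptions, not the question's actual code. The point is only that `Pattern.compile` runs once at class load, and `Pattern.split` is used in place of `String.split`.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class PrecompiledPatterns {

    // Compiled once when the class loads, then reused for every line.
    // The delimiter set is a shortened, illustrative version of the one
    // in the question.
    private static final Pattern DELIMITERS = Pattern.compile("[,\\s:;?!.()<>]+");

    // Pattern.split reuses the compiled pattern instead of handing the
    // regex string to String.split on every call.
    static String[] splitWords(String line) {
        return DELIMITERS.split(line.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String word : splitWords("Hello, world: hello again!")) {
            System.out.println(word);
        }
    }
}
```

A line that begins with a delimiter yields an empty first element, so callers may want to skip empty strings.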
Try to use a single regex with a group that matches each word within the tags, so that one regex handles your entire input and there is no separate "split" stage.
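One way the "match words instead of splitting" idea could look is sketched below. It covers only the word-matching part, not the tag extraction, and the pattern is an assumption: it treats any run of Unicode letters, optionally joined by single hyphens, as one word, so digits are dropped (unlike the split in the question).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SingleRegexTokenizer {

    // One or more letters, optionally followed by hyphen-joined letter
    // runs, so "aaa-aa" is a single token. \p{L} matches any Unicode letter.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:-\\p{L}+)*");

    // Collects every match in order; no split() and no post-processing
    // of hyphenated words is needed.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The well-known aaa-aa, bbb -ccc"));
    }
}
```

Because hyphenated words come out whole, the extra map and the indexOf/remove pass in parseString would no longer be needed under this approach.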
Otherwise your approach seems reasonable, although I don't understand what you mean by "get the String [] ..." - I thought you were using an ArrayList. In any event, try to minimize the creation of objects, for both construction cost and garbage collection cost.
Is it just the parsing that's taking so long, or is it the file reading as well?
For the file reading, you can probably speed that up by reading the files on multiple threads. But first step is to figure out whether it's the reading or the parsing that's taking all the time so you can address the right issue.
Run the code through the Netbeans profiler and find out where it is taking the most time (right mouse click on the project and select profile, make sure you do time not memory).
Nothing in the code that you have shown us is an obvious source of performance problems. The problem is likely to be something to do with the way that you are parsing the lines or extracting the words and putting them into the map. If you want more advice you need to post the code for those methods, and the code that declares / initializes the map.
My general advice would be to profile the application and see where the bottlenecks are, and use that information to figure out what needs to be optimized.
@Ed Staub's advice is also sound. Running an application with a heap that is too small can result in serious performance problems.
If you aren't already doing it, use BufferedInputStream and BufferedReader to read the files. Double-buffering like that is measurably better than using BufferedInputStream or BufferedReader alone. E.g.:
BufferedReader rdr = new BufferedReader(
new InputStreamReader(
new BufferedInputStream(
new FileInputStream(aFile)
)
/* add an encoding arg here (e.g., ', "UTF-8"') if appropriate */
)
);
If you post relevant parts of your code, there'd be a chance we could comment on how to improve the processing.
EDIT:
Based on your edit, here are a couple of suggestions:
Compile the pattern once and save it as a static variable, rather than compiling every time you call parseString.
Store the values of temp.indexOf(key) and temp.indexOf(value) when you first call them and then use the stored values instead of calling indexOf a second time.
It looks like it's spending most of its time in regular expressions. I would first try writing the code without regular expressions, and then use multiple threads if the process still appears to be CPU-bound.
For the counter, I would look at using TObjectIntHashMap to reduce the overhead of the counter. I would use only one map, rather than creating an array of strings that is then used to build another map; that could be a significant waste of time.
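The "only one map" idea might be sketched like this. Note the swap: it uses the standard `java.util.HashMap` with `merge` rather than the `TObjectIntHashMap` named above, so the boxing overhead of `Integer` values remains. The word pattern is an illustrative assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordFrequency {

    // Illustrative word pattern: letters, optionally hyphen-joined.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:-\\p{L}+)*");

    // Counts each word straight into the shared map; no intermediate
    // String[] or per-line map is built.
    static void countWords(String line, Map<String, Integer> counts) {
        Matcher matcher = WORD.matcher(line.toLowerCase(Locale.ROOT));
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        countWords("a b a", counts);
        countWords("B well-known", counts);
        System.out.println(counts);
    }
}
```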
Precompile the pattern instead of compiling it every time through that method, and get rid of the double buffering: use new BufferedReader(new FileReader(...)).

String concatenation in Java - when to use +, StringBuilder and concat [duplicate]

This question already has answers here:
StringBuilder vs String concatenation in toString() in Java
(20 answers)
Closed 8 years ago.
When should we use + to concatenate strings, when is StringBuilder preferred, and when is it suitable to use concat?
I've heard StringBuilder is preferable for concatenation within loops. Why is it so?
Thanks.
Modern Java compilers convert your + operations into StringBuilder appends. That is, if you write str = str1 + str2 + str3, the compiler will generate code equivalent to the following:
StringBuilder sb = new StringBuilder();
str = sb.append(str1).append(str2).append(str3).toString();
You can decompile code using DJ or Cavaj to confirm this :)
So now it's more a matter of choice than of performance whether to use + or StringBuilder :)
However, in a situation where the compiler does not do this for you (which may happen if you are using a private Java SDK), StringBuilder is surely the way to go, as you avoid creating lots of unnecessary String objects.
I tend to use StringBuilder on code paths where performance is a concern. Repeated string concatenation within a loop is often a good candidate.
The reason to prefer StringBuilder is that both + and concat create a new object every time you call them (provided the right hand side argument is not empty). This can quickly add up to a lot of objects, almost all of which are completely unnecessary.
As others have pointed out, when you use + multiple times within the same statement, the compiler can often optimize this for you. However, in my experience this argument doesn't apply when the concatenations happen in separate statements. It certainly doesn't help with loops.
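To make the loop case concrete, here is a small sketch with two hypothetical helper methods that produce the same result. In `withPlus`, each `+=` is its own statement, so every iteration builds a new String; in `withBuilder`, one builder is reused throughout.

```java
final class LoopConcatenation {

    // Each += builds a brand-new String that copies everything so far.
    static String withPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One builder, appended to in place; a single String at the end.
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(withPlus(parts));
        System.out.println(withBuilder(parts));
    }
}
```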
Having said all this, I think top priority should be writing clear code. There are some great profiling tools available for Java (I use YourKit), which make it very easy to pinpoint performance bottlenecks and optimize just the bits where it matters.
P.S. I have never needed to use concat.
From Java/J2EE Job Interview Companion:
String
String is immutable: you can’t modify a String object but can replace it by creating a new instance. Creating a new instance is rather expensive.
//Inefficient version using immutable String
String output = "Some text";
int count = 100;
for (int i = 0; i < count; i++) {
output += i;
}
return output;
The above code would build 99 new String objects, of which 98 would be thrown away immediately. Creating new objects is not efficient.
StringBuffer/StringBuilder
StringBuffer is mutable: use StringBuffer or StringBuilder when you want to modify the contents. StringBuilder was added in Java 5 and it is identical in all respects to StringBuffer except that it is not synchronised, which makes it slightly faster at the cost of not being thread-safe.
//More efficient version using mutable StringBuffer
StringBuffer output = new StringBuffer(110);
output.append("Some text");
for (int i = 0; i < count; i++) {
output.append(i);
}
return output.toString();
The above code creates only two new objects, the StringBuffer and the final String that is returned. StringBuffer expands as needed, which is costly however, so it would be better to initialise the StringBuffer with the correct size from the start as shown.
If all concatenated elements are constants (example : "these" + "are" + "constants"), then I'd prefer the +, because the compiler will inline the concatenation for you. Otherwise, using StringBuilder is the most effective way.
If you use + with non-constants, the Compiler will internally use StringBuilder as well, but debugging becomes hell, because the code used is no longer identical to your source code.
My recommendation would be as follows:
+: Use when concatenating 2 or 3 Strings simply to keep your code brief and readable.
StringBuilder: Use when building up complex String output or where performance is a concern.
String.format: You didn't mention this in your question but it is my preferred method for creating Strings as it keeps the code the most readable / concise in my opinion and is particularly useful for log statements.
concat: I don't think I've ever had cause to use this.
Use StringBuilder if you do a lot of manipulation. Usually a loop is a pretty good indication of this.
The reason for this is that using normal concatenation produces lots of intermediate String objects that can't easily be "extended" (i.e. each concatenation operation produces a copy, requiring memory and CPU time to make). A StringBuilder on the other hand only needs to copy the data in some cases (inserting something in the middle, or having to resize because the result becomes too big), so it saves on those copy operations.
Using concat() has no real benefit over using + (it might be ever so slightly faster for a single +, but once you do a.concat(b).concat(c) it will actually be slower than a + b + c).
Use + for single statements and StringBuilder for multiple statements/ loops.
The performance gain from the compiler applies to concatenating constants.
The remaining uses are actually slower than using StringBuilder directly.
There is no problem with using "+", e.g. for creating an exception message, because that does not happen often and the application is already somehow screwed at that moment. Avoid using "+" in loops.
For creating meaningful messages or other parametrized strings (XPath expressions, for example), use String.format; it is much more readable.
I suggest using concat for concatenating two strings and StringBuilder otherwise; see my explanation of the concatenation operator (+) vs concat().

Java concatenate to build string or format

I'm writing a MUD (text based game) at the moment using java. One of the major aspects of a MUD is formatting strings and sending it back to the user. How would this best be accomplished?
Say I wanted to send the following string:
You say to Someone "Hello!" - where "Someone", "say" and "Hello!" are all variables. Which would be best performance wise?
"You " + verb + " to " + user + " \"" + text + "\""
or
String.format("You %1$s to %2$s \"%3$s\"", verb, user, text)
or some other option?
I'm not sure which is going to be easier to use in the end (which is important because it'll be everywhere), but I'm thinking about it at this point because concatenating with +'s is getting a bit confusing with some of the bigger lines. I feel that using StringBuilder in this case will simply make it even less readable.
Any suggestion here?
If the strings are built using a single concatenation expression; e.g.
String s = "You " + verb + " to " + user + " \"" + text + "\"";
then this is more or less equivalent to the more long winded:
StringBuilder sb = new StringBuilder();
sb.append("You ");
sb.append(verb);
sb.append(" to ");
sb.append(user);
sb.append(" \"");
sb.append(text );
sb.append('"');
String s = sb.toString();
In fact, a classic Java compiler will compile the former into the latter ... almost. In Java 9, they implemented JEP 280 which replaces the sequence of constructor and method calls in the bytecodes with a single invokedynamic bytecode. The runtime system then optimizes this [1].
The efficiency issues arise when you start creating intermediate strings, or building strings using += and so on. At that point, StringBuilder becomes more efficient because you reduce the number of intermediate strings that get created and then thrown away.
Now when you use String.format(), it should be using a StringBuilder under the hood. However, format also has to parse the format String each time you make the call, and that is an overhead you don't have if you do the string building optimally.
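For comparison, the two forms from the question can be put side by side. This is only a sketch with hypothetical method names; it shows that both yield the same text and says nothing about their relative speed, which would need measuring.

```java
final class SayMessage {

    // Single concatenation expression, as in the question.
    static String withConcat(String verb, String user, String text) {
        return "You " + verb + " to " + user + " \"" + text + "\"";
    }

    // Same message via a format string; the format is parsed on each call.
    static String withFormat(String verb, String user, String text) {
        return String.format("You %1$s to %2$s \"%3$s\"", verb, user, text);
    }

    public static void main(String[] args) {
        System.out.println(withConcat("say", "Someone", "Hello!"));
        System.out.println(withFormat("say", "Someone", "Hello!"));
    }
}
```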
Having said this, my advice would be to write the code in the way that is most readable. Only worry about the most efficient way to build strings if profiling tells you that this is a real performance concern. (Right now, you are spending time thinking about ways to address a performance issue that may turn out to be insignificant or irrelevant.)
Another answer mentions that using a format string may simplify support for multiple languages. This is true, though there are limits as to what you can do with respect to such things as plurals, genders, and so on.
1 - As a consequence, hand optimization as per the example above might actually have negative consequences, for Java 9 or later. But this is a risk you take whenever you micro-optimize.
I think that concatenation with + is more readable than using String.format.
String.format is good when you need to format numbers and dates.
When concatenating with +, the compiler can transform the code into a more efficient form. With String.format, I don't know.
I prefer concatenation with +; I think it is easier to understand.
The key to keeping it simple is to never look at it. Here is what I mean:
Joiner join = Joiner.on(" ");
public void constructMessage(StringBuilder sb, Iterable<String> words) {
join.appendTo(sb, words);
}
I'm using the Guava Joiner class to make readability a non-issue. What could be clearer than "join"? All the nasty bits regarding concatenation are nicely hidden away. By using Iterable, I can use this method with all sorts of data structures, Lists being the most obvious.
Here is an example of a call using a Guava ImmutableList (which is more efficient than a regular list, since any methods that modify the list just throw exceptions, and correctly represents the fact that constructMessage() cannot change the list of words, just consume it):
StringBuilder outputMessage = new StringBuilder();
constructMessage(outputMessage,
new ImmutableList.Builder<String>()
.add("You", verb, "to", user, "\"", text, "\"")
.build());
I will be honest and suggest that you take the first one if you want less typing, or the latter one if you are looking for a more C-style way of doing it.
I sat here for a minute or two pondering the idea of what could be a problem, but I think it comes down to how much you want to type.
Anyone else have an idea?
Assuming you are going to reuse base strings often, store your templates like this:
String mystring = "You $1 to $2 \"$3\"";
Then just take a copy and replace each $X with the value you want.
This would work really well for a resource file too.
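A possible sketch of the template idea is shown here. The `fill` helper is hypothetical; it uses the literal `String.replace` (not the regex-based `replaceAll`) and walks the placeholders from the highest index down so that "$1" does not eat the start of "$10".

```java
final class MessageTemplate {

    // Replaces $1, $2, ... with the arguments in order. Iterating from
    // the highest index down keeps "$1" from matching inside "$10".
    static String fill(String template, String... args) {
        String result = template;
        for (int i = args.length; i >= 1; i--) {
            result = result.replace("$" + i, args[i - 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "You $1 to $2 \"$3\"";
        System.out.println(fill(template, "say", "Someone", "Hello!"));
    }
}
```

One caveat of this naive approach: if a substituted value itself contains text like "$1", a later replacement will touch it, so user-supplied text may need escaping first.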
I think String.format looks cleaner.
However you can use StringBuilder and use append function to create the string you want
The best, performance-wise, would probably be to use a StringBuffer.

Can I optimize this code?

I am trying to retrieve the data from the table and convert each row into CSV format like
s12, james, 24, 1232, Salaried
The code below does the job, but takes a long time on tables with more than 100,000 rows.
Please advise on how to optimize it:
while(rset1.next()!=false) {
sr=sr+"\n";
for(int j=1;j<=rsMetaData.getColumnCount();j++)
{
if(j< 5)
{
sr=sr+rset1.getString(j).toString()+",";
}
else
sr=sr+rset1.getString(j).toString();
}
}
/SR
Two approaches, in order of preference:
Stream the output
PrintWriter csvOut = ... // Construct a writer from an OutputStream, say to a file
while (rs.next())
csvOut.println(...) // Write a single line
(note that you should ensure that your Writer / OutputStream is buffered, although many are by default)
Use a StringBuilder
StringBuilder sb = new StringBuilder();
while (rs.next())
sb.append(...) // Write a single line
The idea here is that appending Strings in a loop is a bad idea. Imagine that you have a string. In Java, Strings are immutable. That means that to append to a string you have to copy the entire string and then write more to the end. Since you are appending things a little bit at a time, you will have many many copies of the string which aren't really useful.
If you're writing to a File, it's most efficient just to write directly out with a stream or a Writer. Otherwise you can use the StringBuilder which is tuned to be much more efficient for appending many small strings together.
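The streaming approach could be sketched as follows. To keep the example runnable without a database, rows are passed in as `String[]` values (an assumption; real code would read them from the ResultSet), and a `StringWriter` stands in for a file writer. No CSV quoting or null handling is attempted.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

final class CsvStreamer {

    // Writes one CSV line per row straight to the writer, so only the
    // current row is held in memory. Values are written as-is.
    static void writeRows(Iterable<String[]> rows, Writer out) throws IOException {
        BufferedWriter buffered = new BufferedWriter(out);
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    buffered.write(',');
                }
                buffered.write(row[i]);
            }
            buffered.write('\n');
        }
        buffered.flush(); // flush here; closing is left to the caller
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(
                new String[] {"s12", "james", "24", "1232", "Salaried"},
                new String[] {"s13", "anna", "31", "2210", "Contract"});
        StringWriter sink = new StringWriter(); // stand-in for a FileWriter
        writeRows(rows, sink);
        System.out.print(sink);
    }
}
```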
I'm no Java expert, but I think it's always bad practice to use something like getColumnCount() in a conditional check. This is because after each loop, it runs that function to see what the column count is, instead of just referencing a static number. Instead, set a variable equal to that number and use the variable to compare against j.
You might want to use a StringBuilder to build the string, that's much more efficient when you're doing a lot of concatenation. Also if you have that much data, you might want to consider writing it directly to wherever you're going to put it instead of building it in memory at first, if that's a file or a socket, for example.
StringBuilder sr = new StringBuilder();
int columnCount =rsMetaData.getColumnCount();
while (rset1.next()) {
sr.append('\n');
for (int j = 1; j <= columnCount; j++) {
sr.append(rset1.getString(j));
if (j < 5) {
sr.append(',');
}
}
}
As a completely different, but undoubtedly the most efficient, alternative, use the export facilities the DB provides. It's unclear which DB you're using, but as per your question history you seem to be doing a lot with Oracle. In this case, you can export a table into a CSV file using UTL_FILE.
See also:
Generating CSV files using Oracle
Stored procedure example on Ask Tom
As the other answers say, stop appending to a String. In Java, String objects are immutable, so each append must do a full copy of the string, turning this into an O(n^2) operation.
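The quadratic cost can be made concrete with a little arithmetic. The sketch below does not time anything; it simply counts how many characters get copied under the simplifying assumption that every append adds a chunk of the same size. After i appends the string holds i × chunk characters, so the total is chunk × n(n+1)/2.

```java
final class CopyCost {

    // Characters copied when a String is grown by repeated concatenation:
    // every append creates a new String holding everything so far.
    static long copiedByConcat(int appends, int chunk) {
        long total = 0;
        long length = 0;
        for (int i = 0; i < appends; i++) {
            length += chunk;
            total += length; // the new String contains all 'length' chars
        }
        return total;
    }

    // Characters copied when each chunk is written exactly once
    // (ignoring a builder's occasional internal array growth).
    static long copiedOnce(int appends, int chunk) {
        return (long) appends * chunk;
    }

    public static void main(String[] args) {
        System.out.println(copiedByConcat(1000, 10));
        System.out.println(copiedOnce(1000, 10));
    }
}
```

For 1,000 appends of 10 characters that is 5,005,000 characters copied versus 10,000. With 100,000 rows of roughly 30 characters (an assumed row size) the concatenation total is on the order of 1.5 × 10^11.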
The other big slowdown is fetch size. By default, the driver is likely to fetch one row at a time. Even if this takes 1ms, that limits you to a thousand rows per second. A remote database, even on the same network, will be much worse. Try calling setFetchSize(1000) on the Statement. Beware that setting the fetch size too big can cause out of memory errors with some database drivers.
I don't believe minor code changes are going to make a substantive difference. I'd surely use a StringBuffer however.
He's going to be reading a million rows over a wire, assuming his database is on a separate machine. First, if performance is unacceptable, I'd run that code on the database server and clip the network out of the equation. If it's the sort of code that gets run once a week as a batch job that may be ok.
Now, what are you going to do with the StringBuffer or String once it is fully loaded from the database? We're looking at a String that could be 50 Mbyte long.
This should be a tiny bit faster since it removes the unneeded (j < 5) check.
StringBuilder sr = new StringBuilder();
int columnCount =rsMetaData.getColumnCount();
while (rset1.next()) {
for (int j = 1; j < columnCount; j++) {
sr.append(rset1.getString(j)).append(",");
}
// I suspect the 'if (j<5)' really meant, "if we aren't on the last
// column then tack on a comma." So we always tack it on above and
// write the last column and a newline now.
sr.append(rset1.getString(columnCount)).append("\n");
}
Another answer is to change the select so it returns a comma-sep string. Then we read the single-column result and append it to the StringBuffer.
I forget the syntax now, but something like:
select column1 || ',' || column2 || ',' ... from table;
Now we don't need the inner loop or the comma concatenation business.
StringBuilder sr = new StringBuilder();
while (rset1.next()) {
sr.append(rset1.getString(1)).append("\n");
}
