I have:
String str = "Hello, how, are, you";
I want to create a helper method that removes the commas from any string. Which of the following is more accurate?
private static String removeComma(String str){
if(str.contains(",")){
str = str.replaceAll(",","");
}
return str;
}
OR
private static String removeComma(String str){
str = str.replaceAll(",","");
return str;
}
Seems like I don't need the IF statement but there might be a case where I do.
If there is a better way let me know.
Both are functionally equivalent but the former is more verbose and will probably be slower because it runs an extra operation.
Also note that you don't need replaceAll (which accepts a regular expression): replace will do.
So I would go for:
private static String removeComma(String str){
return str.replace(",", "");
}
The IF statement is unnecessary, unless you're handling "large" strings (we're talking megabytes or more).
If you're using the IF statement, your code will first search for the first occurance of a comma, and then execute the replacement. This could be costly if the comma is near the end of the string and your string is large, since it will have to be traversed twice.
Without the IF statement, commas will be replaced if they exist. If the answer is negative, your string will be untouched.
Bottom rule: use the version without the IF statement.
Both are correct, but the second one is cleaner since the IF statement of the first alternative is not needed.
It's a matter of what is the probability to have strings with comma in your universe of strings.
If you have a high probability, call the method replaceAll without checking first.
BUT If you are not using extremely huge strings, I guess you will see no difference in perfomance at all.
Just another solution with time complexity O(n), space complexity O(n):
public static String removeComma(String str){
int length = str.length();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < length; i++) {
char c = str.charAt(i);
if (c != ',') {
sb.append(c);
}
}
return sb.toString();
}
Related
Let's say there has a string like " world ". This String only has the blank at front and end. Is the trim() faster than replace()?
I used the replace once and my mentor said don't use it since the trim() probably faster.
If not, what's the advantage of trim() than replace()?
If we look at the source code for the methods:
replace():
public String replace(CharSequence target, CharSequence replacement) {
String tgtStr = target.toString();
String replStr = replacement.toString();
int j = indexOf(tgtStr);
if (j < 0) {
return this;
}
int tgtLen = tgtStr.length();
int tgtLen1 = Math.max(tgtLen, 1);
int thisLen = length();
int newLenHint = thisLen - tgtLen + replStr.length();
if (newLenHint < 0) {
throw new OutOfMemoryError();
}
StringBuilder sb = new StringBuilder(newLenHint);
int i = 0;
do {
sb.append(this, i, j).append(replStr);
i = j + tgtLen;
} while (j < thisLen && (j = indexOf(tgtStr, j + tgtLen1)) > 0);
return sb.append(this, i, thisLen).toString()
}
Vs trim():
public String trim() {
int len = value.length;
int st = 0;
char[] val = value; /* avoid getfield opcode */
while ((st < len) && (val[st] <= ' ')) {
st++;
}
while ((st < len) && (val[len - 1] <= ' ')) {
len--;
}
return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}
As you can see replace() calls multiple other methods and iterates throughout the entire String, while trim() simply iterates over the beginning and ending of the String until the character isn't a white space. So in the single respect of trying to only remove white space before and after a word, trim() is more efficient.
We can run some benchmarks on this:
public static void main(String[] args) {
long testStartTime = System.nanoTime();;
trimTest();
long trimTestTime = System.nanoTime() - testStartTime;
testStartTime = System.nanoTime();
replaceTest();
long replaceTime = System.nanoTime() - testStartTime;
System.out.println("Time for trim(): " + trimTestTime);
System.out.println("Time for replace(): " + replaceTime);
}
public static void trimTest() {
for(int i = 0; i < 1000000; i ++) {
new String(" string ").trim();
}
}
public static void replaceTest() {
for(int i = 0; i < 1000000; i ++) {
new String(" string ").replace(" ", "");
}
}
Output:
Time for trim(): 53303903
Time for replace(): 485536597
//432,232,694 difference
Assuming that the people writing the Java library code are doing a good job1, you can assume that a special purpose method (like trim()) will be as fast, and probably faster than a general purpose method (like replace(...)) doing the same thing.
Two reasons:
If the special purpose method is slower, its implementation can be rewritten as equivalent calls to the general purpose one, making the performance equivalent in most cases. A competent programmer will do this because it reduces maintenance costs.
In the special purpose method, it is likely that there will be optimizations that can be made that don't apply in the general-purpose case.
In this case we know that trim() only needs to look at the start and end of the string ... whereas replace(...) needs to look at all of the characters in the string. (We can infer this from the description of what the respective methods do.)
If we assume "competence" then we can infer that the developers will have done the analysis and not implemented trim() sub-optimally2; i.e. they won't code trim() to examine all characters.
There is another reason to use the special purpose method over the general purpose. It makes your code simpler, easier to read, and easier to inspect for correctness. This may well be more important than performance.
This clearly applies in the case of trim() versus replace(...).
1 - We can in this case. There are lots of eyes looking at this code, and lots of people who will complain loudly about egregious performance issues.
2 - Unfortunately, it is not always as straightforward as this. A library method needs to be optimized for "typical" behavior, but it also needs to avoid pathological performance in edge-cases. It is not always possible to achieve both things.
trim() is definitely faster to type, yes. It doesn't take any parameters.
It is also much faster to understand what you where trying to do. You were trying to trim the string, rather than replacing all the spaces it contains with the empty string, knowing from other context that there is only space at the beginning and the end of the string.
Indeed much faster no matter how you look at it. Don't complicate the life of the persons who're trying to read your code. Most of the time, it will be you months later, or at least someone you don't hate.
Trim will prune the outter characters until they are non white space. I believe they trim space, tab, and new lines.
Replace will scan the entire string (so, it could be a sentense) and would replace inner " " with "", essentially compressing them together.
They have different use cases though, obviously 1 is to clean up user input where the other is to update a string where matches are found with something else.
That being said, run times: Replace will run in N time, as it will look for all matching characters. Trim will run in O(N), but most likely a just a few characters off of each end.
The idea behind trim i think came around from people would would type and input things but accidentally press space before submitting their forms, essentially trying to save the field "Foo " instead of "Foo"
s.trim() shortens a String s. This means no characters has to be moved from an index to another. It starts at the first character (s.toCharArray()[0]) of the String and shortens the String character by character until the first non-whitespace character occurs. It works the same way to shorten the String at the end. So it compresses the String. If a String has no leading and trailing whitespace trim will be ready after checking the first and the last character.
In case of " world ".trim() two steps are needed: one to remove the first leading whitespace as it is on the first index and the the second to remove the last whitespace as it is on the last index.
" world ".replace(" ", "") will need at least n = " world ".length() steps. It has to check every character if it has to be replaced. But if we take into account that the implementation of String.replace(...) needs to compile a Pattern, build a Matcher and then to replace all the matched regions it's seems far complex comparing to shorten a String.
We also have to consider that " world ".replace(" ", "") does not replace whitespaces but only the String " ". Since String replace(CharSequence target, CharSequence replacement) compiles the target using Pattern.LITERAL we cannot use the character class \s. To be more accurate we would have to compare " world ".trim() to " world ".replaceAll("\\s", ""). It is still not the same because a whitespace in String trim() is defined as c <= ' ' for each c in s.toCharArray().
Summarizing: String.trim() should be faster - especially for long strings
The description how the methods work is based on the implementation of String in Java 8. But implementations can change.
But the question should be: What do you intent to do with the string? Do you want to trim it or to replace some characters? According to it use the corresponding method.
I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
myNameChars[i] = 'X';
}
myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.
I say do whatever is most readable/maintainable until you you know that String "modification" is slowing you down. To me, this is the most readable:
Sting s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If sychronization is needed, then you should use StringBuffer.
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.
What are the pros/cons to each different way. I take it that StringBuilder is going to be more efficient since the convering to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
On a related note, if you are doing simple string concatenation (e.g, s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls, so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
return "{field1 =" + this.field1 +
", field2 =" + this.field2 +
...
", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing rule": if the operation has the potential to explode in complexity based on input, err on the side of caution.
Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
RUNTIME (NS)
array 88
builder 126
builderTillEnd 76
concat 3435
Benchmarked methods:
public static String array(String input)
{
char[] result = input.toCharArray(); // COPYING
for (int i = 0; i < input.length(); i++)
{
result[i] = 'X';
}
return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result;
}
public static String concat(String input)
{
String result = "";
for (int i = 0; i < input.length(); i++)
{
result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
// result = new StringBuilder(result).append('X').toString();
}
return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
if ((index < 0) || (index >= count))
throw new StringIndexOutOfBoundsException(index);
value[index] = ch;
}
AbstractStringBuilder internally uses the simplest char array: char value[]. So, result[i] = 'X' is very similar to result.setCharAt(i, 'X'), however the second will call a polymorphic method (which probably gets inlined by JVM) and check bounds in if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want String in the end and this is the bottleneck of your program, then you might consider using char array. In benchmark char array was ~25% faster than StringBuilder. Be sure to properly measure execution time of your program before and after optimization, because there is no guarantee about this 25%.
Never concatenate Strings in the loop with + or +=, unless you really know what you do. Usally it's better to use explicit StringBuilder and append().
I'd prefer to use StringBuilder class where original string is modified.
For String manipulation, I like StringUtil class. You'll need to get Apache commons dependency to use it
which of the following is an efficient way to reverse words in a string ?
public String Reverse(StringTokenizer st){
String[] words = new String[st.countTokens()];
int i = 0;
while(st.hasMoreTokens()){
words[i] = st.nextToken();i++}
for(int j = words.length-1;j--)
output = words[j]+" ";}
OR
public String Reverse(StringTokenizer st, String output){
if(!st.hasMoreTokens()) return output;
output = st.nextToken()+" "+output;
return Reverse(st, output);}
public String ReverseMain(StringTokenizer st){
return Reverse(st, "");}
while the first way seems more readable and straight forward, there are two loops in it. In the 2nd method, I've tried doing it in tail-recursive way. But I am not sure whether java does optimize tail-recursive code.
you could do this in just one loop
public String Reverse(StringTokenizer st){
int length = st.countTokens();
String[] words = new String[length];
int i = length - 1;
while(i >= 0){
words[i] = st.nextToken();i--}
}
But I am not sure whether java does optimize tail-recursive code.
It doesn't. Or at least the Sun/Oracle Java implementations don't, up to and including Java 7.
References:
"Tail calls in the VM" by John Rose # Oracle.
Bug 4726340 - RFE: Tail Call Optimization
I don't know whether this makes one solution faster than the other. (Test it yourself ... taking care to avoid the standard micro-benchmarking traps.)
However, the fact that Java doesn't implement tail-call optimization means that the 2nd solution is liable to run out of stack space if you give it a string with a large (enough) number of words.
Finally, if you are looking for a more space efficient way to implement this, there is clever way that uses just a StringBuilder.
Create a StringBuilder from your input String
Reverse the characters in the StringBuilder using reverse().
Step through the StringBuilder, identifying the start and end offset of each word. For each start/end offset pair, reverse the characters between the offsets. (You have to do this using a loop.)
Turn the StringBuilder back into a String.
You can test results by timing both of them on a large amount of results
eg. You reverse 100000000 strings and see how many seconds it takes. You could also compare start and end system timestamps to get the exact difference between the two functions.
StringTokenizer is not deprecated but if you read the current JavaDoc...
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
String[] strArray = str.split(" ");
StringBuilder sb = new StringBuilder();
for (int i = strArray.length() - 1; i >= 0; i--)
sb.append(strArray[i]).append(" ");
String reversedWords = sb.substring(0, sb.length -1) // strip trailing space
I need to trim a String in java so that:
The quick brown fox jumps over the laz dog.
becomes
The quick brown...
In the example above, I'm trimming to 12 characters. If I just use substring I would get:
The quick br...
I already have a method for doing this using substring, but I wanted to know what is the fastest (most efficient) way to do this because a page may have many trim operations.
The only way I can think off is to split the string on spaces and put it back together until its length passes the given length. Is there an other way? Perhaps a more efficient way in which I can use the same method to do a "soft" trim where I preserve the last word (as shown in the example above) and a hard trim which is pretty much a substring.
Thanks,
Below is a method I use to trim long strings in my webapps.
The "soft" boolean as you put it, if set to true will preserve the last word.
This is the most concise way of doing it that I could come up with that uses a StringBuffer which is a lot more efficient than recreating a string which is immutable.
public static String trimString(String string, int length, boolean soft) {
if(string == null || string.trim().isEmpty()){
return string;
}
StringBuffer sb = new StringBuffer(string);
int actualLength = length - 3;
if(sb.length() > actualLength){
// -3 because we add 3 dots at the end. Returned string length has to be length including the dots.
if(!soft)
return escapeHtml(sb.insert(actualLength, "...").substring(0, actualLength+3));
else {
int endIndex = sb.indexOf(" ",actualLength);
return escapeHtml(sb.insert(endIndex,"...").substring(0, endIndex+3));
}
}
return string;
}
Update
I've changed the code so that the ... is appended in the StringBuffer, this is to prevent needless creations of String implicitly which is slow and wasteful.
Note: escapeHtml is a static import from apache commons:
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
You can remove it and the code should work the same.
Here is a simple, regex-based, 1-line solution:
str.replaceAll("(?<=.{12})\\b.*", "..."); // How easy was that!? :)
Explanation:
(?<=.{12}) is a negative look behind, which asserts that there are at least 12 characters to the left of the match, but it is a non-capturing (ie zero-width) match
\b.* matches the first word boundary (after at least 12 characters - above) to the end
This is replaced with "..."
Here's a test:
public static void main(String[] args) {
String input = "The quick brown fox jumps over the lazy dog.";
String trimmed = input.replaceAll("(?<=.{12})\\b.*", "...");
System.out.println(trimmed);
}
Output:
The quick brown...
If performance is an issue, pre-compile the regex for an approximately 5x speed up (YMMV) by compiling it once:
static Pattern pattern = Pattern.compile("(?<=.{12})\\b.*");
and reusing it:
String trimmed = pattern.matcher(input).replaceAll("...");
Please try following code:
private String trim(String src, int size) {
if (src.length() <= size) return src;
int pos = src.lastIndexOf(" ", size - 3);
if (pos < 0) return src.substring(0, size);
return src.substring(0, pos) + "...";
}
Try searching for the last occurence of a space that is in a position less or more than 11 and trim the string there, by adding "...".
Your requirements aren't clear. If you have trouble articulating them in a natural language, it's no surprise that they'll be difficult to translate into a computer language like Java.
"preserve the last word" implies that the algorithm will know what a "word" is, so you'll have to tell it that first. The split is a way to do it. A scanner/parser with a grammar is another.
I'd worry about making it work before I concerned myself with efficiency. Make it work, measure it, then see what you can do about performance. Everything else is speculation without data.
How about:
mystring = mystring.replaceAll("^(.{12}.*?)\b.*$", "$1...");
I use this hack : suppose that the trimmed string must have 120 of length :
String textToDisplay = textToTrim.substring(0,(textToTrim.length() > 120) ? 120 : textToTrim.length());
if (textToDisplay.lastIndexOf(' ') != textToDisplay.length() &&textToDisplay.length()!=textToTrim().length()) {
textToDisplay = textToDisplay + textToTrim.substring(textToDisplay.length(),textToTrim.indexOf(" ", textToDisplay.length()-1))+ " ...";
}
Basically given an int, I need to generate a String with the same length containing only the specified character. Related question here, but it relates to C# and it does matter what's in the String.
This question, and my answer to it are why I am asking this one. I'm not sure what's the best way to go about it performance wise.
Example
Method signature:
String getPattern(int length, char character);
Usage:
//returns "zzzzzz"
getPattern(6, 'z');
What I've tried
String getPattern(int length, char character) {
String result = "";
for (int i = 0; i < length; i++) {
result += character;
}
return result;
}
Is this the best that I can do performance-wise?
You should use StringBuilder instead of concatenating chars this way. Use StringBuilder.append().
StringBuilder will give you better performance. The problem with concatenation the way you are doing is each time a new String (string is immutable) is created then the old string is copied, the new string is appended, and the old String is thrown away. It's a lot of extra work that over a period of type (like in a big for loop) will cause performance degradation.
StringUtils from commons-lang or Strings from guava are your friends. As already stated avoid String concatenations.
StringUtils.repeat("a", 3) // => "aaa"
Strings.repeat("hey", 3) // => "heyheyhey"
Use primitive char arrays & some standard util classes like Arrays
public class Test {
static String getPattern(int length, char character) {
char[] cArray = new char[length];
Arrays.fill(cArray, character);
// return Arrays.toString(cArray);
return new String(cArray);
}
static String buildPattern(int length, char character) {
StringBuilder sb= new StringBuilder(length);
for (int i = 0; i < length; i++) {
sb.append(character);
}
return sb.toString();
}
public static void main(String args[]){
long time = System.currentTimeMillis();
getPattern(10000000,'c');
time = System.currentTimeMillis() - time;
System.out.println(time); //prints 93
time = System.currentTimeMillis();
buildPattern(10000000,'c');
time = System.currentTimeMillis() - time;
System.out.println(time); //prints 188
}
}
EDIT Arrays.toString() gave lower performance since it eventually used a StringBuilder, but the new String did the magic.
Yikes, no.
A String is immutable in java; you can't change it. When you say:
result += character;
You're creating a new String every time.
You want to use a StringBuilder and append to it, then return a String with its toString() method.
I think it would be more efficient to do it like following,
String getPattern(int length, char character)
{
char[] list = new char[length];
for(int i =0;i<length;i++)
{
list[i] = character;
}
return new string(list);
}
Concatenating a String is never the most efficient, since String is immutable, for better performance you should use StringBuilder, and append()
String getPattern(int length, char character) {
StringBuilder sb= new StringBuilder(length)
for (int i = 0; i < length; i++) {
sb.append(character);
}
return sb.toString();
}
Performance-wise, I think you'd have better results creating a small String and concatenating (using StringBuilder of course) until you reach the request size: concatenating/appending "zzz" to "zzz" performs probably betters than concatenating 'z' three times (well, maybe not for such small numbers, but when you reach 100 or so chars, doing ten concatenations of 'z' followed by ten concatenations of "zzzzzzzzzz" is probably better than 100 concatenatinos of 'z').
Also, because you ask about GWT, results will vary a lot between DevMode (pure Java) and "production mode" (running in JS in the browser), and is likely to vary depending on the browser.
The only way to really know is to benchmark, everything else is pure speculation.
And possibly use deferred binding to use the most performing variant in each browser (that's exactly how StringBuilder is emulated in GWT).