say I use the following methods to search for a palindrome. I know the first one is O(n) because it goes through entire string. Does the .reverse() in the StringBuffer also do O(n)? Im not worried about finding a better way to problem Im trying to understand if the reverse method physically reverses the string or is it much more efficient than that?? Thanks
public static boolean isAPalindrome(String s1){
String tmp = "";
int length = s1.length();
for(int i = 0; i < s1.length(); i++){
tmp += s1.charAt(s1.length()-i-1);
}
if (s1.equals(tmp)) return true;
return false;
}
public static boolean isAPalindrome(String s1){
StringBuffer a = new StringBuffer(s1);
return s1.equals(a.reverse().toString());
}
You could implement a reverse that is effectively o(1), but not with the typical string classes: You could do it by implementing a string class with a direction member, that can take the values "forward" or "backward". But that would only make sense if the reverting dominates your performance over all other computations.
Reversing a string can't be faster than O(n) (under normal representation of a String).
You have to "operate" on every character of the string, so it can't theoretically be faster than that.
StringBuffer extends AbstractStringBuilder which implements reverse. You may look at the souce code yourself here.
Note: this is not saying that StringBuilder/Buffers reverse will run as fast as an O(n) reverse method that you will write. It will probably run much faster. But asymptotically it will still be O(n).
Related
I have:
String str = "Hello, how, are, you";
I want to create a helper method that removes the commas from any string. Which of the following is more accurate?
private static String removeComma(String str){
if(str.contains(",")){
str = str.replaceAll(",","");
}
return str;
}
OR
private static String removeComma(String str){
str = str.replaceAll(",","");
return str;
}
Seems like I don't need the IF statement but there might be a case where I do.
If there is a better way let me know.
Both are functionally equivalent but the former is more verbose and will probably be slower because it runs an extra operation.
Also note that you don't need replaceAll (which accepts a regular expression): replace will do.
So I would go for:
private static String removeComma(String str){
return str.replace(",", "");
}
The IF statement is unnecessary, unless you're handling "large" strings (we're talking megabytes or more).
If you're using the IF statement, your code will first search for the first occurance of a comma, and then execute the replacement. This could be costly if the comma is near the end of the string and your string is large, since it will have to be traversed twice.
Without the IF statement, commas will be replaced if they exist. If the answer is negative, your string will be untouched.
Bottom rule: use the version without the IF statement.
Both are correct, but the second one is cleaner since the IF statement of the first alternative is not needed.
It's a matter of what is the probability to have strings with comma in your universe of strings.
If you have a high probability, call the method replaceAll without checking first.
BUT If you are not using extremely huge strings, I guess you will see no difference in perfomance at all.
Just another solution with time complexity O(n), space complexity O(n):
public static String removeComma(String str){
int length = str.length();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < length; i++) {
char c = str.charAt(i);
if (c != ',') {
sb.append(c);
}
}
return sb.toString();
}
I have a code that check a string for space,comma and etc. Well since
I will deal a scenario where my app will going to check, lets say thousand of string with a max length of 15 and a minimum length of 14. I am worried if it will affect the performance since it is in android. Check the code i used..
private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};
public static boolean checkMessage(String message){
if (message == null)
return false;
char[] _message = message.toCharArray();
for (char c : _message) {
for (int i = 0;i > undefinedChars.length;i++)
if (c == undefinedChars[i])
return true;
}
return false;
}
Is this correct? or there is a way to improve it?
There is a change that you could make that might make a little difference:
Change
char[] _message = message.toCharArray();
for (char c : _message) {
to
for (int i = 0; i < message.length(); i++) {
char c = message.charAt(i);
However, I doubt that it will be significant.
Replacing the inner loop with a switch is more likely to be fruitful, though it depends on what the JIT compiler does with the code. (And a switch will only works if the set of undefined characters can be hard-wired into the switch statement as compile-time constants.)
I am worried if it will affect the performance since it is in android.
Don't "worry". Approach the problem scientifically.
Implement the code and then benchmark it.
If the measured performance is a concern, then:
profile the code
look at hotspots, and identify possible improvements
implement and test possible improvement
rerun the benchmark to see if the improvement actually made any difference
repeat ... until performance is good enough or you run out of options.
The other thing to note is that the same code could well perform differently across different Android platforms. The quality of JIT compilers has (apparently) improved markedly in more recent releases.
I would argue that it is a bad idea to "bend" your code just to get it to run well on old phones. The chances are that the user will upgrade their hardware soon anyway ... and it is conceivable that your optimization for the old platform actually makes your code slower on a new platform ... 'cos your hand-optimizations have made the code too tricky for the JIT compiler's optimizer to deal with.
This is also an argument for NOT trying to make your code go "as fast as possible" ...
First of all, I see a bug there.
for (int i = 0;i > undefinedChars.length;i++)
that I think you meant
for (int i = 0;i < undefinedChars.length;i++)
instead?
Anyway it seems that your algorithm runs in O(m*n) where m is the length of message and n is the length of undefined chars(in this case fixed size, 15). Therefore it should be efficient in run-time analysis perspective.
I would profile the scenario first then decide how to improve it, that you could've sorted the message upfront somewhere then you can only check for either 1st char or the last char of the string, but as I said, only if that's been sorted elsewhere.
Or maybe think of parallelizing the routine. It should be straightforward.
Without using memory, you're about as fast as you can get. You can trade memory for performance. For example, you can put the characters you want to check into a HashMap. Then you can loop over the string you're checking, and check if each index is in that map or not. If the number of characters you want to check for is small, this will be less efficient. If the number is big, it will be more efficient (Technically this algorithm is O(n) instead of O(n*m), but if m is small then the constants you're usually taught to ignore will matter).
Another way is to use an array of booleans, with each possible character in the string mapping to an index in that array. Set only the characters you care about to true (and save that array). Then you can avoid the hash calculation above, but at the cost of a lot of memory.
Really, your original algorithm is likely good enough. But these (especially the hash map) are things you can consider if needed.
Try using a regular expression. I find it very clean and it should not hurt your performance.
public static boolean checkMessage(String message)
{
if (message == null)
return false;
String regex = " |\\.|/|<|>|\\*|!";
Matcher matcher = Pattern.compile(regex).matcher(message);
if (matcher.find())
return true;
else
return false;
}
For symmetry and possibly some compiler optimization, why not use a for-each style loop for both loops. As an additional benefit, you wouldn't risk a typo like the one pointed out by glaze. Your code would then become:
private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};
public static boolean checkMessage(String message){
if (message == null)
return false;
char[] _message = message.toCharArray();
for (char c : _message) {
for (for u : undefinedChars)
if (c == u)
return true;
}
return false;
}
An additional optimization would be to order the characters in undefinedChars in the order most likely to occur. That way you'll bail-out as quick as possible.
Use a Set to hold your undefinedChars
Set<Character> undefinedChars = new HashSet<Character>(Arrays.asList(new Character(' ') ,new Character('/'),new Character('.')));
public boolean hasUndefinedChar(String str) {
for (int i = 0; i < str.length(); i++) {
char iChar = str.charAt(i);
Character charWrapper = new Character(iChar);
if (undefinedChars.contains(charWrapper)) {
return true;
}
}
return false;
}
This method is O(n) time efficient and does not sufficiently affect space complexity. The contains calls to the Set are O(1) operations and you make n of these contains calls in the worst case.
I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
myNameChars[i] = 'X';
}
myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.
I say do whatever is most readable/maintainable until you you know that String "modification" is slowing you down. To me, this is the most readable:
Sting s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If sychronization is needed, then you should use StringBuffer.
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.
What are the pros/cons to each different way. I take it that StringBuilder is going to be more efficient since the convering to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
On a related note, if you are doing simple string concatenation (e.g, s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls, so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
return "{field1 =" + this.field1 +
", field2 =" + this.field2 +
...
", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing rule": if the operation has the potential to explode in complexity based on input, err on the side of caution.
Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
RUNTIME (NS)
array 88
builder 126
builderTillEnd 76
concat 3435
Benchmarked methods:
public static String array(String input)
{
char[] result = input.toCharArray(); // COPYING
for (int i = 0; i < input.length(); i++)
{
result[i] = 'X';
}
return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result;
}
public static String concat(String input)
{
String result = "";
for (int i = 0; i < input.length(); i++)
{
result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
// result = new StringBuilder(result).append('X').toString();
}
return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
if ((index < 0) || (index >= count))
throw new StringIndexOutOfBoundsException(index);
value[index] = ch;
}
AbstractStringBuilder internally uses the simplest char array: char value[]. So, result[i] = 'X' is very similar to result.setCharAt(i, 'X'), however the second will call a polymorphic method (which probably gets inlined by JVM) and check bounds in if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want String in the end and this is the bottleneck of your program, then you might consider using char array. In benchmark char array was ~25% faster than StringBuilder. Be sure to properly measure execution time of your program before and after optimization, because there is no guarantee about this 25%.
Never concatenate Strings in the loop with + or +=, unless you really know what you do. Usally it's better to use explicit StringBuilder and append().
I'd prefer to use StringBuilder class where original string is modified.
For String manipulation, I like StringUtil class. You'll need to get Apache commons dependency to use it
This question already has answers here:
C++ equivalent of StringBuffer/StringBuilder?
(10 answers)
Closed 9 years ago.
Consider this piece of code:
public String joinWords(String[] words) {
String sentence = "";
for(String w : words) {
sentence = sentence + w;
}
return sentence;
}
On each concatenation a new copy of the string is created, so that the overall complexity is O(n^2). Fortunately in Java we could solve this with a StringBuffer, which has O(1) complexity for each append, then the overall complexity would be O(n).
While in C++, std::string::append() has complexity of O(n), and I'm not clear about the complexity of stringstream.
In C++, are there methods like those in StringBuffer with the same complexity?
C++ strings are mutable, and pretty much as dynamically sizable as a StringBuffer. Unlike its equivalent in Java, this code wouldn't create a new string each time; it just appends to the current one.
std::string joinWords(std::vector<std::string> const &words) {
std::string result;
for (auto &word : words) {
result += word;
}
return result;
}
This runs in linear time if you reserve the size you'll need beforehand. The question is whether looping over the vector to get sizes would be slower than letting the string auto-resize. That, i couldn't tell you. Time it. :)
If you don't want to use std::string itself for some reason (and you should consider it; it's a perfectly respectable class), C++ also has string streams.
#include <sstream>
...
std::string joinWords(std::vector<std::string> const &words) {
std::ostringstream oss;
for (auto &word : words) {
oss << word;
}
return oss.str();
}
It's probably not any more efficient than using std::string, but it's a bit more flexible in other cases -- you can stringify just about any primitive type with it, as well as any type that has specified an operator <<(ostream&, its_type&) override.
This is somewhat tangential to your Question, but relevant nonetheless. (And too big for a comment!!)
On each concatenation a new copy of the string is created, so that the overall complexity is O(n^2).
In Java, the complexity of s1.concat(s2) or s1 + s2 is O(M1 + M2) where M1 and M2 are the respective String lengths. Turning that into the complexity of a sequence of concatenations is difficult in general. However, if you assume N concatenations of Strings of length M, then the complexity is indeed O(M * N * N) which matches what you said in the Question.
Fortunately in Java we could solve this with a StringBuffer, which has O(1) complexity for each append, then the overall complexity would be O(n).
In the StringBuilder case, the amortized complexity of N calls to sb.append(s) for strings of size M is O(M*N). The key word here is amortized. When you append characters to a StringBuilder, the implementation may need to expand its internal array. But the expansion strategy is to double the array's size. And if you do the math, you will see that each character in the buffer is going to be copied on average one extra time during the entire sequence of append calls. So the complexity of the entire sequence of appends still works out as O(M*N) ... and, as it happens M*N is the final string length.
So your end result is correct, but your statement about the complexity of a single call to append is not correct. (I understand what you mean, but the way you say it is facially incorrect.)
Finally, I'd note that in Java you should use StringBuilder rather than StringBuffer unless you need the buffer to be thread-safe.
As an example of a really simple structure that has O(n) complexity in C++11:
template<typename TChar>
struct StringAppender {
std::vector<std::basic_string<TChar>> buff;
StringAppender& operator+=( std::basic_string<TChar> v ) {
buff.push_back(std::move(v));
return *this;
}
explicit operator std::basic_string<TChar>() {
std::basic_string<TChar> retval;
std::size_t total = 0;
for( auto&& s:buff )
total+=s.size();
retval.reserve(total+1);
for( auto&& s:buff )
retval += std::move(s);
return retval;
}
};
use:
StringAppender<char> append;
append += s1;
append += s2;
std::string s3 = append;
This takes O(n), where n is the number of characters.
Finally, if you know how long all of the strings are, just doing a reserve with enough room makes append or += take a total of O(n) time. But I agree that is awkward.
Use of std::move with the above StringAppender (ie, sa += std::move(s1)) will significantly increase performance for non-short strings (or using it with xvalues etc)
I do not know the complexity of std::ostringstream, but ostringstream is for pretty printing formatted output, or cases where high performance is not important. I mean, they aren't bad, and they may even out perform scripted/interpreted/bytecode languages, but if you are in a rush, you need something else.
As usual, you need to profile, because constant factors are important.
A rvalue-reference-to-this operator+ might also be a good one, but few compilers implement rvalue references to this.
From an online notes, I read the following java code snippet for reversing a string, which is claimed to have quadratic time complexity. It seems to me that the “for” loop for i just iterates the whole length of s. How does it cause a quadratic time complexity?
public static String reverse(String s)
{
String rev = new String();
for (int i = (s.length()-1); i>=0; i--) {
rev = rev.append(s.charAt(i));
}
return rev.toString();
}
public static String reverse(String s)
{
String rev = " ";
for (int i=s.length()-1; i>=0; i--)
rev.append(s.charAt(i); // <--------- This is O(n)
Return rev.toString();
}
I copy pasted your code. I'm not sure where you get this but actually String doesn't have append method. Maybe rev is a StringBuilder or another Appendable.
Possibly because the append call does not execute in constant time. If it's linear with the length of the string, that would explain it.
append has to find the end of the string, which is Ο(n). So, you have an Ο(n) loop executed Ο(n) times.
I don't think String has an append method. So, this code won't compile.
But, coming to the problem of quadratic complexity, let us assume that you are actually appending the string with a character using '+' operator or the String.concat() method.
The String objects are immutable. So, whenever you append to a string, a new string of bigger length is created, old string contents are copied to it and then the final character is appended, and the previous string is destroyed. So, this process takes more and more time as the string grows.
The appending loop takes O(n) time but for every loop you take O(n) time to copy the string character by character. This leads to quadratic complexity.
It would be better to use StringBuilder or StringBuffer. However, I guess the time complexity you mentioned would be with older java compilers. But, new advanced compilers would actually optimize the '+' operation with StringBuilder.