This question already has answers here:
C++ equivalent of StringBuffer/StringBuilder?
(10 answers)
Closed 9 years ago.
Consider this piece of code:
public String joinWords(String[] words) {
String sentence = "";
for(String w : words) {
sentence = sentence + w;
}
return sentence;
}
On each concatenation a new copy of the string is created, so that the overall complexity is O(n^2). Fortunately in Java we could solve this with a StringBuffer, which has O(1) complexity for each append, then the overall complexity would be O(n).
While in C++, std::string::append() has complexity of O(n), and I'm not clear about the complexity of stringstream.
In C++, are there methods like those in StringBuffer with the same complexity?
C++ strings are mutable, and pretty much as dynamically sizable as a StringBuffer. Unlike its equivalent in Java, this code wouldn't create a new string each time; it just appends to the current one.
std::string joinWords(std::vector<std::string> const &words) {
std::string result;
for (auto &word : words) {
result += word;
}
return result;
}
This runs in linear time if you reserve the size you'll need beforehand. The question is whether looping over the vector to get sizes would be slower than letting the string auto-resize. That, i couldn't tell you. Time it. :)
If you don't want to use std::string itself for some reason (and you should consider it; it's a perfectly respectable class), C++ also has string streams.
#include <sstream>
...
std::string joinWords(std::vector<std::string> const &words) {
std::ostringstream oss;
for (auto &word : words) {
oss << word;
}
return oss.str();
}
It's probably not any more efficient than using std::string, but it's a bit more flexible in other cases -- you can stringify just about any primitive type with it, as well as any type that has specified an operator <<(ostream&, its_type&) override.
This is somewhat tangential to your Question, but relevant nonetheless. (And too big for a comment!!)
On each concatenation a new copy of the string is created, so that the overall complexity is O(n^2).
In Java, the complexity of s1.concat(s2) or s1 + s2 is O(M1 + M2) where M1 and M2 are the respective String lengths. Turning that into the complexity of a sequence of concatenations is difficult in general. However, if you assume N concatenations of Strings of length M, then the complexity is indeed O(M * N * N) which matches what you said in the Question.
Fortunately in Java we could solve this with a StringBuffer, which has O(1) complexity for each append, then the overall complexity would be O(n).
In the StringBuilder case, the amortized complexity of N calls to sb.append(s) for strings of size M is O(M*N). The key word here is amortized. When you append characters to a StringBuilder, the implementation may need to expand its internal array. But the expansion strategy is to double the array's size. And if you do the math, you will see that each character in the buffer is going to be copied on average one extra time during the entire sequence of append calls. So the complexity of the entire sequence of appends still works out as O(M*N) ... and, as it happens M*N is the final string length.
So your end result is correct, but your statement about the complexity of a single call to append is not correct. (I understand what you mean, but the way you say it is facially incorrect.)
Finally, I'd note that in Java you should use StringBuilder rather than StringBuffer unless you need the buffer to be thread-safe.
As an example of a really simple structure that has O(n) complexity in C++11:
template<typename TChar>
struct StringAppender {
std::vector<std::basic_string<TChar>> buff;
StringAppender& operator+=( std::basic_string<TChar> v ) {
buff.push_back(std::move(v));
return *this;
}
explicit operator std::basic_string<TChar>() {
std::basic_string<TChar> retval;
std::size_t total = 0;
for( auto&& s:buff )
total+=s.size();
retval.reserve(total+1);
for( auto&& s:buff )
retval += std::move(s);
return retval;
}
};
use:
StringAppender<char> append;
append += s1;
append += s2;
std::string s3 = append;
This takes O(n), where n is the number of characters.
Finally, if you know how long all of the strings are, just doing a reserve with enough room makes append or += take a total of O(n) time. But I agree that is awkward.
Use of std::move with the above StringAppender (ie, sa += std::move(s1)) will significantly increase performance for non-short strings (or using it with xvalues etc)
I do not know the complexity of std::ostringstream, but ostringstream is for pretty printing formatted output, or cases where high performance is not important. I mean, they aren't bad, and they may even out perform scripted/interpreted/bytecode languages, but if you are in a rush, you need something else.
As usual, you need to profile, because constant factors are important.
A rvalue-reference-to-this operator+ might also be a good one, but few compilers implement rvalue references to this.
Related
say I use the following methods to search for a palindrome. I know the first one is O(n) because it goes through entire string. Does the .reverse() in the StringBuffer also do O(n)? Im not worried about finding a better way to problem Im trying to understand if the reverse method physically reverses the string or is it much more efficient than that?? Thanks
public static boolean isAPalindrome(String s1){
String tmp = "";
int length = s1.length();
for(int i = 0; i < s1.length(); i++){
tmp += s1.charAt(s1.length()-i-1);
}
if (s1.equals(tmp)) return true;
return false;
}
public static boolean isAPalindrome(String s1){
StringBuffer a = new StringBuffer(s1);
return s1.equals(a.reverse().toString());
}
You could implement a reverse that is effectively o(1), but not with the typical string classes: You could do it by implementing a string class with a direction member, that can take the values "forward" or "backward". But that would only make sense if the reverting dominates your performance over all other computations.
Reversing a string can't be faster than O(n) (under normal representation of a String).
You have to "operate" on every character of the string, so it can't theoretically be faster than that.
StringBuffer extends AbstractStringBuilder which implements reverse. You may look at the souce code yourself here.
Note: this is not saying that StringBuilder/Buffers reverse will run as fast as an O(n) reverse method that you will write. It will probably run much faster. But asymptotically it will still be O(n).
I have a code that check a string for space,comma and etc. Well since
I will deal a scenario where my app will going to check, lets say thousand of string with a max length of 15 and a minimum length of 14. I am worried if it will affect the performance since it is in android. Check the code i used..
private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};
public static boolean checkMessage(String message){
if (message == null)
return false;
char[] _message = message.toCharArray();
for (char c : _message) {
for (int i = 0;i > undefinedChars.length;i++)
if (c == undefinedChars[i])
return true;
}
return false;
}
Is this correct? or there is a way to improve it?
There is a change that you could make that might make a little difference:
Change
char[] _message = message.toCharArray();
for (char c : _message) {
to
for (int i = 0; i < message.length(); i++) {
char c = message.charAt(i);
However, I doubt that it will be significant.
Replacing the inner loop with a switch is more likely to be fruitful, though it depends on what the JIT compiler does with the code. (And a switch will only works if the set of undefined characters can be hard-wired into the switch statement as compile-time constants.)
I am worried if it will affect the performance since it is in android.
Don't "worry". Approach the problem scientifically.
Implement the code and then benchmark it.
If the measured performance is a concern, then:
profile the code
look at hotspots, and identify possible improvements
implement and test possible improvement
rerun the benchmark to see if the improvement actually made any difference
repeat ... until performance is good enough or you run out of options.
The other thing to note is that the same code could well perform differently across different Android platforms. The quality of JIT compilers has (apparently) improved markedly in more recent releases.
I would argue that it is a bad idea to "bend" your code just to get it to run well on old phones. The chances are that the user will upgrade their hardware soon anyway ... and it is conceivable that your optimization for the old platform actually makes your code slower on a new platform ... 'cos your hand-optimizations have made the code too tricky for the JIT compiler's optimizer to deal with.
This is also an argument for NOT trying to make your code go "as fast as possible" ...
First of all, I see a bug there.
for (int i = 0;i > undefinedChars.length;i++)
that I think you meant
for (int i = 0;i < undefinedChars.length;i++)
instead?
Anyway it seems that your algorithm runs in O(m*n) where m is the length of message and n is the length of undefined chars(in this case fixed size, 15). Therefore it should be efficient in run-time analysis perspective.
I would profile the scenario first then decide how to improve it, that you could've sorted the message upfront somewhere then you can only check for either 1st char or the last char of the string, but as I said, only if that's been sorted elsewhere.
Or maybe think of parallelizing the routine. It should be straightforward.
Without using memory, you're about as fast as you can get. You can trade memory for performance. For example, you can put the characters you want to check into a HashMap. Then you can loop over the string you're checking, and check if each index is in that map or not. If the number of characters you want to check for is small, this will be less efficient. If the number is big, it will be more efficient (Technically this algorithm is O(n) instead of O(n*m), but if m is small then the constants you're usually taught to ignore will matter).
Another way is to use an array of booleans, with each possible character in the string mapping to an index in that array. Set only the characters you care about to true (and save that array). Then you can avoid the hash calculation above, but at the cost of a lot of memory.
Really, your original algorithm is likely good enough. But these (especially the hash map) are things you can consider if needed.
Try using a regular expression. I find it very clean and it should not hurt your performance.
public static boolean checkMessage(String message)
{
if (message == null)
return false;
String regex = " |\\.|/|<|>|\\*|!";
Matcher matcher = Pattern.compile(regex).matcher(message);
if (matcher.find())
return true;
else
return false;
}
For symmetry and possibly some compiler optimization, why not use a for-each style loop for both loops. As an additional benefit, you wouldn't risk a typo like the one pointed out by glaze. Your code would then become:
private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};
public static boolean checkMessage(String message){
if (message == null)
return false;
char[] _message = message.toCharArray();
for (char c : _message) {
for (for u : undefinedChars)
if (c == u)
return true;
}
return false;
}
An additional optimization would be to order the characters in undefinedChars in the order most likely to occur. That way you'll bail-out as quick as possible.
Use a Set to hold your undefinedChars
Set<Character> undefinedChars = new HashSet<Character>(Arrays.asList(new Character(' ') ,new Character('/'),new Character('.')));
public boolean hasUndefinedChar(String str) {
for (int i = 0; i < str.length(); i++) {
char iChar = str.charAt(i);
Character charWrapper = new Character(iChar);
if (undefinedChars.contains(charWrapper)) {
return true;
}
}
return false;
}
This method is O(n) time efficient and does not sufficiently affect space complexity. The contains calls to the Set are O(1) operations and you make n of these contains calls in the worst case.
I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
myNameChars[i] = 'X';
}
myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.
I say do whatever is most readable/maintainable until you you know that String "modification" is slowing you down. To me, this is the most readable:
Sting s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If sychronization is needed, then you should use StringBuffer.
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.
What are the pros/cons to each different way. I take it that StringBuilder is going to be more efficient since the convering to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
On a related note, if you are doing simple string concatenation (e.g, s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls, so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
return "{field1 =" + this.field1 +
", field2 =" + this.field2 +
...
", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing rule": if the operation has the potential to explode in complexity based on input, err on the side of caution.
Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
RUNTIME (NS)
array 88
builder 126
builderTillEnd 76
concat 3435
Benchmarked methods:
public static String array(String input)
{
char[] result = input.toCharArray(); // COPYING
for (int i = 0; i < input.length(); i++)
{
result[i] = 'X';
}
return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result;
}
public static String concat(String input)
{
String result = "";
for (int i = 0; i < input.length(); i++)
{
result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
// result = new StringBuilder(result).append('X').toString();
}
return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
if ((index < 0) || (index >= count))
throw new StringIndexOutOfBoundsException(index);
value[index] = ch;
}
AbstractStringBuilder internally uses the simplest char array: char value[]. So, result[i] = 'X' is very similar to result.setCharAt(i, 'X'), however the second will call a polymorphic method (which probably gets inlined by JVM) and check bounds in if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want String in the end and this is the bottleneck of your program, then you might consider using char array. In benchmark char array was ~25% faster than StringBuilder. Be sure to properly measure execution time of your program before and after optimization, because there is no guarantee about this 25%.
Never concatenate Strings in the loop with + or +=, unless you really know what you do. Usally it's better to use explicit StringBuilder and append().
I'd prefer to use StringBuilder class where original string is modified.
For String manipulation, I like StringUtil class. You'll need to get Apache commons dependency to use it
I posed a question to Stackoverflow a few weeks ago about a creating an efficient algorithm to search for a pattern in a large chunk of text. Right now I am using the String function indexOf to do the search. One suggestion was to use Rabin-Karp as an alternative. I wrote a little test program as follows to test an implementation of Rabin-Karp as follows.
public static void main(String[] args) {
String test = "Mary had a little lamb whose fleece was white as snow";
String p = "was";
long start = Calendar.getInstance().getTimeInMillis();
for (int x = 0; x < 200000; x++)
test.indexOf(p);
long end = Calendar.getInstance().getTimeInMillis();
end = end -start;
System.out.println("Standard Java Time->"+end);
RabinKarp searcher = new RabinKarp("was");
start = Calendar.getInstance().getTimeInMillis();
for (int x = 0; x < 200000; x++)
searcher.search(test);
end = Calendar.getInstance().getTimeInMillis();
end = end -start;
System.out.println("Rabin Karp time->"+end);
}
And here is the implementation of Rabin-Karp that I am using:
import java.math.BigInteger;
import java.util.Random;
public class RabinKarp {
private String pat; // the pattern // needed only for Las Vegas
private long patHash; // pattern hash value
private int M; // pattern length
private long Q; // a large prime, small enough to avoid long overflow
private int R; // radix
private long RM; // R^(M-1) % Q
static private long dochash = -1L;
public RabinKarp(int R, char[] pattern) {
throw new RuntimeException("Operation not supported yet");
}
public RabinKarp(String pat) {
this.pat = pat; // save pattern (needed only for Las Vegas)
R = 256;
M = pat.length();
Q = longRandomPrime();
// precompute R^(M-1) % Q for use in removing leading digit
RM = 1;
for (int i = 1; i <= M - 1; i++)
RM = (R * RM) % Q;
patHash = hash(pat, M);
}
// Compute hash for key[0..M-1].
private long hash(String key, int M) {
long h = 0;
for (int j = 0; j < M; j++)
h = (R * h + key.charAt(j)) % Q;
return h;
}
// Las Vegas version: does pat[] match txt[i..i-M+1] ?
private boolean check(String txt, int i) {
for (int j = 0; j < M; j++)
if (pat.charAt(j) != txt.charAt(i + j))
return false;
return true;
}
// check for exact match
public int search(String txt) {
int N = txt.length();
if (N < M)
return -1;
long txtHash;
if (dochash == -1L) {
txtHash = hash(txt, M);
dochash = txtHash;
} else
txtHash = dochash;
// check for match at offset 0
if ((patHash == txtHash) && check(txt, 0))
return 0;
// check for hash match; if hash match, check for exact match
for (int i = M; i < N; i++) {
// Remove leading digit, add trailing digit, check for match.
txtHash = (txtHash + Q - RM * txt.charAt(i - M) % Q) % Q;
txtHash = (txtHash * R + txt.charAt(i)) % Q;
// match
int offset = i - M + 1;
if ((patHash == txtHash) && check(txt, offset))
return offset;
}
// no match
return -1; // was N
}
// a random 31-bit prime
private static long longRandomPrime() {
BigInteger prime = new BigInteger(31, new Random());
return prime.longValue();
}
// test client
}
The implementation of Rabin-Karp works in that it returns the correct offset of the string I am looking for. What is surprising to me though is the timing statistics that occurred when I ran the test program. Here they are:
Standard Java Time->39
Rabin Karp time->409
This was really surprising. Not only is Rabin-Karp (at least as it is implemented here) not faster than the standard java indexOf String function, it is slower by an order of magnitude. I don't know what is wrong (if anything). Any one have thoughts on this?
Thanks,
Elliott
I answered this question earlier and Elliot pointed out I was just plain wrong. I apologise to the community.
There is nothing magical about the String.indexOf code. It is not natively optimised or anything like that. You can copy the indexOf method from the String source code and it runs just as quickly.
What we have here is the difference between O() efficiency and actual efficiency. Rabin-Karp for a String of length N and a pattern of length M, Rabin-Karp is O(N+M) and a worst case of O(NM). When you look into it, String.indexOf() also has a best case of O(N+M) and a worst case of O(NM).
If the text contains many partial matches to the start of the pattern Rabin-Karp will stay close to its best-case performance, whilst String.indexOf will not. For example I tested the above code (properly this time :-)) on a million '0's followed by a single '1', and the searched for 1000 '0's followed by a single '1'. This forced the String.indexOf to its worst case performance. For this highly degenerate test, the Rabin-Karp algorithm was about 15 times faster than indexOf.
For natural language text, Rabin-Karp will remain close to best-case and indexOf will only deteriorate slightly. The deciding factor is therefore the complexity of operations performed on each step.
In it's innermost loop, indexOf scans for a matching first character. At each iteration is has to:
increment the loop counter
perform two logical tests
do one array access
In Rabin-Karp each iteration has to:
increment the loop counter
perform two logical tests
do two array accesses (actually two method invocations)
update a hash, which above requires 9 numerical operations
Therefore at each iteration Rabin-Karp will fall further and further behind. I tried simplifying the hash algorithm to just XOR characters, but I still had an extra array access and two extra numerical operations so it was still slower.
Furthermore, when a match is find, Rabin-Karp only knows the hashes match and must therefore test every character, whereas indexOf already knows the first character matches and therefore has one less test to do.
Having read on Wikipedia that Rabin-Karp is used to detect plagiarism, I took the Bible's Book of Ruth, removed all punctuation and made everything lower case which left just under 10000 characters. I then searched for "andthewomenherneighboursgaveitaname" which occurs near the very end of the text. String.indexOf was still faster, even with just the XOR hash. However, if I removed String.indexOfs advantage of being able to access String's private internal character array and forced it to copy the character array, then, finally, Rabin-Karp was genuinely faster.
However, I deliberately chose that text as there are 213 "and"s in the Book of Ruth and 28 "andthe"s. If instead I searched just for the last characters "ursgaveitaname", well there are only 3 "urs"s in the text so indexOf returns closer to its best-case and wins the race again.
As a fairer test I chose random 20 character strings from the 2nd half of the text and timed them. Rabin-Karp was about 20% slower than the indexOf algorithm run outside of the String class, and 70% slower than the actual indexOf algorithm. Thus even in the use case it is supposedly appropriate for, it was still not the best choice.
So what good is Rabin-Karp? No matter the length or nature of the text to be searched, at every character compared it will be slower. No matter what hash function we choose we are surely required to make an additional array access and at least two numerical operations. A more complex hash function will give us less false matches, but require more numerical operators. There is simply no way Rabin-Karp can ever keep up.
As demonstrated above, if we need to find a match prefixed by an often repeated block of text, indexOf can be slower, but if we know we are doing that it would look like we would still be better off using indexOf to search for the text without the prefix and then check to see if the prefix was present.
Based on my investigations today, I cannot see any time when the additional complexity of Rabin Karp will pay off.
Here is the source to java.lang.String. indexOf is line 1770.
My suspicion is since you are using it on such a short input string, the extra overhead of the Rabin-Karp algorithm over the seemly naive implementation of java.lang.String's indexOf, you aren't seeing the true performance of the algorithm. I would suggest trying it on a much longer input string to compare performance.
From my understanding, Rabin Karp is best used when searching a block of text for mutiple words/phrases.
Think about a bad word search, for flagging abusive language.
If you have a list of 2000 words, including derivations, then you would need to call indexOf 2000 times, one for each word you are trying to find.
RabinKarp helps with this by doing the search the other way around.
Make a 4 character hash of each of the 2000 words, and put that into a dictionary with a fast lookup.
Now, for each 4 characters of the search text, hash and check against the dictionary.
As you can see, the search is now the other way around - we're searching the 2000 words for a possible match instead.
Then we get the string from the dictionary and do an equals to check to be sure.
It's also a faster search this way, because we're searching a dictionary instead of string matching.
Now, imagine the WORST case scenario of doing all those indexOf searches - the very LAST word we check is a match ...
The wikipedia article for RabinKarp even mentions is inferiority in the situation you describe. ;-)
http://en.wikipedia.org/wiki/Rabin-Karp_algorithm
But this is only natural to happen!
Your test input first of all is too trivial.
indexOf returns the index of was searching a small buffer (String's internal char array`) while the Rabin-Karp has to do preprocessing to setup its data to work which takes extra time.
To see a difference you would have to test in a really large text to find expressions.
Also please note that when using more sofisticated string search algorithm they can have "expensive" setup/preprocessing to provide the really fast search.
In your case you just search a was in a sentence. I any case you should always take the input into account
Without looking into details, two reasons come to my mind:
you are very likely to outperform standard API implementations only for very special cases. I do not consider "Mary had a little lamb whose fleece was white as snow" to be such.
microbenchmarking is very difficult and can give quite misleading results. Here is an explanation, here a list of tools you could use
Not only simply try a longer static string, but try generating random long strings and inserting the search target into a random location each time. Without randomizing it, you will see a fixed result for indexOf.
EDITED:
Random is the wrong concept. Most text is not truly random. But you would need a lot of different long strings to be effective, and not just testing the same String multiple times. I am sure there are ways to extract "random" large Strings from an even larger text source, or something like that.
For this kind of search, Knuth-Morris-Pratt may perform better. In particular if the sub-string doesn't just repeat characters, then KMP should outperform indexOf(). Worst case (string of all the same characters) it will be the same.
From an online notes, I read the following java code snippet for reversing a string, which is claimed to have quadratic time complexity. It seems to me that the “for” loop for i just iterates the whole length of s. How does it cause a quadratic time complexity?
public static String reverse(String s)
{
String rev = new String();
for (int i = (s.length()-1); i>=0; i--) {
rev = rev.append(s.charAt(i));
}
return rev.toString();
}
public static String reverse(String s)
{
String rev = " ";
for (int i=s.length()-1; i>=0; i--)
rev.append(s.charAt(i); // <--------- This is O(n)
Return rev.toString();
}
I copy pasted your code. I'm not sure where you get this but actually String doesn't have append method. Maybe rev is a StringBuilder or another Appendable.
Possibly because the append call does not execute in constant time. If it's linear with the length of the string, that would explain it.
append has to find the end of the string, which is Ο(n). So, you have an Ο(n) loop executed Ο(n) times.
I don't think String has an append method. So, this code won't compile.
But, coming to the problem of quadratic complexity, let us assume that you are actually appending the string with a character using '+' operator or the String.concat() method.
The String objects are immutable. So, whenever you append to a string, a new string of bigger length is created, old string contents are copied to it and then the final character is appended, and the previous string is destroyed. So, this process takes more and more time as the string grows.
The appending loop takes O(n) time but for every loop you take O(n) time to copy the string character by character. This leads to quadratic complexity.
It would be better to use StringBuilder or StringBuffer. However, I guess the time complexity you mentioned would be with older java compilers. But, new advanced compilers would actually optimize the '+' operation with StringBuilder.