Why java Set.contains() is faster than String.contains()? - java

For a problem that asks whether 2 strings share any common character, I first used the straightforward String.contains() method:
static String twoStrings(String s1, String s2) {
boolean subStringFound = false;
for(int i = 0; i < s2.length(); i++){
if(s1.contains(Character.toString(s2.charAt(i)))) {
subStringFound = true;
break;
}
}
return subStringFound?"YES":"NO";
}
However, it passed only 5 of the 7 test cases and timed out on the other 2, which used really long strings.
Then I tried with Set.contains():
static String twoStrings(String s1, String s2) {
boolean subStringFound = false;
HashSet<Character> set = new HashSet<>();
for(int i = 0; i < s1.length(); i++){
set.add(s1.charAt(i));
}
for(int i = 0; i < s2.length(); i++){
if(set.contains(s2.charAt(i))) {
subStringFound = true;
break;
}
}
return subStringFound?"YES":"NO";
}
Even though I'm running an additional loop to build the Set, this version passed all the tests.
What's the main reason for this significant difference in runtime?

Because they are different data structures, and the contains method is implemented differently on them.
A string is a sequence of characters, so to test whether it contains a given character, you have to look at each character in the sequence and compare it. This algorithm is called linear search, and it takes O(n) time where n is the number of characters, meaning it takes proportionally more time when there are more characters.
A HashSet is a kind of hash table data structure. Basically, to test whether it contains a given character, you take the hash of that character, use the hash as an index in an array, and either the character is there (or very near to there), or it isn't. So you don't have to search the whole set; it takes O(1) time on average, meaning the time is roughly the same however many characters there are.
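The "use the value as an index into an array" idea can be sketched without any hashing at all. The version below is only an illustration, not the asker's code: the class name CommonCharacter is made up, and it assumes that spending one boolean slot per possible char value (65,536 slots) is acceptable. Each lookup is a single array access, so the total work is proportional to s1.length() + s2.length().

```java
class CommonCharacter {
    // Returns "YES" if s1 and s2 share at least one character, otherwise "NO".
    static String twoStrings(String s1, String s2) {
        // One slot per possible char value (0..65535); the char itself is the index.
        boolean[] seen = new boolean[Character.MAX_VALUE + 1];
        for (int i = 0; i < s1.length(); i++) {
            seen[s1.charAt(i)] = true;      // mark every character of s1
        }
        for (int i = 0; i < s2.length(); i++) {
            if (seen[s2.charAt(i)]) {       // constant-time membership test
                return "YES";
            }
        }
        return "NO";
    }
}
```

A HashSet does essentially the same thing, except that it derives the array index from hashCode() and has to handle collisions, which is why its O(1) bound holds only on average.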

You'd have to look at the implementation in the JDK being used, but most likely String.contains is a linear search but HashSet.contains is not. From the HashSet documentation:
This class implements the Set interface, backed by a hash table (actually a HashMap instance)...
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

Related

Count the Characters in a String Recursively & treat "eu" as a Single Character

I am new to Java, and I'm trying to figure out how to count characters in a given string while treating the two-character combination "eu" as a single character, still counting every other character as one.
And I want to do that using recursion.
Consider the following example.
Input:
"geugeu"
Desired output:
4 // g + eu + g + eu = 4
Current output:
2
I've been trying a lot and still can't seem to figure out how to implement it correctly.
My code:
public static int recursionCount(String str) {
if (str.length() == 1) {
return 0;
}
else {
String ch = str.substring(0, 2);
if (ch.equals("eu")) {
return 1 + recursionCount(str.substring(1));
}
else {
return recursionCount(str.substring(1));
}
}
}
OP wants to count all characters in a string but adjacent characters "ae", "oe", "ue", and "eu" should be considered a single character and counted only once.
The code below does that:
public static int recursionCount(String str) {
int n;
n = str.length();
if(n <= 1) {
return n; // return 1 if one character left or 0 if empty string.
}
else {
String ch = str.substring(0, 2);
if(ch.equals("ae") || ch.equals("oe") || ch.equals("ue") || ch.equals("eu")) {
// consider as one character and skip next character
return 1 + recursionCount(str.substring(2));
}
else {
// don't skip next character
return 1 + recursionCount(str.substring(1));
}
}
}
Recursion explained
In order to address a particular task using Recursion, you need a firm understanding of how recursion works.
And the first thing you need to keep in mind is that every recursive solution should (either explicitly or implicitly) contain two parts: Base case and Recursive case.
Let's have a look at them closely:
Base case - a part that represents a simple edge-case (or a set of edge-cases), i.e. a situation in which recursion should terminate. The outcome for these edge-cases is known in advance. For this task, base case is when the given string is empty, and since there's nothing to count the return value should be 0. That is sufficient for the algorithm to work, outcomes for other inputs should be derived from the recursive case.
Recursive case - the part of the method where recursive calls are made and where the main logic resides. Every chain of recursive calls eventually hits the base case and then starts building its return value.
In the recursive case, we need to check whether the given string starts with a particular string like "eu". For that we don't need to generate a substring (keep in mind that object creation is costly). Instead we can use the method String.startsWith(), which compares the provided prefix against the beginning of this string without allocating anything, which is cheaper (reminder: starting from Java 9 String is backed by an array of bytes, and each character is represented with either one or two bytes depending on the string's internal encoding). We also don't need to worry about the length of the string, because if the string is shorter than the prefix, startsWith() returns false.
Implementation
That said, here's how an implementation might look:
public static int recursionCount(String str) {
if(str.isEmpty()) {
return 0;
}
return str.startsWith("eu") ?
1 + recursionCount(str.substring(2)) : 1 + recursionCount(str.substring(1));
}
Note that besides being able to implement a solution, you also need to evaluate its time and space complexity.
In this case, because we are creating a new string with every call, the time complexity is quadratic, O(n^2) (reminder: creating the new string requires allocating memory and copying the bytes of the original string). The worst-case space complexity would also be O(n^2).
There's a way of solving this problem recursively in linear time O(n) without generating a new string at every call. For that we need to introduce a second argument, the current index, and each recursive call should advance this index by either 1 or 2 (I'm not going to implement this solution and am leaving it for OP/reader as an exercise).
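For readers who want to compare against their own attempt, here is one possible sketch of the index-based variant. The class name EuCounter and the helper name count are made up for illustration; the only library call relied on is the two-argument String.startsWith(prefix, offset), which checks for the prefix at a given position without creating a substring.

```java
class EuCounter {
    // Public entry point: counts characters, treating "eu" as one character.
    static int recursionCount(String str) {
        return count(str, 0);
    }

    // Counts the characters of str from position index to the end.
    private static int count(String str, int index) {
        if (index >= str.length()) {
            return 0;                       // base case: nothing left to count
        }
        // Advance by 2 when "eu" starts at index, otherwise by 1.
        int step = str.startsWith("eu", index) ? 2 : 1;
        return 1 + count(str, index + step);
    }
}
```

No new strings are created, so the running time is linear. The recursion depth is still proportional to the length of the input, so a very long string can overflow the call stack.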
In addition
In addition, here's a concise and simple non-recursive solution using String.replace():
public static int count(String str) {
return str.replace("eu", "_").length();
}
If you need to handle multiple combinations of characters (which were listed in the first version of the question), you can make use of regular expressions with String.replaceAll():
public static int count(String str) {
return str.replaceAll("ue|au|oe|eu", "_").length();
}

ArrayList vs HashMap time complexity

The scenario is the following:
You have 2 strings (s1, s2) and want to check whether one is a permutation of the other, so you generate all permutations of, let's say, s1, store them, and then iterate over them and compare against s2 until it's either found or not.
Now, in this scenario, I am deliberating whether an ArrayList or a HashMap is better to use when considering strictly time complexity, as I believe both have O(N) space complexity.
According to the javadocs, ArrayList has a search complexity of O(N) whereas HashMap is O(1). If this is the case, is there any reason to favor using ArrayList over HashMap here since HashMap would be faster?
The only potential downside I could think of is that your (k,v) pairs might be a bit weird if the key equals the value, i.e. {k = "ABCD", v = "ABCD"}, etc.
As shown here:
import java.io.*;
import java.util.*;
class GFG{
static int NO_OF_CHARS = 256;
/* function to check whether two strings
are Permutation of each other */
static boolean arePermutation(char str1[], char str2[])
{
// Create 2 count arrays and initialize
// all values as 0
int count1[] = new int [NO_OF_CHARS];
Arrays.fill(count1, 0);
int count2[] = new int [NO_OF_CHARS];
Arrays.fill(count2, 0);
int i;
// For each character in input strings,
// increment count in the corresponding
// count array
for (i = 0; i <str1.length && i < str2.length ;
i++)
{
count1[str1[i]]++;
count2[str2[i]]++;
}
// If both strings are of different length.
// Removing this condition will make the program
// fail for strings like "aaca" and "aca"
if (str1.length != str2.length)
return false;
// Compare count arrays
for (i = 0; i < NO_OF_CHARS; i++)
if (count1[i] != count2[i])
return false;
return true;
}
/* Driver program to test to print printDups*/
public static void main(String args[])
{
char str1[] = ("geeksforgeeks").toCharArray();
char str2[] = ("forgeeksgeeks").toCharArray();
if ( arePermutation(str1, str2) )
System.out.println("Yes");
else
System.out.println("No");
}
}
// This code is contributed by Nikita Tiwari.
If you're attached to your implementation, use a HashSet; it still has O(1) lookup time, and you don't need key/value pairs.
You can use HashSet as you need only one parameter.
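A minimal sketch of that suggestion, keeping the asker's generate-everything approach: the class and method names are hypothetical, and the permutation generator is a plain recursive one chosen for brevity. Only the final contains call is O(1) on average; building the set still takes factorial time, so this is practical only for short strings.

```java
import java.util.HashSet;
import java.util.Set;

class PermutationLookup {
    // Collects every distinct permutation of s into a HashSet.
    static Set<String> permutations(String s) {
        Set<String> out = new HashSet<>();
        permute("", s, out);
        return out;
    }

    // Moves each remaining character in turn to the end of the prefix.
    private static void permute(String prefix, String rest, Set<String> out) {
        if (rest.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            permute(prefix + rest.charAt(i),
                    rest.substring(0, i) + rest.substring(i + 1),
                    out);
        }
    }

    // Average O(1) lookup once the set exists; no iteration over the elements.
    static boolean isPermutation(String s1, String s2) {
        return permutations(s1).contains(s2);
    }
}
```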

Dynamic Programming approach - Interleaving Parentheses

Below is my code for the problem described on https://community.topcoder.com/stat?c=problem_statement&pm=14635. It keeps track of possible interleaves (as described in the problem description given) through a static variable countPossible.
public class InterleavingParentheses{
public static int countPossible = 0;
public static Set<String> dpyes = new HashSet<>(); //used for dp
public static Set<String> dpno = new HashSet<>(); //used for dp
public static void numInterleaves(char[] s1, char[] s2, int size1, int size2){
char[] result = new char[size1+size2];
numInterleavesHelper(result,s1,s2,size1,size2,0,0,0);
}
public static void numInterleavesHelper(char[] res, char[] s1, char[] s2, int size1, int size2, int pos, int start1, int start2){
if (pos == size1+size2){
if (dpyes.contains(new String(res))){
countPossible+=1;
}
else{
if(dpno.contains(new String(res))){
countPossible+=0;
}
else if (isValid(res)){
dpyes.add(new String(res));
countPossible+=1;
}
else{
dpno.add(new String(res));
}
}
}
if (start1 < size1){
res[pos] = s1[start1];
numInterleavesHelper(res,s1,s2,size1,size2,pos+1,start1+1,start2);
}
if (start2 < size2){
res[pos] = s2[start2];
numInterleavesHelper(res,s1,s2,size1,size2,pos+1,start1,start2+1);
}
}
private static boolean isValid(char[] string){
//basically checking to see if parens are balanced
LinkedList<Character> myStack = new LinkedList<>();
for (int i=0; i<string.length; i++){
if (string[i] == "(".charAt(0)){
myStack.push(string[i]);
}
else{
if (myStack.isEmpty()){
return false;
}
if (string[i] == ")".charAt(0)){
myStack.pop();
}
}
}
return myStack.isEmpty();
}
}
I use the Scanner class to read the input strings s1 = "()()()()()()()()()()()()()()()()()()()()" and s2 = "()()()()()()()()()()()()()()()()()" into this function. While the use of the HashSet greatly lowers the time because duplicate interleavings are accounted for, large input strings still take a lot of time. The input strings are supposed to be at most 2500 characters long, and my code does not work for strings that long. How can I modify this to make it better?
Your dp set is only used at the end, so at best you save an O(n) validity check, but you've already done many O(n) operations to reach that point, so the algorithmic complexity is about the same. For dp to be effective, you need to reduce O(2^n) operations to, say, O(n^2).
One of the test cases has an answer of 487,340,184, so for your program to produce this answer it would need that many calls to numInterleavesHelper, because each call can only increment countPossible by 1. The fact that the question asks for the answer "modulo 10^9 + 7" also indicates that a large number is expected.
This rules out things like creating every possible resulting string, most string manipulation, and counting 1 string at a time. Even if you optimized it, the number of iterations alone makes it infeasible.
Instead, think of algorithms that have about 10,000,000 iterations. Each string has a length of 2500. These constraints were chosen on purpose so that 2500 * 2500 fits within this number of iterations, suggesting a 2D dp solution.
If you create an array:
int[][] ways = new int[2501][2501];
then you want the answer to be:
ways[2500][2500]
Here ways[x][y] is the number of ways of creating valid strings where x characters have been taken from the first string, and y characters have been taken from the second string. Each time you add a character, you have 2 choices, taking from the first string or taking from the second. The new number of ways is the sum of the previous ones, so:
ways[x][y] = ways[x-1][y] + ways[x][y-1]
You also need to check that each string is valid. They're valid if each time you add a character, the number of opening parens minus the number of closing parens is 0 or greater, and this number is 0 at the end. The number of parens of each type in every prefix of s1 and s2 can be precalculated to make this a constant-time check.
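The recurrence and the prefix check described above could be put together roughly as follows. This is a sketch under assumptions rather than a reference solution: the class and method names are invented, the modulus 10^9 + 7 is taken from the problem statement quoted earlier, and the inputs are assumed to contain only '(' and ')'. The key observation is that the paren balance after taking x characters from s1 and y from s2 does not depend on the order they were taken in, so it is simply the sum of two precomputed prefix balances.

```java
class InterleavingCount {
    static final int MOD = 1_000_000_007;

    // Number of interleavings of s1 and s2 that form a balanced string, modulo MOD.
    static int countInterleavings(String s1, String s2) {
        int n = s1.length(), m = s2.length();
        int[] bal1 = prefixBalance(s1);
        int[] bal2 = prefixBalance(s2);
        // ways[x][y]: valid prefixes built from the first x chars of s1 and first y of s2.
        int[][] ways = new int[n + 1][m + 1];
        for (int x = 0; x <= n; x++) {
            for (int y = 0; y <= m; y++) {
                if (bal1[x] + bal2[y] < 0) {
                    continue;               // more ')' than '(' so far: stays 0
                }
                if (x == 0 && y == 0) {
                    ways[0][0] = 1;         // the empty prefix
                    continue;
                }
                long total = 0;
                if (x > 0) total += ways[x - 1][y];   // last char came from s1
                if (y > 0) total += ways[x][y - 1];   // last char came from s2
                ways[x][y] = (int) (total % MOD);
            }
        }
        // The complete string must end with balance 0.
        return bal1[n] + bal2[m] == 0 ? ways[n][m] : 0;
    }

    // bal[i] = (number of '(') - (number of ')') among the first i characters of s.
    static int[] prefixBalance(String s) {
        int[] bal = new int[s.length() + 1];
        for (int i = 0; i < s.length(); i++) {
            bal[i + 1] = bal[i] + (s.charAt(i) == '(' ? 1 : -1);
        }
        return bal;
    }
}
```

For two strings of length 2500 this performs about 6.25 million cell updates, which fits comfortably within the iteration budget mentioned above.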

What should be the logic of hashfunction() in order to check that two strings are anagrams or not?

I want to write a function that takes string as a parameter and returns a number corresponding to that string.
Integer hashfunction(String a)
{
//logic
}
Actually, the question I'm solving is as follows:
Given an array of strings, return all groups of strings that are anagrams. Represent a group by a list of integers representing the index in the original list.
Input : cat dog god tca
Output : [[1, 4], [2, 3]]
Here is my implementation :-
public class Solution {
Integer hashfunction(String a)
{
int i=0;int ans=0;
for(i=0;i<a.length();i++)
{
ans+=(int)(a.charAt(i));//Adding all ASCII values
}
return new Integer(ans);
}
// Obviously this approach is incorrect
public ArrayList<ArrayList<Integer>> anagrams(final List<String> a) {
int i=0;
HashMap<String,Integer> hashtable=new HashMap<String,Integer>();
ArrayList<Integer> mylist=new ArrayList<Integer>();
ArrayList<ArrayList<Integer>> answer=new ArrayList<ArrayList<Integer>>();
if(a.size()==1)
{
mylist.add(new Integer(1));
answer.add(mylist);
return answer;
}
int j=1;
for(i=0;i<a.size()-1;i++)
{
hashtable.put(a.get(i),hashfunction(a.get(i)));
for(j=i+1;j<a.size();j++)
{
if(hashtable.containsValue(hashfunction(a.get(j))))
{
mylist.add(new Integer(i+1));
mylist.add(new Integer(j+1));
answer.add(mylist);
mylist.clear();
}
}
}
return answer;
}
}
Oh boy... there's quite a bit of stuff that's open for interpretation here. Case-sensitivity, locales, characters allowed/blacklisted... There are going to be a lot of ways to answer the general question. So, first, let me lay down a few assumptions:
Case doesn't matter. ("Rat" is an anagram of "Tar", even with the capital lettering.)
Locale is American English when it comes to the alphabet. (26 letters from A-Z. Compare this to Spanish, which has 28 IIRC, among which 'll' is considered a single letter and a potential consideration for Spanish anagrams!)
Whitespace is ignored in our definition of an anagram. ("arthas menethil" is an anagram of "trash in a helmet" even though the number of whitespaces is different.)
An empty string (null, 0-length, all white-space) has a "hash" (I prefer the term "digest", but a name is a name) of 1.
If you don't like any of those assumptions, you can modify them as you wish. Of course, that will result in the following algorithm being slightly different, but they're a good set of guidelines that will make the general algorithm relatively easy to understand and refactor if you wish.
Two strings are anagrams if they are exhaustively composed of the same set of characters and the same number of each included character. There are a lot of tools available in Java that make this task fairly simple. We have String methods, Lists, Comparators, boxed primitives, and existing hashCode methods for... well, all of those. And we're going to use them to make our "hash" method.
private static int hashString(String s) {
if (s == null) return 1; // A null string hashes like an empty one (an empty List's hashCode is 1).
List<Character> charList = new ArrayList<>();
String lowercase = s.toLowerCase(); // This gets us around case sensitivity
for (int i = 0; i < lowercase.length(); i++) {
Character c = Character.valueOf(lowercase.charAt(i));
if (Character.isWhitespace(c)) continue; // spaces don't count
charList.add(c); // Note the character for future processing...
}
// Now we have a list of Characters... Sort it!
Collections.sort(charList);
return charList.hashCode(); // See contract of java.util.List#hashCode
}
And voila; you have a method that can digest a string and produce an integer representing it, regardless of the order of the characters within. You can use this as the basis for determining whether two strings are anagrams of each other... but I wouldn't. You asked for a digest function that produces an Integer, but keep in mind that in Java, an Integer is merely a 32-bit value. This method can only produce about 4.3 billion unique values, and there are a whole lot more than 4.3 billion strings you can throw at it. This method can produce collisions and give you nonsensical results. If that's a problem, you might want to consider using BigInteger instead.
private static BigInteger hashString(String s) {
BigInteger THIRTY_ONE = BigInteger.valueOf(31); // You should promote this to a class constant!
if (s == null) return BigInteger.ONE; // An empty/null string will return 1.
BigInteger r = BigInteger.ONE; // The value of r will be returned by this method
List<Character> charList = new ArrayList<>();
String lowercase = s.toLowerCase(); // This gets us around case sensitivity
for (int i = 0; i < lowercase.length(); i++) {
Character c = Character.valueOf(lowercase.charAt(i));
if (Character.isWhitespace(c)) continue; // spaces don't count
charList.add(c); // Note the character for future processing...
}
// Now we have a list of Characters... Sort it!
Collections.sort(charList);
// Calculate our bighash, similar to how java's List interface does.
for (Character c : charList) {
int charHash = c.hashCode();
r=r.multiply(THIRTY_ONE).add(BigInteger.valueOf(charHash));
}
return r;
}
You need a number that is the same for all strings made up of the same characters.
The String.hashCode method returns a number that is the same for all strings made up of the same characters in the same order.
If you can sort all words consistently (for example: alphabetically) then String.hashCode will return the same number for all anagrams.
char[] chars = inputString.toCharArray();
Arrays.sort(chars); // Arrays.sort sorts in place and returns void
return new String(chars).hashCode();
Note: this will work for all words that are anagrams (no false negatives) but it may not work for all words that are not anagrams (possibly false positives). This is highly unlikely for short words, but once you get to words that are hundreds of characters long, you will start encountering more than one set of anagrams with the same hash code.
Also note: this gives you the answer to the (title of the) question, but it isn't enough for the question you're solving. You need to figure out how to relate this number to an index in your original list.
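One way to connect this to list indexes, sketched under assumptions: instead of the 32-bit hash code, use the sorted string itself as a HashMap key, which rules out false positives, and collect the 1-based indexes of each word under its key. The class and method names are invented for illustration, and the sketch returns every group, including groups of size one; whether single-word groups belong in the answer depends on the original problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AnagramGroups {
    // Groups the 1-based indexes of words that are anagrams of each other.
    static List<List<Integer>> groupAnagrams(List<String> words) {
        // LinkedHashMap keeps groups in order of first appearance.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            char[] chars = words.get(i).toCharArray();
            Arrays.sort(chars);                    // canonical form shared by all anagrams
            String key = new String(chars);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i + 1);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        // Prints [[1, 4], [2, 3]]
        System.out.println(groupAnagrams(Arrays.asList("cat", "dog", "god", "tca")));
    }
}
```

The HashMap still uses String.hashCode internally, but it falls back to equals on a collision, so two different sorted strings never end up in the same group.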

Java indexOf function more efficient than Rabin-Karp? Search Efficiency of Text

I posed a question to Stack Overflow a few weeks ago about creating an efficient algorithm to search for a pattern in a large chunk of text. Right now I am using the String function indexOf to do the search. One suggestion was to use Rabin-Karp as an alternative. I wrote the following little test program to try out an implementation of Rabin-Karp.
public static void main(String[] args) {
String test = "Mary had a little lamb whose fleece was white as snow";
String p = "was";
long start = Calendar.getInstance().getTimeInMillis();
for (int x = 0; x < 200000; x++)
test.indexOf(p);
long end = Calendar.getInstance().getTimeInMillis();
end = end -start;
System.out.println("Standard Java Time->"+end);
RabinKarp searcher = new RabinKarp("was");
start = Calendar.getInstance().getTimeInMillis();
for (int x = 0; x < 200000; x++)
searcher.search(test);
end = Calendar.getInstance().getTimeInMillis();
end = end -start;
System.out.println("Rabin Karp time->"+end);
}
And here is the implementation of Rabin-Karp that I am using:
import java.math.BigInteger;
import java.util.Random;
public class RabinKarp {
private String pat; // the pattern // needed only for Las Vegas
private long patHash; // pattern hash value
private int M; // pattern length
private long Q; // a large prime, small enough to avoid long overflow
private int R; // radix
private long RM; // R^(M-1) % Q
static private long dochash = -1L;
public RabinKarp(int R, char[] pattern) {
throw new RuntimeException("Operation not supported yet");
}
public RabinKarp(String pat) {
this.pat = pat; // save pattern (needed only for Las Vegas)
R = 256;
M = pat.length();
Q = longRandomPrime();
// precompute R^(M-1) % Q for use in removing leading digit
RM = 1;
for (int i = 1; i <= M - 1; i++)
RM = (R * RM) % Q;
patHash = hash(pat, M);
}
// Compute hash for key[0..M-1].
private long hash(String key, int M) {
long h = 0;
for (int j = 0; j < M; j++)
h = (R * h + key.charAt(j)) % Q;
return h;
}
// Las Vegas version: does pat[] match txt[i..i+M-1] ?
private boolean check(String txt, int i) {
for (int j = 0; j < M; j++)
if (pat.charAt(j) != txt.charAt(i + j))
return false;
return true;
}
// check for exact match
public int search(String txt) {
int N = txt.length();
if (N < M)
return -1;
long txtHash;
if (dochash == -1L) {
txtHash = hash(txt, M);
dochash = txtHash;
} else
txtHash = dochash;
// check for match at offset 0
if ((patHash == txtHash) && check(txt, 0))
return 0;
// check for hash match; if hash match, check for exact match
for (int i = M; i < N; i++) {
// Remove leading digit, add trailing digit, check for match.
txtHash = (txtHash + Q - RM * txt.charAt(i - M) % Q) % Q;
txtHash = (txtHash * R + txt.charAt(i)) % Q;
// match
int offset = i - M + 1;
if ((patHash == txtHash) && check(txt, offset))
return offset;
}
// no match
return -1; // was N
}
// a random 31-bit prime
private static long longRandomPrime() {
BigInteger prime = BigInteger.probablePrime(31, new Random()); // new BigInteger(31, rnd) would not be prime
return prime.longValue();
}
// test client
}
The implementation of Rabin-Karp works in that it returns the correct offset of the string I am looking for. What is surprising to me though is the timing statistics that occurred when I ran the test program. Here they are:
Standard Java Time->39
Rabin Karp time->409
This was really surprising. Not only is Rabin-Karp (at least as it is implemented here) not faster than the standard Java indexOf String function, it is slower by an order of magnitude. I don't know what is wrong (if anything). Does anyone have thoughts on this?
Thanks,
Elliott
I answered this question earlier and Elliott pointed out I was just plain wrong. I apologise to the community.
There is nothing magical about the String.indexOf code. It is not natively optimised or anything like that. You can copy the indexOf method from the String source code and it runs just as quickly.
What we have here is the difference between O() efficiency and actual efficiency. For a String of length N and a pattern of length M, Rabin-Karp has a best case of O(N+M) and a worst case of O(NM). When you look into it, String.indexOf() also has a best case of O(N+M) and a worst case of O(NM).
If the text contains many partial matches to the start of the pattern, Rabin-Karp will stay close to its best-case performance, whilst String.indexOf will not. For example, I tested the above code (properly this time :-)) on a million '0's followed by a single '1', and then searched for 1000 '0's followed by a single '1'. This forced String.indexOf to its worst-case performance. For this highly degenerate test, the Rabin-Karp algorithm was about 15 times faster than indexOf.
For natural language text, Rabin-Karp will remain close to best-case and indexOf will only deteriorate slightly. The deciding factor is therefore the complexity of operations performed on each step.
In its innermost loop, indexOf scans for a matching first character. At each iteration it has to:
increment the loop counter
perform two logical tests
do one array access
In Rabin-Karp each iteration has to:
increment the loop counter
perform two logical tests
do two array accesses (actually two method invocations)
update a hash, which above requires 9 numerical operations
Therefore at each iteration Rabin-Karp will fall further and further behind. I tried simplifying the hash algorithm to just XOR characters, but I still had an extra array access and two extra numerical operations so it was still slower.
Furthermore, when a match is found, Rabin-Karp only knows the hashes match and must therefore test every character, whereas indexOf already knows the first character matches and therefore has one less test to do.
Having read on Wikipedia that Rabin-Karp is used to detect plagiarism, I took the Bible's Book of Ruth, removed all punctuation and made everything lower case, which left just under 10000 characters. I then searched for "andthewomenherneighboursgaveitaname", which occurs near the very end of the text. String.indexOf was still faster, even with just the XOR hash. However, if I removed String.indexOf's advantage of being able to access String's private internal character array and forced it to copy the character array, then, finally, Rabin-Karp was genuinely faster.
However, I deliberately chose that text as there are 213 "and"s in the Book of Ruth and 28 "andthe"s. If instead I searched just for the last characters "ursgaveitaname", well there are only 3 "urs"s in the text so indexOf returns closer to its best-case and wins the race again.
As a fairer test I chose random 20 character strings from the 2nd half of the text and timed them. Rabin-Karp was about 20% slower than the indexOf algorithm run outside of the String class, and 70% slower than the actual indexOf algorithm. Thus even in the use case it is supposedly appropriate for, it was still not the best choice.
So what good is Rabin-Karp? No matter the length or nature of the text to be searched, at every character compared it will be slower. No matter what hash function we choose, we are surely required to make an additional array access and at least two numerical operations. A more complex hash function will give us fewer false matches, but require more numerical operations. There is simply no way Rabin-Karp can ever keep up.
As demonstrated above, if we need to find a match prefixed by an often repeated block of text, indexOf can be slower, but if we know we are doing that it would look like we would still be better off using indexOf to search for the text without the prefix and then check to see if the prefix was present.
Based on my investigations today, I cannot see any time when the additional complexity of Rabin Karp will pay off.
Here is the source to java.lang.String. indexOf is line 1770.
My suspicion is that, because you are using it on such a short input string, the extra overhead of the Rabin-Karp algorithm over the seemingly naive implementation of java.lang.String's indexOf dominates, so you aren't seeing the true performance of the algorithm. I would suggest trying it on a much longer input string to compare performance.
From my understanding, Rabin-Karp is best used when searching a block of text for multiple words/phrases.
Think about a bad word search, for flagging abusive language.
If you have a list of 2000 words, including derivations, then you would need to call indexOf 2000 times, one for each word you are trying to find.
RabinKarp helps with this by doing the search the other way around.
Make a 4 character hash of each of the 2000 words, and put that into a dictionary with a fast lookup.
Now, for each 4 characters of the search text, hash and check against the dictionary.
As you can see, the search is now the other way around - we're searching the 2000 words for a possible match instead.
Then we get the string from the dictionary and do an equals to check to be sure.
It's also a faster search this way, because we're searching a dictionary instead of string matching.
Now, imagine the WORST case scenario of doing all those indexOf searches - the very LAST word we check is a match ...
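The dictionary idea described above might look like the following sketch. Several things here are assumptions made for illustration: the class and method names, the window length of 4, and the decision to skip words shorter than the window. It also swaps in a plain HashMap keyed by the 4-character substring, so each window is hashed from scratch by String.hashCode; a real Rabin-Karp implementation would update a rolling hash instead of creating substrings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MultiPatternSearch {
    static final int WINDOW = 4;   // assumed fixed window length

    // Returns the words from the list that occur somewhere in text.
    static Set<String> findWords(String text, List<String> words) {
        // Dictionary: first WINDOW characters -> all words starting with them.
        Map<String, List<String>> byPrefix = new HashMap<>();
        for (String w : words) {
            if (w.length() < WINDOW) {
                continue;          // sketch limitation: shorter words are ignored
            }
            byPrefix.computeIfAbsent(w.substring(0, WINDOW), k -> new ArrayList<>()).add(w);
        }
        Set<String> found = new TreeSet<>();
        // One pass over the text, one dictionary lookup per window.
        for (int i = 0; i + WINDOW <= text.length(); i++) {
            List<String> candidates = byPrefix.get(text.substring(i, i + WINDOW));
            if (candidates == null) {
                continue;
            }
            for (String w : candidates) {
                if (text.startsWith(w, i)) {   // full comparison to confirm the match
                    found.add(w);
                }
            }
        }
        return found;
    }
}
```

The text is scanned once regardless of how many words are in the list, which is the point of turning the search around.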
The Wikipedia article for Rabin-Karp even mentions its inferiority in the situation you describe. ;-)
http://en.wikipedia.org/wiki/Rabin-Karp_algorithm
But this is only natural to happen!
First of all, your test input is too trivial.
indexOf finds the index of "was" by searching a small buffer (String's internal char array), while Rabin-Karp has to do preprocessing to set up its data, which takes extra time.
To see a difference you would have to search for expressions in a really large text.
Also please note that more sophisticated string search algorithms can have "expensive" setup/preprocessing in order to provide a really fast search.
In your case you just search for "was" in a sentence. In any case, you should always take the input into account.
Without looking into details, two reasons come to my mind:
you are very likely to outperform standard API implementations only for very special cases. I do not consider "Mary had a little lamb whose fleece was white as snow" to be such.
microbenchmarking is very difficult and can give quite misleading results. Here is an explanation, here a list of tools you could use
Not only simply try a longer static string, but try generating random long strings and inserting the search target into a random location each time. Without randomizing it, you will see a fixed result for indexOf.
EDITED:
Random is the wrong concept. Most text is not truly random. But you would need a lot of different long strings to be effective, and not just testing the same String multiple times. I am sure there are ways to extract "random" large Strings from an even larger text source, or something like that.
For this kind of search, Knuth-Morris-Pratt may perform better. In particular if the sub-string doesn't just repeat characters, then KMP should outperform indexOf(). Worst case (string of all the same characters) it will be the same.
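For reference, a compact sketch of Knuth-Morris-Pratt is shown below. The class name Kmp is made up, and the method mirrors the contract of String.indexOf (first match index, or -1). It makes no claim about real-world speed relative to indexOf; it only shows the technique: a failure table built from the pattern lets the search resume after a mismatch without ever moving backwards in the text.

```java
class Kmp {
    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int m = pattern.length();
        if (m == 0) {
            return 0;                       // same convention as String.indexOf
        }
        // fail[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of pattern[0..i].
        int[] fail = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        // Scan the text once; k is the number of pattern characters matched so far.
        for (int i = 0, k = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];            // fall back inside the pattern only
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == m) {
                return i - m + 1;           // full match ends at position i
            }
        }
        return -1;
    }
}
```

Preprocessing costs O(M) and the scan costs O(N), so the worst case is O(N+M) rather than O(NM).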
