I have a method that searches for an element using binary search, and I have written two test methods for it. The first asserts that the index returned by the method matches the index at which I placed the element for the test. The second asserts that the method returns -1 when the element is not in the array.
Are those two methods sufficient?
I am using Java.
How about these cases:
The element occurs twice in the array.
If the array stores objects (rather than primitives), the case when the array has an element that is equal to the object you're searching for, but is not the same object.
Searching for null.
When the array has no elements.
Searching for an element that's greater than everything in the array.
Searching for an element that's less than everything in the array.
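Here's a minimal sketch of how a few of these cases might look as JUnit 4 tests. It assumes a hypothetical static method BinarySearch.search(int[] array, int key) that returns the index of key, or -1 if it is absent; adapt the names to your own API.
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class BinarySearchEdgeCaseTest {

    // Searching an empty array should report "not found".
    @Test
    public void emptyArrayReturnsMinusOne() {
        assertEquals(-1, BinarySearch.search(new int[] {}, 42));
    }

    // A key smaller than every element must not be "found" at index 0.
    @Test
    public void keySmallerThanAllElementsReturnsMinusOne() {
        assertEquals(-1, BinarySearch.search(new int[] {10, 20, 30}, 5));
    }

    // A key larger than every element must not run past the end.
    @Test
    public void keyLargerThanAllElementsReturnsMinusOne() {
        assertEquals(-1, BinarySearch.search(new int[] {10, 20, 30}, 99));
    }

    // With duplicates, any index holding the key is acceptable here;
    // tighten this if your contract promises the first occurrence.
    @Test
    public void duplicateKeyReturnsSomeMatchingIndex() {
        int[] data = {10, 20, 20, 30};
        int i = BinarySearch.search(data, 20);
        assertEquals(20, data[i]);
    }
}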
Your tests should align with the business and functional requirements of your application. It sounds like you've covered the "happy path" scenario. Now you'll want to focus on edge cases, which may include things like searching against empty strings or searching for a "-1".
What is difference between a.remove(a.size()-1) and a.remove(a.indexOf(a.lastElement())) of Vectors Class in Java? Do they remove the same element?
a.remove(a.indexOf(a.lastElement())) gave me the wrong output, whereas a.remove(a.size()-1) gives the correct output.
Note: a is a Java Vector declared as
Vector<Integer> a = new Vector<Integer>();
a.remove(a.indexOf(a.lastElement())) is a very roundabout way of achieving roughly the same thing.
It gets the last element in the vector, finds the index of the first element that is equal to it, and then removes the element at that index.
That is only roughly the same as a.remove(a.size()-1), because if the vector contains an earlier object that is equal to the last one (i.e. last.equals(otherElement) returns true), then that earlier item will be removed instead.
a.remove(a.size()-1) is definitely the more correct (and faster) way to remove the last element.
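Here's a small sketch showing the difference when the last element has an earlier duplicate (the values are made up for illustration):
import java.util.Vector;

public class RemoveLastDemo {
    public static void main(String[] args) {
        Vector<Integer> a = new Vector<Integer>();
        a.add(7);
        a.add(3);
        a.add(7); // the last element equals the element at index 0

        // indexOf(lastElement()) finds the FIRST 7, so index 0 is removed.
        Vector<Integer> byIndexOf = new Vector<Integer>(a);
        byIndexOf.remove(byIndexOf.indexOf(byIndexOf.lastElement()));
        System.out.println(byIndexOf); // [3, 7]

        // size()-1 always removes the element in the last position.
        Vector<Integer> bySize = new Vector<Integer>(a);
        bySize.remove(bySize.size() - 1);
        System.out.println(bySize); // [7, 3]
    }
}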
I've just come across some odd behaviour I wouldn't expect from an ArrayList<String> in Java. This is coming, for sure, from my poor understanding of references in Java.
Let me show you this piece of code:
List<String> myList = new ArrayList<>();
myList.add("One");
myList.add("Two");
myList.add("Two");
myList.add("Three");
for (String s : myList) {
    System.out.println(myList.indexOf(s));
}
This piece of code provides the following output:
0
1
1
3
How come? I deliberately added two Strings containing the same characters ("Two"), but the objects themselves shouldn't be the same. What am I misunderstanding here? I was expecting this other output:
0
1
2
3
ArrayList.indexOf() doesn't use reference equality to find the object. It uses the equals() method. Notice what the documentation says (emphasis mine):
returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i))), or -1 if there is no such index.
Thus, it will match on the first string that is logically equal.
EDIT:
Andremoniy's comment is absolutely right. In the case of string literals, because they are interned, they will also happen to be the same reference. So your two "Two" strings are actually the same reference in this case.
System.out.println("Two" == "Two"); // will return true because they are the same reference.
It's simply because indexOf returns the first occurrence of the item in the list that is equal to the given string. See its documentation:
Returns the index of the first occurrence of the specified element in this list, or -1 if this list does not contain the element. More formally, returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i))), or -1 if there is no such index.
You'll have to note two points:
Most probably you are using the same String instance, since the literal "Two" gets interned: all occurrences of this literal refer to the same instance.
List.indexOf() doesn't compare items by == (that is, object identity) but using equals(), i.e. some class-defined way to compare two objects for equality (which makes perfect sense, as otherwise you wouldn't be able to find anything in the list unless you already had a reference to it). So even two different String objects (e.g. created by new String("Two")) would still produce the same output.
For completeness, the quote from the Javadoc of indexOf (as already mentioned in the other answers):
returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i))), or -1 if there is no such index.
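To see the second point on its own, here's a small sketch that forces distinct String objects with new String(...):
import java.util.ArrayList;
import java.util.List;

public class IndexOfEqualsDemo {
    public static void main(String[] args) {
        List<String> myList = new ArrayList<>();
        myList.add(new String("One"));
        myList.add(new String("Two"));
        myList.add(new String("Two")); // a different object, but equals() the previous one
        myList.add(new String("Three"));

        // Still prints 1: indexOf() matches by equals(), not by reference.
        System.out.println(myList.indexOf(myList.get(2)));

        // Reference comparison shows the two objects really are distinct.
        System.out.println(myList.get(1) == myList.get(2)); // false
    }
}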
Java doesn't allow you to make a distinction between these two, but you have stumbled across the difference between a method and a function.
Simply put, a method may change the state of an object; a function will not. So calling add(String) will change the state of the List: specifically, it adds the String to the list. indexOf(String), however, is not a method in this sense, it is a function. Now, sure, Java calls both of them methods because that's what they call them, and it's conceivable that the implementation could change the state. But we know that it doesn't, by contract.
A function, given the same inputs (the current state of the underlying object being one of those inputs), will always return the same result. Always. That's what's great about a function: you can call a true function as many times as you want and always get the same result, as long as your inputs and the underlying data haven't changed.
Some folks at MIT did research into the analysis of functions in Java (which, to avoid confusion, they call "pure methods"). It would be nice if there were a framework that allowed you to specify that a particular method was indeed a function (or, as they call it, pure) and then have an analyzer make sure you didn't accidentally introduce a mutation into code protected by that annotation.
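As a small illustration of the distinction (nothing here is specific to the question beyond List itself):
import java.util.ArrayList;
import java.util.List;

public class MethodVsFunctionDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();

        // add() mutates the list: the observable state changes.
        list.add("One");
        list.add("Two");
        System.out.println(list.size()); // 2

        // indexOf() behaves like a function: same list, same argument,
        // same result, no matter how often you call it.
        System.out.println(list.indexOf("Two")); // 1
        System.out.println(list.indexOf("Two")); // 1, and the list is unchanged
    }
}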
Can someone please explain how, in Java, you find the middle element of a linked list in a single pass?
I have googled it, but cannot seem to find a simple explanation on how to code it.
LinkedList<String> list = new LinkedList<>();
list.add("foo");
list.add("bar");
list.add("baz");
String middle = list.get(list.size()/2);
System.out.println(middle); // bar
The call to assign middle will pass through half of the list during the get call.
As pointed out in the comments, the middle is the worst place to operate on a LinkedList. Consider using another variation, such as ArrayList.
I think this is a sort of trick question that you see on lists of possible interview questions.
One solution would be to have two pointers to step through the list, one taking steps of two and one taking steps of one.
When the pointer that is taking two steps at a time reaches the end of the list, the one taking only one step will be halfway through.
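Here's a rough sketch of that idea; the Node class is made up for the example, since java.util.LinkedList doesn't expose its internal nodes:
public class MiddleOfList {

    // Minimal singly linked node, just for this sketch.
    static class Node {
        final String value;
        Node next;
        Node(String value) { this.value = value; }
    }

    // Returns the middle node in one pass using slow/fast pointers.
    static Node middle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;      // one step
            fast = fast.next.next; // two steps
        }
        return slow; // for an even-sized list this is the second of the two middles
    }

    public static void main(String[] args) {
        Node head = new Node("foo");
        head.next = new Node("bar");
        head.next.next = new Node("baz");
        System.out.println(middle(head).value); // bar
    }
}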
I doubt that this practice is really useful, though.
good luck!
Since it's a LinkedList, you won't be able to find out its size until after the first (and only) pass. To find the middle element, you need to know two things: which index is in the middle, and what the value of the element at that index is. Finding the middle index is easy: just make one pass through the list, keeping a counter of how many nodes there are. As you do this, you'll need to keep track of each element in a separate data structure, perhaps an ArrayList, since you're only allowed one pass through the LinkedList. Once you're done, halve the counter to find the middle index, and return the ArrayList element at that index.
In Java, that looks roughly like this:
static String middle(LinkedList<String> list) {
    int count = 0;
    List<String> elements = new ArrayList<>();
    for (String node : list) {
        count++;
        elements.add(node);
    }
    return elements.get(count / 2);
}
Of course, you'll need to take care of the case where there isn't a single middle element.
I have a large collection of Strings. I want to be able to find the Strings that begin with "Foo" or the Strings that end with "Bar". What would be the best Collection type to get the fastest results? (I am using Java)
I know that a HashSet is very fast for complete matches, but not for partial matches, I would think. So, what could I use instead of just looping through a List? Should I look into LinkedLists or similar types? Are there any Collection types that are optimized for this kind of query?
The best collection type for this problem is SortedSet. You would need two of them in fact:
Words in regular order.
Words with their characters inverted.
Once these SortedSets have been created, you can use method subSet to find what you are looking for. For example:
Words starting with "Foo":
forwardSortedSet.subSet("Foo","Fop");
Words ending with "Bar":
backwardSortedSet.subSet("raB","raC");
The reason we are "adding" 1 to the last search character is to obtain the whole range. The "ending" word is excluded from the subSet, so there is no problem.
EDIT: Of the two concrete classes that implement SortedSet in the standard Java library, use TreeSet. The other (ConcurrentSkipListSet) is oriented to concurrent programs and thus not optimized for this situation.
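Here's a minimal sketch of the whole approach with TreeSet; the word list and helper names are made up for illustration:
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PrefixSuffixSearch {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("FooFighter", "CrowBar", "FooBar", "Bazaar");

        TreeSet<String> forwardSortedSet = new TreeSet<>(words);
        TreeSet<String> backwardSortedSet = new TreeSet<>();
        for (String w : words) {
            backwardSortedSet.add(reverse(w));
        }

        // Words starting with "Foo": everything in ["Foo", "Fop").
        System.out.println(forwardSortedSet.subSet("Foo", "Fop")); // [FooBar, FooFighter]

        // Words ending with "Bar": reversed, they start with "raB".
        for (String reversed : backwardSortedSet.subSet("raB", "raC")) {
            System.out.println(reverse(reversed)); // FooBar, then CrowBar
        }
    }
}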
It's been a while but I needed to implement this now and did some testing.
I already have a HashSet<String> as the source, so generation of all other data structures is included in the search time. 100 different sources are used, and each time the data structures need to be regenerated. I only need to match a few single Strings each time. These tests ran on Android.
Methods:
1. Simple loop through the HashSet and call endsWith() on each string.
2. Simple loop through the HashSet and perform a precompiled Pattern match (regex) on each string.
3. Convert the HashSet to a single String joined by \n and do a single match on the whole String.
4. Generate a TreeSet with reversed Strings from the HashSet, then match with subSet() as explained by @Mario Rossi.
Results:
Duration for method 1: 173ms (data setup:0ms search:173ms)
Duration for method 2: 6909ms (data setup:0ms search:6909ms)
Duration for method 3: 3026ms (data setup:2377ms search:649ms)
Duration for method 4: 2111ms (data setup:2101ms search:10ms)
Conclusion:
SortedSet/TreeSet is extremely fast in searching, much faster than just looping through all the Strings. However, creating the structure takes a lot of time. Regexes are much slower, but generating a single large String out of hundreds of Strings is more of a bottleneck on Android/Java.
If only a few matches need to be made, you are better off looping through your collection. If you have many more matches to make, it may be very useful to use a TreeSet!
If the list of words is stable (not many words are added or deleted), a very good second alternative is to create 2 lists:
One with the words in normal order.
The second with the characters in each word reversed.
For speed purposes, make them ArrayLists. Never LinkedLists or other variants, which perform extremely badly on random access (the core of binary search; see below).
After the lists are created, they can be sorted with method Collections.sort (only once each) and then searched with Collections.binarySearch. For example:
Collections.sort(forwardList);
Collections.sort(backwardList);
And then to search for words starting in "Foo":
int i = Collections.binarySearch(forwardList, "Foo");
if (i < 0) {
    i = -i - 1; // "Foo" itself is not in the list: start at the insertion point
}
while (i < forwardList.size() && forwardList.get(i).startsWith("Foo")) {
    // Process String forwardList.get(i)
    i++;
}
And words ending in "Bar":
int i = Collections.binarySearch(backwardList, "raB");
if (i < 0) {
    i = -i - 1; // "raB" itself is not in the list: start at the insertion point
}
while (i < backwardList.size() && backwardList.get(i).startsWith("raB")) {
    // Process String backwardList.get(i)
    i++;
}
I've got about 2500 short phrases in a file. I want to be able to find phrases as I type possible substrings of them. My app has a text box and a list of phrases. The text box is initially empty and the list contains all 2500 phrases, since the empty string is a substring of all of them. As I type in the text box, the list updates so that it always only contains phrases which contain the text box's value as a substring.
At the moment I have one of Google's Multimaps, specifically:
LinkedHashMultimap<String, String>
with every single possible substring mapped to its possible matches. This takes a while to load (about a second), and I think it must be taking up quite a bit of space (which may be a concern in the future). It's very fast with the lookups, though.
Is there a way I could do this with some other data structure or strategy that would be quicker to load and take less space (possibly at the expense of the speed of the lookups)?
If your list only contains 2500 elements, a simple loop and checking contains() on all of them should be fast enough.
If it grows bigger and/or is too slow, you can apply some easy optimizations:
Don't search immediately as the user types each character, but introduce a small delay. So if the user types "foobar" really fast, you only search for "foobar", not first for "f", then "fo", then "foo", and so on.
Reuse your previous results: if the user first types "foo" and then extends it to "foobar", don't search the whole original list again, but search within the results for "foo" (because everything that contains "foobar" must contain "foo"); see the sketch at the end of this answer.
In my experience, these basic optimizations already get you quite far.
Now, if the list grows so big that even that is too slow, some "smarter" optimizations as proposed in other answers here (tries, suffix trees,...) would be needed.
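As a rough illustration of the second optimization (reusing the previous results), here's a sketch; the class and method names are made up for the example:
import java.util.ArrayList;
import java.util.List;

public class IncrementalFilter {

    private final List<String> allPhrases;
    private String lastQuery = "";
    private List<String> lastResults;

    public IncrementalFilter(List<String> allPhrases) {
        this.allPhrases = allPhrases;
        this.lastResults = new ArrayList<>(allPhrases);
    }

    // Returns all phrases containing 'query'; narrows the previous
    // results when the new query merely extends the old one.
    public List<String> search(String query) {
        List<String> source =
                query.startsWith(lastQuery) ? lastResults : allPhrases;
        List<String> results = new ArrayList<>();
        for (String phrase : source) {
            if (phrase.contains(query)) {
                results.add(phrase);
            }
        }
        lastQuery = query;
        lastResults = results;
        return results;
    }
}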
You'll want to look into using the Trie data structure.
Try simply looping over the entire list and calling contains() - doing that 2500 times is probably completely unnoticeable.
You definitely need a Suffix Tree (wiki).
(I think this implementation could be OK: link)
EDIT:
I've read your comment: you shouldn't blindly check whether the string is a substring anywhere in your phrases, since you usually start typing a word, not a space. So maybe it's better to tokenize the words inside your phrases?
Are you allowed to do that? Otherwise, the best way is to build an automaton for every phrase, or to use similar algorithms (for example the Karp-Rabin string search algorithm).
Wouter Coekaerts has a good approach, but I would go a bit further.
Don't bring up anything when the textbox contains a single character. The results won't be useful. You may find that this is true for two characters as well.
Precompute the results for two characters. When there are two characters bring up the precomputed list.
When a third character is added do the 'contains' search on the list you have currently displayed (anything that doesn't contain c1c2 can't contain c1c2c3). By now the list should be small enough that 'contains' has perfectly adequate performance.
Similarly for four characters etc.
As said above, put in a little delay before starting the search. Or better still, arrange for a search to be cancelled if another character is typed before it finishes.
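A rough sketch of the delay/cancel idea using plain Java's ScheduledExecutorService (the class and method names are made up, and the wiring to the text box's listener is framework-specific and left out):
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DebouncedSearch {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    // Call this from the text box's change listener.
    public synchronized void onTextChanged(String query) {
        if (pending != null) {
            pending.cancel(false); // drop a search that hasn't started yet
        }
        pending = scheduler.schedule(() -> runSearch(query), 200, TimeUnit.MILLISECONDS);
    }

    private void runSearch(String query) {
        // Filter the phrase list here and push the results to the UI thread.
        System.out.println("Searching for: " + query);
    }
}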