Binary Search - Error - java

I have a list of students, and I would like to sort them by last name.
The student list looks a bit like this:
Amanda
Dorris
Tucker
Yasmin
Zara
I would like to use the binary search approach to search through these students and output the desired result.
This is what I have so far:
public void binarySearch(String keyword) {
int output;
if (fileSorted == false) {
System.out.println("The file " + fileName + " is not sorted. Please wait while it gets sorted...");
bubbleSort();
System.out.println("Thank you for your patience.");
System.out.println();
System.out.print("Search for: ");
keyword = elmo.nextLine();
output = doBinarySearch(keyword);
} else {
output = doBinarySearch(keyword);
}
System.out.println(output);
}
public int doBinarySearch(String keyword) {
int start = 0;
int end = numStudents - 1;
int mid;
int result;
while (start < end) {
mid = start + (end - start) / 2;
result = students[mid].returnLastName().compareToIgnoreCase(keyword);
if (result == 0) {
return mid;
} else if ((end - start) <= 1 ) {
return -1;
} else if (result > 0) {
start = mid;
} else if (result < 0) {
end = mid;
}
}
return -1;
}

The line
mid = ((end - start) / 2);
is wrong. You need to set mid to (roughly) the midpoint of start and end, so
mid = start + (end - start) / 2;
or
mid = (end + start) / 2;
if you're not afraid of overflow.
With what you have, mid is always in the first half of the array.
Also, you have your cases
} else if (result > 0) {
start = mid;
} else if (result < 0) {
end = mid;
}
wrong.
result = students[mid].returnLastName().compareToIgnoreCase(keyword);
returns a positive number when the last name of students[mid] is lexicographically greater than keyword, so then you need to change end, not start.

Instead of using inequality in your loop condition -- while (start != end) -- use while (start < end). This is the typical approach. When you test for equality, you make the assumption that start and end only change by one in each iteration, and that may not necessarily be true.
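Putting the corrections from the answers together, one possible version of the search could look like the sketch below. It assumes the last names are already sorted in an order consistent with compareToIgnoreCase, and it replaces the question's students array with a plain String[] (the class name LastNameSearch and the array parameter are stand-ins for illustration). It uses the inclusive-bounds form, start <= end with mid + 1 / mid - 1 updates, which is a choice made for this sketch so that the range shrinks on every pass.

```java
// Hypothetical stand-in for the question's student array: just the last names, already sorted.
final class LastNameSearch {

    // Returns the index of keyword in sortedLastNames (ignoring case), or -1 if absent.
    static int doBinarySearch(String[] sortedLastNames, String keyword) {
        int start = 0;
        int end = sortedLastNames.length - 1;
        while (start <= end) {
            int mid = start + (end - start) / 2;   // midpoint of [start, end], overflow-safe
            int result = sortedLastNames[mid].compareToIgnoreCase(keyword);
            if (result == 0) {
                return mid;
            } else if (result > 0) {
                end = mid - 1;    // name at mid sorts after keyword: search the left half
            } else {
                start = mid + 1;  // name at mid sorts before keyword: search the right half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] names = {"Amanda", "Dorris", "Tucker", "Yasmin", "Zara"};
        System.out.println(doBinarySearch(names, "tucker")); // 2
        System.out.println(doBinarySearch(names, "Bob"));    // -1
    }
}
```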

Related

Implementing binary search on an array of Strings

I'm having a bit of trouble with this. The input array is based on file input and the size of the array is specified by the first line in the file. The binarySearch method seems to look alright but it doesn't seem to be working. Would anybody be able to help? Thanks.
public static int binarySearch(String[] a, String x) {
int low = 0;
int high = a.length - 1;
int mid;
while (low <= high) {
mid = (low + high) / 2;
if (a[mid].compareTo(x) < 0) {
low = mid + 1;
} else if (a[mid].compareTo(x) > 0) {
high = mid - 1;
} else {
return mid;
}
}
return -1;
}
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
System.out.println("Enter the name of your file (including file extension): ");
String filename = input.next();
String[] numArray;
try (Scanner in = new Scanner(new File(filename))) {
int count = in.nextInt();
numArray = new String[count];
for (int i = 0; in.hasNextInt() && count != -1 && i < count; i++) {
numArray[i] = in.nextLine();
}
for (int i = 0; i < count; i++) //printing all the elements
{
System.out.println(numArray[i]);
}
String searchItem = "The";
System.out.println("The position of the String is:");
binarySearch(numArray, searchItem);
} catch (final FileNotFoundException e) {
System.out.println("That file was not found. Program terminating...");
e.printStackTrace();
}
}
I have added the following example for your further reference.
import java.util.Arrays;
public class BinaryStringSearch {
public static void main(String[] args) {
String array[] ={"EWRSFSFSFSB","BB","AA","SDFSFJ","WRTG","FF","ERF","FED","TGH"};
String search = "BB";
Arrays.sort(array);
int searchIndex = binarySearch(array,search);
System.out.println(searchIndex != -1 ? array[searchIndex]+ " - Index is "+searchIndex : "Not found");
}
public static int binarySearch(String[] a, String x) {
int low = 0;
int high = a.length - 1;
int mid;
while (low <= high) {
mid = (low + high) / 2;
if (a[mid].compareTo(x) < 0) {
low = mid + 1;
} else if (a[mid].compareTo(x) > 0) {
high = mid - 1;
} else {
return mid;
}
}
return -1;
}
}
I hope it will help:
public static void main(String ar[]){
String str[] = {"account","angel","apple","application","black"};
String search= "application";
int length = str.length-1;
BinarySearchInString findStrin = new BinarySearchInString();
int i = findStrin.find(str, 0, length, search);
System.out.println(i);
}
public int find(String first[], int start, int end, String searchString){
// Base case: the range is empty, so the string is not in the array
if(start > end){
return -1;
}
int mid = start + (end-start)/2;
int cmp = first[mid].compareTo(searchString);
if(cmp == 0){
return mid;
}
if(cmp > 0){
return find(first, start, mid-1, searchString);
}
return find(first, mid+1, end, searchString);
}
As spotted by @Mike Rylander you forgot to output the result.
You should use Arrays.binarySearch instead of your own implementation.
(As a general rule, JRE libraries are well-tested, well-documented and fast. I googled "java binary search" and this question is well-ranked. I had a try with @Thusitha Indunil's code, which didn't appear to work. I googled harder and found Arrays.binarySearch, which worked.)
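As a rough illustration of the library route, the sketch below sorts a copy of the array and searches it with java.util.Arrays.binarySearch. The class name, the sample words, and the choice of String.CASE_INSENSITIVE_ORDER are assumptions made for the example; the important point is that the array must be sorted with the same ordering that the search uses, and that a missing key produces a negative result rather than -1 specifically.

```java
import java.util.Arrays;

final class LibrarySearchDemo {

    // Sorts a copy case-insensitively and searches it with the same comparator.
    // Returns the index in the sorted copy, or a negative value when key is absent.
    static int search(String[] words, String key) {
        String[] sorted = words.clone();
        Arrays.sort(sorted, String.CASE_INSENSITIVE_ORDER);
        return Arrays.binarySearch(sorted, key, String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        String[] words = {"the", "Quick", "brown", "Fox"};
        // Sorted copy is: brown, Fox, Quick, the
        System.out.println(search(words, "The")); // 3
        System.out.println(search(words, "cat")); // -2, i.e. -(insertion point) - 1
    }
}
```

When the key is absent, the returned value encodes where it would be inserted, so a check such as `result >= 0` is the reliable test for "found".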
I believe that your problem is that you forgot to output the results. Try replacing binarySearch(numArray, searchItem); with System.out.println(binarySearch(numArray, searchItem));
There are four issues that I can spot in addition to the already mentioned missing print of the result.
1: You have a String[] called numArray and you are searching it for a String, "The", so the name numArray is a bit misleading.
2: I assume you have a specified input file format where the number of Strings in the file is given by an integer as the first token in the file. That is fine; however, the for loop that populates numArray checks in.hasNextInt() but then takes the next token out of the Scanner using in.nextLine(). You should use matching check/removal methods, such as in.hasNext() with in.next(). Check out the Scanner API.
3: The binarySearch(String[], String) method uses String.compareTo(String), which determines the lexicographical ordering of this String relative to the parameter String. Comparing upper case to lower case may not yield what you expect, as "the".compareTo("The") will not result in 0. You should check out the String API for options to either force all of your input to one case, maybe while reading the file, or use a different flavor of compare method, such as compareToIgnoreCase.
4: The last thing that I see may be considered a bit of a corner case. With a sufficiently large String array, and a search string that resides far to the right side (i.e. the high-index side) of the array, you may get an ArrayIndexOutOfBoundsException. This is because (low + high) can overflow to a negative value when the true sum is greater than Integer.MAX_VALUE. The negative sum divided by two is still negative. This can be solved by using an unsigned right shift instead of dividing by 2: (low + high) >>> 1. Joshua Bloch has a great article about this common flaw in divide and conquer algorithms.
int low = fromIndex;
int high = toIndex - 1;
while (low <= high) {
int mid = (low + high) >>> 1;
Comparable midVal = (Comparable) a[mid];
int cmp = midVal.compareTo(key);
if (cmp < 0)
low = mid + 1;
else if (cmp > 0)
high = mid - 1;
else
return mid;
}
return -(low + 1);
}
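To make the overflow in point 4 concrete, here is a small sketch that compares three ways of computing the midpoint for two large, non-negative indices. The class and method names are made up for the demonstration, and the index values are chosen only so that their sum exceeds Integer.MAX_VALUE; no real array of that size is allocated.

```java
final class MidpointOverflowDemo {

    // Naive midpoint: low + high can wrap around to a negative int.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // The unsigned shift reads the wrapped sum as an unsigned value,
    // so the result is correct for any non-negative low and high.
    static int shiftMid(int low, int high) {
        return (low + high) >>> 1;
    }

    // Alternative that never overflows when 0 <= low <= high.
    static int offsetMid(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        int low = 1_500_000_000;
        int high = 2_000_000_000;
        System.out.println(naiveMid(low, high));  // -397483648
        System.out.println(shiftMid(low, high));  // 1750000000
        System.out.println(offsetMid(low, high)); // 1750000000
    }
}
```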

How to use Binary Search to find duplicates in sorted array?

I am attempting to expand a function to find the number of integer matches through Binary Search by resetting the high variable, but it gets stuck in a loop. I am guessing a workaround would be to duplicate this function to obtain the last index to determine the number of matches, but I do not think this would be such an elegant solution.
From this:
public static Matches findMatches(int[] values, int query) {
int firstMatchIndex = -1;
int lastMatchIndex = -1;
int numberOfMatches = 0;
int low = 0;
int mid = 0;
int high = values[values.length - 1];
boolean searchFirst = false;
while (low <= high){
mid = (low + high)/2;
if (values[mid] == query && firstMatchIndex == -1){
firstMatchIndex = mid;
if (searchFirst){
high = mid - 1;
searchFirst = false;
} else {
low = mid + 1;
}
} else if (query < values[mid]){
high = mid - 1;
} else {
low = mid + 1;
}
}
if (firstMatchIndex != -1) { // First match index is set
return new Matches(firstMatchIndex, numberOfMatches);
}
else { // First match index is not set
return new Matches(-1, 0);
}
}
To something like this:
public static Matches findMatches(int[] values, int query) {
int firstMatchIndex = -1;
int lastMatchIndex = -1;
int numberOfMatches = 0;
int low = 0;
int mid = 0;
int high = values[values.length - 1];
boolean searchFirst = false;
while (low <= high){
mid = (low + high)/2;
if (values[mid] == query && firstMatchIndex == -1){
firstMatchIndex = mid;
if (searchFirst){
high = values[values.length - 1]; // This is stuck in a loop
searchFirst = false;
}
} else if (values[mid] == query && lastMatchIndex == -1){
lastMatchIndex = mid;
if (!searchFirst){
high = mid - 1;
} else {
low = mid + 1;
}
} else if (query < values[mid]){
high = mid - 1;
} else {
low = mid + 1;
}
}
if (firstMatchIndex != -1) { // First match index is set
return new Matches(firstMatchIndex, numberOfMatches);
}
else { // First match index is not set
return new Matches(-1, 0);
}
}
There is a problem with your code:
high = values[values.length - 1];
should be
high = values.length - 1;
Also, you do not need variables like numberOfMatches and searchFirst; a much simpler solution is possible.
Coming to the problem itself: I understand what you want, and I think binary search is appropriate for such a query.
The simplest way to do it is this: once a match is found, walk forward and backward from that index until a mismatch occurs. That yields firstMatchIndex and numberOfMatches directly, and the extra scan costs time proportional to the number of matches.
So your function should be:
public static Matches findMatches(int[] values, int query)
{
int firstMatchIndex = -1,lastMatchIndex=-1;
int low = 0,mid = 0,high = values.length - 1;
while (low <= high)
{
mid = (low + high)/2;
if(values[mid]==query)
{
lastMatchIndex=mid;
firstMatchIndex=mid;
while(lastMatchIndex+1<values.length&&values[lastMatchIndex+1]==query)
lastMatchIndex++;
while(firstMatchIndex-1>=0&&values[firstMatchIndex-1]==query)
firstMatchIndex--;
return new Matches(firstMatchIndex,lastMatchIndex-firstMatchIndex+1);
}
else if(values[mid]>query)
high=mid-1;
else low=mid+1;
}
return new Matches(-1,0);
}
Couldn't you just use something like a set to find duplicates?
Something like this:
package example;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
public class DuplicatesExample {
public static void main(String[] args) {
String[] strings = { "one", "two", "two", "three", "four", "five", "six", "six" };
List<String> dups = getDups(strings);
System.out.println("DUPLICATES:");
for(String str : dups) {
System.out.println("\t" + str);
}
}
private static List<String> getDups(String[] strings) {
ArrayList<String> rtn = new ArrayList<String>();
HashSet<String> set = new HashSet<>();
for (String str : strings) {
boolean added = set.add(str);
if (added == false ) {
rtn.add(str);
}
}
return rtn;
}
}
Output:
DUPLICATES:
two
six
I have split your problems into two parts - using binary search to find a number and counting the number of matches. The first part is resolved by the search function while the second part is resolved by the findMatches function:
public static Matches findMatches(int[] values, int query) {
int leftIndex = -1;
int rightIndex = -1;
int high = values.length - 1;
int matchedIndex = search(values, 0, high, query);
//if at least one match
if (matchedIndex != -1) {
//decrement upper bound of left array
int leftHigh = matchedIndex - 1;
//increment lower bound of right array
int rightLow = matchedIndex + 1;
//loop until no more duplicates in left array
while (true) {
int leftMatchedIndex = search(values, 0, leftHigh, query);
//if duplicate found
if (leftMatchedIndex != -1) {
leftIndex = leftMatchedIndex;
//decrement upper bound of left array
leftHigh = leftMatchedIndex - 1;
} else {
break;
}
}
//loop until no more duplicates in right array
while(true){
int rightMatchedIndex = search(values, rightLow, high, query);
//if duplicate found
if(rightMatchedIndex != -1){
rightIndex = rightMatchedIndex;
//increment lower bound of right array
rightLow = rightMatchedIndex + 1;
} else{
break;
}
}
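//a side with no further duplicates keeps -1, so fall back to the initial match
if (leftIndex == -1) leftIndex = matchedIndex;
if (rightIndex == -1) rightIndex = matchedIndex;
return new Matches(leftIndex, rightIndex - leftIndex + 1);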
}
return new Matches(-1, 0);
}
private static int search(int[] values, int low, int high, int query) {
while (low <= high) {
int mid = (low + high) / 2;
if (values[mid] == query) {
return mid;
} else if (query < values[mid]) {
high = mid - 1;
} else {
low = mid + 1;
}
}
return -1;
}
I found a solution after correcting a mistake with resetting high variable that caused an infinite loop.
public static Matches findMatches(int[] values, int query) {
int firstMatchIndex = -1;
int lastMatchIndex = -1;
int low = 0;
int high = values.length - 1;
// First pass: after a match keep searching the left half to find the first occurrence
while (low <= high){
int mid = (low + high) >>> 1;
if (values[mid] == query){
firstMatchIndex = mid;
high = mid - 1;
} else if (query < values[mid]){
high = mid - 1;
} else {
low = mid + 1;
}
}
if (firstMatchIndex == -1) { // First match index is not set
return new Matches(-1, 0);
}
// Second pass: reset high, then keep searching the right half to find the last occurrence
low = firstMatchIndex;
high = values.length - 1;
while (low <= high){
int mid = (low + high) >>> 1;
if (values[mid] == query){
lastMatchIndex = mid;
low = mid + 1;
} else if (query < values[mid]){
high = mid - 1;
} else {
low = mid + 1;
}
}
// Every index between the first and last occurrence holds the query value
return new Matches(firstMatchIndex, lastMatchIndex - firstMatchIndex + 1);
}
Finding duplicates is tough when all you know about the data a priori is that it is sorted.
See this:
Binary Search O(log n) algorithm to find duplicate in sequential list?
The following finds the first index of k in a sorted array that may contain duplicates of k.
Of course this relies on knowing the duplicated value in advance, but it is very useful when that value is known.
public static int searchFirstIndexOfK(int[] A, int k) {
int left = 0, right = A.length - 1, result = -1;
// [left : right] is the candidate set.
while (left <= right) {
int mid = left + ((right - left) >>> 1); // left + right >>> 1;
if (A[mid] > k) {
right = mid - 1;
} else if (A[mid] == k) {
result = mid;
right = mid - 1; // Nothing to the right of mid can be
// solution.
} else { // A[mid] < k
left = mid + 1;
}
}
return result;
}
This will find a dupe in log(n) time but is fragile in that the data must be sorted, increase by 1, and start at 0, so that array[i] == i for every index before the duplicate.
static int findDupe(int[] array) {
int low = 0;
int high = array.length - 1;
while (low <= high) {
int mid = (low + high) >>> 1;
if (array[mid] == mid) {
low = mid + 1;
} else {
high = mid - 1;
}
}
System.out.println("returning" + high);
return high;
}

Java: Binary Search Implementation not working using Deferred detection of equality

I am trying to implement binary search in Java. On Wikipedia (http://en.wikipedia.org/wiki/Binary_search_algorithm#Deferred_detection_of_equality) I noticed the deferred detection of equality version, and it works when I use that algorithm. However, when I tried to change the if condition like this:
public int bsearch1(int[] numbers, int key, int start){
int L = start, R = numbers.length - 1;
while(L < R){
//find the mid point value
int mid = (L + R) / 2;
if (numbers[mid] >key){
//move to left
R = mid - 1;
} else{
// move to right, here numbers[mid] <= key
L = mid;
}
}
return (L == R && numbers[L] == key) ? L : -1;
}
It doesn't work properly: it goes into an infinite loop. Do you guys have any ideas about it? Thank you so much.
You've missed the effect of the assert in the Wiki you link to.
It states:
code must guarantee the interval is reduced at each iteration
You must exit if your mid >= R.
Added
The Wiki is actually a little misleading as it suggests that merely ensuring mid < r is sufficient - it is not. You must also guard against mid == l (say you have a 4 entry array and l = 2 and r = 3; mid would become 2 and stick there because 2 + 3 = 5 and 5 / 2 = 2 in integer maths).
The solution is to round up after the / 2 which can be easily achieved by:
int mid = (l + r + 1) / 2;
The final corrected and tidied code goes a little like this:
public int binarySearch(int[] numbers, int key, int start) {
int l = start, r = numbers.length - 1;
while (l < r) {
//find the mid point value
int mid = (l + r + 1) / 2;
if (numbers[mid] > key) {
//move to left
r = mid - 1;
} else {
// move to right, here numbers[mid] <= key
l = mid;
}
}
return (l == r && numbers[l] == key) ? l : -1;
}
public void test() {
int[] numbers = new int[]{1, 2, 5, 6};
for (int i = 0; i < 9; i++) {
System.out.println("Searching for " + i);
System.out.println("Found at " + binarySearch(numbers, i, 0));
}
}
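To see why the rounding direction matters when the update is l = mid, the following sketch counts loop iterations for both midpoint formulas and gives up after a fixed number of steps, so the stuck variant can be observed without hanging. The class name, the steps helper, and the maxSteps cap are all inventions for this demonstration.

```java
final class MidpointRoundingDemo {

    // Counts loop iterations of the "r = mid - 1 / l = mid" search.
    // roundUp selects (l + r + 1) / 2 instead of (l + r) / 2.
    // The loop is abandoned after maxSteps iterations so a stuck search still returns.
    static int steps(int[] numbers, int key, boolean roundUp, int maxSteps) {
        int l = 0, r = numbers.length - 1, count = 0;
        while (l < r && count < maxSteps) {
            int mid = roundUp ? (l + r + 1) / 2 : (l + r) / 2;
            if (numbers[mid] > key) {
                r = mid - 1;
            } else {
                l = mid;   // with a rounded-down mid this can leave l unchanged
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 5, 6};
        System.out.println(steps(numbers, 6, false, 100)); // 100: stuck at l = 2, r = 3
        System.out.println(steps(numbers, 6, true, 100));  // 2: terminates normally
    }
}
```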
There is a trivially similar algorithm here that suggests the correct approach looks more like:
public int binarySearch(int[] numbers, int key) {
int low = 0, high = numbers.length;
while (low < high) {
int mid = (low + high) / 2;
if (numbers[mid] < key) {
low = mid + 1;
} else {
high = mid;
}
}
return low < numbers.length && numbers[low] == key ? low : -1;
}
This takes a slightly different approach to the boundary conditions where high = max + 1 and also works perfectly.

Binary search implementation java algorithm

I need help with implementing the binary search algorithm, can someone tell me what's wrong with my code:
public int bsearch(Item idToSearch) {
int lowerBoundary = 0;
int upperBoundary = myStore.size() - 1;
int mid = -1;
while(upperBoundary >= lowerBoundary) {
mid = (lowerBoundary + upperBoundary) / 2;
//if element at middle is less than item to be searched, than set new lower boundary to mid
if(myStore.get(mid).compareTo(idToSearch) < 0) {
lowerBoundary = mid - 1;
} else {
upperBoundary = mid + 1;
}
} //end while loop
if(myStore.get(mid).equals(idToSearch)) {
return mid;
} else {
return -1; // item not found
}
} // end method
I think you made a mistake when updating lowerBoundary and upperBoundary.
It should be:
if(myStore.get(mid).compareTo(idToSearch) < 0){
lowerBoundary = mid + 1;
} else {
upperBoundary = mid - 1;
}
And why don't you break the loop if you find the element at mid?
I would:
- stop when the lower and upper bound are the same.
- stop when you find a match.
- use mid = (hi + lo) >>> 1; to avoid an overflow causing a bug.
- read the code in the JDK which does this already, as it works correctly.
The first issue is with the boundaries. Also, you should stop as soon as you find the value; your loop doesn't, so a match can be overlooked.
while(upperBoundary >= lowerBoundary)
{
mid = (lowerBoundary + upperBoundary) / 2;
if (myStore.get(mid).equals(idToSearch)) break; // goal
if(myStore.get(mid).compareTo(idToSearch) < 0)
{
lowerBoundary = mid + 1;
}
else
{
upperBoundary = mid - 1;
}
}
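Combining the points from these answers (corrected boundary updates, an early exit on a match, and an overflow-safe midpoint), a complete method might look like the sketch below. It is written against a generic sorted List rather than the question's myStore and Item, which are not shown in full; the class name and the sample IDs are placeholders.

```java
import java.util.Arrays;
import java.util.List;

final class ListBinarySearch {

    // Returns the index of target in the sorted list, or -1 if it is not present.
    static <T extends Comparable<? super T>> int bsearch(List<T> sorted, T target) {
        int lower = 0;
        int upper = sorted.size() - 1;
        while (lower <= upper) {
            int mid = (lower + upper) >>> 1;   // overflow-safe midpoint
            int cmp = sorted.get(mid).compareTo(target);
            if (cmp == 0) {
                return mid;          // found: stop immediately
            } else if (cmp < 0) {
                lower = mid + 1;     // element at mid is smaller: discard the left half
            } else {
                upper = mid - 1;     // element at mid is larger: discard the right half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("A10", "B20", "C30", "D40");
        System.out.println(bsearch(ids, "C30")); // 2
        System.out.println(bsearch(ids, "Z99")); // -1
    }
}
```

The JDK's Collections.binarySearch covers the same ground for lists, with the difference that it reports a missing key as a negative insertion-point encoding instead of -1.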

Efficient algo to find number of integers in a sorted array that are within a certain range in O(log(N)) time?

I came across an interview question that has to be done in O(log n):
Given a sorted integer array and a number, find the start and end indexes of the number in the array.
Ex1: Array = {0,0,2,3,3,3,3,4,7,7,9} and Number = 3 --> Output = {3,6}
Ex2: Array = {0,0,2,3,3,3,3,4,7,7,9} and Number = 5 --> Output = {-1,-1}
I am trying to find an efficient algorithm for this but so far have not been successful.
You can use the concept of binary search to find the starting and ending index:
To find the starting index, you halve the array: if the middle value is equal to or greater than the input number, repeat with the lower half of the array (keeping the middle cell), otherwise repeat with the higher half. Stop when you reach an array of size 1.
To find the ending index, you halve the array: if the middle value is greater than the input number, repeat with the lower half of the array, otherwise repeat with the higher half (keeping the middle cell). Stop when you reach an array of size 1.
Note that when we reach an array of size 1, the remaining cell may not hold the input number at all, so the caller checks whether it equals the input number before reporting an index. Both methods expect a non-empty array.
static int findStartIndex(int[] A, int num)
{
int start = 0, end = A.length - 1;
while (start < end)
{
// rounds down, so end = mid always shrinks the range
int mid = start + (end - start) / 2;
if (A[mid] >= num)
end = mid;
else
start = mid + 1;
}
// first index whose value is >= num, or the last index if every value is smaller
return start;
}
static int findEndIndex(int[] A, int num)
{
int start = 0, end = A.length - 1;
while (start < end)
{
// rounds up, so start = mid always shrinks the range
int mid = start + (end - start + 1) / 2;
if (A[mid] > num)
end = mid - 1;
else
start = mid;
}
// last index whose value is <= num, or 0 if every value is larger
return start;
}
And the whole procedure:
int start = findStartIndex(A, num);
if (A[start]!=num)
{
print("-1,-1");
}
else
{
int end = findEndIndex(A, num);
print(start, end);
}
Sounds like a binary search -- log graphs iirc represent the effect of "halving" with each increment, which basically is binary search.
Pseudocode:
Set number to search for
Get length of array, check if number is at the half point
if the half is > the #, check the half of the bottom half. is <, do the inverse
repeat
if the half point is the #, mark the first time this happens as a variable storing its index
then repeat binary searches above , and then binary searches below (separately), such that you check for how far to the left/right it can repeat.
note*: and you sort binary left/right instead of just incrementally, in case your code is tested in a dataset with like 1,000,000 3's in a row or something
Is this clear enough to go from there?
The solution is to binary search the array concurrently (doesn't actually have to be concurrent :P ) at the start. The key is that the left and right searches are slightly different. For the right side, if you encounter a dupe you have to search to the right, and for the left side, if you encounter a dupe you search to the left. What you are searching for is the boundary, so on the right side you check for:
yournum, not_yournum
This is the boundary and on the left side you just search for the boundary in the opposite direction. At the end return the indices of the boundaries.
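The boundary idea described above can be sketched as two half-open binary searches: one for the first position whose value is not smaller than the target, and one for the first position whose value is larger. This is only one way to realize it; the names lowerBound, upperBound, and findRange are chosen for the sketch and do not come from the answers.

```java
import java.util.Arrays;

final class BoundaryRange {

    // First index i in [0, n] with nums[i] >= target (n if there is none).
    static int lowerBound(int[] nums, int target) {
        int low = 0, high = nums.length;       // half-open range [low, high)
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (nums[mid] < target) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    // First index i in [0, n] with nums[i] > target (n if there is none).
    static int upperBound(int[] nums, int target) {
        int low = 0, high = nums.length;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (nums[mid] <= target) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    // Returns {first, last} indexes of target, or {-1, -1} if it does not occur.
    static int[] findRange(int[] nums, int target) {
        int first = lowerBound(nums, target);
        if (first == nums.length || nums[first] != target) {
            return new int[] {-1, -1};
        }
        return new int[] {first, upperBound(nums, target) - 1};
    }

    public static void main(String[] args) {
        int[] nums = {0, 0, 2, 3, 3, 3, 3, 4, 7, 7, 9};
        System.out.println(Arrays.toString(findRange(nums, 3))); // [3, 6]
        System.out.println(Arrays.toString(findRange(nums, 5))); // [-1, -1]
    }
}
```

Each search halves the range on every pass, so the whole lookup stays O(log n) even when the array is one long run of the target value.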
Double binary search. You start with lower index = 0, upper index = length - 1. Then you check the point halfway and adjust your indexes accordingly.
The trick is that once you've found target, the pivot splits in two pivots.
Since no one has posted working code yet, I'll post some (Java):
public class DuplicateNumberRangeFinder {
public static void main(String[] args) {
int[] nums = { 0, 0, 2, 3, 3, 3, 3, 4, 7, 7, 9 };
Range range = findDuplicateNumberRange(nums, 3);
System.out.println(range);
}
public static Range findDuplicateNumberRange(int[] nums, int toFind) {
Range notFound = new Range(-1, -1);
if (nums == null || nums.length == 0) {
return notFound;
}
int startIndex = notFound.startIndex;
int endIndex = notFound.endIndex;
int n = nums.length;
int low = 0;
int high = n - 1;
while (low <= high) {
int mid = low + (high - low) / 2;
if (nums[mid] == toFind && (mid == 0 || nums[mid - 1] < toFind)) {
startIndex = mid;
break;
} else if (nums[mid] < toFind) {
low = mid + 1;
} else if (nums[mid] >= toFind) {
high = mid - 1;
}
}
low = 0;
high = n - 1;
while (low <= high) {
int mid = low + (high - low) / 2;
if (nums[mid] == toFind && (mid == n - 1 || nums[mid + 1] > toFind)) {
endIndex = mid;
break;
} else if (nums[mid] <= toFind) {
low = mid + 1;
} else if (nums[mid] > toFind) {
high = mid - 1;
}
}
return new Range(startIndex, endIndex);
}
private static class Range {
int startIndex;
int endIndex;
public Range(int startIndex, int endIndex) {
this.startIndex = startIndex;
this.endIndex = endIndex;
}
public String toString() {
return "[" + this.startIndex + ", " + this.endIndex + "]";
}
}
}
It may be an error on my end, but Ron Teller's answer went into an infinite loop when I tested it. Here's a working example in Java, that can be tested here if you change the searchRange function to not be static.
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class RangeInArray {
// DO NOT MODIFY THE LIST
public static ArrayList<Integer> searchRange(final List<Integer> a, int b) {
ArrayList<Integer> range = new ArrayList<>();
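// Guard against an empty list and against a start index that ran past the end
int startIndex = a.isEmpty() ? 0 : findStartIndex(a, b);
if(startIndex >= a.size() || a.get(startIndex) != b) {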
range.add(-1);
range.add(-1);
return range;
}
range.add(startIndex);
range.add(findEndIndex(a, b));
return range;
}
public static int findStartIndex(List<Integer> a, int b) {
int midIndex = 0, lowerBound = 0, upperBound = a.size() - 1;
while(lowerBound < upperBound) {
midIndex = (upperBound + lowerBound) / 2;
if(b <= a.get(midIndex)) upperBound = midIndex - 1;
else lowerBound = midIndex + 1;
}
if(a.get(lowerBound) == b) return lowerBound;
return lowerBound + 1;
}
public static int findEndIndex(List<Integer> a, int b) {
int midIndex = 0, lowerBound = 0, upperBound = a.size() - 1;
while(lowerBound < upperBound) {
midIndex = (upperBound + lowerBound) / 2;
if(b < a.get(midIndex)) upperBound = midIndex - 1;
else lowerBound = midIndex + 1;
}
if(a.get(lowerBound) == b) return lowerBound;
return lowerBound - 1;
}
public static void main(String[] args) {
ArrayList<Integer> list = new ArrayList<>();
list.add(1);
list.add(1);
list.add(2);
list.add(2);
list.add(2);
list.add(2);
list.add(2);
list.add(2);
list.add(3);
list.add(4);
list.add(4);
list.add(4);
list.add(4);
list.add(5);
list.add(5);
list.add(5);
System.out.println("Calling search range");
for(int n : searchRange(list, 2)) {
System.out.print(n + " ");
}
}
}
