Implementing binary search on an array of Strings


I'm having a bit of trouble with this. The input array is based on file input, and the size of the array is specified by the first line in the file. The binarySearch method looks alright, but it doesn't seem to be working. Would anybody be able to help? Thanks.
public static int binarySearch(String[] a, String x) {
int low = 0;
int high = a.length - 1;
int mid;
while (low <= high) {
mid = (low + high) / 2;
if (a[mid].compareTo(x) < 0) {
low = mid + 1;
} else if (a[mid].compareTo(x) > 0) {
high = mid - 1;
} else {
return mid;
}
}
return -1;
}
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
System.out.println("Enter the name of your file (including file extension): ");
String filename = input.next();
String[] numArray;
try (Scanner in = new Scanner(new File(filename))) {
int count = in.nextInt();
numArray = new String[count];
for (int i = 0; in.hasNextInt() && count != -1 && i < count; i++) {
numArray[i] = in.nextLine();
}
for (int i = 0; i < count; i++) //printing all the elements
{
System.out.println(numArray[i]);
}
String searchItem = "The";
System.out.println("The position of the String is:");
binarySearch(numArray, searchItem);
} catch (final FileNotFoundException e) {
System.out.println("That file was not found. Program terminating...");
e.printStackTrace();
}
}

I have added the following example for your further reference.
import java.util.Arrays;
public class BinaryStringSearch {
public static void main(String[] args) {
String array[] ={"EWRSFSFSFSB","BB","AA","SDFSFJ","WRTG","FF","ERF","FED","TGH"};
String search = "BB";
Arrays.sort(array);
int searchIndex = binarySearch(array,search);
System.out.println(searchIndex != -1 ? array[searchIndex]+ " - Index is "+searchIndex : "Not found");
}
public static int binarySearch(String[] a, String x) {
int low = 0;
int high = a.length - 1;
int mid;
while (low <= high) {
mid = (low + high) / 2;
if (a[mid].compareTo(x) < 0) {
low = mid + 1;
} else if (a[mid].compareTo(x) > 0) {
high = mid - 1;
} else {
return mid;
}
}
return -1;
}
}

I hope it will help:
public static void main(String ar[]){
String str[] = {"account","angel","apple","application","black"};
String search= "application";
int length = str.length-1;
BinarySearchInString findStrin = new BinarySearchInString();
int i = findStrin.find(str, 0, length, search);
System.out.println(i);
}
public int find(String first[], int start, int end, String searchString){
int mid = start + (end-start)/2;
if(first[mid].compareTo(searchString)==0){
return mid;
}
if(first[mid].compareTo(searchString)> 0){
return find(first, start, mid-1, searchString);
}else if(first[mid].compareTo(searchString)< 0){
return find(first, mid+1, end, searchString);
}
return -1;
}

As spotted by @Mike Rylander, you forgot to output the result.
You should use Arrays.binarySearch instead of your own implementation.
(As a general rule, JRE libraries are well-tested, well-documented and fast. I googled "java binary search" and this question is well-ranked. I tried @Thusitha Indunil's code, which didn't appear to work. I googled harder and found Arrays.binarySearch, which worked.)
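As a rough illustration of the library route suggested above, here is a minimal sketch that sorts a copy of the data and searches it with Arrays.binarySearch. The class name, the helper name, and the sample words are made up for this example; the case-insensitive comparator is an assumption about what the asker wants, and the same comparator must be used for both sorting and searching.

```java
import java.util.Arrays;

final class LibraryStringSearch {
    // Sorts a copy of the input case-insensitively, then searches the copy
    // with the same comparator. Returns the index in the sorted copy,
    // or a negative value if the key is absent.
    static int indexInSortedCopy(String[] words, String key) {
        String[] sorted = words.clone();
        Arrays.sort(sorted, String.CASE_INSENSITIVE_ORDER);
        return Arrays.binarySearch(sorted, key, String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        String[] words = {"the", "Quick", "brown", "Fox"}; // hypothetical data
        System.out.println(indexInSortedCopy(words, "The"));   // prints 3
        System.out.println(indexInSortedCopy(words, "zebra")); // prints a negative number
    }
}
```

A negative result encodes the insertion point as -(insertionPoint) - 1, so "not found" should be tested with `< 0` rather than `== -1`.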

I believe that your problem is that you forgot to output the results. Try replacing binarySearch(numArray, searchItem); with System.out.println(binarySearch(numArray, searchItem));

There are four issues that I can spot in addition to the already mentioned missing print of the result.
1: You have a String[] called numArray and you are searching for a String, "The". The name numArray is possibly a bit misleading.
2: I assume you have some sort of specified input file format where the number of Strings in the file is specified by an integer as the first token in the file. This is OK; however, the condition in the for loop that populates numArray uses in.hasNextInt(), while the next token is taken out of the Scanner using in.nextLine(). You should use complementary check/removal methods such as in.hasNext() with in.next(). Check out the Scanner API.
3: The binarySearch(String[], String) method uses String.compareTo(String). This determines the lexicographical ordering of this String relative to the parameter String. Trying to compare upper case to lower case may not yield what you expect, as "the".compareTo("The") will not result in 0. You should check out the String API for options to either force all of your input to one case, maybe while reading the file, or use a different flavor of compare method.
4: The last thing that I see may be considered a bit of a corner case; however, with a sufficiently large String array and a search string that resides far in the right side (i.e., the high-index side) of the array, you may get an ArrayIndexOutOfBoundsException. This is because (low + high) can result in a negative value when the result "should" be greater than Integer.MAX_VALUE. Then the result is divided by two and still yields a negative value. This can be solved by using an unsigned right shift instead of dividing by 2: (low + high) >>> 1. Joshua Bloch has a great article about this common flaw in divide and conquer algorithms.

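// From java.util.Arrays in the JDK: the helper behind Arrays.binarySearch(Object[], Object)
private static int binarySearch0(Object[] a, int fromIndex, int toIndex, Object key) {
int low = fromIndex;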
int high = toIndex - 1;
while (low <= high) {
int mid = (low + high) >>> 1;
Comparable midVal = (Comparable) a[mid];
int cmp = midVal.compareTo(key);
if (cmp < 0)
low = mid + 1;
else if (cmp > 0)
high = mid - 1;
else
return mid;
}
return -(low + 1);
}
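The overflow described in point 4 can be reproduced without allocating a huge array. The sketch below only computes midpoints; the class and method names and the index values are invented for illustration.

```java
final class MidpointOverflow {
    // Overflows when low + high exceeds Integer.MAX_VALUE.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // The unsigned shift treats the wrapped sum as a non-negative 32-bit value.
    static int shiftedMid(int low, int high) {
        return (low + high) >>> 1;
    }

    // Never forms the large sum at all (valid when 0 <= low <= high).
    static int offsetMid(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        int low = 1_500_000_000;  // hypothetical indexes near the top of the int range
        int high = 2_000_000_000;
        System.out.println(naiveMid(low, high));   // negative: would be an invalid index
        System.out.println(shiftedMid(low, high)); // prints 1750000000
        System.out.println(offsetMid(low, high));  // prints 1750000000
    }
}
```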

Related

Logic Error: Binary search fails with more than two elements in a string array

The objective is to return the index of an element in a string array if present. The method uses a basic binary search using compareTo statements. With more than two elements in a tested array, the method will not detect the present element and return -1. The Java code this question is referring to is below. How do I make the binary search method work as intended?
public static int binarySearch(String[] array, String x) {
int high = array.length - 1;
int low = 0;
while (low <= high) {
int mid = low + (high - low) / 2;
if (x.compareTo(array[mid]) == 0) {
return mid;
}
if (x.compareTo(array[mid]) > 0) {
low = mid + 1;
}
else {
high = mid - 1;
}
}
return -1;
}

add an extra variable and set it to -1;
int loc=-1;
change the code
int mid=low+(high-low)/2;
to
int mid=(low+high)/2;
if(x.compareTo(array[mid])==0)
{
loc=mid;
break;
}
else if(x.compareTo(array[mid])<0)
{
high=mid-1;
}
else
{
low=mid+1;
}
then
if(loc>=0)
{
System.out.println(loc+1);
}
else
{
System.out.println("no element");
}
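The loop in the question looks correct for sorted input, so one possible cause (an assumption, since the question does not show its test data) is that the array was not sorted before searching. The sketch below reuses the question's method unchanged and shows the same lookup failing on unsorted data and succeeding after Arrays.sort; the class name and sample words are hypothetical.

```java
import java.util.Arrays;

final class SortedInputDemo {
    // Same logic as the method in the question.
    static int binarySearch(String[] array, String x) {
        int high = array.length - 1;
        int low = 0;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (x.compareTo(array[mid]) == 0) {
                return mid;
            }
            if (x.compareTo(array[mid]) > 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig"};       // not sorted
        System.out.println(binarySearch(words, "pear")); // prints -1
        Arrays.sort(words);                              // now apple, fig, pear
        System.out.println(binarySearch(words, "pear")); // prints 2
    }
}
```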

Binary Search on Generic Array returning first two Index or Stackoverflow Exception

Homework Question! We are using Binary Searches with Generic Arrays and Comparator. The assignment requires we have Integer / Double / String type arrays and that we can search each of them with a Binary Search. I have successfully used Binary Searching on non-Generic arrays before; this is a bit more difficult though. I generate my arrays prior to calling the search, prompt the user for a selection, then perform the search (that's the idea). The currently implemented Binary Search sort of works: it finds the key at the first two index locations, throws a StackOverflowError for mid-range indexes, returns nothing for high-end indexes, and returns -1 for input that is not found (I will add an output for that when the Binary Search works). I know the problem lies in how I am implementing my Binary Search; I am just failing to fix it. Any help would be appreciated. Code is below:
public static void searchIntegers() {
System.out.print("Please enter the Integer you would like to search for: ");
try {
keyInt = input.nextInt();
System.out.print(keyInt + " is found in index " + binarySearch(integerArray, keyInt));
}
catch (InputMismatchException inputMismatchException) {
input.nextLine();
System.out.printf("Please enter only Integers. Try again. \n\n");
}
}
public static <E extends Comparable<E>> int binarySearch(E[] list, E key) {
return binarySearch(list, key, 0, list.length);
}
public static <E extends Comparable<E>> int binarySearch(E[] list, E key, int low, int high) {
int mid = low + high / 2;
if (low > high) {
return -1;
}
else if (list[mid].equals(key)) {
return mid;
}
else if (list[mid].compareTo(key) == -1) {
return binarySearch(list, key, mid +1, high);
}
else {
return binarySearch(list, key, low, mid -1);
}
}
public static void generateArrays() {
//GENERATE INTEGER ARRAY
for(i = 0; i < 10; i++) {
integerArray[i] = generator.nextInt(100);
}
Arrays.sort(integerArray);
//GENERATE DOUBLE ARRAY
for(i = 0; i < 10; i ++) {
doubleArray[i] = i + generator.nextDouble();
}
Arrays.sort(doubleArray);
}

It's basically failing at the check for low and high and going into an infinite loop.
Try this -
public static <E extends Comparable<E>> int binarySearch(E[] list, E key, int low, int high) {
int mid = (low + high) / 2;
if (low > high) {
return -1;
}
if (list[mid].equals(key)) {
return mid;
} else if (list[mid].compareTo(key) < 0) { // compareTo may return any negative value, not just -1
return binarySearch(list, key, mid + 1, high);
} else {
return binarySearch(list, key, low, mid - 1);
}
}
You need to add brackets to your mid calculation.
Just a suggestion, check for overflow conditions as well.
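Putting the suggestions above together, a recursive generic search might look like the following sketch. It assumes the initial call passes list.length - 1 as the upper bound (the question passes list.length), uses an overflow-safe midpoint, and tests the sign of compareTo rather than comparing with -1. The class name and sample arrays are made up.

```java
final class RecursiveGenericSearch {
    // Entry point: searches the whole (sorted) array.
    static <E extends Comparable<E>> int binarySearch(E[] list, E key) {
        return binarySearch(list, key, 0, list.length - 1);
    }

    // Searches list[low..high], both bounds inclusive; returns -1 if absent.
    static <E extends Comparable<E>> int binarySearch(E[] list, E key, int low, int high) {
        if (low > high) {
            return -1;
        }
        int mid = low + (high - low) / 2; // avoids overflow of low + high
        int cmp = list[mid].compareTo(key);
        if (cmp == 0) {
            return mid;
        } else if (cmp < 0) {
            return binarySearch(list, key, mid + 1, high);
        } else {
            return binarySearch(list, key, low, mid - 1);
        }
    }

    public static void main(String[] args) {
        Integer[] ints = {3, 8, 15, 23, 42, 57};   // hypothetical sorted data
        String[] words = {"ant", "bee", "cat", "dog"};
        System.out.println(binarySearch(ints, 57));     // prints 5
        System.out.println(binarySearch(ints, 4));      // prints -1
        System.out.println(binarySearch(words, "cat")); // prints 2
    }
}
```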

My binary search code is too slow

I am trying to solve this algorithm task. When I submit my code, it is too slow on some test cases and gives the wrong output on others. I tried to find my mistake but couldn't, because the failing test cases use arrays with more than a thousand elements and I can't check every output by hand.
So I was wondering if you could give me some advice:
What can I do to improve my algorithm's efficiency?
Where is the mistake that makes some test cases produce the wrong output?
Here is my code:
public class Main
{
public static void main(String[] args)
{
Scanner sc = new Scanner(System.in);
int length = sc.nextInt();
int arr[] = new int[length];
for(int i=0; i<length; i++)
arr[i] = sc.nextInt();
int test = sc.nextInt();
int type, check;
for(int i=0; i<test; i++)
{
type = sc.nextInt();
check = sc.nextInt();
if(type == 0)
{
System.out.println(greaterOrEqualThan(arr, check, length));
}
else if(type == 1)
{
System.out.println(greaterThan(arr, check, length));
}
}
}
public static int greaterThan(int arr[],int x, int length)
{
int low = 0;
int high = length-1;
int mid;
while( low+1 < high)
{
mid = (high+low)/2;
if(arr[mid] <= x)
low = mid;
else if(arr[mid] > x)
high = mid;
}
int startIndex;
if(arr[low] > x && arr[low] != arr[high])
startIndex = low;
else if(arr[high] > x && arr[high] != arr[low])
startIndex = high;
else
return 0;
return length-startIndex;
}
public static int greaterOrEqualThan(int arr[], int x, int length)
{
int low = 0;
int high = length-1;
int mid;
while(low+1 < high)
{
mid = (low+high)/2;
if(arr[mid] < x)
low = mid;
else if(arr[mid] == x)
{
high = mid;
break;
}
else
high = mid;
}
int startIndex;
if(arr[low] >= x)
startIndex = low;
else if(arr[high] >= x)
startIndex = high;
else
return 0;
return length-(startIndex);
}
}

I think one or both of your algorithms may be incorrect in cases where there are multiple instances of the target value in the array (e.g. [1,3,3,3,5]).
Three Cases to Consider
There are three cases to consider:
x does not occur in the array
x occurs in the array exactly once
x occurs in the array more than once
How To Solve
I recommend using a classical binary search algorithm for each of the two methods (the exact binary search algorithm without modification). What you do after that is what is different.
So first, run a classical binary search algorithm, inlined directly into your methods (so that you have access to the terminal values of low and high).
Second, after the binary search terminates, test if array[mid] != x. If array[mid] != x, then x does not occur in the array and it is true that low == high + 1 (since high and low have crossed). Therefore, the count of numbers in the array which are not less than x and the count of numbers in the array which are greater than x are both equal to array.length - low.
Third, if it is instead true that array[mid] == x, then x does occur one or more times in the array. Since the classical binary search algorithm terminates immediately when it finds x, it is indeterminate "which" x it terminated on.
In this case, in order to find the count of numbers not less than x, you must find the "first" x in the array using the following code snippet:
do {
mid = mid - 1;
} while (mid >= 0 && array[mid] == x);
mid will then be the index of the element immediately before the "first" x in the array (or -1 if x is the first element), and so the count of numbers not less than x will be array.length - mid - 1.
Similarly, in order to find the count of numbers greater than x, you must first find the "last" x in the array using the following code snippet:
do {
mid = mid + 1;
} while (mid < array.length && array[mid] == x);
mid will then be the index of the element immediately after the "last" x in the array (or array.length if x is the last element), and so the count of numbers greater than x will be array.length - mid.
Code
simplified, inlined version of a classical binary search
int low = 0;
int high = array.length - 1;
int mid = (high + low) / 2; // priming read
while (array[mid] != x && low <= high) {
if (array[mid] > x)
high = mid - 1;
else // (array[mid] < x)
low = mid + 1;
mid = (high + low) / 2; // recompute the midpoint of the narrowed range
}
not less than x
int countNotLessThan(int[] array, int x)
{
/* simplified, inlined classical binary search goes here */
if (array[mid] != x) {
return array.length - low;
}
else { // array[mid] == x
do {
mid = mid - 1;
} while (mid >= 0 && array[mid] == x); // stop at the left edge of the array
return array.length - mid - 1;
}
}
greater than x
int countGreaterThan(int[] array, int x)
{
/* simplified, inlined classical binary search goes here */
if (array[mid] != x) {
return array.length - low;
}
else { // array[mid] == x
do {
mid = mid + 1;
} while (mid < array.length && array[mid] == x); // stop at the right edge of the array
return array.length - mid;
}
}
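Note that the linear scans above degrade to O(n) when x is repeated many times. As an alternative sketch (a different technique from the one in this answer, with made-up class and helper names), both counts can be derived from two boundary searches that stay O(log n) regardless of duplicates:

```java
final class BoundCounts {
    // First index whose element is >= x, or a.length if there is none.
    static int lowerBound(int[] a, int x) {
        int low = 0, high = a.length; // search the half-open range [low, high)
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (a[mid] < x) low = mid + 1;
            else high = mid;
        }
        return low;
    }

    // First index whose element is > x, or a.length if there is none.
    static int upperBound(int[] a, int x) {
        int low = 0, high = a.length;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (a[mid] <= x) low = mid + 1;
            else high = mid;
        }
        return low;
    }

    static int countNotLessThan(int[] a, int x) {
        return a.length - lowerBound(a, x);
    }

    static int countGreaterThan(int[] a, int x) {
        return a.length - upperBound(a, x);
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 3, 3, 5};
        System.out.println(countNotLessThan(a, 3)); // prints 4
        System.out.println(countGreaterThan(a, 3)); // prints 1
    }
}
```

Because neither helper returns early on a match, the three cases (absent, once, repeated) need no special handling.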

Efficient algo to find number of integers in a sorted array that are within a certain range in O(log(N)) time?

I came across an interview question that has to be done in O(log n):
Given a sorted integer array and a number, find the start and end indexes of the number in the array.
Ex1: Array = {0,0,2,3,3,3,3,4,7,7,9} and Number = 3 --> Output = {3,6}
Ex2: Array = {0,0,2,3,3,3,3,4,7,7,9} and Number = 5 --> Output = {-1,-1}
I am trying to find an efficient algorithm for this, but so far I have not been successful.

You can use the concept of binary search to find the starting and ending index:
To find the starting index, you halve the array; if the value is equal to or greater than the input number, repeat with the lower half of the array, otherwise repeat with the higher half. Stop when you have reached an array of size 1.
To find the ending index, you halve the array; if the value is greater than the input number, repeat with the lower half of the array, otherwise repeat with the higher half. Stop when you have reached an array of size 1.
Note that when we reach an array of size 1, the remaining cell may not hold the input number at all (the number is missing from the array), so the caller checks whether it equals the input number before reporting a range.
static int findStartIndex(int[] A, int num) // assumes A is sorted and non-empty
{
int start = 0, end = A.length-1;
while (end != start)
{
int mid = start + (end - start)/2; // rounds down, so mid < end
if (A[mid] >= num)
end = mid;
else
start = mid + 1;
}
return start; // first index with A[index] >= num, or the last index if there is none
}
static int findEndIndex(int[] A, int num) // assumes A is sorted and non-empty
{
int start = 0, end = A.length-1;
while (end != start)
{
int mid = start + (end - start + 1)/2; // rounds up, so mid > start
if (A[mid] > num)
end = mid - 1;
else
start = mid;
}
return start; // last index with A[index] <= num, or 0 if there is none
}
And the whole procedure:
int start = findStartIndex(A, num);
if (A[start]!=num)
{
System.out.println("-1,-1");
}
else
{
int end = findEndIndex(A, num);
System.out.println(start + "," + end);
}

Sounds like a binary search -- log graphs iirc represent the effect of "halving" with each increment, which basically is binary search.
Pseudocode:
Set number to search for
Get length of array, check if number is at the half point
if the half is > the #, check the half of the bottom half. is <, do the inverse
repeat
if the half point is the #, mark the first time this happens as a variable storing its index
then repeat binary searches above , and then binary searches below (separately), such that you check for how far to the left/right it can repeat.
note*: and you sort binary left/right instead of just incrementally, in case your code is tested in a dataset with like 1,000,000 3's in a row or something
Is this clear enough to go from there?

The solution is to binary search the array concurrently (it doesn't actually have to be concurrent :P) at the start. The key is that the left and right searches are slightly different. For the right side, if you encounter a dupe you have to search to the right, and for the left side, if you encounter a dupe you search to the left. What you are searching for is the boundary, so on the right side you check for:
yournum, not_yournum
This is the boundary and on the left side you just search for the boundary in the opposite direction. At the end return the indices of the boundaries.

Double binary search. You start with lower index = 0, upper index = length - 1. Then you check the point halfway and adjust your indexes accordingly.
The trick is that once you've found target, the pivot splits in two pivots.

Since no one has posted working code yet, I'll post some (Java):
public class DuplicateNumberRangeFinder {
public static void main(String[] args) {
int[] nums = { 0, 0, 2, 3, 3, 3, 3, 4, 7, 7, 9 };
Range range = findDuplicateNumberRange(nums, 3);
System.out.println(range);
}
public static Range findDuplicateNumberRange(int[] nums, int toFind) {
Range notFound = new Range(-1, -1);
if (nums == null || nums.length == 0) {
return notFound;
}
int startIndex = notFound.startIndex;
int endIndex = notFound.endIndex;
int n = nums.length;
int low = 0;
int high = n - 1;
while (low <= high) {
int mid = low + (high - low) / 2;
if (nums[mid] == toFind && (mid == 0 || nums[mid - 1] < toFind)) {
startIndex = mid;
break;
} else if (nums[mid] < toFind) {
low = mid + 1;
} else if (nums[mid] >= toFind) {
high = mid - 1;
}
}
low = 0;
high = n - 1;
while (low <= high) {
int mid = low + (high - low) / 2;
if (nums[mid] == toFind && (mid == n - 1 || nums[mid + 1] > toFind)) {
endIndex = mid;
break;
} else if (nums[mid] <= toFind) {
low = mid + 1;
} else if (nums[mid] > toFind) {
high = mid - 1;
}
}
return new Range(startIndex, endIndex);
}
private static class Range {
int startIndex;
int endIndex;
public Range(int startIndex, int endIndex) {
this.startIndex = startIndex;
this.endIndex = endIndex;
}
public String toString() {
return "[" + this.startIndex + ", " + this.endIndex + "]";
}
}
}

It may be an error on my end, but Ron Teller's answer, as originally posted, ran into an infinite loop when I tested it. Here's a working example in Java, that can be tested here if you change the searchRange function to not be static.
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class RangeInArray {
// DO NOT MODIFY THE LIST
public static ArrayList<Integer> searchRange(final List<Integer> a, int b) {
ArrayList<Integer> range = new ArrayList<>();
int startIndex = findStartIndex(a, b);
if(startIndex >= a.size() || a.get(startIndex) != b) { // startIndex can be one past the end when b is absent
range.add(-1);
range.add(-1);
return range;
}
range.add(startIndex);
range.add(findEndIndex(a, b));
return range;
}
public static int findStartIndex(List<Integer> a, int b) {
int midIndex = 0, lowerBound = 0, upperBound = a.size() - 1;
while(lowerBound < upperBound) {
midIndex = (upperBound + lowerBound) / 2;
if(b <= a.get(midIndex)) upperBound = midIndex - 1;
else lowerBound = midIndex + 1;
}
if(a.get(lowerBound) == b) return lowerBound;
return lowerBound + 1;
}
public static int findEndIndex(List<Integer> a, int b) {
int midIndex = 0, lowerBound = 0, upperBound = a.size() - 1;
while(lowerBound < upperBound) {
midIndex = (upperBound + lowerBound) / 2;
if(b < a.get(midIndex)) upperBound = midIndex - 1;
else lowerBound = midIndex + 1;
}
if(a.get(lowerBound) == b) return lowerBound;
return lowerBound - 1;
}
public static void main(String[] args) {
ArrayList<Integer> list = new ArrayList<>();
list.add(1);
list.add(1);
list.add(2);
list.add(2);
list.add(2);
list.add(2);
list.add(2);
list.add(2);
list.add(3);
list.add(4);
list.add(4);
list.add(4);
list.add(4);
list.add(5);
list.add(5);
list.add(5);
System.out.println("Calling search range");
for(int n : searchRange(list, 2)) {
System.out.print(n + " ");
}
}
}

How to use a binarysearch on a sorted array to find the number of integers within a certain range. (with duplicates)

Say you have a sorted array of integers:
{3,4,4,6,10,15,15,19,23,23,24,30}
And you want to find the number of integers that fall within a range of 4 and 23.
{4,4,6,10,15,15,19,23,23}
Thus the result would be 9.
I wrote a binarysearch implementation, but I'm not sure how I would modify it to also take into account the fact that there can be multiple integers that match the upper bounds of the range.
I thought of adding a boolean in the method signature to ask whether to look for the upper bounds of the key, but I'm not sure if it can be done in a single method while keeping O(log(N)) complexity.
Or is there some other way of finding the # of items in that range in the sorted array in O(log(N)) time?
This is what I have so far:
int start = rangeBinarySearch(arr, 4, false);
int end = rangeBinarySearch(arr, 23, true); // true would indicate that I want the position of the last occurrence of the key.
int totalInRange = (Math.abs(end) - Math.abs(start) -1);
private static int rangeBinarySearch(int[] items, int key, boolean lastIndex) {
if(items == null)
throw new IllegalArgumentException();
int start = 0;
int end = items.length - 1;
while(start <= end) {
int mIndex = (start + end) / 2;
int middle = items[mIndex];
if(middle < key)
start = (mIndex +1);
else if(middle > key)
end = (mIndex -1);
else
return mIndex; // Possible something here to find the upper bounds?
}
return -(start +1);
}

Range binary searches for the lower bound and the upper bound are different: they have different stopping criteria and return steps.
For the lower bound (left range), you can call the following function to get the index of the first element in the sorted array whose value is greater than or equal to it, or -1 if there is none.
int binarySearchForLeftRange(int a[], int length, int left_range)
{
if (a[length-1] < left_range)
return -1;
int low = 0;
int high = length-1;
while (low<=high)
{
int mid = low+((high-low)/2);
if(a[mid] >= left_range)
high = mid-1;
else // a[mid] < left_range
low = mid+1;
}
return high+1;
}
For the upper bound (right range), you can call the following function to get the index of the last element in the sorted array whose value is less than or equal to it, or -1 if there is none.
int binarySearchForRightRange(int a[], int length, int right_range)
{
if (a[0] > right_range)
return -1;
int low = 0;
int high = length-1;
while (low<=high)
{
int mid = low+((high-low)/2);
if(a[mid] > right_range)
high = mid-1;
else // a[mid] <= right_range
low = mid+1;
}
return low-1;
}
Finally, if you want the number of elements within this range, it's easy to compute from the return values of the two functions above.
int index_left = binarySearchForLeftRange(a, length, left_range);
int index_right = binarySearchForRightRange(a, length, right_range);
if (index_left==-1 || index_right==-1 || index_left>index_right)
count = 0;
else
count = index_right-index_left+1;
Test: (with duplicates)
int a[] = {3,4,4,6,10,15,15,19,23,23,24,30};
int length = sizeof(a)/sizeof(a[0]);
int left_range = 4;
int right_range = 23;
int index_left = binarySearchForLeftRange(a, length, left_range); // will be 1
int index_right = binarySearchForRightRange(a, length, right_range); // will be 9
int count; // will be 9
if (index_left==-1 || index_right==-1 || index_left>index_right)
count = 0;
else
count = index_right-index_left+1;
EDIT: Of course, you can merge the first two functions into one by passing one extra flag to indicate whether it is the lower bound or the upper bound, though it will be much clearer if you don't. Your choice!
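For readers who want the merged variant in Java, which is the language of the question, here is one possible sketch with a boolean flag. The names boundIndex and countInRange are invented for this example, and unlike the two functions above it signals "no such element" through an out-of-range index rather than through -1.

```java
final class RangeCount {
    // lastIndex == false: first index with items[i] >= key (items.length if none).
    // lastIndex == true:  last index with items[i] <= key (-1 if none).
    static int boundIndex(int[] items, int key, boolean lastIndex) {
        int low = 0;
        int high = items.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            // The only difference between the two searches is whether
            // elements equal to key send the search left or right.
            boolean goLeft = lastIndex ? items[mid] > key : items[mid] >= key;
            if (goLeft) high = mid - 1;
            else low = mid + 1;
        }
        return lastIndex ? high : low;
    }

    // Number of elements e with from <= e <= to.
    static int countInRange(int[] items, int from, int to) {
        int first = boundIndex(items, from, false);
        int last = boundIndex(items, to, true);
        return Math.max(0, last - first + 1);
    }

    public static void main(String[] args) {
        int[] arr = {3, 4, 4, 6, 10, 15, 15, 19, 23, 23, 24, 30};
        System.out.println(countInRange(arr, 4, 23)); // prints 9
    }
}
```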

If you are not learning the algorithm, use standard functions instead:
Arrays.binarySearch
You basically need the first occurrence of your first element (4) and the last occurrence of your last (23), and then subtract. But there is no need for (4) to be there, so read the documentation of Arrays.binarySearch; it tells you where (4) would be.
If you are expecting lots of (4)s, you have to write your own binSearch that returns both the first and last index:
find first occurrence at index i
if there is previous one look at i/2, if there is (4) look at i/4 else look at 3*i/4
...
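To make the remark about "where (4) would be" concrete: Arrays.binarySearch returns -(insertionPoint) - 1 when the key is absent. The small sketch below decodes that value; the helper name insertionPoint is made up. When the key occurs several times, the library makes no promise about which occurrence it returns, which is why a custom search is still needed for duplicates.

```java
import java.util.Arrays;

final class InsertionPointDemo {
    // Index of a matching element if the key is present, otherwise the index
    // at which the key would have to be inserted to keep the array sorted.
    static int insertionPoint(int[] sorted, int key) {
        int result = Arrays.binarySearch(sorted, key);
        return result >= 0 ? result : -(result + 1);
    }

    public static void main(String[] args) {
        int[] arr = {3, 4, 4, 6, 10, 15, 15, 19, 23, 23, 24, 30};
        System.out.println(insertionPoint(arr, 5));  // prints 3: between the 4s and 6
        System.out.println(insertionPoint(arr, 31)); // prints 12: past the end
        System.out.println(insertionPoint(arr, 6));  // prints 3: 6 is present exactly once
    }
}
```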

You need to perform two binary searches to find the first index of rangeLow and the last index of rangeHigh; that way you can count duplicates within the range. Note that this version returns -1 when either bound does not itself occur in the array.
This would give a time complexity of O(2 log n), which is O(log n), as we perform the binary search twice.
private int searchArrayForNumbersInRange(int[] arr, int start, int end) {
int leftIndex = searchLeft(arr, start); // first index holding start
int rightIndex = searchRight(arr, end); // last index holding end
if (leftIndex < 0 || rightIndex < 0)
return -1; // one of the bounds does not occur in the array
return rightIndex - leftIndex + 1;
}
private int searchLeft(int[] arr, int start) {
int lo = 0;
int hi = arr.length - 1;
while (lo <= hi) {
int mid = lo + (hi - lo) / 2;
// first occurrence: nothing to the left, or the left neighbor is smaller
if (arr[mid] == start && (mid == 0 || arr[mid - 1] < start)) {
return mid;
}
if (arr[mid] >= start)
hi = mid - 1;
else
lo = mid + 1;
}
return -1;
}
private int searchRight(int[] arr, int end) {
int lo = 0;
int hi = arr.length - 1;
while (lo <= hi) {
int mid = lo + (hi - lo) / 2;
// last occurrence: nothing to the right, or the right neighbor is larger
if (arr[mid] == end && (mid == arr.length - 1 || arr[mid + 1] > end))
return mid;
if (arr[mid] <= end) // compare the element, not the index
lo = mid + 1;
else
hi = mid - 1;
}
return -1;
}
