Java: Binary Search for String Arrays

I found some code that can be used to perform a binary search on an array of integers, and I am trying to change it so that I can use it on an array of strings instead.
Here is what I have so far:
import edu.princeton.cs.introcs.In;
import edu.princeton.cs.introcs.StdIn;
import edu.princeton.cs.introcs.StdOut;
import java.util.Arrays;
public class BinarySearch {
/**
* This class should not be instantiated.
*/
private BinarySearch() { }
/**
* Searches for the integer key in the sorted array a[].
* @param key the search key
* @param a the array of integers, must be sorted in ascending order
* @return index of key in array a[] if present; -1 if not present
*/
public static int rank(String key, String[] a) {
int lo = 0;
int hi = a.length - 1;
int steps = 0;
while (lo <= hi) {
steps = steps + 1; // this will keep count of the number of steps needed
// Key is in a[lo..hi] or not present.
int mid = lo + (hi - lo) / 2;
if (key < a[mid])
{
hi = mid - 1;
}
else if (key > a[mid])
{
lo = mid + 1;
}
else return mid;
}
return steps;
}
/**
* Reads in a sequence of integers from the whitelist file, specified as
* a command-line argument. Reads in integers from standard input and
* prints to standard output those integers that do *not* appear in the file.
* @param args
*/
public static void main(String[] args) {
// read the integers from a file
In in = new In("C:\\Users\\Owner\\Desktop\\EnglishWordList.txt");
String[] whitelist = in.readAllStrings();
// sort the array
Arrays.sort(whitelist);
// read integer key from standard input; print if not in whitelist
while (!StdIn.isEmpty()) {
String key = StdIn.readString();
if (rank(key, whitelist) == -1)
StdOut.println(key);
}
}
}
I am getting an error in the rank() method on both if statements. It states that I cannot use the operators "<" and ">" on strings, meaning that it's not looking at the ASCII codes. How can I fix this problem? There may be other issues, but this is the only thing that is highlighted by my IDE. Please let me know what you think.

You can use a.compareTo(b), i.e. use the Comparable interface instead of the > or < operators.
The Comparable interface compares values and returns an int that tells whether one value is less than, equal to, or greater than the other. If your class's objects have a natural order, implement the Comparable interface and define this method. Standard Java classes that have a natural ordering implement it (String, Double, BigInteger, ...).
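As a minimal sketch of that advice, the rank() method from the question could compare with compareTo instead of < and >. The class name StringBinarySearch is made up for illustration, and returning -1 for a missing key follows the Javadoc in the question rather than the posted code (which returns the step count).

```java
import java.util.Arrays;

class StringBinarySearch {
    // Returns the index of key in the sorted array a, or -1 if it is absent.
    static int rank(String key, String[] a) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = key.compareTo(a[mid]); // negative, zero, or positive
            if (cmp < 0) {
                hi = mid - 1;
            } else if (cmp > 0) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig"};
        Arrays.sort(words); // binary search requires sorted input
        System.out.println(rank("fig", words));  // prints 1
        System.out.println(rank("kiwi", words)); // prints -1
    }
}
```

Arrays.sort on a String[] uses the same compareTo ordering, so the sort and the search agree. That ordering is case-sensitive; if the word list mixes cases, compareToIgnoreCase (with a matching sort) may be closer to what is wanted.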

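Please try the following algorithm (written for an int array; for strings, the > and < comparisons would have to be replaced with compareTo):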
public boolean binarySearch(int key, int[] arrayToSearch) {
if (arrayToSearch.length == 0) {
return false;
}
int low = 0;
int high = arrayToSearch.length - 1;
while (low <= high) {
int midItem = (low + high) / 2;
if (key > arrayToSearch[midItem]) {
low = midItem + 1;
} else if (key < arrayToSearch[midItem]) {
high = midItem - 1;
} else { // The element has been found
return true;
}
}
return false;
}

Try this, it should work with Strings too
public static String binarySearch(String[] names, String x) {
int high = names.length - 1;
int low = 0;
int mid = 0;
while (low <= high) {
mid = (low + high) / 2;
if (names[mid].compareTo(x) == 0) {
return names[mid];
}
else if (names[mid].compareTo(x) > 0) {
high = mid - 1;
}
else {
low = mid + 1;
}
}
return null;
}

Related

Insert an element to Array List sorted descending order

I'm trying to insert an element in the correct position in an array list that is sorted in descending order.
The complexity to find the correct position must be O(log N).
That's why I tried using binary search to find the correct position.
This is what I did:
I added:
middle = (low + high) / 2;
after the while loop.
The problem is that it's inserting the elements in ascending order instead of descending order.
public void insert(E x) {
if(q.size()==0){
q.add(0, x);
}
else{
int place = binarySearch(x);
q.add(place, x);
}
}
private int binarySearch (E x) {
int size = q.size();
int low = 0;
int high = size - 1;
int middle = 0;
while(high > low) {
middle = (low + high) / 2;
if(q.get(middle).getPriority() == x.getPriority()) {
return middle;
}
if(q.get(middle).getPriority() < x.getPriority()) {
low = middle + 1;
}
if(q.get(middle).getPriority() > x.getPriority()) {
high = middle - 1;
}
}
middle = (low + high) / 2;
if(q.get(middle).getPriority() < x.getPriority()) {
return middle + 1 ;
}
return middle;
}
There are a few problems with your code:
all your comparisons are the wrong way, thus you are inserting in ascending order
you should loop while (high >= low), or you can not insert an element that's smaller than all the existing elements; also, with this you no longer need the if/else in insert
if you want ties to be handled such that the oldest element is sorted first, remove the "same as middle" check and reverse the if/else within the loop; this way, in case of ties, the low bound is raised, inserting the new element after the older one
now, after the while loop, you can just return low
This seems to work (Note: Changed to Integer instead of E for testing, populating an initially empty list with random integers.):
public void insert(E x) {
q.add(binarySearch(x), x);
}
private int binarySearch (E x) {
int low = 0;
int high = q.size() - 1;
while (high >= low) {
int middle = (low + high) / 2;
if (q.get(middle).getPriority() < x.getPriority()) {
high = middle - 1;
} else {
low = middle + 1;
}
}
return low;
}
Tests and example output:
@Data @AllArgsConstructor
class E {
int id, priority;
public String toString() { return String.format("%d/%d", id, priority); }
}
Random random = new Random();
int id = 0;
for (int i = 0; i < 50; i++) {
test.insert(new E(id++, random.nextInt(20)));
}
System.out.println(test.q);
// [2/19, 3/19, 24/19, 32/19, 46/19, 18/18, 23/18, 39/18, 31/17, 10/16, 28/16, 40/16, 45/16, 7/15, 19/14, 33/14, 37/14, 38/14, 36/13, 44/13, 5/11, 12/11, 15/11, 20/11, 30/11, 9/10, 41/10, 48/10, 16/9, 34/9, 13/8, 1/7, 8/7, 35/7, 0/6, 6/6, 22/6, 29/6, 21/5, 26/5, 42/5, 14/4, 27/4, 47/4, 25/3, 4/1, 11/1, 17/1, 43/1, 49/0]
This could be a lot simpler using Collections.binarySearch. The method returns the index of the key if it is found, or a negative value that encodes where it should be inserted:
the index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
Here is a quick example of a SortedList
class SortedList<E> extends ArrayList<E>{
Comparator<E> comparator;
public SortedList(Comparator<E> comparator) {
this.comparator = comparator;
}
@Override
public boolean add(E e) {
int index = Collections.binarySearch(this, e, comparator);
if(index < 0){
index = -index - 1;
}
if(index >= this.size()){
super.add(e);
} else {
super.add(index, e);
}
return true;
}
}
And a test case for a descending order:
SortedList<Integer> list = new SortedList<>(
(i1, i2) -> i2 - i1
);
list.add(10);
list.add(20);
list.add(15);
list.add(10);
System.out.println(list);
[20, 15, 10, 10]
The comparator passed to the constructor lets you set the order used for insertion. Note that this is not safe: it does not override every method that can add elements (add(int, E), addAll, and so on), but it is a quick answer ;)
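To make the decoding of the negative return value concrete, here is a small sketch that inserts into a plain descending list without subclassing ArrayList. The class name DescendingInsert is hypothetical. It uses Comparator.reverseOrder() instead of the subtraction-based lambda, since i2 - i1 can overflow for values of very large magnitude.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class DescendingInsert {
    // Inserts value into a list that is already sorted in descending order.
    static void insert(List<Integer> list, int value) {
        Comparator<Integer> descending = Comparator.reverseOrder();
        int index = Collections.binarySearch(list, value, descending);
        if (index < 0) {
            index = -index - 1; // decode -(insertion point) - 1
        }
        list.add(index, value); // index == list.size() appends at the end
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int v : new int[] {10, 20, 15, 10}) {
            insert(list, v);
        }
        System.out.println(list); // prints [20, 15, 10, 10]
    }
}
```

When the key is already present, Collections.binarySearch makes no promise about which of the equal elements it finds, so this sketch does not guarantee that ties keep their insertion order.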

Using Binary Search to open up a space in an array at the correct index

I am supposed to create two methods: The first method, bSearch, is a binary search algorithm. The second method, insertInOrder, is supposed to take the index we got from the bSearch method and find that index in the array, open up a spot at that index by shifting all other elements of that array over, and insert the key/ int at that index.
The ints will be received from a text file containing 25 ints. I am not to make any copies or extra arrays to do this, and I am supposed to encode and then decode the index in the insertInOrder method. In these methods, key will be the current int read from the file of ints, and count will count the number of ints which have been received. I am basically building my own sort method, I can't call any outside methods, and the array is not to be out of order at any time.
I have filled in these methods, but I am a bit shaky on my understanding. I think my problem is in the bSearch method, because I can't get it to return anything but 0. I can't get it to return the new value of mid, which is the index where the key/int is supposed to be inserted. Thank you for your help. The code is below:
static void insertInOrder( int[] arr, int count, int key )
{
int index=bSearch( arr, count, key );
int encodedIndex = -(index+1);
int decodedIndex = -(index+1);
for (int i = index; i > 0; i--)
{
arr[i] = arr[i-1];
}
arr[index]=key;
}
static int bSearch(int[] a, int count, int key)
{
int lo = 0;
int hi = a.length-1;
int index = 0;
boolean found = false;
while (!found && lo <= hi)
{
int mid = lo + ((hi - lo) / 2);
if (a[mid] == key)
{
found = true;
index = mid;
}
else if (a[mid] > key)
{
hi = mid -1;
}
else
{
lo = mid +1;
}
}
return index;
}
Your binary search method works correctly for a pre-filled array.
The problem is during insertion into the array, the binary search returns 0 if the value doesn't exist in the array instead of providing the correct insert position.
In this case, you probably need to track how many values are currently used in the array. Is that what the value "count" is for? If so, you should start your "hi" value at count instead of the max length, and you need to return count as the index if you've reached the end of the used part of the array without finding a value.
Update: You can use this binary search to return insertion position.
int lo = 0;
int hi = count; // one past the last used slot, so the result can be count when key is larger than everything
int index = 0;
boolean found = false;
while (!found && lo < hi)
{
int mid = lo + ((hi - lo) / 2);
if (a[mid] == key)
{
found = true;
index = mid;
}
else if (a[mid] > key)
{
hi = mid;
index = hi;
}
else
{
lo = mid +1;
index = lo;
}
}
return index;
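For the "encode and then decode" part of the assignment, one possible reading is the convention used by java.util.Arrays.binarySearch: return the index when the key is found, and -(insertionPoint + 1) when it is not. The sketch below follows that assumption; the class name InsertInOrderSketch is made up, and Arrays.toString appears only in the demo main for printing.

```java
import java.util.Arrays;

class InsertInOrderSketch {
    // Returns the index of key in a[0..count-1], or -(insertionPoint + 1) if absent.
    static int bSearch(int[] a, int count, int key) {
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == key) {
                return mid;
            } else if (a[mid] > key) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return -(lo + 1); // encoded insertion point
    }

    // Inserts key into the sorted prefix arr[0..count-1]; the caller increments count.
    static void insertInOrder(int[] arr, int count, int key) {
        int index = bSearch(arr, count, key);
        if (index < 0) {
            index = -(index + 1); // decode the insertion point
        }
        // Shift the tail one slot to the right, starting from the end.
        for (int i = count; i > index; i--) {
            arr[i] = arr[i - 1];
        }
        arr[index] = key;
    }

    public static void main(String[] args) {
        int[] arr = new int[5];
        int count = 0;
        for (int key : new int[] {7, 3, 9, 1, 5}) {
            insertInOrder(arr, count, key);
            count++;
        }
        System.out.println(Arrays.toString(arr)); // prints [1, 3, 5, 7, 9]
    }
}
```

Note that the shift loop runs from count down to the insertion index, not from the index down to 0 as in the question, so only the elements after the insertion point move.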

Java - Recursive Binary Search through Arraylist

So I need help with a method that uses a recursive binary search algorithm to search through an ArrayList.
private static < E extends Employee > int binarySearch(ArrayList<E> list, int firstElem, int lastElem, String searchLastName)
{
int middle=0;
if(firstElem > lastElem){
return -1;
}
middle = (firstElem + lastElem) / 2;
if(list.get(middle).getLastName().equals(searchLastName)){
return middle;
}else if( ){ // <-------------?
return binarySearch(list,middle+1,lastElem, searchLastName);
}
else {
return binarySearch(list, firstElem, middle -1, searchLastName);
}
}
This is what I have so far but I'm stuck on the logic part. Any help would be appreciated.
Since you say you're stuck on the logic part, I think a pseudo-code answer suffices.
First of all, I'm assuming your array is sorted in ascending order. If the array is not sorted, binary search is not possible. Because it is sorted, you can keep cutting the problem size in half with each comparison. So if the answer is in the right part of the array, you throw out the left part and only recurse into the right part.
Because the problem size is halved with every recursive call, you get a running time of O(log n).
The basic logic is like this:
binarySearch(list,begin,end,query)
if (begin > end)
return -1
middle = (begin + end) / 2
if list[middle].value == query
return middle
if list[middle].value < query
return binarySearch(list,middle+1,end,query)
return binarySearch(list,begin,middle-1,query)
Here you are
You have to implement a compare method to decide which half you'll search in.
I wrote a simple method to compare two strings; it converts each one to a numeric value.
Check this sample code:
import java.util.ArrayList;
public class BinarySearch {
public static void main(String[] args) {
ArrayList<Employee> arrayList = new ArrayList<Employee>();
for (int i = 0; i < 10; i++) {
Employee employee = new Employee();
employee.setLastName("name" + i);
arrayList.add(employee);
}
System.out.println(arrayList);
System.out
.println(binarySearch(arrayList, 0, arrayList.size() - 1, "name6")); // lastElem is an inclusive index
}
private static <E extends Employee> int binarySearch(ArrayList<E> list,
int firstElem, int lastElem, String searchLastName) {
int middle = 0;
if (firstElem > lastElem) {
return -1;
}
middle = (firstElem + lastElem) / 2;
if (list.get(middle).getLastName().equals(searchLastName)) {
return middle;
} else if (compare(searchLastName, list.get(middle).getLastName()) >= 0) { // <-------------?
return binarySearch(list, middle + 1, lastElem, searchLastName);
} else {
return binarySearch(list, firstElem, middle - 1, searchLastName); // exclude middle so the range always shrinks
}
}
/**
* Compare two strings.
* @param str1
* @param str2
* @return
*/
public static int compare(String str1, String str2) {
int minLength = Math.min(str1.length(), str2.length());
if (getValue(str1, minLength) > getValue(str2, minLength)) {
return 1;
} else {
return -1;
}
}
/**
* Calculate a value to a string to compare it.
* @param str
* @param length
* @return
*/
public static int getValue(String str, int length) {
int result = 0;
for (int i = 0; i < length; i++) {
char c = str.charAt(i);
result += Math.pow(10, i) * c;
}
return result;
}
}
Here's a sample output:
[name0, name1, name2, name3, name4, name5, name6, name7, name8, name9]
6
I hope this could help.
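An alternative to a hand-written numeric encoding is String.compareTo, which already implements lexicographic comparison. A position-weighted sum like the one above does not generally agree with that order (for example, it ranks "ab" above "ba"). The sketch below uses a plain list of strings because the Employee class is not shown; with employees, the comparison would presumably use list.get(middle).getLastName(). The class name RecursiveNameSearch is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveNameSearch {
    // Searches names[first..last] (inclusive) for target; names must be sorted ascending.
    static int binarySearch(List<String> names, int first, int last, String target) {
        if (first > last) {
            return -1; // empty range: not found
        }
        int middle = first + (last - first) / 2;
        int cmp = target.compareTo(names.get(middle));
        if (cmp == 0) {
            return middle;
        } else if (cmp > 0) {
            return binarySearch(names, middle + 1, last, target); // right half
        } else {
            return binarySearch(names, first, middle - 1, target); // left half
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            names.add("name" + i);
        }
        System.out.println(binarySearch(names, 0, names.size() - 1, "name6"));  // prints 6
        System.out.println(binarySearch(names, 0, names.size() - 1, "nobody")); // prints -1
    }
}
```

Both recursive calls exclude middle, so the range shrinks on every call and a missing name ends in -1 rather than endless recursion.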

Find an array inside another larger array

I was recently asked to write 3 test programs for a job. They would be written using just core Java API's and any test framework of my choice. Unit tests should be implemented where appropriate.
Although I haven't received any feedback at all, I suppose they didn't like my solutions (otherwise I would have heard from them), so I decided to show my programs here and ask if this implementation can be considered good, and, if not, then why?
To avoid confusion, I'll ask only first one for now.
Implement a function that finds an array in another larger array. It should accept two arrays as parameters and it will return the index of the first array where the second array first occurs in full. Eg, findArray([2,3,7,1,20], [7,1]) should return 2.
I didn't try to find any existing solution, but instead wanted to do it myself.
Possible reasons:
1. Should be static.
2. Should use line comments instead of block ones.
3. Didn't check for null values first (I know, just spotted too late).
4. ?
UPDATE:
Quite a few reasons have been presented, and it's very difficult for me to choose one answer as many answers have a good solution. As @adietrich mentioned, I tend to believe they wanted me to demonstrate knowledge of the core API (they even asked to write a function, not to write an algorithm).
I believe the best way to secure the job was to provide as many solutions as possible, including:
1. Implementation using Collections.indexOfSubList() method to show that I know core collections API.
2. Implement using brute-force approach, but provide a more elegant solution.
3. Implement using a search algorithm, for example Boyer-Moore.
4. Implement using a combination of System.arraycopy() and Arrays.equals(). Although not the best solution in terms of performance, it would show my knowledge of standard array routines.
Thank you all for your answers!
END OF UPDATE.
Here is what I wrote:
Actual program:
package com.example.common.utils;
/**
* This class contains functions for array manipulations.
*
* @author Roman
*
*/
public class ArrayUtils {
/**
* Finds a sub array in a large array
*
* @param largeArray
* @param subArray
* @return index of sub array
*/
public int findArray(int[] largeArray, int[] subArray) {
/* If any of the arrays is empty then not found */
if (largeArray.length == 0 || subArray.length == 0) {
return -1;
}
/* If subarray is larger than large array then not found */
if (subArray.length > largeArray.length) {
return -1;
}
for (int i = 0; i < largeArray.length; i++) {
/* Check if the next element of large array is the same as the first element of subarray */
if (largeArray[i] == subArray[0]) {
boolean subArrayFound = true;
for (int j = 0; j < subArray.length; j++) {
/* If outside of large array or elements not equal then leave the loop */
if (largeArray.length <= i+j || subArray[j] != largeArray[i+j]) {
subArrayFound = false;
break;
}
}
/* Sub array found - return its index */
if (subArrayFound) {
return i;
}
}
}
/* Return default value */
return -1;
}
}
Test code:
package com.example.common.utils;
import com.example.common.utils.ArrayUtils;
import junit.framework.TestCase;
public class ArrayUtilsTest extends TestCase {
private ArrayUtils arrayUtils = new ArrayUtils();
public void testFindArrayDoesntExist() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {8,9,10};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistSimple() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {3,4,5};
int expected = 2;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistFirstPosition() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {1,2,3};
int expected = 0;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistLastPosition() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {5,6,7};
int expected = 4;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayDoesntExistPartiallyEqual() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {6,7,8};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistPartiallyEqual() {
int[] largeArray = {1,2,3,1,2,3,4,5,6,7};
int[] subArray = {1,2,3,4};
int expected = 3;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArraySubArrayEmpty() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArraySubArrayLargerThanArray() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {4,5,6,7,8,9,10,11};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistsVeryComplex() {
int[] largeArray = {1234, 56, -345, 789, 23456, 6745};
int[] subArray = {56, -345, 789};
int expected = 1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
}
The requirement of "using just core Java API's" could also mean that they wanted to see whether you would reinvent the wheel. So in addition to your own implementation, you could give the one-line solution, just to be safe:
public static int findArray(Integer[] array, Integer[] subArray)
{
return Collections.indexOfSubList(Arrays.asList(array), Arrays.asList(subArray));
}
It may or may not be a good idea to point out that the example given contains invalid array literals.
Clean and improved code
public static int findArrayIndex(int[] subArray, int[] parentArray) {
if (subArray.length == 0) {
return -1;
}
// Last start index at which subArray still fits (inclusive).
int limit = parentArray.length - subArray.length;
for (int i = 0; i <= limit; i++) {
// Count how many leading elements of subArray match at offset i.
int matched = 0;
while (matched < subArray.length && parentArray[i + matched] == subArray[matched]) {
matched++;
}
if (matched == subArray.length) {
return i;
}
}
return -1;
}
For finding an array of integers in a larger array of integers, you can use the same kind of algorithms as finding a substring in a larger string. For this there are many algorithms known (see Wikipedia). Especially the Boyer-Moore string search is efficient for large arrays. The algorithm that you are trying to implement is not very efficient (Wikipedia calls this the 'naive' implementation).
For your questions:
Yes, such a method should be static
Don't care, that's a question of taste
The null check can be included, or you should state in the JavaDoc that null values are not allowed, or JavaDoc should state that when either parameter is null a NullPointerException will be thrown.
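The substring-search idea mentioned above carries over directly to int arrays. As a rough sketch, here is the Boyer-Moore-Horspool simplification (bad-character shifts only, not the full Boyer-Moore algorithm), with a HashMap as the shift table because int values are not limited to a small alphabet. The class and method names are invented for this example, and returning -1 for an empty pattern simply mirrors the question's code.

```java
import java.util.HashMap;
import java.util.Map;

class HorspoolArraySearch {
    // Returns the first index at which needle occurs in haystack, or -1.
    static int indexOf(int[] haystack, int[] needle) {
        int n = haystack.length;
        int m = needle.length;
        if (m == 0 || m > n) {
            return -1;
        }
        // For each value in needle except the last position, record how far the
        // window may shift when that value is aligned with the window's last slot.
        Map<Integer, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(needle[i], m - 1 - i);
        }
        int pos = 0;
        while (pos <= n - m) {
            int j = m - 1;
            // Compare the window right to left.
            while (j >= 0 && haystack[pos + j] == needle[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            // Values that do not occur in needle allow a full shift of m.
            pos += shift.getOrDefault(haystack[pos + m - 1], m);
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] large = {2, 3, 7, 1, 20};
        int[] sub = {7, 1};
        System.out.println(indexOf(large, sub)); // prints 2
    }
}
```

The worst case is still O(n*m), but on typical input the window skips ahead by several positions per mismatch. Whether that outweighs the cost of boxing into the HashMap depends on the data, so it is worth measuring before preferring it to the simple loop.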
Well, off the top of my head:
Yes, should be static.
A company complaining about that would not be worth working for.
Yeah, but what would you do? Return? Or throw an exception? It'll throw an exception the way it is already.
I think the main problem is that your code is not very elegant. Too many checks in the inner loop. Too many redundant checks.
Just raw, off the top of my head:
public int findArray(int[] largeArray, int[] subArray) {
int subArrayLength = subArray.length;
if (subArrayLength == 0) {
return -1;
}
int limit = largeArray.length - subArrayLength;
for (int i = 0; i <= limit; i++) {
boolean subArrayFound = true;
for (int j = 0; j < subArrayLength; j++) {
if (subArray[j] != largeArray[i+j]) {
subArrayFound = false;
break;
}
}
/* Sub array found - return its index */
if (subArrayFound) {
return i;
}
}
/* Return default value */
return -1;
}
You could keep that check for the first element so you don't have the overhead of setting up the boolean and the for loop for every single element in the array. Then you'd be looking at
public int findArray(int[] largeArray, int[] subArray) {
int subArrayLength = subArray.length;
if (subArrayLength == 0) {
return -1;
}
int limit = largeArray.length - subArrayLength;
for (int i = 0; i <= limit; i++) {
if (subArray[0] == largeArray[i]) {
boolean subArrayFound = true;
for (int j = 1; j < subArrayLength; j++) {
if (subArray[j] != largeArray[i+j]) {
subArrayFound = false;
break;
}
}
/* Sub array found - return its index */
if (subArrayFound) {
return i;
}
}
}
/* Return default value */
return -1;
}
Following is an approach using the KMP pattern matching algorithm. This solution takes O(n+m), where n = length of the large array and m = length of the sub array. For more information, check:
https://en.wikipedia.org/wiki/KMP_algorithm
Brute force takes O(n*m). I just checked that Collections.indexOfSubList method is also O(n*m).
public static int subStringIndex(int[] largeArray, int[] subArray) {
if (largeArray.length == 0 || subArray.length == 0){
throw new IllegalArgumentException();
}
if (subArray.length > largeArray.length){
throw new IllegalArgumentException();
}
int[] prefixArr = getPrefixArr(subArray);
int indexToReturn = -1;
for (int m = 0, s = 0; m < largeArray.length; m++) {
if (subArray[s] == largeArray[m]) {
s++;
} else {
if (s != 0) {
s = prefixArr[s - 1];
m--;
}
}
if (s == subArray.length) {
indexToReturn = m - subArray.length + 1;
break;
}
}
return indexToReturn;
}
private static int[] getPrefixArr(int[] subArray) {
int[] prefixArr = new int[subArray.length];
prefixArr[0] = 0;
for (int i = 1, j = 0; i < prefixArr.length; i++) {
while (subArray[i] != subArray[j]) {
if (j == 0) {
break;
}
j = prefixArr[j - 1];
}
if (subArray[i] == subArray[j]) {
prefixArr[i] = j + 1;
j++;
} else {
prefixArr[i] = j;
}
}
return prefixArr;
}
A slightly optimized version of the code that was posted before:
public int findArray(byte[] largeArray, byte[] subArray) {
if (subArray.length == 0) {
return -1;
}
int limit = largeArray.length - subArray.length;
next:
for (int i = 0; i <= limit; i++) {
for (int j = 0; j < subArray.length; j++) {
if (subArray[j] != largeArray[i+j]) {
continue next;
}
}
/* Sub array found - return its index */
return i;
}
/* Return default value */
return -1;
}
int findSubArr(int[] arr,int[] subarr)
{
int lim=arr.length-subarr.length;
for(int i=0;i<=lim;i++)
{
int[] tmpArr=Arrays.copyOfRange(arr,i,i+subarr.length);
if(Arrays.equals(tmpArr,subarr))
return i; //returns starting index of sub array
}
return -1;//return -1 on finding no sub-array
}
UPDATE:
By reusing the same int array instance:
int findSubArr(int[] arr,int[] subarr)
{
int lim=arr.length-subarr.length;
int[] tmpArr=new int[subarr.length];
for(int i=0;i<=lim;i++)
{
System.arraycopy(arr,i,tmpArr,0,subarr.length);
if(Arrays.equals(tmpArr,subarr))
return i; //returns starting index of sub array
}
return -1;//return -1 on finding no sub-array
}
I would suggest the following improvements:
make the function static so that you can avoid creating an instance
the outer loop condition could be i <= largeArray.length-subArray.length, to avoid a test inside the loop
remove the test (largeArray[i] == subArray[0]) that is redundant
Here's #indexOf from String:
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* @param source the characters being searched.
* @param sourceOffset offset of the source string.
* @param sourceCount count of the source string.
* @param target the characters being searched for.
* @param targetOffset offset of the target string.
* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j]
== target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}
First to your possible reasons:
Yes. And the class final with a private constructor.
Shouldn't use this kind of comments at all. The code should be self-explanatory.
You're basically implicitly checking for null by accessing the length field which will throw a NullPointerException. Only in the case of a largeArray.length == 0 and a subArray == null will this slip through.
More potential reasons:
The class doesn't contain any function for array manipulations, opposed to what the documentation says.
The documentation for the method is very sparse. It should state when and which exceptions are thrown (e.g. NullPointerException) and which return value to expect if the second array isn't found or if it is empty.
The code is more complex than needed.
Why is the equality of the first elements so important that it gets its own check?
In the first loop, it is assumed that the second array will be found, which is unintentional.
Unneeded variable and jump (boolean and break), further reducing legibility.
largeArray.length <= i+j is not easy to grasp. Should be checked before the loop, improving the performance along the way.
I'd swap the operands of subArray[j] != largeArray[i+j]. Seems more natural to me.
All in all too long.
The test code is lacking more edge cases (null arrays, first array empty, both arrays empty, first array contained in second array, second array contained multiple times etc.).
Why is the last test case named testFindArrayExistsVeryComplex?
What the exercise is missing is a specification of the component type of the array parameters, respectively the signature of the method. It makes a huge difference whether the component type is a primitive type or a reference type. The solution of adietrich assumes a reference type (thus could be generified as further improvement), mine assumes a primitive type (int).
So here's my shot, concentrating on the code / disregarding documentation and tests:
public final class ArrayUtils {
// main method
public static int indexOf(int[] haystack, int[] needle) {
return indexOf(haystack, needle, 0);
}
// helper methods
private static int indexOf(int[] haystack, int[] needle, int fromIndex) {
for (int i = fromIndex; i <= haystack.length - needle.length; i++) { // <= so a match at the very end is found
if (containsAt(haystack, needle, i)) {
return i;
}
}
return -1;
}
private static boolean containsAt(int[] haystack, int[] needle, int offset) {
for (int i = 0; i < needle.length; i++) {
if (haystack[i + offset] != needle[i]) {
return false;
}
}
return true;
}
// prevent initialization
private ArrayUtils() {}
}
byte[] arr1 = {1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 1, 3, 4, 56, 6, 7};
byte[] arr2 = {9, 1, 3};
boolean i = IsContainsSubArray(arr1, arr2);
public static boolean IsContainsSubArray(byte[] Large_Array, byte[] Sub_Array){
int Large_Array_size = Large_Array.length;
int Sub_Array_size = Sub_Array.length;
if (Sub_Array_size == 0 || Sub_Array_size > Large_Array_size) {
return false;
}
// Try every start position and restart the comparison from scratch on a mismatch,
// so overlapping partial matches such as {1, 1, 2} vs {1, 2} are not missed.
for (int start = 0; start <= Large_Array_size - Sub_Array_size; start++) {
int k = 0;
while (k < Sub_Array_size && Large_Array[start + k] == Sub_Array[k]) {
k++;
}
if (k == Sub_Array_size) {
return true;
}
}
return false;
}
Code from Guava:
import javax.annotation.Nullable;
/**
* Ensures that an object reference passed as a parameter to the calling method is not null.
*
* @param reference an object reference
* @param errorMessage the exception message to use if the check fails; will be converted to a
* string using {@link String#valueOf(Object)}
* @return the non-null reference that was validated
* @throws NullPointerException if {@code reference} is null
*/
public static <T> T checkNotNull(T reference, @Nullable Object errorMessage) {
if (reference == null) {
throw new NullPointerException(String.valueOf(errorMessage));
}
return reference;
}
/**
* Returns the start position of the first occurrence of the specified {@code
* target} within {@code array}, or {@code -1} if there is no such occurrence.
*
* <p>More formally, returns the lowest index {@code i} such that {@code
* java.util.Arrays.copyOfRange(array, i, i + target.length)} contains exactly
* the same elements as {@code target}.
*
* @param array the array to search for the sequence {@code target}
* @param target the array to search for as a sub-sequence of {@code array}
*/
public static int indexOf(int[] array, int[] target) {
checkNotNull(array, "array");
checkNotNull(target, "target");
if (target.length == 0) {
return 0;
}
outer:
for (int i = 0; i < array.length - target.length + 1; i++) {
for (int j = 0; j < target.length; j++) {
if (array[i + j] != target[j]) {
continue outer;
}
}
return i;
}
return -1;
}
I would do it in three ways:
Using no imports, i.e. using plain Java statements.
Using Java core APIs, to a lesser or greater extent.
Using string pattern search algorithms like KMP etc. (Probably the most optimized one.)
1, 2 and 3 are all shown above in the answers. Here is approach 2 from my side:
public static void findArray(int[] array, int[] subArray) {
// Check for null before touching .length; otherwise the null check can never be reached.
if (array == null || subArray == null) {
return;
}
if (subArray.length > array.length) {
return;
}
if (array.length == 0 || subArray.length == 0) {
return;
}
//Solution 1
List<Integer> master = Arrays.stream(array).boxed().collect(Collectors.toList());
List<Integer> pattern = IntStream.of(subArray).boxed().collect(Collectors.toList());
System.out.println(Collections.indexOfSubList(master, pattern));
//Solution2
for (int i = 0; i <= array.length - subArray.length; i++) {
String s = Arrays.toString(Arrays.copyOfRange(array, i, i + subArray.length));
if (s.equals(Arrays.toString(subArray))) {
System.out.println("Found at:" + i);
return;
}
}
System.out.println("Not found.");
}
Using Java 8 and lambda expressions:
String[] smallArray = {"1","2","3"};
final String[] bigArray = {"0","1","2","3","4"};
boolean result = Arrays.stream(smallArray).allMatch(s -> Arrays.stream(bigArray).anyMatch(b -> b.equals(s)));
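PS: it is important to declare bigArray as final String[] (or at least to leave it effectively final) so that the lambda expression can capture it. Note that this only checks that every element of smallArray occurs somewhere in bigArray; it does not check that the elements appear contiguously or in the same order.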
FYI: if the goal is simply to check whether an array y occurs as a contiguous slice of an array x, Scala has this built in:
val x = Array(1,2,3,4,5)
val y = Array(3,4,5)
val z = Array(3,4,8)
x.containsSlice(y) // true
x.containsSlice(z) // false

Natural sort order string comparison in Java - is one built in? [duplicate]

I'd like some kind of string comparison function that preserves natural sort order [1]. Is there anything like this built into Java? I can't find anything in the String class, and the Comparator class only knows of two implementations.
I can roll my own (it's not a very hard problem), but I'd rather not re-invent the wheel if I don't have to.
In my specific case, I have software version strings that I want to sort. So I want "1.2.10.5" to be considered greater than "1.2.9.1".
[1] By "natural" sort order, I mean it compares strings the way a human would compare them, as opposed to "ASCII-betical" sort ordering that only makes sense to programmers. In other words, "image9.jpg" is less than "image10.jpg", and "album1set2page9photo1.jpg" is less than "album1set2page10photo5.jpg", and "1.2.9.1" is less than "1.2.10.5".
In Java, "natural" order means "lexicographical" order, so the core library has no implementation like the one you're looking for.
There are open source implementations.
Here's one:
NaturalOrderComparator.java
Make sure you read the:
Cougaar Open Source License
I hope this helps!
I have tested three Java implementations mentioned here by others and found that they work slightly differently, but none works as I would expect.
Neither AlphaNumericStringComparator nor AlphanumComparator ignores whitespace, so pic2 is placed before pic 1.
On the other hand, NaturalOrderComparator ignores not only whitespace but also all leading zeros, so sig[1] precedes sig[0].
Regarding performance, AlphaNumericStringComparator is about 10x slower than the other two.
String implements Comparable, and that is what natural ordering is in Java (comparing using the comparable interface). You can put the strings in a TreeSet or sort using the Collections or Arrays classes.
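As a quick illustration of what that built-in ordering does with the strings from the question, here is a small sketch (the class name LexicographicDemo is made up). Arrays.sort without a comparator relies on String.compareTo, which compares character by character:

```java
import java.util.Arrays;

/** Hypothetical demo class showing Java's built-in (lexicographic) String ordering. */
final class LexicographicDemo {

    /** Returns a sorted copy using String's own compareTo. */
    static String[] sortedCopy(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy); // no comparator: uses Comparable<String>
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortedCopy("1.2.9.1", "1.2.10.5", "image9.jpg", "image10.jpg");
        // prints [1.2.10.5, 1.2.9.1, image10.jpg, image9.jpg]
        System.out.println(Arrays.toString(sorted));
    }
}
```

The character '1' sorts before '9', so "1.2.10.5" lands before "1.2.9.1", which is the opposite of what the question asks for.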
However, in your case you don't want "natural ordering" you really want a custom comparator, which you can then use in the Collections.sort method or the Arrays.sort method that takes a comparator.
In terms of the specific logic you are looking for implementing within the comparator, (numbers separated by dots) I'm not aware of any existing standard implementations of that, but as you said, it is not a hard problem.
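For the dotted version strings specifically, a custom comparator could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name VersionStringComparator is made up, every dot-separated field is assumed to be a non-negative number that fits in an int (anything else throws NumberFormatException), and a version with extra trailing fields is treated as greater.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical comparator for version strings such as "1.2.10.5". */
final class VersionStringComparator implements Comparator<String> {

    @Override
    public int compare(String left, String right) {
        String[] a = left.split("\\.");
        String[] b = right.split("\\.");
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            // Compare field by field as numbers, not as text.
            int byField = Integer.compare(Integer.parseInt(a[i]), Integer.parseInt(b[i]));
            if (byField != 0) {
                return byField;
            }
        }
        // All shared fields are equal: the version with more fields is greater.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        String[] versions = {"1.2.10.5", "1.2.9.1", "1.2", "1.10"};
        Arrays.sort(versions, new VersionStringComparator());
        // prints [1.2, 1.2.9.1, 1.2.10.5, 1.10]
        System.out.println(Arrays.toString(versions));
    }
}
```

Integer.compare is used instead of subtraction so that very large field values cannot overflow and flip the sign of the result.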
EDIT: In your comment, your link gets you here, which does a decent job if you don't mind the fact that it is case sensitive. Here is that code modified to allow you to pass in the String.CASE_INSENSITIVE_ORDER:
/*
* The Alphanum Algorithm is an improved sorting algorithm for strings
* containing numbers. Instead of sorting numbers in ASCII order like
* a standard sort, this algorithm sorts numbers in numeric order.
*
* The Alphanum Algorithm is discussed at http://www.DaveKoelle.com
*
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*
*/
import java.util.Comparator;
/**
* This is an updated version with enhancements made by Daniel Migowski,
* Andre Bogus, and David Koelle
*
* To convert to use Templates (Java 1.5+):
* - Change "implements Comparator" to "implements Comparator<String>"
* - Change "compare(Object o1, Object o2)" to "compare(String s1, String s2)"
* - Remove the type checking and casting in compare().
*
* To use this class:
* Use the static "sort" method from the java.util.Collections class:
* Collections.sort(your list, new AlphanumComparator());
*/
public class AlphanumComparator implements Comparator<String>
{
private Comparator<String> comparator = new NaturalComparator();
public AlphanumComparator(Comparator<String> comparator) {
this.comparator = comparator;
}
public AlphanumComparator() {
}
private final boolean isDigit(char ch)
{
return ch >= 48 && ch <= 57;
}
/** Length of string is passed in for improved efficiency (only need to calculate it once) **/
private final String getChunk(String s, int slength, int marker)
{
StringBuilder chunk = new StringBuilder();
char c = s.charAt(marker);
chunk.append(c);
marker++;
if (isDigit(c))
{
while (marker < slength)
{
c = s.charAt(marker);
if (!isDigit(c))
break;
chunk.append(c);
marker++;
}
} else
{
while (marker < slength)
{
c = s.charAt(marker);
if (isDigit(c))
break;
chunk.append(c);
marker++;
}
}
return chunk.toString();
}
public int compare(String s1, String s2)
{
int thisMarker = 0;
int thatMarker = 0;
int s1Length = s1.length();
int s2Length = s2.length();
while (thisMarker < s1Length && thatMarker < s2Length)
{
String thisChunk = getChunk(s1, s1Length, thisMarker);
thisMarker += thisChunk.length();
String thatChunk = getChunk(s2, s2Length, thatMarker);
thatMarker += thatChunk.length();
// If both chunks contain numeric characters, sort them numerically
int result = 0;
if (isDigit(thisChunk.charAt(0)) && isDigit(thatChunk.charAt(0)))
{
// Simple chunk comparison by length.
int thisChunkLength = thisChunk.length();
result = thisChunkLength - thatChunk.length();
// If equal, the first different number counts
if (result == 0)
{
for (int i = 0; i < thisChunkLength; i++)
{
result = thisChunk.charAt(i) - thatChunk.charAt(i);
if (result != 0)
{
return result;
}
}
}
} else
{
result = comparator.compare(thisChunk, thatChunk);
}
if (result != 0)
return result;
}
return s1Length - s2Length;
}
private static class NaturalComparator implements Comparator<String> {
public int compare(String o1, String o2) {
return o1.compareTo(o2);
}
}
}
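The chunking idea can also be sketched much more compactly, at the cost of speed. The version below is not the Alphanum code above but a separate illustration with a made-up class name: it splits each string at digit/non-digit boundaries with a regular expression, compares numeric chunks by value with BigInteger (so leading zeros and very long numbers are handled), and delegates text chunks to whatever comparator is passed in.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical, compact variant of chunk-based natural ordering. */
final class ChunkedNaturalComparator implements Comparator<String> {

    // Zero-width split points between a digit and a non-digit, in either order.
    private static final String BOUNDARY = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";

    private final Comparator<String> textComparator;

    ChunkedNaturalComparator(Comparator<String> textComparator) {
        this.textComparator = textComparator;
    }

    private static boolean isNumber(String chunk) {
        return !chunk.isEmpty() && chunk.charAt(0) >= '0' && chunk.charAt(0) <= '9';
    }

    @Override
    public int compare(String s1, String s2) {
        String[] a = s1.split(BOUNDARY);
        String[] b = s2.split(BOUNDARY);
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int result = isNumber(a[i]) && isNumber(b[i])
                    ? new BigInteger(a[i]).compareTo(new BigInteger(b[i])) // numeric chunks by value
                    : textComparator.compare(a[i], b[i]);                  // everything else as text
            if (result != 0) {
                return result;
            }
        }
        // All shared chunks are equal: the string with more chunks is greater.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        String[] names = {"img10.jpg", "IMG9.jpg", "img2.jpg"};
        Arrays.sort(names, new ChunkedNaturalComparator(String.CASE_INSENSITIVE_ORDER));
        // prints [img2.jpg, IMG9.jpg, img10.jpg]
        System.out.println(Arrays.toString(names));
    }
}
```

Note that this sketch reports strings such as "pic02" and "pic2" as equal, so it is not consistent with equals; add a tie-breaker if that matters for your data.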
Have a look at this implementation. It should be about as fast as possible: no regular expressions, array manipulation or extra method calls, just a couple of flags and a lot of cases.
It should sort any combination of numbers inside strings, and when two numbers are equal it moves on to compare the rest of the strings.
public static int naturalCompare(String a, String b, boolean ignoreCase) {
if (ignoreCase) {
a = a.toLowerCase();
b = b.toLowerCase();
}
int aLength = a.length();
int bLength = b.length();
int minSize = Math.min(aLength, bLength);
char aChar, bChar;
boolean aNumber, bNumber;
boolean asNumeric = false;
int lastNumericCompare = 0;
for (int i = 0; i < minSize; i++) {
aChar = a.charAt(i);
bChar = b.charAt(i);
aNumber = aChar >= '0' && aChar <= '9';
bNumber = bChar >= '0' && bChar <= '9';
if (asNumeric)
if (aNumber && bNumber) {
if (lastNumericCompare == 0)
lastNumericCompare = aChar - bChar;
} else if (aNumber)
return 1;
else if (bNumber)
return -1;
else if (lastNumericCompare == 0) {
if (aChar != bChar)
return aChar - bChar;
asNumeric = false;
} else
return lastNumericCompare;
else if (aNumber && bNumber) {
asNumeric = true;
if (lastNumericCompare == 0)
lastNumericCompare = aChar - bChar;
} else if (aChar != bChar)
return aChar - bChar;
}
if (asNumeric)
if (aLength > bLength && a.charAt(bLength) >= '0' && a.charAt(bLength) <= '9') // as number
return 1; // a has bigger size, thus b is smaller
else if (bLength > aLength && b.charAt(aLength) >= '0' && b.charAt(aLength) <= '9') // as number
return -1; // b has bigger size, thus a is smaller
else if (lastNumericCompare == 0)
return aLength - bLength;
else
return lastNumericCompare;
else
return aLength - bLength;
}
How about using the split() method from String, parse the single numeric string and then compare them one by one?
@Test
public void test(){
System.out.print(compare("1.12.4".split("\\."), "1.13.4".split("\\."),0));
}
public static int compare(String[] arr1, String[] arr2, int index){
// if the arrays differ in size and the comparison has reached the end of one of them,
// then the longer array is considered the bigger one ( --> 2.2.0 is bigger than 2.2)
if(arr1.length <= index || arr2.length <= index) return arr1.length - arr2.length;
int result = Integer.parseInt(arr1[index]) - Integer.parseInt(arr2[index]);
return result == 0 ? compare(arr1, arr2, ++index) : result;
}
I did not check the corner cases, but this should work, and it's quite compact.
It concatenates the digits, then compares them. If that is not applicable, it continues.
public int compare(String o1, String o2) {
if(o1 == null||o2 == null)
return 0;
for(int i = 0; i<o1.length()&&i<o2.length();i++){
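// Parse numbers only when both strings have a digit here; otherwise fall through to the character comparison
if(Character.isDigit(o1.charAt(i)) && Character.isDigit(o2.charAt(i)))
{
String dig1 = "",dig2 = "";
// Collect the consecutive digits, testing the character at x (not at i)
for(int x = i; x<o1.length() && Character.isDigit(o1.charAt(x)); x++){
dig1+=o1.charAt(x);
}
for(int x = i; x<o2.length() && Character.isDigit(o2.charAt(x)); x++){
dig2+=o2.charAt(x);
}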
if(Integer.valueOf(dig1) < Integer.valueOf(dig2))
return -1;
if(Integer.valueOf(dig1) > Integer.valueOf(dig2))
return 1;
}
if(o1.charAt(i)<o2.charAt(i))
return -1;
if(o1.charAt(i)>o2.charAt(i))
return 1;
}
return 0;
}
Using RuleBasedCollator might be an option as well. Though you'd have to add all the sort order rules in advance so it's not a good solution if you want to take larger numbers into account as well.
Adding specific customizations such as 2 < 10 is quite easy though, and might be useful for sorting special version identifiers like Precise < Trusty < Xenial < Yakkety.
RuleBasedCollator localRules = (RuleBasedCollator) Collator.getInstance();
String extraRules = IntStream.range(0, 100).mapToObj(String::valueOf).collect(joining(" < "));
RuleBasedCollator c = new RuleBasedCollator(localRules.getRules() + " & " + extraRules);
List<String> a = asList("1-2", "1-02", "1-20", "10-20", "fred", "jane", "pic01", "pic02", "pic02a", "pic 5", "pic05", "pic 7", "pic100", "pic100a", "pic120", "pic121");
shuffle(a);
a.sort(c);
System.out.println(a);
This might be a late reply, but my answer may help someone else who needs a comparator like this.
I tested a couple of other comparators too, and mine seems somewhat more efficient than the ones I compared it with. I also tried the one that Yishai posted; mine takes only about half the time of that one on an alphanumeric data set of 100 entries.
/**
* Sorter that compares the given Alpha-numeric strings. This iterates through each characters to
* decide the sort order. There are 3 possible cases while iterating,
*
* <li>If both have same non-digit characters then the consecutive characters will be considered for
* comparison.</li>
*
* <li>If both have numbers at the same position (with/without non-digit characters) the consecutive
* digit characters will be considered to form the valid integer representation of the characters
* will be taken and compared.</li>
*
* <li>At any point if the comparison gives the order(either > or <) then the consecutive characters
* will not be considered.</li>
*
* For ex., this will be the ordered O/P of the given list of Strings.(The bold characters decides
* its order) <i><b>2</b>b,<b>100</b>b,a<b>1</b>,A<b>2</b>y,a<b>100</b>,</i>
*
* @author kannan_r
*
*/
class AlphaNumericSorter implements Comparator<String>
{
/**
* Does the Alphanumeric sort of the given two string
*/
public int compare(String theStr1, String theStr2)
{
char[] theCharArr1 = theStr1.toCharArray();
char[] theCharArr2 = theStr2.toCharArray();
int aPosition = 0;
if (Character.isDigit(theCharArr1[aPosition]) && Character.isDigit(theCharArr2[aPosition]))
{
return sortAsNumber(theCharArr1, theCharArr2, aPosition++ );
}
return sortAsString(theCharArr1, theCharArr2, 0);
}
/**
* Sort the given Arrays as string starting from the given position. This will be a simple case
* insensitive sort of each characters. But at any given position if there are digits in both
* arrays then the method sortAsNumber will be invoked for the given position.
*
* @param theArray1 The first character array.
* @param theArray2 The second character array.
* @param thePosition The position starting from which the calculation will be done.
* @return positive number when the Array1 is greater than Array2<br/>
* negative number when the Array2 is greater than Array1<br/>
* zero when the Array1 is equal to Array2
*/
private int sortAsString(char[] theArray1, char[] theArray2, int thePosition)
{
int aResult = 0;
if (thePosition < theArray1.length && thePosition < theArray2.length)
{
aResult = (int)theArray1[thePosition] - (int)theArray2[thePosition];
if (aResult == 0)
{
++thePosition;
if (thePosition < theArray1.length && thePosition < theArray2.length)
{
if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray2[thePosition]))
{
aResult = sortAsNumber(theArray1, theArray2, thePosition);
}
else
{
aResult = sortAsString(theArray1, theArray2, thePosition);
}
}
}
}
else
{
aResult = theArray1.length - theArray2.length;
}
return aResult;
}
/**
* Sorts the characters in the given array as number starting from the given position. When
* sorted as numbers the consecutive characters starting from the given position upto the first
* non-digit character will be considered.
*
* @param theArray1 The first character array.
* @param theArray2 The second character array.
* @param thePosition The position starting from which the calculation will be done.
* @return positive number when the Array1 is greater than Array2<br/>
* negative number when the Array2 is greater than Array1<br/>
* zero when the Array1 is equal to Array2
*/
private int sortAsNumber(char[] theArray1, char[] theArray2, int thePosition)
{
int aResult = 0;
int aNumberInStr1;
int aNumberInStr2;
if (thePosition < theArray1.length && thePosition < theArray2.length)
{
if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray2[thePosition]))
{
aNumberInStr1 = getNumberInStr(theArray1, thePosition);
aNumberInStr2 = getNumberInStr(theArray2, thePosition);
aResult = aNumberInStr1 - aNumberInStr2;
if (aResult == 0)
{
thePosition = getNonDigitPosition(theArray1, thePosition);
if (thePosition != -1)
{
aResult = sortAsString(theArray1, theArray2, thePosition);
}
}
}
else
{
aResult = sortAsString(theArray1, theArray2, ++thePosition);
}
}
else
{
aResult = theArray1.length - theArray2.length;
}
return aResult;
}
/**
* Gets the position of the non digit character in the given array starting from the given
* position.
*
* @param theCharArr The character array.
* @param thePosition The position after which the array needs to be checked for a non-digit
* character.
* @return The position of the first non-digit character in the array, or -1 if there is none.
*/
private int getNonDigitPosition(char[] theCharArr, int thePosition)
{
for (int i = thePosition; i < theCharArr.length; i++ )
{
if ( !Character.isDigit(theCharArr[i]))
{
return i;
}
}
return -1;
}
/**
* Gets the integer value of the number starting from the given position of the given array.
*
* @param theCharArray The character array.
* @param thePosition The position from which the number needs to be calculated.
* @return The integer value of the number.
*/
private int getNumberInStr(char[] theCharArray, int thePosition)
{
int aNumber = 0;
for (int i = thePosition; i < theCharArray.length; i++ )
{
if(!Character.isDigit(theCharArray[i]))
{
return aNumber;
}
// Shift the digits read so far one decimal place left and append the new digit
aNumber = aNumber * 10 + (theCharArray[i] - '0');
}
return aNumber;
}
}
