Find closest value in an ordered list - java

I am wondering how you would write a simple Java method that finds the closest Integer to a given value in a sorted Integer list.
Here is my first attempt:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Closest {
private static List<Integer> integers = new ArrayList<Integer>();
static {
for (int i = 0; i <= 10; i++) {
integers.add(Integer.valueOf(i * 10));
}
}
public static void main(String[] args) {
Integer closest = null;
Integer arg = Integer.valueOf(args[0]);
int index = Collections.binarySearch(
integers, arg);
if (index < 0) /*arg doesn't exist in integers*/ {
index = -index - 1;
if (index == integers.size()) {
closest = integers.get(index - 1);
} else if (index == 0) {
closest = integers.get(0);
} else {
int previousDate = integers.get(index - 1);
int nextDate = integers.get(index);
if (arg - previousDate < nextDate - arg) {
closest = previousDate;
} else {
closest = nextDate;
}
}
} else /*arg exists in integers*/ {
closest = integers.get(index);
}
System.out.println("The closest Integer to " + arg + " in " + integers
+ " is " + closest);
}
}
What do you think about this solution? I am sure there is a cleaner way to do this job.
Maybe such a method exists somewhere in the Java libraries and I missed it?

Try this little method (a linear scan, so the list does not even need to be sorted):
public int closest(int of, List<Integer> in) {
int min = Integer.MAX_VALUE;
int closest = of;
for (int v : in) {
final int diff = Math.abs(v - of);
if (diff < min) {
min = diff;
closest = v;
}
}
return closest;
}
Some test cases:
private final static List<Integer> list = Arrays.asList(10, 20, 30, 40, 50);
@Test
public void closestOf21() {
assertThat(closest(21, list), is(20));
}
@Test
public void closestOf19() {
assertThat(closest(19, list), is(20));
}
@Test
public void closestOf20() {
assertThat(closest(20, list), is(20));
}

Kotlin makes this very concise:
import kotlin.math.abs
fun List<Int>.closestValue(value: Int) = minBy { abs(value - it) }
val values = listOf(1, 8, 4, -6)
println(values.closestValue(-7)) // -6
println(values.closestValue(2)) // 1
println(values.closestValue(7)) // 8
The list doesn't need to be sorted, by the way.
Edit: since Kotlin 1.4, minBy is deprecated; prefer minByOrNull:
@Deprecated("Use minByOrNull instead.", ReplaceWith("this.minByOrNull(selector)"))
@DeprecatedSinceKotlin(warningSince = "1.4")
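For readers staying in Java, a rough stream-based equivalent of the Kotlin one-liner might look like this. It is only a sketch: the class name ClosestStream is made up, it scans the whole list in O(n), and like minByOrNull it returns an empty result for an empty list.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ClosestStream {
    // Returns the element of `values` with the smallest absolute difference to `target`,
    // or Optional.empty() for an empty list. The list does not need to be sorted.
    // Differences are computed in long so extreme int values cannot overflow.
    static Optional<Integer> closestValue(List<Integer> values, int target) {
        return values.stream()
                .min(Comparator.comparingLong(v -> Math.abs(v.longValue() - target)));
    }
}
```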

A solution without binary search (it still takes advantage of the list being sorted):
public int closest(int value, int[] sorted) {
if(value < sorted[0])
return sorted[0];
int i = 1;
for( ; i < sorted.length && value > sorted[i] ; i++);
if(i >= sorted.length)
return sorted[sorted.length - 1];
return Math.abs(value - sorted[i]) < Math.abs(value - sorted[i-1]) ?
sorted[i] : sorted[i-1];
}

To solve the problem, I'd extend the Comparable interface with a distanceTo method. The implementation of distanceTo returns a double value that represents the intended distance and which is compatible with the result of the compareTo implementation.
The following example illustrates the idea with just apples. You can replace diameter with weight, volume or sweetness. The bag will always return the 'closest' apple (most similar in size, weight or taste).
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
public interface ExtComparable<T> extends Comparable<T> {
public double distanceTo(T other);
}
public class Apple implements ExtComparable<Apple> {
private Double diameter;
public Apple(double diameter) {
this.diameter = diameter;
}
public double distanceTo(Apple o) {
return diameter - o.diameter;
}
public int compareTo(Apple o) {
return (int) Math.signum(distanceTo(o));
}
}
public class AppleBag {
private List<Apple> bag = new ArrayList<Apple>();
public void addApples(Apple... apples) {
bag.addAll(Arrays.asList(apples));
Collections.sort(bag);
}
public void removeApples(Apple... apples) {
bag.removeAll(Arrays.asList(apples));
}
// Returns the apple nearest in size to `apple`, excluding `apple` itself.
// Assumes the bag holds at least one other apple.
public Apple getClosest(Apple apple) {
Apple closest = null;
boolean appleIsInBag = bag.contains(apple);
if (!appleIsInBag) {
addApples(apple); // insert temporarily so its neighbours can be found
}
int appleIndex = bag.indexOf(apple);
if (appleIndex == 0) {
closest = bag.get(1);
} else if (appleIndex == bag.size() - 1) {
closest = bag.get(bag.size() - 2);
} else {
double absDistToPrev = Math.abs(apple.distanceTo(bag.get(appleIndex - 1)));
double absDistToNext = Math.abs(apple.distanceTo(bag.get(appleIndex + 1)));
closest = bag.get(absDistToNext < absDistToPrev ? appleIndex + 1 : appleIndex - 1);
}
if (!appleIsInBag) {
removeApples(apple);
}
return closest;
}
}
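If the bag only needs to answer queries, a variation that avoids temporarily inserting and removing the probe could call Collections.binarySearch on the sorted list directly. This is a sketch under the answer's assumption that distanceTo is consistent with compareTo; Diameter is a hypothetical stand-in for Apple, and unlike getClosest this version returns an element that compares equal to the probe instead of skipping it.

```java
import java.util.Collections;
import java.util.List;

// Same contract as the answer's ExtComparable, repeated so the sketch compiles alone.
interface ExtComparable<T> extends Comparable<T> {
    double distanceTo(T other);
}

// Hypothetical minimal element type standing in for Apple.
class Diameter implements ExtComparable<Diameter> {
    final double value;
    Diameter(double value) { this.value = value; }
    public double distanceTo(Diameter o) { return value - o.value; }
    public int compareTo(Diameter o) { return Double.compare(value, o.value); }
}

class ClosestFinder {
    // Returns the element of `sorted` nearest to `probe`, or null for an empty list.
    // Assumes `sorted` is ordered by compareTo and that distanceTo agrees with it.
    static <T extends ExtComparable<T>> T closest(List<T> sorted, T probe) {
        if (sorted.isEmpty()) return null;
        int index = Collections.binarySearch(sorted, probe);
        if (index >= 0) return sorted.get(index);   // an element compares equal to probe
        int insertion = -index - 1;                  // first element greater than probe
        if (insertion == 0) return sorted.get(0);
        if (insertion == sorted.size()) return sorted.get(sorted.size() - 1);
        T prev = sorted.get(insertion - 1);
        T next = sorted.get(insertion);
        double toPrev = Math.abs(probe.distanceTo(prev));
        double toNext = Math.abs(probe.distanceTo(next));
        return toNext < toPrev ? next : prev;
    }
}
```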

Certainly you can simply use a for loop to go through the list and keep track of the difference between the current element and the target value. It would look cleaner, but it would be much slower: each lookup is O(n) instead of the O(log n) of a binary search.
See: Finding closest match in collection of numbers

I think what you have is about the simplest and most efficient way to do it. Finding the "closest" item in a sorted list isn't something that is commonly encountered in programming (you typically look for the one that is bigger, or the one that is smaller). The problem only makes sense for numeric types, so is not very generalizable, and thus it would be unusual to have a library function for it.
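For reference, the question's approach could be packaged as a reusable method roughly like this (the class name SortedClosest is hypothetical). It assumes an ascending, non-empty, random-access list and, like the question's code, lets the larger neighbour win a tie.

```java
import java.util.Collections;
import java.util.List;

class SortedClosest {
    // Binary search as in the question; O(log n) on random-access lists such as ArrayList.
    static int closest(List<Integer> sorted, int target) {
        int index = Collections.binarySearch(sorted, target);
        if (index >= 0) {
            return sorted.get(index); // exact match
        }
        int insertion = -index - 1; // index of the first element greater than target
        if (insertion == 0) return sorted.get(0);
        if (insertion == sorted.size()) return sorted.get(sorted.size() - 1);
        int below = sorted.get(insertion - 1);
        int above = sorted.get(insertion);
        // compare in long so extreme int values cannot overflow; ties go to `above`
        return (long) target - below < (long) above - target ? below : above;
    }
}
```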

Not tested
// randomArray: the array you want to search; theValue: the value the result should be near to.
// Returns the index of the closest element. Note that it overwrites randomArray with the differences.
static int indexOfClosest(int[] randomArray, int theValue) {
for (int i = 0; i < randomArray.length; i++) {
randomArray[i] -= theValue; // store the signed difference to theValue
}
int indexOfClosest = 0;
for (int i = 1; i < randomArray.length; i++) {
if (Math.abs(randomArray[indexOfClosest]) > Math.abs(randomArray[i])) {
indexOfClosest = i;
}
}
return indexOfClosest;
}

I think your answer is probably the most efficient way to return a single result.
However, the problem with your approach is that there are 0 (if there is no list), 1, or 2 possible solutions. It's when you have two possible solutions to a function that your problems really start: What if this is not the final answer, but only the first in a series of steps to determine an optimal course of action, and the answer that you didn't return would have provided a better solution? The only correct thing to do would be to consider both answers and compare the results of further processing only at the end.
Think of the square root function as a somewhat analogous problem to this.
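One way to follow that advice is to return every value at the minimal distance and let the caller decide later. A sketch, with the hypothetical name closestCandidates, assuming an ascending list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClosestCandidates {
    // Returns all elements of `sorted` at minimal distance from `target`: an empty list
    // for empty input, otherwise one value, or two when target lies exactly halfway.
    static List<Integer> closestCandidates(List<Integer> sorted, int target) {
        List<Integer> result = new ArrayList<>();
        if (sorted.isEmpty()) return result;
        int index = Collections.binarySearch(sorted, target);
        if (index >= 0) {
            result.add(sorted.get(index)); // exact match
            return result;
        }
        int insertion = -index - 1; // first element greater than target
        if (insertion == 0) { result.add(sorted.get(0)); return result; }
        if (insertion == sorted.size()) { result.add(sorted.get(insertion - 1)); return result; }
        int below = sorted.get(insertion - 1);
        int above = sorted.get(insertion);
        long dBelow = (long) target - below;
        long dAbove = (long) above - target;
        if (dBelow <= dAbove) result.add(below); // both are added on a tie
        if (dAbove <= dBelow) result.add(above);
        return result;
    }
}
```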

If you're not massively concerned about performance (given that the set is searched twice), I think using a NavigableSet leads to clearer code:
import java.util.NavigableSet;
import java.util.TreeSet;
public class Closest
{
private static NavigableSet<Integer> integers = new TreeSet<Integer>();
static
{
for (int i = 0; i <= 10; i++)
{
integers.add(Integer.valueOf(i * 10));
}
}
public static void main(String[] args)
{
final Integer arg = Integer.valueOf(args[0]);
// floor/ceiling (not lower/higher) so that a value present in the set is found itself
final Integer lower = integers.floor(arg);
final Integer higher = integers.ceiling(arg);
final Integer closest;
if (lower != null)
{
if (higher != null)
closest = (higher - arg > arg - lower) ? lower : higher;
else
closest = lower;
}
else
closest = higher;
System.out.println("The closest Integer to " + arg + " in " + integers + " is " + closest);
}
}

Your solution appears to be asymptotically optimal. It might be slightly faster (though probably less maintainable) if it used Math.min/max. A good JIT likely has intrinsics that make these fast.
int index = Collections.binarySearch(integers, arg);
if (index < 0) {
int previousDate = integers.get(Math.max(0, -index - 2));
int nextDate = integers.get(Math.min(integers.size() - 1, -index - 1));
closest = arg - previousDate < nextDate - arg ? previousDate : nextDate;
} else {
closest = integers.get(index);
}

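Probably a bit late, but here is a plain binary search in Kotlin and Java. Note that as written it only detects an exact match: it returns the matching value, or -1 if the value is absent, so on its own it does not return the closest value.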
Kotlin:
fun binarySearch(list: List<Int>, valueToCompare: Int): Int {
var central: Int
var initialPosition = 0
var lastPosition: Int
var centralValue: Int
lastPosition = list.size - 1
while (initialPosition <= lastPosition) {
central = (initialPosition + lastPosition) / 2 //Central index
centralValue = list[central] //Central index value
when {
valueToCompare == centralValue -> {
return centralValue //found; returns the value
}
valueToCompare < centralValue -> {
lastPosition = central - 1 //position changes to the previous index
}
else -> {
initialPosition = central + 1 //position changes to next index
}
}
}
return -1 //element not found
}
Java:
public int binarySearch(int[] list, int valueToCompare) {
int central;
int centralValue;
int initialPosition = 0;
int lastPosition = list.length - 1;
while (initialPosition <= lastPosition) {
central = (initialPosition + lastPosition) / 2; //central index
centralValue = list[central]; //central index value
if (valueToCompare == centralValue) {
return centralValue; //element found; returns the value
} else if (valueToCompare < centralValue) {
lastPosition = central - 1; //Position changes to the previous index
} else {
initialPosition = central + 1; //Position changes to the next index
}
}
return -1; //element not found (only after the whole loop has finished)
}
I hope this helps, happy coding.
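Since the search above only reports exact matches, here is a sketch of how the same loop could be adapted to the original question. When the loop ends without a match, initialPosition is the insertion point and lastPosition is the index just before it, so the closest value is one of those two neighbours. The method name closestValue is hypothetical, the array is assumed sorted ascending and non-empty, and ties go to the smaller value here, unlike the question's code.

```java
class ClosestBinarySearch {
    static int closestValue(int[] list, int valueToCompare) {
        int initialPosition = 0;
        int lastPosition = list.length - 1;
        while (initialPosition <= lastPosition) {
            int central = (initialPosition + lastPosition) >>> 1; // overflow-safe midpoint
            int centralValue = list[central];
            if (valueToCompare == centralValue) {
                return centralValue; // exact match
            } else if (valueToCompare < centralValue) {
                lastPosition = central - 1;
            } else {
                initialPosition = central + 1;
            }
        }
        // No match: lastPosition + 1 == initialPosition, and where they exist,
        // list[lastPosition] < valueToCompare < list[initialPosition].
        if (lastPosition < 0) return list[0];
        if (initialPosition >= list.length) return list[list.length - 1];
        long below = (long) valueToCompare - list[lastPosition];
        long above = (long) list[initialPosition] - valueToCompare;
        return below <= above ? list[lastPosition] : list[initialPosition];
    }
}
```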

Related

How can I build this tree with O(n) space complexity?

The Problem
Given a set of integers, find a subset of those integers which sum to 100,000,000.
Solution
I am attempting to build a tree containing all the combinations of the given set along with the sum. For example, if the given set looked like 0,1,2, I would build the following tree, checking the sum at each node:
{}
{} {0}
{} {1} {0} {0,1}
{} {2} {1} {1,2} {0} {0,2} {0,1} {0,1,2}
Since I keep both the array of integers at each node and the sum, I should only need the bottom (current) level of the tree in memory.
Issues
My current implementation will maintain the entire tree in memory and therefore uses way too much heap space.
How can I change my current implementation so that the GC will take care of my upper tree levels?
(At the moment I am just throwing a RuntimeException when I have found the target sum but this is obviously just for playing around)
public class RecursiveSolver {
static final int target = 100000000;
static final int[] set = new int[]{98374328, 234234123, 2341234, 123412344, etc...};
Tree initTree() {
return nextLevel(new Tree(null), 0);
}
Tree nextLevel(Tree currentLocation, int current) {
if (current == set.length) { return null; }
else if (currentLocation.sum == target) throw new RuntimeException(currentLocation.getText());
else {
currentLocation.left = nextLevel(currentLocation.copy(), current + 1);
Tree right = currentLocation.copy();
right.value = add(currentLocation.value, set[current]);
right.sum = currentLocation.sum + set[current];
currentLocation.right = nextLevel(right, current + 1);
return currentLocation;
}
}
int[] add(int[] array, int digit) {
if (array == null) {
return new int[]{digit};
}
int[] newValue = new int[array.length + 1];
for (int i = 0; i < array.length; i++) {
newValue[i] = array[i];
}
newValue[array.length] = digit;
return newValue;
}
public static void main(String[] args) {
RecursiveSolver rs = new RecursiveSolver();
Tree subsetTree = rs.initTree();
}
}
class Tree {
Tree left;
Tree right;
int[] value;
int sum;
Tree(int[] value) {
left = null;
right = null;
sum = 0;
this.value = value;
if (value != null) {
for (int i = 0; i < value.length; i++) sum += value[i];
}
}
Tree copy() {
return new Tree(this.value);
}
}
The time and space you need for building the tree here is absolutely nothing at all.
The reason is that, if you're given a node of the tree, the depth of that node, and the ordered array of input elements, you can compute its parent, left child, and right child using O(1) operations. You have access to each of those things while you're traversing the tree, so you don't need anything else.
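To make that concrete, here is one possible encoding, assuming the input has at most 64 elements: a node is just a bitmask of included elements plus its depth and running sum, and its neighbours are derived arithmetically. SubsetNode and its methods are hypothetical names for this sketch.

```java
class SubsetNode {
    final long mask;  // bit i set when set[i] is included
    final int depth;  // number of elements decided so far
    final long sum;   // sum of the included elements

    SubsetNode(long mask, int depth, long sum) {
        this.mask = mask;
        this.depth = depth;
        this.sum = sum;
    }

    // Left child: element `depth` is excluded.
    SubsetNode left() {
        return new SubsetNode(mask, depth + 1, sum);
    }

    // Right child: element `depth` is included.
    SubsetNode right(int[] set) {
        return new SubsetNode(mask | (1L << depth), depth + 1, sum + set[depth]);
    }

    // Parent: undo the decision made for element depth - 1 (assumes depth > 0).
    SubsetNode parent(int[] set) {
        long bit = 1L << (depth - 1);
        boolean included = (mask & bit) != 0;
        return new SubsetNode(mask & ~bit, depth - 1, included ? sum - set[depth - 1] : sum);
    }
}
```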
The problem is NP-complete.
If you really want to improve performance, then you have to forget about your tree implementation. You either have to just generate all the subsets and sum them up, or use dynamic programming.
The choice depends on the number of elements to sum and the sum you want to achieve. You know the sum is 100,000,000, and the brute-force exponential algorithm runs in O(2^n * n) time, so for n below about 22 it makes sense.
In Python you can generate all the subsets with the standard itertools powerset recipe:
from itertools import chain, combinations
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
You can significantly improve this complexity (at the cost of memory) by using the meet-in-the-middle technique (see the Wikipedia article). This decreases it to O(2^(n/2)), which means it will perform better than the DP solution for n below roughly 53.
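For comparison, the dynamic-programming alternative mentioned above is usually written along these lines. It is a generic sketch (hasSubsetWithSum is a made-up name) that assumes non-negative elements and target. Its boolean table has target + 1 entries, which is roughly 100 MB for a target of 100,000,000, and the time is O(target * n).

```java
class SubsetSumDp {
    static boolean hasSubsetWithSum(int[] set, int target) {
        // reachable[s] == true when some subset of the elements seen so far sums to s
        boolean[] reachable = new boolean[target + 1];
        reachable[0] = true; // the empty subset
        for (int value : set) {
            // iterate downwards so each element is used at most once
            for (int s = target; s >= value; s--) {
                if (reachable[s - value]) {
                    reachable[s] = true;
                }
            }
        }
        return reachable[target];
    }
}
```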
After thinking more about erip's comments, I realized he is correct - I shouldn't be using a tree to implement this algorithm.
Brute force is usually O(n*2^n) because there are n additions for 2^n subsets. Because I only do one addition per node, the solution I came up with is O(2^n) where n is the size of the given set. Also, this algorithm is only O(n) space complexity. Since the number of elements in the original set in my particular problem is small (around 25), O(2^n) complexity is not too much of a problem.
The dynamic solution to this problem is O(t*n) where t is the target sum and n is the number of elements. Because t is very large in my problem, the dynamic solution ends up with a very long runtime and a high memory usage.
This completes my particular solution in around 311 ms on my machine, which is a tremendous improvement over the dynamic programming solutions I have seen for this particular class of problem.
public class TailRecursiveSolver {
public static void main(String[] args) {
final long starttime = System.currentTimeMillis();
try {
step(new Subset(null, 0), 0);
}
catch (RuntimeException ex) {
System.out.println(ex.getMessage());
final long endtime = System.currentTimeMillis();
System.out.println(endtime - starttime);
}
}
static final int target = 100000000;
static final int[] set = new int[]{ . . . };
static void step(Subset current, int counter) {
if (current.sum == target) throw new RuntimeException(current.getText());
else if (counter == set.length) {}
else {
step(new Subset(add(current.subset, set[counter]), current.sum + set[counter]), counter + 1);
step(current, counter + 1);
}
}
static int[] add(int[] array, int digit) {
if (array == null) {
return new int[]{digit};
}
int[] newValue = new int[array.length + 1];
for (int i = 0; i < array.length; i++) {
newValue[i] = array[i];
}
newValue[array.length] = digit;
return newValue;
}
}
class Subset {
int[] subset;
int sum;
Subset(int[] subset, int sum) {
this.subset = subset;
this.sum = sum;
}
public String getText() {
String ret = "";
for (int i = 0; i < (subset == null ? 0 : subset.length); i++) {
ret += " + " + subset[i];
}
if (ret.startsWith(" ")) {
ret = ret.substring(3);
ret = ret + " = " + sum;
} else ret = "null";
return ret;
}
}
EDIT -
The above code still runs in O(n*2^n) time, since the add method runs in O(n) time. The following code runs in true O(2^n) time and is MUCH more performant, completing in around 20 ms on my machine.
It is limited to sets of fewer than 64 elements because it stores the current subset as the bits in a long.
public class SubsetSumSolver {
static boolean found = false;
static final int target = 100000000;
static final int[] set = new int[]{ . . . };
public static void main(String[] args) {
step(0,0,0);
}
static void step(long subset, int sum, int counter) {
if (sum == target) {
found = true;
System.out.println(getText(subset, sum));
}
else if (!found && counter != set.length) {
step(subset + (1L << counter), sum + set[counter], counter + 1); // 1L: an int shift would wrap after 31 elements
step(subset, sum, counter + 1);
}
}
static String getText(long subset, int sum) {
String ret = "";
for (int i = 0; i < 64; i++) if((1 & (subset >> i)) == 1) ret += " + " + set[i];
if (ret.startsWith(" ")) ret = ret.substring(3) + " = " + sum;
else ret = "null";
return ret;
}
}
EDIT 2 -
Here is another version that uses a meet-in-the-middle attack, along with a little bit shifting, to reduce the complexity from O(2^n) to O(2^(n/2)).
If you want to use this for sets with between 32 and 64 elements, you should change the int which represents the current subset in the step function to a long, although performance will obviously decrease drastically as the set size increases. If you want to use this for a set with an odd number of elements, you should add a 0 to the set to make it even-numbered.
import java.util.ArrayList;
import java.util.List;
public class SubsetSumMiddleAttack {
static final int target = 100000000;
static final int[] set = new int[]{ ... };
static List<Subset> evens = new ArrayList<>();
static List<Subset> odds = new ArrayList<>();
static int[][] split(int[] superSet) {
int[][] ret = new int[2][superSet.length / 2];
for (int i = 0; i < superSet.length; i++) ret[i % 2][i / 2] = superSet[i];
return ret;
}
static void step(int[] superSet, List<Subset> accumulator, int subset, int sum, int counter) {
if (counter == superSet.length) {
// record each subset once, at the leaf where every element has been decided
accumulator.add(new Subset(subset, sum));
return;
}
step(superSet, accumulator, subset + (1 << counter), sum + superSet[counter], counter + 1);
step(superSet, accumulator, subset, sum, counter + 1);
}
static void printSubset(Subset e, Subset o) {
String ret = "";
for (int i = 0; i < set.length; i++) {
if (i % 2 == 0) {
if ((1 & (e.subset >> (i / 2))) == 1) ret += " + " + set[i];
}
else {
if ((1 & (o.subset >> (i / 2))) == 1) ret += " + " + set[i];
}
}
if (ret.startsWith(" ")) ret = ret.substring(3) + " = " + (e.sum + o.sum);
System.out.println(ret);
}
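public static void main(String[] args) {
int[][] superSets = split(set);
step(superSets[0], evens, 0,0,0);
step(superSets[1], odds, 0,0,0);
// Index the odd-half subsets by sum so each even-half subset needs one lookup;
// comparing every pair would cost O(2^(n/2) * 2^(n/2)) = O(2^n) again.
java.util.Map<Integer, List<Subset>> oddsBySum = new java.util.HashMap<>();
for (Subset o : odds) {
oddsBySum.computeIfAbsent(o.sum, s -> new ArrayList<>()).add(o);
}
for (Subset e : evens) {
for (Subset o : oddsBySum.getOrDefault(target - e.sum, new ArrayList<>())) {
printSubset(e, o);
}
}
}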
}
class Subset {
int subset;
int sum;
Subset(int subset, int sum) {
this.subset = subset;
this.sum = sum;
}
}

How to make a range tree implementation thread safe

I've implemented a range tree which supports updates in the form of incrementing or decrementing the count of a specific value. It can also query the number of values lower or equal to the value provided.
The range tree has been tested to work in a single threaded environment, however I would like to know how to modify the implementation such that it can be updated and queried concurrently.
I know a simple solution would be to synchronise methods that access this tree, but I would like to know if there are ways to make RangeTree thread safe by itself with minimal effect on performance.
public class RangeTree {
public static final int ROOT_NODE = 0;
private int[] count;
private int[] min;
private int[] max;
private int levels;
private int lastLevelSize;
public RangeTree(int maxValue) {
levels = 1;
lastLevelSize = 1;
while (lastLevelSize <= maxValue) {
levels++;
lastLevelSize = lastLevelSize << 1;
}
int alloc = lastLevelSize * 2;
count = new int[alloc];
min = new int[alloc];
max = new int[alloc];
int step = lastLevelSize;
int pointer = ROOT_NODE;
for (int i = 0; i < levels; i++) {
int current = 0;
while (current < lastLevelSize) {
min[pointer] = current;
max[pointer] = current + step - 1;
current += step;
pointer++;
}
step = step >> 1;
}
}
public void register(int value) {
int index = lastLevelSize - 1 + value;
count[index]++;
walkAndRefresh(index);
}
public void unregister(int value) {
int index = lastLevelSize - 1 + value;
count[index]--;
walkAndRefresh(index);
}
private void walkAndRefresh(int node) {
int currentNode = node;
while (currentNode != ROOT_NODE) {
currentNode = (currentNode - 1) >> 1;
count[currentNode] = count[currentNode * 2 + 1] + count[currentNode * 2 + 2];
}
}
public int countLesserOrEq(int value) {
return countLesserOrEq0(value, ROOT_NODE);
}
private int countLesserOrEq0(int value, int node) {
if (max[node] <= value) {
return count[node];
} else if (min[node] > value) {
return 0;
}
return countLesserOrEq0(value, node * 2 + 1) + countLesserOrEq0(value, node * 2 + 2);
}
}
Louis Wasserman is right, this is a difficult question. But it may have a simple solution.
Depending on your updates/reads ratio and the contention for the data, it may be useful to use ReadWriteLock instead of synchronized.
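A minimal sketch of the ReadWriteLock variant, assuming RangeTree is made to implement a small interface with its three public methods. RankCounter, ArrayRankCounter and ReadWriteLockedCounter are hypothetical names; ArrayRankCounter only stands in for RangeTree so the example compiles on its own. Queries take the shared read lock and updates take the exclusive write lock, so a query never observes a half-refreshed path.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical interface matching RangeTree's public methods.
interface RankCounter {
    void register(int value);
    void unregister(int value);
    int countLesserOrEq(int value);
}

// Stand-in implementation for the example (plain counts, O(maxValue) query).
class ArrayRankCounter implements RankCounter {
    private final int[] counts;
    ArrayRankCounter(int maxValue) { counts = new int[maxValue + 1]; }
    public void register(int value) { counts[value]++; }
    public void unregister(int value) { counts[value]--; }
    public int countLesserOrEq(int value) {
        int total = 0;
        for (int i = 0; i <= value && i < counts.length; i++) total += counts[i];
        return total;
    }
}

// Many queries may run in parallel; an update (which in RangeTree rewrites a whole
// leaf-to-root path) runs alone.
class ReadWriteLockedCounter implements RankCounter {
    private final RankCounter delegate;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    ReadWriteLockedCounter(RankCounter delegate) { this.delegate = delegate; }

    public void register(int value) {
        lock.writeLock().lock();
        try { delegate.register(value); } finally { lock.writeLock().unlock(); }
    }

    public void unregister(int value) {
        lock.writeLock().lock();
        try { delegate.unregister(value); } finally { lock.writeLock().unlock(); }
    }

    public int countLesserOrEq(int value) {
        lock.readLock().lock();
        try { return delegate.countLesserOrEq(value); } finally { lock.readLock().unlock(); }
    }
}
```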
Another solution, which may be efficient in some cases (depending on your workload), is to copy the whole RangeTree object before an update and then switch the reference to the 'actual' RangeTree, as CopyOnWriteArrayList does. But this also gives up atomic consistency: readers may keep seeing an older snapshot for a while, which leads to eventual consistency.
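The copy-on-write idea could be sketched with an AtomicReference holding an array that is never modified after it has been published. This hypothetical CopyOnWriteCounts keeps plain per-value counts rather than RangeTree's node array; each update copies the whole array, so it only pays off when reads vastly outnumber writes.

```java
import java.util.concurrent.atomic.AtomicReference;

class CopyOnWriteCounts {
    private final AtomicReference<int[]> snapshot;

    CopyOnWriteCounts(int maxValue) {
        snapshot = new AtomicReference<>(new int[maxValue + 1]);
    }

    void add(int value, int delta) {
        while (true) {
            int[] current = snapshot.get();
            int[] copy = current.clone();   // never mutate the published array
            copy[value] += delta;
            if (snapshot.compareAndSet(current, copy)) {
                return;                     // readers now see the new snapshot
            }
            // another writer published first: retry on top of its snapshot
        }
    }

    int countLesserOrEq(int value) {
        int[] counts = snapshot.get();      // one consistent snapshot for the whole query
        int total = 0;
        for (int i = 0; i <= value && i < counts.length; i++) total += counts[i];
        return total;
    }
}
```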

Storing values of a Fibonacci sequence w/ recursion with minimal runtime

I know my code has a lot of issues right now, but I just want to get the ideas correct before trying anything. I need to have a method which accepts an integer n that returns the nth number in the Fibonacci sequence. While solving it normally with recursion, I have to minimize runtime so when it gets something like the 45th integer, it will still run fairly quickly. Also, I can't use class constants and globals.
The normal way w/ recursion.
public static int fibonacci(int n) {
if (n <= 2) { // to indicate the first two elems in the sequence
return 1;
} else { // recurses all the way down to compute (n-1) and (n-2) for (n)
return fibonacci(n-1) + fibonacci(n-2);
}
}
I believe the issue is that there is a lot of redundancy in this process. I figure that I can create a List to calculate up to the nth element so it only runs through once before I return the nth element. However, I am having trouble seeing how to use recursion in that case.
If I am understanding it correctly, the standard recursive method is slow because there are a lot of repeats:
fib(6) = fib(5) + fib(4)
fib(5) = fib(4) + fib(3)
fib(4) = fib(3) + 1
fib(3) = 1 + 1
Is this the correct way of approaching this? Is it needed to have some form of container to have a faster output while still being recursive? Should I use a helper method? I just recently got into recursive programming and I am having a hard time wrapping my head around this since I've been so used to iterative approaches. Thanks.
Here's my flawed and unfinished code:
public static int fasterFib(int n) {
ArrayList<Integer> results = new ArrayList<Integer>();
if (n <= 2) { // if
return 1;
} else if (results.size() <= n){ // If the list has fewer elems than
results.add(0, 1);
results.add(0, 1);
results.add(results.get(results.size() - 1 + results.get(results.size() - 2)));
return fasterFib(n); // not sure what to do with this yet
} else if (results.size() == n) { // base case if reached elems
return results.get(n);
}
return 0;
}
I think you want to use a Map<Integer, Integer> instead of a List. You should probably move that collection outside of your method (so it can cache the results) -
private static Map<Integer, Integer> results = new HashMap<>();
public static int fasterFib(int n) {
if (n == 0) {
return 0;
} else if (n <= 2) { // if
return 1;
}
if (results.get(n) != null) {
return results.get(n);
} else {
int v = fasterFib(n - 1) + fasterFib(n - 2);
results.put(n, v);
return v;
}
}
This optimization is called memoization, from the Wikipedia article -
In computing, memoization is an optimization technique used primarily to speed up computer programs by keeping the results of expensive function calls and returning the cached result when the same inputs occur again.
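You can use the Map::computeIfAbsent method (since Java 8) to reuse the already calculated numbers. Be aware that the mapping function must not modify the map itself: since Java 9, HashMap.computeIfAbsent detects this and throws a ConcurrentModificationException, so this recursive version only works as-is on Java 8.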
import java.util.HashMap;
import java.util.Map;
public class Fibonacci {
private final Map<Integer, Integer> cache = new HashMap<>();
public int fib(int n) {
if (n < 2) { // fib(0) = 0, fib(1) = 1
return n;
} else {
return cache.computeIfAbsent(n, (key) -> fib(n - 1) + fib(n - 2));
}
}
}
The other way to do this is to use a helper method.
static private int fibonacci(int a, int b, int n) {
if(n == 0) return a;
else return fibonacci(b, a+b, n-1);
}
static public int fibonacci(int n) {
return fibonacci(0, 1, n);
}
How about a class and a private static HashMap?
import java.util.HashMap;
public class Fibonacci {
private static HashMap<Integer,Long> cache = new HashMap<Integer,Long>();
public Long get(Integer n) {
if ( n <= 2 ) {
return 1L;
} else if (cache.containsKey(n)) {
return cache.get(n);
} else {
Long result = get(n-1) + get(n-2);
cache.put(n, result);
System.err.println("Calculate once for " + n);
return result;
}
}
/**
* @param args
*/
public static void main(String[] args) {
Fibonacci f = new Fibonacci();
System.out.println(f.get(10));
System.out.println(f.get(15));
}
}
import java.util.HashMap;
import java.util.Map;
public class Fibonacci {
private Map<Integer, Integer> cache = new HashMap<>();
private void addToCache(int index, int value) {
cache.put(index, value);
}
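private int getFromCache(int index) {
// look up first and recurse only on a miss: a computeIfAbsent whose function
// writes to the same HashMap throws ConcurrentModificationException on Java 9+
Integer cached = cache.get(index);
return cached != null ? cached : fibonacci(index);
}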
public int fibonacci(int i) {
if (i == 1)
addToCache(i, 0);
else if (i == 2)
addToCache(i, 1);
else
addToCache(i, getFromCache(i - 1) + getFromCache(i - 2));
return getFromCache(i);
}
}
You can use memoization: store the values you have already computed in an array, and if the entry at a given index is not the sentinel value you chose to mean "not computed yet", return it.
Code:
public static void main(String[] args) {
Scanner s = new Scanner(System.in);
int n = Integer.parseInt(s.nextLine());
int[] memo = new int[n+1];
for (int i = 0; i < n+1 ; i++) {
memo[i] = -1;
}
System.out.println(fib(n,memo));
}
static int fib(int n, int[] memo){
if (n<=1){
return n;
}
if(memo[n] != -1){
return memo[n];
}
memo[n] = fib(n-1,memo) + fib(n-2,memo);
return memo[n];
}
Explanation:
memo :
-> int array (all values -1)
-> length (n+1) // easier for working on index
You assign a value to a given index of memo ex: memo[2]
memo will look like [-1,-1, 1, ..... ]
Every time you need to know the fib of 2 it will return memo[2] -> 1
Which saves a lot of computing time on bigger numbers.
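private static Map<Integer, Integer> cache = new HashMap<Integer, Integer>() {
{
put(0, 0);
put(1, 1);
}
};
/**
* Smallest fibonacci sequence program using dynamic programming.
* Note: the recursive computeIfAbsent call only works on Java 8; since Java 9,
* HashMap throws ConcurrentModificationException when the mapping function modifies the map.
* @param n index in the sequence, with fibonacci(0) = 0 and fibonacci(1) = 1
* @return the nth Fibonacci number
*/
public static int fibonacci(int n){
return n < 2 ? n : cache.computeIfAbsent(n, (key) -> fibonacci( n - 1) + fibonacci(n - 2));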
}
public static long Fib(int n, Dictionary<int, long> dict)
{
if (n <= 1)
return n;
if (dict.ContainsKey(n))
return dict[n];
var value = Fib(n - 1,dict) + Fib(n - 2,dict);
dict[n] = value;
return value;
}

How to iteratively generate k elements subsets from a set of size n in java?

I'm working on a puzzle that involves analyzing all size k subsets and figuring out which one is optimal. I wrote a solution that works when the number of subsets is small, but it runs out of memory for larger problems. Now I'm trying to translate an iterative function written in python to java so that I can analyze each subset as it's created and get only the value that represents how optimized it is and not the entire set so that I won't run out of memory. Here is what I have so far and it doesn't seem to finish even for very small problems:
public static LinkedList<LinkedList<Integer>> getSets(int k, LinkedList<Integer> set)
{
int N = set.size();
int maxsets = nCr(N, k);
LinkedList<LinkedList<Integer>> toRet = new LinkedList<LinkedList<Integer>>();
int remains, thresh;
LinkedList<Integer> newset;
for (int i=0; i<maxsets; i++)
{
remains = k;
newset = new LinkedList<Integer>();
for (int val=1; val<=N; val++)
{
if (remains==0)
break;
thresh = nCr(N-val, remains-1);
if (i < thresh)
{
newset.add(set.get(val-1));
remains --;
}
else
{
i -= thresh;
}
}
toRet.add(newset);
}
return toRet;
}
Can anybody help me debug this function or suggest another algorithm for iteratively generating size k subsets?
EDIT: I finally got this function working. I had to create a new variable with the same value as i for the comparison with thresh, because Python handles for-loop indexes differently.
First, if you intend to do random access on a list, you should pick a list implementation that supports that efficiently. From the javadoc on LinkedList:
All of the operations perform as could be expected for a doubly-linked
list. Operations that index into the list will traverse the list from
the beginning or the end, whichever is closer to the specified index.
An ArrayList is both more space efficient and much faster for random access. Actually, since you know the length beforehand, you can even use a plain array.
On to the algorithms. Let's start simple: how would you generate all subsets of size 1? Probably like this:
for (int i = 0; i < set.length; i++) {
int[] subset = {set[i]};
process(subset);
}
Where process is a method that does something with the set, such as checking whether it is "better" than all subsets processed so far.
Now, how would you extend that to work for subsets of size 2? What is the relationship between subsets of size 2 and subsets of size 1? Well, any subset of size 2 can be turned into a subset of size 1 by removing its largest element. Put differently, each subset of size 2 can be generated by taking a subset of size 1 and adding a new element larger than all other elements in the set. In code:
void processSubset(int[] set) {
int[] subset = new int[2];
for (int i = 0; i < set.length; i++) {
subset[0] = set[i];
processLargerSets(set, subset, i);
}
}
void processLargerSets(int[] set, int[] subset, int i) {
for (int j = i + 1; j < set.length; j++) {
subset[1] = set[j];
process(subset);
}
}
For subsets of arbitrary size k, observe that any subset of size k can be turned into a subset of size k-1 by chopping off the largest element. That is, all subsets of size k can be generated by generating all subsets of size k - 1, and for each of these, and each value larger than the largest in the subset, adding that value to the set. In code:
static void processSubsets(int[] set, int k) {
int[] subset = new int[k];
processLargerSubsets(set, subset, 0, 0);
}
static void processLargerSubsets(int[] set, int[] subset, int subsetSize, int nextIndex) {
if (subsetSize == subset.length) {
process(subset);
} else {
for (int j = nextIndex; j < set.length; j++) {
subset[subsetSize] = set[j];
processLargerSubsets(set, subset, subsetSize + 1, j + 1);
}
}
}
Test code:
static void process(int[] subset) {
System.out.println(Arrays.toString(subset));
}
public static void main(String[] args) throws Exception {
int[] set = {1,2,3,4,5};
processSubsets(set, 3);
}
But before you invoke this on huge sets remember that the number of subsets can grow rather quickly.
You can use org.apache.commons.math3.util.Combinations.
Example:
import java.util.Arrays;
import java.util.Iterator;
import org.apache.commons.math3.util.Combinations;
public class tmp {
public static void main(String[] args) {
for (Iterator<int[]> iter = new Combinations(5, 3).iterator(); iter.hasNext();) {
System.out.println(Arrays.toString(iter.next()));
}
}
}
Output:
[0, 1, 2]
[0, 1, 3]
[0, 2, 3]
[1, 2, 3]
[0, 1, 4]
[0, 2, 4]
[1, 2, 4]
[0, 3, 4]
[1, 3, 4]
[2, 3, 4]
Here is a combination iterator I wrote recently:
package psychicpoker;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import static com.google.common.base.Preconditions.checkArgument;
public class CombinationIterator<T> implements Iterator<Collection<T>> {
private int[] indices;
private List<T> elements;
private boolean hasNext = true;
public CombinationIterator(List<T> elements, int k) throws IllegalArgumentException {
checkArgument(k<=elements.size(), "Impossible to select %d elements from hand of size %d", k, elements.size());
this.indices = new int[k];
for(int i=0; i<k; i++)
indices[i] = k-1-i;
this.elements = elements;
}
public boolean hasNext() {
return hasNext;
}
private int inc(int[] indices, int maxIndex, int depth) throws IllegalStateException {
if(depth == indices.length) {
throw new IllegalStateException("The End");
}
if(indices[depth] < maxIndex) {
indices[depth] = indices[depth]+1;
} else {
indices[depth] = inc(indices, maxIndex-1, depth+1)+1;
}
return indices[depth];
}
private boolean inc() {
try {
inc(indices, elements.size() - 1, 0);
return true;
} catch (IllegalStateException e) {
return false;
}
}
public Collection<T> next() {
Collection<T> result = new ArrayList<T>(indices.length);
for(int i=indices.length-1; i>=0; i--) {
result.add(elements.get(indices[i]));
}
hasNext = inc();
return result;
}
public void remove() {
throw new UnsupportedOperationException();
}
}
I had the same problem today: generating all k-sized subsets of an n-sized set.
I had a recursive algorithm, written in Haskell, but the problem required that I write a new version in Java.
In Java, I thought I'd probably have to use memoization to optimize the recursion. It turns out I found a way to do it iteratively, inspired by an image in the Wikipedia article on Combinations (the "red squares" in the code comments below refer to that image).
Method to calculate all k-sized subsets:
public static int[][] combinations(int k, int[] set) {
// binomial(N, K)
int c = (int) binomial(set.length, k);
// where all sets are stored
int[][] res = new int[c][Math.max(0, k)];
// the k indexes (from set) where the red squares are
// see image above
int[] ind = k < 0 ? null : new int[k];
// initialize red squares
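for (int i = 0; i < k; ++i) { ind[i] = i; }
// k == 0: the only combination is the empty one; return early
// so the update loop below never touches ind[-1]
if (k == 0) { return res; }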
// for every combination
for (int i = 0; i < c; ++i) {
// get its elements (red square indexes)
for (int j = 0; j < k; ++j) {
res[i][j] = set[ind[j]];
}
// update red squares, starting by the last
int x = ind.length - 1;
boolean loop;
do {
loop = false;
// move to next
ind[x] = ind[x] + 1;
// if crossing boundaries, move previous
if (ind[x] > set.length - (k - x)) {
--x;
loop = x >= 0;
} else {
// update every following square
for (int x1 = x + 1; x1 < ind.length; ++x1) {
ind[x1] = ind[x1 - 1] + 1;
}
}
} while (loop);
}
return res;
}
Method for the binomial (adapted from the Python example on Wikipedia):
private static long binomial(int n, int k) {
if (k < 0 || k > n) return 0;
if (k > n - k) { // take advantage of symmetry
k = n - k;
}
long c = 1;
for (int i = 1; i < k+1; ++i) {
c = c * (n - (k - i));
c = c / i;
}
return c;
}
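A short derivation (added for illustration, not part of the original answer) of why the integer division inside binomial never truncates. Write c_i for the value of c after iteration i, and treat k as already reduced to min(k, n - k); each iteration multiplies by n - (k - i) = n - k + i and then divides by i:

```latex
c_0 = 1, \qquad c_i = \frac{c_{i-1}\,(n-k+i)}{i}, \qquad 1 \le i \le k

\text{If } c_{i-1} = \binom{n-k+i-1}{i-1}, \text{ then }
c_{i-1}\,(n-k+i) = (n-k+i)\binom{n-k+i-1}{i-1} = i\binom{n-k+i}{i},

\text{so } c_i = \binom{n-k+i}{i} \text{ exactly, and } c_k = \binom{n}{k}.
```

The division is therefore exact, but the intermediate product i * C(n - k + i, i) is larger than the result, so it is what overflows long first for very large n.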
Of course, generating combinations will always run into space problems, since their number grows explosively.
In the context of my own problem, the maximum possible is about 2,000,000 subsets. My machine calculated this in 1032 milliseconds.
Inspired by afsantos's answer :-)... I decided to write a C# .NET implementation to generate all subset combinations of a certain size from a full set. It doesn't need to calculate the total number of possible subsets; it detects when it has reached the end. Here it is:
public static List<object[]> generateAllSubsetCombinations(object[] fullSet, ulong subsetSize) {
if (fullSet == null) {
throw new ArgumentException("Value cannot be null.", "fullSet");
}
else if (subsetSize < 1) {
throw new ArgumentException("Subset size must be 1 or greater.", "subsetSize");
}
else if ((ulong)fullSet.LongLength < subsetSize) {
throw new ArgumentException("Subset size cannot be greater than the total number of entries in the full set.", "subsetSize");
}
// All possible subsets will be stored here
List<object[]> allSubsets = new List<object[]>();
// Initialize current pick; will always be the leftmost consecutive x where x is subset size
ulong[] currentPick = new ulong[subsetSize];
for (ulong i = 0; i < subsetSize; i++) {
currentPick[i] = i;
}
while (true) {
// Add this subset's values to list of all subsets based on current pick
object[] subset = new object[subsetSize];
for (ulong i = 0; i < subsetSize; i++) {
subset[i] = fullSet[currentPick[i]];
}
allSubsets.Add(subset);
if (currentPick[0] + subsetSize >= (ulong)fullSet.LongLength) {
// Last pick must have been the final subsetSize entries; end of subset generation
break;
}
// Update current pick for next subset
ulong shiftAfter = (ulong)currentPick.LongLength - 1;
bool loop;
do {
loop = false;
// Move current picker right
currentPick[shiftAfter]++;
// If we've gotten to the end of the full set, move left one picker
if (currentPick[shiftAfter] > (ulong)fullSet.LongLength - (subsetSize - shiftAfter)) {
if (shiftAfter > 0) {
shiftAfter--;
loop = true;
}
}
else {
// Update pickers to be consecutive
for (ulong i = shiftAfter+1; i < (ulong)currentPick.LongLength; i++) {
currentPick[i] = currentPick[i-1] + 1;
}
}
} while (loop);
}
return allSubsets;
}
This solution worked for me (note that it prints all 2^n subsets, not only those of size k):
private static void findSubsets(int array[])
{
int numOfSubsets = 1 << array.length;
for(int i = 0; i < numOfSubsets; i++)
{
int pos = array.length - 1;
int bitmask = i;
System.out.print("{");
while(bitmask > 0)
{
if((bitmask & 1) == 1)
System.out.print(array[pos]+",");
bitmask >>= 1;
pos--;
}
System.out.println("}"); // one subset per line
}
}
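Since the question asks only for subsets of size k, one possible adaptation of the bitmask idea keeps just the masks with exactly k bits set, using Integer.bitCount. This is a sketch, not the answerer's code: the names BitmaskSubsets and subsetsOfSize are hypothetical, it assumes the input has fewer than 31 elements (so 1 << array.length fits in an int), and it still visits all 2^n masks, so it only suits small inputs.

```java
import java.util.ArrayList;
import java.util.List;

public class BitmaskSubsets {
    // Returns every k-element subset of array. Bit j of mask decides whether
    // array[j] is in the subset. Assumes array.length < 31, so 1 << array.length
    // fits in an int; all 2^n masks are visited, so this is only for small n.
    public static List<List<Integer>> subsetsOfSize(int[] array, int k) {
        List<List<Integer>> result = new ArrayList<>();
        int numOfSubsets = 1 << array.length;
        for (int mask = 0; mask < numOfSubsets; mask++) {
            // Keep only masks that select exactly k elements
            if (Integer.bitCount(mask) != k) {
                continue;
            }
            List<Integer> subset = new ArrayList<>();
            for (int j = 0; j < array.length; j++) {
                if ((mask & (1 << j)) != 0) {
                    subset.add(array[j]);
                }
            }
            result.add(subset);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the six 2-element subsets of {1, 2, 3, 4}, ordered by mask value
        System.out.println(subsetsOfSize(new int[] {1, 2, 3, 4}, 2));
    }
}
```

For {1, 2, 3, 4} and k = 2 this yields [[1, 2], [1, 3], [2, 3], [1, 4], [2, 4], [3, 4]]; for large n, the iterative index-based approaches in this thread avoid enumerating masks that are thrown away.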
Swift implementation:
Below are two variants on the answer provided by afsantos.
The first implementation of the combinations function mirrors the functionality of the original Java implementation.
The second implementation is a general case for finding all combinations of k values from the set [0, setSize). If this is really all you need, this implementation will be a bit more efficient.
In addition, they include a few minor optimizations and a smidgen of logic simplification.
/// Calculate the binomial for a set with a subset size
func binomial(setSize: Int, subsetSize: Int) -> Int
{
if (subsetSize <= 0 || subsetSize > setSize) { return 0 }
// Take advantage of symmetry
var subsetSizeDelta = subsetSize
if (subsetSizeDelta > setSize - subsetSizeDelta)
{
subsetSizeDelta = setSize - subsetSizeDelta
}
// Early-out
if subsetSizeDelta == 0 { return 1 }
var c = 1
for i in 1...subsetSizeDelta
{
c = c * (setSize - (subsetSizeDelta - i))
c = c / i
}
return c
}
/// Calculates all possible combinations of subsets of `subsetSize` values within `set`
func combinations(subsetSize: Int, set: [Int]) -> [[Int]]?
{
// Validate inputs
if subsetSize <= 0 || subsetSize > set.count { return nil }
// Use a binomial to calculate total possible combinations
let comboCount = binomial(setSize: set.count, subsetSize: subsetSize)
if comboCount == 0 { return nil }
// Our set of combinations
var combos = [[Int]]()
combos.reserveCapacity(comboCount)
// Initialize the combination to the first group of set indices
var subsetIndices = [Int](0..<subsetSize)
// For every combination
for _ in 0..<comboCount
{
// Add the new combination
var comboArr = [Int]()
comboArr.reserveCapacity(subsetSize)
for j in subsetIndices { comboArr.append(set[j]) }
combos.append(comboArr)
// Update combination, starting with the last
var x = subsetSize - 1
while true
{
// Move to next
subsetIndices[x] = subsetIndices[x] + 1
// If crossing boundaries, move previous
if (subsetIndices[x] > set.count - (subsetSize - x))
{
x -= 1
if x >= 0 { continue }
}
else
{
for x1 in x+1..<subsetSize
{
subsetIndices[x1] = subsetIndices[x1 - 1] + 1
}
}
break
}
}
return combos
}
/// Calculates all possible combinations of subsets of `subsetSize` values within a set
/// of zero-based values for the set [0, `setSize`)
func combinations(subsetSize: Int, setSize: Int) -> [[Int]]?
{
// Validate inputs
if subsetSize <= 0 || subsetSize > setSize { return nil }
// Use a binomial to calculate total possible combinations
let comboCount = binomial(setSize: setSize, subsetSize: subsetSize)
if comboCount == 0 { return nil }
// Our set of combinations
var combos = [[Int]]()
combos.reserveCapacity(comboCount)
// Initialize the combination to the first group of elements
var subsetValues = [Int](0..<subsetSize)
// For every combination
for _ in 0..<comboCount
{
// Add the new combination
combos.append([Int](subsetValues))
// Update combination, starting with the last
var x = subsetSize - 1
while true
{
// Move to next
subsetValues[x] = subsetValues[x] + 1
// If crossing boundaries, move previous
if (subsetValues[x] > setSize - (subsetSize - x))
{
x -= 1
if x >= 0 { continue }
}
else
{
for x1 in x+1..<subsetSize
{
subsetValues[x1] = subsetValues[x1 - 1] + 1
}
}
break
}
}
return combos
}

Find an array inside another larger array

I was recently asked to write 3 test programs for a job. They were to be written using just core Java APIs and any test framework of my choice. Unit tests should be implemented where appropriate.
Although I haven't received any feedback at all, I suppose they didn't like my solutions (otherwise I would have heard from them), so I decided to show my programs here and ask whether this implementation can be considered good and, if not, why.
To avoid confusion, I'll ask about only the first one for now.
Implement a function that finds an array in another larger array. It should accept two arrays as parameters and it will return the index of the first array where the second array first occurs in full. Eg, findArray([2,3,7,1,20], [7,1]) should return 2.
I didn't try to find any existing solution, but instead wanted to do it myself.
Possible reasons:
1. Should be static.
2. Should use line comments instead of block ones.
3. Didn't check for null values first (I know, just spotted too late).
4. ?
UPDATE:
Quite a few reasons have been presented, and it's very difficult for me to choose one answer as many answers have a good solution. As @adietrich mentioned, I tend to believe they wanted me to demonstrate knowledge of core API (they even asked to write a function, not to write an algorithm).
I believe the best way to secure the job was to provide as many solutions as possible, including:
1. Implementation using Collections.indexOfSubList() method to show that I know core collections API.
2. Implementation using a brute-force approach, but written more elegantly than mine.
3. Implementation using a search algorithm, for example Boyer-Moore.
4. Implementation using a combination of System.arraycopy() and Arrays.equals(). Although not the best solution in terms of performance, it would show my knowledge of standard array routines.
Thank you all for your answers!
END OF UPDATE.
Here is what I wrote:
Actual program:
package com.example.common.utils;
/**
* This class contains functions for array manipulations.
*
* @author Roman
*
*/
public class ArrayUtils {
/**
* Finds a sub array in a large array
*
* @param largeArray
* @param subArray
* @return index of sub array
*/
public int findArray(int[] largeArray, int[] subArray) {
/* If any of the arrays is empty then not found */
if (largeArray.length == 0 || subArray.length == 0) {
return -1;
}
/* If subarray is larger than large array then not found */
if (subArray.length > largeArray.length) {
return -1;
}
for (int i = 0; i < largeArray.length; i++) {
/* Check if the next element of large array is the same as the first element of subarray */
if (largeArray[i] == subArray[0]) {
boolean subArrayFound = true;
for (int j = 0; j < subArray.length; j++) {
/* If outside of large array or elements not equal then leave the loop */
if (largeArray.length <= i+j || subArray[j] != largeArray[i+j]) {
subArrayFound = false;
break;
}
}
/* Sub array found - return its index */
if (subArrayFound) {
return i;
}
}
}
/* Return default value */
return -1;
}
}
Test code:
package com.example.common.utils;
import com.example.common.utils.ArrayUtils;
import junit.framework.TestCase;
public class ArrayUtilsTest extends TestCase {
private ArrayUtils arrayUtils = new ArrayUtils();
public void testFindArrayDoesntExist() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {8,9,10};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistSimple() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {3,4,5};
int expected = 2;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistFirstPosition() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {1,2,3};
int expected = 0;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistLastPosition() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {5,6,7};
int expected = 4;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayDoesntExistPartiallyEqual() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {6,7,8};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistPartiallyEqual() {
int[] largeArray = {1,2,3,1,2,3,4,5,6,7};
int[] subArray = {1,2,3,4};
int expected = 3;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArraySubArrayEmpty() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArraySubArrayLargerThanArray() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {4,5,6,7,8,9,10,11};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistsVeryComplex() {
int[] largeArray = {1234, 56, -345, 789, 23456, 6745};
int[] subArray = {56, -345, 789};
int expected = 1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
}
The requirement of "using just core Java APIs" could also mean that they wanted to see whether you would reinvent the wheel. So in addition to your own implementation, you could give the one-line solution, just to be safe:
public static int findArray(Integer[] array, Integer[] subArray)
{
return Collections.indexOfSubList(Arrays.asList(array), Arrays.asList(subArray));
}
It may or may not be a good idea to point out that the example given contains invalid array literals.
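The one-liner needs boxed Integer[] arrays. For the int[] signature used in the question, a sketch (assuming Java 8+ streams; the class name SubListSearch is hypothetical) boxes both arrays first and then calls the same Collections.indexOfSubList:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class SubListSearch {
    // Boxes both int[] arrays into List<Integer> so that the core-API
    // Collections.indexOfSubList can be used; costs O(n + m) extra memory.
    public static int findArray(int[] array, int[] subArray) {
        List<Integer> source = Arrays.stream(array).boxed().collect(Collectors.toList());
        List<Integer> target = Arrays.stream(subArray).boxed().collect(Collectors.toList());
        // First index where target occurs in source, or -1 if it never does
        return Collections.indexOfSubList(source, target);
    }

    public static void main(String[] args) {
        // Example from the task statement
        System.out.println(findArray(new int[] {2, 3, 7, 1, 20}, new int[] {7, 1})); // prints 2
    }
}
```

Unlike the question's version, an empty subArray gives 0 here rather than -1, because indexOfSubList treats the empty list as occurring at the start.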
Clean and improved code
public static int findArrayIndex(int[] subArray, int[] parentArray) {
if (subArray.length == 0) {
return -1;
}
// last start index at which subArray still fits (inclusive)
int l = parentArray.length - subArray.length;
for (int i = 0; i <= l; i++) {
// count consecutive matching elements starting at i
int matched = 0;
while (matched < subArray.length && parentArray[i + matched] == subArray[matched]) {
matched++;
}
if (matched == subArray.length) {
return i;
}
}
return -1;
}
For finding an array of integers in a larger array of integers, you can use the same kind of algorithms as finding a substring in a larger string. For this there are many algorithms known (see Wikipedia). Especially the Boyer-Moore string search is efficient for large arrays. The algorithm that you are trying to implement is not very efficient (Wikipedia calls this the 'naive' implementation).
For your questions:
Yes, such a method should be static
Don't care, that's a question of taste
The null check can be included, or you should state in the JavaDoc that null values are not allowed, or JavaDoc should state that when either parameter is null a NullPointerException will be thrown.
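The Boyer-Moore suggestion can be illustrated with its simpler Boyer-Moore-Horspool variant, which keeps only the "bad character" shift rule and drops the good-suffix rule. This is a sketch, not code from the answer: the class name HorspoolSearch is hypothetical, and because int values are arbitrary, the shift table is a HashMap rather than the usual fixed-size array used for bytes or chars.

```java
import java.util.HashMap;
import java.util.Map;

public class HorspoolSearch {
    // Boyer-Moore-Horspool search over int arrays. Returns the index of the
    // first occurrence of needle in haystack, or -1 (also -1 for an empty
    // needle, matching the question's findArray convention).
    public static int indexOf(int[] haystack, int[] needle) {
        int n = haystack.length;
        int m = needle.length;
        if (m == 0 || m > n) {
            return -1;
        }
        // For each value in needle[0..m-2], the distance from its last
        // occurrence to the end of the needle. Values not in the map shift by m.
        Map<Integer, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(needle[i], m - 1 - i);
        }
        int pos = 0;
        while (pos <= n - m) {
            // Compare the current window right to left
            int j = m - 1;
            while (j >= 0 && haystack[pos + j] == needle[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            // Shift based on the haystack value under the needle's last slot
            pos += shift.getOrDefault(haystack[pos + m - 1], m);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf(new int[] {2, 3, 7, 1, 20}, new int[] {7, 1})); // prints 2
    }
}
```

The worst case is still O(n*m), but on typical inputs the shifts let it skip most positions, which is where the "efficient for large arrays" claim comes from.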
Well, off the top of my head:
Yes, should be static.
A company complaining about that would not be worth working for.
Yeah, but what would you do? Return? Or throw an exception? It'll throw an exception the way it is already.
I think the main problem is that your code is not very elegant. Too many checks in the inner loop. Too many redundant checks.
Just raw, off the top of my head:
public int findArray(int[] largeArray, int[] subArray) {
int subArrayLength = subArray.length;
if (subArrayLength == 0) {
return -1;
}
int limit = largeArray.length - subArrayLength;
for (int i = 0; i <= limit; i++) {
boolean subArrayFound = true;
for (int j = 0; j < subArrayLength; j++) {
if (subArray[j] != largeArray[i+j]) {
subArrayFound = false;
break;
}
}
/* Sub array found - return its index */
if (subArrayFound) {
return i;
}
}
/* Return default value */
return -1;
}
You could keep that check for the first element so you don't have the overhead of setting up the boolean and the inner for loop for every single element in the array. Then you'd be looking at:
public int findArray(int[] largeArray, int[] subArray) {
int subArrayLength = subArray.length;
if (subArrayLength == 0) {
return -1;
}
int limit = largeArray.length - subArrayLength;
for (int i = 0; i <= limit; i++) {
if (subArray[0] == largeArray[i]) {
boolean subArrayFound = true;
// element 0 already matched, so start comparing at 1
for (int j = 1; j < subArrayLength; j++) {
if (subArray[j] != largeArray[i+j]) {
subArrayFound = false;
break;
}
}
/* Sub array found - return its index */
if (subArrayFound) {
return i;
}
}
}
/* Return default value */
return -1;
}
Following is an approach using the KMP (Knuth-Morris-Pratt) pattern matching algorithm. This solution runs in O(n + m) time, where n is the length of the large array and m is the length of the subarray. For more information, check:
https://en.wikipedia.org/wiki/KMP_algorithm
Brute force takes O(n*m). I checked, and the Collections.indexOfSubList method is also O(n*m).
public static int subStringIndex(int[] largeArray, int[] subArray) {
if (largeArray.length == 0 || subArray.length == 0){
throw new IllegalArgumentException();
}
if (subArray.length > largeArray.length){
throw new IllegalArgumentException();
}
int[] prefixArr = getPrefixArr(subArray);
int indexToReturn = -1;
for (int m = 0, s = 0; m < largeArray.length; m++) {
if (subArray[s] == largeArray[m]) {
s++;
} else {
if (s != 0) {
s = prefixArr[s - 1];
m--;
}
}
if (s == subArray.length) {
indexToReturn = m - subArray.length + 1;
break;
}
}
return indexToReturn;
}
private static int[] getPrefixArr(int[] subArray) {
int[] prefixArr = new int[subArray.length];
prefixArr[0] = 0;
for (int i = 1, j = 0; i < prefixArr.length; i++) {
while (subArray[i] != subArray[j]) {
if (j == 0) {
break;
}
j = prefixArr[j - 1];
}
if (subArray[i] == subArray[j]) {
prefixArr[i] = j + 1;
j++;
} else {
prefixArr[i] = j;
}
}
return prefixArr;
}
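To make getPrefixArr concrete (a worked example added for illustration): prefixArr[i] is the length of the longest proper prefix of subArray[0..i] that is also a suffix of it. For subArray = {1, 2, 1, 2, 3} the method returns {0, 0, 1, 2, 0}; for instance, prefixArr[3] = 2 because {1, 2, 1, 2} starts and ends with {1, 2}. In subStringIndex, when a mismatch happens after s matched elements, s falls back to prefixArr[s - 1] (the part of the match that can be reused) instead of 0, and m-- retries the same element of largeArray; this reuse is what avoids the rescanning that makes brute force O(n*m).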
A slightly optimized version of the code posted earlier, using a labeled continue instead of a boolean flag:
public int findArray(byte[] largeArray, byte[] subArray) {
if (subArray.length == 0) {
return -1;
}
int limit = largeArray.length - subArray.length;
next:
for (int i = 0; i <= limit; i++) {
for (int j = 0; j < subArray.length; j++) {
if (subArray[j] != largeArray[i+j]) {
continue next;
}
}
/* Sub array found - return its index */
return i;
}
/* Return default value */
return -1;
}
int findSubArr(int[] arr,int[] subarr)
{
int lim=arr.length-subarr.length;
for(int i=0;i<=lim;i++)
{
int[] tmpArr=Arrays.copyOfRange(arr,i,i+subarr.length);
if(Arrays.equals(tmpArr,subarr))
return i; //returns starting index of sub array
}
return -1;//return -1 on finding no sub-array
}
UPDATE:
By reusing the same int array instance:
int findSubArr(int[] arr,int[] subarr)
{
int lim=arr.length-subarr.length;
int[] tmpArr=new int[subarr.length];
for(int i=0;i<=lim;i++)
{
System.arraycopy(arr,i,tmpArr,0,subarr.length);
if(Arrays.equals(tmpArr,subarr))
return i; //returns starting index of sub array
}
return -1;//return -1 on finding no sub-array
}
I would suggest the following improvements:
make the function static so that you can avoid creating an instance
the outer loop condition could be i <= largeArray.length-subArray.length, to avoid a test inside the loop
remove the test (largeArray[i] == subArray[0]) that is redundant
Here's the indexOf helper from the JDK's String source:
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* @param source the characters being searched.
* @param sourceOffset offset of the source string.
* @param sourceCount count of the source string.
* @param target the characters being searched for.
* @param targetOffset offset of the target string.
* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j]
== target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}
First to your possible reasons:
Yes. And the class should be final, with a private constructor.
Shouldn't use this kind of comments at all. The code should be self-explanatory.
You're basically checking for null implicitly by accessing the length field, which throws a NullPointerException. Only when largeArray.length == 0 and subArray == null will this slip through.
More potential reasons:
The class doesn't contain any functions for array manipulation, contrary to what the documentation says.
The documentation for the method is very sparse. It should state when and which exceptions are thrown (e.g. NullPointerException) and which return value to expect if the second array isn't found or if it is empty.
The code is more complex than needed.
Why is the equality of the first elements so important that it gets its own check?
In the first loop, it is assumed that the second array will be found, which is unintentional.
Unneeded variable and jump (boolean and break), further reducing legibility.
largeArray.length <= i+j is not easy to grasp. Should be checked before the loop, improving the performance along the way.
I'd swap the operands of subArray[j] != largeArray[i+j]. Seems more natural to me.
All in all too long.
The test code is lacking more edge cases (null arrays, first array empty, both arrays empty, first array contained in second array, second array contained multiple times etc.).
Why is the last test case named testFindArrayExistsVeryComplex?
What the exercise is missing is a specification of the component type of the array parameters, or equivalently the signature of the method. It makes a huge difference whether the component type is a primitive type or a reference type. adietrich's solution assumes a reference type (and thus could be generified as a further improvement); mine assumes a primitive type (int).
So here's my shot, concentrating on the code / disregarding documentation and tests:
public final class ArrayUtils {
// main method
public static int indexOf(int[] haystack, int[] needle) {
return indexOf(haystack, needle, 0);
}
// helper methods
private static int indexOf(int[] haystack, int[] needle, int fromIndex) {
for (int i = fromIndex; i <= haystack.length - needle.length; i++) {
if (containsAt(haystack, needle, i)) {
return i;
}
}
return -1;
}
private static boolean containsAt(int[] haystack, int[] needle, int offset) {
for (int i = 0; i < needle.length; i++) {
if (haystack[i + offset] != needle[i]) {
return false;
}
}
return true;
}
// prevent initialization
private ArrayUtils() {}
}
byte[] arr1 = {1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 1, 3, 4, 56, 6, 7};
byte[] arr2 = {9, 1, 3};
boolean found = containsSubArray(arr1, arr2); // true
public static boolean containsSubArray(byte[] largeArray, byte[] subArray) {
// an empty or too-long subArray is treated as not contained
if (subArray.length == 0 || subArray.length > largeArray.length) {
return false;
}
// try every start position where subArray still fits; a single running
// counter that resets to 0 would miss overlaps such as {1, 2} in {1, 1, 2}
outer:
for (int i = 0; i <= largeArray.length - subArray.length; i++) {
for (int j = 0; j < subArray.length; j++) {
if (largeArray[i + j] != subArray[j]) {
continue outer; // mismatch: try the next start position
}
}
return true;
}
return false;
}
Code from Guava (Preconditions.checkNotNull and Ints.indexOf):
import javax.annotation.Nullable;
/**
* Ensures that an object reference passed as a parameter to the calling method is not null.
*
* @param reference an object reference
* @param errorMessage the exception message to use if the check fails; will be converted to a
* string using {@link String#valueOf(Object)}
* @return the non-null reference that was validated
* @throws NullPointerException if {@code reference} is null
*/
public static <T> T checkNotNull(T reference, @Nullable Object errorMessage) {
if (reference == null) {
throw new NullPointerException(String.valueOf(errorMessage));
}
return reference;
}
/**
* Returns the start position of the first occurrence of the specified {@code
* target} within {@code array}, or {@code -1} if there is no such occurrence.
*
* <p>More formally, returns the lowest index {@code i} such that {@code
* java.util.Arrays.copyOfRange(array, i, i + target.length)} contains exactly
* the same elements as {@code target}.
*
* @param array the array to search for the sequence {@code target}
* @param target the array to search for as a sub-sequence of {@code array}
*/
public static int indexOf(int[] array, int[] target) {
checkNotNull(array, "array");
checkNotNull(target, "target");
if (target.length == 0) {
return 0;
}
outer:
for (int i = 0; i < array.length - target.length + 1; i++) {
for (int j = 0; j < target.length; j++) {
if (array[i + j] != target[j]) {
continue outer;
}
}
return i;
}
return -1;
}
I would do it in three ways:
Using no imports, i.e. using plain Java statements.
Using Java core APIs, to a lesser or greater extent.
Using string pattern search algorithms like KMP (probably the most efficient option).
Approaches 1, 2 and 3 are all shown in the answers above. Here is my take on approach 2:
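import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
public static void findArray(int[] array, int[] subArray) {
// check for null before touching .length
if (array == null || subArray == null) {
return;
}
if (subArray.length > array.length) {
return;
}
if (array.length == 0 || subArray.length == 0) {
return;
}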
//Solution 1
List<Integer> master = Arrays.stream(array).boxed().collect(Collectors.toList());
List<Integer> pattern = IntStream.of(subArray).boxed().collect(Collectors.toList());
System.out.println(Collections.indexOfSubList(master, pattern));
//Solution2
for (int i = 0; i <= array.length - subArray.length; i++) {
String s = Arrays.toString(Arrays.copyOfRange(array, i, i + subArray.length));
if (s.equals(Arrays.toString(subArray))) {
System.out.println("Found at:" + i);
return;
}
}
System.out.println("Not found.");
}
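Using Java 8 and lambda expressions (note: this checks only that each element of smallArray appears somewhere in bigArray, regardless of order or adjacency):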
String[] smallArray = {"1","2","3"};
final String[] bigArray = {"0","1","2","3","4"};
boolean result = Arrays.stream(smallArray).allMatch(s -> Arrays.stream(bigArray).anyMatch(b -> b.equals(s)));
PS: bigArray must be final or effectively final to be captured by the lambda expression.
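For comparison with a true contiguous search, here is a sketch (the class MembershipVsSubarray and its two methods are hypothetical names) that runs the same allMatch/anyMatch expression and Collections.indexOfSubList on an out-of-order input where the two give different answers:

```java
import java.util.Arrays;
import java.util.Collections;

public class MembershipVsSubarray {
    // True if every element of small appears somewhere in big; order and
    // adjacency are ignored. This is what the allMatch/anyMatch expression checks.
    public static boolean allElementsPresent(String[] small, String[] big) {
        return Arrays.stream(small).allMatch(s -> Arrays.stream(big).anyMatch(b -> b.equals(s)));
    }

    // True only if small occurs in big as a contiguous run, in the same order.
    public static boolean containsSubarray(String[] small, String[] big) {
        return Collections.indexOfSubList(Arrays.asList(big), Arrays.asList(small)) != -1;
    }

    public static void main(String[] args) {
        String[] big = {"0", "1", "2", "3", "4"};
        String[] outOfOrder = {"3", "1"};
        System.out.println(allElementsPresent(outOfOrder, big)); // true
        System.out.println(containsSubarray(outOfOrder, big));   // false
    }
}
```

Which check is appropriate depends on the task; the question asks for the index of a contiguous occurrence, which the membership check cannot provide.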
FYI, in Scala, if the goal is simply to check whether an array y occurs as a contiguous slice of an array x, we can use containsSlice:
val x = Array(1,2,3,4,5)
val y = Array(3,4,5)
val z = Array(3,4,8)
x.containsSlice(y) // true
x.containsSlice(z) // false
