Median of medians java implementation

I implemented the median of medians selection algorithm, based on the algs4 quickselect and the Wikipedia article, but my code doesn't work well:
1) It is said that median of medians finds the kth largest element. However, my code finds the kth smallest element.
2) My implementation runs 1-20 times slower than quickselect, but the median of medians algorithm should be asymptotically faster.
I've checked everything several times, but I cannot find the issue.
public class MedianOfMedians {
public static Comparable medianOfMedians(Comparable[] nums, int k) {
return nums[select(nums, 0, nums.length - 1, k)];
}
private static int select(Comparable[] nums, int lo, int hi, int k) {
while (lo < hi) {
int pivotIndex = pivot(nums, lo, hi);
int j = partition(nums, lo, hi, pivotIndex);
if (j < k) {
lo = j + 1;
} else if (j > k) {
hi = j - 1;
} else {
return j;
}
}
return lo;
}
private static int pivot(Comparable[] list, int left, int right) {
// for 5 or less elements just get median
if (right - left < 5) {
return partition5(list, left, right);
}
// otherwise move the medians of five-element subgroups to the first n/5 positions
for (int i = left; i <= right; i += 5) {
// get the median of the i'th five-element subgroup
int subRight = i + 4;
if (subRight > right) {
subRight = right;
}
int median5 = partition5(list, i, subRight);
exch(list, median5, (int) (left + Math.floor((i - left) / 5d)));
}
// compute the median of the n/5 medians-of-five
return select(list,
left,
(int) (left + Math.ceil((right - left) / 5d) - 1),
(int) (left + (right - left) / 10d));
}
private static int partition5(Comparable[] list, int lo, int hi) {
for (int i = lo; i <= hi; i++) {
for (int j = i; j > lo; j--) {
if (less(list[j - 1], list[j])) {
exch(list, j, j - 1);
}
}
}
return (hi + lo) / 2;
}
private static int partition(Comparable[] a, int lo, int hi, int pivotIndex) {
exch(a, lo, pivotIndex);
int i = lo;
int j = hi + 1;
Comparable v = a[lo];
while (true) {
while (less(a[++i], v) && i != hi) { }
while (less(v, a[--j]) && j != lo) { }
if (j <= i) break;
exch(a, i, j);
}
exch(a, j, lo);
return j;
}
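// helpers filled in the usual algs4 way (the bodies were elided in the post)
private static void exch(Comparable[] nums, int i, int j) {
Comparable tmp = nums[i];
nums[i] = nums[j];
nums[j] = tmp;
}
private static boolean less(Comparable v, Comparable w) {
return v.compareTo(w) < 0;
}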
}
JUnit test:
public class MedianOfMediansTest {
private final static int TESTS_COUNT = 100;
@org.junit.Test
public void test() {
// generate TESTS_COUNT arrays of 10000 entries from 0..Integer.MAX_VALUE
Integer[][] tests = generateTestComparables(TESTS_COUNT, 10000, 10000, 0, Integer.MAX_VALUE);
for (int i = 0; i < tests.length; i++) {
Integer[] array1 = Arrays.copyOf(tests[i], tests[i].length);
Integer[] array2 = Arrays.copyOf(tests[i], tests[i].length);
Integer[] array3 = Arrays.copyOf(tests[i], tests[i].length);
long time = System.nanoTime();
final int a = (Integer) MedianOfMedians.medianOfMedians(array1, 0);
long nanos_a = System.nanoTime() - time;
time = System.nanoTime();
final int b = (Integer) Quick.select(array2, 0);
long nanos_b = System.nanoTime() - time;
time = System.nanoTime();
Arrays.sort(array3);
final int c = array3[0];
long nanos_c = System.nanoTime() - time;
System.out.println("MedianOfMedians: " + a + " (" + nanos_a + ") " +
"QuickSelect: " + b + " (" + nanos_b + ") " +
"Arrays.sort: " + c + " (" + nanos_c + ")");
System.out.println(((double) nanos_a) / ((double) nanos_b));
Assert.assertEquals(c, a);
Assert.assertEquals(b, a);
}
}
public static Integer[][] generateTestComparables(int numberOfTests,
int arraySizeMin, int arraySizeMax,
int valueMin, int valueMax) {
Random rand = new Random(System.currentTimeMillis());
Integer[][] ans = new Integer[numberOfTests][];
for (int i = 0; i < ans.length; i++) {
ans[i] = new Integer[randInt(rand, arraySizeMin, arraySizeMax)];
for (int j = 0; j < ans[i].length; j++) {
ans[i][j] = randInt(rand, valueMin, valueMax);
}
}
return ans;
}
public static int randInt(Random rand, int min, int max) {
return (int) (min + (rand.nextDouble() * ((long) max - (long) min)));
}
}

"1) it is said that median of medians finds kth largest element. However, my code finds kth smallest element."
This is not strictly true. Any selection algorithm can find either the kth smallest or the kth largest element, because that's essentially the same task. It depends on how you compare elements and how you partition them (and you can always convert between the two by asking for index length - 1 - k instead of k). Your code indeed seems to find the kth smallest element, which is, by the way, the most typical and intuitive way of implementing a selection algorithm.
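To illustrate the index conversion, here is a minimal sketch. The class and method names are made up for this example, and the selection step is just a sort of a copy standing in for any real selection routine (quickselect, MoM, ...); k is zero-based.

```java
import java.util.Arrays;

final class KthOrder {
    // Stand-in selection routine (assumption: sorting a copy is good enough
    // for illustration). Returns the kth smallest element, k zero-based.
    static int kthSmallest(int[] a, int k) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        return copy[k];
    }

    // The kth largest element is the (length - 1 - k)th smallest one.
    static int kthLargest(int[] a, int k) {
        return kthSmallest(a, a.length - 1 - k);
    }
}
```

So kthLargest(a, 0) is the maximum and kthSmallest(a, 0) is the minimum, with the same selection code doing the work in both cases.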
"2) my implementation runs 1-20 times slower than quickselect, but the median of medians algorithm should be asymptotically faster."
Not just asymptotically faster: asymptotically faster in the worst case. In the average case, both are linear, but MoM has higher constant factors. Since you generate your tests randomly, you are very unlikely to hit quickselect's worst case. With randomized quickselect, hitting the worst case is unlikely for any input; otherwise the probability depends on the pivot selection rule used.
With that in mind, and given that median of medians has high constant factors, you should not expect it to perform better than quickselect! It might outperform sorting, but even then the logarithmic factor in sorting isn't that large for small inputs (lg 10000 is about 13-14).
Take my MoM solution for a LeetCode problem, for example. Arrays.sort sometimes outperforms MoM for arrays with 500 million elements. In the best case MoM runs about twice as fast, though.
Therefore, MoM is mostly of theoretical interest. I could imagine a practical use case where you need a 100% guarantee of not exceeding some time limit. Say, some real-time system on an aircraft, or spacecraft, or nuclear reactor. The time limit is not very tight, but exceeding it even by one nanosecond is catastrophic. But it's an extremely contrived example, and I doubt that it's actually the way it works.
Even if you can find a practical use case for MoM, you can probably use something like Introselect instead. It essentially starts with quickselect, and then switches to MoM if things don't look good. But testing it would be a nightmare: how would you come up with a test that actually forces the algorithm to switch (and therefore tests the MoM part), especially if it's randomized?
Your code looks fine overall, but I'd make some helper methods package-private, or even move them to another class, so they can be tested separately, because such things are very hard to get right, and you may not notice a mistake as long as the final result is right. I'm not sure that your groups-of-five code is 100% correct, for example. Sometimes you use right - left where I'd expect the element count, which is right - left + 1.
Also, I would replace those ceil/floor calls with pure integer arithmetic equivalents. That is, Math.floor((i - left) / 5d) => (i - left) / 5 and Math.ceil((right - left) / 5d) => (right - left + 4) / 5 (this is the part where I don't like the right - left thing, by the way, but I'm not sure if it's wrong).
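To make the integer-arithmetic point concrete, here is a minimal sketch. The helper names groupIndex and groupCount are mine, not from the question, and groupCount deliberately works from the element count right - left + 1. With six elements (left = 0, right = 5) there are two groups of up to five, while Math.ceil((right - left) / 5d) gives one, so in that case the second median would be left out of the recursive select call.

```java
final class GroupsOfFive {
    // Index of the five-element group that position i belongs to.
    // Same value as (int) Math.floor((i - left) / 5d) for i >= left.
    static int groupIndex(int i, int left) {
        return (i - left) / 5;
    }

    // Number of groups of up to five elements in list[left..right] (inclusive).
    // Integer ceiling of count / 5, where count is the element count.
    static int groupCount(int left, int right) {
        int count = right - left + 1;
        return (count + 4) / 5;
    }
}
```

The medians moved to the front would then occupy positions left through left + groupCount(left, right) - 1.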

Related

java.lang.OutOfMemoryError: Java heap space while solving leetcode question

https://leetcode.com/problems/k-th-symbol-in-grammar/
I was solving the above LeetCode question. Here is my solution; it runs perfectly except for the test case n = 30, k = 434991989, where it throws java.lang.OutOfMemoryError: Java heap space.
public class kthGrammer{
public static void rowGenerator(int n, int[] row, int num){
if(n == num)
return;
int start = (row.length / 2) - (int)Math.pow(2, num - 1);
int pStart = (row.length / 2) - (int)Math.round(Math.pow(2, num - 3));
while(pStart <= (row.length / 2) + (int)Math.pow(2, num - 3)){
if(row[pStart] == 0){
row[start++] = 0;
row[start++] = 1;
}
else{
row[start++] = 1;
row[start++] = 0;
}
++pStart;
}
rowGenerator(n, row, num + 1);
return;
}
public static int kthGrammar(int n, int k) {
int[] row = new int[(int)Math.pow(2,n - 1)];
row[row.length / 2] = 0;
rowGenerator(n, row, 1);
return row[k - 1];
}
public static void main(String[] args) {
System.out.println("\nAnswer: " + kthGrammar(30, 434991989));
// System.out.println("\nAnswer: " + kthGrammar(2, 1));
// System.out.println("\nAnswer: " + kthGrammar(2, 2));
// System.out.println("\nAnswer: " + kthGrammar(3, 1));
}
}

Sites like LeetCode design their questions so that the straightforward approach rarely works (because of memory or CPU limits), so some algorithmic thinking is needed to reach an acceptable solution. Your code materializes the whole row, which is big: row n contains 2^(n-1) elements, so the 30th row has 2^29 of them, which is about 2 GiB as an int[]. Moreover, if N were, say, 1000, the row would not fit into the memory of an entire computer cluster. That's why you get the OutOfMemoryError.
I can just give you a hint:
The idea behind the algorithm is that each row is twice as long as the previous one. The K-th element of row R is the "parent" of two elements in row R+1, which have indices 2K-1 and 2K. This forms a pattern, so you can walk backwards from row N, replacing K by its parent index (K+1)/2 (integer division) at each step until you reach the first row, and doing some checks along the way.
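For readers who want more than the hint, here is one way the backward walk could look. It is a sketch under the problem's usual rules (row 1 is "0", each 0 becomes 01 and each 1 becomes 10, k is 1-based); the class name is made up. The "check" at each step is whether the current position is the second child of its parent, because only the second child differs from the parent.

```java
final class KthGrammar {
    // Returns the k-th symbol (1-based) of row n without building any row.
    static int kthGrammar(int n, int k) {
        int flips = 0;
        // Walk up from row n to row 1, tracking how often the symbol was flipped.
        for (int row = n; row > 1; row--) {
            if (k % 2 == 0) {
                flips ^= 1;   // even index 2K: second child, the flipped parent
            }
            k = (k + 1) / 2;  // parent index in the previous row
        }
        return flips;         // row 1 is "0", so the answer is the flip parity
    }
}
```

This uses O(n) time and constant memory, so n = 30 is no problem.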

transforming an iterative function to recursive in java

I want to transform this function into a recursive form. Could anyone help me? Thanks.
The function computes the following sum:
X=1+(1+2)*2+(1+2+3)*2^2+(1+2+3+4)*2^3+ . . . +(1+2+3+4+. . . +n)*2^(n-1)
public static int calcX(int n) {
int x=1;
int tmp;
for(int i = 1 ; i <= n-1;i++) {
tmp=0;
for(int j = 1 ; j <= i + 1;j++) {
tmp+=j;
}
x+=tmp*Math.pow(2, i);
}
return x;
}
My attempt (I'm new to recursion):
public static int calcXrecu(int n,int tmp,int i,int j) {
int x=1;
if(i <= n-1) {
if(j <= i) {
calcXrecu(n,tmp+j,i,j+1);
}
else {
x = (int) (tmp*Math.pow(2, i));
}
}
else {
x=1;
}
return x;
}

You have a sequence of sums which themselves are sums.
The nth term can be derived from the (n-1)th term like this:
a(n) = a(n-1) + (1+2+3+....+n) * 2^(n-1) [1]
and this is the recursive formula because it produces each term via the previous term.
Now you need another formula (high school math) for the sum of 1+2+3+....+n:
1+2+3+....+n = n * (n + 1) / 2 [2]
Now use [2] in [1]:
a(n) = a(n-1) + n * (n + 1) * 2^(n-2) [3]
so you have a formula with which you can derive each term from the previous term and this is all you need for your recursive method:
public static int calcXrecu(int n) {
if (n == 1) return 1;
return calcXrecu(n - 1) + n * (n + 1) * (int) Math.pow(2, n - 2);
}
This line:
if (n == 1) return 1;
is the exit point of the recursion.
Note that the result of Math.pow(2, n - 2) needs to be cast to int because Math.pow returns a double.
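As a cross-check of the recurrence, here is a small sketch that puts a direct summation next to the recursive form. It differs from the method above in two ways that are my own choices: it uses long with a bit shift instead of int with Math.pow, and it rejects n < 1 instead of recursing forever. The names are made up for this example.

```java
final class CalcXSketch {
    // Direct summation of (1 + 2 + ... + i) * 2^(i-1) for i = 1..n.
    static long calcXIterative(int n) {
        long x = 0;
        for (int i = 1; i <= n; i++) {
            long triangular = (long) i * (i + 1) / 2; // 1 + 2 + ... + i
            x += triangular << (i - 1);               // times 2^(i-1)
        }
        return x;
    }

    // Recursive form: a(1) = 1, a(n) = a(n-1) + n(n+1)/2 * 2^(n-1).
    static long calcXRecursive(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        if (n == 1) return 1;
        long term = ((long) n * (n + 1) / 2) << (n - 1);
        return calcXRecursive(n - 1) + term;
    }
}
```

Both methods agree for small n (1, 7, 31, 111 for n = 1..4); with long the result stays in range up to roughly n = 50.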

In addition to @forpas's answer, I also want to provide a solution using corecursion by utilizing Stream.iterate. This is obviously not a recursive solution, but I think it is good to know alternatives as well. Note that I use a Pair (for example, org.apache.commons.lang3.tuple.Pair) to represent the tuple of (index, value).
public static int calcXcorecu(final int n) {
return Stream.iterate(
Pair.of(1, 1), p -> {
final int index = p.getLeft();
final int prev = p.getRight();
final int next = prev + index * (index + 1) * (int) Math.pow(2, index - 2);
return Pair.of(index + 1, next);
})
// only need the n-th element
.skip(n)
.limit(1)
.map(Pair::getRight)
.findFirst()
.get();
}

optimize java method (finding all ways to reach a point on basis of dice)

I wrote this method to solve a problem in which I need to cover a distance by taking steps of 1 to 6, as on a die, and count all possible ways to reach the distance:
static int watchCount(int distance)
{
// Base cases
if (distance<0) return 0;
if (distance==0) return 1;
return watchCount(distance-1) +
watchCount(distance-2) +
watchCount(distance-3)+
watchCount(distance-4) +
watchCount(distance-5)+
watchCount(distance-6);
}
But for large values (e.g. > 500) this method takes very long. Any help optimizing it would be appreciated.
Thanks.

You can use a cache like this (the same idea as @PiotrWilkin's); the cache array must have at least distance entries:
static int watchCount(int distance, Integer[] cache) {
// Base cases
if (distance < 0) {
return 0;
}
if (distance == 0) {
return 1;
}
if (cache[distance-1] == null) {
cache[distance-1] = watchCount(distance - 1, cache)
+ watchCount(distance - 2, cache)
+ watchCount(distance - 3, cache)
+ watchCount(distance - 4, cache)
+ watchCount(distance - 5, cache)
+ watchCount(distance - 6, cache);
}
return cache[distance-1];
}
EDIT iterative implementation:
public static int iterativeWatchCount(int n) {
if (n < 0) {
return 0;
}
int index = 0;
int[] cache = new int[6];
cache[cache.length - 1] = 1;
int sum = 1;
for (int i = 0; i < n; i++, index = (index + 1) % cache.length) {
sum = cache[0] + cache[1] + cache[2] + cache[3] + cache[4] + cache[5];
cache[index] = sum;
}
return sum;
}

This is a classical problem for dynamic programming. Create an array of size n (where n is the number you're looking for) and work your way back, updating the array by incrementing the number of ways to obtain the value. This way, you can do it in O(n) complexity (currently the complexity is exponential).
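A minimal bottom-up sketch of that idea follows. The class and method names are made up, and I use BigInteger as my own choice, because the number of ways roughly doubles with each extra unit of distance and outgrows int (and long) long before distance 500.

```java
import java.math.BigInteger;

final class DiceWays {
    // ways[d] = number of ordered sequences of steps 1..6 that sum to d.
    static BigInteger countWays(int distance) {
        if (distance < 0) {
            return BigInteger.ZERO;
        }
        BigInteger[] ways = new BigInteger[distance + 1];
        ways[0] = BigInteger.ONE; // one way to cover distance 0: take no step
        for (int d = 1; d <= distance; d++) {
            BigInteger sum = BigInteger.ZERO;
            // The last step can be 1..6, as long as it does not overshoot.
            for (int step = 1; step <= 6 && step <= d; step++) {
                sum = sum.add(ways[d - step]);
            }
            ways[d] = sum;
        }
        return ways[distance];
    }
}
```

Each entry is computed once from at most six earlier entries, so the loop does O(n) additions.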

Combinatory issue due to Factorial overflow

I need a function that calculates the mathematical combination (n, k) for a card game.
My current attempt is based on the usual factorial formula:
static long Factorial(long n)
{
return n < 2 ? 1 : n * Factorial(n - 1);
}
static long Combinatory(long n , long k )
{
return Factorial(n) / (Factorial(k) * Factorial(n - k));
}
It works well for small inputs, but for the range I need (n up to 52 and k up to 4) it returns wrong values. E.g.:
long comb = Combinatory(52, 2); // returns 1, but should be 1326
I know this is because Factorial(52) overflows long, but the results I need are not nearly that big.
Is there any way to get around this issue?

Instead of using the default combinatory formula n! / (k! x (n - k)!), use the recursive property of the combinatory function:
(n, k) = (n - 1, k) + (n - 1, k - 1)
knowing that (n, 0) = 1 and (n, n) = 1.
This lets you avoid factorials altogether and keeps you from overflowing your long.
Here is a sample implementation:
static long Combinatory(long n, long k)
{
if (k == 0 || n == k )
return 1;
return Combinatory(n - 1, k) + Combinatory(n - 1, k - 1);
}
EDIT: a faster iterative algorithm:
static long Combinatory(long n, long k)
{
if (n - k < k)
k = n - k;
long res = 1;
for (int i = 1; i <= k; ++i)
{
res = (res * (n - i + 1)) / i;
}
return res;
}

In C# you can use BigInteger (I think there's a Java equivalent).
e.g.:
static long Combinatory(long n, long k)
{
return (long)(Factorial(new BigInteger(n)) / (Factorial(new BigInteger(k)) * Factorial(new BigInteger(n - k))));
}
static BigInteger Factorial(BigInteger n)
{
return n < 2 ? 1 : n * Factorial(n - 1);
}
You need to add a reference to System.Numerics to use BigInteger.
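For Java, the counterpart is java.math.BigInteger. Here is a minimal sketch of the same factorial-based approach; the class and method names are made up, and the use of longValueExact (which throws if the result does not fit in a long) is my own choice.

```java
import java.math.BigInteger;

final class BigCombinatory {
    // n! computed with arbitrary precision, so it cannot overflow.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // n! / (k! * (n - k)!), converted back to long at the very end.
    static long combinatory(int n, int k) {
        BigInteger value = factorial(n).divide(factorial(k).multiply(factorial(n - k)));
        return value.longValueExact(); // ArithmeticException if it does not fit
    }
}
```

With this, combinatory(52, 2) gives 1326 even though 52! itself is far outside the range of long.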

If this is not for a homework assignment, there is an efficient implementation in Apache's commons-math package
http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/util/ArithmeticUtils.html#binomialCoefficientDouble%28int,%20int%29
If it is for a homework assignment, start avoiding factorial in your implementation.
Use the property that (n, k) = (n, n-k) to rewrite your choose using the highest value for k.
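Then note that you can reduce n!/(k!(n-k)!) to (n * (n-1) * ... * (k+1)) / ((n-k) * (n-k-1) * ... * 1), which means that you multiply every number in [k+1, n] inclusive and then divide by every number in [1, n-k] inclusive.
// From memory, please verify correctness independently before trusting its use.
//
public static long choose(long n, long k) {
// C(n, k) == C(n, n-k); the larger of the two leaves fewer factors to multiply
long kPrime = Math.max(k, n - k);
long returnValue = 1;
// n! / kPrime! = (kPrime+1) * ... * n
// note: this product can overflow long when both k and n-k are large, e.g. C(52, 26)
for (long i = kPrime + 1; i <= n; i++) {
returnValue *= i;
}
// divide by (n - kPrime)!
for (long i = 2; i <= n - kPrime; i++) {
returnValue /= i;
}
return returnValue;
}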
Please double check the maths, but this is a basic idea you could go down to get a reasonably efficient implementation that will work for numbers up to a poker deck.

The recursive formula is also known as Pascal's triangle, and IMO it's the easiest way to calculate combinatorials. If you're only going to need C(52,k) (for 0<=k<=52) I think it would be best to fill a table with them at program start. The following C code fills a table using this method:
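#include <stdint.h>
#include <stdlib.h>
/* Returns a calloc'd array C of N+1 entries with C[k] = C(N,k); the caller frees it. */
static int64_t* pascals_triangle( int N)
{
int n,k;
int64_t* C = calloc( N+1, sizeof *C);
for( n=0; n<=N; ++n)
{ C[n] = 1;
/* update row n in place, right to left, so C[k-1] is still the old row */
for( k=n-1; k>0; --k)
{ C[k] += C[k-1];
}
}
return C;
}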
After calling this with N=52, for example, the returned array C will hold C(52,k) in C[k] for k=0..52.
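Since the question is tagged for a Java-style card game, here is the same table fill as a Java sketch. The class and method names are made up; long is enough for N = 52, because the largest entry, C(52, 26), is about 5 * 10^14.

```java
final class PascalRow {
    // Returns row n of Pascal's triangle: result[k] == C(n, k) for k = 0..n.
    static long[] row(int n) {
        long[] c = new long[n + 1];
        for (int r = 0; r <= n; r++) {
            c[r] = 1; // C(r, r) = 1
            // Update in place from right to left so c[k - 1] still holds row r-1.
            for (int k = r - 1; k > 0; k--) {
                c[k] += c[k - 1];
            }
        }
        return c;
    }
}
```

Filling the table once at program start and then indexing it, e.g. row(52)[2] for C(52, 2), avoids any recomputation during the game.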

Count the number of occurrences of a number in a sorted array

My teacher gave me the next task:
On a sorted array, find the number of occurrences of a number.
The complexity of the algorithm must be as small as possible.
This is what I have thought of:
public static int count(int[] a, int x)
{
int low = 0, high = a.length - 1;
while( low <= high )
{
int middle = low + (high - low) / 2;
if( a[middle] > x ) {
// Continue searching the lower part of the array
high = middle - 1;
} else if( a[middle] < x ) {
// Continue searching the upper part of the array
low = middle + 1;
} else {
// We've found the array index of the value
// 1 for a[middle] itself, plus the further occurrences found on each side
return 1 + SearchLeft(a, x, middle) + SearchRight(a, x, middle);
}
}
return 0;
}
SearchLeft and SearchRight iterate over the array until the number no longer appears.
I'm not sure whether this is the fastest possible code for this problem, and I would like to see other opinions.
Edit: After some help from comments and answers, this is my current attempt:
public static int count(int[] array, int value)
{
return SearchRightBound(array, value) - SearchLeftBound(array, value);
}
public static int SearchLeftBound(int[] array, int value)
{
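// high is exclusive: the bound may be array.length when every element is smaller than value
int low = 0, high = array.length;
while( low < high )
{
int middle = low + (high - low) / 2;
if(array[middle] < value) {
low = middle + 1;
}
else {
high = middle;
}
}
return low;
}
public static int SearchRightBound(int[] array, int value)
{
// high is exclusive here too, so a value at the last index is still counted
int low = 0, high = array.length;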
while( low < high )
{
int middle = low + (high - low) / 2;
if(array[middle] > value) {
high = middle;
}
else {
low = middle + 1;
}
}
return low;
}

"SearchLeft and SearchRight iterate the array, until the number doesn't show."
That means if the entire array is filled with the target value, your algorithm is O(n).
You can make it O(log n) worst case if you binary-search for the first and for the last occurrence of x.
// search first occurrence
int low = 0, high = a.length - 1;
while(low < high) {
int middle = low + (high-low)/2;
if (a[middle] < x) {
// the first occurrence must come after index middle, if any
low = middle+1;
} else if (a[middle] > x) {
// the first occurrence must come before index middle if at all
high = middle-1;
} else {
// found an occurrence, it may be the first or not
high = middle;
}
}
if (high < low || a[low] != x) {
// that means no occurrence
return 0;
}
// remember first occurrence
int first = low;
// search last occurrence, must be between low and a.length-1 inclusive
high = a.length - 1;
// now, we always have a[low] == x and high is the index of the last occurrence or later
while(low < high) {
// bias middle towards high now
int middle = low + (high+1-low)/2;
if (a[middle] > x) {
// the last occurrence must come before index middle
high = middle-1;
} else {
// last known occurrence
low = middle;
}
}
// high is now index of last occurrence
return (high - first + 1);
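The two searches above can also be folded into a single helper that works on a half-open range. This is a sketch of that variant, not a replacement for the code above; the names bound and count and the boolean flag are my own choices. The count is the distance between the first index whose element is greater than x and the first index whose element is at least x.

```java
final class SortedCount {
    // upper == false: first index whose element is >= x (or a.length).
    // upper == true:  first index whose element is >  x (or a.length).
    static int bound(int[] a, int x, boolean upper) {
        int low = 0, high = a.length; // the answer lies in [low, high]
        while (low < high) {
            int middle = low + (high - low) / 2;
            boolean goRight = upper ? a[middle] <= x : a[middle] < x;
            if (goRight) {
                low = middle + 1;
            } else {
                high = middle;
            }
        }
        return low;
    }

    // Number of occurrences of x in the sorted array a, in O(log n).
    static int count(int[] a, int x) {
        return bound(a, x, true) - bound(a, x, false);
    }
}
```

Because high starts at a.length, an array that consists only of x, or an empty array, needs no special case.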

Well, this is essentially binary search plus walking towards the boundaries of the solution interval. The only way you could possibly speed this up is to cache the last values of low and high and then use binary search to find the borders as well, but this will really only matter for very large intervals, in which case it's unlikely that you jumped right into it.
