Java performance issue with Arrays.sort [duplicate] - java

This question already has answers here:
Why is processing a sorted array *slower* than an unsorted array? (Java's ArrayList.indexOf)
(3 answers)
Closed 9 months ago.
I was solving an algorithmic problem and found what I thought was a solution, but I unexpectedly ran into a weird problem.
Let's assume I have the following code on Java 8/17 (it reproduces on both), on an Intel 11th-gen processor:
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
public class DistanceYandex{
static class Elem implements Comparable<Elem>{
int value;
int index;
long dist;
public Elem(int value, int index){
this.value = value;
this.index = index;
}
@Override
public int compareTo(Elem o){
return Integer.compare(value, o.value);
}
}
public static void main(String[] args){
int n = 300_000;
int k = 3_000;
Elem[] elems = new Elem[n];
for(int i = 0; i < n; i++){
elems[i] = new Elem(ThreadLocalRandom.current().nextInt(), i);
}
solve(n, k, elems);
}
private static void solve(int n, int k, Elem[] elems){
Arrays.sort(elems); // interesting line
long time = System.nanoTime();
for(int i = 0; i < n; i++){
elems[i].dist = findDistForIth(elems, i, k);
}
// i omit output, because it's irrelevant
// Arrays.sort(elems, Comparator.comparingInt(elem -> elem.index));
// System.out.print(elems[0].dist);
// for(int i = 1; i < n; i++){
// System.out.print(" " + elems[i].dist);
// }
System.out.println((System.nanoTime() - time)/1_000_000_000.0);
}
private static long findDistForIth(Elem[] elems, int i, int k){
int midElem = elems[i].value;
int left = i - 1;
int right = i + 1;
long dist = 0;
for(int j = 0; j < k; j++){
if(left < 0){
dist += elems[right++].value - midElem;
}else if(right >= elems.length){
dist += midElem - elems[left--].value;
}else{
int leftAdd = midElem - elems[left].value;
int rightAdd = elems[right].value - midElem;
if(leftAdd < rightAdd){
dist+=leftAdd;
left--;
}else{
dist+=rightAdd;
right++;
}
}
}
return dist;
}
}
Look at the solve function.
Here we have a simple solution that calls findDistForIth n times and measures the time it takes (I don't use JMH, because the testing system for my problem uses simple one-time measurements). Before it captures the start time, it sorts the array in natural order using the built-in Arrays.sort.
As you can see, the measured time doesn't include the time spent sorting the array. Also, the work done by findDistForIth does not depend on whether the input array is sorted (it mostly takes the third, else branch). But if I comment out the line with Arrays.sort, execution is significantly faster: roughly 1.6 seconds instead of roughly 7.3 seconds. More than 4 times faster!
I don't understand what's going on.
I thought maybe the GC was interfering, so I tried increasing the memory given to the JVM to 2 GB (-Xmx2048M -Xms2048M). It didn't help.
I tried passing an explicit comparator to Arrays.sort as the second argument (Comparator.comparingInt(e -> e.value)) and removing the Comparable interface from the Elem class. It didn't help.
I ran the profiler (IntelliJ Profiler) with Arrays.sort included and with Arrays.sort excluded (screenshots not reproduced here), but it didn't give me much information.
I tried building it directly to a .jar and launching it with the java command (previously I ran it from IntelliJ). It also didn't help.
Does anybody know what's going on?
This problem also reproduces in an online compiler: https://onlinegdb.com/MPyNIknB8T

Maybe you need to sort your data with the red-black tree that backs TreeSet (a SortedSet implementation). Arrays.sort uses a merge sort algorithm for object arrays, which works well for small amounts of data.
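The duplicate linked at the top of the question attributes this kind of slowdown to memory locality: after sorting, neighbouring array slots reference objects that are scattered across the heap. One way to test that hypothesis is to re-create the objects in sorted order right after Arrays.sort. The following is a minimal sketch, not a confirmed fix; the class LocalityCheck and the helper reallocateInOrder are hypothetical names, and it assumes the JVM tends to place consecutive allocations next to each other, which is typical but not guaranteed.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.ThreadLocalRandom;

class LocalityCheck {
    static class Elem {
        final int value;
        final int index;
        long dist;

        Elem(int value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    // Hypothetical helper: allocates a fresh Elem for every slot, in array order,
    // so that objects referenced by adjacent slots are likely adjacent in memory.
    static Elem[] reallocateInOrder(Elem[] src) {
        Elem[] copy = new Elem[src.length];
        for (int i = 0; i < src.length; i++) {
            copy[i] = new Elem(src[i].value, src[i].index);
        }
        return copy;
    }

    public static void main(String[] args) {
        Elem[] elems = new Elem[300_000];
        for (int i = 0; i < elems.length; i++) {
            elems[i] = new Elem(ThreadLocalRandom.current().nextInt(), i);
        }
        Arrays.sort(elems, Comparator.comparingInt((Elem e) -> e.value));
        elems = reallocateInOrder(elems);
        // Time the findDistForIth loop over elems here, as solve(...) does.
        System.out.println(elems.length);
    }
}
```

If the timed loop becomes fast again after the copy, that points to object layout rather than to the sorting algorithm itself.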

Related

Why sentinel search slower than linear?

I decided to reduce the number of comparisons required to find an element in an array. Here we replace the last element of the list with the search element itself and run a while loop to see if there exists any copy of the search element in the list and quit the loop as soon as we find the search element. See the code snippet for clarification.
import java.util.Random;
public class Search {
public static void main(String[] args) {
int n = 10000000;
int key = 10000;
int[] arr = generateRandomSize(n);
long start = System.nanoTime();
int find = sentinels(arr, key);
long end = System.nanoTime();
System.out.println(find);
System.out.println(end - start);
arr = generateRandomSize(n);
start = System.nanoTime();
find = linear(arr, key);
end = System.nanoTime();
System.out.println(find);
System.out.println(end - start);
}
public static int[] generateRandomSize(int n) {
int[] arr = new int[n];
Random rand = new Random();
for (int i = 0; i < n; ++i) {
arr[i] = rand.nextInt(5000);
}
return arr;
}
public static int linear(int[] a, int key) {
for(int i = 0; i < a.length; ++i) {
if (a[i] == key) {
return i;
}
}
return -1;
}
public static int sentinels(int[] a, int key) {
int n = a.length;
int last = a[n-1];
a[n-1] = key;
int i = 0;
while (a[i] != key) {
++i;
}
a[n-1] = last;
if ((i < n - 1) || a[n-1] == key ) {
return i;
}
return -1;
}
}
So using sentinel search we are not doing 10000000 comparisons like i < arr.length. But why does linear search always show better performance?
You'd have to look at the bytecode, and even deeper, to see what HotSpot makes of this. But I am quite sure that this statement is not true:
using sentinel search we are not doing 10000000 comparisons like i < arr.length
Why? Because when you access a[i], i has to be bounds checked. In the linear case on the other hand, the optimiser can deduce that it can omit the bounds check since it "knows" that i>=0 (because of the loop structure) and also i<arr.length because it has already been tested in the loop condition.
So the sentinel approach just adds overhead.
This makes me think of a smart C++ optimisation (called "Template Meta Programming" and "Expression Templates") I did about 20 years ago that led to faster execution times (at cost of a much higher compilation time), and after the next compiler version was released, I discovered that the new version was able to optimise the original source to produce the exact same assembly - in short I should have rather used my time differently and stayed with the more readable (=easier to maintain) version of the code.
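To compare the two searches a little more fairly than with a single cold call, both can be run a few times before timing so the JIT has had a chance to compile them. This is only a sketch under assumptions: the class name SearchWarmup, the warm-up count of 20, and the array size are arbitrary choices, and whether bounds checks are actually eliminated depends on the JVM version and flags.

```java
import java.util.Random;

class SearchWarmup {
    static int linear(int[] a, int key) {
        for (int i = 0; i < a.length; ++i) {
            if (a[i] == key) {
                return i;
            }
        }
        return -1;
    }

    static int sentinel(int[] a, int key) {
        int n = a.length;
        if (n == 0) {
            return -1; // nothing to search, and no slot for the sentinel
        }
        int last = a[n - 1];
        a[n - 1] = key;          // plant the sentinel
        int i = 0;
        while (a[i] != key) {    // a[i] is still bounds checked by the JVM
            ++i;
        }
        a[n - 1] = last;         // restore the original last element
        return (i < n - 1 || last == key) ? i : -1;
    }

    public static void main(String[] args) {
        int[] arr = new Random(42).ints(1_000_000, 0, 5000).toArray();
        int key = 10_000; // not present, so both searches scan the whole array

        // Warm-up runs so both methods are likely JIT-compiled before timing.
        for (int i = 0; i < 20; i++) {
            linear(arr, key);
            sentinel(arr, key);
        }

        long t0 = System.nanoTime();
        int r1 = linear(arr, key);
        long t1 = System.nanoTime();
        int r2 = sentinel(arr, key);
        long t2 = System.nanoTime();
        System.out.println("linear:   " + r1 + " in " + (t1 - t0) + " ns");
        System.out.println("sentinel: " + r2 + " in " + (t2 - t1) + " ns");
    }
}
```

For anything beyond a rough impression, a benchmark harness such as JMH is the safer choice.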

Why my java program is showing StackOverflowError?

I have written a program to sort the elements of an array based on the principle of quicksort. The program accepts an array, takes the first element as the pivot, and compares it with the rest of the elements of the array. If an element is greater than the pivot, it is stored at the end of another array of the same size (say b), and if the element is smaller than the pivot, it is put at the beginning of array b. In this way the pivot finds its way to the middle of the array, where the elements on the left-hand side are smaller and the elements on the right-hand side are greater than the pivot. Then the elements of array b are copied back to the main array, and the whole function is called recursively. This is the code.
package sorting;
import java.util.*;
public class AshishSort_Splitting {
private static Scanner dogra;
public static void main(String[] args)
{
dogra=new Scanner(System.in);
System.out.print("Enter the number of elements ");
int n=dogra.nextInt();
int[] a=new int[n];
for(int i=n-1;i>=0;i--)
{
a[i]=i;
}
int start=0;
int end=n-1;
ashishSort(a,start,end);
for(int i=0;i<n;i++)
{
System.out.print(+a[i]+"\n");
}
}
static void ashishSort(int[]a,int start,int end)
{
int p;
if(start<end)
{
p=ashishPartion(a,start,end);
ashishSort(a,start,p-1);
ashishSort(a,p+1,end);
}
}
public static int ashishPartion(int[] a,int start,int end)
{
int n=start+end+1;
int[] b=new int[n];
int j=start;
int k=end;
int equal=a[start];
for(int i=start+1;i<=end;i++)
{
if(a[i]<equal)
{
b[j]=a[i];
j++;
}
else if(a[i]>equal)
{
b[k]=a[i];
k--;
}
}
b[j]=equal;
for(int l=0;l<=end;l++)
{
a[l]=b[l];
}
return j;
}
}
This code works fine when I enter a value of n up to 13930, but after that it shows
Exception in thread "main" java.lang.StackOverflowError
at sorting.AshishSort_Splitting.ashishSort(AshishSort_Splitting.java:28)
at sorting.AshishSort_Splitting.ashishSort(AshishSort_Splitting.java:29)
I know the error is caused by bad recursion, but I tested my code multiple times and didn't find a better alternative. Please help. Thanks in advance.
EDIT: Can someone suggest a way to overcome this?
I see performance issues first. In your partition method I see:
int n = start+end+1
Right there, if the method was called on an int[1000] with start=900 and end=999, you are allocating an int[1900]... Not intended, I think...!
If you are really going to trash memory instead of partitioning in place, use
int n = end-start+1
instead, for a much smaller allocation. Since j and k index b[], they would start at j=0 and k=n-1, and the method would return start + j.
Second, your
else if(a[i]>equal)
is not necessary and causes a bug. A simple else suffices. If you don't replace the 0's in b[j..k] you'll be in trouble when you refill a[].
Finally, your final copy is bogus: copying from [0 to end] goes beyond the bounds of the invocation [start..end] and, most importantly, there is usually nothing of interest near b[0] with your b[] as it is. The zone of interest in b[] (in your version) is [start..end] (in my suggested version it would be [0..n-1]).
Here is my version, but it still has the O(n) stack problem that was mentioned in the comments.
public static int ashishPartion(int[] a, int start, int end) {
int n = end-start + 1;
int[] b = new int[n];
int bj = 0;
int bk = n-1;
int pivot = a[start];
for (int i = start + 1; i <= end; i++) {
if (a[i] < pivot) {
b[bj++] = a[i];
} else {
b[bk--] = a[i];
}
}
b[bj] = pivot;
System.arraycopy(b, 0, a, start, n);
return start+bj;
}
If you are free to choose a sorting algorithm, then a mergesort would have more uniform performance, with O(log N) stack depth, and it is easy to implement.
Otherwise, you will have to de-recurse your algorithm using a manual stack, and that is a nice homework exercise that I won't do for you... LOL
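Besides a manual stack, a common way to bound the call-stack depth is to recurse only into the smaller partition and loop over the larger one. The sketch below reuses the partition method from this answer; the class name BoundedDepthQuickSort and the method name sort are hypothetical. Recursion depth stays at O(log n), although the running time on already sorted input remains quadratic with a first-element pivot.

```java
import java.util.Arrays;

class BoundedDepthQuickSort {
    // Partition from the answer above: first element is the pivot,
    // a temporary array of exactly (end - start + 1) elements is used.
    static int partition(int[] a, int start, int end) {
        int n = end - start + 1;
        int[] b = new int[n];
        int bj = 0;
        int bk = n - 1;
        int pivot = a[start];
        for (int i = start + 1; i <= end; i++) {
            if (a[i] < pivot) {
                b[bj++] = a[i];
            } else {
                b[bk--] = a[i];
            }
        }
        b[bj] = pivot;
        System.arraycopy(b, 0, a, start, n);
        return start + bj;
    }

    // Recurse into the smaller side, iterate on the larger side.
    // Each recursive call handles at most half of the current range.
    static void sort(int[] a, int start, int end) {
        while (start < end) {
            int p = partition(a, start, end);
            if (p - start < end - p) {
                sort(a, start, p - 1);
                start = p + 1;
            } else {
                sort(a, p + 1, end);
                end = p - 1;
            }
        }
    }

    public static void main(String[] args) {
        int n = 20_000; // larger than the 13930 that overflowed in the question
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        sort(a, 0, n - 1);
        System.out.println(a[0] + " .. " + a[n - 1]);
    }
}
```

Choosing a better pivot (for example the middle element) would address the quadratic running time separately from the stack depth.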

Creating quicksort without recursion and stack

I have a task to write a quicksort algorithm (on positive numbers only) in Java (I can't use any imports except Scanner), but without recursion and without a stack.
I have two questions about it:
I understand iterative quicksort with a stack and the recursive version, but I cannot imagine how to do it without either.
I have heard about an 'in place' implementation, but I don't really get it. Is it a solution to my problem?
I would appreciate it if anyone could show me a way to do it (don't post an implementation if you can avoid it; I just want to understand it, not copy someone's code) or recommend a book where I can find it (or a similar problem).
Is using insertion sort for small arrays a good idea? If so, how big should N be in this code:
if (arraySize < N)
insertionSort
else
quickSort
fi
Apparently my task was limited to positive numbers; here is my solution:
public static void quickSort(final int size) {
int l = 0;
int r = size - 1;
int q, i = 0;
int tmpr = r;
while (true) {
i--;
while (l < tmpr) {
q = partition(l, tmpr);
arr[tmpr] = -arr[tmpr];
tmpr = q - 1;
++i;
}
if (i < 0)
break;
l++;
tmpr = findNextR(l, size);
arr[tmpr] = -arr[tmpr];
}
}
private static int findNextR(final int l, final int size) {
for (int i = l; i < size; ++i) {
if (arr[i] < 0)
return i;
}
return size - 1;
}
private static int partition(int l, int r) {
long pivot = arr[(l + r) / 2];
while (l <= r) {
while (arr[r] > pivot)
r--;
while (arr[l] < pivot)
l++;
if (l <= r) {
long tmp = arr[r];
arr[r] = arr[l];
arr[l] = tmp;
l++;
r--;
}
}
return l;
}
The array to sort is a static array in my class.
It works by negating elements to mark partition boundaries and later finding those negative markers, which is why the input must be positive.
The partition uses the middle element of the array as the pivot, but using the median is also good (it depends on the array).
I hope someone will find this useful.
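The sign trick that replaces the stack in the solution above can be looked at in isolation. This is a small sketch with hypothetical names (SignMarkers, toggleMark, findNextMarked); it assumes all values are strictly positive, since zero cannot carry a sign marker.

```java
class SignMarkers {
    // Flip the sign at idx: marks a boundary, or restores a marked value.
    static void toggleMark(long[] arr, int idx) {
        arr[idx] = -arr[idx];
    }

    // Index of the first marked (negative) element at or after 'from'.
    // Falls back to the last index, like findNextR in the solution above.
    static int findNextMarked(long[] arr, int from) {
        for (int i = from; i < arr.length; i++) {
            if (arr[i] < 0) {
                return i;
            }
        }
        return arr.length - 1;
    }

    public static void main(String[] args) {
        long[] arr = {5, 3, 8, 1, 9};
        toggleMark(arr, 2);                   // remember "a segment ends at index 2"
        int boundary = findNextMarked(arr, 0);
        System.out.println(boundary);         // prints 2
        toggleMark(arr, boundary);            // restore the original value
        System.out.println(arr[2]);           // prints 8
    }
}
```

In the solution, each partition step leaves one such marker at the right end of the pending segment, and the outer loop later finds and clears it instead of popping a stack entry.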
Just as a reference, the Java 8 implementation of Arrays.sort(int[]) uses a threshold of 47: anything smaller than that is sorted using insertion sort. Their quicksort implementation is, however, very complex, with some initial overhead, so regard 47 as an upper limit.
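The cutoff idea from the question can be sketched as follows. The value 47 is borrowed from the threshold mentioned above and is only a starting assumption; the right value for a simpler quicksort should be found by measuring. The class name HybridQuickSort is hypothetical, and recursion is used here purely to keep the sketch short, since the cutoff is independent of how the recursion is removed.

```java
class HybridQuickSort {
    static final int CUTOFF = 47; // assumption: tune by measurement

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int lo, int hi) {
        if (hi - lo + 1 < CUTOFF) {
            insertionSort(a, lo, hi); // small (or empty) range
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > x) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

    // Lomuto-style partition using the middle element as the pivot.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = new java.util.Random(1).ints(1000, 1, 100_000).toArray();
        sort(a);
        System.out.println(a[0] <= a[a.length - 1]);
    }
}
```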
A Google of "non-recursive quicksort" produced a slew of answers ... including this one: Non recursive QuickSort "Your language may vary," but the basic principle won't.
I personally think that, if you're going to sort something, you might as well use Quicksort in all cases . . .
Unless, of course, you can simply use a sort() function in your favorite target-language and leave it to the language implementors to have chosen a clever algorithm (uhhhh, it's probably Quicksort...) for you. If you don't have to specify an algorithm to do such a common task, "don't!" :-)

tukey's ninther for different shufflings of same data

While implementing improvements to quicksort partitioning, I tried to use Tukey's ninther to find the pivot (borrowing almost everything from Sedgewick's implementation in QuickX.java).
My code below gives different results each time the array of integers is shuffled.
import java.util.Random;
public class TukeysNintherDemo{
public static int tukeysNinther(Comparable[] a,int lo,int hi){
int N = hi - lo + 1;
int mid = lo + N/2;
int delta = N/8;
int m1 = median3a(a,lo,lo+delta,lo+2*delta);
int m2 = median3a(a,mid-delta,mid,mid+delta);
int m3 = median3a(a,hi-2*delta,hi-delta,hi);
int tn = median3a(a,m1,m2,m3);
return tn;
}
// return the index of the median element among a[i], a[j], and a[k]
private static int median3a(Comparable[] a, int i, int j, int k) {
return (less(a[i], a[j]) ?
(less(a[j], a[k]) ? j : less(a[i], a[k]) ? k : i) :
(less(a[k], a[j]) ? j : less(a[k], a[i]) ? k : i));
}
private static boolean less(Comparable x,Comparable y){
return x.compareTo(y) < 0;
}
public static void shuffle(Object[] a) {
Random random = new Random(System.currentTimeMillis());
int N = a.length;
for (int i = 0; i < N; i++) {
int r = i + random.nextInt(N-i); // between i and N-1
Object temp = a[i];
a[i] = a[r];
a[r] = temp;
}
}
public static void show(Comparable[] a){
int N = a.length;
if(N > 20){
System.out.format("a[0]= %d\n", a[0]);
System.out.format("a[%d]= %d\n",N-1, a[N-1]);
}else{
for(int i=0;i<N;i++){
System.out.print(a[i]+",");
}
}
System.out.println();
}
public static void main(String[] args) {
Integer[] a = new Integer[]{17,15,14,13,19,12,11,16,18};
shuffle(a); // shuffle before printing, as the output below shows shuffled data
System.out.print("data= ");
show(a);
int tn = tukeysNinther(a,0,a.length-1);
System.out.println("ninther="+a[tn]);
}
}
Running this a couple of times gives
data= 11,14,12,16,18,19,17,15,13,
ninther=15
data= 14,13,17,16,18,19,11,15,12,
ninther=14
data= 16,17,12,19,18,13,14,11,15,
ninther=16
Will Tukey's ninther give different values for different shufflings of the same dataset? When I tried to find the median of medians by hand, I found that the calculations in the code above are correct, which means that the same dataset yields different results, unlike the median of the dataset. Is this the proper behaviour? Can someone with more knowledge of statistics comment?
Tukey's ninther examines 9 items and calculates the median using only those.
For different random shuffles, you may very well get a different Tukey's ninther, because different items may be examined. After all, you always examine the same array slots, but a different shuffle may have put different items in those slots.
The key here is that Tukey's ninther is not the median of the given array. It is an attempted approximation of the median, made with very little effort: we only have to read 9 items and make at most 12 comparisons to get it. This is much faster than getting the actual median, and has a smaller chance of resulting in an undesirable pivot compared to the 'median of three'. Note that the chance still exists.
Does this answer your question?
On a side note, does anybody know if quicksort using Tukey's ninther still requires shuffling? I'm assuming yes, but I'm not certain.
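The difference between the ninther and the true median can be checked directly on the three shuffles printed in the question. This is a minimal sketch on int[] rather than Comparable[]; the class name NintherVsMedian and its helpers are hypothetical, the sample positions mirror tukeysNinther from the question, and a non-empty array is assumed.

```java
import java.util.Arrays;

class NintherVsMedian {
    // Median of three values.
    static int median3(int x, int y, int z) {
        return Math.max(Math.min(x, y), Math.min(Math.max(x, y), z));
    }

    // Median of three medians, sampling the same nine positions
    // as tukeysNinther in the question (over the whole array).
    static int ninther(int[] a) {
        int n = a.length;
        int lo = 0;
        int hi = n - 1;
        int mid = lo + n / 2;
        int delta = n / 8;
        int m1 = median3(a[lo], a[lo + delta], a[lo + 2 * delta]);
        int m2 = median3(a[mid - delta], a[mid], a[mid + delta]);
        int m3 = median3(a[hi - 2 * delta], a[hi - delta], a[hi]);
        return median3(m1, m2, m3);
    }

    // True median via a sorted copy (upper median for even lengths).
    static int trueMedian(int[] a) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        return copy[copy.length / 2];
    }

    public static void main(String[] args) {
        int[][] shuffles = {
            {11, 14, 12, 16, 18, 19, 17, 15, 13},
            {14, 13, 17, 16, 18, 19, 11, 15, 12},
            {16, 17, 12, 19, 18, 13, 14, 11, 15},
        };
        for (int[] s : shuffles) {
            System.out.println("ninther=" + ninther(s) + " median=" + trueMedian(s));
        }
        // prints:
        // ninther=15 median=15
        // ninther=14 median=15
        // ninther=16 median=15
    }
}
```

The true median is 15 for every ordering, while the ninther depends on which items happen to sit in the sampled slots.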

heapsort working 99%

I was wondering if I could get some quick help with a heapsort implementation. I have it working, but in the output everything is sorted except the first number. It's probably just a check somewhere, but I have gone over my code and tried changing values and nothing produced the results I needed. Any advice on where I went wrong?
Here is my source code:
code removed, problem was solved!
thanks guys!
private static void movedown(double [] a, int k, int c) {
while (2*k <= c-1) {
int j = 2*k+1;
if (j <= c-1 && less(a[j], a[j+1])) j++;
if (!less(a[k], a[j])) break;
exch(a, k, j);
k = j;
}
}
public static void heapsort(double [] a, int count) {
for (int k = count/2; k >= 0; k--)
movedown(a, k, count);
while (count >= 1) {
exch(a, 0, count--);
movedown(a, 0, count);
}
}
I have fixed your bug and tested it on my machine. It should work. Just a couple minor changes in these two methods.
To summarize what you didn't get right:
In the heapsort method, the count you passed in is a zero-based index. However, when you built the heap you only looped down to k = 1, i.e., there was one more iteration to go.
In the movedown method, you should have noted that the left child index is 2*k+1 while the right child index is 2*k+2.
Not being consistent with your indexing choice (i.e., 0-based vs. 1-based) resulted in the bug, I guess.
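One way to avoid mixing conventions is to use 0-based indices throughout and pass the number of elements in the heap rather than the last index. The following is a sketch of that variant, not the code from this answer; the names ZeroBasedHeapSort and siftDown are hypothetical.

```java
class ZeroBasedHeapSort {
    // Restore the max-heap property for the subtree rooted at k,
    // considering only the first n elements of a.
    static void siftDown(double[] a, int k, int n) {
        while (2 * k + 1 < n) {              // while k has a left child
            int j = 2 * k + 1;               // left child
            if (j + 1 < n && a[j] < a[j + 1]) {
                j++;                         // right child is larger
            }
            if (a[k] >= a[j]) {
                break;                       // heap property holds
            }
            swap(a, k, j);
            k = j;
        }
    }

    static void heapsort(double[] a) {
        int n = a.length;
        // Build the heap: the last parent is at index n/2 - 1.
        for (int k = n / 2 - 1; k >= 0; k--) {
            siftDown(a, k, n);
        }
        // Move the current maximum to the end and shrink the heap.
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);
            siftDown(a, 0, end);             // heap now holds 'end' elements
        }
    }

    static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        double[] a = {3.5, 1.0, 4.25, 1.5, 9.0, 2.75};
        heapsort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```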
