Counting sort and usage in Java applications [closed]


I did an algorithms refresher and (re)read about a sorting algorithm that runs in linear time, namely Counting Sort.
To be honest I had forgotten about it.
I understand the structure and logic, and the fact that it runs in linear time is a very attractive quality.
But I have the following question:
As I understand it, the concrete implementation of the algorithm relies on 2 things:
1) The range of input numbers is small (otherwise the intermediate array will be huge and have many gaps).
2) We actually know the range of numbers.
Assuming these 2 assumptions are correct (please correct me otherwise), I was wondering what the best application domain for this algorithm is.
I mean specifically in Java, is an implementation of counting sort like the following sufficient:
// requires: import java.util.Arrays;
public static void countingSort(int[] a, int low, int high)
{
    int[] counts = new int[high - low + 1]; // one counter per possible value, from low to high
    for (int x : a)
        counts[x - low]++; // subtract low so the lowest possible value maps to index 0
    int current = 0;
    for (int i = 0; i < counts.length; i++)
    {
        Arrays.fill(a, current, current + counts[i], i + low); // writes counts[i] copies of the value i + low, starting at index current
        current += counts[i]; // leap forward by counts[i] steps
    }
}
Or is it not a trivial matter to come up with high and low?
Is there a specific application in Java that counting sort is best suited for?
I assume there are subtleties like these; otherwise why would anyone bother with all the O(n log n) algorithms?

Algorithms are not about the language, so this is language-agnostic. As you have said, use counting sort when the domain is small. If the input contains only three distinct values, say 1, 2 and 3, it is far better to sort it with counting sort than with quicksort, heapsort or any other O(n log n) algorithm. If you have a specific question, feel free to ask.

It is incorrect to say counting sort is O(n) in the general case; the "small range of elements" is not a recommendation but a key assumption.
Counting sort assumes that each of the elements is an integer in the
range 1 to k, for some integer k. When k = O(n), the Counting-sort
runs in O(n) time.
In the general situation, the range of keys k is independent of the number of elements n and can be arbitrarily large. For example, the following array:
{1, 1000000000000000000000000000000000000000000000000000000000000000000000000000}
As for the exact break-even point of k and n where counting sort outperforms a traditional sort, it depends heavily on the implementation and is best found by benchmarking (trying both out).
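As a rough illustration of the points above, here is a minimal sketch that finds low and high with one extra pass and only uses counting sort when the range is small relative to the input size. The class name CountingSortSketch and the threshold RANGE_FACTOR are assumptions made for this example, not values taken from the question or answers; a real cut-off would have to come from benchmarking.

```java
import java.util.Arrays;

final class CountingSortSketch {
    // Hypothetical threshold: use counting sort only while the value range
    // is at most RANGE_FACTOR times the number of elements.
    static final int RANGE_FACTOR = 4;

    static void sort(int[] a) {
        if (a.length < 2) {
            return;
        }
        // One O(n) pass finds the range, so the caller need not know it.
        int low = a[0];
        int high = a[0];
        for (int x : a) {
            low = Math.min(low, x);
            high = Math.max(high, x);
        }
        // Computed as long because high - low + 1 can overflow an int.
        long range = (long) high - low + 1;
        long limit = Math.min((long) RANGE_FACTOR * a.length, Integer.MAX_VALUE - 8);
        if (range > limit) {
            Arrays.sort(a); // range too large: fall back to a general-purpose sort
            return;
        }
        int[] counts = new int[(int) range];
        for (int x : a) {
            counts[x - low]++;
        }
        int current = 0;
        for (int i = 0; i < counts.length; i++) {
            Arrays.fill(a, current, current + counts[i], i + low);
            current += counts[i];
        }
    }
}
```

With this shape, an input such as {1, 1000000000} takes the fallback path instead of allocating a billion counters.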

Related

Java - Space complexity of this code [closed]

What is the space complexity of the code?
I think it is O(1), since I don't technically store all the inputs in one array; I sort of split them up into 2 arrays.
This code is supposed to take an array with duplicates, and find the two numbers that don't have duplicates.
My Code:
public int[] singleNumber(int[] nums) {
    int[] res = new int[2];
    HashSet<Integer> temp = new HashSet<>();
    HashSet<Integer> temp2 = new HashSet<>();
    for (int i : nums) {
        if (!temp.contains(i)) temp.add(i);
        else temp2.add(i);
    }
    for (int i : nums) {
        if (!temp2.contains(i)) {
            if (res[0] == 0) res[0] = i;
            else res[1] = i;
        }
    }
    return res;
}

The space complexity is "best case" O(1) and "worst case" O(N). The best case is when all of the numbers in nums are the same, and the worst case occurs in a variety of situations, including when there are close to N/2 duplicates, which is your intended use case.
The time complexity is O(N) in all cases.
Here is my reasoning.
Time complexity.
If we assume that the hash function is well-behaved, then the time complexity of HashSet.add and HashSet.contains are both O(1).
The time complexity of Integer.valueOf(int) is also O(1).
These O(1) operations are performed at most "some constant" times in two O(N) loops, making the entire computation O(N) in time.
Space complexity.
This is a bit more complex, but let us just consider the worst case where all int values are unique. This statement
if (!temp.contains(i)) temp.add(i);
else temp2.add(i);
is going to add Integer.valueOf(i) to either temp or temp2. And in this particular case, they will all end up in temp. (Think about it ...) So that means we end up with N unique entries in the temp set, and none in the temp2 set.
Now the space required for a HashSet with N entries is O(N). (The constant of proportionality is large, but we are talking about space complexity here so that is not pertinent.)
The best case occurs when all of the int values are the same. Then you end up with one entry in temp and one entry in temp2. (The same value is repeatedly added to the temp2 set, which does nothing.) The space usage of two sets with one entry each is O(1).
But (I hear you ask) what about the objects created by Integer.valueOf(int)? I argue that they don't count:
Unless they are made reachable (via one of the HashSet objects), the Integer objects will be garbage collected. Therefore they don't count as space usage in the sense that it is normally considered in Java [1].
A smart compiler could actually optimize away the need for Integer objects entirely. (Not current generation HotSpot Java compilers, but in the future we could see such things in the production compilers.)
[1] If you start considering temporary (i.e. immediately unreachable) Java objects as space utilization and sum up this utilization, you will get meaningless results; e.g. an application with O(N^2) "space utilization" that runs in an O(N) sized heap. Temporary objects do count, but only while they are reachable. In your example, the contribution is O(1).
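To make the best and worst cases concrete, the following sketch runs only the first loop from the question and reports how many entries end up in each set. The class name SetGrowthDemo and the helper setSizes are hypothetical names introduced for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class SetGrowthDemo {
    // Returns {temp.size(), temp2.size()} after the first loop of the question's code.
    static int[] setSizes(int[] nums) {
        Set<Integer> temp = new HashSet<>();
        Set<Integer> temp2 = new HashSet<>();
        for (int i : nums) {
            if (!temp.contains(i)) temp.add(i);
            else temp2.add(i);
        }
        return new int[] { temp.size(), temp2.size() };
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] allUnique = new int[n];
        int[] allSame = new int[n];
        for (int i = 0; i < n; i++) {
            allUnique[i] = i; // worst case: every value is distinct
            allSame[i] = 7;   // best case: a single repeated value
        }
        int[] worst = setSizes(allUnique);
        int[] best = setSizes(allSame);
        System.out.println("all unique: temp=" + worst[0] + ", temp2=" + worst[1]);
        System.out.println("all same:   temp=" + best[0] + ", temp2=" + best[1]);
    }
}
```

For the all-unique input the first set grows to N entries while the second stays empty; for the all-same input each set holds a single entry, regardless of N.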

how to improve the efficiency? [closed]

As a school assignment I have a really big for loop (it goes from 1 to a 13-digit number) with a few BigInteger operations inside. To shorten the loop I can skip all the even numbers, and to avoid unnecessary BigInteger operations I can check for multiples of 3, 5, 7 and 11. Here is an example:
for (long i = min; i < max; i += 2) {
    // skip multiples of 3, 5, 7 and 11 before doing any BigInteger work
    if (i % 3 != 0 && i % 5 != 0 && i % 7 != 0 && i % 11 != 0) {
        BigInteger aux = new BigInteger(Long.toString(i));
        BigInteger[] aux2 = k.divideAndRemainder(aux); // {quotient, remainder}
        if (aux2[1].longValueExact() == 0) {
            list.add(aux);
            list.add(aux2[0]);
        }
    }
}
Now, this loop would take months to finish, so I thought of splitting the for loop across multiple threads, each one covering a window of the original 13-digit range.
My first question is: with an i7-3770 processor, how many threads should I use to make this as fast as possible?
Edit: the assignment is to optimize a problem that requires a lot of CPU. In this case, to find all divisors of a 23-digit number (the "k" in k.divideAndRemainder(aux)); that's why I'm using BigIntegers. The 13-digit number used by the for loop is the square root of k.

Java is fine for multithreading. BigInteger is not exactly performant, but that's not the problem.
On that processor, up to 8 threads can utilize the cores completely. I doubt you run a custom OS that allows you to cede full CPU control to an application, so you want to keep at least 1 logical core for the OS and associated services, as well as 1 for Java misc threads.
If your program takes months to complete on a single thread, then using 4 threads efficiently will make it take weeks to complete. While that's a big speedup, I don't think that's quite enough for you, is it?
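For reference, splitting the range into one window per thread could be sketched roughly as follows. This is only an illustration under simplifying assumptions: k is a long here for brevity (the 23-digit k from the question would need BigInteger), min must be at least 1, and the names WindowedSearch and findDivisors are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WindowedSearch {
    // Finds every i in [min, max) that divides k, using one window per worker thread.
    static List<Long> findDivisors(long k, long min, long max, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long span = (max - min + workers - 1) / workers; // window size, rounded up
            List<Future<List<Long>>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final long from = min + w * span;
                final long to = Math.min(max, from + span);
                parts.add(pool.submit(() -> {
                    List<Long> found = new ArrayList<>();
                    for (long i = from; i < to; i++) {
                        if (k % i == 0) {
                            found.add(i);
                        }
                    }
                    return found;
                }));
            }
            // Collect the windows in order so the result stays sorted.
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> part : parts) {
                try {
                    all.addAll(part.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

A reasonable starting value for workers is Runtime.getRuntime().availableProcessors(), possibly minus one or two as suggested above.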
This is a school assignment, as you've said yourself. Just like with Project Euler, there are ways to do it, and there are ways to do it right. The first ones take days of computing on powerful machines after 10 minutes programming, the latter ones take days of thinking and seconds of computing. Your problem is not the hardware or language, it's the algorithm.
Now for the actual stuff that you can do better. From obvious to less so.
BigInteger aux = new BigInteger(Long.toString(i)); Are you mad? Why would you convert a long to a String and then parse it back to a number?! This is stupid slow. Just use BigInteger.valueOf(long val).
BigInteger is slower than primitives. Why do you need it if your values (up to 13 digits, as you've said yourself) fit comfortably in a long? If the only reason is the divideAndRemainder method - write one yourself! It's going to be 2 lines of code...
You seem to be looking for divisors of a particular number. (The filtering of multiples of 3, 5, 7 and 11 makes me think you're looking for prime ones and will do some filtering later.) There are a number of speedups for such algorithms. The main one (and it's not clear whether you're using it) is to make the maximum checked number the square root of the target, rather than the number itself, half of it, or whatever arbitrary bound people make up. Other ones can be found in the wiki or more complex wiki
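Two of the suggestions above (BigInteger.valueOf instead of string parsing, and stopping at the square root) can be sketched together as below. This is a minimal illustration, not a tuned solution: it assumes k is positive, that its square root fits in a long, and it uses BigInteger.sqrt(), which needs Java 9 or later. The class name DivisorSketch is hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class DivisorSketch {
    // Returns all divisors of k (unsorted), testing candidates only up to sqrt(k).
    static List<BigInteger> divisors(BigInteger k) {
        List<BigInteger> result = new ArrayList<>();
        long limit = k.sqrt().longValueExact(); // floor of the square root (Java 9+)
        for (long i = 1; i <= limit; i++) {
            BigInteger d = BigInteger.valueOf(i); // no String round trip
            BigInteger[] qr = k.divideAndRemainder(d); // {quotient, remainder}
            if (qr[1].signum() == 0) {
                result.add(d);
                if (!qr[0].equals(d)) {
                    result.add(qr[0]); // the matching divisor above sqrt(k)
                }
            }
        }
        return result;
    }
}
```

Each divisor d at or below the square root yields its partner k / d for free, which is why nothing above the square root needs to be tested.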

Calculating How Many Balls in Bins Over Several Values Using Dynamic Programming [closed]

Regarding the classic problem of putting N identical balls into M distinct bins and printing all the combinations: What if you would want to extend the problem by printing all cases 0< M, N
The brute force method could be done something like this:
for (int i = 0; i < M; i++)
{
    for (int j = 0; j < N; j++)
    {
        PrintAllCombinations(j, i);
    }
}
Now if we study the output for the first couple of values of m and n, we see that the output of each previous iteration is a subset of the next. It seems to me that we can apply a dynamic programming algorithm to exploit this. However, because we still need to partition every n (for example n = 3 = 3+0, 2+1, 1+2), we still need to do a lot of redundant combination calculations. Any ideas for improvements?

Let S[i][j] be the number of combinations for i balls in j bins.
S[0][j] = 1 for all j since the only combination is to have all bins empty.
S[i][1] = 1 for all i since the only combination is to put all the balls in the one bin.
For every other i and j, S[i][j] = sum(x = 0 -> i, S[i-x][j-1]). That is, for every other position you can compute the number of combinations by assigning every possible number of balls x to the last bin and summing the number of combinations for the remaining i-x balls in the other j-1 bins.
If you want to print out the combinations you can replace the count with the actual combinations and append the value x when you take the internal combinations in the sum. That will take a lot of memory without a lot of gain in speed. Just do the recursion and repeat the computation since you're bound by the number of solutions anyway.
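The recurrence above can be written down directly as a table fill. The following is a small sketch of the counting part only (it does not print the combinations); the class name BallsInBins and method countTable are made up for this example, and it assumes maxBins is at least 1.

```java
final class BallsInBins {
    // s[i][j] = number of ways to put i identical balls into j distinct bins.
    // Column 0 is unused; assumes maxBins >= 1.
    static long[][] countTable(int maxBalls, int maxBins) {
        long[][] s = new long[maxBalls + 1][maxBins + 1];
        for (int j = 1; j <= maxBins; j++) {
            s[0][j] = 1; // no balls: all bins empty
        }
        for (int i = 0; i <= maxBalls; i++) {
            s[i][1] = 1; // one bin: all balls go into it
        }
        for (int j = 2; j <= maxBins; j++) {
            for (int i = 1; i <= maxBalls; i++) {
                long sum = 0;
                for (int x = 0; x <= i; x++) {
                    sum += s[i - x][j - 1]; // x balls in the last bin
                }
                s[i][j] = sum;
            }
        }
        return s;
    }
}
```

For example, countTable(3, 2)[3][2] is 4, matching the splits 3+0, 2+1, 1+2 and 0+3.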

Big Oh for (n log n) [closed]

I am currently studying basic algorithms for Big Oh. I was wondering if anyone can show me what the code for (n log n) in Java using Big Oh would be like or direct me to any SO page where one exists.
Since I am just a beginner, I can only imagine the code before I write it. So, theoretically (at least), it should contain one for loop where something runs n times. Then for the log n, we can use a while loop. So the for loop is executed n times and the while loop is executed log base 2 of n times. At least that is how I am imagining it in my head, but seeing the code would clear things up.

int n = 100;
for (int i = 0; i < n; i++) // this loop is executed n times, so O(n)
{
    for (int j = n; j > 0; j /= 2) // this loop is executed O(log n) times
    {
    }
}
Explanation:
The outer for loop should be clear; it is executed n times. Now to the inner loop. In the inner loop, you take n and always divide it by 2. So, you ask yourself: How many times can I divide n by 2?
It turns out that this is O(log n). In fact, the base of the log is 2, but in Big-O notation we drop the base, since changing it only multiplies the log by a constant factor that we are not interested in.
So, you are executing a loop n times, and within that loop, you are executing another loop log(n) times. So, you have O(n) * O(log n) = O(n log n).
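To check this empirically, one can count how often the inner loop body runs. The sketch below is the same pair of loops with a counter added; the class name LoopCount is just a label for this example.

```java
final class LoopCount {
    // Counts how many times the inner loop body runs for a given n.
    static long innerIterations(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {          // n iterations
            for (int j = n; j > 0; j /= 2) {   // floor(log2(n)) + 1 iterations
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 8, 100, 1024 }) {
            System.out.println("n=" + n + " -> " + innerIterations(n));
        }
    }
}
```

For n = 8 the inner loop visits 8, 4, 2, 1, so the total is 8 * 4 = 32; in general the count is n * (floor(log2 n) + 1), which is O(n log n).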

A very popular O(n log n) algorithm is merge sort. See http://en.wikipedia.org/wiki/Merge_sort for an example of the algorithm and pseudocode. The log n part of the algorithm comes from breaking the problem down into smaller subproblems, so that the height of the recursion tree is log n.
A lot of sorting algorithms have a running time of O(n log n). Refer to http://en.wikipedia.org/wiki/Sorting_algorithm for more examples.

Algorithms with a O(.) time complexity involving log n's typically involve some form of divide and conquer.
For example, in MergeSort the list is halved, each part is individually merge-sorted and then the two halves are merged together. Each sublist is halved again in turn.
Whenever you have work being halved or reduced in size by some fixed factor, you'll usually end up with a log n component of the O(.).
In terms of code, take a look at the algorithm for MergeSort. The important feature of typical implementations is that it is recursive (note that TopDownSplitMerge calls itself twice in the code given on Wikipedia).
All good standard comparison-based sorting algorithms have O(n log n) time complexity, and it's not possible to do better in the worst case when sorting by comparisons; see Comparison Sort.
To see what this looks like in Java code, just search! Here's one example.
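As one possible illustration, here is a compact top-down merge sort. It is a teaching sketch rather than the Wikipedia code referred to above: it copies the two halves into new arrays on every call, which keeps it short at the cost of extra allocation. The class name MergeSortSketch is hypothetical.

```java
import java.util.Arrays;

final class MergeSortSketch {
    static void mergeSort(int[] a) {
        if (a.length < 2) {
            return; // zero or one element is already sorted
        }
        int mid = a.length / 2;
        int[] left = Arrays.copyOfRange(a, 0, mid);
        int[] right = Arrays.copyOfRange(a, mid, a.length);
        mergeSort(left);  // first recursive call
        mergeSort(right); // second recursive call
        // Merge the two sorted halves back into a: O(n) work per level.
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) {
            a[k++] = left[i++];
        }
        while (j < right.length) {
            a[k++] = right[j++];
        }
    }
}
```

The halving gives about log n levels of recursion and each level does O(n) merging work, hence O(n log n) overall.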

http://en.wikipedia.org/wiki/Heapsort
A simple example is just like you described: execute n times some operation that takes log(n) time.
Balanced binary trees have log(n) height, so some tree algorithms will have such complexity.

How would I hash a rectangle? [closed]

I have a Rectangle class. It has a length, breadth and area (all ints). I want to hash it such that every rectangle with the same length and breadth hashes to the same value. What is a way to do this?
EDIT: I understand it's a broad question. That's why I asked for "a" way to do it, not the best way.

A good and simple scheme is to calculate the hash for a pair of integers as follows:
hash = length * CONSTANT + width
Empirically, you will get the best results (i.e. the fewest collisions) if CONSTANT is a prime number. A lot of people [1] recommend a value like 31, but the best choice depends on the most likely range of the length and width values. If they are strictly bounded, and small enough, then you could do better than 31.
However, 31 is probably good enough for practical purposes [2]. A few collisions at this level are unlikely to make a significant performance difference, and even a perfect hashing function does not eliminate collisions at the hash table level, where you use the modulus of the hash value.
[1] I'm not sure where this number comes from, or whether there are empirical studies to back it up in the general case. I suspect it comes from hashing of (ASCII) strings. But 31 is prime, and it is a Mersenne prime (2^5 - 1), which means it could be computed using a shift and a subtraction if hardware multiplication is slow.
[2] I'm excluding cases where you need to worry about someone deliberately creating hash function collisions in an attempt to "break" something.
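Applied to the Rectangle class from the question, the scheme could look like the sketch below. The field names follow the question (length and breadth); the constructor and the equals method are assumptions added so that the class works in a HashSet or HashMap, since hashCode should always be consistent with equals. The area is derived, so it is left out of the hash.

```java
final class Rectangle {
    private final int length;
    private final int breadth;

    Rectangle(int length, int breadth) {
        this.length = length;
        this.breadth = breadth;
    }

    int area() {
        return length * breadth; // derived from the other two fields
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Rectangle)) {
            return false;
        }
        Rectangle r = (Rectangle) other;
        return length == r.length && breadth == r.breadth;
    }

    @Override
    public int hashCode() {
        // 31 * length equals (length << 5) - length, the shift-and-subtract form
        return 31 * length + breadth;
    }
}
```

Two rectangles with the same length and breadth then always hash to the same value, which is what the question asks for.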

You can use the Apache Commons Lang library, which has a HashCodeBuilder class. Assuming you have a Rectangle class with a width and a height, you can add the following method:
@Override
public int hashCode() {
    return new HashCodeBuilder().append(width).append(height).toHashCode();
}

What you want (as clarified in your comment on the question) is not possible. There are N possible hashCodes, one for each int, where N is approximately 4.3 billion (2^32). Assuming rectangles must have positive dimensions, there are ((N * N) / 4) possible rectangles. How do you propose to make them fit into N hashCodes? When N is > 4, you have more possible rectangles than hashCodes.
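A concrete collision shows the same point. Using the length * 31 + breadth scheme from the earlier answer as an assumed hash function (the class name HashCollisionDemo is made up for this example), two different rectangles can share a hash value:

```java
final class HashCollisionDemo {
    // Assumed hash scheme: the multiply-by-31 formula discussed above.
    static int hash(int length, int breadth) {
        return 31 * length + breadth;
    }

    public static void main(String[] args) {
        // 31 * 1 + 32 and 31 * 2 + 1 are both 63.
        System.out.println(hash(1, 32));
        System.out.println(hash(2, 1));
    }
}
```

This is harmless for hash tables, which fall back to equals when hash values match; it only rules out using the hash as a unique identifier.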
