Java - Threaded Radix Sort

Java - Threaded Radix Sort - java

I have been working on different variations of the Radix Sort. At first I used chaining, which was really slow. Then I moved onto using a count sort while using val % (10 * pass), and most recently turning it into the respective bytes and count sorting those, which allows me to sort by negative values also.
I wanted to try it with multithreading, and can only get it to work about half the time. I was wondering if someone can help look at my code, and see where I'm going wrong with the threading. I have each thread count sort each byte. Thanks:
public class radixSort {
public int[] array;
public int arraySize, arrayRange;
public radixSort (int[] array, int size, int range) {
this.array = array;
this.arraySize = size;
this.arrayRange = range;
}
public int[] RadixSort() {
Thread[] threads = new Thread[4];
for (int i=0;i<4;i++)
threads[i] = new Thread(new Radix(arraySize, i));
for (int i=0;i<4;i++)
threads[i].start();
for (int i=0;i<4;i++)
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
return array;
}
class Radix implements Runnable {
private int pass, size;
private int[] tempArray, freqArray;
public Radix(int size, int pass) {
this.pass = pass;
this.size = size;
this.tempArray = new int[size];
this.freqArray = new int[256];
}
public void run() {
int temp, i, j;
synchronized(array) {
for (i=0;i<size;i++) {
if (array[i] <= 0) temp = array[i] ^ 0x80000000;
else temp = array[i] ^ ((array[i] >> 31) | 0x80000000);
j = temp >> (pass << 3) & 0xFF;
freqArray[j]++;
}
for (i=1;i<256;i++)
freqArray[i] += freqArray[i-1];
for (i=size-1;i>=0;i--) {
if (array[i] <= 0) temp = array[i] ^ 0x80000000;
else temp = array[i] ^ ((array[i] >> 31) | 0x80000000);
j = temp >> (pass << 3) & 0xFF;
tempArray[--freqArray[j]] = array[i];
}
for (i=0;i<size;i++)
array[i] = tempArray[i];
}
}
}
}

There is a basic problem with this approach. To get a benefit from multithreading, you need to give each thread a non-overlapping task compared to the other treads. By synchonizing on the array you have made it so only one thread does work at a time, meaning you get all the overhead of threads with none of the benefit.
Think of ways to partition the task so that threads work in parallel. For example, after the first pass, all the item with a 1 high bit will be in one part of the array, and those with a zero high-bit will be in the other. You could have one thread work on each part of the array without synchronizing.
Note that your runnable has to completely change so that it does one pass at a specified subset of the array then spawns threads for the next pass.

Besides wrong class and method names (class should start with capital letter, method shouldn't), I can see that you are synchronizing all thread works on the array. So it's in fact not parallel at all.

I am pretty sure that you can't really parallelize RadixSort, at least in the way you are trying to. Someone pointed out that you can do it by divide-and-conquer, as you first order by the highest bits, but in fact, RadixSort works by comparing the lower-order bits first, so you can't really divide-and-conquer. The array can basically be completely permuted after each pass.
Guys, prove me wrong, but i think it's inherently impossible to parallelize this algorithm like you try to. Maybe you can parallelize the (count) sorting that is done inside of each pass, but be aware that ++ is not an atomic operation.

Related

Java perfomance issue with Arrays.sort [duplicate]

This question already has answers here:
Why is processing a sorted array *slower* than an unsorted array? (Java's ArrayList.indexOf)
(3 answers)
Closed 9 months ago.
I've been solving one algorithmic problem and found solution, as I thought. But unexpectedly I bumped into a weird problem.
Let's assume i have the following code on java 8/17(replicates on both), intel 11th gen processor:
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
public class DistanceYandex{
static class Elem implements Comparable<Elem>{
int value;
int index;
long dist;
public Elem(int value, int index){
this.value = value;
this.index = index;
}
#Override
public int compareTo(Elem o){
return Integer.compare(value, o.value);
}
}
public static void main(String[] args){
int n = 300_000;
int k = 3_000;
Elem[] elems = new Elem[n];
for(int i = 0; i < n; i++){
elems[i] = new Elem(ThreadLocalRandom.current().nextInt(), i);
}
solve(n, k, elems);
}
private static void solve(int n, int k, Elem[] elems){
Arrays.sort(elems); // interesting line
long time = System.nanoTime();
for(int i = 0; i < n; i++){
elems[i].dist = findDistForIth(elems, i, k);
}
// i omit output, because it's irrelevant
// Arrays.sort(elems, Comparator.comparingInt(elem -> elem.index));
// System.out.print(elems[0].dist);
// for(int i = 1; i < n; i++){
// System.out.print(" " + elems[i].dist);
// }
System.out.println((System.nanoTime() - time)/1_000_000_000.0);
}
private static long findDistForIth(Elem[] elems, int i, int k){
int midElem = elems[i].value;
int left = i - 1;
int right = i + 1;
long dist = 0;
for(int j = 0; j < k; j++){
if(left < 0){
dist += elems[right++].value - midElem;
}else if(right >= elems.length){
dist += midElem - elems[left--].value;
}else{
int leftAdd = midElem - elems[left].value;
int rightAdd = elems[right].value - midElem;
if(leftAdd < rightAdd){
dist+=leftAdd;
left--;
}else{
dist+=rightAdd;
right++;
}
}
}
return dist;
}
}
Point your eyes at solve function.
Here we have simple solution, that calls function findDistForIth n times and measures time it takes(I don't use JMH, because testing system for my problem uses simple one-time time measures). And before it captures start time, it sorts the array by natural order using built-in Arrays.sort function.
As you could notice, measured time doesn't include the time the array gets sorted. Also function findDistForIth's behaviour does not depend on whether input array is sorted or not(it mostly goes to third else branch). But if I comment out line with Arrays.sort I get significantly faster execution: instead of roughly 7.3 seconds, it takes roughly 1.6 seconds. More that 4 times faster!
I don't understand what's going on.
I thought maybe it is gc that's messing up here, I tried to increase memory I give to jvm to 2gb(-Xmx2048M -Xms2048M). Didn't help.
I tried to pass explicit comparator to Arrays.sort as second argument(Comparator.comparingInt(e -> e.value)) and deimplementing Comparable interface on Elem class. Didn't help.
I launched the profiler(Intellij Profiler)
With Arrays.sort included:
With Arrays.sort excluded:
But it didn't give me much information...
I tried building it directly to .jar and launching via java cmd(before i did it via intellij). It also didn't help.
Do anybody know what's goind on?
This problem also replicates in online compiler: https://onlinegdb.com/MPyNIknB8T

May be you need to sort your data using red black tree sorting algo which implemented in SortedSet, Arrays.sort use mergesort sorting algo which works well for small number of data

how to improve this code?

I have developed a code for expressing the number in terms of the power of the 2 and I am attaching the same code below.
But the problem is that the expressed output should of minimum length.
I am getting output as 3^2+1^2+1^2+1^2 which is not minimum length.
I need to output in this format:
package com.algo;
import java.util.Scanner;
public class GetInputFromUser {
public static void main(String[] args) {
// TODO Auto-generated method stub
int n;
Scanner in = new Scanner(System.in);
System.out.println("Enter an integer");
n = in.nextInt();
System.out.println("The result is:");
algofunction(n);
}
public static int algofunction(int n1)
{
int r1 = 0;
int r2 = 0;
int r3 = 0;
//System.out.println("n1: "+n1);
r1 = (int) Math.sqrt(n1);
r2 = (int) Math.pow(r1, 2);
// System.out.println("r1: "+r1);
//System.out.println("r2: "+r2);
System.out.print(r1+"^2");
r3 = n1-r2;
//System.out.println("r3: "+r3);
if (r3 == 0)
return 1;
if(r3 == 1)
{
System.out.print("+1^2");
return 1;
}
else {
System.out.print("+");
algofunction(r3);
return 1;
}
}
}

Dynamic programming is all about defining the problem in such a way that if you knew the answer to a smaller version of the original, you could use that to answer the main problem more quickly/directly. It's like applied mathematical induction.
In your particular problem, we can define MinLen(n) as the minimum length representation of n. Next, say, since we want to solve MinLen(12), suppose we already knew the answer to MinLen(1), MinLen(2), MinLen(3), ..., MinLen(11). How could we use the answer to those smaller problems to figure out MinLen(12)? This is the other half of dynamic programming - figuring out how to use the smaller problems to solve the bigger one. It doesn't help you if you come up with some smaller problem, but have no way of combining them back together.
For this problem, we can make the simple statement, "For 12, it's minimum length representation DEFINITELY has either 1^2, 2^2, or 3^2 in it." And in general, the minimum length representation of n will have some square less than or equal to n as a part of it. There is probably a better statement you can make, which would improve the runtime, but I'll say that it is good enough for now.
This statement means that MinLen(12) = 1^2 + MinLen(11), OR 2^2 + MinLen(8), OR 3^2 + MinLen(3). You check all of them and select the best one, and now you save that as MinLen(12). Now, if you want to solve MinLen(13), you can do that too.
Advice when solo:
The way I would test this kind of program myself is to plug in 1, 2, 3, 4, 5, etc, and see the first time it goes wrong. Additionally, any assumptions I happen to have thought were a good idea, I question: "Is it really true that the largest square number less than n will be in the representation of MinLen(n)?"
Your code:
r1 = (int) Math.sqrt(n1);
r2 = (int) Math.pow(r1, 2);
embodies that assumption (a greedy assumption), but it is wrong, as you've clearly seen with the answer for MinLen(12).
Instead you want something more like this:
public ArrayList<Integer> minLen(int n)
{
// base case of recursion
if (n == 0)
return new ArrayList<Integer>();
ArrayList<Integer> best = null;
int bestInt = -1;
for (int i = 1; i*i <= n; ++i)
{
// Check what happens if we use i^2 as part of our representation
ArrayList<Integer> guess = minLen(n - i*i);
// If we haven't selected a 'best' yet (best == null)
// or if our new guess is better than the current choice (guess.size() < best.size())
// update our choice of best
if (best == null || guess.size() < best.size())
{
best = guess;
bestInt = i;
}
}
best.add(bestInt);
return best;
}
Then, once you have your list, you can sort it (no guarantees that it came in sorted order), and print it out the way you want.
Lastly, you may notice that for larger values of n (1000 may be too large) that you plug in to the above recursion, it will start going very slow. This is because we are constantly recalculating all the small subproblems - for example, we figure out MinLen(3) when we call MinLen(4), because 4 - 1^2 = 3. But we figure it out twice for MinLen(7) -> 3 = 7 - 2^2, but 3 also is 7 - 1^2 - 1^2 - 1^2 - 1^2. And it gets much worse the larger you go.
The solution to this, which lets you solve up to n = 1,000,000 or more, very quickly, is to use a technique called Memoization. This means that once we figure out MinLen(3), we save it somewhere, let's say a global location to make it easy. Then, whenever we would try to recalculate it, we check the global cache first to see if we already did it. If so, then we just use that, instead of redoing all the work.
import java.util.*;
class SquareRepresentation
{
private static HashMap<Integer, ArrayList<Integer>> cachedSolutions;
public static void main(String[] args)
{
cachedSolutions = new HashMap<Integer, ArrayList<Integer>>();
for (int j = 100000; j < 100001; ++j)
{
ArrayList<Integer> answer = minLen(j);
Collections.sort(answer);
Collections.reverse(answer);
for (int i = 0; i < answer.size(); ++i)
{
if (i != 0)
System.out.printf("+");
System.out.printf("%d^2", answer.get(i));
}
System.out.println();
}
}
public static ArrayList<Integer> minLen(int n)
{
// base case of recursion
if (n == 0)
return new ArrayList<Integer>();
// new base case: problem already solved once before
if (cachedSolutions.containsKey(n))
{
// It is a bit tricky though, because we need to be careful!
// See how below that we are modifying the 'guess' array we get in?
// That means we would modify our previous solutions! No good!
// So here we need to return a copy
ArrayList<Integer> ans = cachedSolutions.get(n);
ArrayList<Integer> copy = new ArrayList<Integer>();
for (int i: ans) copy.add(i);
return copy;
}
ArrayList<Integer> best = null;
int bestInt = -1;
// THIS IS WRONG, can you figure out why it doesn't work?:
// for (int i = 1; i*i <= n; ++i)
for (int i = (int)Math.sqrt(n); i >= 1; --i)
{
// Check what happens if we use i^2 as part of our representation
ArrayList<Integer> guess = minLen(n - i*i);
// If we haven't selected a 'best' yet (best == null)
// or if our new guess is better than the current choice (guess.size() < best.size())
// update our choice of best
if (best == null || guess.size() < best.size())
{
best = guess;
bestInt = i;
}
}
best.add(bestInt);
// check... not needed unless you coded wrong
int sum = 0;
for (int i = 0; i < best.size(); ++i)
{
sum += best.get(i) * best.get(i);
}
if (sum != n)
{
throw new RuntimeException(String.format("n = %d, sum=%d, arr=%s\n", n, sum, best));
}
// New step: Save the solution to the global cache
cachedSolutions.put(n, best);
// Same deal as before... if you don't return a copy, you end up modifying your previous solutions
//
ArrayList<Integer> copy = new ArrayList<Integer>();
for (int i: best) copy.add(i);
return copy;
}
}
It took my program around ~5s to run for n = 100,000. Clearly there is more to be done if we want it to be faster, and to solve for larger n. The main issue now is that in storing the entire list of results of previous answers, we use up a lot of memory. And all of that copying! There is more you can do, like storing only an integer and a pointer to the subproblem, but I'll let you do that.
And by the by, 1000 = 30^2 + 10^2.

"Last 100 bytes" Interview Scenario

I got this question in an interview the other day and would like to know some best possible answers(I did not answer very well haha):
Scenario: There is a webpage that is monitoring the bytes sent over a some network. Every time a byte is sent the recordByte() function is called passing that byte, this could happen hundred of thousands of times per day. There is a button on this page that when pressed displays the last 100 bytes passed to recordByte() on screen (it does this by calling the print method below).
The following code is what I was given and asked to fill out:
public class networkTraffic {
public void recordByte(Byte b){
}
public String print() {
}
}
What is the best way to store the 100 bytes? A list? Curious how best to do this.

Something like this (circular buffer) :
byte[] buffer = new byte[100];
int index = 0;
public void recordByte(Byte b) {
index = (index + 1) % 100;
buffer[index] = b;
}
public void print() {
for(int i = index; i < index + 100; i++) {
System.out.print(buffer[i % 100]);
}
}
The benefits of using a circular buffer:
You can reserve the space statically. In a real-time network application (VoIP, streaming,..)this is often done because you don't need to store all data of a transmission, but only a window containing the new bytes to be processed.
It's fast: can be implemented with an array with read and write cost of O(1).

I don't know java, but there must be a queue concept whereby you would enqueue bytes until the number of items in the queue reached 100, at which point you would dequeue one byte and then enqueue another.
public void recordByte(Byte b)
{
if (queue.ItemCount >= 100)
{
queue.dequeue();
}
queue.enqueue(b);
}
You could print by peeking at the items:
public String print()
{
foreach (Byte b in queue)
{
print("X", b); // some hexadecimal print function
}
}

Circular Buffer using array:
Array of 100 bytes
Keep track of where the head index is i
For recordByte() put the current byte in A[i] and i = i+1 % 100;
For print(), return subarray(i+1, 100) concatenate with subarray(0, i)
Queue using linked list (or the java Queue):
For recordByte() add new byte to the end
If the new length to be more than 100, remove the first element
For print() simply print the list

Here is my code. It might look a bit obscure, but I am pretty sure this is the fastest way to do it (at least it would be in C++, not so sure about Java):
public class networkTraffic {
public networkTraffic() {
_ary = new byte[100];
_idx = _ary.length;
}
public void recordByte(Byte b){
_ary[--_idx] = b;
if (_idx == 0) {
_idx = _ary.length;
}
}
private int _idx;
private byte[] _ary;
}
Some points to note:
No data is allocated/deallocated when calling recordByte().
I did not use %, because it is slower than a direct comparison and using the if (branch prediction might help here too)
--_idx is faster than _idx-- because no temporary variable is involved.
I count backwards to 0, because then I do not have to get _ary.length each time in the call, but only every 100 times when the first entry is reached. Maybe this is not necessary, the compiler could take care of it.
if there were less than 100 calls to recordByte(), the rest is zeroes.

Easiest thing is to shove it in an array. The max size that the array can accommodate is 100 bytes. Keep adding bytes as they are streaming off the web. After the first 100 bytes are in the array, when the 101st byte comes, remove the byte at the head (i.e. 0th). Keep doing this. This is basically a queue. FIFO concept. Ater the download is done, you are left with the last 100 bytes.
Not just after the download but at any given point in time, this array will have the last 100 bytes.
#Yottagray Not getting where the problem is? There seems to be a number of generic approaches (array, circular array etc) & a number of language specific approaches (byteArray etc). Am I missing something?

Multithreaded solution with non-blocking I/O:
private static final int N = 100;
private volatile byte[] buffer1 = new byte[N];
private volatile byte[] buffer2 = new byte[N];
private volatile int index = -1;
private volatile int tag;
synchronized public void recordByte(byte b) {
index++;
if (index == N * 2) {
//both buffers are full
buffer1 = buffer2;
buffer2 = new byte[N];
index = N;
}
if (index < N) {
buffer1[index] = b;
} else {
buffer2[index - N] = b;
}
}
public void print() {
byte[] localBuffer1, localBuffer2;
int localIndex, localTag;
synchronized (this) {
localBuffer1 = buffer1;
localBuffer2 = buffer2;
localIndex = index;
localTag = tag++;
}
int buffer1Start = localIndex - N >= 0 ? localIndex - N + 1 : 0;
int buffer1End = localIndex < N ? localIndex : N - 1;
printSlice(localBuffer1, buffer1Start, buffer1End, localTag);
if (localIndex >= N) {
printSlice(localBuffer2, 0, localIndex - N, localTag);
}
}
private void printSlice(byte[] buffer, int start, int end, int tag) {
for(int i = start; i <= end; i++) {
System.out.println(tag + ": "+ buffer[i]);
}
}

Just for the heck of it. How about using an ArrayList<Byte>? Say why not?
public class networkTraffic {
static ArrayList<Byte> networkMonitor; // ArrayList<Byte> reference
static { networkMonitor = new ArrayList<Byte>(100); } // Static Initialization Block
public void recordByte(Byte b){
networkMonitor.add(b);
while(networkMonitor.size() > 100){
networkMonitor.remove(0);
}
}
public void print() {
for (int i = 0; i < networkMonitor.size(); i++) {
System.out.println(networkMonitor.get(i));
}
// if(networkMonitor.size() < 100){
// for(int i = networkMonitor.size(); i < 100; i++){
// System.out.println("Emtpy byte");
// }
// }
}
}

Exact algorithm for task scheduling on N identical processors?

I'm looking for exact algorithm which find the best solution on task schedule in N identical processors.
The time of this algorithm is not important, the most important is one best solution (miminum time of all processors when the last task will be finished).
In theory equation describing this algorithm is as follows: P||Cmax
If anyone has an algorithm (especially in Java) or pseudocode I will be grathefull for help.
I tried write my own exact algoritm but id doesn't work :(. In the code below the permUtil is a class which corresponds for permutations.
Method args:
- tasks --> all task where index identity the task and value time
- op --> assignment processor (processor which assign a tasks)
// we have a global array op processors proc where index is identity and value is task schedule time on this processor
public void schedule(Byte[] tasks, int op)
{
PermUtil<Byte> permA = new PermUtil<Byte>(tasks);
Byte[] a;
// permutation of all tasks
while ((a = permA.next()) != null)
{
// assign tasks
for(int i=1; i< a.length; i++)
{
// get the b set from i to end
Byte[] b = Arrays.copyOfRange(a, i, a.length);
// all permutations off b set
PermUtil<Byte> permB = new PermUtil<Byte>(b);
while ((b = permB.next()) != null)
{
// task on assign processor
proc[op] = sum(Arrays.copyOfRange(a, 0, i));
if (op < proc.length)
schedule(b, ++op);
else
{
proc[++op] = sum(b);
}
}
}
}
}

Here's a blueprint of iterating over all the possible assignments.
In a real implementation you should replace long with BigInteger,
and move the array initialization outside the inner loop.
public void processOne(int nProcs, int nTasks, int[] assignment) {
/* ... */
}
public void checkAll(int nProcs, int nTasks) {
long count = power(nProcs, nTasks);
/* Iterate over all the possible assignments */
for (long j = 0; j < count; j++) {
int[] assignment = new int[nTasks];
for (int i = 0; i < nTasks; i++)
assignment[i] = (int) (j / power(nProcs, i) % nProcs);
processOne(nProcs, nTasks, assignment);
}
}
The trick is to encode an assignment in a number. Since an assignment represents nTasks decisions, each with nProcs outcomes, it can be thought of as a number in base nProcs having nTasks digits. Every such number corresponds to a valid assignment and every assignment has a unique number in such range. It's easy to iterate over all the assignments, since we're basically iterating over a range of integers.
All you have to do is to fill in the processOne(int, int, int[]) function, which should be rather straightforward.

Improving a prime sieve algorithm

I'm trying to make a decent Java program that generates the primes from 1 to N (mainly for Project Euler problems).
At the moment, my algorithm is as follows:
Initialise an array of booleans (or a bitarray if N is sufficiently large) so they're all false, and an array of ints to store the primes found.
Set an integer, s equal to the lowest prime, (ie 2)
While s is <= sqrt(N)
Set all multiples of s (starting at s^2) to true in the array/bitarray.
Find the next smallest index in the array/bitarray which is false, use that as the new value of s.
Endwhile.
Go through the array/bitarray, and for every value that is false, put the corresponding index in the primes array.
Now, I've tried skipping over numbers not of the form 6k + 1 or 6k + 5, but that only gives me a ~2x speed up, whilst I've seen programs run orders of magnitudes faster than mine (albeit with very convoluted code), such as the one here
What can I do to improve?
Edit: Okay, here's my actual code (for N of 1E7):
int l = 10000000, n = 2, sqrt = (int) Math.sqrt(l);
boolean[] nums = new boolean[l + 1];
int[] primes = new int[664579];
while(n <= sqrt){
for(int i = 2 * n; i <= l; nums[i] = true, i += n);
for(n++; nums[n]; n++);
}
for(int i = 2, k = 0; i < nums.length; i++) if(!nums[i]) primes[k++] = i;
Runs in about 350ms on my 2.0GHz machine.

While s is <= sqrt(N)
One mistake people often do in such algorithms is not precomputing square root.
while (s <= sqrt(N)) {
is much, much slower than
int limit = sqrt(N);
while (s <= limit) {
But generally speaking, Eiko is right in his comment. If you want people to offer low-level optimisations, you have to provide code.
update Ok, now about your code.
You may notice that number of iterations in your code is just little bigger than 'l'. (you may put counter inside first 'for' loop, it will be just 2-3 times bigger) And, obviously, complexity of your solution can't be less then O(l) (you can't have less than 'l' iterations).
What can make real difference is accessing memory effectively. Note that guy who wrote that article tries to reduce storage size not just because he's memory-greedy. Making compact arrays allows you to employ cache better and thus increase speed.
I just replaced boolean[] with int[] and achieved immediate x2 speed gain. (and 8x memory) And I didn't even try to do it efficiently.
update2
That's easy. You just replace every assignment a[i] = true with a[i/32] |= 1 << (i%32) and each read operation a[i] with (a[i/32] & (1 << (i%32))) != 0. And boolean[] a with int[] a, obviously.
From the first replacement it should be clear how it works: if f(i) is true, then there's a bit 1 in an integer number a[i/32], at position i%32 (int in Java has exactly 32 bits, as you know).
You can go further and replace i/32 with i >> 5, i%32 with i&31. You can also precompute all 1 << j for each j between 0 and 31 in array.
But sadly, I don't think in Java you could get close to C in this. Not to mention, that guy uses many other tricky optimizations and I agree that his could would've been worth a lot more if he made comments.

Using the BitSet will use less memory. The Sieve algorithm is rather trivial, so you can simply "set" the bit positions on the BitSet, and then iterate to determine the primes.

Did you also make the array smaller while skipping numbers not of the form 6k+1 and 6k+5?
I only tested with ignoring numbers of the form 2k and that gave me ~4x speed up (440 ms -> 120 ms):
int l = 10000000, n = 1, sqrt = (int) Math.sqrt(l);
int m = l/2;
boolean[] nums = new boolean[m + 1];
int[] primes = new int[664579];
int i, k;
while (n <= sqrt) {
int x = (n<<1)+1;
for (i = n+x; i <= m; nums[i] = true, i+=x);
for (n++; nums[n]; n++);
}
primes[0] = 2;
for (i = 1, k = 1; i < nums.length; i++) {
if (!nums[i])
primes[k++] = (i<<1)+1;
}

The following is from my Project Euler Library...Its a slight Variation of the Sieve of Eratosthenes...I'm not sure, but i think its called the Euler Sieve.
1) It uses a BitSet (so 1/8th the memory)
2) Only uses the bitset for Odd Numbers...(another 1/2th hence 1/16th)
Note: The Inner loop (for multiples) begins at "n*n" rather than "2*n" and also multiples of increment "2*n" are only crossed off....hence the speed up.
private void beginSieve(int mLimit)
{
primeList = new BitSet(mLimit>>1);
primeList.set(0,primeList.size(),true);
int sqroot = (int) Math.sqrt(mLimit);
primeList.clear(0);
for(int num = 3; num <= sqroot; num+=2)
{
if( primeList.get(num >> 1) )
{
int inc = num << 1;
for(int factor = num * num; factor < mLimit; factor += inc)
{
//if( ((factor) & 1) == 1)
//{
primeList.clear(factor >> 1);
//}
}
}
}
}
and here's the function to check if a number is prime...
public boolean isPrime(int num)
{
if( num < maxLimit)
{
if( (num & 1) == 0)
return ( num == 2);
else
return primeList.get(num>>1);
}
return false;
}

You could do the step of "putting the corresponding index in the primes array" while you are detecting them, taking out a run through the array, but that's about all I can think of right now.

I wrote a simple sieve implementation recently for the fun of it using BitSet (everyone says not to, but it's the best off the shelf way to store huge data efficiently). The performance seems to be pretty good to me, but I'm still working on improving it.
public class HelloWorld {
private static int LIMIT = 2140000000;//Integer.MAX_VALUE broke things.
private static BitSet marked;
public static void main(String[] args) {
long startTime = System.nanoTime();
init();
sieve();
long estimatedTime = System.nanoTime() - startTime;
System.out.println((float)estimatedTime/1000000000); //23.835363 seconds
System.out.println(marked.size()); //1070000000 ~= 127MB
}
private static void init()
{
double size = LIMIT * 0.5 - 1;
marked = new BitSet();
marked.set(0,(int)size, true);
}
private static void sieve()
{
int i = 0;
int cur = 0;
int add = 0;
int pos = 0;
while(((i<<1)+1)*((i<<1)+1) < LIMIT)
{
pos = i;
if(marked.get(pos++))
{
cur = pos;
add = (cur<<1);
pos += add*cur + cur - 1;
while(pos < marked.length() && pos > 0)
{
marked.clear(pos++);
pos += add;
}
}
i++;
}
}
private static void readPrimes()
{
int pos = 0;
while(pos < marked.length())
{
if(marked.get(pos++))
{
System.out.print((pos<<1)+1);
System.out.print("-");
}
}
}
}
With smaller LIMITs (say 10,000,000 which took 0.077479s) we get much faster results than the OP.

I bet java's performance is terrible when dealing with bits...
Algorithmically, the link you point out should be sufficient

Have you tried googling, e.g. for "java prime numbers". I did and dug up this simple improvement:
http://www.anyexample.com/programming/java/java_prime_number_check_%28primality_test%29.xml
Surely, you can find more at google.

Here is my code for Sieve of Erastothenes and this is actually the most efficient that I could do:
final int MAX = 1000000;
int p[]= new int[MAX];
p[0]=p[1]=1;
int prime[] = new int[MAX/10];
prime[0]=2;
void sieve()
{
int i,j,k=1;
for(i=3;i*i<=MAX;i+=2)
{
if(p[i])
continue;
for(j=i*i;j<MAX;j+=2*i)
p[j]=1;
}
for(i=3;i<MAX;i+=2)
{
if(p[i]==0)
prime[k++]=i;
}
return;
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java - Threaded Radix Sort - java

Besides wrong class and method names (class should start with capital letter, method shouldn't), I can see that you are synchronizing all thread works on the array. So it's in fact not parallel at all.

Related

Java perfomance issue with Arrays.sort [duplicate]

how to improve this code?

"Last 100 bytes" Interview Scenario

Exact algorithm for task scheduling on N identical processors?

Improving a prime sieve algorithm

Categories

Resources