I have the code below, and I want to know its time complexity when I use a PriorityQueue.
I have a listOfNumbers of size N:
Queue<Integer> q = new PriorityQueue<>();
q.addAll(listOfNumbers);
while (q.size() > 1) {
    q.add(q.poll() + q.poll()); // add the sum of the 2 smallest elements back to the queue
}
According to this post, "Time Complexity of Java PriorityQueue (heap) insertion of n elements?":
O(log n) time for the enqueuing and dequeuing methods (offer, poll, remove() and add)
Now, how do I calculate the time when I keep adding elements back into the queue?
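The running time of your program is, up to a constant factor, log(n) + log(n-1) + ... + log(1): each loop iteration does two polls and one add on a queue holding k elements, for k = n, n-1, ..., 2. (The initial addAll costs at most O(n log n), so it does not change the result.)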
This is log(n!) (by repeated application of the log rule log(a) + log(b) = log(ab)).
log(n!) is Theta(n log n); see "Is log(n!) = Θ(n·log(n))?".
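For completeness, here is the standard argument for that bound (not copied from the linked question): the upper bound replaces every term by log n, and the lower bound keeps only the upper half of the terms, each of which is at least log(n/2).

```latex
\begin{aligned}
\log(n!) &= \sum_{k=1}^{n} \log k \;\le\; \sum_{k=1}^{n} \log n \;=\; n \log n, \\
\log(n!) &\ge \sum_{k=\lceil n/2 \rceil}^{n} \log k \;\ge\; \frac{n}{2}\log\frac{n}{2} \;\ge\; \frac{n}{4}\log n \qquad (n \ge 4).
\end{aligned}
```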
So your program runs in Theta(n log n) time.
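A minimal sketch that wraps the question's loop in a runnable method. The class name MergeCost and the iterations counter are illustrative additions, not part of the original code. It confirms that the loop body runs exactly N - 1 times, each time on a heap of size k = N, N-1, ..., 2, which is where the log terms in the sum come from:

```java
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class MergeCost {
    // Counts loop iterations; each iteration does two poll() calls and one add().
    static int iterations;

    // Assumes listOfNumbers is non-empty (otherwise peek() returns null).
    static int combineAll(List<Integer> listOfNumbers) {
        iterations = 0;
        Queue<Integer> q = new PriorityQueue<>();
        q.addAll(listOfNumbers);             // N inserts, O(log N) each
        while (q.size() > 1) {
            // the heap holds k elements here, for k = N, N-1, ..., 2
            q.add(q.poll() + q.poll());      // O(log k) for this iteration
            iterations++;
        }
        return q.peek();
    }

    public static void main(String[] args) {
        int total = combineAll(List.of(5, 1, 4, 2, 3));
        System.out.println(total + " after " + iterations + " iterations"); // prints "15 after 4 iterations"
    }
}
```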
In q.add(q.poll()+q.poll()); the number of elements in the queue is always O(N), so each enqueue still runs in O(log N). The loop runs N - 1 times, so the total is O(N log N).
I am thinking about implementing a lock-free circular array. One problem is maintaining the head and tail pointers in a lock-free manner. The code I have in mind is:
int circularIncrementAndGet(AtomicInteger i) {
    i.compareAndSet(array.length - 1, -1);
    return i.incrementAndGet();
}
Then I would do something like:
void add(double value) {
    int idx = circularIncrementAndGet(tail);
    array[idx] = value;
}
(Note that if the array is full, old values will be overwritten; I am fine with that.)
Does anyone see a problem with this design? I suspect there might be a race condition I am not seeing.
A simpler approach is to use a power-of-2 size and do the following:
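import java.util.concurrent.atomic.AtomicInteger;
class CircularBuffer {
    final double[] array;
    final int sizeMask;
    final AtomicInteger i = new AtomicInteger();
    public CircularBuffer(int size) {
        // test power of 2 (explicit check, since assert is skipped unless the JVM runs with -ea)
        if (size <= 1 || (size & (size - 1)) != 0) {
            throw new IllegalArgumentException("size must be a power of 2 greater than 1");
        }
        array = new double[size];
        sizeMask = size - 1;
    }
    void add(double value) {
        // for a power-of-2 size, "& sizeMask" acts like "% size" and is never negative
        array[i.getAndIncrement() & sizeMask] = value;
    }
}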
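One detail worth checking in the masked version is what happens when the AtomicInteger eventually overflows. Because 2^32 is a multiple of any power-of-2 size, the masked index keeps counting consecutively straight through the wrap from Integer.MAX_VALUE to Integer.MIN_VALUE. A small sketch (MaskWrap and index() are illustrative names, not from the answer):

```java
class MaskWrap {
    // Same indexing as add(): counter & sizeMask, with sizeMask = size - 1.
    static int index(int counter, int sizeMask) {
        return counter & sizeMask;
    }

    public static void main(String[] args) {
        final int sizeMask = 8 - 1;              // buffer size 8, a power of 2
        int counter = Integer.MAX_VALUE - 2;     // as if getAndIncrement() had run ~2^31 times
        for (int step = 0; step < 5; step++) {
            // counter overflows from MAX_VALUE to MIN_VALUE, yet the printed
            // indices stay consecutive: 5, 6, 7, 0, 1
            System.out.println(counter + " -> index " + index(counter, sizeMask));
            counter++;
        }
    }
}
```

With a non-power-of-2 size and "%" instead, the index would go negative after the overflow, which is one more reason the answer insists on a power of 2.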
Check out the Disruptor (http://lmax-exchange.github.io/disruptor/); it's an open-source lock-free circular buffer for Java.
Yes, there is a race condition.
Say i = array.length - 2, and two threads enter circularIncrementAndGet():
Thread 1: i.compareAndSet(array.length - 1, -1) results in i = array.length - 2
Thread 2: i.compareAndSet(array.length - 1, -1) results in i = array.length - 2
Thread 1: i.incrementAndGet() results in i = array.length - 1
Thread 2: i.incrementAndGet() results in i = array.length
leading to an ArrayIndexOutOfBoundsException when Thread 2 reaches array[idx] = value (and on every subsequent call to add(), because i is now past array.length - 1, so the compareAndSet can never reset it).
The solution proposed by @Peter Lawrey does not suffer from this problem.
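If you want to keep the question's wrap-to-zero counter rather than masking, one way to make the wrap atomic (an alternative not proposed in the answers above) is a compare-and-set retry loop: read the current value, compute the next index, and publish it only if no other thread changed the counter in between. A sketch, with the array length passed in as an assumed extra parameter:

```java
import java.util.concurrent.atomic.AtomicInteger;

class CircularIndex {
    // Atomically replaces i with (i + 1) % length and returns the new value.
    // The read-compute-CAS sequence retries until no other thread has changed i
    // in between, so the stored value never leaves [0, length) once it starts there.
    static int circularIncrementAndGet(AtomicInteger i, int length) {
        while (true) {
            int current = i.get();
            int next = (current + 1) % length;
            if (i.compareAndSet(current, next)) {
                return next;
            }
            // another thread updated i first: reread and try again
        }
    }

    // The same loop expressed with AtomicInteger.updateAndGet (Java 8+).
    static int circularIncrementAndGet8(AtomicInteger i, int length) {
        return i.updateAndGet(x -> (x + 1) % length);
    }
}
```

This only fixes the index race traced above; the writes into the array slots themselves are still plain, unsynchronized stores.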
If you stick with the following constraints:
Only one thread is allowed to modify the head pointer at any time
Only one thread is allowed to modify the tail pointer at any time
Dequeue-on-empty gives a return value indicating nothing was done
Enqueue-on-full gives a return value indicating nothing was done
You don't keep any count of how many values are stored in the queue.
You 'waste' one index in the array that will never be used, so that you can tell when the array is full or empty without having to keep count.
then it is possible to implement a lock-free circular array/queue.
The enqueuing thread owns the tail pointer. The dequeuing thread owns the head pointer. Apart from one condition, these two threads share no state, so there are no problems.
That condition is testing for emptiness or fullness.
Consider empty to mean head == tail, and full to mean tail == head - 1 (modulo the array size). Enqueue has to check whether the queue is full, and dequeue has to check whether it is empty. You need to waste one index in the array to tell full from empty: if you enqueued into that last bucket, full would be head == tail and empty would also be head == tail, so you would be stuck. You would think the queue is empty and full at the same time, and no work would get done.
In performing these checks, it's possible that one value is updated while being compared. However, since each pointer only ever moves forward (and only its owning thread moves it), there is no correctness problem:
If, in the dequeue method, head == tail evaluates to true, but tail moves forward just afterward, no problem: you thought the array was empty when it actually wasn't, but that's no big deal; you just return false from the dequeue method and try again.
If, in the enqueue method, tail == head - 1 evaluates to true, but head moves forward just afterward, then you'll think the array is full when it really isn't, but again, no big deal: you just return false from enqueue and try again.
This is the design behind the implementation I found in Dr. Dobb's years ago, and it has served me well:
http://www.drdobbs.com/parallel/lock-free-queues/208801974
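The constraints above can be sketched in Java as a single-producer/single-consumer ring. This is a minimal sketch, not the Dr. Dobb's code: it assumes exactly one thread calls offer() and exactly one other thread calls poll(), and it uses AtomicInteger for head and tail because in Java the pointers need volatile semantics so that each thread actually sees the other's updates (and the slot data written before them). SpscRing is an illustrative name.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SpscRing {
    private final double[] buffer;
    // Next slot to read; written only by the consumer thread.
    private final AtomicInteger head = new AtomicInteger(0);
    // Next slot to write; written only by the producer thread.
    private final AtomicInteger tail = new AtomicInteger(0);

    SpscRing(int capacity) {
        buffer = new double[capacity + 1];   // one slot is "wasted" to tell full from empty
    }

    private int next(int index) {
        return (index + 1) % buffer.length;
    }

    // Enqueue-on-full returns false and writes nothing.
    boolean offer(double value) {
        int t = tail.get();
        int nextTail = next(t);
        if (nextTail == head.get()) {
            return false;                    // full: tail == head - 1 (mod length)
        }
        buffer[t] = value;
        tail.set(nextTail);                  // volatile write publishes buffer[t] to the consumer
        return true;
    }

    // Dequeue-on-empty returns null and reads nothing.
    Double poll() {
        int h = head.get();
        if (h == tail.get()) {
            return null;                     // empty: head == tail
        }
        double value = buffer[h];            // safe: we saw the tail.set() that followed this write
        head.set(next(h));                   // hands the slot back to the producer
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        SpscRing ring = new SpscRing(16);
        int n = 100_000;
        Thread producer = new Thread(() -> {
            for (int k = 1; k <= n; k++) {
                while (!ring.offer(k)) {
                    Thread.onSpinWait();     // full: just try again
                }
            }
        });
        producer.start();
        double sum = 0;
        int received = 0;
        while (received < n) {
            Double v = ring.poll();
            if (v == null) {
                Thread.onSpinWait();         // empty: just try again
                continue;
            }
            sum += v;
            received++;
        }
        producer.join();
        System.out.println((long) sum);      // 5000050000 = n(n+1)/2
    }
}
```

The "return false and try again" behaviour from the two cases above shows up directly as the spin loops in main(). Thread.onSpinWait() needs Java 9 or later.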
I am trying to parallelize a prime number counter as an exercise.
I have refactored the original code and separated the long loops from others so that I can parallelize them.
Now I have the following code and multithreading is looking difficult as I need to keep track of the primes found (in order) and count the number of primes found.
nthPrime(long n) takes the number of primes to search for and returns the nth prime.
count is an ArrayList<Long>.
public static long nthPrime(long n) {
    count.add((long) 1);
    if (n < 2) {
        count.add((long) 3);
        return getCount();
    }
    count.add((long) 3);
    if (n == 2) {
        return getCount();
    }
    step = 4;
    candidate = 5;
    checker(n, step, candidate);
    return getCount();
}
private static long checker(long n, int step, long candidate) {
    while (count.size() < n) {
        if (Checker.isPrime(candidate)) {
            // checks the number for possible prime
            count.add(candidate);
        }
        step = 6 - step;
        candidate += step;
    }
    return getCount();
}
Any ideas on using java.util.concurrent or threading to parallelize this?
Thanks
Two pieces of advice:
Before you start, you need to turn your existing non-parallel code into something that 1) works and 2) is readable. Attempting to parallelize it in its current form is going to lead to failure.
I can see bugs (I think)
I can't see evidence that you understand the basic sieving algorithm, which appears to be what you are trying to implement here.
Anyone can take an algorithm, split it into bits and use multiple Java threads to execute the bits. The difficulty is coming up with a scheme that works, and where your efforts actually give a worthwhile speedup. That requires:
a good understanding of the problem and the applicable algorithms,
identifying the parts of the problem / algorithm that are amenable to parallelization,
understanding of the overheads and potential bottlenecks in Java multithreading, and
understanding of the correctness issues; e.g. where and how to synchronize.
I can't give you a lesson on these things in the space of a StackOverflow Answer. You need a text book.
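As a rough illustration of "identifying the part that is amenable to parallelization" (a sketch, not the asker's algorithm and not a sieve): "find the nth prime" has a sequential stopping condition, but "test every number up to a bound" splits cleanly into independent pieces. So one option is to bound the answer first, using the known result p_n < n(ln n + ln ln n) for n >= 6, and then test that range in parallel with a stream. The class and method names here are hypothetical, and isPrime() is plain trial division standing in for the question's Checker.isPrime().

```java
import java.util.stream.LongStream;

class ParallelPrimes {
    // Trial division by 2, 3 and then by numbers of the form 6k - 1 and 6k + 1,
    // the same candidate pattern the question generates with step = 6 - step.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n < 4) return true;                       // 2 and 3
        if (n % 2 == 0 || n % 3 == 0) return false;
        for (long d = 5; d * d <= n; d += 6) {
            if (n % d == 0 || n % (d + 2) == 0) return false;
        }
        return true;
    }

    // Each candidate is tested independently, so the range can be split across the
    // common ForkJoinPool; toArray() keeps encounter order, so the result is sorted.
    static long[] primesUpTo(long limit) {
        return LongStream.rangeClosed(2, limit).parallel()
                .filter(ParallelPrimes::isPrime)
                .toArray();
    }

    static long countPrimes(long limit) {
        return LongStream.rangeClosed(2, limit).parallel()
                .filter(ParallelPrimes::isPrime)
                .count();
    }

    // Bound the nth prime first, then search the bounded range in parallel.
    static long nthPrime(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        long limit = n < 6 ? 13 : (long) Math.ceil(n * (Math.log(n) + Math.log(Math.log(n))));
        return primesUpTo(limit)[n - 1];
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100));   // 25
        System.out.println(nthPrime(1000));     // 7919
    }
}
```

The trade-off is some wasted work past the nth prime in exchange for having no shared, ordered list that threads must coordinate on.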
I was making my way through Project Euler, and I came across a combinations problem. Working out combinations means working out factorials, so I decided to create a factorial method. Then I hit upon a question: since I could easily use either iteration or recursion for this, which one should I go for? I quickly wrote 2 methods, iterative:
public static long factorial(int num) {
    long result = 1;
    if (num == 0) {
        return 1;
    }
    else {
        for (int i = 2; i <= num; i++) {
            result *= i;
        }
        return result;
    }
}
and recursive:
public static long factorial(int num) {
    if (num == 0) {
        return 1;
    }
    else {
        return num * factorial(num - 1);
    }
}
If I am (obviously) talking about speed and functionality here, which one should I use? And, in general, is one technique better than the other (so if I come across this choice later, what should I go for)?
Both are hopelessly naive. No serious application of factorial would use either one. I think both are inefficient for large n, and neither int nor long will suffice when the argument is large: a long already overflows beyond 20!.
A better way would be to use a good gamma function implementation and memoization.
Here's an implementation from Robert Sedgewick.
Large values will require logarithms.
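Sedgewick's gamma code isn't reproduced here. As a hedged stand-in, this sketch shows the other two ideas from the answer: memoization (a growing cache of BigInteger results, so each product is computed only once) and working in log space for large arguments. logFactorial() simply sums Math.log(k); a real log-gamma implementation would compute the same quantity without the loop. MemoFactorial is an illustrative name.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class MemoFactorial {
    // cache.get(k) holds k!; it grows on demand, so each product is computed once.
    private static final List<BigInteger> cache = new ArrayList<>(List.of(BigInteger.ONE));

    static synchronized BigInteger factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        for (int k = cache.size(); k <= n; k++) {
            cache.add(cache.get(k - 1).multiply(BigInteger.valueOf(k)));
        }
        return cache.get(n);
    }

    // ln(n!) as a sum of logs: stays a small double even when n! itself is enormous.
    static double logFactorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log(k);
        }
        return sum;
    }
}
```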
Whenever you have the option to choose between recursion and iteration, always go for iteration, because:
1. Recursion involves creating and destroying stack frames, which has high costs.
2. Your stack can blow up if you are using very large values (and therefore a very deep recursion).
So go for recursion only if you have a really compelling reason.
I actually analyzed this problem in terms of running time.
I've done 2 simple implementations:
Iterative:
private static BigInteger bigIterativeFactorial(int x) {
    BigInteger result = BigInteger.ONE;
    for (int i = x; i > 0; i--)
        result = result.multiply(BigInteger.valueOf(i));
    return result;
}
And Recursive:
public static BigInteger bigRecursiveFactorial(int x) {
    if (x == 0)
        return BigInteger.ONE;
    else
        return bigRecursiveFactorial(x - 1).multiply(BigInteger.valueOf(x));
}
Both tests ran on a single thread.
It turns out that the iterative version is slightly faster only for small arguments. When I set n above 100, the recursive solution was faster.
My conclusion? You can never say that an iterative solution is faster than a recursive one on the JVM. (Still talking only about time.)
If you're interested, the full write-up of how I reached this conclusion is HERE.
If you're interested in a deeper understanding of the difference between these two approaches, I found a really nice description on knowledge-cess.com.
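The original measurement harness is only available through the link above, so here is a crude, hypothetical harness for trying the comparison yourself. It runs warm-up rounds so the JIT compiles both methods before timing, and it feeds each result into a sink field so the calls cannot be discarded as dead code. The numbers will vary by JVM and machine; for trustworthy figures, a dedicated tool such as JMH (the OpenJDK Java Microbenchmark Harness) is the usual choice.

```java
import java.math.BigInteger;
import java.util.function.IntFunction;

class FactorialTiming {
    static BigInteger iterative(int x) {
        BigInteger result = BigInteger.ONE;
        for (int i = x; i > 0; i--) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    static BigInteger recursive(int x) {
        return x == 0 ? BigInteger.ONE : recursive(x - 1).multiply(BigInteger.valueOf(x));
    }

    // Accumulates result sizes so the JIT cannot treat the calls as unused.
    static long sink;

    static double timeMillis(IntFunction<BigInteger> f, int n, int reps) {
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            sink += f.apply(n).bitLength();
        }
        return (System.nanoTime() - start) / 1e6;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            for (int warm = 0; warm < 5; warm++) {   // warm-up rounds, timings discarded
                timeMillis(FactorialTiming::iterative, n, 200);
                timeMillis(FactorialTiming::recursive, n, 200);
            }
            System.out.printf("n=%d iterative %.2f ms, recursive %.2f ms%n", n,
                    timeMillis(FactorialTiming::iterative, n, 200),
                    timeMillis(FactorialTiming::recursive, n, 200));
        }
    }
}
```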
There's no "this is better, that is worse" answer to this question. Because modern computers are so fast, in Java it tends to be a matter of personal preference which one you use. You are doing many more checks and computations in the iterative version, but you are piling more method calls onto the stack in the recursive version. There are pros and cons to each, so you have to take it case by case.
Personally, I stick with iterative algorithms to avoid the logic of recursion.
Maybe I'm being misled by my profiler (NetBeans), but I'm seeing some odd behavior, and I'm hoping someone here can help me understand it.
I am working on an application that makes heavy use of rather large hash tables (keys are longs, values are objects). The performance of the built-in Java hash table (HashMap specifically) was very poor, and after trying some alternatives (Trove, Fastutils, Colt, Carrot) I started working on my own.
The code is very basic, using a double hashing strategy. It works well and shows the best performance of all the options I've tried thus far.
The catch is, according to the profiler, lookups into the hash table are the single most expensive method in the entire application -- despite the fact that other methods are called many more times, and/or do a lot more logic.
What really confuses me is that the lookups are called by only one class; the calling method does the lookup and processes the results. Both are called nearly the same number of times, and the method that calls the lookup has a lot of logic in it to handle the result of the lookup, but it is about 100x faster.
Below is the code for the hash lookup. It's basically just two accesses into an array (the functions that compute the hash codes, according to profiling, are virtually free). I don't understand how this bit of code can be so slow since it is just array access, and I don't see any way of making it faster.
Note that the code simply returns the bucket matching the key; the caller is expected to process the bucket. 'size' is hash.length/2; hash1 does lookups in the first half of the hash table, and hash2 does lookups in the second half. key_index is a final int field on the hash table, passed into the constructor, and the values array on the Entry objects is a small array of longs, usually of length 10 or less.
Any thoughts people have on this are much appreciated.
Thanks.
public final Entry get(final long theKey) {
    Entry aEntry = hash[hash1(theKey, size)];
    if (aEntry != null && aEntry.values[key_index] != theKey) {
        aEntry = hash[hash2(theKey, size)];
        if (aEntry != null && aEntry.values[key_index] != theKey) {
            return null;
        }
    }
    return aEntry;
}
Edit: the code for hash1 and hash2:
private static int hash1(final long key, final int hashTableSize) {
    return (int) (key & (hashTableSize - 1));
}
private static int hash2(final long key, final int hashTableSize) {
    return (int) (hashTableSize + ((key ^ (key >> 3)) & (hashTableSize - 1)));
}
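For anyone who wants to experiment with this lookup outside the application, here is a self-contained sketch that combines the get(), hash1 and hash2 shown above. The question doesn't show insertion, so put() is a hypothetical assumption: it uses the hash1 slot in the first half, falls back to the hash2 slot in the second half, and simply fails if both are taken (no eviction or resizing). Because hash1 masks with size - 1, size is assumed to be a power of 2.

```java
class TwoChoiceTable {
    static final class Entry {
        final long[] values;
        Entry(long[] values) { this.values = values; }
    }

    private final Entry[] hash;
    private final int size;       // hash.length / 2
    private final int key_index;  // which element of values holds the key

    TwoChoiceTable(int size, int keyIndex) {
        if (Integer.bitCount(size) != 1) throw new IllegalArgumentException("size must be a power of 2");
        this.hash = new Entry[2 * size];
        this.size = size;
        this.key_index = keyIndex;
    }

    private static int hash1(final long key, final int hashTableSize) {
        return (int) (key & (hashTableSize - 1));
    }

    private static int hash2(final long key, final int hashTableSize) {
        return (int) (hashTableSize + ((key ^ (key >> 3)) & (hashTableSize - 1)));
    }

    // Lookup exactly as in the question.
    public Entry get(final long theKey) {
        Entry aEntry = hash[hash1(theKey, size)];
        if (aEntry != null && aEntry.values[key_index] != theKey) {
            aEntry = hash[hash2(theKey, size)];
            if (aEntry != null && aEntry.values[key_index] != theKey) {
                return null;
            }
        }
        return aEntry;
    }

    // Hypothetical insertion: first-half slot, then second-half slot, else give up.
    public boolean put(final long[] values) {
        long key = values[key_index];
        int i1 = hash1(key, size);
        if (hash[i1] == null || hash[i1].values[key_index] == key) {
            hash[i1] = new Entry(values);
            return true;
        }
        int i2 = hash2(key, size);
        if (hash[i2] == null || hash[i2].values[key_index] == key) {
            hash[i2] = new Entry(values);
            return true;
        }
        return false;  // both candidate slots hold other keys
    }
}
```

Note that get() stops at an empty first slot, so this layout relies on never deleting entries from the first half.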
Nothing in your implementation strikes me as particularly inefficient. I'll admit I don't really follow your hashing/lookup strategy, but if you say it's performant in your circumstances, I'll believe you.
The only thing that I would expect might make some difference is to move the key out of the values array of Entry.
Instead of having this:
class Entry {
    long[] values;
}
//...
if (entry.values[key_index] == key) { //...
Try this:
class Entry {
    long key;
    long[] values;
}
//...
if (entry.key == key) { //...
Instead of incurring the cost of accessing a member, doing a bounds check, and then reading a value from the array, you only incur the cost of accessing the member.
Is there a random-access data type faster than an array?
I was interested in the answer to this question, so I set up a test environment. This is my Array interface:
interface Array {
    long get(int i);
    void set(int i, long v);
}
This "Array" has undefined behaviour when indices are out of bounds. I threw together the obvious implementation:
class NormalArray implements Array {
    private long[] data;
    public NormalArray(int size) {
        data = new long[size];
    }
    @Override
    public long get(int i) {
        return data[i];
    }
    @Override
    public void set(int i, long v) {
        data[i] = v;
    }
}
And then a control:
class NoOpArray implements Array {
    @Override
    public long get(int i) {
        return 0;
    }
    @Override
    public void set(int i, long v) {
    }
}
Finally, I designed an "array" where the first 10 indices are hardcoded members. The members are set/selected through a switch:
class TenArray implements Array {
    private long v0;
    private long v1;
    private long v2;
    private long v3;
    private long v4;
    private long v5;
    private long v6;
    private long v7;
    private long v8;
    private long v9;
    private long[] extras;
    public TenArray(int size) {
        if (size > 10) {
            extras = new long[size - 10];
        }
    }
    @Override
    public long get(final int i) {
        switch (i) {
            case 0:
                return v0;
            case 1:
                return v1;
            case 2:
                return v2;
            case 3:
                return v3;
            case 4:
                return v4;
            case 5:
                return v5;
            case 6:
                return v6;
            case 7:
                return v7;
            case 8:
                return v8;
            case 9:
                return v9;
            default:
                return extras[i - 10];
        }
    }
    @Override
    public void set(final int i, final long v) {
        switch (i) {
            case 0:
                v0 = v; break;
            case 1:
                v1 = v; break;
            case 2:
                v2 = v; break;
            case 3:
                v3 = v; break;
            case 4:
                v4 = v; break;
            case 5:
                v5 = v; break;
            case 6:
                v6 = v; break;
            case 7:
                v7 = v; break;
            case 8:
                v8 = v; break;
            case 9:
                v9 = v; break;
            default:
                extras[i - 10] = v;
        }
    }
}
I tested it with this harness:
import java.util.Random;
public class ArrayOptimization {
    public static void main(String[] args) {
        int size = 10;
        long[] data = new long[size];
        Random r = new Random();
        for (int i = 0; i < data.length; i++) {
            data[i] = r.nextLong();
        }
        Array[] a = new Array[] {
            new NoOpArray(),
            new NormalArray(size),
            new TenArray(size)
        };
        for (;;) {
            for (int i = 0; i < a.length; i++) {
                testSet(a[i], data, 10000000);
                testGet(a[i], data, 10000000);
            }
        }
    }
    private static void testGet(Array a, long[] data, int iterations) {
        long nanos = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (int j = 0; j < data.length; j++) {
                data[j] = a.get(j);
            }
        }
        long stop = System.nanoTime();
        System.out.printf("%s/get took %fms%n", a.getClass().getName(),
                (stop - nanos) / 1000000.0);
    }
    private static void testSet(Array a, long[] data, int iterations) {
        long nanos = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (int j = 0; j < data.length; j++) {
                a.set(j, data[j]);
            }
        }
        long stop = System.nanoTime();
        System.out.printf("%s/set took %fms%n", a.getClass().getName(),
                (stop - nanos) / 1000000.0);
    }
}
The results were somewhat surprising. TenArray performs non-trivially faster than NormalArray (for sizes <= 10). Subtracting the overhead (using the NoOpArray average), TenArray takes ~65% of the time of the normal array. So if you know the likely maximum size of your array, I suppose it is possible to exceed the speed of an array. I would imagine that a switch does either less bounds checking or more efficient bounds checking than an array does.
NoOpArray/set took 953.272654ms
NoOpArray/get took 891.514622ms
NormalArray/set took 1235.694953ms
NormalArray/get took 1148.091061ms
TenArray/set took 1149.833109ms
TenArray/get took 1054.040459ms
NoOpArray/set took 948.458667ms
NoOpArray/get took 888.618223ms
NormalArray/set took 1232.554749ms
NormalArray/get took 1120.333771ms
TenArray/set took 1153.505578ms
TenArray/get took 1056.665337ms
NoOpArray/set took 955.812843ms
NoOpArray/get took 893.398847ms
NormalArray/set took 1237.358472ms
NormalArray/get took 1125.100537ms
TenArray/set took 1150.901231ms
TenArray/get took 1057.867936ms
Now, whether you can get speeds faster than an array in practice, I'm not sure; obviously, this way you incur any overhead associated with the interface/class/methods.
Most likely you are partially misled in your interpretation of the profiler's results. Profilers are notorious for overinflating the performance impact of small, frequently called methods. In your case, the profiling overhead for the get() method is probably larger than the actual processing time spent in the method itself. The situation is worsened further because the instrumentation also interferes with the JIT's ability to inline methods.
As a rule of thumb for this situation: if the total processing time for a piece of work of known length increases more than two- to threefold when running under the profiler, the profiling overhead will give you skewed results.
To verify that your changes actually have an impact, always measure performance improvements without the profiler, too. The profiler can hint at bottlenecks, but it can also trick you into looking at places where nothing is wrong.
Array bounds checking can have a surprisingly large impact on performance (if you do comparatively little else), but it can also be hard to separate clearly from general memory access penalties. In some trivial cases, the JIT may be able to eliminate them (there have been efforts towards bounds check elimination in Java 6), but AFAIK this is mostly limited to simple loop constructs like for (x = 0; x < array.length; x++).
Under some circumstances you may be able to replace array access with simple member access, completely avoiding the bounds checks, but this is limited to the rare cases where you access your array exclusively through constant indices. I see no way to apply it to your problem.
The change suggested by Mark Peters is most likely faster not solely because it eliminates a bounds check, but also because it alters the locality properties of your data structures in a more cache-friendly way.
Many profilers tell you very confusing things, partly because of how they work, and partly because people have funny ideas about performance to begin with.
For example, you're wondering about how many times functions are called, and you're looking at code and thinking it looks like a lot of logic, therefore slow.
There's a very simple way to think about this stuff, that makes it very easy to understand what's going on.
First of all, think in terms of the percent of time a routine or statement is active, rather than the number of times it is called or the average length of time it takes. The reason is that the percentage is relatively unaffected by irrelevant issues like competing processes or I/O, and it saves you having to multiply the number of calls by the average execution time and divide by the total time just to see whether it is big enough to even care about. Also, the percentage tells you, bottom line, how much fixing it could potentially reduce the overall execution time.
Second, what I mean by "active" is "on the stack", where the stack includes the currently running instruction and all the calls "above" it back to "call main". If a routine is responsible for 10% of the time, including routines that it calls, then during that time it is on the stack. The same is true of individual statements or even instructions. (Ignore "self time" or "exclusive time". It's a distraction.)
Profilers that put timers and counters on functions can only give you some of this information. Profilers that only sample the program counter tell you even less. What you need is something that samples the call stack and reports to you by line (not just by function) the percent of stack samples containing that line. It's also important that they sample the stack a) during I/O or other blockage, but b) not while waiting for user input.
There are profilers that can do this. I'm not sure about Java.
If you're still with me, let me throw out another ringer. You're looking for things you can optimize, right? And only things that have a large enough percentage to be worth the trouble, like 10% or more? Such a line of code costing 10% is on the stack 10% of the time. That means if 20,000 samples are taken, it is on about 2,000 of them. If 20 samples are taken, it is on about 2 of them, on average. Now, you're trying to find the line, right? Does it really matter if the percentage is off a little, as long as you find it? That's another one of those happy myths of profilers: that precision of timing matters. For finding problems worth fixing, 20,000 samples won't tell you much more than 20 samples will.
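To put numbers on the 20 versus 20,000 samples point: assuming the samples are independent, the number of samples X that contain a line which is on the stack 10% of the time follows a binomial distribution.

```latex
\begin{aligned}
&X \sim \mathrm{Binomial}(N, p), \qquad \mathbb{E}[X] = Np, \qquad \sigma_X = \sqrt{Np(1-p)} \\
&N = 20,\ p = 0.1:\quad \mathbb{E}[X] = 2,\ \sigma_X = \sqrt{1.8} \approx 1.34 \\
&N = 20000,\ p = 0.1:\quad \mathbb{E}[X] = 2000,\ \sigma_X = \sqrt{1800} \approx 42.4
\end{aligned}
```

So 20 samples give only a rough percentage (about 2 ± 1.3 hits), while 20,000 pin it down to roughly 10% ± 0.2%; either way the line is expected to show up, which is what matters for finding it.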
So what do I do? Just take the samples by hand and study them. Code worth optimizing will simply jump out at me.
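Java does expose enough to take such samples yourself. A minimal, hypothetical "poor man's" sampler, assuming you run the workload on its own thread: it snapshots that thread's call stack with Thread.getStackTrace() and reports, for each Class.method:line, the fraction of samples in which that line appears anywhere on the stack (not just the top frame). On HotSpot these snapshots are taken at safepoints, which can bias them somewhat.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StackSampler {
    // Takes up to n snapshots of the target thread, pausing intervalMs between them.
    static List<StackTraceElement[]> sample(Thread target, int n, long intervalMs)
            throws InterruptedException {
        List<StackTraceElement[]> samples = new ArrayList<>();
        for (int i = 0; i < n && target.isAlive(); i++) {
            samples.add(target.getStackTrace());
            Thread.sleep(intervalMs);
        }
        return samples;
    }

    // For each "Class.method:line", counts how many samples contain it at least once.
    static Map<String, Integer> linesOnStack(List<StackTraceElement[]> samples) {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : samples) {
            Set<String> seen = new HashSet<>();   // recursion counts once per sample
            for (StackTraceElement frame : stack) {
                String key = frame.getClassName() + "." + frame.getMethodName()
                        + ":" + frame.getLineNumber();
                if (seen.add(key)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Prints the ten lines that are on the stack in the largest share of samples.
    static void report(List<StackTraceElement[]> samples) {
        linesOnStack(samples).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.printf("%5.1f%%  %s%n",
                        100.0 * e.getValue() / samples.size(), e.getKey()));
    }
}
```

For example, start the workload on a worker thread, call sample(worker, 20, 50) from another thread, and pass the result to report(); as argued above, 20 samples are usually enough to make a 10%+ line stand out.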
Finally, there's a big gob of good news. There are probably multiple things you could optimize. Suppose you fix a 20% problem and make it go away. Overall time shrinks to 4/5 of what it was, but the other problems aren't taking any less time, so now their percentage is 5/4 of what it was, because the denominator got smaller. Percentage-wise they got bigger, and easier to find. This effect snowballs, allowing you to really squeeze the code.
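The snowball arithmetic, written out: removing a problem that costs a fraction p of the total time T shrinks the total, so every remaining problem's share grows by a factor of 1/(1 - p).

```latex
\begin{aligned}
T' &= (1 - p)\,T && \text{after removing a problem that cost } pT \\
\frac{qT}{T'} &= \frac{q}{1 - p} && \text{new share of a problem that cost } qT \\
p = 0.2:\quad \frac{q}{0.8} &= \tfrac{5}{4}\,q && \text{e.g. a 10\% problem becomes 12.5\%}
\end{aligned}
```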
You could try using a memoizing or caching strategy to reduce the number of actual calls. Another thing you could try, if you're very desperate, is a native array, since indexing those is unbelievably fast, and JNI shouldn't incur too much overhead if you're using parameters like longs that don't require marshalling.