I have the code below and want to know its time complexity when I am using a PriorityQueue.
I have a listOfNumbers of size N.
Queue<Integer> q = new PriorityQueue<>();
q.addAll(listOfNumbers);
while(q.size()>1) {
q.add(q.poll()+q.poll()); // add sum of 2 least elements back to Queue
}
As per this post: Time Complexity of Java PriorityQueue (heap) insertion of n elements?
O(log n) time for the enqueuing and dequeuing methods (offer, poll, remove() and add)
Now, how do I calculate the running time when I am adding the summed element back to the queue?
Each iteration performs a constant number of heap operations (two polls and one add), each costing O(log k) where k is the current queue size, and the queue shrinks by one element per iteration. So the running time of your program is log(n) + log(n-1) + ... + log(1).
This is log(n!) (by repeated application of the log rule log(a) + log(b) = log(ab)).
log(n!) is Theta(n log n); see Is log(n!) = Θ(n·log(n))?
So your program runs in Theta(n log n) time.
On q.add(q.poll()+q.poll()); the number of elements in the queue is always O(N), so the enqueue still works in O(log N).
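To make this concrete, here is a self-contained sketch (the random input and the heapOps counter are illustrative additions, not part of the original code) that runs the question's loop and counts heap operations: N - 1 iterations, each doing two polls and one add at O(log N) apiece.

import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.Random;

public class CombineSmallestDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> listOfNumbers = new ArrayList<>(n);
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            listOfNumbers.add(rnd.nextInt(100));
        }

        Queue<Integer> q = new PriorityQueue<>();
        q.addAll(listOfNumbers);          // N inserts, O(N log N) in total

        long heapOps = 0;
        while (q.size() > 1) {
            q.add(q.poll() + q.poll());   // two polls + one add, each O(log N)
            heapOps += 3;
        }
        System.out.println("heap operations: " + heapOps); // 3 * (N - 1)
    }
}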
Related
Does the time complexity change between these two implementations of getting the count of nodes in a linked list?
private int getCountIterative() {
Node start = head;
int count = 0;
while (start != null)
{
count++;
start = start.next;
}
return count;
}
private int getCountRecursive(Node node) {
if (node == null)
return 0;
return 1 + getCountRecursive(node.next);
}
No, the time complexity won't change.
However, the performance and overall run time will usually be worse for the recursive solution, because Java doesn't perform Tail Call Optimization.
TL;DR: it's the same complexity.
To calculate the complexity of an operation (like a search or sort algorithm - or your example, the count), you need to identify the dominating operation.
For searching and sorting, it's usually comparisons. What is your dominating operation? Let's assume it's node.next, the lookup of the next node.
Then, both approaches have O(n) operations - so it's the same complexity.
Please be aware that this time complexity is a simplification. Factors such as the overhead of function calls are ignored. So it's the same complexity, but that doesn't necessarily tell you which version is faster.
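As an illustration (a standalone sketch with a minimal Node stand-in, not the question's class), you can make the dominating operation explicit and count it; both traversals perform it exactly n times:

public class CountComparison {
    static class Node {
        Node next;
        Node(Node next) { this.next = next; }
    }

    static long lookups = 0;

    // one dominating operation per node visited
    static Node follow(Node n) {
        lookups++;
        return n.next;
    }

    static int countIterative(Node head) {
        int count = 0;
        for (Node cur = head; cur != null; cur = follow(cur)) {
            count++;
        }
        return count;
    }

    static int countRecursive(Node node) {
        if (node == null) return 0;
        return 1 + countRecursive(follow(node));
    }

    public static void main(String[] args) {
        Node head = null;
        for (int i = 0; i < 1000; i++) head = new Node(head); // list of 1000 nodes

        lookups = 0;
        System.out.println(countIterative(head) + " nodes, " + lookups + " lookups");
        lookups = 0;
        System.out.println(countRecursive(head) + " nodes, " + lookups + " lookups");
    }
}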
I implemented a basic sorting algorithm in Java and compared its performance to that of the built-in methods (Arrays.sort() and Arrays.parallelSort()). The program is as follows.
public static void main(String[] args) {
// Randomly populate array
int[] array = new int[999999];
for (int i = 0; i < 999999; i++)
array[i] = (int)Math.ceil(Math.random() * 100);
long start, end;
start = System.currentTimeMillis();
Arrays.sort(array);
end = System.currentTimeMillis();
System.out.println("======= Arrays.sort: done in " + (end - start) + " ms ========");
start = System.currentTimeMillis();
Arrays.parallelSort(array);
end = System.currentTimeMillis();
System.out.println("======= Arrays.parallelSort: done in " + (end - start) + " ms ========");
start = System.currentTimeMillis();
orderArray(array);
end = System.currentTimeMillis();
System.out.println("======= My way: done in " + (end - start) + " ms ========");
}
private static int[] orderArray(int[] arrayToOrder) {
for (int i = 1; i < arrayToOrder.length; i++) {
int currentElementIndex = i;
while (currentElementIndex > 0 && arrayToOrder[currentElementIndex] < arrayToOrder[currentElementIndex-1]) {
int temp = arrayToOrder[currentElementIndex];
arrayToOrder[currentElementIndex] = arrayToOrder[currentElementIndex-1];
arrayToOrder[currentElementIndex-1] = temp;
currentElementIndex--;
}
}
return arrayToOrder;
}
When I run this program, my custom algorithm consistently outperforms the built-in methods, by orders of magnitude, on my machine. Here is a representative output:
======= Arrays.sort: done in 67 ms ========
======= Arrays.parallelSort: done in 26 ms ========
======= My way: done in 4 ms ========
This is independent of:
The number of elements in the array (999999 in my example)
The number of times the sort is performed (I tried inside a for loop and iterated a large number of times)
The data type (I tried with an array of double instead of int and saw no difference)
The order in which I call each ordering algorithm (does not affect the overall difference of performance)
Obviously, there's no way my algorithm is actually better than the ones provided with Java. I can only think of two possible explanations:
There is a flaw in the way I measure the performance
My algorithm is too simple and is missing some corner cases
I expect the latter is true, seeing as I used a fairly standard way of measuring performance in Java (System.currentTimeMillis()). However, I have tested my algorithm extensively and can find no flaws yet - an int has predefined bounds (Integer.MIN_VALUE and Integer.MAX_VALUE) and cannot be null, so I can't think of any possible corner case I've not covered.
I am also aware of the difference between my algorithm's time complexity (O(n^2)) and the built-in methods' (O(n log n)), which could obviously have an impact. Again, however, I believe my complexity is sufficient...
Could I get an outsider's look on this, so I know how I can improve my algorithm?
Many thanks,
Chris.
You're sorting the array in place, but you didn't re-scramble it between trials. This means that after the first sort you're timing the best-case scenario: an already sorted array. In between each call to an array sorting method you can re-create the array:
for (int i = 0; i < TEST_SIZE; i++)
array[i] = (int)Math.ceil(Math.random() * 100);
After doing this you will notice your algorithm is about 100 times slower.
That said, this is not the best way to compare the methods in the first place. At a minimum you should be sorting the same original array with each algorithm. You should also perform multiple iterations of each algorithm and average the results; the outcome of a single trial is noisy and not a reliable comparison.
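As a sketch of that fairer setup (the array size is deliberately smaller than the question's 999999 so the O(n^2) insertion sort finishes quickly; the class and method names are placeholders), you can keep one untouched master array and hand each algorithm its own fresh copy:

import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

public class FairSortBenchmark {
    // Smaller than the question's 999999 so the O(n^2) sort finishes in seconds.
    private static final int SIZE = 50_000;

    public static void main(String[] args) {
        // One master array; every algorithm sorts its own identical, unsorted copy.
        int[] master = new Random(123).ints(SIZE, 1, 101).toArray();

        time("Arrays.sort", a -> Arrays.sort(a), master);
        time("Arrays.parallelSort", a -> Arrays.parallelSort(a), master);
        time("insertion sort", FairSortBenchmark::orderArray, master);
    }

    private static void time(String label, Consumer<int[]> sorter, int[] master) {
        int[] copy = Arrays.copyOf(master, master.length); // fresh unsorted input every time
        long start = System.currentTimeMillis();
        sorter.accept(copy);
        long end = System.currentTimeMillis();
        System.out.println(label + ": " + (end - start) + " ms");
    }

    // Same insertion sort as in the question.
    private static void orderArray(int[] arrayToOrder) {
        for (int i = 1; i < arrayToOrder.length; i++) {
            int j = i;
            while (j > 0 && arrayToOrder[j] < arrayToOrder[j - 1]) {
                int temp = arrayToOrder[j];
                arrayToOrder[j] = arrayToOrder[j - 1];
                arrayToOrder[j - 1] = temp;
                j--;
            }
        }
    }
}

A single timed run is still noisy; as noted above, you would repeat each measurement and average the results before drawing conclusions.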
I have come up with a very bold idea: I want to use a HashMap instead of a database to store data for a chat app.
So, when a user sends a chat message, that user's message will be stored into a HashMap using storeMsg().
Each user has a separate chat room. Every 5 seconds, that user's chat room calls getMsg() to retrieve the latest message in the room. After it retrieves the message, it removes all the messages relating to that user's chat room so that we can avoid the overhead.
So, only users present in that chat room can see the messages, and messages are just appended one by one. New users who enter the chat room later will not be able to see the previous messages. This is similar to peer-to-peer chat.
Each user has a unique String username such as "tomhan12", "Mary2", "123cat", etc.
// username -> latest message
HashMap<String, String> hMap = new HashMap<>();

public void storeMsg(String userName, String message){
hMap.put(userName, message);
}
public String getMsg(String userName){
return hMap.get(userName);
}
So, my question is: if hMap has String keys and holds something like millions of entries, will the speed of hMap.get(str) be affected?
Can we convert the String userName into a unique integer and then call hMap.put(thatUniqueIntegerNumber, message) for higher performance? Or does HashMap already do that for us, so we don't need to?
HashMap's get has an expected constant running time, which means its running time shouldn't depend on the size of the HashMap. This, of course, relies on a decent implementation of the hashCode method of your key, but your key is String, so it shouldn't be a problem.
That said, using a large HashMap (or any other large data structure) consumes a large amount of memory, so you should make sure you are not running into memory-shortage issues, which would slow down your application.
HashMap's get() method provides O(1) expected time complexity if the keys' hashCode() function has a good distribution (which is true for strings). The size of the map does not affect the operation's performance (well, technically collisions occur more often as the map gets bigger, but that's another story).
Replacing String keys with Integer keys will not give you any significant performance boost.
According to the javadoc:
https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
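For example (a minimal sketch; the expected user count is an arbitrary assumption), if you expect around a million users you can size the map up front so it never rehashes:

import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expectedEntries = 1_000_000;   // assumed upper bound on users
        float loadFactor = 0.75f;          // the default

        // capacity >= expectedEntries / loadFactor  =>  no rehash ever occurs
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);
        Map<String, String> messages = new HashMap<>(initialCapacity, loadFactor);

        messages.put("tomhan12", "hello");
        System.out.println(messages.get("tomhan12"));
    }
}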
Since HashMap stores its values in hash buckets, a lookup generally takes between O(1) and O(N), depending on the number of hash collisions the map has.
Let's test this performance.
To test the performance of the map, we run a test that first inserts 100 or 1,000,000 items into the map, and then calls get("0") through get("9") in a loop to measure lookup time. We use the following code:
import java.util.HashMap;
public class HashMapTest {
public static void test(int items, boolean print) {
System.gc();
System.gc();
HashMap<String,Object> map = new HashMap<>();
for(int i = 0; i < items; i++) {
map.put("" + i, map);
}
long start = System.nanoTime();
for(int i = 0; i < 100000; i++) {
map.get("0");
map.get("1");
map.get("2");
map.get("3");
map.get("4");
map.get("5");
map.get("6");
map.get("7");
map.get("8");
map.get("9");
}
long end = System.nanoTime();
long time = end - start;
if(print) {
System.out.println("items: "+ items + " time: "+ time);
}
}
public static void main(String ... args) {
// warmup
for(int i = 0; i < 2; i++) {
test(100, false);
}
for(int i = 0; i < 2; i++) {
test(1000000, false);
}
// Real test:
for(int i = 0; i < 10; i++) {
test(100, true);
}
for(int i = 0; i < 10; i++) {
test(1000000, true);
}
}
}
Test results
items: 100 time: 11102830
items: 100 time: 12228567
items: 100 time: 34309933
items: 100 time: 36976824
items: 100 time: 34290557
items: 100 time: 19819022
items: 100 time: 14747533
items: 100 time: 15818922
items: 100 time: 15026368
items: 100 time: 16830762
items: 1000000 time: 12421862
items: 1000000 time: 13931351
items: 1000000 time: 13083504
items: 1000000 time: 11453028
items: 1000000 time: 13265455
items: 1000000 time: 11030050
items: 1000000 time: 11362288
items: 1000000 time: 11521082
items: 1000000 time: 11198296
items: 1000000 time: 11303685
items 100 min: 11102830
items 100 max: 36976824
items 1000000 min: 11030050
items 1000000 max: 13931351
If we analyze the test results, we see no "real" difference in the access time even though the map holds 10,000 times more items.
Theoretically HashMap#get(...) is O(1), provided the map isn't overpopulated and the items are distributed properly among the buckets. In practice this is implementation-dependent, but usually the map slows down a bit if it's overpopulated. In general a HashMap should have a load factor below 0.7 to avoid overpopulation and keep the performance optimal. The slow-down will be small, though (except for some extreme cases).
Question: does anybody know of a Java implementation (I have too little time/knowledge to develop my own right now) of a collection with the following characteristics?
fast add
fast random-access remove
fast minimum value
duplicates
Condensed (oversimplified) version of use case is:
I have a class that keeps track of 'time', call it TimeClass
Events start at monotonically increasing times (times are not unique), but can finish in any order
When events start they report their start time to TimeClass
When events finish they again report their start time to TimeClass
TimeClass adds an event's start time to a collection* when the event starts (fast add)
TimeClass removes an event's start time from that collection* when the event finishes (fast random-access remove)
TimeClass is capable of reporting the lowest not-yet-finished start time (fast minimum value)
* think of the collection as: Collection<Time>, where Time implements Comparable<Time>
Because I'm not sure what the runtime behavior of my system (the system in which TimeClass lives) will be, I've quickly benchmarked the following scenarios using these collections: TreeMultiSet (Guava), MinMaxPriorityQueue (Guava), ArrayList.
Note, depending on the collection used, min value is achieved in different ways (remember elements are added in increasing order): TreeMultiSet.firstEntry().getElement(), MinMaxPriorityQueue.peekFirst(), ArrayList.get(0).
ADD 1,000,000:
TreeMultiSet: 00:00.897 (m:s.ms)
List: 00:00.068 (m:s.ms)
MinMaxPriorityQueue: 00:00.658 (m:s.ms)
ADD 1, REMOVE 1, REPEAT 1,000,000 TIMES:
TreeMultiSet: 00:00.673 (m:s.ms)
List: 00:00.416 (m:s.ms)
MinMaxPriorityQueue: 00:00.469 (m:s.ms)
ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN SEQUENTIAL ORDER:
TreeMultiSet: 00:00.068 (m:s.ms)
List: 00:00.031 (m:s.ms)
MinMaxPriorityQueue: 00:00.048 (m:s.ms)
ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER:
TreeMultiSet: 00:00.046 (m:s.ms)
List: 00:00.352 (m:s.ms)
MinMaxPriorityQueue: 00:00.888 (m:s.ms)
Current thoughts:
I'm leaning towards using TreeMultiSet as it has the most stable performance and seems to degrade most gracefully. I WOULD LOVE MORE SUGGESTIONS
Thanks
--EDIT--
Example pseudo code of ADD ALL IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER:
benchmark(){
int benchmarkSize = 1000000;
int benchmarkRepetitions = 100;
Duration totalDuration = Duration.fromMilli(0);
TimeClass timeClass = new TimeClassImplementation();
for (int i = 0; i < benchmarkRepetitions; i++)
totalDuration += benchmarkRun(timeClass,benchmarkSize);
System.out.println(totalDuration);
}
Duration benchmarkRun(TimeClass timeClass, int count){
List<Time> times = createMonotonicallyIncreasingTimes(count)
// monotonically increasing times to add from
List<Time> timesToAddFrom = copy(times)
// random times to remove from
List<Time> timesToRemoveFrom = shuffleUniformly(copy(times))
Time startTime = now()
// add all times
for(Time time: timesToAddFrom) {
Time min = timeClass.addTimeAndGetMinimumValue(time);
// don't use min value
}
// remove all times
for(Time time: timesToRemoveFrom) {
Time min = timeClass.removeTimeAndGetMinimumValue(time);
// don't use min value
}
Time finishTime = now()
return finishTime - startTime;
}
Your best bet is a TreeMap:
http://docs.oracle.com/javase/7/docs/api/java/util/TreeMap.html
It is O(log n) for pretty much all operations, and you can get your keys back in sorted order.
There is also a MinMaxPriorityQueue from Google (Guava):
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/MinMaxPriorityQueue.html
Its remove is O(n), though; all other operations are O(log n).
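Since the use case allows duplicate times, one way to apply the TreeMap suggestion is as a count-based multiset. A minimal sketch under that assumption (TimeTracker is a hypothetical name, and times are assumed to fit in a long):

import java.util.TreeMap;

// TreeMap used as a counting multiset of start times.
class TimeTracker {
    private final TreeMap<Long, Integer> counts = new TreeMap<>();

    // O(log n): an event started, record its start time (duplicates are counted)
    void eventStarted(long startTime) {
        counts.merge(startTime, 1, Integer::sum);
    }

    // O(log n): an event finished, remove one occurrence of its start time
    void eventFinished(long startTime) {
        counts.computeIfPresent(startTime, (t, c) -> c == 1 ? null : c - 1);
    }

    // O(log n): lowest not-yet-finished start time, or null if nothing is running
    Long minOpenStartTime() {
        return counts.isEmpty() ? null : counts.firstKey();
    }
}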
I'm writing an algorithm which does a big loop over an integer array from the end to the beginning, with an if condition inside. The first time the condition is false, the loop can be terminated.
So, with a for loop, once the condition becomes false the loop keeps iterating and only the simple loop-variable updates are performed.
With a while loop that uses the condition as its loop test, the loop stops as soon as the condition is false, which should save some iterations.
However, the while loop remains a little slower than the for loop!
But if I add an int counter and count iterations, the for loop, as expected, performs many more iterations.
However, this time the execution time of the modified for method with the counter is much slower than the while method with a counter!
Any explanations?
Here is the code with the for loop:
for (int i = pairs.length - 1; i >= 0; i -= 2) {
//cpt++;
u = pairs[i];
v = pairs[i - 1];
duv = bfsResult.distanceMatrix.getDistance(u, v);
if (duv > delta) {
execute();
}
}
execution time: 6473
execution time with a counter: 8299
iterations counted: 2584401
Here is the code with the while loop:
int i = pairs.length - 1;
u = pairs[i];
v = pairs[i - 1];
duv = bfsResult.distanceMatrix.getDistance(u, v);
while (duv > delta) {
//cpt++;
execute();
u = pairs[i -= 2];
v = pairs[i - 1];
duv = bfsResult.distanceMatrix.getDistance(u, v);
}
execution time: 6632
execution time with a counter: 7163
iterations counted: 9793
Time is in ms. I repeated the experiment several times with different-sized instances, and the measurements remained almost the same. The execute() method updates the delta value. The getDistance() method is just an access into an int[][] matrix.
Thanks for any help.
Before you try to perform any performance tests on Java, I highly recommend reading this article:
http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html
In a few words: after running for some time, a HotSpot-enabled JVM can optimize your code, which will affect the results of your tests. So you need a proper technique to test the performance of your code.
To ease the pain there is a library used for performing proper tests: http://ellipticgroup.com/html/benchmarkingArticle.html
You can find links to both parts of the article on this page.
Update: to help you start quicker, here is what you need to do:
Download bb.jar, jsci-core.jar, mt-13.jar found on the page
Put them on the classpath
Rewrite your code so that the while-loop approach and the for-loop approach each go into a separate implementation of the Runnable or Callable interface (see the sketch after these steps)
In your main method just invoke
System.out.println(new Benchmark(new WhileApproach()));
to show the execution time for the while loop, and likewise
System.out.println(new Benchmark(new ForApproach()));
to get the figure for the for loop.
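A minimal sketch of step 3 (the class names follow the answer above; the loop bodies are placeholders for your own code):

// Each approach wrapped in its own Runnable so the Benchmark class can time it.
class WhileApproach implements Runnable {
    @Override
    public void run() {
        // while-loop version of the algorithm goes here
    }
}

class ForApproach implements Runnable {
    @Override
    public void run() {
        // for-loop version of the algorithm goes here
    }
}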
You do not have the same termination condition. For the while loop it's:
duv > delta
and for the for loop it's
i >= 0
The two scenarios are not equivalent. My guess is that the while-loop condition becomes false way sooner than the for condition, and therefore it executes fewer iterations.
Once duv <= delta the while loop stops, but the for loop keeps iterating; it just stops calling execute(). Both get the same result, but the for loop keeps checking the remaining elements. You should modify the for loop like this:
if (duv > delta)
{
execute();
}
else break;