I have a Map<Integer, Integer> freqMap where each value is the frequency of the key.
Now I want to build a PriorityQueue from this map.
I was expecting a constructor like PriorityQueue pq = new PriorityQueue(freqMap.keySet(), comparator);
But there is no such constructor.
I can construct it with a comparator and then add the keySet elements using addAll, but that will internally add the elements one by one. What I want is to build a max-heap out of the key set, ordered by the values, and I am not sure how to do this efficiently.
My Thoughts:
One way could be to wrap those integers in a custom class that implements Comparable, comparing based on the map values so that the key with the highest value ends up on top. Then, when I pass that collection as a parameter to the PriorityQueue constructor, it should construct the priority queue in O(n) time.
Whereas if I use the addAll method, it will probably take O(n log n) time. I am not sure my reasoning is correct here, though, and it does seem a little complicated to write a wrapper class implementing Comparable just for this tiny purpose.
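Concretely, the wrapper idea would look something like this (a minimal sketch; KeyWithFreq is just an illustrative name, not an existing class):

import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

// Illustrative wrapper: orders keys by descending frequency (max-heap order).
class KeyWithFreq implements Comparable<KeyWithFreq> {
    final int key;
    final int freq;

    KeyWithFreq(int key, int freq) {
        this.key = key;
        this.freq = freq;
    }

    @Override
    public int compareTo(KeyWithFreq other) {
        // Reversed so the highest frequency sorts first.
        return Integer.compare(other.freq, this.freq);
    }
}

// Usage: wrap the entries, then hand the whole collection to the constructor.
PriorityQueue<KeyWithFreq> pq = new PriorityQueue<>(
        freqMap.entrySet().stream()
               .map(e -> new KeyWithFreq(e.getKey(), e.getValue()))
               .collect(Collectors.toList()));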
construct the priority queue in O(n) time
The copy-constructor gives you a new PriorityQueue in linear time O(n) when you already have another PriorityQueue (or a SortedSet), because the elements are already in a suitable order.
In such a case, the copy-constructor internally invokes the method initFromPriorityQueue(), which creates a copy of the underlying array of the given PriorityQueue and assigns this copy to the underlying array of this (newly created) queue. Copying the elements costs O(n).
But there is no constructor that accepts both an arbitrary collection and a comparator. So when your ordering comes from a custom comparator, you have to create the queue with the comparator and then enqueue the elements one by one. Each element costs O(log n), and the overall time complexity is linear-logarithmic, O(n log n). (In the current OpenJDK implementation, the constructor that takes a plain Collection does heapify the elements in O(n), but it uses their natural ordering; that is why the wrapper-class idea from the question works, with the caveat that this linear-time behaviour is an undocumented implementation detail.)
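For reference, the straightforward comparator-plus-addAll approach looks like this (a sketch assuming the freqMap from the question):

import java.util.Comparator;
import java.util.PriorityQueue;

// O(n log n): each add() sifts the new element up in O(log n).
PriorityQueue<Integer> pq = new PriorityQueue<>(
        freqMap.size(),
        Comparator.comparing(freqMap::get).reversed());  // max-heap by value
pq.addAll(freqMap.keySet());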
Here's a quote from the documentation regarding the time complexity of operations:
Implementation note: this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add);
Since a binary heap (PriorityQueue is an implementation of the binary heap data structure) has a worst-case time complexity of O(log n) for inserting a new element, inserting n elements one by one runs in linear-logarithmic time.
Regarding the mechanism behind addAll(): as in many other collections, it delegates to the method add(), which has a logarithmic worst-case time complexity (see the implementation note quoted above).
Note
All the information provided above is relevant for the PriorityQueue class from the JDK, which is implemented as a Binary Heap (don't confuse this class with the Priority queue data structure).
There are many ways to implement the heap data structure. Some of them, like the Fibonacci heap, have amortized O(1) time complexity for insertion, which allows populating them with n elements in linear time O(n). If such an implementation were included in the JDK in the future, it would almost certainly not replace the current PriorityQueue implementation, but rather be introduced as a new class (that's how Java has been developed since its early days: new things come, almost nothing goes away).
To my knowledge, there are the following implementations:
ArrayList
LinkedList
Vector
Stack
(based on http://tutorials.jenkov.com/java-collections/list.html; please correct me if this is wrong)
ArrayList is a dynamic array implementation, so, as with an array, get is O(1); LinkedList has O(1) for get from the head; Vector and Stack are also array-backed, hence O(1).
So is get(0) O(1) in EVERY case on any built-in implementation of List (ignoring custom ones, since you could write your own whose get(0) deliberately has a time complexity of O(n!))?
Is get(0) on java.util.List always O(1)?
Let us assume that there is a parameter N which stands for the length of the list [1].
For the 4 implementations of List that you mentioned, get(0) is indeed an O(1) operation:
ArrayList, Vector and Stack all implement get(i) using array subscripting and that is an O(1) operation.
LinkedList.get(i) involves i link traversals which is O(i). But if i is a constant, that reduces to O(1).
However there are other "built in" implementations of List. Indeed, there are a considerable number of them if you include the various non-public implementations, such as the List classes that implement sublists, unmodifiable lists, and so on. Generalizing from those 4 to "all of them" is not sound [2].
But get(0) won't be O(1) for all possible implementations of List.
Consider a simple linked list where the elements are chained in reverse order. Then get(0) needs to traverse to the far end of the chain, which is N link traversals: O(N).
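To make that concrete, here is a toy sketch of such a reverse-chained list (illustrative only; ReverseChainedList is not a real class, and bounds checks are omitted):

import java.util.AbstractList;

// Toy example: elements are chained back-to-front, so index 0 sits at the
// far end of the chain and get(0) must walk all N nodes: O(N).
class ReverseChainedList<E> extends AbstractList<E> {
    private static class Node<E> {
        final E value;
        final Node<E> prev;
        Node(E value, Node<E> prev) { this.value = value; this.prev = prev; }
    }

    private Node<E> last;  // the chain starts at the LAST list element
    private int size;

    @Override
    public boolean add(E e) {
        last = new Node<>(e, last);
        size++;
        return true;
    }

    @Override
    public E get(int index) {
        Node<E> node = last;
        for (int i = size - 1; i > index; i--) {  // walk backwards
            node = node.prev;
        }
        return node.value;
    }

    @Override
    public int size() {
        return size;
    }
}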
Consider a list that is fully populated from the rows in a database query's result set the first time that you attempt to retrieve a list element. The first get call will be at least O(N) because you are fetching N rows. (It could be worse than O(N) if the database query is not O(N).) So the worst case complexity for any call to get is O(N) ... or worse.
Indeed, with some ingenuity, one could invent a custom list where get(0) has any Big-O complexity that you care to propose.
1 - I am being deliberately vague here. On the one hand, we need to identify a variable N denoting the "problem" size for complexity analysis to make sense. (The length of the list is the obvious choice.) On the other hand, the length of a List is a surprisingly "rubbery" concept when you consider all of the possible ways to implement the interface.
2 - I assume that you are asking this question because you want to write some library code that relies on List.get(0) being O(1). Since you can't prevent someone from using your library with a non-builtin list implementation, your "assume it is builtin" constraint in your question doesn't really help ... even if we could check all possible (past, current or future) builtin List implementations for you.
Ignoring custom implementations, and only looking at built-in implementations, like suggested at the end of the question, you still cannot say that get(0) will be O(1) regardless of list size.
As an example, calling get(0) on a sublist based on a LinkedList will be O(n):
List<Integer> list = new LinkedList<>(Arrays.asList(1,2,3,4,5,6,7,8,9));
List<Integer> subList = list.subList(4, 8);
Integer num = subList.get(0); // <===== O(n), not O(1)
In that code, subList.get(0) internally calls list.get(4), which has O(n) time complexity.
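If such a sub-list is going to be accessed by index repeatedly, one simple mitigation is to copy it once into an ArrayList:

// One O(n) copy up front; every subsequent get(i) is then O(1).
List<Integer> fastSubList = new ArrayList<>(subList);
Integer num = fastSubList.get(0);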
Yes, for all the implementations of List you mentioned, get(0) is O(1).
I have this code:
@Nullable
@Value.Default
default List<String> write() {
    return new LinkedList<>();
}
And the DeepCode IntelliJ plugin indicates that LinkedList can lead to unnecessary performance overhead if the List is randomly accessed, and that ArrayList should be used instead.
What is this performance overhead that LinkedList has over ArrayList? Is it really as much of a difference as DeepCode suggests?
LinkedList and ArrayList have different performance characteristics as described in the JavaDoc. Inserting into a LinkedList is cheap, especially at the front and back. Traversing a LinkedList in sequence, e.g. with streams or foreach, is (relatively) cheap.
On the other hand, random access e.g. with get(n) is slow, as it takes O(n).
ArrayList, in contrast, does random access in O(1), while appending runs in amortized constant time:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
The main advantage of LinkedList is that it allows for fast insertion and deletion at the front/end and via the iterator. A typical usage scenario is if you use the LinkedList as a Queue or Deque (it actually implements those two interfaces as well).
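For example, used as a Deque, only the ends are ever touched, which is exactly where LinkedList is cheap (a small illustration):

import java.util.Deque;
import java.util.LinkedList;

Deque<Integer> deque = new LinkedList<>();
deque.addFirst(1);              // O(1): relink the head node
deque.addLast(2);               // O(1): relink the tail node
int first = deque.pollFirst();  // O(1)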
So, it depends on what you are doing. If you have frequent random access, use ArrayList. If you have frequent sequential access (via the iterator) and adding/removing from front/back, e.g. because you use it as a Queue, use LinkedList.
If you add at arbitrary positions, e.g. via add(int index, E element), a LinkedList has to traverse the list first, making insertion O(n) and giving it no benefit over ArrayList (which has to shift the subsequent elements and occasionally resize the underlying array, which is O(n) as well).
In practice, I'd only choose LinkedList if there is a clear need for it, otherwise I'd use ArrayList as the default choice. Note that if you know the number of elements, you can size an ArrayList properly and thus avoid resizing, making the disadvantages of ArrayList even smaller.
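To see why the plugin warns about random access specifically, compare indexed iteration with iterator-based iteration over a LinkedList (a sketch; exact costs depend on the list size and JVM):

import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

List<String> list = new LinkedList<>(Collections.nCopies(100_000, "x"));

// Indexed access: each get(i) walks node links from the nearest end,
// so this loop is O(n) per call and O(n^2) overall.
for (int i = 0; i < list.size(); i++) {
    String s = list.get(i);
}

// Iterator access (what for-each uses): follows the links once, O(n) total.
for (String s : list) {
    // ...
}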
https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedList.html
See also https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/ (thanks to @Leprechaun for providing this resource)
Correct me if I'm wrong, but I think the PriorityQueue(Collection c) constructor will create a min-heap from a collection in O(n) time. However, I couldn't find a constructor where I can pass both a collection and a comparator (in order to turn the min-heap into a max-heap). So I was wondering: is there a way to construct a max-heap from an array (say, an int array) in O(n) using PriorityQueue?
No, having a set of elements arranged in a min-heap does not provide any advantage for rearranging them into a max-heap. Also, you seem to be assuming that the PriorityQueue constructors that accept a collection have O(n) asymptotic complexity. That's plausible -- even likely -- but it is not documented, so it is not safe to rely on it.
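If you need a guaranteed linear-time max-heap over an int array, one option is to skip PriorityQueue and heapify the array yourself with the classic bottom-up pass (a sketch of the textbook algorithm, not a drop-in PriorityQueue replacement):

// Bottom-up heap construction (Floyd's algorithm): O(n) overall.
static void buildMaxHeap(int[] a) {
    for (int i = a.length / 2 - 1; i >= 0; i--) {
        siftDown(a, i, a.length);
    }
}

// Push a[i] down until neither child is larger than it.
static void siftDown(int[] a, int i, int n) {
    while (true) {
        int largest = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < n && a[left] > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) return;
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        i = largest;
    }
}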
In the Java Collections Framework, a lot of implementations mention their performance in the Javadoc. For example, HashSet's says:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets).
And ArrayList's says:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time.
But LinkedList's says nothing about its performance.
I believe the pop and push methods of LinkedList run in constant time, as a linked list in computer science does. But I worry whether it is okay to assume that when I implement a method which takes a LinkedList parameter.
Is there a reason LinkedList says nothing about its performance?
The javadoc for HashMap.get(Object) doesn't guarantee O(1) performance either, yet it is well known for being so.
Javadoc is about the contract, not the implementation. API contracts typically concern themselves with behaviour, not performance SLAs.
Specific implementation choices that impact the contract may be documented in javadoc, but that still falls under the subject of "contract".
I have an unsorted Collection of objects [that are comparable]. Is it possible to get a sub-list of the N smallest elements of the collection without having to call sort?
I was looking at the possibility of using a SortedList with a limited capacity, but that didn't look like the right option.
I could easily write this, but I was wondering if there was another way.
I am not able to modify the existing collection's structure.
Since you don't want to call sort(), it seems like you are trying to avoid an O(n log(n)) runtime cost. There is actually a way to do that in O(n) time -- you can use a selection algorithm.
There are methods to do this in the Guava libraries (Google's core Java libraries); look in Ordering and check out:
public <E extends T> List<E> Ordering.leastOf(Iterable<E> iterable, int k)
public <E extends T> List<E> Ordering.greatestOf(Iterable<E> iterable, int k)
These are implementations of quickselect, and since they're written generically, you could just call them on your Set and get a list of the k smallest things. If you don't want to use the entire Guava libraries, the docs link to the source code, and I think it should be straightforward to port the methods to your project.
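Usage is essentially a one-liner; for example, to get the 5 smallest elements of a set of integers (mySet here is a stand-in for your collection):

import com.google.common.collect.Ordering;
import java.util.List;

List<Integer> smallestFive = Ordering.<Integer>natural().leastOf(mySet, 5);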
If you don't want to deviate too far from the standard libraries, you can always use a sorted set like TreeSet, though this gets you logarithmic insert/remove time instead of the nice O(1) performance of the hash-based Set, and it ends up being O(n log(n)) in the end. Others have mentioned using heaps. This will also get you O(n log(n)) running time, unless you use some of the fancier heap variants. There's a Fibonacci heap implementation in GraphMaker if you're looking for one of those.
Which of these makes sense really depends on your project, but I think that covers most of the options.
I would probably create a sorted set. Insert the first N items from your unsorted collection into the sorted set. Then, for each remaining item in the unsorted collection:
insert the item into the sorted set
delete the largest item from the sorted set
Repeat until you've processed all items in the unsorted collection (see the sketch below).
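A sketch of that idea using TreeSet (note that a TreeSet silently drops duplicates, so this assumes distinct elements; unsortedCollection and n are stand-ins):

import java.util.TreeSet;

TreeSet<Integer> smallest = new TreeSet<>();
for (Integer x : unsortedCollection) {
    smallest.add(x);             // O(log N) insert
    if (smallest.size() > n) {
        smallest.pollLast();     // evict the current largest
    }
}
// smallest now holds the n smallest distinct elements, in sorted order.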
Yes, you can put them into a max-heap data structure with a fixed size of N, adding an item only if it is smaller than the largest element currently in the heap (checked with a "peek" method). Once you have done so, the heap's contents will, by definition, be the N smallest. This performs in O(M log N) time (where M is the size of the set), which for a fixed N is effectively O(M), theoretically the fastest approach. Here's some pseudocode (a runnable version follows below):
MaxHeap maxHeap = new MaxHeap(N);
for (Item x : mySetOfItems) {
    if (maxHeap.size() < N) {
        maxHeap.add(x);               // heap not full yet
    } else if (x < maxHeap.get()) {   // get() peeks at the largest
        maxHeap.removeMax();          // evict the current largest
        maxHeap.add(x);
    }
}
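A runnable version of this pseudocode, using the JDK's PriorityQueue with a reversed comparator as the max-heap (items and N are stand-ins):

import java.util.Comparator;
import java.util.PriorityQueue;

PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
for (int x : items) {
    if (maxHeap.size() < N) {
        maxHeap.offer(x);
    } else if (x < maxHeap.peek()) {
        maxHeap.poll();    // drop the largest of the current N
        maxHeap.offer(x);
    }
}
// maxHeap now contains the N smallest items.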
The Apache Commons Collections class PriorityBuffer seems to be their flagship binary heap data structure; try using that one.
http://en.wikipedia.org/wiki/Heap_%28data_structure%29
don't you just want to make a heap?