Create a PriorityQueue in O(n) with a custom comparator - java

I was trying to implement an MST algorithm using a PriorityQueue with a custom comparator, but I am having trouble building the min-heap in O(n) time. The problem is that only one PriorityQueue constructor builds the queue in O(n), and it does not take a comparator as an argument. I want it to use my custom comparator. Is there a workaround for this problem? PriorityQueue.addAll() would defeat the purpose of using a min-heap for the MST, since it is an O(n log n) method. Here is my code.
ArrayList <edge>ar=new ArrayList<>();
for(int i=0;i<e;i++)
{
int u=ss.nextInt();
int v=ss.nextInt();
int w=ss.nextInt();
ar.add(new edge(u,v,w));
}
PriorityQueue <edge>pr=new PriorityQueue<edge>(ar);
And the comparator that I want to use:
PriorityQueue <edge>ar=new PriorityQueue(11,new Comparator() {
@Override
public int compare(Object o1, Object o2) {
edge n1=(edge) o1;
edge n2=(edge) o2;
if(n1.w<n2.w)
{
return -1;
}
else if(n1.w==n2.w)
{
if((n1.u+n1.v+n1.w)<=(n2.u+n2.v+n2.w))
{
return -1;
}
else
{
return 1;
}
}
else
{
return 1;
}
}
});

If you have not already put your list in min-heap order elsewhere, no call to new PriorityQueue(...) can avoid the cost of building the heap. The math here says it's O(n) for the average case, but it is still more work than just iterating.
PriorityQueue<edge> pr = new PriorityQueue<edge>(ar, comp) {
PriorityQueue(List<edge> ar, Comparator<edge> c) {
this(c);
for(int i = 0; i < queue.length; i++) {
queue[i] = ar.get(i);
}
this.size = queue.length;
heapify(); // O(n), except that heapify is private and thus you can't call it!!!
}
}
Now I haven't tested this, it's just off the top of my head with some guidance from the PriorityQueue source, but it should point you in the right direction.
But sometimes you have to pay the piper and create the heap order, and that will cost more than a single iteration. It should still be O(n) though, because of heapify.
Another option is to have edge implement Comparable<edge>. Then you can just write PriorityQueue<edge> pr = new PriorityQueue<>(ar);
If you cannot make edge implement Comparable<edge>, then you could compose a container class:
class EdgeContainer implements Comparable<EdgeContainer> {
private static final Comparator<edge> comp = ; // that comparator above
private final edge edge;
EdgeContainer(edge edge) { this.edge = edge; }
public int compareTo(EdgeContainer e) { return comp.compare(edge, e.edge); }
public edge getEdge() { return edge; }
}
List <EdgeContainer>ar=new ArrayList<>();
for(int i=0;i<e;i++)
{
int u=ss.nextInt();
int v=ss.nextInt();
int w=ss.nextInt();
ar.add(new EdgeContainer(new edge(u,v,w)));
}
PriorityQueue<EdgeContainer> qr = new PriorityQueue<>(ar);
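For reference, here is a compilable sketch of the container idea above. It is only an illustration: the Edge stand-in, the BY_WEIGHT ordering (weight first, then u + v as a tie-break) and the weightsInPollOrder helper are assumptions made for the example, not the asker's actual classes. It relies on PriorityQueue(Collection) using the elements' natural ordering, which is the constructor the answers here describe as building the heap in O(n).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class HeapifyWithComparator {

    // Stand-in for the question's edge class (fields u, v, w as in the question).
    static final class Edge {
        final int u, v, w;
        Edge(int u, int v, int w) { this.u = u; this.v = v; this.w = w; }
    }

    // Assumed ordering for this sketch: by weight, then by u + v as a tie-break.
    static final Comparator<Edge> BY_WEIGHT =
            Comparator.<Edge>comparingInt(e -> e.w).thenComparingInt(e -> e.u + e.v);

    // Comparable wrapper, so the collection constructor can be used with our ordering.
    static final class EdgeContainer implements Comparable<EdgeContainer> {
        final Edge edge;
        EdgeContainer(Edge edge) { this.edge = edge; }

        @Override
        public int compareTo(EdgeContainer other) {
            return BY_WEIGHT.compare(edge, other.edge);
        }
    }

    // Builds the queue from the whole list at once, then drains it
    // and returns the weights in the order they were polled.
    static List<Integer> weightsInPollOrder(List<Edge> edges) {
        List<EdgeContainer> wrapped = new ArrayList<>(edges.size());
        for (Edge e : edges) {
            wrapped.add(new EdgeContainer(e));
        }
        PriorityQueue<EdgeContainer> pq = new PriorityQueue<>(wrapped);
        List<Integer> weights = new ArrayList<>();
        while (!pq.isEmpty()) {
            weights.add(pq.poll().edge.w);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge(1, 2, 7), new Edge(2, 3, 1), new Edge(1, 3, 4));
        System.out.println(weightsInPollOrder(edges)); // [1, 4, 7]
    }
}
```

The wrapper costs one extra object per edge, which is the price of reusing the collection constructor instead of adding elements one by one.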

Java's PriorityQueue takes O(n) time to create a priority queue out of a collection passed to it. The mathematical proof is given in CLRS chapter 6.4 (page 157 in the 3rd edition). Intuitively, the underlying array is turned into a heap by sifting elements down, and most elements sit near the bottom of the tree where a sift has only a short distance to travel, so the total work adds up to O(n).
But as discussed in the comments and as you mentioned in your question, you cannot achieve this time complexity by using addAll(). The reason is that addAll() is inherited from AbstractQueue and works by adding the elements of the collection to the queue one by one, which can lead to O(n log n) time complexity.
So if O(n) time complexity is an absolute requirement, you will have no option but to implement the Comparable interface for the class of objects contained in the collection. @corsiKa's answer nicely details this approach. Also note that even if you pass a collection directly to PriorityQueue, it will first convert it to an array, which is another O(n) operation.

Related

best collection for removing element

In my program I have a collection of edges that have to be ordered by weight.
Somewhere in the program I have to process the collection, and each time I have to remove the maximum of the collection.
I have already used an ArrayList, but I'm looking for a better solution (in terms of time efficiency):
public class Edge implements Comparable<Edge> {
private int weight;
public void setWeight(int weight) {
this.weight = weight;
}
@Override
public int compareTo(Edge o) {
return o.weight - this.weight;
}
}
What I did:
private ArrayList<Edge> listOfEdges = new ArrayList<>();
// i suppose here adding some edges in the list
Collections.sort(listOfEdges);
for (int i = 0; i < listOfEdges.size(); i++) {
System.out.println(listOfEdges.get(i).getWeight() + " ");
}
How can I get and remove the maximum of the list?
I have tested a TreeSet, but the edges can have the same weight. So what is the best sorted collection that accepts duplicate values?
Thank you
In my program i have a collection of edges ,they have to be ordered by the weight... I have already used an ArrayList but i'm looking for a better solution(time efficiency):
A binary tree like structure, such as a heap or priority queue, is what you are looking for. Once you have your object ordering (by the Comparable interface) specified, the maximum can be obtained in O(1) time, and removed in O(log n) time for n edges.
how can i get&remove the maximum of the list.
peek and poll are the respective methods of Java's PriorityQueue (there is no pop on a Queue).
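As a rough illustration of this answer, the sketch below keeps edges in a java.util.PriorityQueue ordered so that the heaviest edge is at the head, and removes them with poll(). The Edge stand-in and the drainByMaxWeight helper are made up for the example. Unlike a TreeSet, the queue keeps edges that have equal weights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MaxEdgeQueue {

    // Minimal stand-in for the question's Edge class.
    static final class Edge {
        final int weight;
        Edge(int weight) { this.weight = weight; }
        int getWeight() { return weight; }
    }

    // Removes edges from heaviest to lightest and returns their weights in removal order.
    static List<Integer> drainByMaxWeight(List<Edge> edges) {
        // reversed() turns the default min-heap ordering into a max-heap ordering.
        PriorityQueue<Edge> queue =
                new PriorityQueue<>(Comparator.comparingInt(Edge::getWeight).reversed());
        queue.addAll(edges);
        List<Integer> removed = new ArrayList<>();
        while (!queue.isEmpty()) {
            removed.add(queue.poll().getWeight()); // poll() removes the current maximum
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(new Edge(3), new Edge(9), new Edge(3), new Edge(5));
        System.out.println(drainByMaxWeight(edges)); // [9, 5, 3, 3]
    }
}
```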

Perform operation on n random distinct elements from Collection using Streams API

I'm attempting to retrieve n unique random elements for further processing from a Collection using the Streams API in Java 8, however, without much or any luck.
More precisely I'd want something like this:
Set<Integer> subList = new HashSet<>();
Queue<Integer> collection = new PriorityQueue<>();
collection.addAll(Arrays.asList(1,2,3,4,5,6,7,8,9));
Random random = new Random();
int n = 4;
while (subList.size() < n) {
subList.add(collection.get(random.nextInt()));
}
sublist.forEach(v -> v.doSomethingFancy());
I want to do it as efficiently as possible.
Can this be done?
edit: My second attempt -- although not exactly what I was aiming for:
List<Integer> sublist = new ArrayList<>(collection);
Collections.shuffle(sublist);
sublist.stream().limit(n).forEach(v -> v.doSomethingFancy());
edit: Third attempt (inspired by Holger), which will remove a lot of the overhead of shuffle if coll.size() is huge and n is small:
int n = // unique element count
List<Integer> sublist = new ArrayList<>(collection);
Random r = new Random();
for(int i = 0; i < n; i++)
Collections.swap(sublist, i, i + r.nextInt(sublist.size() - i));
sublist.stream().limit(n).forEach(v -> v.doSomethingFancy());
The shuffling approach works reasonably well, as suggested by fge in a comment and by ZouZou in another answer. Here's a generified version of the shuffling approach:
static <E> List<E> shuffleSelectN(Collection<? extends E> coll, int n) {
assert n <= coll.size();
List<E> list = new ArrayList<>(coll);
Collections.shuffle(list);
return list.subList(0, n);
}
I'll note that using subList is preferable to getting a stream and then calling limit(n), as shown in some other answers, because the resulting stream has a known size and can be split more efficiently.
The shuffling approach has a couple disadvantages. It needs to copy out all the elements, and then it needs to shuffle all the elements. This can be quite expensive if the total number of elements is large and the number of elements to be chosen is small.
An approach suggested by the OP and by a couple other answers is to choose elements at random, while rejecting duplicates, until the desired number of unique elements has been chosen. This works well if the number of elements to choose is small relative to the total, but as the number to choose rises, this slows down quite a bit because of the likelihood of choosing duplicates rises as well.
Wouldn't it be nice if there were a way to make a single pass over the space of input elements and choose exactly the number wanted, with the choices made uniformly at random? It turns out that there is, and as usual, the answer can be found in Knuth. See TAOCP Vol 2, sec 3.4.2, Random Sampling and Shuffling, Algorithm S.
Briefly, the algorithm is to visit each element and decide whether to choose it based on the number of elements visited and the number of elements chosen. In Knuth's notation, suppose you have N elements and you want to choose n of them at random. The next element should be chosen with probability
(n - m) / (N - t)
where t is the number of elements visited so far, and m is the number of elements chosen so far.
It's not at all obvious that this will give a uniform distribution of chosen elements, but apparently it does. The proof is left as an exercise to the reader; see Exercise 3 of this section.
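A small hand check, not a proof, may make the claim more plausible. Take N = 3 and n = 1 and apply the rule (n - m) / (N - t) to each element in turn; an element can only be chosen if none before it was chosen, since m then stays at 0:

```latex
\begin{align*}
P(\text{element 1 chosen}) &= \frac{1-0}{3-0} = \frac{1}{3} \\
P(\text{element 2 chosen}) &= \left(1-\frac{1}{3}\right)\cdot\frac{1-0}{3-1}
                            = \frac{2}{3}\cdot\frac{1}{2} = \frac{1}{3} \\
P(\text{element 3 chosen}) &= \frac{2}{3}\cdot\left(1-\frac{1}{2}\right)\cdot\frac{1-0}{3-2}
                            = \frac{2}{3}\cdot\frac{1}{2}\cdot 1 = \frac{1}{3}
\end{align*}
```

Each element ends up with the same probability 1/3, and the last factor of 1 shows that the rule forces a selection when the remaining elements are exactly as many as are still needed.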
Given this algorithm, it's pretty straightforward to implement it in "conventional" Java by looping over the collection and adding to the result list based on the random test. The OP asked about using streams, so here's a shot at that.
Algorithm S doesn't lend itself obviously to Java stream operations. It's described entirely sequentially, and the decision about whether to select the current element depends on a random decision plus state derived from all previous decisions. That might make it seem inherently sequential, but I've been wrong about that before. I'll just say that it's not immediately obvious how to make this algorithm run in parallel.
There is a way to adapt this algorithm to streams, though. What we need is a stateful predicate. This predicate will return a random result based on a probability determined by the current state, and the state will be updated -- yes, mutated -- based on this random result. This seems hard to run in parallel, but at least it's easy to make thread-safe in case it's run from a parallel stream: just make it synchronized. It'll degrade to running sequentially if the stream is parallel, though.
The implementation is pretty straightforward. Knuth's description uses random numbers between 0 and 1, but the Java Random class lets us choose a random integer within a half-open interval. Thus all we need to do is keep counters of how many elements are left to visit and how many are left to choose, et voila:
/**
* A stateful predicate that, given a total number
* of items and the number to choose, will return 'true'
* the chosen number of times distributed randomly
* across the total number of calls to its test() method.
*/
static class Selector implements Predicate<Object> {
int total; // total number items remaining
int remain; // number of items remaining to select
Random random = new Random();
Selector(int total, int remain) {
this.total = total;
this.remain = remain;
}
@Override
public synchronized boolean test(Object o) {
assert total > 0;
if (random.nextInt(total--) < remain) {
remain--;
return true;
} else {
return false;
}
}
}
Now that we have our predicate, it's easy to use in a stream:
static <E> List<E> randomSelectN(Collection<? extends E> coll, int n) {
assert n <= coll.size();
return coll.stream()
.filter(new Selector(coll.size(), n))
.collect(toList());
}
An alternative also mentioned in the same section of Knuth suggests choosing an element at random with a constant probability of n / N. This is useful if you don't need to choose exactly n elements. It'll choose n elements on average, but of course there will be some variation. If this is acceptable, the stateful predicate becomes much simpler. Instead of writing a whole class, we can simply create the random state and capture it from a local variable:
/**
* Returns a predicate that evaluates to true with a probability
* of toChoose/total.
*/
static Predicate<Object> randomPredicate(int total, int toChoose) {
Random random = new Random();
return obj -> random.nextInt(total) < toChoose;
}
To use this, replace the filter line in the stream pipeline above with
.filter(randomPredicate(coll.size(), n))
Finally, for comparison purposes, here's an implementation of the selection algorithm written using conventional Java, that is, using a for-loop and adding to a collection:
static <E> List<E> conventionalSelectN(Collection<? extends E> coll, int remain) {
assert remain <= coll.size();
int total = coll.size();
List<E> result = new ArrayList<>(remain);
Random random = new Random();
for (E e : coll) {
if (random.nextInt(total--) < remain) {
remain--;
result.add(e);
}
}
return result;
}
This is quite straightforward, and there's nothing really wrong with this. It's simpler and more self-contained than the stream approach. Still, the streams approach illustrates some interesting techniques that might be useful in other contexts.
Reference:
Knuth, Donald E. The Art of Computer Programming: Volume 2, Seminumerical Algorithms, 2nd edition. Copyright 1981, 1969 Addison-Wesley.
You could always create a "dumb" comparator, that will compare elements randomly in the list. Calling distinct() will ensure you that the elements are unique (from the queue).
Something like this:
static List<Integer> nDistinct(Collection<Integer> queue, int n) {
final Random rand = new Random();
return queue.stream()
.distinct()
.sorted(Comparator.comparingInt(a -> rand.nextInt()))
.limit(n)
.collect(Collectors.toList());
}
However I'm not sure it will be more efficient than putting the elements in a list, shuffling it and returning a sublist.
static List<Integer> nDistinct(Collection<Integer> queue, int n) {
List<Integer> list = new ArrayList<>(queue);
Collections.shuffle(list);
return list.subList(0, n);
}
Oh, and it's probably semantically better to return a Set instead of a List since the elements are distinct. The methods are also designed to take Integers, but there's no difficulty in making them generic. :)
Just as a note, the Stream API looks like a tool box that we could use for everything, however that's not always the case. As you see, the second method is more readable (IMO), probably more efficient and doesn't have much more code (even less!).
As an addendum to the shuffle approach of the accepted answer:
If you want to select only a few items from a large list and want to avoid the overhead of shuffling the entire list you can solve the task as follows:
public static <T> List<T> getRandom(List<T> source, int num) {
Random r=new Random();
for(int i=0; i<num; i++)
Collections.swap(source, i, i+r.nextInt(source.size()-i));
return source.subList(0, num);
}
What it does is very similar to what shuffle does, but it reduces its action to producing only num random elements rather than source.size() random elements…
You can use limit to solve your problem.
http://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#limit-long-
Collections.shuffle(collection);
int howManyDoYouWant = 10;
List<Integer> smallerCollection = collection
.stream()
.limit(howManyDoYouWant)
.collect(Collectors.toList());
List<Integer> collection = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
int n = 4;
Random random = ThreadLocalRandom.current();
random.ints(0, collection.size())
.distinct()
.limit(n)
.mapToObj(collection::get)
.forEach(System.out::println);
This will of course have the overhead of the intermediate set of indexes and it will hang forever if n > collection.size().
If you want to avoid any non-constant overhead, you'll have to make a stateful Predicate.
It should be clear that streaming the collection is not what you want.
Use the generate() and limit methods:
Stream.generate(() -> list.get(new Random().nextInt(list.size()))).limit(3).forEach(...);
If you want to process the whole Stream without too much hassle, you can simply create your own Collector using Collectors.collectingAndThen():
public static <T> Collector<T, ?, Stream<T>> toEagerShuffledStream() {
return Collectors.collectingAndThen(
toList(),
list -> {
Collections.shuffle(list);
return list.stream();
});
}
But this won't perform well if you want to limit() the resulting Stream. In order to overcome this, one could create a custom Spliterator:
package com.pivovarit.stream;
import java.util.List;
import java.util.Random;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;
public class ImprovedRandomSpliterator<T> implements Spliterator<T> {
private final Random random;
private final T[] source;
private int size;
ImprovedRandomSpliterator(List<T> source, Supplier<? extends Random> random) {
if (source.isEmpty()) {
throw new IllegalArgumentException("RandomSpliterator can't be initialized with an empty collection");
}
this.source = (T[]) source.toArray();
this.random = random.get();
this.size = this.source.length;
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
int nextIdx = random.nextInt(size);
int lastIdx = size - 1;
action.accept(source[nextIdx]);
source[nextIdx] = source[lastIdx];
source[lastIdx] = null; // let object be GCed
return --size > 0;
}
@Override
public Spliterator<T> trySplit() {
return null;
}
@Override
public long estimateSize() {
return source.length;
}
@Override
public int characteristics() {
return SIZED;
}
}
and then:
public final class RandomCollectors {
private RandomCollectors() {
}
public static <T> Collector<T, ?, Stream<T>> toImprovedLazyShuffledStream() {
return Collectors.collectingAndThen(
toCollection(ArrayList::new),
list -> !list.isEmpty()
? StreamSupport.stream(new ImprovedRandomSpliterator<>(list, Random::new), false)
: Stream.empty());
}
public static <T> Collector<T, ?, Stream<T>> toEagerShuffledStream() {
return Collectors.collectingAndThen(
toCollection(ArrayList::new),
list -> {
Collections.shuffle(list);
return list.stream();
});
}
}
And then you could use it like:
stream
.collect(toImprovedLazyShuffledStream()) // or toEagerShuffledStream() depending on the use case
.distinct()
.limit(42)
.forEach( ... );
A detailed explanation can be found here.
If you want a random sample of elements from a stream, a lazy alternative to shuffling might be a filter based on the uniform distribution:
...
import org.apache.commons.lang3.RandomUtils;
// If you don't know ntotal, just use a 0-1 ratio
double relativeSize = (double) nsample / ntotal; // the cast avoids integer division
Stream.of (...) // or any other stream
.parallel() // can work in parallel
.filter ( e -> Math.random() < relativeSize )
// or any other stream operation
.forEach ( e -> System.out.println ( "I've got: " + e ) );

PriorityQueue has objects with the same priority

I'm using a priority queue to sort and use a large number of custom objects. The objects have a "weight" that is their natural ordering. However, different objects that are inserted into the priority queue may have the same "weight". In such cases, I want the priority queue to order them in the same order in which they were put into the queue.
For example, if I add in CustomObjects A,B,C,D in that order, all with the same "weight", then the priority queue should return them in that order as well - even if I poll one or more of the objects before adding in the others.
Here is the CompareTo for my custom object:
public int compareTo(CustomObject o) {
int thisWeight = this.weight;
int thatWeight = o.weight;
if(thisWeight < thatWeight){
return -1;
}
else{
return 1;
}
}
While I thought that this would maintain that initial order, it doesn't. This occurs when I input A,B,C with weight 1; poll A; and add D,E also with weight 1. Somehow, D and E are sorted after B, but before C.
I am aware that the Iterator for PriorityQueues doesn't return the correct ordering, so I am limited in my ability to look at the ordering - however I can see the order that the elements leave the queue and it clearly doesn't follow the path that I want it to.
Suggestions?
If you need to have an ordering according the insertion order you need to use an extra element for timestamp.
I.e. on insertions and equal weight use timestamp to see which element was inserted first.
So CustomObject should be something like:
class CustomObject {
int weight;
long timestamp;
}
And the comparison should be:
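public int compareTo (CustomObject o) {
int thisWeight = this.weight;
int thatWeight = o.weight;
if (thisWeight != thatWeight) {
return Integer.compare(thisWeight, thatWeight);
}
else {
// timestamp is a long, so returning the difference as an int would not compile
return Long.compare(this.timestamp, o.timestamp);
}
}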
The smaller timestamp means it was inserted earlier so you keep in the insertion order.
You could also use a "logical" time by maintaining a counter that you update on each add or remove.
You could use an automatically-incremented sequence number as a secondary key, and use it to break ties.
Javadoc for PriorityBlockingQueue includes an example of this technique:
Operations on this class make no guarantees about the ordering of elements with equal priority. If you need to enforce an ordering, you can define custom classes or comparators that use a secondary key to break ties in primary priority values. For example, here is a class that applies first-in-first-out tie-breaking to comparable elements. To use it, you would insert a new FIFOEntry(anEntry) instead of a plain entry object.
class FIFOEntry<E extends Comparable<? super E>>
implements Comparable<FIFOEntry<E>> {
final static AtomicLong seq = new AtomicLong();
final long seqNum;
final E entry;
public FIFOEntry(E entry) {
seqNum = seq.getAndIncrement();
this.entry = entry;
}
public E getEntry() { return entry; }
public int compareTo(FIFOEntry<E> other) {
int res = entry.compareTo(other.entry);
if (res == 0 && other.entry != this.entry)
res = (seqNum < other.seqNum ? -1 : 1);
return res;
}
}
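To tie this back to the scenario in the question (add A, B, C with weight 1; poll once; add D, E with weight 1), here is a small self-contained sketch of the sequence-number idea. The Task class and the run helper are hypothetical names for this example. It also assumes objects are constructed in the same order in which they are added to the queue, because the sequence number is assigned in the constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicLong;

public class FifoTieBreakDemo {

    // Hypothetical element type: ordered by weight, ties broken by creation order.
    static final class Task implements Comparable<Task> {
        private static final AtomicLong NEXT_SEQ = new AtomicLong();
        final String name;
        final int weight;
        final long seq = NEXT_SEQ.getAndIncrement(); // assigned at construction time

        Task(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }

        @Override
        public int compareTo(Task other) {
            int byWeight = Integer.compare(weight, other.weight);
            return byWeight != 0 ? byWeight : Long.compare(seq, other.seq);
        }
    }

    // Replays the question's scenario and returns the names in the order they leave the queue.
    static List<String> run() {
        PriorityQueue<Task> queue = new PriorityQueue<>();
        List<String> order = new ArrayList<>();
        queue.add(new Task("A", 1));
        queue.add(new Task("B", 1));
        queue.add(new Task("C", 1));
        order.add(queue.poll().name); // removes A
        queue.add(new Task("D", 1));
        queue.add(new Task("E", 1));
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [A, B, C, D, E]
    }
}
```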

Returning two object from a function without an ArrayList

I'm not sure how to word this correctly, but I'm told to write a method that would return the largest course object (the course with the most students). If there are two courses that have the same number of students, it would return both.
The second part of the problem is what troubles me, because I'm not allowed to make another ArrayList other than the ones he specified (which is already used). Is there a way to keep track of two+ objects without using a list/hash?
This is what I've done so far, but it only returns one course object.
public Course largestEnrollment(){
int size = 0;
Course p = null;
for (Integer c : courseList.keySet()){
if (courseList.get(c).getClassList().size() > size){
p = courseList.get(c);
size = courseList.get(c).getClassList().size();
}
return p;
}
return null;
}
Return an array of Course objects:
public Course[] largestEnrollment(){
You'll need to decide how to manipulate the array inside your for loop.
Sort the ArrayList based on size. Then you can return a sub-list of the largest courses.
If you don't have that many Course objects (e.g. fewer than 1k), you could implement Comparable or write a Comparator for your Course object. Then you could get all the values (the Course objects) from the map as a collection, sort it, and take from the end of the sorted collection those elements that share the largest size.
I mention the size of the collection because sorting turns an O(n) problem into an O(n log n) one. But if the collection is small, it is a convenient way to go.
Anyway, you have to change the method return type to a collection or an array.
Sort then return a sublist:
public List<Course> largestEnrollment(List<Course> courses) {
Collections.sort(courses, new Comparator<Course>() {
@Override
public int compare(Course o1, Course o2) {
// Descending by class size, so the largest courses come first
return Integer.compare(o2.getClassList().size(), o1.getClassList().size());
}
});
for (int indexOfLargest = 1; indexOfLargest < courses.size(); indexOfLargest ++) {
if (courses.get(indexOfLargest - 1).getClassList().size() > courses.get(indexOfLargest).getClassList().size())
return courses.subList(0, indexOfLargest);
}
return courses;
}
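If sorting is not wanted and no extra list may be created, a two-pass scan that returns an array is another possibility. The sketch below is only an illustration under assumptions: the Course stand-in stores an enrolled count instead of a class list, and the map-based signature and the names helper are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LargestCourses {

    // Hypothetical stand-in for Course; only the enrollment count matters here.
    static final class Course {
        final String name;
        final int enrolled;
        Course(String name, int enrolled) { this.name = name; this.enrolled = enrolled; }
    }

    // Pass 1 finds the largest enrollment and how many courses share it;
    // pass 2 copies exactly those courses into an array of the right size.
    static Course[] largestEnrollment(Map<Integer, Course> courseList) {
        int max = -1;
        int count = 0;
        for (Course c : courseList.values()) {
            if (c.enrolled > max) {
                max = c.enrolled;
                count = 1;
            } else if (c.enrolled == max) {
                count++;
            }
        }
        Course[] result = new Course[count];
        int i = 0;
        for (Course c : courseList.values()) {
            if (c.enrolled == max) {
                result[i++] = c;
            }
        }
        return result;
    }

    // Small helper for printing and checking: the names of the given courses, in order.
    static List<String> names(Course[] courses) {
        List<String> out = new ArrayList<>();
        for (Course c : courses) {
            out.add(c.name);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Course> courses = new LinkedHashMap<>();
        courses.put(101, new Course("Math", 30));
        courses.put(102, new Course("Bio", 12));
        courses.put(103, new Course("Art", 30));
        System.out.println(names(largestEnrollment(courses))); // [Math, Art]
    }
}
```

This stays O(n) and returns an empty array when the map is empty.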

What is the best way to get the count/length/size of an iterator?

Is there a "computationally" quick way to get the count of an iterator?
int i = 0;
for ( ; some_iterator.hasNext() ; ++i ) some_iterator.next();
... seems like a waste of CPU cycles.
Using Guava library:
int size = Iterators.size(iterator);
Internally it just iterates over all elements, so it's just for convenience.
If you've just got the iterator then that's what you'll have to do - it doesn't know how many items it's got left to iterate over, so you can't query it for that result. There are utility methods that will seem to do this efficiently (such as Iterators.size() in Guava), but underneath they're just consuming the iterator and counting as they go, the same as in your example.
However, many iterators come from collections, which you can often query for their size. And if it's a user made class you're getting the iterator for, you could look to provide a size() method on that class.
In short, in the situation where you only have the iterator then there's no better way, but much more often than not you have access to the underlying collection or object from which you may be able to get the size directly.
Your code will give you an exception when you reach the end of the iterator. You could do:
int i = 0;
while(iterator.hasNext()) {
i++;
iterator.next();
}
If you had access to the underlying collection, you would be able to call coll.size()...
EDIT
OK you have amended...
You will always have to iterate. Yet you can use Java 8, 9 to do the counting without looping explicitly:
Iterable<Integer> newIterable = () -> iter;
long count = StreamSupport.stream(newIterable.spliterator(), false).count();
Here is a test:
public static void main(String[] args) throws IOException {
Iterator<Integer> iter = Arrays.asList(1, 2, 3, 4, 5).iterator();
Iterable<Integer> newIterable = () -> iter;
long count = StreamSupport.stream(newIterable.spliterator(), false).count();
System.out.println(count);
}
This prints:
5
Interestingly, you can parallelize the count operation here by changing the parallel flag on this call:
long count = StreamSupport.stream(newIterable.spliterator(), true).count();
Using Guava library, another option is to convert the Iterable to a List.
List list = Lists.newArrayList(some_iterator);
int count = list.size();
Use this if you need also to access the elements of the iterator after getting its size. By using Iterators.size() you no longer can access the iterated elements.
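The same idea works without Guava. As a rough sketch using only the JDK (the drain helper name is made up for this example), the iterator can be copied into an ArrayList with forEachRemaining, after which both the size and the elements are available:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorToList {

    // Consumes the iterator and returns everything it produced, in order.
    // Only suitable for finite iterators: an endless one would never return.
    static <T> List<T> drain(Iterator<T> iterator) {
        List<T> elements = new ArrayList<>();
        iterator.forEachRemaining(elements::add);
        return elements;
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
        List<String> copy = drain(it);
        System.out.println(copy.size() + " " + copy); // 3 [a, b, c]
    }
}
```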
If all you have is the iterator, then no, there is no "better" way. If the iterator comes from a collection, you could ask that for its size.
Keep in mind that Iterator is just an interface for traversing distinct values; you could very well have code such as this
new Iterator<Long>() {
final Random r = new Random();
@Override
public boolean hasNext() {
return true;
}
@Override
public Long next() {
return r.nextLong();
}
@Override
public void remove() {
throw new IllegalArgumentException("Not implemented");
}
};
or
new Iterator<BigInteger>() {
BigInteger next = BigInteger.ZERO;
@Override
public boolean hasNext() {
return true;
}
@Override
public BigInteger next() {
BigInteger current = next;
next = next.add(BigInteger.ONE);
return current;
}
@Override
public void remove() {
throw new IllegalArgumentException("Not implemented");
}
};
There is no more efficient way, if all you have is the iterator. And if the iterator can only be used once, then getting the count before you get the iterator's contents is ... problematic.
The solution is either to change your application so that it doesn't need the count, or to obtain the count by some other means. (For example, pass a Collection rather than Iterator ...)
for Java 8 you could use,
public static int getIteratorSize(Iterator iterator){
AtomicInteger count = new AtomicInteger(0);
iterator.forEachRemaining(element -> {
count.incrementAndGet();
});
return count.get();
}
To get the size of an Iterable
Iterable<Users> users = usersRepository.findUsersByLocation("IND");
Now assert the size of users of Type Iterable
assertEquals(2, ((Collection<Users>)users).size());
To quickly get the size in JavaScript (this spread syntax is not valid Java):
[...iterator].length
An iterator obtained from a collection yields the same number of elements as the collection contains.
List<E> a =...;
Iterator<E> i = a.iterator();
int size = a.size();//Because iterators size is equal to list a's size.
But instead of getting the size of the iterator and looping from index 0 up to that size, it is better to iterate using the iterator's next() method.
