Are there any efficient implementations of java.util.concurrent.BlockingQueue (i.e. ones that don't synchronize everything) that allow combining of entries?
By combining I mean merging an incoming item with an "equal" entry already on the queue, if there is one; otherwise the item is added at the end as usual.
Check this answer out: Concurrent Set Queue. It might be a duplicate of your question if all that you mean by merging is ignoring elements that are equal to something that's already on the queue.
BlockingQueue sports a contains method. Feel free to use it, but don't forget to synchronize. contains is O(n) in LinkedBlockingDeque, for example, so you may want a more efficient approach with a HashSet.
I can't see how you could combine events if timestamp and source are two of their attributes. Unless the same user sent two events within a nanosecond of each other, I don't believe they can be considered equal, and they should not be combined.
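Going back to the original question, the merge-or-append behaviour can be sketched as a small wrapper that keeps a FIFO of keys plus a map for O(1) lookup, avoiding the O(n) contains() scan. This is only a hedged sketch, not a full BlockingQueue implementation; CoalescingQueue and the merge function are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.BinaryOperator;

// Hypothetical sketch of a coalescing queue. A HashMap gives O(1) lookup
// of an "equal" entry, so merging does not require scanning the queue.
public class CoalescingQueue<K, V> {
    private final Queue<K> order = new ArrayDeque<>();  // FIFO order of keys
    private final Map<K, V> items = new HashMap<>();    // key -> current value
    private final BinaryOperator<V> merge;              // how to combine entries

    public CoalescingQueue(BinaryOperator<V> merge) {
        this.merge = merge;
    }

    // Merge with an existing entry if present, otherwise append at the tail.
    public synchronized void put(K key, V value) {
        V existing = items.get(key);
        if (existing != null) {
            items.put(key, merge.apply(existing, value)); // combine in place
        } else {
            order.add(key);
            items.put(key, value);
        }
        notifyAll();
    }

    // Block until an element is available, then remove and return the head.
    public synchronized V take() throws InterruptedException {
        while (order.isEmpty()) {
            wait();
        }
        return items.remove(order.remove());
    }
}
```

Everything here is guarded by the queue's own monitor, so it does synchronize on every operation; the win over a plain BlockingQueue is only that the merge lookup is constant-time.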
Related
I am writing an application for Android mobile phones.
I have a java.util.ArrayList that contains objects which require a custom java.util.Comparator to be sorted properly.
I know I can use java.util.Collections.sort() but the amount of data is such that I want to sort my ArrayList in an android.os.AsyncTask.
I do not want to use several AsyncTask objects that would each sort a subset of the data.
Any AsyncTask can be cancelled so I want to regularly call AsyncTask.isCancelled() while I sort. If it returns true, I give up on sorting (and on my whole data set).
I Googled but could not find an AsyncTask-friendly way to sort while regularly checking for cancellation.
I may be able to call isCancelled() in my implementation of java.util.Comparator.compare() and throw my own subclass of java.lang.RuntimeException if it returns true, then wrap the Collections.sort(list, comparator) call in a try/catch for that specific exception. I don't feel entirely comfortable with that approach.
Alternatively, I can use an intermediary java.util.TreeSet and write 2 loops that each check for cancellation before every iteration. The first loop would add all the items of the ArrayList to the TreeSet (and my Comparator implementation keeps them sorted on insertion). The second loop would add all the objects in the TreeSet back into the ArrayList, in the correct order, thanks to the TreeSet's natural java.util.Iterator. This approach uses some extra memory but should work.
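That TreeSet detour can be sketched as below. The `cancelled` supplier stands in for AsyncTask.isCancelled(); the class and method names are invented for illustration. One caveat: a TreeSet silently drops items the Comparator reports as equal (compare() returning 0), so this only works when the Comparator distinguishes every pair of distinct items.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.BooleanSupplier;

public class CancellableTreeSort {
    // Two-loop TreeSet approach with a cancellation check per iteration.
    // Returns false (leaving the list untouched) if cancelled.
    public static <T> boolean sort(List<T> list, Comparator<? super T> cmp,
                                   BooleanSupplier cancelled) {
        TreeSet<T> sorted = new TreeSet<>(cmp);
        for (T item : list) {
            if (cancelled.getAsBoolean()) return false;  // give up early
            sorted.add(item);                            // kept sorted on insert
        }
        List<T> result = new ArrayList<>(list.size());
        for (T item : sorted) {
            if (cancelled.getAsBoolean()) return false;
            result.add(item);                            // iterator yields sorted order
        }
        list.clear();
        list.addAll(result);
        return true;
    }
}
```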
The last approach I can think of is to implement myself what I have actually been looking for in Android: A generic java.util.List (preferably quick)sort using a java.util.Comparator and an android.os.AsyncTask that regularly checks for cancellation.
Has anybody found that?
Do you have any other solution to this problem?
EDIT:
Although I haven't given any thought to what the sorting method signature would look like, I would also be perfectly happy with using a android.os.CancellationSignal to decide when to abandon the sorting.
I'll try to describe my thought process here. If anybody has a better suggestion at any point, I'm all ears.
Let's re-affirm what we are trying to achieve here.
We need a sorting algorithm with the following properties:
1. Runs on a single task.
2. Is in-place, i.e. does not use extra memory.
3. Can be cancelled at will, i.e. returns immediately (or very close to it) when we decide it's no longer needed.
4. Is efficient.
5. Does not use exceptions to control a perfectly normal flow of the application. You are right about not feeling comfortable with that one ☺
There is no native Android tool for this, AFAIK.
Let's focus for a second on requirement 3.
Here is a quote from the AsyncTask documentation, the section about cancelling a task:
"To ensure that a task is cancelled as quickly as possible, you should always check the return value of isCancelled() periodically from doInBackground(Object[]), if possible (inside a loop for instance)."
Meaning, an iterative sorting algorithm, where on each iteration you check the isCancelled() flag, will fulfil this requirement. The problem is that simple iterative sorting algorithms, such as insertion sort, are often not very efficient. That wouldn't matter much for small inputs, but since you say your typical input is a huge ArrayList (which is what triggered this quest in the first place), we need to keep things as efficient as possible.
Since you did mention quicksort, I was thinking: it has got everything we need! It's efficient, it's in place, and it runs on a single task. There is only one shortfall: in its classic form it is recursive, meaning it won't return immediately upon cancellation. Luckily, a brief Google search yields many results that can help, including this one. In brief, you can find there an iterative variant of quicksort, obtained by replacing the recursive call stack with an explicit stack that stores the same index ranges the recursive implementation would use to perform the "partition" step.
Take this algorithm, add a check of asyncTask.isCancelled() on each iteration, and you've got yourself a solution that answers all the requirements.
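A minimal sketch of that idea follows. The class name and the `cancelled` supplier (standing in for asyncTask.isCancelled()) are made up for illustration; the explicit stack holds the index ranges that recursion would otherwise keep on the call stack, so the loop can bail out after any partition.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

public class CancellableQuickSort {
    // Iterative quicksort: the explicit stack replaces the recursive call
    // stack, so we can test `cancelled` once per partition and return promptly.
    public static <T> boolean sort(List<T> list, Comparator<? super T> cmp,
                                   BooleanSupplier cancelled) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {0, list.size() - 1});
        while (!stack.isEmpty()) {
            if (cancelled.getAsBoolean()) return false;  // abandon the sort
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            if (lo >= hi) continue;                      // range of 0 or 1 elements
            int p = partition(list, cmp, lo, hi);
            stack.push(new int[] {lo, p - 1});           // sort left of pivot later
            stack.push(new int[] {p + 1, hi});           // sort right of pivot later
        }
        return true;
    }

    // Lomuto partition: last element as pivot, returns its final position.
    private static <T> int partition(List<T> list, Comparator<? super T> cmp,
                                     int lo, int hi) {
        T pivot = list.get(hi);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (cmp.compare(list.get(j), pivot) < 0) {
                Collections.swap(list, i++, j);
            }
        }
        Collections.swap(list, i, hi);
        return i;
    }
}
```

Inside doInBackground() you would pass `this::isCancelled` as the supplier and check the boolean result to know whether the list actually got sorted.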
I have a collection of 'effects' I draw on an 'object' in a GUI ( gradients, textures, text etc ). The nature of the underlying system means that this effect collection can be accessed by multiple threads. The majority of operations are reads, so at the moment I'm using a CopyOnWriteArrayList which works ok.
But now I need to sort the collection of effects based on their draw order whenever I add a new effect or change an effect's draw order. I also need to be able to iterate through the collection forwards & in reverse (iterator.next() & iterator.previous()).
After some research I've found CopyOnWriteArrayLists don't like being sorted:
Behaviour of CopyOnWriteArrayList
If you try to sort a CopyOnWriteArrayList you'll see the list throws an UnsupportedOperationException (the sort invokes set on the collection N times). You should only use this collection when you are doing upwards of 90% reads.
I also found a suggestion of using ConcurrentSkipListSet, as it handles concurrency & sorting, but looking at the JavaDoc I'm worried about this:
Beware that, unlike in most collections, the size method is not a constant-time operation.
And I use & rely on the size() method quite a bit.
I could implement manual synchronization around the effect collection as a standard ArrayList, but this would be a very big refactor & I'd rather exhaust all other possibilities first. Does anyone have any ideas? Thanks for sticking with me this far.
Probably the best way to go is to manually synchronize at the point where you sort your collection. You could do something like this (Effect, sortLock and drawOrderComparator are placeholders for your own types):

synchronized (sortLock) {
    // copy the shared list, sort the copy, then swap it back in
    List<Effect> tmp = new ArrayList<>(effects);
    tmp.sort(drawOrderComparator);
    effects = new CopyOnWriteArrayList<>(tmp);
}

Readers must be guaranteed to see the new reference, so declare the `effects` field volatile (or always read it under the same lock).
Alternatively, you might just use a normal ArrayList and roll your own solution using a ReentrantReadWriteLock. This should work fine as long as you have more reads than writes.
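A hedged sketch of that ReentrantReadWriteLock idea; EffectList, the comparator and the method names are invented for illustration, not an existing API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Many readers may hold the read lock at once; the write lock is exclusive,
// so sorting after each mutation is safe without copying the whole list.
public class EffectList<T> {
    private final List<T> effects = new ArrayList<>();
    private final Comparator<? super T> drawOrder;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public EffectList(Comparator<? super T> drawOrder) {
        this.drawOrder = drawOrder;
    }

    public void add(T effect) {
        lock.writeLock().lock();            // exclusive: mutate and re-sort
        try {
            effects.add(effect);
            effects.sort(drawOrder);        // keep draw order after every change
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int size() {
        lock.readLock().lock();             // shared: cheap, constant-time size()
        try {
            return effects.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    public List<T> snapshot() {
        lock.readLock().lock();             // shared: copy out for safe iteration
        try {
            return new ArrayList<>(effects);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Unlike ConcurrentSkipListSet, size() here stays O(1), and iterating a snapshot supports going forwards and backwards via its ListIterator.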
I need to hold a large number of elements (500k or so) in a list or a set. I need to do high-performance traversal, addition and removal. This will be done in a multithreaded environment, and I don't care if I get to see updates made after traversal began (weakly consistent). What Java collection is right for this scenario?
I need to hold a large number of elements (500k or so) in a list or a set. I need to do high performance traversal, addition and removal.
...
This will be done in a multithreaded environment
ConcurrentSkipListMap - it's not a List, but List semantics are practically useless in a concurrent environment anyway.
It keeps the elements sorted in a tree-like structure, not accessible via hashing, so you need some natural ordering (or an external one via a Comparator).
If you need only add/remove at the ends of a Queue - ConcurrentLinkedQueue.
Synchronized collections are not suited for a multi-threaded environment if you expect even moderate contention; they require holding the lock during the entire traverse operation as well. I'd advise against ConcurrentHashMap here too.
In the end: if you are going for real multi-CPU like 64+ and expect high contention and don't want natural ordering follow the link: http://sourceforge.net/projects/high-scale-lib
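For illustration, here is ConcurrentSkipListSet (the Set view of the same skip-list machinery) in action: sorted order, lock-free add/remove, and weakly consistent iterators that never throw ConcurrentModificationException. The values are arbitrary:

```java
import java.util.concurrent.ConcurrentSkipListSet;

public class SkipListDemo {
    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        set.add(42);
        set.add(7);
        set.add(19);
        // Traversal sees a sorted view; concurrent updates may or may not
        // be reflected mid-iteration (weak consistency).
        System.out.println(set);          // [7, 19, 42]
        set.remove(19);
        System.out.println(set.first());  // 7
    }
}
```

One caveat for the original requirements: size() on the skip-list collections is O(n), so if you call it often you may want to track the count separately.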
Here is a very good article on selecting a collection depending on your application
http://www.developer.com/java/article.php/3829891/Selecting-the-Best-Java-Collection-Class-for-Your-Application.htm
You can try this as well:
http://www.javamex.com/tutorials/collections/how_to_choose.shtml
If traversal == read, and add/remove == update, I'd say that it's not often that a single collection is optimized for both operations.
But your best bet is likely to be a HashMap.
Multithreaded - so look at j.u.concurrent. Maybe ConcurrentHashMap used as a Set - e.g. use put(x, x) instead of add(x).
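A quick sketch of that trick; the demo class name is made up, and note that Java 8+ also ships ConcurrentHashMap.newKeySet(), which packages up exactly this idiom:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ChmAsSet {
    public static void main(String[] args) {
        // "ConcurrentHashMap used as a Set": add(x) becomes put(x, x).
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        map.put("a", "a");
        map.put("b", "b");
        System.out.println(map.containsKey("a"));  // true

        // Java 8+ shortcut that wraps the same idea.
        Set<String> set = ConcurrentHashMap.newKeySet();
        set.add("a");
        System.out.println(set.contains("a"));     // true
    }
}
```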
If you do addition and removal often, then something "linked" is probably the best choice. That way, every add/remove only has to update references, in contrast to an ArrayList, for example, where elements have to be shifted. The problem is that you are asking for the holy grail of collections.
Taking a look at the Concurrent Collections might help.
But what do you mean by "traversal"?
If you need to add or remove items in the middle of a list quickly, LinkedList is a good choice. To use it in a multithreaded environment, you need to synchronize it like this:
List l = Collections.synchronizedList(new LinkedList());
On the other hand, given the large size of the data, is it possible to store the data in a database and use the in-memory collection as a cache?
Are duplicate items allowed?
If yes, a Set can't be used; otherwise you can use a SortedSet.
The JDK ships with CopyOnWrite* implementations for Set and List, but none for Map and I've often lamented this fact. I know there are other collections implementations out there that have them, but it would be nice if one shipped as standard. It seems like an obvious omission and I'm wondering if there was a good reason for it. Anyone any idea why this was left out?
I guess this depends on your use case, but why would you need a CopyOnWriteMap when you already have a ConcurrentHashMap?
For a plain lookup table with many readers and only one or few updates it is a good fit.
Compared to a copy on write collection:
Read concurrency:
Equal to a copy on write collection. Several readers can retrieve elements from the map concurrently in a lock-free fashion.
Write concurrency:
Better concurrency than the copy-on-write collections, which basically serialize updates (one update at a time). Using a concurrent hash map you have a good chance of doing several updates concurrently, provided your hash keys are evenly distributed.
If you do want to have the effect of a copy on write map, you can always initialize a ConcurrentHashMap with a concurrency level of 1.
The easiest implementation of a set is usually to use an underlying map. There is even a Collections.newSetFromMap() method (added in Java 6).
What they should have done was have a CopyOnWriteMap and the CopyOnWriteSet being equivalent to Collections.newSetFromMap(new CopyOnWriteMap()).
But as you can see, CopyOnWriteArraySet is actually backed by an array, not a map. And wouldn't Collections.newSetFromMap(new ConcurrentHashMap()) be acceptable for your use case?
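For reference, a sketch of that suggestion (the demo class name and element values are invented):

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NewSetFromMapDemo {
    public static void main(String[] args) {
        // A concurrent Set built from a concurrent Map: newSetFromMap uses
        // the map's keys as the set's elements (values are just Boolean.TRUE).
        Set<String> set =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        set.add("gradient");
        set.add("texture");
        System.out.println(set.contains("gradient"));  // true
        System.out.println(set.size());                // 2
    }
}
```

Unlike a copy-on-write set, this gives O(1) adds and lookups, at the cost of losing snapshot-style iteration.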
java.util.PriorityQueue allows a Comparator to be passed at construction time. When inserting elements, they are ordered according to the priority specified by the comparator.
What happens when the priority of an element changes after it has been inserted? When does the PriorityQueue reorder elements? Is it possible to poll an element that does not actually have minimal priority?
Are there good implementations of a priority queue which allow efficient priority updates?
You should remove the element, change it, and re-insert it, since ordering occurs at insertion time. Although it involves several steps, it might be good enough. (I just noticed the comment about removal being O(n).)
One problem is that the queue will also re-order when you remove the element, which is redundant if you are just going to re-insert it a moment later. If you implement your own priority queue from scratch, you could add an update() that skips this step, but extending Java's class won't work because you are still limited to the remove() and add() provided by the base class.
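A sketch of the remove/update/re-insert pattern; Task is a made-up element type for illustration. Note that remove(Object) is O(n) on java.util.PriorityQueue, while the re-insert is O(log n):

```java
import java.util.PriorityQueue;

public class PriorityUpdateDemo {
    static class Task {
        String name;
        int priority;
        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    public static void main(String[] args) {
        PriorityQueue<Task> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.priority, b.priority));
        Task t1 = new Task("t1", 5);
        Task t2 = new Task("t2", 3);
        pq.add(t1);
        pq.add(t2);

        // Mutating t2.priority in place would NOT reorder the heap.
        // Instead: remove, update, re-insert.
        pq.remove(t2);
        t2.priority = 9;
        pq.add(t2);

        System.out.println(pq.poll().name);  // t1 (priority 5)
        System.out.println(pq.poll().name);  // t2 (priority 9)
    }
}
```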
I would expect the PriorityQueue not to reorder things - and it could get very confused if it tried to do a binary search to find the right place to put any new entries.
Generally speaking I'd expect changing the priority of something already in a queue to be a bad idea, just like changing the values making up a key in a hash table.