I am looking for a fast queue implementation in Java. I see that LinkedList implements the Queue interface, but it will only be as fast as a LinkedList right? Is there a way to have a queue that will be faster especially for add (I only need poll, add and check for empty).
Down the line I may also need a PriorityQueue but not yet.
If multiple threads are going to be accessing the queue then consider using an ArrayBlockingQueue. Otherwise take a look at ArrayDeque. From the ArrayDeque API:
This class is likely to be faster than
Stack when used as a stack, and faster
than LinkedList when used as a queue.
Specifically, an array-based queue implementation avoids resizing the underlying array as long as it has sufficient capacity, which makes additions to the queue generally faster than with LinkedList. Be aware that ArrayBlockingQueue is a bounded implementation, whereas ArrayDeque will resize as required.
The flip-side is that LinkedList will typically provide a much more compact representation, particularly in cases where your queue grows and shrinks by a large amount. For example, if you added 10,000,000 elements to an ArrayDeque and then removed 9,999,999 elements, the underlying array would still be of length 10,000,000 whereas a LinkedList would not suffer from this problem.
In reality, for single-threaded access to a non-blocking queue I tend to favour LinkedList. I imagine the performance differences are so negligible you wouldn't notice the difference anyway.
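That said, if you do reach for ArrayDeque, here is a minimal sketch of it used as a FIFO queue through the Queue interface; everything below is standard java.util API:

import java.util.ArrayDeque;
import java.util.Queue;

public class ArrayDequeExample {
    public static void main(String[] args) {
        // add() enqueues at the tail; the backing array grows as required
        Queue<String> queue = new ArrayDeque<>();
        queue.add("first");
        queue.add("second");

        // poll() dequeues from the head and returns null when the queue is empty
        while (!queue.isEmpty()) {
            System.out.println(queue.poll());
        }
    }
}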
I see that LinkedList implements the Queue interface, but it will only be as fast as a LinkedList right?
Eyeballing the source code, LinkedList is O(1) for Queue.add, Queue.poll, and Queue.peek operations.
I hope that's fast enough.
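For reference, a minimal sketch of the three operations the question asks about, on a LinkedList used through the Queue interface (standard java.util API only):

import java.util.LinkedList;
import java.util.Queue;

public class LinkedListQueueExample {
    public static void main(String[] args) {
        Queue<Integer> queue = new LinkedList<>();
        queue.add(42);                       // O(1): link a new node at the tail
        System.out.println(queue.isEmpty()); // O(1): the size is tracked in a field
        System.out.println(queue.poll());    // O(1): unlink the head node
    }
}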
If the performance of a linked list were really a problem, an alternative would be to implement a "circular queue" in an array, i.e. a queue where the start and end points move as entries are added and deleted. I can give more details if you care. When I was using languages that did not have a library of collections, this was how I always implemented queues, because it was easier to write than a linked list and it was faster.

But with built-in collections, the effort of writing and debugging my own collection for a special case is not worth the trouble 99% of the time: once the collection is already written, the fact that I could have written it differently is pretty much irrelevant, and any performance gain is likely to be too small to be worth the trouble. I sub-type existing collections to get special behavior I need now and then, but I'm hard-pressed to think of the last time I wrote one from scratch.
You may want to have a look at https://dzone.com/articles/gaplist-lightning-fast-list which introduces GapList. This new list implementation combines the strengths of both ArrayList and LinkedList.
It therefore implements the Deque interface, but can also be presized like the above-mentioned ArrayDeque. In addition, you also get all the possibilities of the List interface for free. Get it at https://github.com/magicwerk/brownies-collections.
Start with a really simplistic rotating (circular) queue implementation with a "C/C++-like" attitude and a fixed size.
class SimpleQueue<E>
{
    int index = 0;   // next slot to write
    int head = 0;    // next slot to read
    int size = 100;  // fixed capacity; add() does not check for overflow
    int counter = 0; // number of elements currently in the queue
    E[] data;

    @SuppressWarnings("unchecked")
    SimpleQueue()
    {
        data = (E[]) new Object[size];
    }

    public void add(E e)
    {
        data[index] = e;
        index = (index + 1) % size;
        counter++;
    }

    public E poll()
    {
        E value = data[head];
        head = (head + 1) % size;
        counter--;
        return value;
    }

    public boolean empty()
    {
        return counter == 0;
    }

    // Test
    public static void main(String[] args)
    {
        SimpleQueue<Integer> s = new SimpleQueue<Integer>();
        System.out.println(s.empty());
        for (int i = 0; i < 10; i++)
            s.add(i);
        System.out.println(s.empty());
        for (int i = 0; i < 10; i++)
            System.out.print(s.poll() + ",");
        System.out.println("\n" + s.empty());
    }
}
And then improve it.
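For example, one possible first improvement is a growable backing array; this is only a hedged sketch of the idea (using data.length in place of the fixed size field), not the only way to do it:

// Sketch of a growable add(); field names match SimpleQueue above
public void add(E e)
{
    if (counter == data.length) {
        grow();
    }
    data[index] = e;
    index = (index + 1) % data.length;
    counter++;
}

// Doubles the capacity, unwrapping the circular contents into the new array
@SuppressWarnings("unchecked")
private void grow()
{
    E[] bigger = (E[]) new Object[data.length * 2];
    for (int i = 0; i < counter; i++) {
        bigger[i] = data[(head + i) % data.length];
    }
    data = bigger;
    head = 0;
    index = counter;
}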
Related
I want to store 50000 or more strings and I need to perform several operations on them, like retrieval of a specific string, deletion of a specific string, etc. I have been given only two options to select from: an ArrayList or a plain array. From a performance point of view, which one is better?
Neither. If you want retrieval of specific strings (e.g. get the string "Foo") and deletion of specific strings (e.g. delete "Foo"), I would consider using a Set.
An ArrayList or an array will give you O(N) retrieval (unless you keep it sorted). A Set will typically give you O(log N) or better for finding a specific item: O(log N) for a TreeSet, expected O(1) for a HashSet.
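As a sketch of that approach, assuming a HashSet fits your needs:

import java.util.HashSet;
import java.util.Set;

public class StringStore {
    public static void main(String[] args) {
        Set<String> strings = new HashSet<>(100_000);
        strings.add("Foo");
        strings.add("Bar");

        // Lookup and deletion of a specific string are both expected O(1)
        System.out.println(strings.contains("Foo")); // true
        strings.remove("Foo");
        System.out.println(strings.contains("Foo")); // false
    }
}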
ArrayList is backed by an array, so performance-wise you should see no difference.
If there is no error in your requirements, and you really do have to choose between only an ArrayList and a raw array, I would suggest the ArrayList, since you get all the APIs to manipulate the data for free, which you would otherwise have to write yourself for a raw array of Strings.
An array is more efficient performance-wise than an ArrayList, but unless you know how many elements you will be placing into it, an ArrayList is the better option, since an ArrayList can grow as needed whereas a static array cannot.
An array will always have better performance than an ArrayList. In part, because when using an array you don't have to pay the extra cost of type-casting its elements (using generics doesn't mean that typecasts disappear, only that they're hidden from plain view).
To make my point: Trove and fastutil are a couple of very fast Java collections libraries, which rely on the fact of providing type-specific collections and not Object-based implementations like ArrayList does.
Also, there's a cost for using a get() method to access elements (albeit small), and a cost for resizing operations, which can be significant in huge ArrayLists with many insertions and deletions. Of course, this doesn't happen with arrays, because they have a fixed size by their very nature; that's both an advantage and a disadvantage.
Answering your question: if you know in advance the number of elements that you're going to need, and those elements aren't going to change much (insertions, deletion) then your best bet is to use an array. If some modification operations are needed and performance is of paramount importance, try using either Trove or fastutil.
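To illustrate the boxing cost mentioned above with plain JDK types (no benchmark claims, just a sketch of where the per-element allocations happen):

import java.util.ArrayList;
import java.util.List;

public class BoxingCost {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Primitive array: values are stored inline, no objects allocated per element
        int[] array = new int[n];
        for (int i = 0; i < n; i++) {
            array[i] = i;
        }

        // ArrayList<Integer>: each add() autoboxes the int into an Integer object,
        // and each get() returns a reference that must be unboxed back to int
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        int first = list.get(0); // hidden cast + unboxing
    }
}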
Retrieving a specific string, deleting a specific string... I think ArrayList is not the best solution. Take a look at HashSet or LinkedHashSet.
If you look at the source code of ArrayList you will see:
/**
 * The array buffer into which the elements of the ArrayList are stored.
 * The capacity of the ArrayList is the length of this array buffer.
 */
private transient Object[] elementData;
it is using an array internally.
So ArrayList could never be faster than using an array.
Provided you initially size the ArrayList correctly, the main difference will come from additions, which do a range check that you could get rid of with an array. But we are talking about a few CPU cycles here.
Apart from that there should be no noticeable difference. For example, the indexOf method in ArrayList looks like this:
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i] == null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}
I have 100,000 objects in a list. I want to remove a few elements from the list based on a condition. Can anyone tell me the best approach to achieve this in terms of memory and performance?
Same question for adding objects, also based on a condition.
Thanks in Advance
Raju
Your container is not just a List. List is an interface that can be implemented by, for example ArrayList and LinkedList. The performance will depend on which of these underlying classes is actually instantiated for the object you are polymorphically referring to as List.
ArrayList can access elements in the middle of the list quickly, but if you delete one of them you need to shift a whole bunch of elements. LinkedList is the opposite in this respect, requiring iteration for the access, but deletion is just a matter of reassigning pointers.
Your performance depends on the implementation of List, and the best choice of implementation depends on how you will be using the List and which operations are most frequent.
If you're going to be iterating over a list and applying tests to each element, then a LinkedList will be most efficient in terms of CPU time, because you don't have to shift any elements in the list. It will, however, consume more memory than an ArrayList, because each list element is actually held in an Entry.
However, it might not matter. 100,000 is a small number, and if you aren't removing a lot of elements, the cost to shift an ArrayList will be low. And if you are removing a lot of elements, it's probably better to restructure as a copy-with-filter.
However, the only real way to know is to write the code and benchmark it.
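For concreteness, here is a sketch of in-place removal versus copy-with-filter; removeIf is standard on Collection since Java 8:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ConditionalRemoval {
    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));

        // In-place removal via the iterator (safe while iterating)
        for (Iterator<Integer> it = numbers.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }

        // Equivalent one-liner since Java 8
        numbers.removeIf(n -> n > 3);

        // Copy-with-filter: often cheaper when removing many elements from an ArrayList
        List<Integer> kept = new ArrayList<>();
        for (Integer n : numbers) {
            if (n <= 3) {
                kept.add(n);
            }
        }
        System.out.println(kept);
    }
}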
Collections2.filter (from Guava) produces a filtered collection based on a predicate.
List<Number> myNumbers = Arrays.asList(Integer.valueOf(1), Double.valueOf(1e6));
Collection<Number> bigNumbers = Collections2.filter(
    myNumbers,
    new Predicate<Number>() {
        public boolean apply(Number n) {
            return n.doubleValue() >= 100d;
        }
    });
Note, that some operations like size() are not efficient with this scheme. If you tend to follow Josh Bloch's advice and prefer isEmpty() and iterators to unnecessary size() checks, then this shouldn't bite you in practice.
LinkedList could be a good choice.
LinkedList does "remove and add elements" more efficiently than ArrayList, and there is no need to call a method such as ArrayList.trimToSize() to release unused memory. But LinkedList is a doubly linked list: each element is wrapped in an Entry, which needs extra memory.
I'm trying to write a data structure which is a combination of Stack and HashSet with fast push/pop/membership (I'm looking for constant time operations). Think of Python's OrderedDict.
I tried a few things and I came up with the following code: HashInt and SetInt. I need to add some documentation to the source, but basically I use a hash with linear probing to store indices into a vector of the keys. Since linear probing always puts the last element at the end of a continuous range of already filled cells, pop() can be implemented very easily without a sophisticated remove operation.
I have the following problems:
the data structure consumes a lot of memory (some improvement is obvious: stackKeys is larger than needed).
some operations are slower than if I had used fastutil (e.g. pop(), and even push() in some scenarios). I tried rewriting the classes using fastutil and trove4j, but the overall speed of my application halved.
What performance improvements would you suggest for my code?
What open-source library/code do you know that I can try?
You've already got a pretty good implementation. The only improvement obvious to me is that you do more work than you need to by searching when popping. You should store in the stack not the key itself but the index into the key array. This gives you trivially fast pops at the expense of only one more pointer indirection when you want to peek the last item.
In addition to that, just size your stack to LOAD_FACTOR * (heap array size), and you should have about as fast an implementation as you could expect, with as little memory as you can manage given your speed requirements.
I think that what you want is (almost) already available in the libraries: LinkedHashSet is a hash-set with an underlying doubly linked list (which makes it iterable). LinkedHashMap even has a removeEldestEntry which sounds very similar to a pop-method.
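For reference, removeEldestEntry is a protected hook that you override; a minimal sketch of the classic bounded map it enables (eviction happens automatically on insertion):

import java.util.LinkedHashMap;
import java.util.Map;

// Keeps at most MAX_ENTRIES entries, evicting the eldest when a new one is added
public class BoundedMap<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 100;

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES;
    }
}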
How is the performance of a naive solution like:
class HashStack<T> {
    private HashMap<T, Integer> counts = new HashMap<T, Integer>();
    private Stack<T> stack = new Stack<T>();

    public void push(T t) {
        stack.push(t);
        counts.put(t, 1 + getCount(t));
    }

    public T pop() {
        T t = stack.pop();
        counts.put(t, counts.get(t) - 1);
        return t;
    }

    private int getCount(T t) {
        return counts.containsKey(t) ? counts.get(t) : 0;
    }

    public boolean contains(T t) {
        return getCount(t) > 0;
    }

    public String toString() {
        return stack.toString();
    }
}
I would suggest using TreeSet<T> as it provides guaranteed O(log n) cost for add, remove, and contains.
Are there any performance test results available comparing a traditional for loop with an Iterator when traversing an ArrayList, a HashMap, and other collections?
Or simply: why should I use an Iterator over a for loop, or vice versa?
Assuming this is what you meant:
// traditional for loop
for (int i = 0; i < collection.size(); i++) {
    T obj = collection.get(i);
    // snip
}

// using iterator
Iterator<T> iter = collection.iterator();
while (iter.hasNext()) {
    T obj = iter.next();
    // snip
}

// using iterator internally (confirm it yourself using javap -c)
for (T obj : collection) {
    // snip
}
Iterator is faster for collections with no random access (e.g. TreeSet, HashMap, LinkedList). For arrays and ArrayLists, performance differences should be negligible.
Edit: I believe that micro-benchmarking is the root of pretty much all evil, just like early optimization. But then again, I think it's good to have a feeling for the implications of such quite trivial things. Hence I've run a small test:
iterate over a LinkedList and an ArrayList respectively
with 100,000 "random" strings
summing up their length (just something to avoid that compiler optimizes away the whole loop)
using all 3 loop styles (iterator, for each, for with counter)
Results are similar for all combinations except "for with counter" on a LinkedList. The other five each took less than 20 milliseconds to iterate over the whole list. Using list.get(i) on a LinkedList 100,000 times took more than 2 minutes (!) to complete (60,000 times slower). Wow! :) Hence it's best to use an iterator (explicitly or implicitly using for-each), especially if you don't know what type and size of list you're dealing with.
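A hedged sketch of what such a test can look like; System.nanoTime is no substitute for a proper harness like JMH, so treat the numbers as rough indications only:

import java.util.LinkedList;
import java.util.List;

public class LoopStyleTest {
    public static void main(String[] args) {
        List<String> list = new LinkedList<>(); // swap in new ArrayList<>() to compare
        for (int i = 0; i < 100_000; i++) {
            list.add("item" + i);
        }

        long start = System.nanoTime();
        long sum = 0;
        // "for with counter": on a LinkedList every get(i) walks from the head, O(n^2) overall
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i).length();
        }
        System.out.println("indexed:  " + (System.nanoTime() - start) / 1_000_000 + " ms, sum=" + sum);

        start = System.nanoTime();
        sum = 0;
        // for-each: uses the iterator internally, O(n) on both list types
        for (String s : list) {
            sum += s.length();
        }
        System.out.println("for-each: " + (System.nanoTime() - start) / 1_000_000 + " ms, sum=" + sum);
    }
}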
The first reason to use an iterator is obvious correctness. If you use a manual index, there may be very innocuous off-by-one errors that you can only see if you look very closely: did you start at 1 or at 0? Did you finish at length - 1? Did you use < or <=? If you use an iterator, it is much easier to see that it is really iterating the whole array. "Say what you do, do what you say."
The second reason is uniform access to different data structures. An array can be accessed efficiently through an index, but a linked list is best traversed by remembering the last element accessed (otherwise you get a "Shlemiel the painter"). A hashmap is even more complicated. By providing a uniform interface from these and other data structures (e.g., you can also do tree traversals), you get obvious correctness again. The traversing logic has to be implemented only once, and the code using it can concisely "say what it does, and do what it says."
Performance is similar in most cases.
However, whenever code receives a List and loops over it, there is one well-known case:
the Iterator is way better for all List implementations that do not implement RandomAccess (example: LinkedList).
The reason is that for these lists, accessing an element by index is not a constant time operation.
So you can also consider the Iterator as more robust (to implementation details).
As always, performance concerns should not hide readability issues.
The Java 5 for-each loop is a big win in that respect :-)
Yes, it does make a difference on collections which are not random-access based, like LinkedList. A linked list is internally implemented by nodes pointing to the next one (starting at a head node).
The get(i) method in a linked list starts from the head node and navigates through the links all the way to the i'th node. When you iterate on the linked list using a traditional for loop, you start again from the head node each time, thus the overall traversal becomes quadratic time.
for (int i = 0; i < list.size(); i++) {
    list.get(i); // this starts every time from the head node instead of the previous node
}
The for-each loop, by contrast, iterates using the iterator obtained from the linked list and calls its next() method. The iterator maintains the state of the last access and thus does not start all the way from the head every time.
for (Object item : list) {
    // item is obtained from the iterator's next() method
}
One of the best reasons to use an iterator over the i++ syntax is that not all data structures will support random access let alone have it perform well. You should also be programming to the list or collection interface so that if you later decided that another data structure would be more efficient you'd be able to swap it out without massive surgery. In that case (the case of coding to an interface) you won't necessarily know the implementation details and it's probably wiser to defer that to the data structure itself.
One of the reasons I've learned to stick with the for each is that it simplifies nested loops, especially over 2+ dimensional loops. All the i's, j's, and k's that you may end up manipulating can get confusing very quickly.
Use JAD or JD-GUI against your generated code, and you will see that there is no real difference. The advantage of the new iterator form is that it looks cleaner in your codebase.
Edit: I see from the other answers that you actually meant the difference between using get(i) versus an iterator. I took the original question to mean the difference between the old and new ways of using the iterator.
Using get(i) and maintaining your own counter, especially for the List classes is not a good idea, for the reasons mentioned in the accepted answer.
I don't believe that

for (T obj : collection) {

calculates .size() each time through the loop, and it is therefore faster than

for (int i = 0; i < collection.size(); i++) {
+1 to what sfussenegger said. FYI, whether you use an explicit iterator or an implicit one (i.e. for each) won't make a performance difference because they compile to the same byte code.
I'm trying to use a PriorityQueue to order objects using a Comparator.
This can be achieved easily, but the object's fields (with which the comparator calculates priority) may change after the initial insertion. Most people have suggested the simple solution of removing the object, updating its values, and reinserting it, as this is when the priority queue's comparator is put into action.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
You have to remove and re-insert, as the queue works by putting new elements in the appropriate position when they are inserted. This is much faster than the alternative of finding the highest-priority element every time you pull from the queue. The drawback is that you cannot change the priority after the element has been inserted. A TreeMap has the same limitation (as does a HashMap, which also breaks when the hashcode of its keys changes after insertion).
If you want to write a wrapper, you can move the comparison code from enqueue to dequeue. You would not need to sort at enqueue time anymore (because the order it creates would not be reliable anyway if you allow changes).
But this will perform worse, and you want to synchronize on the queue if you change any of the priorities. Since you need to add synchronization code when updating priorities, you might as well just dequeue and enqueue (you need the reference to the queue in both cases).
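A minimal sketch of that dequeue-and-enqueue update on a plain java.util.PriorityQueue; the Task class and its priority field are made up for the example:

import java.util.Comparator;
import java.util.PriorityQueue;

public class ReinsertExample {
    // Hypothetical element whose priority can change after insertion
    static class Task {
        int priority;
        Task(int priority) { this.priority = priority; }
    }

    public static void main(String[] args) {
        PriorityQueue<Task> pq = new PriorityQueue<>(Comparator.comparingInt(t -> t.priority));
        Task task = new Task(5);
        pq.add(task);

        // Change a priority safely: remove first (O(n)), mutate, then re-add (O(log n))
        pq.remove(task);
        task.priority = 1;
        pq.add(task);
        System.out.println(pq.peek().priority); // 1
    }
}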
I don't know if there is a Java implementation, but if you're changing key values a lot, you can use a Fibonacci heap, which has O(1) amortized cost to decrease the key value of an entry in the heap, rather than the O(log n) of an ordinary heap.
One easy solution is to just add that element into the priority queue again. It will not change the way you extract elements; it will consume more space, but not enough to affect your running time much.
To see this, consider the Dijkstra implementation below:
public int[] dijkstra() {
    int distance[] = new int[this.vertices];
    int previous[] = new int[this.vertices];
    for (int i = 0; i < this.vertices; i++) {
        distance[i] = Integer.MAX_VALUE;
        previous[i] = -1;
    }
    distance[0] = 0;
    previous[0] = 0;
    PriorityQueue<Node> pQueue = new PriorityQueue<>(this.vertices, new NodeComparison());
    addValues(pQueue, distance);
    while (!pQueue.isEmpty()) {
        Node n = pQueue.remove();
        List<Edge> neighbours = adjacencyList.get(n.position);
        for (Edge neighbour : neighbours) {
            if (distance[neighbour.destination] > distance[n.position] + neighbour.weight) {
                distance[neighbour.destination] = distance[n.position] + neighbour.weight;
                previous[neighbour.destination] = n.position;
                pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
            }
        }
    }
    return previous;
}
Here our interest is in line
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
Here I am not changing the priority of a particular node by removing it and adding it again; rather, I am just adding a new node for the same vertex with a different priority.
Now at extraction time I will always get this node first, because this is a min-heap: the copies with greater values (lower priority) are extracted afterwards, by which time all the neighbouring nodes have already been relaxed.
Without reimplementing the priority queue yourself (so by only using utils.PriorityQueue) you have essentially two main approaches:
1) Remove and put back
Remove the element, then put it back with the new priority. This is explained in the answers above. Removing an element is O(n), so this approach is quite slow.
2) Use a Map and keep stale items in the queue
Keep a HashMap of item -> priority. The keys of the map are the items (without their priority) and the values of the map are the priorities.
Keep it in sync with the PriorityQueue (i.e. every time you add or remove an item from the Queue, update the Map accordingly).
Now when you need to change the priority of an item, simply add the same item to the queue with a different priority (and update the map, of course). When you poll an item from the queue, check if its priority is the same as in your map. If not, then ditch it and poll again.
If you don't need to change the priorities too often, this second approach is faster. Your heap will be larger and you might need to poll more times, but you don't need to find your item.
The 'change priority' operation would be O(f(n) log n*), where f(n) is the number of 'change priority' operations per item and n* is the actual size of your heap (which is n·f(n)).
I believe that if f(n) is O(n/log n) (for example, f(n) = O(sqrt(n))), this is faster than the first approach.
Note: in the explanation above, by priority I mean all the variables that are used in your Comparator. Also, your items need to implement equals and hashCode, and neither method should use the priority variables.
It depends a lot on whether you have direct control of when the values change.
If you know when the values change, you can remove and reinsert (which is in fact fairly expensive, as removal requires a linear scan over the heap!).
Alternatively, you can use an UpdatableHeap structure (not in stock Java, though) for this situation. Essentially, it is a heap that tracks the position of its elements in a hashmap; this way, when the priority of an element changes, it can repair the heap. Third, you can look for a Fibonacci heap, which does the same.
Depending on your update rate, a linear scan / quicksort / QuickSelect each time might also work. In particular, if you have many more updates than pulls, this is the way to go. QuickSelect is probably best if you have batches of updates followed by batches of pull operations.
To trigger reheapify try this:
if (!priorityQueue.isEmpty()) {
    // remove the head and re-add it so it settles into its correct position
    priorityQueue.add(priorityQueue.remove());
}
Something I've tried that works so far is peeking to see whether the reference to the object you're changing is the same as the head of the PriorityQueue. If it is, you poll(), change, and re-insert; otherwise you can change without polling, because when the head is eventually polled the heap is heapified anyway.
DOWNSIDE: this changes the priority for objects with the same priority.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
It depends on the definition of "better" and the implementation of the wrapper.

If the implementation of the wrapper is to re-insert the value using the PriorityQueue's .remove(...) and .add(...) methods, it's important to point out that .remove(...) runs in O(n) time. Depending on the heap implementation, updating the priority of a value can be done in O(log n) or even O(1) time, so this wrapper suggestion may fall short of common expectations.

If you want to minimize your implementation effort, as well as the risk of bugs in any custom solution, then a wrapper that performs re-insert looks easy and safe.

If you want the implementation to be faster than O(n), then you have some options:
Implement a heap yourself. The wikipedia entry describes multiple variants with their properties. This approach is likely to get you the best performance; at the same time, the more code you write yourself, the greater the risk of bugs.
Implement a different kind of wrapper: handle updating the priority by marking the old entry as removed and adding a new entry with the revised priority. This is relatively easy to do (less code), see below, though it has its own caveats.
I came across the second idea in Python's documentation,
and applied it to implement a reusable data structure in Java (see caveats at the bottom):
public class UpdatableHeap<T> {
    private final PriorityQueue<Node<T>> pq = new PriorityQueue<>(Comparator.comparingInt(node -> node.priority));
    private final Map<T, Node<T>> entries = new HashMap<>();

    public void addOrUpdate(T value, int priority) {
        if (entries.containsKey(value)) {
            // Mark the stale entry; it stays in the queue until it surfaces in pop()
            entries.remove(value).removed = true;
        }
        Node<T> node = new Node<>(value, priority);
        entries.put(value, node);
        pq.add(node);
    }

    public T pop() {
        while (!pq.isEmpty()) {
            Node<T> node = pq.poll();
            if (!node.removed) {
                entries.remove(node.value);
                return node.value;
            }
            // Stale entries are silently discarded here
        }
        throw new IllegalStateException("pop from empty heap");
    }

    public boolean isEmpty() {
        return entries.isEmpty();
    }

    private static class Node<T> {
        private final T value;
        private final int priority;
        private boolean removed = false;

        private Node(T value, int priority) {
            this.value = value;
            this.priority = priority;
        }
    }
}
Note some caveats:
Entries marked removed stay in memory until they are popped
This can be unacceptable in use cases with very frequent updates
The internal Node wrapped around the actual values is an extra memory overhead (constant per entry). There is also an internal Map, mapping all the values currently in the priority queue to their Node wrapper.
Since the values are used in a map, users must be aware of the usual cautions when using a map, and make sure to have appropriate equals and hashCode implementations.
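Finally, a short usage sketch of the class above:

UpdatableHeap<String> heap = new UpdatableHeap<>();
heap.addOrUpdate("a", 5);
heap.addOrUpdate("b", 3);
heap.addOrUpdate("a", 1); // reprioritize "a"; the old entry is merely marked removed
System.out.println(heap.pop());     // "a" (priority 1)
System.out.println(heap.pop());     // "b" (priority 3)
System.out.println(heap.isEmpty()); // true: only the stale "a" node remains internally, and it is ignored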