I have a directed graph implemented with adjacency lists using a Java HashMap. The Graph class stores only a reference like this:
HashMap<Node<V>, List<Edge<V>>> graph;
I'm trying to write a method that can perform a transposition of the graph (by side effect). Here is the code:
/**
 * Helper method for connection test
 */
public void reverseDirection() {
    for (Node<V> v : getNodes()) {
        for (Edge<V> e : getOutEdges(v)) {
            Node<V> target = e.getTarget();
            int weight = e.getWeight();
            graph.get(v).remove(e);
            graph.get(target).add(new Edge<V>(v, weight));
        }
    }
}
While executing some tests I get this:
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at esercitazione9.Graph.reverseDirection(Graph.java:71)
at esercitazione9.GraphUtil.fortementeConnesso(GraphUtil.java:126)
at esercitazione9.GraphUtil.main(GraphUtil.java:194)
The Javadoc says that this exception does not always indicate that an object has been concurrently modified; it may also occur when a single thread modifies a collection directly while iterating over it.
This is exactly my case, but I have no idea how to solve it. Is there another way to reverse all the edge directions without iterator-collection interference? Note: the computational cost can't exceed O(n+m).
You cannot remove an item from the collection you are iterating over in any way other than the iterator's remove() method (well, unless it's a ConcurrentHashMap; I cannot think of other exceptions at the moment). There are two canonical solutions to this problem:
1. Rewrite your loops to use explicit Iterators and call the iterator's remove() instead of graph.get(v).remove(e).
2. Create a separate collection to hold the items to remove (or, alternatively, the items to retain) from the collection you iterate over, and do the removal after the actual iteration.
As you explicitly ask for "not 1", I believe option 2 is the only way to go. The computational cost does not increase if you store the items to remove, as the number of allocations and insertions cannot exceed O(n+m) (n collections, m removed edges in total). Keep in mind that if your graph contains loops, special care must be taken.
Ok. I just modified the code as suggested:
public void reverseDirection() {
    Collection<Edge<V>> removed = new LinkedList<Edge<V>>();
    for (Node<V> v : getNodes()) {
        for (Edge<V> e : getOutEdges(v)) {
            Node<V> target = e.getTarget();
            int weight = e.getWeight();
            removed.add(e);
            graph.get(target).add(new Edge<V>(v, weight));
        }
        graph.get(v).removeAll(removed);
    }
}
I think there are still some issues with the logic of the algorithm, because it doesn't return the expected result. I will post the fixed code later.
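For reference, one way the logic could be fixed (a sketch, using only the Node/Edge API shown in the question): rather than tracking removals at all, build a fresh reversed adjacency map and swap it in. This stays within O(n+m), avoids the iterator-collection interference entirely, and also avoids re-reversing edges that were already flipped into the list of a not-yet-visited vertex:

public void reverseDirection() {
    // Fresh adjacency map: every vertex starts with an empty out-edge list.
    HashMap<Node<V>, List<Edge<V>>> reversed = new HashMap<Node<V>, List<Edge<V>>>();
    for (Node<V> v : getNodes()) {
        reversed.put(v, new LinkedList<Edge<V>>());
    }
    // Each edge v -> target becomes target -> v in the new map.
    for (Node<V> v : getNodes()) {
        for (Edge<V> e : getOutEdges(v)) {
            reversed.get(e.getTarget()).add(new Edge<V>(v, e.getWeight()));
        }
    }
    graph = reversed; // side effect: the old adjacency lists are replaced
}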
I have a list of objects that I am attempting to keep sorted in ascending order, by adding items in a for loop like so.
private static void addObjectToIndex(classObject object) {
    for (int i = 0; i < collection.size(); i++) {
        if (collection.get(i).ID >= object.ID) {
            collection.insertElementAt(object, i);
            return;
        }
    }
    if (classObject.size() == 0)
        classObject.add(object);
}
This is faster than sorting the list every time I call that function; that would be simpler but slower, since inserting in order gives O(N) per call as opposed to using Collections.sort's O(N log N) every time (unless I'm wrong).
The problem is that when I run Collections.binarySearch to grab an item out of the Vector collection (the collection requires method calls on an atomic basis), it still ends up returning negative numbers, as shown in the code below.
Comparator<classObject> c = new Comparator<classObject>() {
    public int compare(classObject u1, classObject u2) {
        int z1 = (int) u1.ID;
        int z2 = (int) u2.ID;
        if (z1 > z2)
            return 1;
        return z2 <= z1 ? 0 : -1;
    }
};
int result = Collections.binarySearch(collection, new classObject(pID), c);
if (result < 0)
    return null;
if (collection.get(result).ID != pID)
    return null;
else
    return collection.get(result);
Something like
result = -1043246
Shows up in the debugger, resulting in the second code snippet returning null.
Is there something I'm missing here? It's probably brain-dead simple. I've tried adjusting the for loop that places things in order (<=, >=, < and >) and it doesn't work. Adding the object at index i+1 doesn't work either. It still returns null, which makes the entire program blow up.
Any insight would be appreciated.
Boy, did you get here from the 80s? It sure sounds like you've missed quite a few API updates!
This is faster than sorting the list every time I call that function; that would be simpler but slower, since inserting in order gives O(N) per call as opposed to using Collections.sort's O(N log N) every time (unless I'm wrong).
You're now spending an O(n) investment on every insert, so that's O(n^2) total, vs. the model of 'add everything you want to add without sorting it' and then 'at the very end, sort the entire list', which is O(n log n).
Vector is threadsafe which is why I'm using it as opposed to something else, and that can't change
Nope. Thread safety is not that easy; what you've written isn't thread safe.
Vector is obsolete and should never be used. What Vector does (vs. ArrayList) is make each individual operation on the vector thread safe (i.e. atomic). Note that you can get this behaviour from any list if you really need it with List<T> myList = Collections.synchronizedList(someList);, but it is highly unlikely you want this.
Take your current impl of addObjectToIndex: it is not atomic. It makes many different method calls on your vector, and these have zero guarantee of being consistent. If two threads both call addObjectToIndex and your computer has more than one core, then you will eventually end up with a list that looks like [1, 2, 5, 4, 10] - i.e., not sorted.
Take your addObjectToIndex method: it just doesn't work properly unless its view of your collection is consistent for the entirety of the run. In other words, that block needs to be 'atomic': it either does all of it or none of it, and it needs a consistent view throughout. Stick a synchronized block around the whole thing; in contrast, Vector makes each individual call atomic and nothing more, which doesn't help here. More generally, 'just synchronize' is a rather inefficient way to do multicore; the collections in java.util.concurrent are usually vastly more efficient and much easier to use. You should read through that API and see if there's anything that'll work for you.
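A minimal sketch of that, assuming collection is the Vector from the question (note the unconditional add at the end, which also fixes a bug discussed further down):

// One lock guards the entire compound find-position-and-insert operation.
private static final Object lock = new Object();

private static void addObjectToIndex(classObject object) {
    synchronized (lock) {
        // No other thread can modify the list between the size/get calls
        // and the insert, so the list stays sorted.
        for (int i = 0; i < collection.size(); i++) {
            if (collection.get(i).ID >= object.ID) {
                collection.insertElementAt(object, i);
                return;
            }
        }
        collection.add(object); // larger than all existing IDs, or empty list
    }
}

Any other compound operation on the list (the binary search, for instance) needs to synchronize on the same lock.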
if(z1 > z2) return 1;
I'm pretty sure your insert code sorts ascending, but your comparator sorts descending. Which would break the binary search code (the binary search code is specced to return arbitrary garbage if the list isn't sorted, and as far as the comparator you use here is concerned, it isn't). You should use the same comparator anytime it is relevant, and not re-implement the logic multiple times (or if you do, at least test it!).
There is also no need to write all this code.
Comparator<classObject> c = Comparator.comparingInt(co -> co.ID);
is all you need.
However
It looks like what you really want is a collection that keeps itself continually sorted. Java has that; it's called a TreeSet. You pass it a Comparator (or you don't, and TreeSet expects that the elements you put in have a natural order, either is fine), and it will keep the collection sorted, at very cheap cost (far better than your O(n^2)!), continually. It IS a set, meaning if the comparator says that 2 items are equal, then adding both to the set results in the second add call being ignored (sets cannot contain the same element more than once, and for a TreeSet, 'the same element' is defined solely by 'comparing them returns 0' - TreeSet ignores hashCode and equals entirely).
This sounds like what you really want. If you need 2 different objects with the same ID to be added anyway, then add some more fields to your comparator (instead of returning 0 upon same ID, move on to checking the insertion timestamp or whatnot). But, with a name like 'ID', sounds like duplicates aren't going to happen.
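A sketch of that approach, assuming the classObject type, an int ID field, and the pID lookup from the question:

// Keeps itself sorted by ID on every add; add/remove/contains are O(log n).
NavigableSet<classObject> collection =
        new TreeSet<>(Comparator.comparingInt((classObject o) -> o.ID));

// Lookup by ID with a probe object, instead of binarySearch:
classObject probe = new classObject(pID);
classObject match = collection.ceiling(probe); // smallest element >= probe
if (match != null && match.ID == pID) {
    // found it
}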
The reason you want to use this off-the-shelf stuff is that otherwise you need to do it yourself, and if you're going to endeavour to write it yourself, you need to be a good programmer. Which you clearly aren't (yet - we all started as newbies and learned to become good later; it's the natural order of things). For example, if you try to add an element to a non-empty collection where the element has a larger ID than anything in the collection, it just won't add anything. That's because you wrote if (classObject.size() == 0) classObject.add(object); but you wanted classObject.add(object); without the if. Also, in Java we write ClassObject, not classObject, and more generally, classObject is a completely meaningless name. Find a better name; it helps code be less confusing, and this question does suggest you could use some of that.
I'm trying to implement pagination within a Java/Spring Boot app on a Collection that is returned from a function. I want pages ordered by each element's startTime: if the user asks for page 2 and each page has 10 items, I want to give the user the items with the 10th through 20th most recent start times. I've tried two approaches so far:
a) Converting the returned collection into an array and then using IntStream on it to get elements from one index to another.
final exampleClass[] example = exampleCollection.toArray(new exampleClass[0]);
Collection<exampleClass> examplePage = IntStream.range(start, end)
...
b) Converting the returned collection into an ArrayList and then using Pageable/PageRequest to create a new page from that ArrayList.
The problem is that these seem very inefficient, since I first have to copy the Collection into an ArrayList or array and then operate on that. I would like to know if there are more efficient ways to turn collections into structures that I can iterate on using indices so that I can implement pagination, or whether Spring has functions for creating pages straight from a Collection. So far I haven't been able to find any.
Also, is there any difference in runtime between
List<exampleClass> x = new ArrayList<>(exampleCollection);
and
List<exampleClass> x = (List<exampleClass>)exampleCollection;
I would like to know if there are more efficient ways to turn collections into structures that I can iterate on using indices
The only efficient way is to check with instanceof whether your collection is indeed a List. If it is, you can cast it and simply use e.g. subList(start, stop) to produce your paginated result.
Please note that accessing an element by its index might not be efficient either. In a LinkedList, accessing an element is an O(N) operation, so accessing M elements by index produces an O(M*N) operation, whereas using subList() is an O(M+N) operation.
There is a marker specialization of the List interface for lists that are fast at being accessed by index: RandomAccess. You may or may not want to check for it to decide on the best strategy.
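Putting those pieces together, a sketch might look like this (the names are placeholders, and the unchecked cast assumes the element type is known):

static List<exampleClass> page(Collection<exampleClass> c, int pageNumber, int pageSize) {
    // Only copy when we must: if it's already a List, use it directly.
    List<exampleClass> list = (c instanceof List)
            ? (List<exampleClass>) c
            : new ArrayList<>(c);
    int from = Math.min(pageNumber * pageSize, list.size());
    int to = Math.min(from + pageSize, list.size());
    return list.subList(from, to); // a view, no element copying
}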
Also, is there any difference in runtime between
List<exampleClass> x = new ArrayList<>(exampleCollection);
and
List<exampleClass> x = (List<exampleClass>)exampleCollection;
There absolutely is.
The second is a cast, and has virtually no cost. Just beware that x and exampleCollection are one and the same object (modifying one is the same as modifying the other). Obviously, a cast may fail with a ClassCastException if exampleCollection is not actually a list.
The first performs a copy, which has a cost in both CPU (traversal of exampleCollection) and memory (allocating an array of the collection's size). Both are pretty low for small collections, but your mileage may vary.
In this copy case, modifying one collection does nothing to the other, you get a copy.
A Collection is not guaranteed to have a consistent iteration order: if you call iterator() twice you may get two different sequences. Converting the collection into an array or a list first is the best solution.
As for the second question: This line of code:
List<exampleClass> x = new ArrayList<>(exampleCollection);
creates a new ArrayList, which is a shallow copy of the original collection. That is, it contains pointers to the same objects as the original collection, but the list itself is new and you could for example sort the list, or add or remove items, without affecting the original collection. Compared to:
List<exampleClass> x = (List<exampleClass>)exampleCollection;
Assuming exampleCollection is actually a List, this gives you a pointer to that same list with a new static type. If you make changes like sorting or adding or removing items, you will see those modifications in exampleCollection. On the other hand, if exampleCollection is not a List, you will get a runtime error (ClassCastException).
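A tiny demonstration of the difference (a sketch with String elements):

Collection<String> exampleCollection = new ArrayList<>(List.of("a", "b"));

List<String> copy = new ArrayList<>(exampleCollection); // new list, same elements
List<String> alias = (List<String>) exampleCollection;  // same object, new static type

alias.add("c"); // exampleCollection is now [a, b, c]
copy.add("d");  // copy is [a, b, d]; exampleCollection is unaffected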
I'm using a huge ArrayList with the code below:
public final List<MyClass> list = new ArrayList<>();

public void update(MyClass myClass) {
    int i;
    for (i = 0; i < list.size(); i++) {
        if (myClass.foo(list.get(i))) {
            list.set(i, myClass);
            break;
        }
    }
    if (i == list.size()) {
        list.add(myClass);
    }
}
The list is extremely large. Is there something else I can do to increase performance in this scenario? Maybe using some Java 8 feature, replacing ArrayList, or something like that.
Another piece of code related to this list that is taking too long to run is below:
public List<MyClass> something(Integer amount) {
    list.sort((m1, m2) -> Double.compare(m2.getBar(), m1.getBar()));
    return list.stream()
            .limit(amount)
            .collect(Collectors.toList());
}
Any help is welcome. Thank you all!
It seems like ArrayList is not a good choice here.
In the first case, you attempt to find an object by its properties in the list. To find it, you have to check each element of the list; the bigger the list, the longer that takes. (You have a worst-case complexity of O(N) with ArrayList.)
If you use a HashMap instead of a List, you can use that property as the key of your map. That way, you can locate the object you need to update directly, without checking each element of your list. The execution time no longer depends on the number of entries. (You have an expected complexity of O(1) with HashMap.)
If you use a HashMap instead of the ArrayList, your update code is going to look like this:
public void update(MyClass myClass) {
    map.put(myClass.getKey(), myClass);
}
(where getKey() returns the property that your foo method compares for equality).
But this only covers the first case. With the information we have, it seems like the best solution.
Is there something else I can do to increase performance in this scenario?
The problem is that your algorithm has to apply myClass.foo to every element of the list until you find the first match. If you do this serially, then the worst-case complexity is O(N) where N is the list size. (And the list size is large.)
Now, you could do the searching in parallel. However, if there can be multiple matches, then matching the first one in the list is going to be tricky. And you still end up with O(N/C) where C is the number of cores available.
The only way to get better than O(N) is to use a different data structure. But without knowing what the MyClass::foo method does, it is hard to say what that data structure should be.
Your second snippet seems to be solving the "top K of N" problem. This can be implemented in O(N log K), and possibly better; see Optimal algorithm for returning top k values from an array of length N.
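For illustration, a sketch of the O(N log K) approach with a bounded min-heap, based on the something() method from the question:

public List<MyClass> something(int amount) {
    // Min-heap holding the 'amount' largest bars seen so far.
    PriorityQueue<MyClass> heap =
            new PriorityQueue<>(Comparator.comparingDouble(MyClass::getBar));
    for (MyClass m : list) {
        heap.offer(m);
        if (heap.size() > amount) {
            heap.poll(); // evict the smallest of the current top 'amount'
        }
    }
    // Order the K survivors descending, matching the original code.
    List<MyClass> result = new ArrayList<>(heap);
    result.sort(Comparator.comparingDouble(MyClass::getBar).reversed());
    return result;
}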
I am working on a problem where I'm required to store elements with two requirements: no duplication and maintaining insertion order. I chose LinkedHashSet, since it fulfils both requirements.
Let's say I have this code:
LinkedHashSet<String> hs = new LinkedHashSet<>();
hs.add("B");
hs.add("A");
hs.add("D");
hs.add("E");
hs.add("C");
hs.add("F");

if (hs.contains("D")) {
    // do something to remove elements added after "D", i.e. remove "E", "C" and "F"
    // maybe hs.removeAll(Collection<?> c) ??
}
Can anyone please guide me with the logic to remove these elements?
Am I using the wrong data structure? If so, what would be a better alternative?
I think you may need to use an iterator to do the removal if you are using a LinkedHashSet. That is to say, find the element, then keep removing until you get to the tail. This will be O(n). Even if you wrote your own LinkedHashSet (with a doubly linked list and a hash set) and had access to the raw linking structure so that you could cut the linked list in O(1), you would still need to remove each element you just cut from the HashSet, which is where the O(n) cost arises again.
So in summary: find the element, then keep an iterator at that element and continue walking down, removing elements until you get to the end. I'm not sure if LinkedHashSet exposes the required calls, but you can probably figure that out.
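For what it's worth, the standard Iterator is enough here; a sketch using the hs set from the question:

// Removes every element that was inserted after "D" (keeps "D" itself).
Iterator<String> it = hs.iterator();
boolean afterD = false;
while (it.hasNext()) {
    String s = it.next();
    if (afterD) {
        it.remove(); // safe structural removal during iteration
    } else if (s.equals("D")) {
        afterD = true;
    }
}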
You could write your own version of an ArrayList that doesn't allow for duplicates, by overriding add() and addAll(). To my knowledge, there is no "common" 3rd party version of such, which has always surprised me. Anybody know of one?
Then the removal code is pretty simple (no need to use a ListIterator):
int idx = this.indexOf("D");
if (idx >= 0) {
for (int goInReverse = this.size()-1; goInReverse > idx; goInReverse--)
this.remove(goInReverse);
}
However, this is still O(N), because you loop through every element of the list.
The basic problem here is that you have to maintain two data structures: a "map" representing the key/value mapping, and a "list" representing the insertion order.
There are "map" and "list" organizations that offer fast removal of a elements after a given point; e.g. ordered trees of various kinds and both array and chain-based lists (modulo the cost of locating the point.)
However, it seems impossible to remove N elements from the two data structures in better than O(N). You have to visit all of the elements being removed to remove them from the second data structure. (In fact, I suspect one could prove this mathematically...)
In short, there is no data structure that has better complexity than what you are currently using.
The area where it is possible to improve performance (with a custom collection class!) is in avoiding an explicit use of an iterator. Using an iterator and the standard iterator API, the cost is O(N) in the total number of elements in the data structure. You could make this O(N) in the number of elements removed... if the hash entry nodes also had next/prev links for the sequence.
So, after trying a couple of the things mentioned above, I chose to implement a different data structure, since I did not have any issue with the O(n) cost for this problem (my data is very small).
I used Graphs, this library came in really handy: http://jgrapht.org/
What I am doing is adding all elements as vertices to a DirectedGraph and creating edges between them (the edges helped me solve another, unrelated problem as well). When it's time to remove the elements, I use a recursive function with roughly the following pseudocode:
removeElements(element) {
    tempEdge = graph.getOutgoingEdgeFrom(element)
    if (tempEdge == null)
        return
    tempVertex = graph.getTargetVertex(tempEdge)
    removeElements(tempVertex)
    graph.remove(tempVertex)
}
I agree that a graph is not a good data structure for this kind of problem, but under my conditions it works perfectly... Cheers!
I'm trying to use a PriorityQueue to order objects using a Comparator.
This is easily achieved, but the object's fields (with which the comparator calculates priority) may change after the initial insertion. Most people have suggested the simple solution of removing the object, updating the values, and reinserting it, as this is when the priority queue's comparator is put into action.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
You have to remove and re-insert, as the queue works by putting new elements in the appropriate position when they are inserted. This is much faster than the alternative of finding the highest-priority element every time you pull from the queue. The drawback is that you cannot change the priority after the element has been inserted. A TreeMap has the same limitation (as does a HashMap, which also breaks when the hashCode of its keys changes after insertion).
If you want to write a wrapper, you can move the comparison code from enqueue to dequeue. You would not need to sort at enqueue time anymore (because the order it creates would not be reliable anyway if you allow changes).
But this will perform worse, and you want to synchronize on the queue if you change any of the priorities. Since you need to add synchronization code when updating priorities, you might as well just dequeue and enqueue (you need the reference to the queue in both cases).
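For reference, the dequeue-and-enqueue idiom in code (a sketch; task, setPriority, and queue are hypothetical names):

// O(n): linear scan to find and unlink the element.
queue.remove(task);
// Mutate the fields the comparator reads only while the element is out.
task.setPriority(newPriority);
// O(log n): sift the element back into its correct position.
queue.add(task);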
I don't know if there is a Java implementation, but if you're changing key values a lot, you can use a Fibonacci heap, which has O(1) amortized cost to decrease a key value of an entry in the heap, rather than the O(log n) of an ordinary heap.
One easy solution is to simply add the element to the priority queue again. This does not change the way you extract elements; it consumes more space, but not enough to affect your running time.
To see this, consider Dijkstra's algorithm below:
public int[] dijkstra() {
    int[] distance = new int[this.vertices];
    int[] previous = new int[this.vertices];
    for (int i = 0; i < this.vertices; i++) {
        distance[i] = Integer.MAX_VALUE;
        previous[i] = -1;
    }
    distance[0] = 0;
    previous[0] = 0;
    PriorityQueue<Node> pQueue = new PriorityQueue<>(this.vertices, new NodeComparison());
    addValues(pQueue, distance);
    while (!pQueue.isEmpty()) {
        Node n = pQueue.remove();
        List<Edge> neighbours = adjacencyList.get(n.position);
        for (Edge neighbour : neighbours) {
            if (distance[neighbour.destination] > distance[n.position] + neighbour.weight) {
                distance[neighbour.destination] = distance[n.position] + neighbour.weight;
                previous[neighbour.destination] = n.position;
                pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
            }
        }
    }
    return previous;
}
Here our interest is in the line:
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
I am not changing the priority of the particular node by removing it and adding it again; rather, I am just adding a new node for the same vertex with a different priority.
Now at extraction time I will always get this node first, because I have implemented a min-heap here: any node for the same vertex with a greater value (lower priority) will always be extracted afterwards, and by then all of its neighbouring nodes will already have been relaxed.
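If you would rather skip those stale duplicates at extraction time than re-relax their neighbours, a common guard looks like this (a sketch; it assumes Node keeps the distance it was enqueued with, here as a hypothetical distance field):

Node n = pQueue.remove();
// A shorter path to this vertex was found after this entry was enqueued,
// so the entry is stale and can be discarded.
if (n.distance > distance[n.position]) {
    continue;
}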
Without reimplementing the priority queue yourself (i.e. using only java.util.PriorityQueue), you have essentially two main approaches:
1) Remove and put back
Remove the element, then put it back with the new priority. This is explained in the answers above. Removing an element is O(n), so this approach is quite slow.
2) Use a Map and keep stale items in the queue
Keep a HashMap of item -> priority. The keys of the map are the items (without their priority) and the values of the map are the priorities.
Keep it in sync with the PriorityQueue (i.e. every time you add or remove an item from the Queue, update the Map accordingly).
Now when you need to change the priority of an item, simply add the same item to the queue with a different priority (and update the map, of course). When you poll an item from the queue, check whether its priority is the same as in your map. If not, ditch it and poll again.
If you don't need to change the priorities too often, this second approach is faster. Your heap will be larger and you might need to poll more times, but you don't need to find your item.
The 'change priority' operation would be O(f(n) log n*), with f(n) the number of 'change priority' operations per item and n* the actual size of your heap (which is n*f(n)).
I believe that if f(n) is O(n/log n) (for example f(n) = O(sqrt(n))), this is faster than the first approach.
Note: in the explanation above, by priority I mean all the variables that are used in your Comparator. Also, your items need to implement equals and hashCode, and neither method should use the priority variables.
It depends a lot on whether you have direct control of when the values change.
If you know when the values change, you can remove and reinsert (which in fact is fairly expensive, as removing requires a linear scan over the heap!).
Alternatively, you can use an UpdatableHeap structure (not in stock Java, though) for this situation. Essentially, that is a heap that tracks the position of its elements in a hashmap; this way, when the priority of an element changes, it can repair the heap. Or you can look for a Fibonacci heap, which does the same.
Depending on your update rate, a linear scan / quicksort / QuickSelect each time might also work. In particular, if you have many more updates than pulls, this is the way to go. QuickSelect is probably best if you have batches of updates and then batches of pull operations.
To trigger a re-heapify of the head element, try this:
if (!priorityQueue.isEmpty()) {
    priorityQueue.add(priorityQueue.remove());
}
Something I've tried, and it has worked so far, is peeking to see whether the reference to the object you're changing is the same as the head of the PriorityQueue. If it is, you poll(), change, and re-insert; otherwise you can change it without polling, because when the head is eventually polled, the heap is heapified anyway.
DOWNSIDE: This changes the ordering of objects with the same priority.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
It depends on the definition of "better" and the implementation of the wrapper.
If the implementation of the wrapper is to re-insert the value using the PriorityQueue's .remove(...) and .add(...) methods, it's important to point out that .remove(...) runs in O(n) time. Depending on the heap implementation, updating the priority of a value can be done in O(log n) or even O(1) time, therefore this wrapper suggestion may fall short of common expectations.
If you want to minimize your implementation effort, as well as the risk of bugs in any custom solution, then a wrapper that performs re-insert looks easy and safe. If you want the implementation to be faster than O(n), then you have some options:
Implement a heap yourself. The Wikipedia entry describes multiple variants with their properties. This approach is likely to get you the best performance; at the same time, the more code you write yourself, the greater the risk of bugs.
Implement a different kind of wrapper: handle updating the priority by marking the old entry as removed and adding a new entry with the revised priority. This is relatively easy to do (less code), see below, though it has its own caveats.
I came across the second idea in Python's documentation, and applied it to implement a reusable data structure in Java (see caveats at the bottom):
public class UpdatableHeap<T> {
    private final PriorityQueue<Node<T>> pq =
            new PriorityQueue<>(Comparator.comparingInt(node -> node.priority));
    private final Map<T, Node<T>> entries = new HashMap<>();

    public void addOrUpdate(T value, int priority) {
        if (entries.containsKey(value)) {
            // Lazy deletion: the old node stays in the queue, marked as removed.
            entries.remove(value).removed = true;
        }
        Node<T> node = new Node<>(value, priority);
        entries.put(value, node);
        pq.add(node);
    }

    public T pop() {
        while (!pq.isEmpty()) {
            Node<T> node = pq.poll();
            if (!node.removed) {
                entries.remove(node.value);
                return node.value;
            }
            // Otherwise it was a stale entry; skip it.
        }
        throw new IllegalStateException("pop from empty heap");
    }

    public boolean isEmpty() {
        return entries.isEmpty();
    }

    private static class Node<T> {
        private final T value;
        private final int priority;
        private boolean removed = false;

        private Node(T value, int priority) {
            this.value = value;
            this.priority = priority;
        }
    }
}
Note some caveats:
Entries marked removed stay in memory until they are popped; this can be unacceptable in use cases with very frequent updates.
The internal Node wrapper around the actual values is an extra memory overhead (constant per entry). There is also an internal Map, mapping all the values currently in the priority queue to their Node wrapper.
Since the values are used in a map, users must be aware of the usual cautions when using a map, and make sure to have appropriate equals and hashCode implementations.
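A quick usage sketch of the class above:

UpdatableHeap<String> heap = new UpdatableHeap<>();
heap.addOrUpdate("build", 5);
heap.addOrUpdate("deploy", 3);
heap.addOrUpdate("build", 1); // re-prioritize: the old entry is lazily discarded

System.out.println(heap.pop());     // build  (priority 1)
System.out.println(heap.pop());     // deploy (priority 3)
System.out.println(heap.isEmpty()); // true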