LinkedList new reverse algorithm - java

I created an algorithm for reversing a linked list, shown below. Can someone tell me whether it is efficient or not? It does run in O(n) time, though.
private void insertBegining(T data) {
    Node<T> newNode = new Node<T>(data);
    newNode.setNextNode(root);
    root = newNode;
}

private void reverseList() {
    Node<T> curr = root;
    while (curr.getNextNode() != null) {
        insertBegining(curr.getNextNode().getData());
        curr.setNextNode(curr.getNextNode().getNextNode());
    }
}

You don't need to create new nodes; just reuse the existing ones, changing their next fields. Same O(n) complexity, but less heap usage.
private void reverseList() {
    Node<T> reversed = null;
    while (root != null) {
        Node<T> next = root.getNextNode();
        root.setNextNode(reversed);
        reversed = root;
        root = next;
    }
    root = reversed;
}

Can someone tells me if its efficient or not.
It is incredibly inefficient. There's no fixing this; linked lists just are, by nature. Don't use them in your code if you like efficiency.
There are two different 'kinds' of efficiency: academic/algorithmic efficiency (described, generally, in big-O notation), and pragmatic efficiency: how long it actually takes on real-life, modern, commonly employed hardware, such as ARM and x86-64 chips running Windows, Linux, and macOS.
If you want to make reversing a LinkedList algorithmically faster than O(n), your only option is to work on the original form. For example, if you have a doubly-linked list, where each node is aware not just of the node that follows it but also of the node that precedes it, then reversing the list can be an O(1) operation: just create a wrapper that starts at the end and implements any attempt to 'go next' by actually invoking the 'getPrevious()' method. But this all demands that you have a doubly-linked list to start with. If you simply do not have one, then it is obviously impossible to reverse the list without iterating through it once, which dooms you to O(n) or worse performance.
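A minimal sketch of that O(1) wrapper idea, assuming a doubly-linked list (DNode and ReversedView are illustrative names, not JDK classes): nothing is relinked; the view simply starts at the tail and follows prev pointers.

```java
import java.util.Iterator;

// Illustrative doubly-linked node; not a JDK class.
class DNode<T> {
    final T data;
    DNode<T> prev, next;
    DNode(T data) { this.data = data; }
}

// O(1) "reversal": constructing the view does no work at all.
// Iterating it walks backwards from the tail via prev pointers.
class ReversedView<T> implements Iterable<T> {
    private final DNode<T> tail;
    ReversedView(DNode<T> tail) { this.tail = tail; }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private DNode<T> curr = tail;
            @Override public boolean hasNext() { return curr != null; }
            @Override public T next() {
                T d = curr.data;
                curr = curr.prev;
                return d;
            }
        };
    }
}
```

This is essentially what java.util.LinkedList's descendingIterator() gives you for free, since that list is doubly linked.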
The reason linked lists are so, so bad in pragmatic terms (and this makes it considerably worse) is, in particular, the cache issue.
Modern CPU design uses hierarchical layers of on-chip cache. The CPUs no longer operate on main memory, because main memory is waaay too slow; the CPU can process 500 cycles worth of instructions or more (and, generally, a bunch of them more or less in parallel because CPUs have pipelines, so it could do a heck of a lot of work in those 500 cycles), just in the time it takes to fetch some data from memory.
The solution is that nowadays CPUs can't even access memory anymore, at all. Instead, the CPU only operates on a page of memory loaded in a CPU cache. If the CPU needs to act on data that isn't loaded in a page that is in cache, then the CPU tells the memory controller to go fetch it, and will then go to sleep or do other work. A cache page is eventually 'saved' back into actual memory later by the controller when the CPU is done operating on it.
Whenever a CPU core needs to operate on memory that isn't loaded in a cache page that's called a cache miss. These are incredibly expensive (those 500+ cycles I mentioned).
The problem with linked lists, at least as implemented above, is the problem of fragmentation: You have not just the objects stored in your linked list (say, it's a linked list of strings - those strings), you also have these node objects.
The locations in memory of both the strings and the node objects are crucial.
The best possible situation is if all these node objects are all stored in a contiguous block of memory (all next to each other, nicely ordered). This way, if you are e.g. just iterating through a list to e.g. figure out how large it is, in memory you get the minimum amount of misses (you'd process an entire cache-page's worth of node objects and then move on to the next page). However, often you also interact with the objects these nodes are pointing at, and generally the strings are in a different place.
The worst possible situation is if the nodes are scattered throughout memory, causing a cache miss on every iteration. Often nodes and the data they contain are intermixed which is not good, especially if the data contained is large.
That's why node-based linked lists are inherently inefficient. It'd be slightly more efficient if the objects you are storing contained the next/prev pointers themselves, but Java doesn't make this easy, and design-wise it's annoying: it conflates ideas and means an object can only exist in one linked list at a time. Java doesn't let you create, on the fly, alternate definitions of objects with a next and prev field mixed in.
ArrayList is what you generally want.

You don't need to create new nodes; just reuse the existing ones by changing the direction of the next pointers. Below is code with the same O(n) complexity but less heap usage. It uses the classic three-pointer technique to reverse a list.
private void reverseList() {
    if (head == null) {
        throw new EmptyListException(EMPTY_LIST);
    } else if (head.next == null) {
        return;
    }
    Node<T> nextNode;
    Node<T> node = head;
    Node<T> prevNode = null;
    while (node != null) {
        nextNode = node.next;
        node.next = prevNode;
        prevNode = node;
        node = nextNode;
    }
    head = prevNode;
}

Related

Java Best Practice regarding clearing a Linked List

I am coding a linked list data structure in Java (for my learning's sake I am not using any standard Java libraries), and I want to clear the data structure by nulling out references. Please suggest which approach is better:
1) Just null the start reference of the list; that will suffice.
2) Apart from nulling out the start, set the next pointers of all internal nodes to null. Does this help the garbage collector in any way?
My confusion: I see approach 2 followed in the JDK's LinkedList implementation, but I don't see the same for TreeMap.
I am using JDK 8.
This is an interesting question, and the answer has a long history with subtle tradeoffs.
The short answer is, clearing the references is not necessary for the correct operation of the data structure. Since you're learning data structures, I'd suggest that you not worry about this issue in your own implementation. My hunch (though I haven't benchmarked this) is that any benefit that might accrue from clearing all the link nodes will rarely be noticeable under typical conditions.
(In addition, it's likely that under typical conditions, LinkedList will be outperformed by ArrayList or ArrayDeque. There are benchmarks that illustrate this. It's not too difficult to come up with workloads where LinkedList outperforms the others, but it's rarer than people think.)
I was quite surprised to learn that the clear operation of LinkedList unlinks all the nodes from each other and from the contained element. Here's a link to the code in JDK 8. This change dates back to 2003, and the change appeared in JDK 5. This change was tracked by the bug JDK-4863813. That change (or a slightly earlier one) clears the next and previous references from individual nodes when they're unlinked from the list. There's also a test case in that bug report that's of some interest.
The problem seems to be that it is possible to make changes to the LinkedList, which creates garbage nodes, faster than the garbage collector can reclaim them. This eventually causes the JVM to run out of memory. The fact that the garbage nodes are all linked together also seems to have the effect of impeding the garbage collector, making it easier for the mutator threads to outstrip the collector threads. (It's not clear to me how important it is to have multiple threads mutating the list. In the test case they all synchronize on the list, so there's no actual parallelism.) The change to LinkedList to unlink the nodes from each other makes it easier for the collector to do its work, and so apparently makes the test no longer run out of memory.
Fast forward to 2009, when the LinkedList code was given a "facelift." This was tracked by bug JDK-6897553 and discussed in this email review thread. One of the original motivations for the "facelift" was to reduce the clear() operation from O(n) to O(1), as unlinking all the nodes seemed unnecessary to that bug's submitter. (It certainly seemed unnecessary to me!) But after some discussion, it was decided that the unlinking behavior provided enough benefit to the garbage collector to retain it and to document it.
The comment also says that unlinking the nodes
is sure to free memory even if there is a reachable Iterator
This refers to a somewhat pathological case like the following:
// fields in some class
List<Object> list = createAndPopulateALinkedList();
Iterator<Object> iterator;

void someMethod() {
    iterator = list.iterator();
    // ...
    list.clear();
}
The iterator points to one of the linked list's nodes. Even though the list has been cleared, the iterator still keeps a node alive, and since that node has next and previous references, all of the nodes formerly in the list are still alive. Unlinking all the nodes in clear() lets these be collected. I think this is pretty pathological, though, since it's rare for an iterator to be stored in a field. Usually iterators are created, used, and discarded within a single method, most often within a single for loop.
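In code, a clear() that unlinks every node, in the spirit of that JDK change, might look like the following sketch. UnlinkingList is an illustrative class; the field names are modeled loosely on java.util.LinkedList's private internals, not copied from it.

```java
// Minimal singly-linked list whose clear() unlinks every node, so that a
// stray reference to one dead node cannot keep the whole chain reachable.
class UnlinkingList<T> {
    private static class Node<T> {
        T item;
        Node<T> next;
        Node(T item, Node<T> next) { this.item = item; this.next = next; }
    }

    private Node<T> first;
    private int size;

    void addFirst(T item) {
        first = new Node<>(item, first);
        size++;
    }

    int size() { return size; }

    // O(n) instead of O(1), but each node is cut loose individually,
    // which is friendlier to the garbage collector in the pathological
    // iterator-in-a-field case described above.
    void clear() {
        for (Node<T> x = first; x != null; ) {
            Node<T> next = x.next;
            x.item = null;
            x.next = null;
            x = next;
        }
        first = null;
        size = 0;
    }
}
```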
Now, regarding TreeMap. I don't think there's a fundamental reason why LinkedList unlinks its nodes whereas TreeMap does not. One might like to think that the entire JDK code base is maintained consistently, so that if it's good practice for LinkedList to unlink its nodes, this also ought to have been done to TreeMap. Alas, this is not the case. Most likely what happened is that a customer ran into the pathological behavior with LinkedList and the change was made there, but nobody has ever observed similar behavior with TreeMap. Thus there was no impetus to update TreeMap.

Why is clear an O(n) operation for linked list?

According to attachment 1, linked list's clear operation is O(n).
I have a question about why is it so.
Here is how we implemented the linked list in class(java)
public class LinkedIntList {
    private ListNode front;
    ......
}
And if I were to write a clear method for this linked list class, this is how I would write it
public void clear() {
    front = null;
}
Given this implementation (I think this is how most people would write it), clearing would be one operation, independent of the size of the list (just setting front to null). Also, by setting the front pointer to null, wouldn't you essentially be asking the garbage collector to "reclaim the underlying memory and reuse it for future object allocation"? In this case, the underlying memory would be the front node and all the nodes consecutively attached to it. (http://javabook.compuware.com/content/memory/how-garbage-collection-works.aspx)
After stating all of that, how is clear an O(n) operation for linked list?
Attachment 1:
This is from a data structures class I am in
Remember that a linked list has n entries that were allocated for it, and to clear it, you actually need to free them.
Since Java has a built-in garbage collector (GC), you don't need to free them explicitly, but the GC will go over each and every one of them and free them when the time comes.
So even though your explicit method is O(1), invoking it requires O(n) work from the GC, which makes your program O(n).
I expect that your data structures class is not assuming that Java is the only system in the world.
In C, C++, Pascal, assembly, machine code, Objective-C, VB 6, etc., it takes a fixed amount of time to free each block of memory, as they do not have a garbage collector. Until fairly recently, most programs were written without the benefit of a garbage collector.
So in any of the above, every node would need to be passed to free(), and each call to free() takes roughly fixed time.
In Java, the linked list would take O(1) time to clear for a simple implementation of a linked list.
However, as nodes may be pointed to from outside the list, or a garbage collector may consider different parts of memory at different times, there can be real-life benefits to setting all the "next" and "prev" pointers to null. But in 99% of cases, it is best just to set the "front" pointer in the header to null, as your code shows.
I think you should ask your lecturer about this, as I expect lots of students in the class will have the same issue. You need to know C well before you can understand most general data structure books or classes.

How to efficiently implement hashCode() for a singly linked list node in Java?

Eclipse implements the hashCode() function for a singly linked list's Node class the following way:
class Node {
    int val;
    Node next;

    public Node(int val) {
        this.val = val;
        next = null;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((next == null) ? 0 : next.hashCode());
        result = prime * result + val;
        return result;
    }
}
Now hashCode() for a node is dependent on the hash code of the nodes that follow it.
So every call of hashCode() will take time linear in the length of the linked list. Thus using a HashSet<Node> will become unfeasible.
One way to get around this is to cache the value of the hashCode in a variable (call it hash) so that it is computed only once. But even in this case, the cached hash becomes invalid once any node's val is changed, and it again takes linear time to update the cached hashCodes of that node and every node that precedes it (since each node's hash depends on the nodes after it).
So what are some good ways of implementing hashing for such a linked list Node?
My first thought upon reading your question was: what does LinkedList do? Digging into the source, we see that there is no hashCode() or equals() defined on the inner LinkedList.Node class (link to source).
Why does this make sense? Well, nodes are normally internal data structures, only visible to the list itself. They are not going to be placed into collections or any other data structure where comparing equality and hash-codes are necessary. No external code has access to them.
You say in your question:
Thus using a HashSet<Node> will become unfeasible.
But I would argue that you have no need to place your nodes in such a data structure. By definition, your nodes will link to each other and require no additional classes to facilitate that relationship. And unless you plan to expose this class outside your list (which isn't necessary), they will never end up in a HashSet.
I would propose you follow the LinkedList.Node model and avoid creating these methods on your nodes. The outer list can base its hashcode and equality on the values stored in the nodes (but not the nodes themselves), which is how LinkedList does it - see AbstractList (link to source).
Source links are to the OpenJDK source, but in this case they are identical to source supplied with Oracle JDKs
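A sketch of that list-level approach, hashing the element values with the same 31*h + e recurrence that AbstractList documents (IntList is an illustrative class, not JDK code):

```java
// hashCode lives on the list, not on the nodes: O(n) once per list,
// and nodes never need equals()/hashCode() at all.
class IntList {
    private static class Node {
        final int val;
        Node next;
        Node(int val) { this.val = val; }
    }

    private Node head;

    void addFirst(int val) {
        Node n = new Node(val);
        n.next = head;
        head = n;
    }

    @Override
    public int hashCode() {
        int h = 1;
        for (Node n = head; n != null; n = n.next)
            h = 31 * h + n.val;   // the recurrence AbstractList specifies
        return h;
    }
}
```

Two lists holding the same values in the same order produce the same hash, which is exactly the contract the answer above describes.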
You have to ask yourself what quality of hashing is valuable to you. The only restriction is that another list with the same numbers in the same order must have the same hash. That can be achieved by using a constant number, by using only the first element, or by limiting the hash to, say, the first 5 numbers. How many numbers make sense for you depends on the structure of your data. If, for example, you always store consecutive, ascending numbers starting from 1 and only the length differs, that will be hard to optimize. If the values are completely random over the entire int range, the first number alone will do the job well. How many numbers deliver the best ratio for you is something you find out by measuring, I'd say.
In the end, what you need is a good ratio between collisions (objects put into the same bucket) and calculation time. Generated implementations typically maximize the calculation time, providing the human developer with the pleasure of much room for improvement. ;-)
And concerning changing a contained value: java.util.HashSet (respectively, the HashMap it holds) will calculate its own hash based upon yours, and cache it. So an object contained in a HashSet can't be found again once it has changed enough that its hash changed.

PriorityQueue.poll() calls compareTo()?

I am implementing a PriorityQueue in my program. For that I have also implemented compareTo(). The compareTo() is being called when I perform add(), which is expected. But it is also called when I perform poll(). I thought that the function of poll() is just to remove the head. Why does it need to call compareTo()?
A priority queue is often implemented with a heap. Part of poll()ing requires restructuring the heap, which requires the heap to compare elements... hence compareTo(). This is just a guess, though (i.e., I have not dug into the source code to verify my claim).
Here's a quick search on how priority queues are implemented using heaps if you are interested: http://pages.cs.wisc.edu/~vernon/cs367/notes/11.PRIORITY-Q.html#imp
Actually, just for fun, I'll describe how this works in a non-rigorous fashion. A heap is a tree satisfying the heap property: parents are always less than or equal to their children (min-heap), or parents are always at least as large as their children (max-heap). PriorityQueue is a min-heap, so poll() removes the root (make sure you understand this). But what happens to the tree if you remove the root? It's no longer a tree...

So the way they fix this is by moving the root of the tree to a leaf node (where it can be plucked without destroying the tree/invalidating the heap property), and putting some other node in the root. But which node do you put into the root? Intuitively you might think they'd put the left or right child of the root (those are "almost as small as the original root"). You can do that, but you'd then need to fix the subtree rooted at that child (and the code is ugly). Instead they do the same thing (conceptually) but do it slightly differently to make the code nicer. In particular, they pluck a leaf node and stick it in the root (generally you swap the root and the leaf node to do both steps simultaneously).

However, the heap property is no longer necessarily satisfied (the leaf node we stuck in the root could be quite large!). To fix this, you "bubble down" the new root until you get it to its correct location. Specifically, you compare the new root with the left and right children and keep swapping (if the parent is larger than at least one of the children) until the heap property is satisfied. Notice that this swapping will indeed lead to a valid heap (you can prove this, but it's intuitive).
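One way to see the bubble-down comparisons empirically is a comparator that counts its own invocations (a sketch; the exact counts are implementation details of the JDK's heap, so only the fact that poll() adds comparisons is asserted):

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;

// The counter keeps climbing on poll(), not just on add(), because
// poll() sifts the last leaf down from the root position.
public class CountingPollDemo {
    public static void main(String[] args) {
        AtomicInteger comparisons = new AtomicInteger();
        PriorityQueue<Integer> pq = new PriorityQueue<>((a, b) -> {
            comparisons.incrementAndGet();
            return Integer.compare(a, b);
        });

        for (int i = 10; i >= 1; i--) pq.add(i);
        int afterAdds = comparisons.get();

        pq.poll(); // triggers sift-down, hence more comparisons
        System.out.println("comparisons during poll(): "
                + (comparisons.get() - afterAdds));
    }
}
```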
Everything is in the JavaDoc (emphasis mine):
An unbounded priority queue based on a priority heap.
And in the source code of poll() you'll find:
public E poll() {
    // ...
    if (s != 0)
        siftDown(0, x);
    return result;
}
Where siftDown() is:
/**
 * Inserts item x at position k, maintaining heap invariant by
 * demoting x down the tree repeatedly until it is less than or
 * equal to its children or is a leaf.
 * [...]
 */
private void siftDown(int k, E x) {
    if (comparator != null)
        siftDownUsingComparator(k, x);
    else
        siftDownComparable(k, x);
}
The JavaDoc comment on siftDown() is crucial; read it carefully. Basically, the underlying implementation of PriorityQueue uses a heap, which has to be restructured every time you modify it by polling.
Why does this bother you? compareTo() should be a lightweight, idempotent, side-effect-free method, like equals(). You shouldn't make any assumptions about when or how often it is called.

Updating Java PriorityQueue when its elements change priority

I'm trying to use a PriorityQueue to order objects using a Comparator.
This can be achieved easily, but the objects' fields (with which the comparator calculates priority) may change after the initial insertion. Most people have suggested the simple solution of removing the object, updating the values, and reinserting it, as this is when the priority queue's comparator is put into action.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
You have to remove and re-insert, as the queue works by putting new elements in the appropriate position when they are inserted. This is much faster than the alternative of finding the highest-priority element every time you pull out of the queue. The drawback is that you cannot change the priority after the element has been inserted. A TreeMap has the same limitation (as does a HashMap, which also breaks when the hashcode of its elements changes after insertion).
If you want to write a wrapper, you can move the comparison code from enqueue to dequeue. You would not need to sort at enqueue time anymore (because the order it creates would not be reliable anyway if you allow changes).
But this will perform worse, and you want to synchronize on the queue if you change any of the priorities. Since you need to add synchronization code when updating priorities, you might as well just dequeue and enqueue (you need the reference to the queue in both cases).
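A minimal sketch of the remove-and-re-insert approach described above (the {id, priority} int-array representation is purely illustrative):

```java
import java.util.PriorityQueue;

// Update an element's priority by taking it out, mutating it while it
// is OUTSIDE the queue, and adding it back.
public class ReinsertDemo {
    public static void main(String[] args) {
        PriorityQueue<int[]> pq =
                new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        int[] urgent = {1, 10}; // {id, priority}
        int[] other  = {2, 5};
        pq.add(urgent);
        pq.add(other);

        pq.remove(urgent);   // O(n): linear scan to find the element
        urgent[1] = 1;       // safe to mutate only while out of the queue
        pq.add(urgent);      // O(log n): sifts to its new position

        System.out.println(pq.peek()[0]); // id 1 is now at the head
    }
}
```

Note that mutating the priority while the element is still inside the queue would silently corrupt the heap order, which is exactly the limitation the answer above describes.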
I don't know if there is a Java implementation, but if you're changing key values a lot, you can use a Fibonacci heap, which has O(1) amortized cost to decrease a key value of an entry in the heap, rather than the O(log n) of an ordinary heap.
One easy solution you can implement is to just add the element to the priority queue again. It will not change the way you extract elements; it consumes more space, but not enough to affect your running time.
To see this, consider the Dijkstra implementation below:
public int[] dijkstra() {
    int distance[] = new int[this.vertices];
    int previous[] = new int[this.vertices];
    for (int i = 0; i < this.vertices; i++) {
        distance[i] = Integer.MAX_VALUE;
        previous[i] = -1;
    }
    distance[0] = 0;
    previous[0] = 0;
    PriorityQueue<Node> pQueue = new PriorityQueue<>(this.vertices, new NodeComparison());
    addValues(pQueue, distance);
    while (!pQueue.isEmpty()) {
        Node n = pQueue.remove();
        List<Edge> neighbours = adjacencyList.get(n.position);
        for (Edge neighbour : neighbours) {
            if (distance[neighbour.destination] > distance[n.position] + neighbour.weight) {
                distance[neighbour.destination] = distance[n.position] + neighbour.weight;
                previous[neighbour.destination] = n.position;
                pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
            }
        }
    }
    return previous;
}
Here our interest is in line
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
I am not changing the priority of a particular node by removing it and adding it again; rather, I am just adding a new node for the same vertex with a different priority.
Now, at extraction time, I will always get this node first, because I have implemented a min-heap here: the stale node with the greater value (lower priority) will always be extracted afterwards, and by then all its neighboring nodes will already have been relaxed.
Without reimplementing the priority queue yourself (that is, using only java.util.PriorityQueue), you have essentially two main approaches:
1) Remove and put back
Remove element then put it back with new priority. This is explained in the answers above. Removing an element is O(n) so this approach is quite slow.
2) Use a Map and keep stale items in the queue
Keep a HashMap of item -> priority. The keys of the map are the items (without their priority) and the values of the map are the priorities.
Keep it in sync with the PriorityQueue (i.e. every time you add or remove an item from the Queue, update the Map accordingly).
Now when you need to change the priority of an item, simply add the same item to the queue with a different priority (and update the map, of course). When you poll an item from the queue, check whether its priority is the same as in your map. If not, ditch it and poll again.
If you don't need to change the priorities too often, this second approach is faster. Your heap will be larger and you might need to poll more times, but you don't need to find your item.
The 'change priority' operation would be O(f(n) log n*), with f(n) the number of 'change priority' operations per item and n* the actual size of your heap (which is n·f(n)).
I believe that if f(n) is O(n/log n) (for example f(n) = O(sqrt(n))), this is faster than the first approach.
Note: in the explanation above, by "priority" I mean all the variables that are used in your Comparator. Also, your items need to implement equals and hashCode, and neither method should use the priority variables.
It depends a lot on whether you have direct control of when the values change.
If you know when the values change, you can remove and reinsert (which is in fact fairly expensive, as removing requires a linear scan over the heap!).
Alternatively, you can use an UpdatableHeap structure (not in stock Java, though) for this situation. Essentially, that is a heap that tracks the position of its elements in a hashmap; this way, when the priority of an element changes, it can repair the heap. Third, you can look for a Fibonacci heap, which does the same.
Depending on your update rate, a linear scan / quicksort / quickselect each time might also work. In particular, if you have many more updates than pulls, this is the way to go. Quickselect is probably best if you have batches of updates and then batches of pull operations.
To trigger a reheapify of the head, try this:
if (!priorityQueue.isEmpty()) {
    priorityQueue.add(priorityQueue.remove());
}
Something I've tried that works so far: peek to see whether the reference to the object you're changing is the same as the head of the PriorityQueue. If it is, poll(), make the change, then re-insert; otherwise you can change it without polling, because when the head is eventually polled, the heap is heapified anyway.
DOWNSIDE: This changes the ordering of objects with the same priority.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
It depends on the definition of "better" and the implementation of the wrapper.
If the implementation of the wrapper is to re-insert the value using the PriorityQueue's .remove(...) and .add(...) methods,
it's important to point out that .remove(...) runs in O(n) time.
Depending on the heap implementation,
updating the priority of a value can be done in O(log n) or even O(1) time,
therefore this wrapper suggestion may fall short of common expectations.
If you want to minimize your effort to implement,
as well as the risk of bugs of any custom solution,
then a wrapper that performs re-insert looks easy and safe.
If you want the implementation to be faster than O(n),
then you have some options:
Implement a heap yourself. The Wikipedia entry describes multiple variants with their properties. This approach is likely to get you the best performance; at the same time, the more code you write yourself, the greater the risk of bugs.
Implement a different kind of wrapper: handle updating the priority by marking the entry as removed and adding a new entry with the revised priority.
This is relatively easy to do (less code), see below, though it has its own caveats.
I came across the second idea in Python's documentation,
and applied it to implement a reusable data structure in Java (see caveats at the bottom):
public class UpdatableHeap<T> {
    private final PriorityQueue<Node<T>> pq =
            new PriorityQueue<>(Comparator.comparingInt(node -> node.priority));
    private final Map<T, Node<T>> entries = new HashMap<>();

    public void addOrUpdate(T value, int priority) {
        if (entries.containsKey(value)) {
            entries.remove(value).removed = true;
        }
        Node<T> node = new Node<>(value, priority);
        entries.put(value, node);
        pq.add(node);
    }

    public T pop() {
        while (!pq.isEmpty()) {
            Node<T> node = pq.poll();
            if (!node.removed) {
                entries.remove(node.value);
                return node.value;
            }
        }
        throw new IllegalStateException("pop from empty heap");
    }

    public boolean isEmpty() {
        return entries.isEmpty();
    }

    private static class Node<T> {
        private final T value;
        private final int priority;
        private boolean removed = false;

        private Node(T value, int priority) {
            this.value = value;
            this.priority = priority;
        }
    }
}
Note some caveats:
Entries marked removed stay in memory until they are popped
This can be unacceptable in use cases with very frequent updates
The internal Node wrapped around the actual values is an extra memory overhead (constant per entry). There is also an internal Map, mapping all the values currently in the priority queue to their Node wrapper.
Since the values are used in a map, users must be aware of the usual cautions when using a map, and make sure to have appropriate equals and hashCode implementations.
