I have to write a program, as efficiently as possible, that inserts given nodes into a sorted LinkedList. I'm thinking of how binary search is faster than linear search in the average and worst case, but when I Googled it, the runtime given was O(n log n)? Should I do a linear search on a singly linked list or a binary search on a doubly linked list, and why is the one I should choose faster?
Also, how is the binary search algorithm more than O(log n) for a doubly linked list?
(Please don't recommend a SkipList; I think they're against the rules since we have a separate implementation strictly for that data structure.)
You have two choices.
Linearly search an unordered list. This is O(N).
Linearly search an ordered list. This is also O(N), but unsuccessful searches are about twice as fast: on average the place where the item would be is in the middle, and you can stop there as soon as you pass it (a sketch of this early exit appears at the end of this answer).
You don't have the choice of binary searching it, as you don't have direct access to elements of a linked list.
But if you consider search to be a rate-determining step, you shouldn't use a linked list at all: you should use a sorted array, a heap, a tree, etc.
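For the ordered case, the early exit looks something like this. This is only a minimal sketch, assuming a hand-rolled singly linked node type with int value and Node next fields (not java.util.LinkedList):

    // Early-terminating linear search on a sorted singly linked list.
    // Node is a hypothetical class with fields: int value; Node next;
    static boolean contains(Node head, int target) {
        for (Node current = head; current != null; current = current.next) {
            if (current.value == target) return true;
            if (current.value > target) return false; // list is sorted: no later node can match
        }
        return false;
    }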
Binary search is very fast on arrays simply because accessing the middle index between any two given indexes is fast and simple. This makes its running time O(log n) while using constant O(1) space.
For a linked list it's different: to access the middle element we need to traverse the list node by node, so finding the middle node takes O(n).
Thus binary search is slow on linked lists and fast on arrays.
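For reference, a minimal iterative binary search on a sorted int array looks like this (a generic sketch, not tied to any particular library):

    // O(log n) time, O(1) space: the middle index is computed in constant time.
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;       // avoids overflow of (lo + hi)
            if (a[mid] == target) return mid;
            if (a[mid] < target) lo = mid + 1;  // target is in the upper half
            else hi = mid - 1;                  // target is in the lower half
        }
        return -1; // not found
    }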
Binary search is possible by using a skip list. You will spend about twice as many pointers as a plain linked list if you keep skip links of lengths 2, 4, 8, ..., 2^n at the same time, and then each search takes O(log n).
If the data stored in each node is quite big, applying this will be very efficient.
You can read more on https://www.geeksforgeeks.org/skip-list/amp/
So basically, binary search on a linked list is O(n log n) if done naively, because each of the O(log n) probes requires traversing up to n nodes from the head to reach the middle of the current search range. But this is only true if you restart the traversal from the beginning of the list every time.
If you figure out some method (which is possible!) to start each probe from somewhere else, like the middle of the current search range, then you eliminate the need to traverse the whole list for every probe: the traversal distances halve each step (n/2 + n/4 + ...), so the total traversal work drops to O(n) while the number of comparisons stays O(log n).
Which would be faster when using a LinkedList: inserting elements directly in sorted order, or inserting them all and sorting afterward? I haven't studied sorting and searching yet. I was thinking adding them directly in a sorted manner would be faster, since manipulating the nodes and pointers afterward intuitively seems very expensive. Thanks.
In general, working with a linked list involves many difficulties and can be expensive.
i) In the usual case, if you want to add values to an already-sorted linked list, you go through the following steps:
1. If the linked list is empty, make the new node the head and return it.
2. If the value of the node to be inserted is smaller than the value of the head node, insert the node at the start and make it the head.
3. Otherwise, in a loop, find the appropriate node after which the input node is to be inserted: start from the head and keep moving until you reach a node GN whose value is greater than the input node's value. The node just before GN is the appropriate node.
4. Insert the node after the appropriate node found in step 3.
The time complexity in this case is O(n) for each element.
But don't forget that the list must already be sorted before you add elements in this manner; if it isn't, you have to pay the extra cost of sorting it first. A minimal sketch of this insertion is shown below.
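Here is that sketch, assuming a hand-rolled singly linked Node class rather than java.util.LinkedList:

    class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    static Node insertSorted(Node head, int value) {
        Node node = new Node(value);
        // Steps 1 and 2: empty list, or the new value belongs before the head
        if (head == null || value < head.value) {
            node.next = head;
            return node;                 // the new node becomes the head
        }
        // Step 3: walk until the next node's value is greater than the new value
        Node current = head;
        while (current.next != null && current.next.value <= value) {
            current = current.next;
        }
        // Step 4: splice the new node in after the appropriate node
        node.next = current.next;
        current.next = node;
        return head;
    }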
ii) But if you want to add elements to the end of the linked list and then sort them, the situation depends on the sorting method. For example, you can use merge sort, which has O(n log n) time complexity, or even insertion sort, with O(n^2) time complexity.
Note: how to sort is a separate issue.
As you have already noticed, it is not easy to say which method is faster; it depends on the conditions of the problem, such as the number of elements already in the linked list and the number of elements to be added, or whether the cost of the initial sorting is counted or not.
You have presented two options:
Keep the linked list sorted by inserting each new node at its sorted location.
Insert each new node at the end (or start) of the linked list and, when all nodes have been added, sort the linked list.
The worst-case time complexity for the first option occurs when the list has to be traversed completely each time to find the position where a new node is to be inserted. In that case the time complexity is O(1+2+3+...+n) = O(n²).
The worst-case time complexity for the second option is O(n) for inserting the n elements and then O(n log n) for sorting the linked list with a good sorting algorithm. So the sorting step determines the overall worst-case time complexity, i.e. O(n log n).
So the second option has the better time complexity.
Merge sort has a complexity of O(n log n) and is suitable for linked lists.
If your data has a limited range, you can use radix sort and achieve O(kn) complexity, where k is log(range size).
In practice it is better to insert elements in a vector (dynamic array), then sort the array, and finally turn the array into a list. This will almost certainly give better running times.
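A hedged sketch of that approach in Java, assuming plain integer elements (ArrayList plays the role of the vector):

    import java.util.*;

    List<Integer> buffer = new ArrayList<>();
    buffer.add(42);                                      // add all n elements, O(1) amortized each
    buffer.add(7);
    buffer.add(19);
    Collections.sort(buffer);                            // O(n log n)
    List<Integer> result = new LinkedList<>(buffer);     // one O(n) pass to build the list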
I am currently studying the time complexity of ArrayList, especially access and search, and I am a little confused about which one is which.
1. I know that the time complexity of access (when you know the index) is O(1).
2. Search on the ArrayList when it is sorted and you do not know the index is O(n)...?
3. Search on the ArrayList when it is unsorted and you do not know the index is O(n)...?
But are 2 and 3 correct? Should the answers for 2 and 3 be the same, or would a sorted vs. unsorted ArrayList change the time complexity?
When doing a linear search through an arraylist, you are looking for some element. Since you don't know what index the data is at, you have to go through every element until you find the item you are looking for.
In the worst case, you have to go all the way until the last element. Big Oh works as an upper bound on the algorithm. In the worst case, we have to go through all n elements, so the algorithm is O(n).
If the ArrayList were sorted, you could use a binary search, which is O(log n).
For #2, it depends on how you're doing the search. If you do a sequential search, e.g. using indexOf(Object o), then performance is O(n), regardless of whether the data is sorted or not.
Searching a sorted list or array can be done in O(log n) using a binary search algorithm.
Doing a binary search on a List can be done using Collections.binarySearch(). As the javadoc says:
This method runs in log(n) time for a "random access" list (which provides near-constant-time positional access). If the specified list does not implement the RandomAccess interface and is large, this method will do an iterator-based binary search that performs O(n) link traversals and O(log n) element comparisons.
Doing a binary search on an array can be done using Arrays.binarySearch().
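For example (the values here are only illustrative):

    import java.util.*;

    List<Integer> sortedList = Arrays.asList(2, 5, 9, 14, 21);
    int listIndex = Collections.binarySearch(sortedList, 14);   // 3

    int[] sortedArray = {2, 5, 9, 14, 21};
    int arrayIndex = Arrays.binarySearch(sortedArray, 5);       // 1

Both methods return the index of the key if it is found, and a negative value encoding the insertion point otherwise.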
I need to write a program that searches through a previously sorted linked list and I'm not sure which search would be more efficient.
All traversals of linked lists are in order, so a linear search is the best you can do, with an average case linear in the number of elements. If there are going to be lots of searches and you still need a linked list (instead of a random-access structure such as an array), consider a skip list, which lets you skip forward until you get near the desired element.
Linear search would be more efficient, and here's why.
In order to get to the kth location in a doubly linked list of size n, you have to iterate over at most n/2 elements.
If you were to apply binary search on top of that, then for each probe you'd still wind up walking to the k-th position, plus the work of performing the binary search itself.
O(n + log(n)) = O(n), which is equivalent to the performance of a linear search.
If your comparisons are cheap, the other answers are correct that linear and binary search are mostly equivalent (binary search has slightly higher expected number of node traversals, though the big-O is the same, O(n) traversals).
But this assumes that comparisons are a negligible cost relative to traversals, and that's not always a good assumption. If your comparisons are expensive, then binary search is still worth it, because binary search remains O(log n) in the number of comparisons, while linear search is O(n).
For example, suppose your comparison operation is something ridiculously expensive (say, the MD5 hash of file data, and you didn't cache it with the file names for whatever reason), and you've got a 1000 element list. Linear search means you're, on average, computing the MD5 hash of 500 files (in this case, you could probably reduce it a bit by choosing the end to start from based on whether the MD5 you're searching for begins with 0-7 or 8-f, but even so, it's O(n) comparisons divided by a constant factor). Binary search means at most 10 (or 11, can't be bothered with off-by-one errors) file reads and MD5 computations. If the files are large enough, that's the difference between taking ~1 second to run and taking ~50 seconds.
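To make the comparison-count point concrete, here is a sketch where String.CASE_INSENSITIVE_ORDER stands in for a genuinely expensive comparator (say, one that hashes file contents); the file names are purely illustrative:

    import java.util.*;

    List<String> files = new LinkedList<>(Arrays.asList("c.bin", "a.bin", "b.bin"));
    files.sort(String.CASE_INSENSITIVE_ORDER);
    // Even on a LinkedList, Collections.binarySearch performs only O(log n)
    // comparisons; the O(n) cost is in link traversals, not comparisons.
    int pos = Collections.binarySearch(files, "b.bin", String.CASE_INSENSITIVE_ORDER);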
Question: Given a list of integers (duplicates are allowed) and an integer N, remove the duplicates from the list and find the N-th largest element in the modified list. Implement at least two different solutions to find the N-th largest element with O(N*log(N)) average time complexity in Big-O notation, where N is the number of elements in the list.
According to my understanding, I can use Merge Sort, Heap Sort, or Quick Sort on the provided integer list with duplicates to find the N-th largest element with O(N*log(N)) average time complexity. Is that correct?
Also, what about the duplicates in the list? Do I just add an extra line of code to the above-mentioned algorithms to remove duplicates, and will that affect the O(N*log(N)) average time complexity, given that Merge Sort, Heap Sort, and Quick Sort only sort the list and do not delete duplicates?
I am not looking for any code, just tips and ideas about how to proceed with the question. I am using Java; are there any predefined classes/methods in Java that I can use to accomplish the task rather than coding Merge Sort, Heap Sort, or Quick Sort on my own?
My aim is to complete the task keeping in mind O(N*log(N)) average time complexity.
You first sort the list and then iterate over it to remove all the duplicates. A second method might be to loop over all the elements and keep a tree structure: for each element you first check whether it is already present in the tree, which is O(log n); if it is, skip it, otherwise add it to the tree, also O(log n). This makes the whole algorithm O(n log n).
Since you added java as a tag:
List<Integer> listWithoutDuplicates = new ArrayList<>(new TreeSet<>(originalList));
This will construct a new, sorted list without duplicates. Regarding time complexity, this adds up to O(n*log n) (the complexity of constructing a balanced binary search tree with n elements). As a bonus, you can try replacing TreeSet with LinkedHashSet to preserve the elements' original order (among duplicates, the first one is kept).
About the last part, retrieving the N-th largest element by index is straightforward afterwards.
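Sticking with the TreeSet idea above, a minimal end-to-end sketch (the sample values are only illustrative):

    import java.util.*;

    List<Integer> original = Arrays.asList(5, 3, 8, 8, 1, 5);
    // TreeSet removes duplicates and keeps elements in ascending order: O(n log n)
    List<Integer> deduped = new ArrayList<>(new TreeSet<>(original));   // [1, 3, 5, 8]
    int n = 2;                                                          // the 2nd largest
    int nthLargest = deduped.get(deduped.size() - n);                   // 5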
Of course, if it is a homework question, you may need to implement your own red-black tree to get the same result.
I'm accessing the minimum element of a binary tree lots of times. What implementations allow me to access the minimum element in constant time, rather than O(log n)?
Depending on your other requirements, a min-heap might be what you are looking for. It gives you constant time retrieval of the minimum element.
However you cannot do some other operations with the same ease as with a simple binary search tree, like determining if a value is in the tree or not. You can take a look at splay trees, a kind of self-balancing binary tree that provides improved access time to recently accessed elements.
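In Java, java.util.PriorityQueue is a binary min-heap, so a minimal example of the constant-time retrieval looks like this:

    import java.util.PriorityQueue;

    PriorityQueue<Integer> heap = new PriorityQueue<>();
    heap.add(7);                 // O(log n) per insertion
    heap.add(2);
    heap.add(9);
    int min = heap.peek();       // 2, retrieved in O(1)
    int removed = heap.poll();   // removing the minimum is O(log n)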
Find it once in O(log n) and then compare new values which you are going to add with this cached minimum element.
UPD: regarding how this can work if you delete the minimum element: you just spend O(log n) one more time and find the new one.
Let's imagine that you have 10,000,000,000,000 integers in your tree. Each element needs 4 bytes in memory, so the whole tree needs about 40 TB of disk space. The O(log n) time spent searching for the minimum element in this huge tree is about 43 operations. Of course they are not the simplest operations, but that's still almost nothing, even for a 20-year-old processor.
Of course, this is relevant if it's a real-world problem. If for some purpose (maybe academic) you need a real O(1) algorithm, then I'm not sure my approach can give you that performance without using additional memory.
This may sound silly, but if you mostly access the minimum element, and don't change the tree too much, maintaining a pointer to the minimal element on add/delete (on any tree) may be the best solution.
Walking the tree will always be O(log n). Did you write the tree implementation yourself? You can always simply stash a reference to the current lowest value element alongside your data structure and keep it updated when adding/removing nodes. (If you didn't write the tree, you could also do this by wrapping the tree implementation in your own wrapper object that does the same thing.)
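A minimal sketch of that wrapper idea; here TreeSet stands in for whatever tree implementation you are actually using, and the class and method names are only illustrative:

    import java.util.TreeSet;

    class MinCachingTree<T extends Comparable<T>> {
        private final TreeSet<T> tree = new TreeSet<>();   // stand-in for your own tree
        private T cachedMin;

        void add(T value) {
            tree.add(value);
            if (cachedMin == null || value.compareTo(cachedMin) < 0) {
                cachedMin = value;                          // keep the cache current on insert
            }
        }

        void remove(T value) {
            tree.remove(value);
            if (value.equals(cachedMin)) {                  // only refresh when the min is gone
                cachedMin = tree.isEmpty() ? null : tree.first();
            }
        }

        T min() {
            return cachedMin;                               // O(1)
        }
    }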
There is an implementation in TAOCP that uses the "spare" pointers in non-full nodes to complete a doubly linked list along the nodes in order (I don't recall the details right now, but I imagine you have to have a "has_child" flag for each direction to make it work).
With that and a start pointer, you could have the starting element in O(1) time.
This solution is not faster or more efficient than caching the minimum.
If by minimum element you mean the element with the smallest value, then you could use a TreeSet with a custom Comparator that sorts the items into the correct order to store the individual elements, and then just call SortedSet#first() or #last() to get the smallest/biggest value as efficiently as possible.
Note that inserting new items into a TreeSet is slightly slower compared to other Sets/Lists, but if you don't have a huge number of elements that constantly change, it shouldn't be a problem.
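For example, with natural ordering (pass a Comparator to the constructor if you need a custom order; the values are only illustrative):

    import java.util.*;

    TreeSet<Integer> set = new TreeSet<>(Arrays.asList(5, 1, 9));
    int smallest = set.first();   // 1
    int largest  = set.last();    // 9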
If you can use a little memory it sounds like a combined collection might work for you.
For instance, what you are looking for sounds a lot like a linked list. You can always get to the minimum element, but an insert or lookup of an arbitrary node might take longer because you have to do an O(n) lookup.
If you combine a linked list and a tree you might get the best of both worlds. To look up an item for get/insert/delete operations you would use the tree to find the element. The element's "holder" node would have to have ways to cross over from the tree to the linked list for delete operations. Also the linked list would have to be a doubly linked list.
So I think getting the smallest item would be O(1), and any arbitrary lookup/delete would be O(log N). I think even an insert would be O(log N), because you could find where to put it in the tree, look at the previous element, cross over to your linked list node from there, and then add the "next" link.
Hmm, this is starting to seem like a really useful data structure, perhaps a little wasteful in memory, but I don't think any operation would be worse than O(log N) unless you have to rebalance the tree.
If you upgrade / "upcomplex" your binary tree to a threaded binary tree, then you can get O(1) first and last elements.
You basically keep a reference to the current first and last nodes.
Immediately after an insert, if first's previous is non-null, then update first. Similarly for last.
Whenever you remove, you first check whether the node being removed is the first or last, and update the stored first/last appropriately.