Question: Given a list of integers (duplicates allowed) and an integer N, remove the duplicates from the list and find the N-th largest element in the modified list. Implement at least two different solutions that find the N-th largest element with O(N*log(N)) average time complexity in Big-O notation, where N is the number of elements in the list.
As I understand it, I can use merge sort, heap sort, or quicksort on the provided integer list (duplicates included) to find the N-th largest element with O(N*log(N)) average time complexity. Is that correct?
Also, what about the duplicates in the list? If I just add an extra bit of code to the above-mentioned algorithms to remove duplicates, won't that affect the O(N*log(N)) average time complexity? After all, merge sort, heap sort, and quicksort only sort the list; they don't delete duplicates.
I am not looking for code, just tips and ideas about how to proceed with the question. I am using Java; also, are there any predefined classes/methods in Java that I can use to accomplish the task, rather than coding merge sort, heap sort, or quicksort on my own?
My aim is to complete the task while keeping the O(N*log(N)) average time complexity in mind.
You first sort the list and then iterate over it to remove all the duplicates. A second method might be to loop over all elements while keeping a tree structure: for each element, first check whether it is already present in the tree (O(log(n))); if it is, skip it as a duplicate, otherwise add it to the tree (O(log(n))). This makes the whole algorithm O(n*log(n)).
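A minimal sketch of that second method, using Java's TreeSet as the tree structure (the method name dedupe is just illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

static List<Integer> dedupe(List<Integer> input) {
    TreeSet<Integer> seen = new TreeSet<>();   // balanced tree: O(log n) contains/add
    List<Integer> result = new ArrayList<>();
    for (int x : input) {
        if (!seen.contains(x)) {               // O(log n) membership check
            seen.add(x);                       // O(log n) insertion
            result.add(x);                     // keeps the first occurrence, in original order
        }
    }
    return result;
}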
Since you added java as a tag:
List<Integer> listWithoutDuplicates = new ArrayList<>(new TreeSet<>(originalList));
This will construct a new, sorted list without duplicates. Regarding time complexity, this adds up to O(n*log n) (the complexity of constructing a balanced binary tree with n elements). As a bonus, you can try replacing TreeSet with LinkedHashSet to preserve the elements' original order (among duplicates, the first one is kept).
About the last part, fetching the N-th largest element by index is straightforward afterwards, since the list is sorted.
Of course, if it is a homework question, you may need to implement your own red-black tree to get the same result.
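Putting it together, a minimal sketch of the whole task (the method name and the 1-based n are my assumptions):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Returns the n-th largest element (n is 1-based) after removing duplicates.
static int nthLargest(List<Integer> originalList, int n) {
    // TreeSet removes duplicates and sorts ascending, O(m*log m) overall.
    List<Integer> sorted = new ArrayList<>(new TreeSet<>(originalList));
    return sorted.get(sorted.size() - n);  // n-th largest = n-th element from the end
}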
Related
Which would be faster when using a LinkedList? I haven't studied sorting and searching yet. I was thinking adding them directly in a sorted manner would be faster, as manipulating the nodes and pointers afterward intuitively seems to be very expensive. Thanks.
In general, using a linked list for this has many difficulties and is expensive.
i) In the usual case, if you want to add values to a sorted linked list, you must go through the following steps:
If the linked list is empty, make the node the head and return it.
If the value of the node to be inserted is smaller than the value of the head node, insert the node at the start and make it the head.
Otherwise, in a loop, find the appropriate node after which the input node is to be inserted: start from the head and keep moving until you reach a node GN whose value is greater than the input node's. The node just before GN is the appropriate node.
Insert the node after the appropriate node found in step 3.
The time complexity in this case is O(n) for each element; see the sketch below.
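A minimal sketch of these steps, assuming a simple hand-rolled Node class (illustrative; this is not java.util.LinkedList):

// Minimal singly linked list node (illustrative).
class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

// Inserts a node into an already-sorted list and returns the (possibly new) head.
static Node sortedInsert(Node head, Node node) {
    if (head == null || node.value < head.value) { // steps 1 and 2: empty list or new head
        node.next = head;
        return node;
    }
    Node prev = head;                              // step 3: find the node to insert after
    while (prev.next != null && prev.next.value <= node.value) {
        prev = prev.next;
    }
    node.next = prev.next;                         // step 4: splice the node in
    prev.next = node;
    return head;
}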
But don't forget that the linked list must already be sorted before you start adding elements in this manner; this means you may have to pay an extra cost to sort the linked list first.
ii) If instead you add elements to the end of the linked list and then sort them, the situation depends on the sorting algorithm: for example, you can use merge sort with O(n*log n) time complexity, or even insertion sort with O(n^2) time complexity.
Note: how to sort is a separate issue.
As you have already noticed, it is not easy to say which method is faster; it depends on the conditions of the problem, such as the number of elements in the linked list, the number of elements to be added, and whether the cost of the initial sorting is counted or not.
You have presented two options:
Keep the linked list sorted by inserting each new node at its sorted location.
Insert each new node at the end (or start) of the linked list and, when all nodes have been added, sort the linked list.
The worst case time complexity for the first option occurs when the list has each time to be traversed completely to find the position where a new node is to be inserted. In that case the time complexity is O(1+2+3+...+n) = O(n²)
The worst-case time complexity for the second option is O(n) for inserting n elements and then O(nlogn) for sorting the linked list with a good sorting algorithm. So in total the sorting algorithm determines the overall worst-case time complexity, i.e. O(nlogn).
So the second option has the better time complexity.
Merge sort has a complexity of O(nlogn) and is suitable for linked lists.
If your data has a limited range, you can use radix sort and achieve O(kn) complexity, where k is log(range size).
In practice it is better to insert elements in a vector (dynamic array), then sort the array, and finally turn the array into a list. This will almost certainly give better running times.
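In Java terms, a minimal sketch of that approach (the method name is illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

static LinkedList<Integer> buildSortedList(List<Integer> input) {
    List<Integer> array = new ArrayList<>(input); // dynamic array: contiguous, cache-friendly
    Collections.sort(array);                      // O(n*log n) sort (TimSort)
    return new LinkedList<>(array);               // convert back to a linked list if required
}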
Question:
Which sorting algorithm will be fastest when run on an array that happens to already be in order?
(A) It is not possible to know which will be fastest.
(B) selection sort
(C) insertion sort
(D) binary sort
(E) All of these algorithms will run at the same speed.
I have been doing some research for a homework assignment and have been getting conflicting answers. Some places say it is insertion, while some say both are equal and yet others say it can't be determined. Very confused right now, would appreciate some help.
(C) insertion sort
It is normally the fastest and easiest to implement when an array is already nearly or completely sorted, as it performs fewer operations.
Selection sort will still do pairwise comparisons, and binary sort will also be slightly slower.
I would say insertion sort, because:
Insertion sort is a simple sorting algorithm: it builds the final sorted array one item at a time. It is much less efficient on large lists than other sorting algorithms.
Advantages of Insertion Sort:
1) It is very simple.
2) It is very efficient for small data sets.
3) It is stable; i.e., it does not change the relative order of elements with equal keys.
4) In-place; i.e., only requires a constant amount O(1) of additional memory space.
Insertion sort iterates through the list by consuming one input element at each repetition, and growing a sorted output list. On a repetition, insertion sort removes one element from the input data, finds the location it belongs within the sorted list, and inserts it there. It repeats until no input elements remain.
I read this at http://www.java2novice.com/java-sorting-algorithms/insertion-sort/
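To see why a sorted input is the best case, here is a standard insertion sort sketch; on an already-sorted array the inner loop exits immediately, so there is only one comparison per element:

// Standard insertion sort: O(n) on an already-sorted array, O(n^2) in the worst case.
static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int key = a[i];
        int j = i - 1;
        // Shift larger elements to the right; on sorted input this loop body never runs.
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}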
I have to write a program, as efficiently as possible, that will insert given nodes into a sorted LinkedList. I'm thinking of how binary search is faster than linear search in the average and worst case, but when I Googled it, the runtime was O(nlogn)? Should I do linear search on a singly linked list or binary search on a doubly linked list, and why is that one (the one to choose) faster?
Also, how is the binary search algorithm worse than O(logn) for a doubly linked list?
(Please, no one recommend SkipList; I think they're against the rules since we have another implementation strictly for that data structure.)
You have two choices.
Linearly search an unordered list. This is O(N).
Linearly search an ordered list. This is also O(N) but it is twice as fast, as on average the item you search for will be in the middle, and you can stop there if it isn't found.
You don't have the choice of binary searching it, as you don't have direct access to elements of a linked list.
But if you consider search to be a rate-determining step, you shouldn't use a linked list at all: you should use a sorted array, a heap, a tree, etc.
Binary search is very fast on arrays simply because it's very fast and simple to access the middle index between any two given indexes of elements in the array. This makes its running time complexity O(log n) while taking constant O(1) space.
For a linked list it's different: in order to access the middle element we need to traverse the list node by node, so finding the middle node runs in O(n).
Thus binary search is slow on linked lists and fast on arrays.
Binary search is possible using a skip list. You will spend about twice as many pointers as a plain linked list if you add skip links at distances 2, 4, 8, ..., 2^n at the same time, and then you can get O(log n) for each search.
If the data stored in each node is quite big, applying this will be very efficient.
You can read more on https://www.geeksforgeeks.org/skip-list/amp/
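In the Java class library, ConcurrentSkipListSet is a ready-made skip-list-backed sorted set; a minimal usage sketch:

import java.util.concurrent.ConcurrentSkipListSet;

public class SkipListDemo {
    public static void main(String[] args) {
        // A skip list keeps elements sorted with expected O(log n) add/contains/remove.
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        set.add(42);
        set.add(7);
        set.add(99);
        System.out.println(set.contains(42)); // true, expected O(log n) lookup
        System.out.println(set.first());      // 7, the smallest element
    }
}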
So basically a naive binary search on a linked list is O(n log n), because for each of the log n splits of the searched set you may need to traverse up to n nodes from the beginning to reach the middle. But this is only true if you traverse the linked list from the beginning every time.
If you keep your position in the list and start each traversal from the previous middle rather than the head, the total traversal work drops to O(n); to genuinely get O(log n) per search you need extra structure, such as the skip links described above.
I'm comparing insertion sort with quicksort. I've figured out why quicksort is slower on an almost-sorted array, but I can't figure out why insertion sort is so much quicker; surely it still has to compare nearly as many elements in the array?
The reason that insertion sort is faster on sorted or nearly-sorted arrays is that when it's inserting elements into the sorted portion of the array, it barely has to move any elements at all. For example, if the sorted portion of the array is 1 2 and the next element is 3, it only has to do one comparison -- 2 < 3 -- to determine that it doesn't need to move 3 at all. As a result, insertion sort on an already-sorted array is linear-time, since it only needs to do one comparison per element.
It depends on several factors. Insertion sort will be more efficient than quicksort on small data sets. It also depends on your backing list implementation (LinkedList or ArrayList). Lastly, it also depends on whether there is any kind of ordering to the input data. For example, if your input data is already sorted in the opposite order and you use a LinkedList, the insertion sort will be blazing fast.
Quicksort has its worst case (O(n^2) time complexity) on already-sorted arrays when a naive pivot choice, such as the first or last element, is used (see the quicksort entry on Wikipedia).
Depending on the choice of the pivot element, this can be alleviated somewhat, but at the same time, the best case for insertion sort is exactly the pre-sorted case (it has O(n) time complexity for such inputs), so it is going to beat quicksort for this particular use case.
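For illustration, a sketch of one common mitigation: choosing a random pivot, so that a sorted input is no longer a systematic worst case:

import java.util.Random;

// Quicksort with a random pivot; expected O(n*log n) even on sorted input.
static final Random RND = new Random();

static void quicksort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int p = lo + RND.nextInt(hi - lo + 1); // random pivot avoids the sorted worst case
    swap(a, p, hi);                        // move the pivot to the end
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; j++) {        // Lomuto partition
        if (a[j] < pivot) swap(a, i++, j);
    }
    swap(a, i, hi);                        // place the pivot in its final position
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}

static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}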
Insertion sort runs in linear time on a sorted array because, for each element, it only has to do one comparison, with the previous element. As a result, insertion sort moves linearly through a sorted array, giving O(n) complexity.
Compare this to quicksort, where the complexity is O(n^2) in this case.
Sorted inputs are best case for insertion sort (O(n)) and worst case for quick sort (O(n^2)).
It all has to do with the complexity, which is determined by the number of key comparisons, the basic operation in both algorithms.
When you look at both algorithms, you find that on sorted input insertion sort makes only about n comparisons: when we insert an element, we only compare it with the element to its left and we are done. In quicksort, on the other hand, the pivot element has to be compared with all the elements to its left, and the array shrinks by only a constant (one) each time, leading to approximately n^2 comparisons.
I am optimizing an implementation of a sorted LinkedList.
To insert an element, I traverse the list and compare each element until I have the correct index, then break the loop and insert.
I would like to know if there is any other way to insert the element at the same time as traversing the list, to reduce the insert from O(n + (n capped at size()/2)) to O(n).
A ListIterator is almost what I'm after because of its add() method, but unfortunately, in the case where there are elements in the list equal to the insert, the insert has to be placed after them in the list. To implement this, ListIterator would need a peek(), which it doesn't have.
edit: I have my answer, but will add this anyway since a lot of people haven't understood correctly:
I am searching for an insertion point AND inserting, which combined is higher than O(n)
You may consider a skip list, which is implemented using multiple linked lists at varying granularity. E.g. the linked list at level 0 contains all items, level 1 only links to every 2nd item on average, level 2 only to every 4th item on average, etc. Searching starts from the top level and gradually descends to lower levels until it finds an exact match. This logic is similar to a binary search. Thus search and insertion are O(log n) operations.
A concrete example in the Java class library is ConcurrentSkipListSet (although it may not be directly usable for you here).
I'd favor Péter Török's suggestion, but I'd still like to add something for the iterator approach:
Note that ListIterator provides a previous() method to iterate through the list backwards. Thus first iterate until you find the first element that is greater and then go to the previous element and call add(...). If you hit the end, i.e. all elements are smaller, then just call add(...) without going back.
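A minimal sketch of that approach (the method name sortedInsert is illustrative). Note that ListIterator.add() inserts before the element that the next call to next() would return, which is why stepping back once places the new value after any equal elements:

import java.util.List;
import java.util.ListIterator;

// Inserts value into an already-sorted list, after any equal elements, in one pass.
static void sortedInsert(List<Integer> list, int value) {
    ListIterator<Integer> it = list.listIterator();
    while (it.hasNext()) {
        if (it.next() > value) { // first element strictly greater than value
            it.previous();       // step back so add() inserts before it
            break;
        }
    }
    it.add(value);               // if we reached the end, this simply appends
}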
I have my answer, but will add this anyway since a lot of people haven't understood correctly: I am searching for an insertion point AND inserting, which combined is higher than O(n).
You require maintaining a collection of (possibly) non-unique elements that can be iterated in an order given by an ordering function. This can be achieved in a variety of ways. (In the following, I use "total insertion cost" to mean the cost of inserting a number (N) of elements into an initially empty data structure.)
A singly or doubly linked list offers O(N^2) total insertion cost (whether or not you combine the steps of finding the position and doing the insertion!) and O(N) iteration cost.
A TreeSet offers O(NlogN) total insertion cost and O(N) iteration cost, but it has the restriction of not allowing duplicates.
A tree-based multiset (e.g. TreeMultiset) has the same complexity as a TreeSet, but allows duplicates.
A skip-list data structure also has the same complexity as the previous two.
Clearly, the complexity measures say that a data structure that uses a linked list performs the worst as N gets large. For this particular group of requirements, a well-implemented tree-based multiset is probably the best, assuming there is only one thread accessing the collection. If the collection is heavily used by many threads (and it is a set), then a ConcurrentSkipListSet is probably better.
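For illustration, a minimal sketch using Guava's TreeMultiset (this assumes the Guava library is on the classpath):

import com.google.common.collect.TreeMultiset;

public class MultisetDemo {
    public static void main(String[] args) {
        // Tree-based multiset: O(log n) insertion, duplicates allowed, sorted iteration.
        TreeMultiset<Integer> multiset = TreeMultiset.create();
        multiset.add(5);
        multiset.add(3);
        multiset.add(5);              // the duplicate is kept, unlike with TreeSet
        System.out.println(multiset); // prints the elements in sorted order, with counts
    }
}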
You also seem to have a misconception about how "big O" measures combine. If I have one step of an algorithm that is O(N) and a second step that is also O(N), then the two steps combined are STILL O(N), not "more than O(N)". You can derive this from the definition of "big O". (I won't bore you with the details, but the math is simple.)