So, I have several lists of word pairs and I need to sort them in ascending or descending order. The method I'm using right now is the insertion sort algorithm, and this seems to work fine for the smaller lists. But every time I try to sort a large list, it freezes, with no errors. I tried to see what was going on by debugging with printing out "a was swapped for b",
and you can see it start working, then it slows down and eventually stops, like the computer just said, "there are too many, I give up". My question is: is there something wrong with my code, or do I simply need to use a more efficient method, and if so, which one and what would it look like?
for (int j = 0; j < wordpair_list.size() - 1; j++) {
    for (int i = 0; i < wordpair_list.size() - 1; i++) {
        String wordA_1 = wordpair_list.get(i).getWordA();
        String wordA_2 = wordpair_list.get(i + 1).getWordA();
        // if the pair at i sorts before the pair at i + 1, swap them
        if (wordA_1.compareToIgnoreCase(wordA_2) < 0) {
            WordPair temp = wordpair_list.get(i);
            wordpair_list.set(i, wordpair_list.get(i + 1));
            wordpair_list.set(i + 1, temp);
        }
    }
}
that's for descending. All I do for ascending is change the '<' in the if statement to '>'.
I think you are performing bubble sort. As others have pointed out, get() and set() operations are expensive on linked lists.
I am not conversant with Java, but it appears that you can use a ListIterator to carry out bubble sort in O(N^2):
ListIterator listIterator(int index): Returns a list iterator over the elements in this list (in proper sequence), starting at the specified position in the list. Throws IndexOutOfBoundsException if the specified index is out of range (index < 0 || index > size()).
For bubble sort, you just need to swap the adjacent elements, so you can iterate through the list like an array and keep swapping if needed.
Moreover, you can skip the section of the list that is already sorted. Take a look at a good bubble sort algorithm.
http://en.wikipedia.org/wiki/Bubble_sort
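Here is a rough sketch of that iterator-based idea in Java (the class and method names are mine, not from the question): it walks the list with a ListIterator and swaps adjacent elements in place, so each pass touches every link once instead of doing O(N) get()/set() lookups.

import java.util.List;
import java.util.ListIterator;

public class IteratorBubbleSort {
    // Descending bubble sort over any List, using iterator-relative swaps
    // instead of index-based get(i)/set(i) calls.
    static void bubbleSortDescending(List<String> list) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            ListIterator<String> it = list.listIterator();
            String prev = it.hasNext() ? it.next() : null;
            while (it.hasNext()) {
                String curr = it.next();
                if (prev.compareToIgnoreCase(curr) < 0) { // '<' descending, '>' ascending
                    it.set(prev);   // overwrite curr's slot with prev
                    it.previous();  // step back over that slot
                    it.previous();  // step back over prev's old slot
                    it.set(curr);   // overwrite prev's slot with curr
                    it.next();      // re-advance past both slots
                    it.next();
                    swapped = true; // prev still holds the smaller value
                } else {
                    prev = curr;
                }
            }
        }
    }
}

The early exit via the swapped flag is the "skip the section that is already sorted" optimization mentioned above.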
Besides the fact that insertion sort is already an O(N^2) algorithm, access (both get and set) to an item in a linked list by index is an O(N) operation, which makes your code O(N^3), in other words extremely slow.
Basically, you have two options:
1. Copy the linked list into a temporary array, sort the array using your algorithm (array access by index is O(1), so the overall algorithm stays at O(N^2) unless you choose a better one), then create a new sorted linked list from the array (a sketch of this option follows the list).
2. Use some other algorithm that does not need indexed access. For example, it is actually possible to implement insertion sort without indexed operations, because you can swap two adjacent items in a linked list, though you will have to use a linked list implementation that gives you direct access to the "previous" and "next" links.
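For illustration, a minimal sketch of the first option (class and method names are mine): copy the list out to an array, run the same adjacent-swap sort there, and copy back.

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ArrayCopySort {
    static List<String> sortViaArray(List<String> linked) {
        String[] arr = linked.toArray(new String[0]); // O(N) copy out
        // same bubble-style pass as the original, but get/set are now O(1)
        for (int j = 0; j < arr.length - 1; j++) {
            for (int i = 0; i < arr.length - 1; i++) {
                if (arr[i].compareToIgnoreCase(arr[i + 1]) < 0) { // descending
                    String temp = arr[i];
                    arr[i] = arr[i + 1];
                    arr[i + 1] = temp;
                }
            }
        }
        return new LinkedList<>(Arrays.asList(arr)); // O(N) copy back
    }
}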
First of all, this is not insertion sort; it is closer to bubble sort. Second, there is nothing wrong with your code as such, but as expected for this sorting algorithm it is quadratic, so for larger input it may take a long time to finish (e.g. for 200,000 elements it may take several minutes).
EDIT: as you are using a List, the complexity is even higher, up to cubic, since set() on a linked list is not constant time. You may try to implement the algorithm with an array to avoid this added complexity.
Related
Which would be faster when using a LinkedList? I haven't studied sorting and searching yet. I was thinking adding them directly in a sorted manner would be faster, as manipulating the nodes and pointers afterward intuitively seems to be very expensive. Thanks.
In general, using a linked list comes with many difficulties and can be expensive.
i) In the usual case, if you want to add values to a sorted linked list, you must go through the following steps (a sketch follows the list):
1. If the linked list is empty, make the new node the head and return it.
2. If the value of the node to be inserted is smaller than the value of the head node, insert the node at the start and make it the head.
3. Otherwise, in a loop, find the appropriate node after which the input node is to be inserted: start from the head and keep moving until you reach a node GN whose value is greater than the input node's. The node just before GN is the appropriate node.
4. Insert the node after the appropriate node found in step 3.
The time complexity in this case is O(n) for each element,
but don't forget that the linked list must already be sorted before you add more elements in this manner; this means you may have to pay an extra cost to sort the linked list first.
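As a rough illustration of the steps above (the Node class and all names here are mine, assuming integer values):

class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

class SortedLinkedList {
    Node head;

    // Keeps the list in ascending order; each insert is O(n).
    void insertSorted(int value) {
        Node node = new Node(value);
        // steps 1 and 2: empty list, or new value belongs before the head
        if (head == null || value < head.value) {
            node.next = head;
            head = node;
            return;
        }
        // step 3: advance until the next node's value is greater (GN)
        Node cur = head;
        while (cur.next != null && cur.next.value <= value) {
            cur = cur.next;
        }
        // step 4: splice the new node in after the appropriate node
        node.next = cur.next;
        cur.next = node;
    }
}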
ii) But if you want to add elements to the end of the linked list and then sort them, the situation depends on the sorting method: for example, you can use merge sort, which has O(n*log n) time complexity, or even insertion sort, with O(n^2) time complexity.
Note: how to sort is a separate issue.
As you have already noticed, it is not easy to say which method is faster: it depends on the conditions of the problem, such as the number of elements in the linked list, the number of elements to be added, and whether or not the cost of the initial sort is counted.
You have presented two options:
Keep the linked list sorted by inserting each new node at its sorted location.
Insert each new node at the end (or start) of the linked list and, when all nodes have been added, sort the linked list.
The worst-case time complexity for the first option occurs when the list has to be traversed completely each time to find the position where a new node is to be inserted. In that case the time complexity is O(1+2+3+...+n) = O(n²).
The worst-case time complexity for the second option is O(n) for inserting the n elements, and then O(n log n) for sorting the linked list with a good sorting algorithm. So the sorting algorithm determines the overall worst-case time complexity, i.e. O(n log n).
So the second option has the better time complexity.
Merge sort has a complexity of O(n log n) and is suitable for linked lists.
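For reference, a compact sketch of merge sort on a singly linked list (reusing the hypothetical Node class from the sketch above); note that it needs no random access at all:

class ListMergeSort {
    static Node mergeSort(Node head) {
        if (head == null || head.next == null) return head;
        // split the list into two halves with slow/fast pointers
        Node slow = head, fast = head.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node second = slow.next;
        slow.next = null; // cut the list after the midpoint
        return merge(mergeSort(head), mergeSort(second));
    }

    static Node merge(Node a, Node b) {
        Node dummy = new Node(0), tail = dummy;
        while (a != null && b != null) {
            if (a.value <= b.value) { tail.next = a; a = a.next; }
            else { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b; // append the leftover half
        return dummy.next;
    }
}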
If your data has a limited range, you can use radix sort and achieve O(kn) complexity, where k is log(range size).
In practice it is better to insert elements in a vector (dynamic array), then sort the array, and finally turn the array into a list. This will almost certainly give better running times.
I need to compare a list of about 60,000 elements with a list of 935,000 elements, and if they match I need to perform a calculation.
I have already implemented everything needed, but the process takes about 40 minutes. I have a unique 7-digit number in both lists. The 935,000-element and the 60,000-element files are unsorted. Is it more efficient to sort the big list (and with which sort?) before I try to find the elements? Keep in mind that I have to do this calculation only once a month, so I don't need to repeat the process every day.
Basically which is faster:
unsorted linear search
sort list first and then search with another algorithm
Try it out.
You've got Collections.sort() which will do the heavy lifting for you, and Collections.binarySearch() which will allow you to find the elements in the sorted list.
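A minimal sketch of that combination (the class, method, and list names are mine; the lists hold the 7-digit numbers as Integers):

import java.util.Collections;
import java.util.List;

public class MonthlyMatch {
    static void findMatches(List<Integer> big, List<Integer> small) {
        Collections.sort(big); // O(n log n), done once
        for (Integer id : small) {
            if (Collections.binarySearch(big, id) >= 0) { // O(log n) per lookup
                // match found: perform the calculation here
            }
        }
    }
}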
When you search the unsorted list, you have to look through half the elements on average before you find the one you're looking for. When you do that 60,000 times on a list of 935,000 elements, that works out to about
935,000 * 1/2 * 60,000 = 28,050,000,000 operations
If you sort the list first (using mergesort) it will take about n * log(n) operations. Then you can use binary search to find elements in log(n) lookups for each of the 60,000 elements in your shorter list. That's about
935,000 * log(935,000) + log(935,000) * 60,000 = 19,735,434 operations
It should be a lot faster if you sort the list first, then use a search algorithm that takes advantage of the sorted list.
What would work quite well is to sort both lists and then iterate over both at the same time.
Use Collections.sort() to sort the lists.
You start with an index into each sorted list and basically walk straight through them. Start with the first element of the short list and compare it to the first elements of the long list. If you reach an element in the long list with a higher 7-digit number than the current number in the short list, increment your index into the short list. This way there is no need to check any element twice.
But actually, since you want to find the intersection of two lists, you might be better off just using longList.retainAll(shortList) to get the intersection of the two lists (ideally wrapping shortList in a HashSet first, so that each contains() check is cheap). Then you can perform whatever you want on the result, since there is no longer anything to search for.
You can sort both lists and compare them element by element, incrementing the first or second index (i and j in the example below) as needed:
List<Integer> first = ...;
List<Integer> second = ...;
Collections.sort(first);
Collections.sort(second);
int i = 0;
int j = 0;
while (i < first.size() && j < second.size()) {
    int cmp = first.get(i).compareTo(second.get(j));
    if (cmp == 0) {
        // action for equal elements, then move on in the first list
        i++;
    } else if (cmp > 0) {
        j++;
    } else {
        i++;
    }
}
The complexity of this code is O(n log(n)), where n is the size of the larger list.
Question: Given a list of integers (duplicates are allowed) and an integer N, remove the duplicates from the list and find the N-th largest element in the modified list. Implement at least two different solutions to find the N-th largest element with O(N*log(N)) average time complexity in Big-O notation, where N is the number of elements in the list.
According to my understanding, I can use merge sort, heap sort, or quicksort on the provided integer list with duplicates to find the N-th largest element with O(N*log(N)) average time complexity in Big-O notation. Is that correct?
Also, what about the duplicates in the list? Do I just add an extra line of code to the above-mentioned algorithms to remove duplicates, and will that affect the O(N*log(N)) average time complexity? Merge sort, heap sort, and quicksort will only sort the list, not delete duplicates.
I am not looking for any code, just tips and ideas about how to proceed with the question. I am using Java; also, are there any predefined classes/methods in Java that I can use to accomplish the task rather than coding merge sort, heap sort, or quicksort on my own?
My aim is to complete the task keeping in mind O(N*log(N)) average time complexity.
You could first sort the list and then iterate over it to remove all the duplicates. A second method might be to loop over all elements and keep a tree structure: for each element, first check whether it is already present in the tree, which is O(log(n)); if it is, skip it as a duplicate, otherwise add it to the tree structure, also O(log(n)). This makes the whole algorithm O(n*log(n)).
Since you added java as a tag:
List<Integer> listWithoutDuplicates = new ArrayList<>(new TreeSet<>(originalList));
This will construct a new, sorted list without duplicates. Regarding time complexity, this adds up to O(n*log n) (the complexity of constructing a balanced binary tree with n elements). As a bonus, you can try replacing TreeSet with LinkedHashSet to preserve the elements' original order instead (among duplicates, I believe the first one encountered is kept).
About the last part, removing an element by index is straightforward afterwards.
Of course, if it is a homework question, you may need to implement your own red-black tree to get the same result.
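Putting the non-homework version together, a minimal sketch (the class and helper names are mine):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class NthLargest {
    static int nthLargest(List<Integer> original, int n) {
        // TreeSet drops duplicates and keeps elements sorted ascending
        List<Integer> distinct = new ArrayList<>(new TreeSet<>(original));
        return distinct.get(distinct.size() - n); // n-th largest by index
    }

    public static void main(String[] args) {
        System.out.println(nthLargest(Arrays.asList(5, 1, 5, 3, 2), 2)); // prints 3
    }
}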
I'm considering the best possible way to remove duplicates from an (unsorted) array of strings; the array contains millions or tens of millions of strings. The array is already prepopulated, so the optimization goal is only removing dups, not preventing dups from initially populating it!
I was thinking along the lines of doing a sort and then binary search to get a log(n) search instead of an n (linear) search. This would give me n log n + n searches which, although better than an unsorted n^2 search, still seems slow. (I was also considering hashing, but am not sure about the throughput.)
Please help! I'm looking for an efficient solution that addresses both speed and memory, since there are millions of strings involved, without using the Collections API!
Until your last sentence, the answer seemed obvious to me: use a HashSet<String> or a LinkedHashSet<String> if you need to preserve order:
HashSet<String> distinctStrings = new HashSet<String>(Arrays.asList(array));
If you can't use the collections API, consider building your own hash set... but until you've given a reason why you wouldn't want to use the collections API, it's hard to give a more concrete answer, as that reason could rule out other answers too.
ANALYSIS
Let's perform some analysis:
1. Using a HashSet. Time: O(n). Space: O(n). Note that it requires about 8 * array size bytes of extra memory (an object reference is 8-16 bytes).
2. Quick sort. Time: O(n*log n). Space: O(log n) (worst case O(n*n) and O(n) respectively).
3. Merge sort (binary tree/TreeSet). Time: O(n*log n). Space: O(n).
4. Heap sort. Time: O(n*log n). Space: O(1) (but it is slower than 2 and 3).
In the case of heap sort you can throw away duplicates on the fly, so you save a final pass after sorting.
CONCLUSION
If time is your concern and you don't mind allocating 8 * array.length extra bytes for a HashSet, that solution seems optimal.
If space is a concern: quick sort plus one pass over the sorted array.
If space is a big concern: implement a heap that throws away duplicates on the fly. It's still O(n * log n), but without additional space.
I would suggest that you use a modified mergesort on the array. Within the merge step, add logic to remove duplicate values. This solution has n*log(n) complexity and could be performed in place if needed (in that case the in-place implementation is a bit harder than with normal mergesort, because adjacent parts can contain gaps from the removed duplicates, which also need to be closed when merging).
For more information on mergesort see http://en.wikipedia.org/wiki/Merge_sort
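As a rough sketch of that modified merge step (not the poster's code; it assumes both halves arrive already sorted and deduplicated from the recursion, and it is not the in-place variant):

import java.util.Arrays;

public class DedupMerge {
    static String[] mergeDedup(String[] a, String[] b) {
        String[] out = new String[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i].compareTo(b[j]);
            if (cmp < 0) out[n++] = a[i++];
            else if (cmp > 0) out[n++] = b[j++];
            else { out[n++] = a[i++]; j++; } // equal: keep only one copy
        }
        while (i < a.length) out[n++] = a[i++];
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n); // trim the unused tail
    }
}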
Creating a hash set to handle this task is way too expensive, and in fact the whole point of them telling you not to use the Collections API is probably that they don't want to hear the word hash. So that leaves the code below.
Note that you offered them binary search AFTER sorting the array: that makes no sense, which may be the reason your proposal was rejected.
OPTION 1:
public static void removeDuplicates(String[] input){
    Arrays.sort(input); // use mergesort/quicksort here: n log n
    for (int i = 1; i < input.length; i++){
        if (input[i-1].equals(input[i])) // == would only compare references
            input[i-1] = null;           // null out the duplicate
    }
}
OPTION 2:
public static String[] removeDuplicates(String[] input){
    Arrays.sort(input); // use mergesort here: n log n
    int size = 1;
    for (int i = 1; i < input.length; i++){
        if (!input[i-1].equals(input[i])) // count the distinct values
            size++;
    }
    System.out.println(size);
    String[] output = new String[size];
    output[0] = input[0];
    int n = 1;
    for (int i = 1; i < input.length; i++)
        if (!input[i-1].equals(input[i]))
            output[n++] = input[i];
    // final step: either return output or copy output into input;
    // here I just return output
    return output;
}
OPTION 3 (added by 949300, based upon Option 1). Note that this mangles the input array; if that is unacceptable, you must make a copy.
public static String[] removeDuplicates(String[] input){
    if (input.length == 0)
        return input;
    Arrays.sort(input); // use mergesort/quicksort here: n log n
    int outputLength = 1; // the last element always survives
    for (int i = 1; i < input.length; i++){
        // equals is safer than ==, but note this assumes no nulls in the input
        if (input[i-1].equals(input[i]))
            input[i-1] = null; // mark the duplicate
        else
            outputLength++;
    }
    // check if there were zero duplicates
    if (outputLength == input.length)
        return input;
    String[] output = new String[outputLength];
    int idx = 0;
    for (int i = 0; i < input.length; i++) // start at 0 so the first element isn't lost
        if (input[i] != null)
            output[idx++] = input[i];
    return output;
}
Hi, do you need to put them into an array? It would be faster to use a collection based on hash values, like a Set, where each value is unique because of its hash.
If you put all entries into a set collection type, you can use the
HashSet(int initialCapacity)
constructor to prevent the backing table from being resized at runtime:
Set<T> mySet = new HashSet<T>(Arrays.asList(someArray));
Constructing the set from Arrays.asList() this way runs in O(n) if the table does not have to be expanded.
Since this is an interview question, I think they want you to come up with your own implementation instead of using the Set API.
Instead of sorting first and comparing again, you can build a binary tree and create an empty array to store the result (a sketch follows below).
The first element in the array becomes the root.
If the next element is equal to a node, return: this removes the duplicate elements.
If the next element is less than the node, compare it to the left child; else compare it to the right child.
Keep doing the above two steps until you reach the end of a branch of the tree; then you can create a new node, knowing it has no duplicate yet.
Insert this new node's value into the result array.
After traversing all elements of the original array, you get a new copy of the array with no duplicates, in the original order.
Traversing takes O(n) and searching the binary tree takes O(log n) (insertion itself should only take O(1), since you are just attaching the node and not re-allocating/balancing the tree), so the total should be O(n log n).
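A rough sketch of that idea (all class and method names are mine; the tree is unbalanced, so the O(log n) search assumes reasonably random input):

import java.util.ArrayList;
import java.util.List;

public class BstDedup {
    static class Node {
        String value;
        Node left, right;
        Node(String value) { this.value = value; }
    }

    // Returns true if the value was newly inserted, i.e. not a duplicate.
    static boolean insert(Node root, String value) {
        int cmp = value.compareTo(root.value);
        if (cmp == 0) return false; // duplicate: ignore it
        if (cmp < 0) {
            if (root.left == null) { root.left = new Node(value); return true; }
            return insert(root.left, value);
        } else {
            if (root.right == null) { root.right = new Node(value); return true; }
            return insert(root.right, value);
        }
    }

    static List<String> dedupPreservingOrder(String[] input) {
        List<String> result = new ArrayList<>();
        if (input.length == 0) return result;
        Node root = new Node(input[0]);
        result.add(input[0]); // the first element becomes the root
        for (int i = 1; i < input.length; i++)
            if (insert(root, input[i]))
                result.add(input[i]); // first occurrence, in original order
        return result;
    }
}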
O.K., if they want super speed, let's use the hashcodes of the Strings as much as possible.
Loop through the array, get the hashcode for each String, and add it to your favorite data structure. Since you aren't allowed to use a Collection, use a BitSet. Note that you need two, one for positive hashcodes and one for negative ones, and they will each be huge.
Loop through the array again, with another BitSet. True means the String passes. If the hashcode for the String was not already marked in the first BitSet, you can just mark the String as true. Else mark it as a possible duplicate, i.e. false. While you are at it, count how many possible duplicates there are.
Collect all the possible duplicates into a big String[] named possibleDuplicates, and sort it.
Now go through the possible duplicates in the original array and binary search in possibleDuplicates. If present, well, you are still stuck, because you want to include it ONCE but not all the other times. So you need yet another array somewhere. Messy, and I've got to go eat dinner, but this is a start... (a rough sketch of the first phase follows).
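Here is my reading of that first, filtering phase as code (the class and method names are mine; mapping negative hashcodes into a second BitSet via a bit-flip is my own implementation choice):

import java.util.BitSet;

public class HashcodeFilter {
    // Marks each String whose hashcode appears for the first time; the
    // rest are possible duplicates that need the slower second phase.
    static boolean[] firstPass(String[] input) {
        BitSet seenPos = new BitSet(); // for hashCode() >= 0
        BitSet seenNeg = new BitSet(); // for hashCode() < 0
        boolean[] passes = new boolean[input.length];
        for (int k = 0; k < input.length; k++) {
            int h = input[k].hashCode();
            BitSet seen = (h >= 0) ? seenPos : seenNeg;
            int bit = (h >= 0) ? h : ~h; // flip negatives to a non-negative index
            if (!seen.get(bit)) {
                seen.set(bit);
                passes[k] = true; // first time this hash appears
            }
            // else: possible duplicate (or mere hash collision), resolve later
        }
        return passes;
    }
}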
I'm comparing insertion sort with quicksort. I've figured out why quicksort is slower on an almost sorted array, but I can't figure out why insertion sort is so much quicker; surely it still has to compare nearly as many elements in the array?
The reason that insertion sort is faster on sorted or nearly-sorted arrays is that when it's inserting elements into the sorted portion of the array, it barely has to move any elements at all. For example, if the sorted portion of the array is 1 2 and the next element is 3, it only has to do one comparison -- 2 < 3 -- to determine that it doesn't need to move 3 at all. As a result, insertion sort on an already-sorted array is linear-time, since it only needs to do one comparison per element.
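For reference, a standard insertion sort sketch: on already-sorted input the inner while-condition fails immediately, so each element costs exactly one comparison and zero moves.

static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) { // on sorted input this is false at once
            a[j + 1] = a[j]; // shift larger elements one slot right
            j--;
        }
        a[j + 1] = key;
    }
}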
It depends on several factors. Insertion sort will be more efficient than quicksort on small data sets. It also depends on your backing list implementation (LinkedList or ArrayList). Lastly, it also depends on whether there is any kind of ordering to the input data. For example, if your input data is already sorted in the opposite order and you use a LinkedList, the insertion sort will be blazing fast.
Quicksort has its worst case (O(n^2) time complexity) on already sorted arrays (see quicksort entry on Wikipedia).
Depending on the choice of the pivot element, this can be alleviated somewhat, but at the same time, the best case for insertion sort is exactly the pre-sorted case (it has O(n) time complexity for such inputs), so it is going to beat quicksort for this particular use case.
Insertion sort runs in linear time on a sorted array, because for a sorted array it has to do only one comparison per element, namely with the previous element. As a result, insertion sort moves through a sorted array linearly, giving O(n) complexity.
Compare that to quicksort, whose complexity here is O(n^2).
Sorted inputs are best case for insertion sort (O(n)) and worst case for quick sort (O(n^2)).
It all comes down to complexity, which is determined by the number of key comparisons, the basic operation in both algorithms.
So when you look at both algorithms, you find that in insertion sort you only have n comparisons: when we insert an element we only compare it with the element to its left, and we are done. On the other hand, in quicksort you have to keep comparing your pivot element to all the elements to its left, and the partition shrinks by only a constant (one element) each time, leading to approximately n^2 comparisons.