Talking about performance and resources:
Which one is faster and needs fewer resources when adding and editing values, an ArrayList or a TreeMap?
Or is there any data structure that can beat these two? (It has to be able to keep the data sorted somehow.)
ArrayLists and TreeMaps are different types of structures meant for different things. It would be helpful to know what you are planning on using these structures for.
ArrayList
Allows duplicates (it is a List)
Amortized O(1) to add to the end of the list
O(n) to insert anywhere else in the list
O(1) to access
O(n) to remove
TreeMap
Does not allow duplicate keys (it is a Map)
O(log n) to insert
O(log n) to access
O(log n) to remove
Sorting the ArrayList will take O(n log n) time (after inserting everything), while the TreeMap will always be sorted.
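For illustration, here is a minimal sketch (the values are made up) contrasting the two options:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedDemo {
    public static void main(String[] args) {
        // ArrayList: amortized O(1) appends, then one O(n log n) sort at the end
        List<Integer> list = new ArrayList<>();
        Collections.addAll(list, 42, 7, 19, 7); // duplicates are fine in a List
        Collections.sort(list);
        System.out.println(list);               // [7, 7, 19, 42]

        // TreeMap: every put is O(log n), and the keys stay sorted the whole time
        Map<Integer, String> map = new TreeMap<>();
        map.put(42, "a");
        map.put(7, "b");
        map.put(19, "c");
        System.out.println(map.keySet());       // [7, 19, 42]
    }
}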
EDIT
You've mentioned that you are working with records retrieved from a database. Since they are coming from a DB, I would assume that they are already sorted - in which case you should just insert them one by one into an ArrayList.
Depends on what you need.
If we are talking about sorted data: since an ArrayList is backed by an array, you can find a value in a sorted ArrayList in O(log n) time using binary search. Inserting, though, is O(n), because you potentially have to shift a large part of the array to make room for the new element.
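As a small sketch (assuming the list is already sorted), Collections.binarySearch gives you the O(log n) lookup and also encodes where a missing element would have to be inserted:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedListLookup {
    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>(Arrays.asList(3, 8, 15, 23));

        // O(log n) lookup in a sorted list
        int found = Collections.binarySearch(sorted, 15);    // 2

        // a miss returns -(insertionPoint) - 1 ...
        int miss = Collections.binarySearch(sorted, 10);     // -3
        int insertAt = -miss - 1;                            // 2

        // ... but the insert itself is still O(n): everything after index 2 shifts right
        sorted.add(insertAt, 10);
        System.out.println(found + " " + sorted);            // 2 [3, 8, 10, 15, 23]
    }
}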
The TreeMap data structure is implemented as a red-black tree, for which both insertion and search take O(log n).
So, in a nutshell:
Data Structure       Insertion Speed   Search Speed
ArrayList (sorted)   O(n)              O(log n)
TreeMap              O(log n)          O(log n)
I'd definitely go with a TreeMap. It has the additional bonus of having it all ready to go right now (you'd have to implement some of the code yourself to make the ArrayList work).
Note:
If you don't get the O(n) (called big-O) notation, think of it as a rough formula for the number of seconds needed when the structure has n elements. So, if you had a sorted ArrayList with 1000 elements (n = 1000), it would take about 3 seconds (log 1000 = 3) to find an item in it, but around 1000 seconds to insert a new element.
The TreeMap, on the other hand, would take 3 seconds to both search and insert.
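To make the TreeMap recommendation concrete, here is a minimal sketch (the keys and values are placeholders); note the navigation methods you get for free:

import java.util.TreeMap;

public class TreeMapNavigation {
    public static void main(String[] args) {
        // hypothetical ID -> name mapping, just to show the O(log n) operations
        TreeMap<Integer, String> byId = new TreeMap<>();
        byId.put(1000, "alice");
        byId.put(17, "bob");
        byId.put(250, "carol");

        System.out.println(byId.get(250));     // O(log n) search -> carol
        System.out.println(byId.firstKey());   // smallest key -> 17
        System.out.println(byId.headMap(500)); // every entry with key < 500, already sorted
    }
}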
An ArrayList requires only O(1) to add or edit a value, since you access it directly by index. However, searching for an element in an ArrayList is O(n), because in the worst case you have to go through all the elements of the list.
But you have to be aware of which data structure fits what you will implement. It's hard to choose between an ArrayList and a TreeMap since they are not really meant for the same purpose (a Map doesn't allow duplicate keys, while an ArrayList allows duplicates, etc.).
Here are two cheat sheets describing the differences between the two (and other collection types too): one covering List and Map, and a second one that adds Set.
Related
Which would be faster when using a LinkedList? I haven't studied sorting and searching yet. I was thinking adding them directly in a sorted manner would be faster, as manipulating the nodes and pointers afterward intuitively seems to be very expensive. Thanks.
In general, using a linked list brings several difficulties and can be quite expensive.
i) In the usual case, if you want to add values to an already sorted linked list, you must go through the following steps (see the sketch after the complexity note below):
1. If the linked list is empty, make the new node the head and return it.
2. If the value of the node to be inserted is smaller than the value of the head node, insert the node at the start and make it the head.
3. Otherwise, in a loop, find the appropriate node after which the input node is to be inserted: start from the head and keep moving until you reach a node GN whose value is greater than the input node's value. The node just before GN is the appropriate node.
4. Insert the node after the appropriate node found in step 3.
The time complexity in this case is O(n) per element.
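A minimal sketch of those steps (the Node class and method names are just for illustration):

class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

public class SortedLinkedListInsert {
    // returns the (possibly new) head of the list
    static Node sortedInsert(Node head, int value) {
        Node node = new Node(value);
        // steps 1 and 2: empty list, or value smaller than the head
        if (head == null || value < head.value) {
            node.next = head;
            return node;
        }
        // step 3: walk until the next node is strictly greater (equal values stay in front)
        Node current = head;
        while (current.next != null && current.next.value <= value) {
            current = current.next;
        }
        // step 4: splice the new node in after the appropriate node
        node.next = current.next;
        current.next = node;
        return head;
    }

    public static void main(String[] args) {
        Node head = null;
        for (int v : new int[] {5, 1, 4, 1, 3}) {
            head = sortedInsert(head, v);          // O(n) per insert, O(n^2) overall
        }
        for (Node n = head; n != null; n = n.next) {
            System.out.print(n.value + " ");       // 1 1 3 4 5
        }
    }
}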
But don't forget that the linked list has to be sorted already before you add further elements in this manner; that means you pay an extra cost up front to sort the linked list.
ii) If instead you add the elements to the end of the linked list and sort them afterwards, the situation depends on how you sort: for example, you can use merge sort, which has O(n log n) time complexity, or even insertion sort, with O(n^2) time complexity.
Note: how to sort is a separate issue.
As you have already noticed, it is not easy to say which method is faster; it depends on the conditions of the problem, such as the number of elements already in the linked list, the number of elements to be added, and whether or not the cost of the initial sorting is counted.
You have presented two options:
Keep the linked list sorted by inserting each new node at its sorted location.
Insert each new node at the end (or start) of the linked list and when all nodes have been added, sort the linked list
The worst case time complexity for the first option occurs when the list has each time to be traversed completely to find the position where a new node is to be inserted. In that case the time complexity is O(1+2+3+...+n) = O(n²)
The worst time complexity for the second option is O(n) for inserting n elements and then O(n log n) for sorting the linked list with a good sorting algorithm. So in total the sorting algorithm determines the overall worst time complexity, i.e. O(n log n).
So the second option has the better time complexity.
Merge sort has a complexity of O(n log n) and is well suited to linked lists.
If your data has a limited range, you can use radix sort and achieve O(kn) complexity, where k is log(range size).
In practice it is better to insert elements in a vector (dynamic array), then sort the array, and finally turn the array into a list. This will almost certainly give better running times.
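In Java, a minimal sketch of that approach (and in fact Collections.sort already does something similar internally, by dumping the list into an array, sorting it, and writing the elements back):

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class AppendThenSort {
    public static void main(String[] args) {
        LinkedList<Integer> linked = new LinkedList<>();
        Collections.addAll(linked, 5, 1, 4, 1, 3);  // O(1) appends at the end

        // copy into a dynamic array, sort there, then rebuild the list
        List<Integer> array = new ArrayList<>(linked);
        Collections.sort(array);                    // O(n log n)
        linked = new LinkedList<>(array);

        System.out.println(linked);                 // [1, 1, 3, 4, 5]
    }
}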
Suppose I have a list
listA = [679, 890, 907, 780, 5230, 781]
and I want to delete the elements from it that also exist in another list
listB = [907, 5230]
in minimum time complexity.
I can do this with two nested for loops, which means O(n^2) time complexity, but I want to reduce this to O(n log n) or O(n).
Is it possible?
It's possible - if one of the lists is sorted. Assuming that list A is sorted and list B is unsorted, with respective dimensions M and N, the minimum time complexity to remove all of list B's elements from list A will be O((N+M)*log(M)). The way you can achieve this is by binary search - each lookup for an element in list A takes O(log(M)) time, and there are N lookups (one for each element in list B). Since it takes O(M*log(M)) time to sort A, it's more efficient for huge lists to sort and then remove all elements, with total time complexity O((N+M)*log(M)).
On the other hand, if you don't have a sorted list, just use Collection.removeAll, which has a time complexity of O(M*N) in this case. The reason for this time complexity is that removeAll does (by default) something along these lines:
public boolean removeAll(Collection<?> other) {
    boolean changed = false;
    for (Iterator<?> it = iterator(); it.hasNext(); )
        if (other.contains(it.next())) { it.remove(); changed = true; }  // contains is O(N) for a list
    return changed;
}
Since contains has a time complexity of O(N) for lists, and you end up doing M iterations, this takes O(M*N) time in total.
Finally, if you want to minimize the time complexity of removeAll (with possibly degraded real world performance) you can do the following:
List<Integer> a = ...
List<Integer> b = ...
HashSet<Integer> lookup = new HashSet<>(b);
a.removeAll(lookup);
For bad values of b, the time to construct lookup could take up to time O(N*log(N)), as shown here (see "pathologically distributed keys"). After that, invoking removeAll will take O(1) for contains over M iterations, taking O(M) time to execute. Therefore, the time complexity of this approach is O(M + N*log(N)).
So, there are three approaches here. One provides you with time complexity O((N+M)*log(M)), another provides you with time complexity O(M*N), and the last provides you with time complexity O(M + N*log(N)). Considering that the first and last approaches are similar in time complexity (as log tends to be very small even for large numbers), I would suggest going with the naive O(M*N) for small inputs, and the simplest O(M + N*log(N)) for medium-sized inputs. At the point where your memory usage starts to suffer from creating a HashSet to store the elements of B (very large inputs), I would finally switch to the more complex O((N+M)*log(M)) approach.
You can find an AbstractCollection.removeAll implementation here.
Edit:
The first approach doesn't work so well for ArrayLists - removing from the middle of list A takes O(M) time, apparently. Instead, sort list B (O(N*log(N))), and iterate through list A, removing items as appropriate. This takes O((M+N)*log(N)) time and is better than the O(M*N*log(M)) that you end up with when using an ArrayList. Unfortunately, the "removing items as appropriate" part of this algorithm requires that you create data to store the non-removed elements in O(M), as you don't have access to the internal data array of list A. In this case, it's strictly better to go with the HashSet approach. This is because (1) the time complexity of O((M+N)*log(N)) is actually worse than the time complexity for the HashSet method, and (2) the new algorithm doesn't save on memory. Therefore, only use the first approach when you have a List with O(1) time for removal (e.g. LinkedList) and a large amount of data. Otherwise, use removeAll. It's simpler, often faster, and supported by library designers (e.g. ArrayList has a custom removeAll implementation that allows it to take linear instead of quadratic time using negligible extra memory).
You can achieve this in the following way:
Sort the second list (you can sort either of the two lists; here I have sorted the second one). After that, loop through the first list and, for each of its elements, do a binary search in the second, sorted list (see the sketch after the complexity summary below).
You can sort a list by using the Collections.sort() method.
Total complexity:
For sorting: O(m log m), where m is the size of the second list (only the second list is sorted).
For removing: O(n log m)
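A minimal sketch of that approach (the class name is just for illustration; elements not found in the sorted listB are kept):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RemoveByBinarySearch {
    public static void main(String[] args) {
        List<Integer> listA = new ArrayList<>(Arrays.asList(679, 890, 907, 780, 5230, 781));
        List<Integer> listB = new ArrayList<>(Arrays.asList(907, 5230));

        Collections.sort(listB);                          // O(m log m)

        List<Integer> result = new ArrayList<>();
        for (Integer value : listA) {                     // n iterations
            // O(log m) binary search; keep the element only if it is absent from listB
            if (Collections.binarySearch(listB, value) < 0) {
                result.add(value);
            }
        }
        System.out.println(result);                       // [679, 890, 780, 781]
    }
}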
I have a class User that has 3 fields (I'm not sure of the terminology):
an (int) ID code
an (int) date that the user was created
and a (string) name
I am trying to create methods that:
Add a user to my data structure (1)
return the name of a user based on their ID number (2)
return a full list of all users sorted by date (3)
return a list of users whose name contains a certain string, sorted by date (4)
return a list of users who joined before a certain date (5)
I have made 10 arrays based on the year joined (2004-2014), and within each array I then sort the elements again by date (sorting by month, then day).
Am I correct in thinking that this means methods (3) and (5) have O(1) time complexity but that (1),(4) and (2) have O(N)?
Also, is there another data structure/method that I can use to get O(1) for all my methods? I tried repeatedly to come up with one, but the inclusion of method (2) has me stumped.
Comparison-based sorting is always O(N log N), and adding to an already sorted container is O(log N). To avoid that, you need buckets, the way you already have them for the years. This trades memory for execution time.
(1) can be O(1) only if you only add things to HashMaps.
(2) can be O(1) if you have a separate HashMap which maps the ID to the user.
(3) of course is O(N), because you need to list all N users, but if you have a HashMap where the key is the day and the value is the list of users for that day, you only need to go through a constant number (10 years * 365 days + 2) of arrays to list all users. So (3) is O(N) with (1) still being O(1), assuming users are unsorted within a single day.
(4) Basically the same as (3) for a simple implementation, just with less printing. You could perhaps speed up the best case with a trie or something, but it'll still be O(N), because a certain percentage of the N users will match.
(5) Same as (3), you just can break out sooner.
You have to make compromises, and make informed guesses about the most common operations. There is a good chance that the most common operation will be to find a user by ID. A HashMap is thus the ideal structure for that: it's O(1), as well as the insertion into the map.
To implement the list of users sorted by date, and the list of users before a given date, the best data structure would be a TreeSet. The TreeSet is already sorted (so your 3rd operation would be O(1)), and it can return a sorted subset in O(log n) time.
But maintaining a TreeSet in parallel to a HashMap is cumbersome, error-prone, and costs memory. And insertion complexity would become O(log(N)). If these aren't common operations, you could simply iterate over the entries and filter them/sort them. Definitely forget about your 10 arrays. This is unmaintainable, and a TreeSet is a much better and easier solution, not limited to 10 years.
The list of users by name containing a given string is O(N), whatever the data structure you choose.
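A minimal sketch of the HashMap-plus-TreeSet combination described above (the field names, the date encoding, and the tie-break on ID are assumptions made for the example):

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class User {
    final int id;
    final int date;      // e.g. yyyymmdd, so numeric order is date order (an assumption)
    final String name;
    User(int id, int date, String name) { this.id = id; this.date = date; this.name = name; }
}

class UserStore {
    private final Map<Integer, User> byId = new HashMap<>();
    private final TreeSet<User> byDate =
            new TreeSet<>(Comparator.comparingInt((User u) -> u.date).thenComparingInt(u -> u.id));

    void add(User u) {                        // (1) O(1) for the map, O(log n) for the tree
        byId.put(u.id, u);
        byDate.add(u);
    }
    String nameById(int id) {                 // (2) O(1) on average
        User u = byId.get(id);
        return u == null ? null : u.name;
    }
    Iterable<User> allByDate() {              // (3) already in date order
        return byDate;
    }
    SortedSet<User> joinedBefore(int date) {  // (5) O(log n) to locate the cut-off
        // sentinel with the smallest possible ID, so users on the cut-off date are excluded
        return byDate.headSet(new User(Integer.MIN_VALUE, date, ""));
    }
}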
A HashMap doesn't sort anything; its primary purpose/advantage is to offer near-O(1) lookups (which you can use for lookups by ID). If you need to sort something, you should have the class implement Comparable, add it to a List, and use Collections.sort to sort the elements.
As for efficiency:
(1) O(1)
(2) O(1)
(3) O(n log n)
(4) O(n) (at least)
(5) O(n) (or less, but I think it will have to be O(n))
Also is there another data structure/method that I can use to have O(1) for all my methods?
Three HashMaps. This isn't a database, so you have to maintain "indices" by hand.
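For the sorting part mentioned above, a minimal sketch (field names are placeholders) of having the class implement Comparable and using Collections.sort:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class User implements Comparable<User> {
    final int id;
    final int date;
    final String name;
    User(int id, int date, String name) { this.id = id; this.date = date; this.name = name; }

    @Override
    public int compareTo(User other) {
        return Integer.compare(this.date, other.date);    // natural order = join date
    }
}

public class SortUsers {
    public static void main(String[] args) {
        List<User> users = new ArrayList<>();
        users.add(new User(1, 20140201, "alice"));
        users.add(new User(2, 20040615, "bob"));
        Collections.sort(users);                           // O(n log n), by date
        for (User u : users) System.out.println(u.name);   // bob, alice
    }
}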
I am optimizing an implementation of a sorted LinkedList.
To insert an element, I traverse the list and compare each element until I have the correct index, then break out of the loop and insert.
I would like to know if there is any other way I can insert the element at the same time as traversing the list, to reduce the insert from O(n + (n capped at size()/2)) to O(n).
A ListIterator is almost what I'm after because of its add() method, but unfortunately, in the case where there are elements in the list equal to the insert, the insert has to be placed after them in the list. To implement this, ListIterator would need a peek() method, which it doesn't have.
edit: I have my answer, but will add this anyway since a lot of people haven't understood correctly:
I am searching for an insertion point AND inserting, which combined is higher than O(n)
You may consider a skip list, which is implemented using multiple linked lists at varying granularity. E.g. the linked list at level 0 contains all items, level 1 only links to every 2nd item on average, level 2 to only every 4th item on average, etc.... Searching starts from the top level and gradually descends to lower levels until it finds an exact match. This logic is similar to a binary search. Thus search and insertion is an O(log n) operation.
A concrete example in the Java class library is ConcurrentSkipListSet (although it may not be directly usable for you here).
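For completeness, a tiny usage sketch (keep in mind that, being a Set, it silently drops duplicates, which may not fit your exact requirement):

import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class SkipListDemo {
    public static void main(String[] args) {
        // O(log n) add/contains, elements kept in sorted order (but no duplicates, since it is a Set)
        NavigableSet<Integer> set = new ConcurrentSkipListSet<>();
        set.add(5);
        set.add(1);
        set.add(3);
        set.add(3);                          // ignored: already present
        System.out.println(set);             // [1, 3, 5]
        System.out.println(set.ceiling(2));  // smallest element >= 2 -> 3
    }
}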
I'd favor Péter Török's suggestion, but I'd still like to add something about the iterator approach:
Note that ListIterator provides a previous() method to iterate through the list backwards. Thus, first iterate until you find the first element that is greater, then go back to the previous position and call add(...). If you hit the end, i.e. all elements are smaller, just call add(...) without going back.
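A minimal sketch of that single-pass approach (so equal elements stay in front of the newly inserted one, as required):

import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class IteratorInsert {
    // single pass: stop at the first element strictly greater than the value, step back, add
    static void sortedInsert(List<Integer> list, int value) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() > value) {       // first strictly greater element
                it.previous();             // step back so add() lands before it
                break;
            }
        }
        it.add(value);                     // if we hit the end, this simply appends
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int v : new int[] {5, 1, 4, 1, 3, 4}) {
            sortedInsert(list, v);
        }
        System.out.println(list);          // [1, 1, 3, 4, 4, 5]
    }
}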
I have my answer, but will add this anyway since a lot of people haven't understood correctly: I am searching for an insertion point AND inserting, which combined is higher than O(n).
You require a collection of (possibly) non-unique elements that can be iterated in an order given by an ordering function. This can be achieved in a variety of ways. (In the following I use "total insertion cost" to mean the cost of inserting a number (N) of elements into an initially empty data structure.)
A singly or doubly linked list offers O(N^2) total insertion cost (whether or not you combine the steps of finding the position and doing the insertion!), and O(N) iteration cost.
A TreeSet offers O(N log N) total insertion cost and O(N) iteration cost, but has the restriction of no duplicates.
A tree-based multiset (e.g. TreeMultiset) has the same complexity as a TreeSet, but allows duplicates.
A skip-list data structure also has the same complexity as the previous two.
Clearly, the complexity measures say that a data structure that uses a linked list performs the worst as N gets large. For this particular group of requirements, a well-implemented tree-based multiset is probably the best, assuming there is only one thread accessing the collection. If the collection is heavily used by many threads (and it is a set), then a ConcurrentSkipListSet is probably better.
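If you don't want to pull in a library for the multiset, a minimal sketch of the same idea using a TreeMap that stores a count per key (a plain substitute for something like Guava's TreeMultiset):

import java.util.Map;
import java.util.TreeMap;

public class TreeCountsDemo {
    public static void main(String[] args) {
        // a tiny multiset sketch: sorted keys with a count per key, so duplicates are allowed
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int v : new int[] {5, 1, 4, 1, 3}) {
            counts.merge(v, 1, Integer::sum);          // O(log n) per insert
        }
        // iterate in sorted order, expanding the counts
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                System.out.print(e.getKey() + " ");    // 1 1 3 4 5
            }
        }
    }
}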
You also seem to have a misconception about how "big O" measures combine. If I have one step of an algorithm that is O(N) and a second step that is also O(N), then the two steps combined are STILL O(N) .... not "more than O(N)". You can derive this from the definition of "big O". (I won't bore you with the details, but the Math is simple.)
I am trying to find out which structure would be the fastest, because I have a problem with my code. I have a large amount of data to store; maybe thousands of nodes are needed. My first thought was to create an ArrayList and start adding integers to it to use later. This ArrayList is meant to give fast access to byte positions in random access files. So I put in the first node, which represents a pointer to the first entry in a random access file, then the second in the same way, and so on.
My program takes too long when putting the integers into the ArrayList.
Could I fix my code by using a faster structure?
Yes.
You can use a LinkedList. Your ArrayList has amortized O(1) insertion, but when the ArrayList is huge and needs to be resized, it takes a long time to allocate a new backing array, copy the current elements over, and continue.
E.g. if you have 10 million elements in your ArrayList and it is full, when you insert one more, the ArrayList has to allocate a bigger backing array and then copy all the elements into the new one. This is a very expensive operation.
If you use a LinkedList you get O(1) insertion but no random access. So if you want to access the nth element, you have to traverse all the nodes up to n, which takes O(n). But do you really do that?
So a LinkedList is your option; possibly a doubly linked list.
If you want fast reads as well as fast insertion, you can use a Dictionary or HashMap: O(1) writes and reads, but only if you have good hashing.
But again, internally a Hashtable or Dictionary uses arrays, so once it grows too large you will have the same resizing problem; moreover, each time the backing array expands, the entries have to be redistributed into it.
You can use trees, with O(log n) writes and reads.
You can use a skip list, with O(log n) writes and reads.
An ArrayList is clearly not the fastest thing here, because an ArrayList does not hold int values but the Integer wrapper type. A plain int[] array therefore has the lowest overhead.
On the other hand: if you can omit the list/array completely and do the calculations on the fly, that would save even more overhead. This points in the direction of not micro-optimizing, but thinking about the problem and perhaps using a completely different algorithm.
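A minimal sketch of the int[] idea (the sizes and offsets are made up for the example):

public class FileOffsets {
    public static void main(String[] args) {
        int count = 1_000_000;            // hypothetical number of entries

        // a plain int[] avoids both Integer boxing and repeated ArrayList resizing
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = i * 16;          // made-up file positions, just for the example
        }
        System.out.println(offsets[42]);  // O(1) access, no unboxing
    }
}

If a growable collection is still needed, pre-sizing it with new ArrayList<>(expectedSize) at least avoids the repeated copying described in the previous answer.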