Way to make this HashMap more efficient - java

I have a class User that has 3 fields (I'm not sure of the terminology):
an (int) ID code
an (int) date that the user was created
and a (String) name
I am trying to create methods that:
Add a user to my data structure (1)
return the name of a user based on their ID number (2)
return a full list of all users sorted by date (3)
return a list of users whose name contains a certain string, sorted by date (4)
return a list of users who joined before a certain date (5)
I have made 10 arrays based on the years joined (2004-2014) and then sort the elements in the arrays again by the date (sorting by month then day)
Am I correct in thinking that this means methods (3) and (5) have O(1) time complexity but that (1),(4) and (2) have O(N)?
Also is there another data structure/method that I can use to have O(1) for all my methods? I tried repeatedly to come up with one but the inclusion of method (2) has me stumped.

Comparison-based sorting takes O(N*log N) in the worst case, and adding to an already sorted container is O(log N) at best (a balanced tree; a sorted array needs O(N) to shift elements). To avoid that, you need buckets, the way you have them for years now. This trades memory for execution time.
(1) can be O(1) only if you only add things to HashMaps.
(2) can be O(1) if you have a separate HashMap which maps the ID to the user.
(3) of course is O(N) because you need to list all N users, but if you have a HashMap where the key is the day and the value is a list of users, you only need to go through a constant (10 years * 365 days + 2) number of arrays to list all users. So O(N), with (1) still being O(1). This assumes users are unsorted within a single day.
(4) Basically same as 3 for simple implementation, just with less printing. You could perhaps speed up the best case with a trie or something, but it'll still be O(N) because it will be certain % of N which will match.
(5) Same as (3), you just can break out sooner.

You have to make compromises, and make informed guesses about the most common operations. There is a good chance that the most common operation will be to find a user by ID. A HashMap is thus the ideal structure for that: it's O(1), as well as the insertion into the map.
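To implement the list of users sorted by date, and the list of users before a given date, the best data structure would be a TreeSet. The TreeSet is already sorted (so your 3rd operation needs no sorting step, although iterating over all users is still O(N)), and it can locate a sorted subset (e.g. with headSet) in O(log(n)) time. Note that a TreeSet ordered by date alone would treat two users with the same date as duplicates, so the comparator needs a tie-breaker such as the ID.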
But maintaining a TreeSet in parallel to a HashMap is cumbersome, error-prone, and costs memory. And insertion complexity would become O(log(N)). If these aren't common operations, you could simply iterate over the entries and filter them/sort them. Definitely forget about your 10 arrays. This is unmaintainable, and a TreeSet is a much better and easier solution, not limited to 10 years.
The list of users by name containing a given string is O(N), whatever the data structure you choose.
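The HashMap-plus-sorted-structure idea above could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (UserDirectory, nameOf, joinedBefore, ...) are made up, the date is assumed to be an int such as yyyymmdd, and a TreeMap from date to a list of users is used instead of a TreeSet so that users sharing a date are all kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from the question.
class UserDirectory {
    // Minimal stand-in for the asker's User class (date assumed to be yyyymmdd).
    static final class User {
        final int id;
        final int date;
        final String name;

        User(int id, int date, String name) {
            this.id = id;
            this.date = date;
            this.name = name;
        }
    }

    private final Map<Integer, User> byId = new HashMap<>();
    private final TreeMap<Integer, List<User>> byDate = new TreeMap<>();

    // (1) Expected O(1) for the HashMap plus O(log D) for the TreeMap,
    // where D is the number of distinct dates. Assumes IDs are unique.
    void add(User u) {
        byId.put(u.id, u);
        byDate.computeIfAbsent(u.date, d -> new ArrayList<>()).add(u);
    }

    // (2) Expected O(1) lookup by ID; returns null for an unknown ID.
    String nameOf(int id) {
        User u = byId.get(id);
        return u == null ? null : u.name;
    }

    // (3) O(N): the TreeMap is already ordered, so no sorting is needed.
    List<User> allByDate() {
        return flatten(byDate);
    }

    // (4) O(N): every name has to be inspected.
    List<User> nameContains(String part) {
        List<User> out = new ArrayList<>();
        for (User u : allByDate()) {
            if (u.name.contains(part)) {
                out.add(u);
            }
        }
        return out;
    }

    // (5) Users whose date is strictly before the given date.
    List<User> joinedBefore(int date) {
        return flatten(byDate.headMap(date));
    }

    private static List<User> flatten(Map<Integer, List<User>> days) {
        List<User> out = new ArrayList<>();
        for (List<User> day : days.values()) {
            out.addAll(day);
        }
        return out;
    }
}
```

The cost of this design is the one the answer warns about: two structures have to be kept in sync on every insert (and on every removal, if removals are ever needed).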

A HashMap doesn't sort anything; its primary purpose/advantage is to offer near-O(1) lookups (which you can use for lookups by ID). If you need to sort something, you should have the class implement Comparable, add it to a List, and use Collections.sort to sort the elements.
As for efficiency:
(1) O(1)
(2) O(1)
(3) O(n log n)
(4) O(n) (at least)
(5) O(n) (or less, but I think it will have to be O(n))
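A minimal sketch of the Comparable plus Collections.sort approach described in this answer might look like the following. The class name DatedUser and its fields are hypothetical stand-ins for the asker's User class, with the date assumed to be an int such as yyyymmdd.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the asker's class; names are assumptions.
class DatedUser implements Comparable<DatedUser> {
    final int id;
    final int date; // assumed to be encoded as yyyymmdd
    final String name;

    DatedUser(int id, int date, String name) {
        this.id = id;
        this.date = date;
        this.name = name;
    }

    // Natural ordering: by creation date only.
    @Override
    public int compareTo(DatedUser other) {
        return Integer.compare(date, other.date);
    }

    // Returns a sorted copy; the sort itself is O(n log n) and stable.
    static List<DatedUser> sortedByDate(List<DatedUser> users) {
        List<DatedUser> copy = new ArrayList<>(users);
        Collections.sort(copy);
        return copy;
    }
}
```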

Also is there another data structure/method that I can use to have O(1) for all my methods?
Three HashMaps. This isn't a database, so you have to maintain "indices" by hand.

Related

Want to delete some elements from a list that exist in another list

I have a list suppose
listA=[679,890,907,780,5230,781]
and want to delete the elements that also exist in another list
listB=[907,5230]
in minimum time complexity?
I can solve this problem by using two "for loops", which means O(n^2) time complexity, but I want to reduce this complexity to O(n*log(n)) or O(n).
Is it possible?
It's possible - if one of the lists is sorted. Assuming that list A is sorted and list B is unsorted, with respective dimensions M and N, the minimum time complexity to remove all of list B's elements from list A will be O((N+M)*log(M)). The way you can achieve this is by binary search - each lookup for an element in list A takes O(log(M)) time, and there are N lookups (one for each element in list B). Since it takes O(M*log(M)) time to sort A, it's more efficient for huge lists to sort and then remove all elements, with total time complexity O((N+M)*log(M)).
On the other hand, if you don't have a sorted list, just use Collection.removeAll, which has a time complexity of O(M*N) in this case. The reason for this time complexity is that removeAll does (by default) something like the following pseudocode:
public boolean removeAll(Collection<?> other)
    for each elem in this list
        if other contains elem
            remove elem from this list
Since contains has a time complexity of O(N) for lists, and you end up doing M iterations, this takes O(M*N) time in total.
Finally, if you want to minimize the time complexity of removeAll (with possibly degraded real world performance) you can do the following:
List<Integer> a = ...
List<Integer> b = ...
HashSet<Integer> lookup = new HashSet<>(b);
a.removeAll(lookup);
For bad values of b, constructing lookup could take up to O(N*log(N)) time, as shown here (see "pathologically distributed keys"). After that, removeAll performs M iterations with an O(1) contains check each, taking O(M) time to execute. Therefore, the time complexity of this approach is O(M + N*log(N)).
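For reference, here is a self-contained version of the HashSet approach, using the lists from the question. The class and method names are made up for illustration, and the input is copied first so that the caller's list is left untouched (an assumption; the snippet above removes in place).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names are not from the original answer.
class RemoveWithHashSet {
    static List<Integer> removeAllOf(List<Integer> a, List<Integer> b) {
        Set<Integer> lookup = new HashSet<>(b);    // expected O(N) to build
        List<Integer> result = new ArrayList<>(a); // copy, so 'a' is not modified
        result.removeAll(lookup);                  // one contains() check per element
        return result;
    }
}
```

With listA=[679,890,907,780,5230,781] and listB=[907,5230] from the question, this returns [679, 890, 780, 781].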
So, there are three approaches here. One provides you with time complexity O((N+M)*log(M)), another provides you with time complexity O(M*N), and the last provides you with time complexity O(M + N*log(N)). Considering that the first and last approaches are similar in time complexity (as log tends to be very small even for large numbers), I would suggest going with the naive O(M*N) for small inputs, and the simplest O(M + N*log(N)) for medium-sized inputs. At the point where your memory usage starts to suffer from creating a HashSet to store the elements of B (very large inputs), I would finally switch to the more complex O((N+M)*log(M)) approach.
You can find an AbstractCollection.removeAll implementation here.
Edit:
The first approach doesn't work so well for ArrayLists - removing from the middle of list A takes O(M) time, apparently. Instead, sort list B (O(N*log(N))), and iterate through list A, removing items as appropriate. This takes O((M+N)*log(N)) time and is better than the O(M*N*log(M)) that you end up with when using an ArrayList. Unfortunately, the "removing items as appropriate" part of this algorithm requires that you create data to store the non-removed elements in O(M), as you don't have access to the internal data array of list A. In this case, it's strictly better to go with the HashSet approach. This is because (1) the time complexity of O((M+N)*log(N)) is actually worse than the time complexity for the HashSet method, and (2) the new algorithm doesn't save on memory. Therefore, only use the first approach when you have a List with O(1) time for removal (e.g. LinkedList) and a large amount of data. Otherwise, use removeAll. It's simpler, often faster, and supported by library designers (e.g. ArrayList has a custom removeAll implementation that allows it to take linear instead of quadratic time using negligible extra memory).
You can achieve this in the following way:
Sort the second list (you can sort either of the lists; here I have sorted the second one). After that, loop through the first list and, for each of its elements, do a binary search in the second list.
You can sort a list by using the Collections.sort() method.
Total complexity:
For sorting: O(m*log(m)), where m is the size of the second list. I have sorted only the second list.
For removing: O(n*log(m)), where n is the size of the first list.
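The sort-then-binary-search idea from this answer could look roughly like this. The names are hypothetical, and the sketch collects the surviving elements into a new list rather than removing them in place, which avoids the cost of deleting from the middle of an ArrayList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; names are not from the original answer.
class RemoveWithBinarySearch {
    static List<Integer> removeAllOf(List<Integer> first, List<Integer> second) {
        List<Integer> sorted = new ArrayList<>(second);
        Collections.sort(sorted);                          // O(m log m)

        List<Integer> kept = new ArrayList<>();
        for (Integer x : first) {                          // n iterations
            // binarySearch returns a negative value when x is absent: O(log m)
            if (Collections.binarySearch(sorted, x) < 0) {
                kept.add(x);
            }
        }
        return kept;
    }
}
```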

Which of the given sorting algorithms will be fastest when run on an array that happens to already be in order?

Question:
Which sorting algorithm will be fastest when run on an array that happens to already be in order?
(A) It is not possible to know which will be fastest.
(B) selection sort
(C) insertion sort
(D) binary sort
(E) All of these algorithms will run at the same speed.
I have been doing some research for a homework assignment and have been getting conflicting answers. Some places say it is insertion, while some say both are equal and yet others say it can't be determined. Very confused right now, would appreciate some help.
(C) Insertion sort.
It is normally the fastest and easiest to implement when an array is already nearly or completely sorted, because it performs fewer operations: on sorted input it makes a single pass with n-1 comparisons and no shifts.
Selection sort will still do all of its pairwise comparisons, and binary sort will also be slightly slower.
I would say insertion sort, because:
Insertion sort is a simple sorting algorithm, it builds the final sorted array one item at a time. It is much less efficient on large lists than other sort algorithms.
Advantages of Insertion Sort:
1) It is very simple.
2) It is very efficient for small data sets.
3) It is stable; i.e., it does not change the relative order of elements with equal keys.
4) In-place; i.e., only requires a constant amount O(1) of additional memory space.
Insertion sort iterates through the list by consuming one input element at each repetition, and growing a sorted output list. On a repetition, insertion sort removes one element from the input data, finds the location it belongs within the sorted list, and inserts it there. It repeats until no input elements remain.
I read this at- http://www.java2novice.com/java-sorting-algorithms/insertion-sort/
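To make the difference on sorted input concrete, here is a small sketch that counts element comparisons for textbook versions of insertion sort and selection sort. The class and method names are made up for illustration, and real library sorts are implemented differently.

```java
// Hypothetical demo class; counts comparisons instead of measuring time.
class InsertionSortDemo {
    // Sorts in place and returns the number of element comparisons made.
    static long insertionSort(int[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0) {
                comparisons++;
                if (a[j] <= key) {
                    break;          // already in place: stop after one comparison
                }
                a[j + 1] = a[j];    // shift the larger element to the right
                j--;
            }
            a[j + 1] = key;
        }
        return comparisons;
    }

    // Selection sort always scans the whole unsorted tail, sorted input or not.
    static long selectionSort(int[] a) {
        long comparisons = 0;
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                comparisons++;
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
        return comparisons;
    }
}
```

On an already sorted array of 5 elements, insertionSort makes 4 comparisons (n-1) while selectionSort makes 10 (n(n-1)/2).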

treemap vs arraylist - performance & resources while iterating/adding/editing values

Talking about performance and resources.
Which one is faster and needs less resources when adding and editing values, ArrayList or TreeMap?
Or is there any type of data that can beat these two? (it has to be able to somehow make the data sorted)
ArrayLists and TreeMaps are different types of structures meant for different things. It would be helpful to know what you are planning on using these structures for.
ArrayList
Allows duplicates (it is a List)
Amortized O(1) to add to the end of the list
O(n) to insert anywhere else in the list
O(1) to access
O(n) to remove
TreeMap
Does not allow duplicate keys (it is a Map)
O(logn) to insert
O(logn) to access
O(logn) to remove
Sorting the ArrayList will take O(nlogn) time (after inserting everything), while the TreeMap will always be sorted.
EDIT
You've mentioned that you are working with records retrieved from a database. If the query already returns them sorted (e.g. via ORDER BY), you should just insert them one by one into an ArrayList.
Depends on what you need.
If we are talking about sorted data, as the ArrayList is a list/array, if it is sorted, you could get a value in O(log n) speed. To insert, though, it is O(n), because you potentially have to move the whole array when inserting a new element.
The TreeMap data structure is implemented as a red-black tree, for which both the insertion time and the search time are O(log n).
So, in a nutshell:
Data Structure      | Insertion Speed | Search Speed
ArrayList (sorted)  | O(n)            | O(log n)
TreeMap             | O(log n)        | O(log n)
I'd definitely go with a TreeMap. It has the additional bonus of having it all ready to go right now (you'd have to implement some of the code yourself to make the ArrayList work).
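For comparison, the extra code needed to keep an ArrayList sorted could be as small as the sketch below. The helper name insertSorted is made up; it relies on Collections.binarySearch, which returns -(insertion point) - 1 when the key is absent.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the name is not from the original answer.
class SortedListInsert {
    // Inserts value so that the (already sorted) list stays sorted.
    static void insertSorted(List<Integer> sorted, int value) {
        int pos = Collections.binarySearch(sorted, value); // O(log n) on an ArrayList
        if (pos < 0) {
            pos = -(pos + 1);                              // decode the insertion point
        }
        sorted.add(pos, value);                            // O(n): shifts the tail
    }
}
```

The search is logarithmic, but the add still has to shift every later element, which is where the O(n) insertion cost in the table comes from.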
Note:
If you don't get the O(n) (called big-Oh) notation, think of it as a rough formula for the number of steps needed when the structure has n elements. So, if you had a sorted ArrayList with 1000 elements (n=1000), it'd take about 10 steps (log2 1000 ≈ 10) to find an item there. It could take up to about 1000 steps to insert a new element, though.
The TreeMap, on the other hand, would take about 10 steps to both search and insert.
ArrayList requires only O(1) to add (at the end) or edit a value, since you're only accessing it by index. However, searching for an element in an unsorted ArrayList is O(n), because you may have to go through all the elements of the list.
But you have to be aware of which data structure you need. It's hard to choose between an ArrayList and a TreeMap since they don't really serve the same purpose (a Map doesn't allow duplicate keys while an ArrayList allows duplicates, etc.).

Is efficiency of java's TreeMap based on number of keys or values?

Since Java uses a red-black tree to implement the TreeMap class, is the efficiency of put() and get() lg(N), where N = number of distinct keys, or N = number of insertions/retrievals you plan to do?
For example, say I want to use a
TreeMap<Integer, ArrayList<String>>
to store the following data:
1 million <1, "bob"> pairs and 1 million <2, "jack"> pairs (the strings get inserted into the arraylist value corresponding to the key)
The final treemap will have 2 keys, with each one storing arraylist of million "bob" or "jack" strings. Is the time efficiency lg(2mil) or lg(2)? I am guessing it's lg(2) since that's how a red-black tree works, but just wanted to check.
Performance of a TreeMap with 2 pairs will behave as N=2, regardless of how many duplicate additions were previously made. There is no "memory" of the excess additions so they cannot possibly produce any overhead.
So yes, you can informally assume that time efficiency is "log 2".
Although it's fairly meaningless as big-O notation is intended to relate to asymptotic efficiency rather than be relevant for small sizes. An O(N^3) algorithm could easily be faster than a O(log N) algorithm for N=2.
For this case, a tree map is lg(n) where n=2 as you describe. There are only 2 values in the map: one arraylist, and another arraylist. No matter what is contained inside those, the map only knows of two values.
While not directly concerned with your question, you may want to consider not using a treemap for this... I mean, how do you plan to access the data stored inside your "bob" or "jack" lists? These are going to be O(n) searches unless you're going to use some kind of binary search on them or something, and the n here is 1 million. If you elaborate more on your end goal, perhaps a more encompassing solution can be achieved.
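The point made in the answers above, that the tree only grows with the number of distinct keys, can be checked with a small sketch like this one. The class name and the build method are hypothetical, and the strings are those from the question.

```java
import java.util.ArrayList;
import java.util.TreeMap;

// Hypothetical demo; names are not from the question.
class TwoKeyTreeMap {
    static TreeMap<Integer, ArrayList<String>> build(int copiesPerKey) {
        TreeMap<Integer, ArrayList<String>> map = new TreeMap<>();
        for (int i = 0; i < copiesPerKey; i++) {
            // Each lookup walks a tree that never holds more than 2 nodes.
            map.computeIfAbsent(1, k -> new ArrayList<>()).add("bob");
            map.computeIfAbsent(2, k -> new ArrayList<>()).add("jack");
        }
        return map;
    }
}
```

However many strings are appended, map.size() stays at 2; only the ArrayList values grow.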

Java: Inserting into LinkedList efficiently

I am optimizing an implementation of a sorted LinkedList.
To insert an element I traverse the list and compare each element until I have the correct index, and then break loop and insert.
I would like to know if there is any other way that I can insert the element at the same time as traversing the list to reduce the insert from O(n + (n capped at size()/2)) to O(n).
A ListIterator is almost what I'm after because of its add() method, but unfortunately in the case where there are elements in the list equal to the insert, the insert has to be placed after them in the list. To implement this, ListIterator needs a peek(), which it doesn't have.
edit: I have my answer, but will add this anyway since a lot of people haven't understood correctly:
I am searching for an insertion point AND inserting, which combined is higher than O(n)
You may consider a skip list, which is implemented using multiple linked lists at varying granularity. E.g. the linked list at level 0 contains all items, level 1 only links to every 2nd item on average, level 2 to only every 4th item on average, etc.... Searching starts from the top level and gradually descends to lower levels until it finds an exact match. This logic is similar to a binary search. Thus search and insertion is an O(log n) operation.
A concrete example in the Java class library is ConcurrentSkipListSet (although it may not be directly usable for you here).
I'd favor Péter Török's suggestion, but I'd still like to add something for the iterator approach:
Note that ListIterator provides a previous() method to iterate through the list backwards. Thus first iterate until you find the first element that is greater and then go to the previous element and call add(...). If you hit the end, i.e. all elements are smaller, then just call add(...) without going back.
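A minimal sketch of this iterator approach, with a made-up class and method name, might look like this. It finds the position and inserts in a single traversal, and places the new value after any elements equal to it.

```java
import java.util.List;
import java.util.ListIterator;

// Hypothetical helper; names are not from the original answer.
class SortedLinkedInsert {
    // Inserts value into a sorted list, after any equal elements.
    static void insert(List<Integer> sorted, int value) {
        ListIterator<Integer> it = sorted.listIterator();
        while (it.hasNext()) {
            if (it.next() > value) {
                it.previous(); // step back so add() lands before the greater element
                break;
            }
        }
        // If the loop hit the end, every element was <= value and add() appends.
        it.add(value);
    }
}
```

Used with a LinkedList, the add() call itself is O(1) because the iterator already points at the insertion position.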
I have my answer, but will add this anyway since a lot of people haven't understood correctly: I am searching for an insertion point AND inserting, which combined is higher than O(n).
You need to maintain a collection of (possibly) non-unique elements that can be iterated in an order given by an ordering function. This can be achieved in a variety of ways. (In the following I use "total insertion cost" to mean the cost of inserting a number (N) of elements into an initially empty data structure.)
A singly or doubly linked list offers O(N^2) total insertion cost (whether or not you combine the steps of finding the position and doing the insertion!), and O(N) iteration cost.
A TreeSet offers O(NlogN) total insertion cost and O(N) iteration cost. But has the restriction of no duplicates.
A tree-based multiset (e.g. TreeMultiset) has the same complexity as a TreeSet, but allows duplicates.
A skip-list data structure also has the same complexity as the previous two.
Clearly, the complexity measures say that a data structure that uses a linked list performs the worst as N gets large. For this particular group of requirements, a well-implemented tree-based multiset is probably the best, assuming there is only one thread accessing the collection. If the collection is heavily used by many threads (and it is a set), then a ConcurrentSkipListSet is probably better.
You also seem to have a misconception about how "big O" measures combine. If I have one step of an algorithm that is O(N) and a second step that is also O(N), then the two steps combined are STILL O(N) .... not "more than O(N)". You can derive this from the definition of "big O". (I won't bore you with the details, but the Math is simple.)
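For completeness, the omitted math can be written out in a few lines, using the usual definition of big O with constants c_1, c_2 and thresholds N_1, N_2 (these symbols are introduced here only for illustration):

```latex
f(N) \le c_1 N \ \text{for } N \ge N_1, \qquad g(N) \le c_2 N \ \text{for } N \ge N_2
\;\Longrightarrow\;
f(N) + g(N) \le (c_1 + c_2)\, N \ \text{for } N \ge \max(N_1, N_2),
\quad\text{so } f(N) + g(N) = O(N).
```

The two steps together are bounded by a constant multiple of N, which is exactly what O(N) means.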
