Consider searching for a particular row in a table; as I understand ORM, each row is an object. I have not worked intensively with JDBC, so as a matter of best practice, where are these POJO objects collected or held: in a Set or a List?
I am trying to find the complexity of searching for an element in a List vs. a Set.
Here is what I have done:
private void searchSet() {
    Set<String> names = new HashSet<>();
    names.add("srk");
    names.add("lastminute");
    names.add("monkey");
    for (String x : names) {
        if (x.equals("monkey")) {
            System.out.println("caught the name " + x);
        }
    }
}

private void searchList() {
    List<String> names = new ArrayList<>();
    names.add("srk");
    names.add("lastminute");
    names.add("monkey");
    for (String x : names) {
        if (x.equals("monkey")) {
            System.out.println("caught the name " + x);
        }
    }
}
I am calculating the time taken to search for an element in the set and the list using the following approach:

long startTime, endTime, totalTime;
startTime = System.nanoTime();
// the searchList() or searchSet() call under test goes here
endTime = System.nanoTime();
totalTime = endTime - startTime;
Now, I have the following statistics:

System.out.println("Time taken to search an element in list : " + totalTime); // for list - 614324
System.out.println("Time taken to search an element in set : " + totalTime);  // for set - 757359
Based on these stats, can I conclude that it is faster to search for an element in a List than in a Set?
Which is the better collection for storing database record objects when searching, and what is the complexity of searching for an element in a List vs. a Set, in the generic sense?
Data structures don't have complexities, algorithms do. (Note that data structures usually come with the complexities of their basic operations, which are tiny algorithms themselves.) In your case, you implemented the find algorithm yourself for both containers, and you did it as a linear search, which is O(n). The speed difference you observed is the result of an ArrayList being simpler and faster to traverse than a HashSet, i.e. the algorithm has the same complexity, but the constant factor is smaller.
Second, you have I/O within the functions you want to time. This will usually completely dominate any actual operations you perform and make your benchmark useless.
Third, you're looking for complexity and you wrote a benchmark. That's just wrong. You can get a hint for complexity by having a benchmark and plotting the results for different input sizes in a graph, but to really learn the complexity, you have to analyze the algorithm, not run it.
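To illustrate that last point, here is a minimal sketch of such a size-sweep benchmark; the class name and sizes are arbitrary, and a serious measurement would use a harness like JMH with warm-up iterations rather than single nanoTime() samples:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GrowthBenchmark {
    public static void main(String[] args) {
        // Time Set.contains and a linear List scan at growing sizes;
        // plotting these points hints at the order of growth.
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            List<String> list = new ArrayList<>();
            Set<String> set = new HashSet<>();
            for (int i = 0; i < n; i++) {
                list.add("name" + i);
                set.add("name" + i);
            }
            String target = "name" + (n - 1); // worst case for the linear scan

            long t0 = System.nanoTime();
            boolean inList = list.contains(target); // O(n) scan
            long t1 = System.nanoTime();
            boolean inSet = set.contains(target);   // O(1) expected
            long t2 = System.nanoTime();

            // Print outside the timed region so I/O doesn't pollute the numbers.
            System.out.printf("n=%d list=%dns set=%dns (%b/%b)%n",
                    n, t1 - t0, t2 - t1, inList, inSet);
        }
    }
}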
Fourth, List and Set in Java aren't data structures, they are interfaces. The data structures you have chosen are ArrayList (a version of the contiguous array data structure implementing the List interface) and HashSet (a version of the hash table data structure implementing the Set interface). So you need to look at those.
For an array, unless it's sorted, the find algorithm takes linear time, because you have no option other than traversing the whole thing.
For a hash table, which is optimized for lookup, the find algorithm is still technically O(n) in the worst case, but in the common case will be O(1). However, you have to actually use the optimized find algorithm (offered by Set.contains) in order to exploit this - a linear search over HashSet is no better (and actually worse) than a linear search over an ArrayList.
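A minimal sketch of the difference, reusing the names from the question: the hand-written linear scan is O(n) no matter what it iterates over, while contains delegates to the hash lookup the structure was built for:

import java.util.HashSet;
import java.util.Set;

public class ContainsVsScan {
    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        names.add("srk");
        names.add("lastminute");
        names.add("monkey");

        // Linear search: O(n) regardless of the underlying structure.
        boolean foundByScan = false;
        for (String x : names) {
            if (x.equals("monkey")) {
                foundByScan = true;
                break;
            }
        }

        // Hash lookup: O(1) in the expected case.
        boolean foundByHash = names.contains("monkey");

        System.out.println(foundByScan + " " + foundByHash); // true true
    }
}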
There is already a contains() method in both collections, so why are you traversing again? The complexity for a list is O(n), and for a hash set it is O(1) on average, i.e. constant.
List Implementation code:
https://referencesource.microsoft.com/#PresentationFramework/src/Framework/System/Windows/Documents/List.cs,eabc7101897ec6e6
Set Implementation Code:
https://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,50c894a3f7ad7bd0
Data Structure Time Complexity:
https://www.bigocheatsheet.com/
Useful book: Introduction to the design and analysis of algorithms by Anany Levitin
The first two links show the inner implementation of the Set and List classes; basically, both of them are implemented on top of the array data structure.
The third link shows each data structure's complexity for its different operations.
If you wish to measure the complexity of the two pieces of code (Set, List), you could:
1. Identify the algorithm's basic operation, i.e. the operation contributing the most to the total running time.
2. Set up a sum expressing the number of times the algorithm's basic operation is executed.
3. Using standard formulas and rules of sum manipulation, either find a closed-form formula for the count or, at the very least, establish its order of growth.
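As a worked example of these steps for the linear search above, where the basic operation is the equals() comparison executed once per element, the worst-case count is:

C_{\text{worst}}(n) = \sum_{i=1}^{n} 1 = n \in \Theta(n)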
Related
I have a list of objects, each of which contains a statusEnum. Now I want to return all those objects whose status falls within a specific list of provided statuses.
A simple solution is to loop over the list of objects and then, in an inner loop, over the provided list of statusEnums. This would work; however, it would make the time complexity O(n²). Is there a way I could reduce it to O(n)?
I can't change the map. The only other solution I could think of is maintaining another map keyed on statusEnums, but that would increase the space complexity a lot.
EDIT
I had a HashMap of objects (which I referred to as a list).
Here is the code I came up with, for others:
public List<MyObject> getObjectsBasedOnCriteria(List<ObjectStatus> statuses, String secondCriteria) {
    EnumSet<ObjectStatus> enumSet = EnumSet.copyOf(statuses); // O(1) membership tests
    for (Map.Entry<Long, MyObject> objEntry : myObjs.entrySet()) {
        MyObject obj = objEntry.getValue();
        // Note: the second check presumably compares some field of obj
        // (not the object itself) to secondCriteria.
        if (enumSet.contains(obj.getStatus()) && obj.equals(secondCriteria)) {
            ...
        }
    }
}
Use a Set to hold the statusEnums (probably an EnumSet), and check whether each instance's status is in that set using set.contains(object.getStatus()), or the like.
Lookups in EnumSet and HashSet are O(1), so the solution is linear (assuming just one status per object). EnumSet.contains is more efficient than HashSet.contains with enum values; however, the choice is irrelevant to overall time complexity.
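A minimal sketch of that approach, with a hypothetical Item class and Status enum standing in for the question's types (EnumSet.copyOf assumes the wanted list is non-empty):

import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class StatusFilter {
    enum Status { NEW, ACTIVE, CLOSED }

    // Hypothetical type standing in for the question's MyObject.
    static class Item {
        final Status status;
        Item(Status status) { this.status = status; }
        Status getStatus() { return status; }
    }

    static List<Item> filterByStatus(List<Item> items, List<Status> wanted) {
        EnumSet<Status> set = EnumSet.copyOf(wanted); // O(1) membership tests
        List<Item> result = new ArrayList<>();
        for (Item item : items) {             // one pass: O(n) overall
            if (set.contains(item.getStatus())) {
                result.add(item);
            }
        }
        return result;
    }
}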
Assuming you have a sane number of statuses, especially if you have an enum of statuses, you can use an EnumSet to match the status, or a HashMap.
Even if you don't do this, the time complexity is O(n * m), where n is the number of entries and m is the number of statuses you are looking for. In general, it is assumed that you will have far more records than statuses you are checking for.
The number of possible enum values is limited to a few thousand due to a limitation in the way Java is compiled, so this is always an upper bound for enums.
I have two lists of phone numbers. The 1st list is a subset of the 2nd list. I ran the two different algorithms below to determine which phone numbers are contained in both lists.
Way 1:
Sort the 1st list: Arrays.sort(firstList);
Loop over the 2nd list to find matched elements: if Arrays.binarySearch(firstList, 'each of 2nd list') succeeds, then OK.
Way 2:
Convert the 1st list into a HashMap whose key/value pairs are ('each of 1st list', Boolean.TRUE).
Loop over the 2nd list to find matched elements: if firstMap.containsKey('each of 2nd list'), then OK.
The result: Way 2 ran in 5 seconds, considerably faster than Way 1 at 39 seconds. I can't understand the reason why.
I would appreciate any comments.
Because hashing is O(1) and binary searching is O(log N).
HashMap relies on a very efficient algorithm called 'hashing' which has been in use for many years and is reliable and effective. Essentially the way it works is to split the items in the collection into much smaller groups which can be accessed extremely quickly. Once the group is located a less efficient search mechanism can be used to locate the specific item.
Identifying the group for an item occurs via an algorithm called a 'hashing function'. In Java the hashing method is Object.hashCode() which returns an int representing the group. As long as hashCode is well defined for your class you should expect HashMap to be very efficient which is exactly what you've found.
There's a very good discussion on the various types of Map and which to use at Difference between HashMap, LinkedHashMap and TreeMap
My shorthand rule-of-thumb is to always use HashMap unless you can't define an appropriate hashCode for your keys or the items need to be ordered (either natural or insertion).
Look at the source code for HashMap: it creates and stores a hash for each added (key, value) pair, then the containsKey() method calculates a hash for the given key, and uses a very fast operation to check if it is already in the map. So most retrieval operations are very fast.
Way 1:
Sorting: around O(n log n)
Search: around O(log n)
Way 2:
Creating the hash table: O(n) for low density (no collisions)
Contains: O(1)
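A self-contained sketch of the two ways side by side (the phone numbers are made up; note that a HashSet is enough here, since the Boolean.TRUE values in the map carry no information):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LookupComparison {
    public static void main(String[] args) {
        long[] first = {5550003L, 5550001L, 5550002L};  // 1st list (subset)
        long[] second = {5550002L, 5550003L, 5550004L}; // 2nd list

        // Way 1: sort once (O(n log n)), then O(log n) per lookup.
        Arrays.sort(first);
        for (long phone : second) {
            if (Arrays.binarySearch(first, phone) >= 0) {
                System.out.println(phone + " found by binary search");
            }
        }

        // Way 2: build the hash structure once (O(n)), then O(1) expected per lookup.
        Set<Long> firstSet = new HashSet<>();
        for (long phone : first) {
            firstSet.add(phone);
        }
        for (long phone : second) {
            if (firstSet.contains(phone)) {
                System.out.println(phone + " found by hashing");
            }
        }
    }
}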
I need a structure (ArrayList, LinkedList, etc.) that is very fast for this case:
While the structure is not empty, I search the structure for elements that satisfy a condition, say k, remove the elements that satisfy it, and start over with another condition, say k+1.
e.g.:
for (int i = 1; i <= 1000000; i++) {
    structure.add(i);
}
int d = 2;
while (!structure.isEmpty()) {
    for (int boom : new ArrayList<>(structure)) { // iterate over a copy
        if (boom % d == 2) {
            structure.remove(Integer.valueOf(boom)); // remove by value, not by index
        }
    }
    d++; // move to the next condition after a full pass
}
If the elements are primitives, then the fastest structure will most probably be a specialized primitive collection (e.g., trove). Following references for boxed primitives is a nearly sure cache miss and this probably dominates the costs.
I wouldn't suggest a LinkedList for the same reason: It's dead slow due to cache misses.
If the order is unimportant, then an ArrayList is perfect. Instead of removing an element, replace it with the last one and remove the last array element (see the sketch after this answer). This is an O(1) operation and doesn't suffer from bad spatial locality.
If the order is important, you can build your own ArrayList-like structure. Instead of removing an element, you mark it for removal, e.g. in a BitSet or a boolean[]. Finally, you perform the removal in one sweep by moving all elements to their right positions and adjusting the length. The optimized loop will most probably look similar to the CharMatcher.removeFrom loop.
A simpler solution would be to use an ArrayList and copy all surviving elements to another one. I'd bet it'd beat the LinkedList hands down. As a minor GC-friendly optimization, you can work with two lists.
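A minimal sketch of the swap-with-last trick, under the assumption that element order doesn't matter (class and method names are arbitrary):

import java.util.ArrayList;
import java.util.List;

public class SwapRemove {
    // Removes the element at index i in O(1) by overwriting it with the
    // last element; order is not preserved.
    static <T> void swapRemove(List<T> list, int i) {
        int last = list.size() - 1;
        list.set(i, list.get(last));
        list.remove(last); // removing the tail of an ArrayList is O(1)
    }

    public static void main(String[] args) {
        List<Integer> structure = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            structure.add(i);
        }
        // Remove all even numbers; don't advance i after a removal, because
        // the swapped-in element now sits at index i and must be re-examined.
        int i = 0;
        while (i < structure.size()) {
            if (structure.get(i) % 2 == 0) {
                swapRemove(structure, i);
            } else {
                i++;
            }
        }
        System.out.println(structure);
    }
}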
A LinkedList should be fastest for this case. Use the iterator explicitly (structure.iterator()) and call the remove method of the iterator instead of calling structure.remove(element)!
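A sketch of what that looks like, reusing the numeric condition from the question (with a fixed d for brevity):

import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class IteratorRemoval {
    public static void main(String[] args) {
        List<Integer> structure = new LinkedList<>();
        for (int i = 1; i <= 20; i++) {
            structure.add(i);
        }
        int d = 3;
        Iterator<Integer> it = structure.iterator();
        while (it.hasNext()) {
            if (it.next() % d == 2) {
                it.remove(); // O(1) unlinking, no ConcurrentModificationException
            }
        }
        System.out.println(structure);
    }
}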
I don't know your exact use case, but here's one note.
If you have your predicates P1 .. PN pre-compiled and available, if you are not modifying the contents of the collection, and if your predicates are not dependent on each other, you might want to create a composite predicate: bundle up the N predicates in some logical order, and then perform the filtering in only one iteration over your collection.
As for data structure, I'd think if it like this:
If my filtering predicates will be totally arbitrary, then a list should be OK to use.
In some more specific cases with very limited and strict value sets, you might consider a tree-like or a graph-like structure, where you could have some master nodes which would denote that property "property1" has value "value1". In case you wanted to drop all items where "property1" value is "value1" you could tell that master node to remove all his children (and that they should detach themselves from any other parent master nodes they might have).
Sorted List data structure
If you construct the lists yourself, you can consider using a sorted data structure. It will give you the best search performance (O(log n) complexity, so it is very fast).
Linked List data structure
A LinkedList gives you constant-time element removal, but random access doesn't have constant complexity (it is slow).
You will have to benchmark whether a LinkedList or a sorted list would be faster for your scenario.
If your elements are ints, I suppose a bit set would be the fastest data structure for this task. Iteration would be slightly slower than over an array list (even a primitive specialization rather than the standard java.util.ArrayList), but remove operations cost nearly nothing, while removals from any array list are quite expensive.
Note that you can gain a lot by working directly with a long[] as a bit set and performing the bitwise operations by hand, because java.util.BitSet is not very performance-focused. But, of course, start with BitSet.
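A minimal BitSet sketch of the question's loop: set bits stand for membership, so clear() is the O(1) "remove". The d <= n cap is added here because some values (e.g. 1) never satisfy boom % d == 2, so the structure would never empty on its own:

import java.util.BitSet;

public class BitSetRemoval {
    public static void main(String[] args) {
        int n = 1_000_000;
        BitSet structure = new BitSet(n + 1);
        structure.set(1, n + 1); // "add" the values 1..n

        int d = 2;
        while (!structure.isEmpty() && d <= n) {
            // Sweep the live bits; clearing a bit removes the value in O(1).
            for (int i = structure.nextSetBit(1); i >= 0; i = structure.nextSetBit(i + 1)) {
                if (i % d == 2) {
                    structure.clear(i);
                }
            }
            d++; // next condition after a full pass
        }
        System.out.println(structure.cardinality() + " values survive");
    }
}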
I have a simple collection of String objects, maybe around 10 elements, but I use this collection in a production environment such that we search for a given string in that collection millions of times.
What is the best collection or data structure we can use to get the best results, so that the search operation can be performed in O(1) time?
We could use a HashMap here, but the search there is in constant time, not O(1); I want to make sure that the search is O(1).
Our data structure must return true if the string is present, and false if it is not.
Use a HashSet<String> structure. The contains() operation has a complexity of O(1).
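A minimal sketch (the word list is made up): build the set once, then query it as often as you like:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FastLookup {
    // Built once, queried millions of times: contains() is O(1) expected.
    private static final Set<String> WORDS =
            new HashSet<>(Arrays.asList("alpha", "beta", "gamma", "delta"));

    static boolean isKnown(String s) {
        return WORDS.contains(s); // true if present, false otherwise
    }

    public static void main(String[] args) {
        System.out.println(isKnown("beta"));  // true
        System.out.println(isKnown("omega")); // false
    }
}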
Constant time is O(1). HashMap is fine. (Or HashSet, depending on whether you need a Set or a Map.)
If your set is immutable, Guava's ImmutableSet will reduce memory footprint by a factor of ~3 (and probably give you a small constant factor of improved speed).
If you can't use HashSet/HashMap as previously suggested, you could write a Radix Tree implementation.
I have a class along the lines of:
public class Observation {
    private String time;
    private double x;
    private double y;
    // Constructors + Setters + Getters
}
I can choose to store these objects in any type of collection (Standard class or 3rd party like Guava). I have stored some example data in an ArrayList below, but like I said I am open to any other type of collection that will do the trick. So, some example data:
ArrayList<Observation> ol = new ArrayList<Observation>();
ol.add(new Observation("08:01:23",2.87,3.23));
ol.add(new Observation("08:01:27",2.96,3.17));
ol.add(new Observation("08:01:27",2.93,3.20));
ol.add(new Observation("08:01:28",2.93,3.21));
ol.add(new Observation("08:01:30",2.91,3.23));
The example assumes a matching constructor in Observation. The timestamps are stored as String objects as I receive them as such from an external source but I am happy to convert them into something else. I receive the observations in chronological order so I can create and rely on a sorted collection of observations. The timestamps are NOT unique (as can be seen in the example data) so I cannot create a unique key based on time.
Now to the problem. I frequently need to find one (1) observation with a time equal or nearest to a certain time, e.g. if my time were 08:01:29 I would like to fetch the 4th observation in the example data, and if the time were 08:01:27 I want the 3rd observation.
I can obviously iterate through the collection until I find the time that I am looking for, but I need to do this frequently and at the end of the day I may have millions of observations so I need to find a solution where I can locate the relevant observations in an efficient manner.
I have looked at various collection-types including ones where I can filter the collections with Predicates but I have failed to find a solution that would return one value, as opposed to a subset of the collection that fulfills the "<="-condition. I am essentially looking for the SQL equivalent of SELECT * FROM ol WHERE time <= t LIMIT 1.
I am sure there is a smart and easy way to solve my problem so I am hoping to be enlightened. Thank you in advance.
Try a TreeSet, providing a comparator that compares the time. It maintains an ordered set, and you can call TreeSet.floor(E) to find the greatest element less than or equal to the given one (you should pass a dummy Observation with the time you are looking for). You also have headSet and tailSet for ordered subsets.
It has O(log n) time for adding and retrieving. I think it is very suitable for your needs.
If you prefer a Map, you can use a TreeMap, which has similar methods.
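A sketch of the TreeMap variant; since a comparator on time alone would treat observations with equal timestamps as duplicates in a TreeSet, this version maps each time to a list. double[] pairs stand in for the Observation class to keep it self-contained, and it assumes the map is non-empty when queried:

import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NearestObservation {
    // Duplicate timestamps are kept by mapping each time to a list of readings.
    private final TreeMap<LocalTime, List<double[]>> byTime = new TreeMap<>();

    void add(LocalTime time, double x, double y) {
        byTime.computeIfAbsent(time, t -> new ArrayList<>()).add(new double[]{x, y});
    }

    // Returns the observations at the time closest to t; each lookup is O(log n).
    // Assumes at least one observation has been added.
    List<double[]> nearest(LocalTime t) {
        Map.Entry<LocalTime, List<double[]>> below = byTime.floorEntry(t);   // greatest key <= t
        Map.Entry<LocalTime, List<double[]>> above = byTime.ceilingEntry(t); // smallest key >= t
        if (below == null) return above.getValue();
        if (above == null) return below.getValue();
        long down = Duration.between(below.getKey(), t).toNanos();
        long up = Duration.between(t, above.getKey()).toNanos();
        return (down <= up) ? below.getValue() : above.getValue();
    }

    public static void main(String[] args) {
        NearestObservation obs = new NearestObservation();
        obs.add(LocalTime.parse("08:01:28"), 2.93, 3.21);
        obs.add(LocalTime.parse("08:01:30"), 2.91, 3.23);
        double[] xy = obs.nearest(LocalTime.parse("08:01:29")).get(0);
        System.out.println(xy[0] + ", " + xy[1]); // 2.93, 3.21
    }
}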
Sort your collection (an ArrayList will probably work best here) and use binary search, which returns an integer index of either a match or the "closest" possible match, i.e. it returns the...
index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size().
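Decoding that return value into a nearest-element lookup might look like this sketch (Integer keys are used for brevity; the list must already be sorted):

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClosestMatch {
    // Returns the index of the element nearest to key in a sorted list.
    static int closestIndex(List<Integer> sorted, int key) {
        int i = Collections.binarySearch(sorted, key);
        if (i >= 0) return i;              // exact hit
        int insertion = -i - 1;            // decode (-(insertion point) - 1)
        if (insertion == 0) return 0;
        if (insertion == sorted.size()) return sorted.size() - 1;
        // Compare the two neighbours of the insertion point.
        int before = sorted.get(insertion - 1);
        int after = sorted.get(insertion);
        return (key - before <= after - key) ? insertion - 1 : insertion;
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(10, 20, 27, 28, 30);
        System.out.println(closestIndex(sorted, 29)); // 3 (value 28)
        System.out.println(closestIndex(sorted, 27)); // 2 (exact match)
    }
}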
Have the Observation class implement Comparable and use a TreeSet to store the objects, which will keep the elements sorted. TreeSet implements SortedSet, so you can use headSet or tailSet to get a view of the set before or after the element you're searching for. Use the first or last method on the returned set to get the element you're seeking.
If you are stuck with ArrayList, but can keep the elements sorted yourself, use Collections.binarySearch to search for the element. It returns a positive number if the exact element is found, or a negative number that can be used to determine the closest element. http://download.oracle.com/javase/1.4.2/docs/api/java/util/Collections.html#binarySearch(java.util.List,%20java.lang.Object)
If you are lucky enough to be using Java 6, and the performance overhead of keeping a SortedSet is not a big deal for you, take a look at the TreeSet ceiling, floor, higher and lower methods.