Simple collection of String objects that allows search in O(1) - Java

I have a simple collection of String objects, maybe around 10 elements,
but I use this collection in a production environment where we search it for a given string millions of times.
What is the best collection or data structure to use so that the search operation can be performed in O(1) time?
We could use a HashMap here, but search there is in constant time, not O(1); I want to make sure that search is O(1).
The data structure must return true if the string is present and false otherwise.

Use a HashSet<String> structure. The contains() operation has a complexity of O(1).
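
A minimal sketch of that suggestion, assuming the strings are known up front and never change afterwards. The class name KnownStrings and the sample values are hypothetical, not taken from the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder class; names and sample values are illustrative only.
final class KnownStrings {
    // Built once at class load and wrapped so it cannot be modified afterwards.
    private static final Set<String> KNOWN = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("alpha", "beta", "gamma")));

    // Average O(1): hash the string, then compare within a single bucket.
    static boolean isKnown(String candidate) {
        return KNOWN.contains(candidate);
    }
}
```

Because the set is never written to after construction, it can be read from many threads without extra locking.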

Constant time is O(1). HashMap is fine. (Or HashSet, depending on whether you need a Set or a Map.)
If your set is immutable, Guava's ImmutableSet will reduce memory footprint by a factor of ~3 (and probably give you a small constant factor of improved speed).

If you can't use HashSet/HashMap as previously suggested, you could write a Radix Tree implementation.

Is there space overhead to using a hash map instead of an array?

I was trying to implement a trie and read in an example implementation that it would be more space efficient to use a small array of size 26 to store the children, because then you wouldn't have to waste space with a HashMap (the code was in Java, if that makes a difference).
But wouldn't a map be more space efficient, since you don't necessarily need to store all 26 values? Or does a HashMap that contains Character objects as keys simply take more space, because a simple int[] does not use the extra space in the background that the implementation of these more complex objects would use?
Just wanting to check if maybe this person was mistaken or if there's some overhead involved in using object types like HashMap that I should be aware of.
A hashmap stores keys and values, so if you were to implement a trie using a hashmap, you would be storing not only the values, but also the keys. If you use an array, then the key is actually the index of the value in the array, so you do not have to store it anywhere.
Besides that, hashmaps are less space efficient than arrays because they always have a load factor which is smaller than one, which is the same as saying that they keep more entries allocated than necessary due to the way they work. I am not expanding on this, because it is not related to your issue, but if you are curious, search for hashmap load factor.
This is a classic programming compromise. In this case, using a HashMap will take more space, but that may be a price you're willing to pay for improved performance. Or it might not. Depends on your problem.
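
To make the "the index is the key" point concrete, here is a rough sketch of an array-backed trie. It assumes words contain only lowercase 'a' to 'z'; the class name ArrayTrie and its methods are hypothetical and not taken from the implementation the question refers to.

```java
// Hypothetical trie restricted to lowercase 'a'..'z'; names are illustrative.
final class ArrayTrie {
    private static final class Node {
        // The child's letter is implied by its slot (c - 'a'), so no key object is stored.
        final Node[] children = new Node[26];
        boolean endOfWord;
    }

    private final Node root = new Node();

    // Assumes every character of word is in 'a'..'z'; other input would fail on the array index.
    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
        node.endOfWord = true;
    }

    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (index < 0 || index >= 26) {
                return false; // character outside the supported alphabet
            }
            node = node.children[index];
            if (node == null) {
                return false;
            }
        }
        return node.endOfWord;
    }
}
```

Each node pays for 26 references whether or not they are used. A HashMap<Character, Node> child table would instead pay for a Character key and an entry object per child that is actually present, plus its partly empty bucket array, so which layout is smaller depends on how densely the nodes are populated.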

Reducing time complexity

I have a list of objects, each of which contains a statusEnum. Now I want to return all the objects that fall under a specific list of provided statuses.
A simple solution is to loop over the list of objects and then run another for loop over the provided list of statusEnums ... This would work; however, it would make the time complexity O(n^2). Is there a way I could reduce it to O(n)?
I can't change the map. The only other solution I could think of is maintaining another map with the statusEnums as the key, but that would increase the space complexity a lot.
EDIT
I had a HashMap of objects (which I described as a list).
Here is the code I came up with, for others ...
public List<MyObject> getObjectsBasedOnCriteria(List<ObjectStatus> statuses, String secondCriteria) {
    // Copy the requested statuses into an EnumSet so each membership test is O(1).
    EnumSet<ObjectStatus> enumSet = EnumSet.copyOf(statuses);
    for (Map.Entry<Long, MyObject> objEntry : myObjs.entrySet()) {
        MyObject obj = objEntry.getValue();
        if (enumSet.contains(obj.getStatus()) && obj.equals(secondCriteria)) {
            ...
        }
    }
}
Use a Set to hold the statusEnums (probably an EnumSet), and check whether each instance's status is in that set using set.contains(object.getStatus()), or something similar.
Lookups in EnumSet and HashSet are O(1), so the solution is linear (assuming just one status per object). EnumSet.contains is more efficient than HashSet.contains with enum values; however, the choice is irrelevant to overall time complexity.
Assuming you have a sane number of statuses, especially if you have an enum of statuses, you can use an EnumSet or a HashMap to match the status.
Even if you don't do this, the time complexity is O(n * m), where n is the number of entries and m is the number of statuses you are looking for. In general it is assumed that you will have many more records than statuses you are checking for.
The number of possible enum values is limited to a few thousand due to a limitation in the way Java is compiled, so this is always an upper bound for enums.
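
The EnumSet approach from these answers could look roughly like the sketch below. Status, Item, and StatusFilter are hypothetical stand-ins for the question's ObjectStatus and MyObject types, whose real definitions are not shown.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;

// Hypothetical stand-ins for the question's ObjectStatus and MyObject.
enum Status { NEW, ACTIVE, CLOSED }

final class Item {
    final long id;
    final Status status;

    Item(long id, Status status) {
        this.id = id;
        this.status = status;
    }
}

final class StatusFilter {
    // One pass over the items with an O(1) membership test per item: O(n) overall.
    static List<Item> withStatus(Collection<Item> items, Collection<Status> wanted) {
        List<Item> matches = new ArrayList<>();
        if (wanted.isEmpty()) {
            // EnumSet.copyOf throws for an empty collection that is not itself an EnumSet.
            return matches;
        }
        EnumSet<Status> wantedSet = EnumSet.copyOf(wanted);
        for (Item item : items) {
            if (wantedSet.contains(item.status)) {
                matches.add(item);
            }
        }
        return matches;
    }
}
```

The only extra memory is the EnumSet itself, which is sized by the number of enum constants rather than by the number of objects.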

Why does HashMap.containsKey run considerably faster than Arrays.binarySearch?

I have two lists of phone numbers. The 1st list is a subset of the 2nd list. I ran the two algorithms below to determine which phone numbers are contained in both lists.
Way 1:
Sort the 1st list: Arrays.sort(FirstList);
Loop over the 2nd list to find matching elements: if Arrays.binarySearch(FirstList, 'each of 2nd list') then OK
Way 2:
Convert the 1st list into a HashMap whose key/value pairs are ('each of 1st list', Boolean.TRUE)
Loop over the 2nd list to find matching elements: if FirstList.containsKey('each of 2nd list') then OK
As a result, Way 2 ran in 5 seconds, considerably faster than Way 1 at 39 seconds. I can't understand why.
I would appreciate any comments.
Because a hash lookup is O(1) on average, while binary search is O(log N).
HashMap relies on a very efficient algorithm called 'hashing' which has been in use for many years and is reliable and effective. Essentially the way it works is to split the items in the collection into much smaller groups which can be accessed extremely quickly. Once the group is located a less efficient search mechanism can be used to locate the specific item.
Identifying the group for an item occurs via an algorithm called a 'hashing function'. In Java the hashing method is Object.hashCode() which returns an int representing the group. As long as hashCode is well defined for your class you should expect HashMap to be very efficient which is exactly what you've found.
There's a very good discussion on the various types of Map and which to use at Difference between HashMap, LinkedHashMap and TreeMap
My shorthand rule-of-thumb is to always use HashMap unless you can't define an appropriate hashCode for your keys or the items need to be ordered (either natural or insertion).
Look at the source code for HashMap: it creates and stores a hash for each added (key, value) pair, then the containsKey() method calculates a hash for the given key, and uses a very fast operation to check if it is already in the map. So most retrieval operations are very fast.
Way 1:
Sorting: around O(n log n)
Search: around O(log n) per lookup
Way 2:
Creating the hash table: O(n) at low density (few collisions)
Contains: O(1) per lookup
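
For comparison, the two ways might be written as in the sketch below. It assumes the phone numbers are held as String arrays, and it uses a HashSet in place of the question's HashMap with Boolean.TRUE values, since only the keys matter. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; counts how many entries of the second list occur in the first.
final class CommonNumbers {
    // Way 1: sort a copy once, then binary-search it for each entry of the second list.
    static int countWithBinarySearch(String[] first, String[] second) {
        String[] sorted = first.clone();
        Arrays.sort(sorted);
        int count = 0;
        for (String number : second) {
            // binarySearch returns an index, which is non-negative only when the key is found.
            if (Arrays.binarySearch(sorted, number) >= 0) {
                count++;
            }
        }
        return count;
    }

    // Way 2: build a hash set once, then probe it for each entry of the second list.
    static int countWithHashSet(String[] first, String[] second) {
        Set<String> lookup = new HashSet<>(Arrays.asList(first));
        int count = 0;
        for (String number : second) {
            if (lookup.contains(number)) {
                count++;
            }
        }
        return count;
    }
}
```

With n entries in the first list and m in the second, the first method costs about O(n log n + m log n) and the second about O(n + m) on average.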

Java limited map

I am looking for some kind of map with a fixed size, for example 20 entries. But not only that: I want to keep only the lowest values. Let's say I'm evaluating some kind of function and inserting the results into my map (I need a map because I have to keep key-value pairs), but I want to keep only the 20 lowest results. I was thinking about sorting and then removing the last element, but I need to do this for millions of records, so sorting every time I add a value is not efficient. Maybe there is a better way?
Thanks for the help.
There is no built-in data structure for this in Java. You can try looking for one in the Guava library. Otherwise, think about using a LinkedHashMap or a TreeMap for this. You can wrap it in your own class to take care of the limiting.
If you care about efficiency, be advised that TreeMap is in fact a red-black tree internally, so put() has a time complexity of O(log n).
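
One way such a wrapper might look is sketched below. It orders a TreeMap by the computed result and evicts the largest entry whenever the limit is exceeded. The class name LowestResults is hypothetical, and the sketch assumes results are distinct, because a repeated result would overwrite the key stored earlier.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper; assumes distinct results (a repeated result replaces the earlier key).
final class LowestResults<K> {
    private final int capacity;
    // Sorted by result, so the largest retained result is always the last key.
    private final TreeMap<Double, K> byResult = new TreeMap<>();

    LowestResults(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void offer(K key, double result) {
        // Once full, a result that is not below the current maximum cannot belong.
        if (byResult.size() == capacity && result >= byResult.lastKey()) {
            return;
        }
        byResult.put(result, key);
        if (byResult.size() > capacity) {
            byResult.pollLastEntry(); // drop the largest retained result
        }
    }

    // Copy of the retained entries, ordered from lowest to highest result.
    Map<Double, K> snapshot() {
        return new TreeMap<>(byResult);
    }
}
```

Each offer costs O(log k) for a limit of k entries, so processing millions of records never sorts more than the retained entries.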

How to store a huge static/immutable String Set that needs only the 'contains' operations in Java

I have a huge list of Strings (8 to 10 million). They are Wikipedia page titles. After creating a Set-like data structure over these strings, the only operation I need is boolean contains(String str).
The straightforward way is to just use a HashSet, TreeSet or something similar (in Java, for example).
Is there a data structure better suited to this use case?
PS: We can't use Bloom filters; we don't want to deal with false positives.
If you care more about saving space than constant-time contains(), and there is a lot of overlap in the stored strings, a trie might help. In that case, contains(str) would be O(n) where n is the length of str.
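
A rough sketch of such a trie for arbitrary titles follows. TitleTrie is a hypothetical name, and whether this layout actually uses less memory than a HashSet for 8 to 10 million titles is an assumption that would need measuring.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie keyed by char, so titles may contain any characters.
final class TitleTrie {
    private static final class Node {
        // Only characters that actually follow this prefix get an entry.
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfTitle;
    }

    private final Node root = new Node();

    void add(String title) {
        Node node = root;
        for (int i = 0; i < title.length(); i++) {
            // Titles that share a prefix reuse the same chain of nodes.
            node = node.children.computeIfAbsent(title.charAt(i), c -> new Node());
        }
        node.endOfTitle = true;
    }

    // Walks one node per character: O(n) in the length of the title.
    boolean contains(String title) {
        Node node = root;
        for (int i = 0; i < title.length(); i++) {
            node = node.children.get(title.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.endOfTitle;
    }
}
```

The lookup cost depends only on the length of the queried title, not on how many titles are stored, and there are no false positives.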
