Random access for HashMap keys - java

I need to randomly access keys in a HashMap. Right now, I am using Set's toArray() method on the Set that HashMap's keySet() returns, and casting it as a String[] (my keys are Strings). Then I use Random to pick a random element of the String array.
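public String randomKey() {
    // toArray(new String[0]) yields a String[]; casting the Object[] from the
    // no-argument toArray() would throw a ClassCastException at runtime.
    String[] keys = myHashMap.keySet().toArray(new String[0]);
    Random rand = new Random();
    return keys[rand.nextInt(keys.length)];
}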
It seems like there ought to be a more elegant way of doing this!
I've read the following post, but it seems even more convoluted than the way I'm doing it. If that solution is better, why is that so? (Selecting random key and value sets from a Map in Java)

There is no facility in a HashMap to return an entry without knowing the key so, if you want to use only that class, what you have is probably as good a solution as any.
Keep in mind however that you're not actually restricted to using a HashMap.
If you're going to be reading this collection far more often than writing it, you can create your own class which contains both a HashMap of the mappings and a different collection of the keys that allows random access (like a Vector).
That way, you won't incur the cost of converting the map to a set then an array every time you read, it will only happen when necessary (adding or deleting items from your collection).
Unfortunately, a Vector allows multiple keys of the same value so you would have to defend against that when inserting (to ensure fairness when selecting a random key). That will increase the cost of insertion.
Deletion would also cost more, since you would have to search the vector for the item to remove.
I'm not sure there's an easy single collection for this purpose. If you wanted to go the whole hog, you could have your current HashMap, a Vector of the keys, and yet another HashMap mapping the keys to the vector indexes.
That way, all operations (insert, delete, change, get-random) would be O(1): very efficient in terms of time, perhaps less so in terms of space :-)
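A rough sketch of that "whole hog" variant, assuming String keys and values. The class name RandomKeyMap and its method names are made up for illustration, and an ArrayList stands in for the Vector:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical wrapper: the mappings, a list of keys for random access,
// and a second map from each key to its position in that list.
class RandomKeyMap {
    private final Map<String, String> map = new HashMap<>();
    private final List<String> keys = new ArrayList<>();
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final Random rand = new Random();

    public void put(String key, String value) {
        if (!indexOf.containsKey(key)) {      // new key: record its position once
            indexOf.put(key, keys.size());
            keys.add(key);
        }
        map.put(key, value);
    }

    public String get(String key) {
        return map.get(key);
    }

    public String remove(String key) {
        Integer idx = indexOf.remove(key);
        if (idx == null) {
            return null;                      // key was never stored
        }
        int i = idx;
        int last = keys.size() - 1;
        if (i != last) {                      // move the last key into the freed slot
            String lastKey = keys.get(last);
            keys.set(i, lastKey);
            indexOf.put(lastKey, i);
        }
        keys.remove(last);                    // removing the tail of an ArrayList is O(1)
        return map.remove(key);
    }

    public String randomKey() {
        if (keys.isEmpty()) {
            throw new NoSuchElementException("map is empty");
        }
        return keys.get(rand.nextInt(keys.size()));
    }

    public int size() {
        return keys.size();
    }
}
```

Deletion avoids the linear search by overwriting the removed slot with the last key, so the key list does not keep insertion order.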
Or there's a halfway solution that still uses a wrapper but creates a long-lived array of strings whenever you insert, change or delete a key. That way, you only create the array when needed and you still amortise the costs. Your class then uses the hashmap for efficient access with a key, and the array for random selection.
And the change there is minimal. You already have the code for creating the array, you just have to create your wrapper class which provides whatever you need from a HashMap (and simply passes most calls through to the HashMap) plus one extra function to get a random key (using the array).
Now, I'd only consider using those methods if performance is actually a problem though. You can spend untold hours making your code faster in ways that don't matter :-)
If what you have is fast enough, it's fine.

Why not use the Collections.shuffle method on a list of the keys, save the result to a variable, and simply pop one off the top as required?
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#shuffle(java.util.List)
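Something along these lines, as a sketch only. The helper class ShuffledKeys is a made-up name, and String keys are assumed as in the question:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class ShuffledKeys {
    // Copies the keys once, shuffles the copy, and returns it as a deque to pop from.
    static Deque<String> shuffledKeys(Map<String, ?> map) {
        List<String> keys = new ArrayList<>(map.keySet());
        Collections.shuffle(keys);
        return new ArrayDeque<>(keys);
    }
}
```

Calling poll() on the returned deque hands out each key exactly once and returns null when it runs dry. Note that this is sampling without repetition, and the copy goes stale if the map changes afterwards.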

You could avoid copying the whole keyset into a temporary data structure, by first getting the size, choosing the random index and then iterating over the keyset the appropriate number of times.
This code would need to be synchronized to avoid concurrent modifications.
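A minimal sketch of that approach, without any synchronization; the class and method names are placeholders of mine:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Random;

class RandomKeyByIteration {
    private static final Random RAND = new Random();

    // Returns a uniformly chosen key, or null for an empty map.
    // No copy is made, but each call walks up to size() entries.
    static <K> K randomKey(Map<K, ?> map) {
        if (map.isEmpty()) {
            return null;
        }
        int target = RAND.nextInt(map.size());
        Iterator<K> it = map.keySet().iterator();
        for (int i = 0; i < target; i++) {
            it.next();                        // skip the entries before the target
        }
        return it.next();
    }
}
```

This trades the array copy for a linear walk, so it still costs O(n) per call; it only saves the temporary allocation.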

If you really just want any element of a set, this will work fine.
String s = set.iterator().next();
If you are unsure whether there is an element in the set, use:
String s;
Iterator<String> it = set.iterator();
if (it.hasNext()) {
    s = it.next();
} else {
    // set was empty
}

Related

Is there a better DS than a HashMap for storing a list of items if I frequently use the contains method on it?

I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing. I have found myself using a hashmap to store the items instead of an arraylist.
void add(Map<Integer, Integer> mp, int item) {
    if (!mp.containsKey(item)) {
        mp.put(item, 1);
    }
}
As you can see above, I put anything as the value, since I would not be using the values.
I have tested this process to be a lot faster than using an arraylist. (Also, containsKey() for hashmap is O(1) while contains() for arraylist is O(n))
Although it works well for me, it feels awkward for the simple reason that it is not the right data structure. Is this a good practice? Is there a better DS that I can use? Is there not any list that utilizes hashing to store values?
I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing.
You are describing a set. From the Javadoc, a java.util.Set is:
A collection that contains no duplicate elements.
Further, the operation you are describing is add():
Adds the specified element to this set if it is not already present.
In code, you would create a new Set (this example uses a HashSet):
Set<Integer> numbers = new HashSet<>();
Then any time you encounter a number you wish to keep track of, just call add(). If x is already present, the set will remain unchanged and will not throw any errors – you don't need to be careful about adding things, just add anything you see, and the set sort of "filters" out the duplicates for you.
numbers.add(x);
It's beyond your original question, but there are various things you can do with the data once you've populated a set - check if other numbers are present/absent, iterate the numbers in the set, etc. The Javadoc shows which functionality is available to use.
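To make the "filtering" concrete, here is a small sketch. The class SeenNumbers and its collect method are invented for the example:

```java
import java.util.HashSet;
import java.util.Set;

class SeenNumbers {
    // Adds every item to a set; duplicates are silently ignored.
    static Set<Integer> collect(int... items) {
        Set<Integer> numbers = new HashSet<>();
        for (int x : items) {
            numbers.add(x);   // add() returns false when x is already present; no check needed
        }
        return numbers;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = collect(3, 1, 3, 2, 1);
        System.out.println(numbers.size());       // 3 distinct values
        System.out.println(numbers.contains(2));  // true
        System.out.println(numbers.contains(9));  // false
    }
}
```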
An alternative solution from the standard library is a java.util.BitSet. This is, in my opinion, only an option if the values of item are not too big and are relatively close together. If your values are not near each other (and do not start near zero), then it might be worthwhile looking for third-party solutions that offer sparse bit sets or other sparse data structures.
You can use a bit set like:
BitSet bits = new BitSet();
void add(int item) {
bits.set(item);
}
And as suggested in the comments by Eritrean, you can also use a Set (e.g. HashSet). Internally, a HashSet uses a HashMap, so it will perform similarly to your current solution, but it does away with having to put sentinel values in yourself (you just add or remove the item itself).
As an added benefit, if you use Collection<Integer> as the type of parameters/fields in your code, you can easily switch between an ArrayList and a HashSet and test both without having to change code all over the place.

What are the benefits of using a Map over an ArrayList of a custom class?

I am learning Java now and I am learning about different kinds of collections, so far I learned about LinkedList, ArrayList and Array[].
Now I've been introduced to the hash-based collections, HashSet and HashMap, and I didn't quite understand why they are useful: the list of operations they support is quite limited, they are ordered seemingly at random, and I need to override the equals and hashCode methods to make them work correctly with my own class.
Now, what I don't understand is what benefit justifies the hassle of using these types instead of an ArrayList of a custom class.
I mean, what Map is doing is connecting 2 objects as 1, but wouldn't it just be better to create a class that contains these 2 objects as fields, with getters and setters to use and modify them?
If the benefit is that these Hash objects can only contain 1 object of the same name, wouldn't it just be easier to make the ArrayList check that the type is not already there before adding it?
So far I have learned to choose between LinkedList, ArrayList and Array[] by this rule: "if it's really simple, use Array[]; if it's a bit more complex, use ArrayList (for example to hold a collection of a certain class); and if the list is dynamic, with a lot of objects inside that need to change order when one is removed or added in the middle, or you need to go back and forth within the list, then use LinkedList."
But I couldn't understand when to prefer HashMap or HashSet, and I would be really glad if you could explain it to me.
Let me help you out here...
Hashed collections are the most efficient to add, search and remove data, since they hash the key (in HashMap) or the element (in HashSet) to find the place where they belong in a single step.
The concept of hashing is really simple: it is the process of representing an object as a number that can work as its ID.
For example, if you have a string in Java like String name = "Jeremy"; and you print its hash code with System.out.println(name.hashCode());, you will see a big number (-2079637766) that was computed from the contents of that object (for a string, its characters). That number can then be used as an ID for the object.
Hashed collections like the ones mentioned above use this number as an array index to find elements in no time. But obviously it is too big to use as an index for a possibly small array, so they need to reduce it to fit within the array's size. (HashMap and HashSet use arrays to store their elements.)
The operation that they use to reduce that number is called hashing, and is something like this: Math.abs(-2079637766 % arrayLength);.
It's not like that exactly, it's a bit more complex, but this is to simplify.
Let's say that arrayLength = 16;
The % operator reduces that big number to one whose magnitude is smaller than 16, so that it fits in the array.
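Here is that simplified reduction as runnable code. BucketIndexDemo and bucketIndex are names made up for this sketch, and the formula is the simplification from the text, not what HashMap really does (the real one mixes the hash bits and masks them instead of using %):

```java
class BucketIndexDemo {
    // Simplified: squeeze a hash code into the range [0, arrayLength).
    static int bucketIndex(Object key, int arrayLength) {
        return Math.abs(key.hashCode() % arrayLength);
    }

    public static void main(String[] args) {
        String name = "Jeremy";
        System.out.println(name.hashCode());        // prints -2079637766
        System.out.println(bucketIndex(name, 16));  // prints 6
    }
}
```

An equal string always lands on the same index, which is what lets the collection go straight to the right slot.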
That is why a hashed collection will not allow duplicates: if you try to add the same object or an equivalent one (like 2 strings with the same characters), it will produce the same hash code and will overwrite whatever value is at the resulting index.
In your question, you mentioned that if we are worried about duplicate items in an ArrayList, we can just check whether the item is there before inserting it, so we don't need a HashSet. But that is not a good idea: if you call list.contains(elem); on an ArrayList, it has to compare the elements one by one to see whether it's there. If you have 1 million elements in the ArrayList and you check for an element that is not there, the ArrayList iterates over all 1 million elements, which is not good. A HashSet would only hash the object, go directly to where it is supposed to be in the array, and check there, doing it in just 1 step instead of 1 million. So you see how efficient a HashSet is compared to an ArrayList.
The same happens with a HashMap of size 1 million: it takes only a single step to check whether a key is there, not 1 million.
The same applies when you need to add, find or remove an element: hashed collections do all of that in a single step (constant time, independent of the size of the map), whereas the cost varies for other structures.
That's why it is really efficient and widely used.
Main Difference between an ArrayList and a LinkedList:
If you want to find the element at place 500 in an ArrayList of size 1000, you do: list.get(500); and it will do that in a single step, because an ArrayList is implemented with an array, so with that 500, it goes directly where the element is in the array.
But a LinkedList is not implemented with an array; it is made of objects pointing to each other. So it has to walk the nodes linearly, counting from 0 one by one until it gets to 500, which is not really efficient compared to the single step of the ArrayList.
But when you need to add and remove elements in an ArrayList, sometimes the array will need to be recreated so more elements fit in it, increasing the overhead.
That doesn't happen with the LinkedList, since no array has to be recreated; only the objects (nodes) have to be re-referenced, which is done in a single step.
So an ArrayList is good when you won't be deleting or adding a lot of elements, but you are going to read from it a lot.
If you are going to add and remove a lot of elements, then a linked list is better, since it has less work to do for those operations.
Why do you need to implement the equals() and hashCode() methods for user-defined classes when you want to use those objects in HashMaps, and implement the Comparable interface when you want to use them with TreeMaps?
Based on what I mentioned earlier for HashMaps, it is possible that 2 different objects produce the same hash. If that happens, Java will not overwrite or remove the previous one; it will keep them both at the same index. That is why you need to implement hashCode(): to make sure that your objects do not have a really simple hash code that can be easily duplicated.
And the reason why it is recommended to override the equals() method is that if there is a collision (2 or more objects sharing the same hash in a HashMap), how do you tell them apart? By asking the equals() method of those objects whether they are the same. So if you ask the map whether it contains a certain key and it finds 3 elements at that index, it calls equals() on each one to compare it with the key that was passed, and returns the one that matches. If you don't override equals() properly and specify what you want to check for equality (like the properties name, age, etc.), then some unwanted behavior inside the HashMap (such as duplicate or overwritten entries) will happen and you will not like it.
If you create your own class, say Person, with properties like name, age, lastName and email, you can use those properties in the equals() method: if 2 different objects are passed but have the same values in the properties you selected for equality, you return true to indicate that they are the same, and false otherwise. It's like the String class: if you do s1.equals(s2); with s1 = new String("John"); and s2 = new String("John");, then even though they are different objects in Java heap memory, the implementation of String.equals uses the characters to determine whether the objects are equal, and it returns true for this example.
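A minimal sketch of such a class, assuming only name and age take part in equality; the choice of fields is mine, so pick whatever defines "the same person" for you:

```java
import java.util.Objects;

class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Person)) {
            return false;
        }
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Must be built from the same fields as equals(),
        // so that equal objects always get equal hash codes.
        return Objects.hash(name, age);
    }
}
```

With this in place, map.get(new Person("John", 30)) finds an entry that was stored under a different but equal Person instance.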
To use a TreeMap with user-defined classes, you need to implement the Comparable interface. Since the TreeMap compares and sorts the objects based on some properties, you need to specify which properties your objects will be sorted by. By age? By name? By id? Or by any other property you like. Then, when you implement the Comparable interface and override the compareTo(UserDefinedClass o) method, you do your logic and return a positive number if the current object is greater than the o object passed, 0 if they are the same, and a negative number if the current object is smaller. That way, the TreeMap will know how to sort them, based on the number returned.
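For illustration, a sketch with a made-up Student class that sorts by age and breaks ties by name; both the class and the ordering are just assumptions for the example:

```java
import java.util.TreeMap;

class Student implements Comparable<Student> {
    final String name;
    final int age;

    Student(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int compareTo(Student o) {
        // Negative if this sorts first, positive if o sorts first, 0 if equal.
        int byAge = Integer.compare(age, o.age);
        return byAge != 0 ? byAge : name.compareTo(o.name);
    }

    public static void main(String[] args) {
        TreeMap<Student, String> seats = new TreeMap<>();
        seats.put(new Student("Ann", 22), "A1");
        seats.put(new Student("Bob", 19), "B4");
        System.out.println(seats.firstKey().name);  // the youngest student: Bob
    }
}
```

Keep in mind that a TreeMap treats two keys as the same key whenever compareTo returns 0, so the comparison should be consistent with equals().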
First, HashSet. With a HashSet, you can easily find out whether it contains a given element. Say you have a set of people in your class and you want to ask whether a guy is in your class. You could make an ArrayList of strings, but then to ask whether a guy is in your class you have to iterate through the whole list until you find him, which might be too slow for longer lists. If you use a HashSet instead, the operation is much faster: you calculate the hash of the searched string and go directly to the matching bucket, so you don't need to pass over so many elements to answer your question. You could also build a workaround to make the ArrayList faster for this purpose, but HashSet already does it for you.
And now HashMap. Imagine that you also want to store a score for each person. Now you can use a HashMap: you enter the name and you get his score in a short time, without needing to iterate through the whole data structure.
Does it make sense?
Concerning your question:
"But I couldn't understand when to prefer HashMap or HashSet, and I
would be really glad if you could explain it to me"
HashMap implements the Map interface and is used to map a key (K) to a value (V) in constant time, where order doesn't matter, so you can put and retrieve data efficiently if you know the key.
HashSet implements the Set interface, but internally uses a HashMap. Its role is to be used as a set, meaning you're not supposed to retrieve an element; you (mostly) just check whether it is in the set or not.
In a HashMap, you can have identical values, while you can't in a Set (because that's a property of a set).
Concerning this question :
If the benefit is that these Hash objects can only contain 1 object of the same name, wouldn't it just be easier to make the ArrayList check that the type is not already there before adding it?
When dealing with collections, you may base your choice of a particular one on the data representation, but also on the way you want to access and store the data: how do you access it? Do you need to sort it? Because each implementation may have a different complexity (https://en.wikipedia.org/wiki/Time_complexity), this becomes important.
Using the doc,
For ArrayList:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).
For HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So it's about the time complexity.
You may choose even more untypical collection for certain problems :).
This has little to do with Java specifically, and the choice depends mostly on performance requirements, but there's a fundamental difference that must be highlighted. Conceptually, Lists are types of collections that keep the order of insertion and may have duplicates, Sets are more like bags of items that have no specific order and no duplicates. Of course, different implementations may find a way around it (like a TreeSet).
First, let's check the difference between ArrayList and LinkedList. A linked list is a set of nodes, where each node contains a value and a link to the next and previous nodes. This makes inserting an element to a linked list a matter of appending a node to the end of the list, which is a quick operation since the memory does not have to be contiguous, as long as a node keeps a reference to the next node. On the other hand, accessing a specific element requires traversing the list until you find it.
An array list, as the name implies, wraps an array. Accessing elements in an array by using its index is direct access, but inserting an element implies resizing the array to include the new element, so the memory it occupies is contiguous, making writes a bit heavier in this case.
A HashMap works like a dictionary, where for each key there's a value. The behavior of the insertion will mostly depend on how the hashCode and equals functions of the object used as a key are implemented. If the hashCode of two keys is the same, there's a hash collision, so equals will be used to understand if it's the same key or not. If equals is the same, then it's the same key, so the value is replaced. If not, the new value is added to the collection. Accessing and Writing values depends mostly on calculating the hash of the key followed by direct access to the value, making both operations really quick, O(1).
A set is pretty much like a hash map, without the "values" part, thus, it follows the same rules regarding the implementation of hashCode and equals operations for the added value.
It might be handy to study a bit about the Big-O notation and complexity of algorithms. If you are starting with Java, I'd strongly recommend the book Effective Java, by Joshua Bloch.
Hope it helps you dig further.

Java: What collection type should I use for this case?

What I need:
Fastest put/remove, this is used a lot.
Iteration, also used frequently.
Holds an object, e.g. Player. remove should be O(1), so maybe a HashMap?
No duplicate keys.
Direct get() is never used, mainly iterating to retrieve data.
I don't worry about memory, I just want the fastest speed possible even if it's at the cost of memory.
For iteration, nothing is faster than a plain old array. Entries are saved sequentially in memory, so the JVM can get to the next entry simply by adding the length of one entry to its address.
Arrays are typically a bit of a hassle to deal with compared to maps or lists (e.g: no dictionary-style lookups, fixed length). However, in your case I think it makes sense to go with a one or two dimensional array since the length of the array will not change and dictionary-style lookups are not needed.
So if I understand you correctly you want to have a two-dimensional grid that holds information of which, if any, player is in specific tiles? To me it doesn't sound like you should be removing, or adding things to the grid. I would simply use a two-dimensional array that holds type Player or something similar. Then if no player is in a tile you can set that position to null, or some static value like Player.none() or Tile.empty() or however you'd want to implement it. Either way, a simple two-dimensional array should work fine. :)
The best Collection for your case is a LinkedList. Linked lists allow fast iteration, and fast removal and addition at any place in the list. For example, if you use an ArrayList and you want to insert something at index i, then you have to move all the elements from i to the end one entry to the right. The same happens when you remove. In a linked list, once you are positioned at the node (for example through an iterator), you can add and remove in constant time.
Since you need two dimensions, you can use linked lists inside of linked lists:
// LinkedList has no capacity constructor, so no size argument is passed.
List<List<Tile>> players = new LinkedList<List<Tile>>();
for (int i = 0; i < 20; ++i) {
    List<Tile> tiles = new LinkedList<Tile>();
    for (int j = 0; j < 20; ++j) {
        tiles.add(new Tile());
    }
    players.add(tiles);
}
Use a map of sets; it gives O(1) vertex lookup and amortized O(1) edge insertion and deletion.
HashMap<VertexT, HashSet<EdgeT>> incidenceMap;
There is no simple one-size-fits-all solution to this.
For example, if you only want to append, iterate and use Iterator.remove(), there are two obvious options: ArrayList and LinkedList
ArrayList uses less memory, but Iterator.remove() is O(N)
LinkedList uses more memory, but Iterator.remove() is O(1)
If you also want to do fast lookup; (e.g. Collection.contains tests), or removal using Collection.remove, then HashSet is going to be better ... if the collections are likely to be large. A HashSet won't allow you to put an object into the collection multiple times, but that could be an advantage. It also uses more memory than either ArrayList or LinkedList.
If you were more specific on the properties required, and what you are optimizing for (speed, memory use, both?) then we could give you better advice.
The requirement of not allowing duplicates is effectively adding a requirement for efficient get().
Your options are either hash-based, or O(Log(N)). Most likely, hashcode will be faster, unless for whatever reason, calling hashCode() + equals() once is much slower than calling compareTo() Log(N) times. This could be, for instance, if you're dealing with very long strings. Log(N) is not very much, by the way: Log(1,000,000,000) ~= 30.
If you want to use a hash-based data structure, then HashSet is your friend. Make sure that Player has a good, fast implementation of hashCode(). If you know the number of entries ahead of time, specify the HashSet's initial capacity: ceil(N/load_factor)+1, where the default load factor is 0.75.
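The sizing formula above can be wrapped in a small helper. PresizedSets and newHashSetFor are names invented for this sketch:

```java
import java.util.HashSet;
import java.util.Set;

class PresizedSets {
    // Creates a HashSet sized so that expectedEntries elements fit without a resize.
    static <T> Set<T> newHashSetFor(int expectedEntries) {
        float loadFactor = 0.75f;   // the JDK default
        int capacity = (int) Math.ceil(expectedEntries / loadFactor) + 1;
        return new HashSet<>(capacity, loadFactor);
    }
}
```

For example, Set<Player> players = PresizedSets.newHashSetFor(420000); avoids the repeated rehashing that would otherwise happen while the set grows.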
If you want to use a sort-based structure, implement an efficient Player.compareTo(). Your choices are TreeSet or a skip list. They're pretty comparable in terms of characteristics. TreeSet is nice in that it's available out of the box in the JDK, whereas the only skip list in the JDK is the concurrent one (ConcurrentSkipListSet). Both need to be rebalanced as you add data, which may take time, and I don't know how to predict which will be better.

Randomly getting elements in a HashMap or HashSet without looping

I have roughly 420,000 elements that I need to store easily in a Set or List of some kind. The restrictions though is that I need to be able to pick a random element and that it needs to be fast.
Initially I used an ArrayList and a LinkedList, however with that many elements it was very slow. When I profiled it, I saw that the equals() method in the object I was storing was called roughly 21 million times in a very short period of time.
Next I tried a HashSet. What I gain in performance I lose in functionality: I can't pick a random element. HashSet is backed by a HashMap which is backed by an array of HashMap.Entry objects. However, when I attempted to expose them I was hindered by the crazy private and package-private visibility of the entire Java Collections Framework (even copying and pasting the class didn't work; the JCF is very "Use what we have or roll your own").
What is the best way to randomly select an element stored in a HashSet or HashMap? Due to the size of the collection I would prefer not to use looping.
IMPORTANT EDIT: I forgot a really important detail: exactly how I use the Collection. I populate the entire Collection at the beginning of the table. During the program I pick and remove a random element, then pick and remove a few more known elements, then repeat. The constant lookup and changing is what causes the slowness.
There's no reason why an ArrayList or a LinkedList would need to call equals()... although you don't want a LinkedList here as you want quick random access by index.
An ArrayList should be ideal - create it with an appropriate capacity, add all the items to it, and then you can just repeatedly pick a random number in the appropriate range, and call get(index) to get the relevant value.
HashMap and HashSet simply aren't suitable for this.
If ALL you need to do is get a large collection of values and pick a random one, then ArrayList is (literally) perfect for your needs. You won't get significantly faster (unless you went directly to primitive array, where you lose benefits of abstraction.)
If this is too slow for you, it's because you're using other operations as well. If you update your question with ALL the operations the collection must service, you'll get a better answer.
If you don't call contains() (which will call equals() many times), you can use ArrayList.get(randomNumber) and that will be O(1)
You can't do it with a HashMap: it stores the objects internally in an array, where the index is derived from the object's hash code. Even if you had access to that table, you'd need to guess which buckets contain objects. So a HashMap is not an option for random access.
Assuming that equals() calls are because you sort out duplicates with contains(), you may want to keep both a HashSet (for quick if-already-present lookup) and an ArrayList (for quick random access). Or, if operations don't interleave, build a HashSet first, then extract its data with toArray() or transform it into ArrayList with constructor of the latter.
If your problems are due to remove() call on ArrayList, don't use it and instead:
if you remove not the last element, just replace (with set()) the removed element with the last;
shrink the list size by 1.
This will of course screw up element order, but apparently you don't need it, judging by description. Or did you omit another important detail?
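The replace-with-last trick can be sketched like this. SwapRemove and its method names are mine, and it assumes an ArrayList (or another list with O(1) get/set and cheap removal at the tail):

```java
import java.util.List;
import java.util.Random;

class SwapRemove {
    // Removes and returns the element at index without shifting the tail:
    // the last element is moved into the freed slot, so order is not preserved.
    static <T> T removeAt(List<T> list, int index) {
        int last = list.size() - 1;
        T removed = list.get(index);
        list.set(index, list.get(last));  // no-op when index == last
        list.remove(last);                // dropping the tail shifts nothing
        return removed;
    }

    // Picks a random element, removes it in O(1), and returns it.
    static <T> T removeRandom(List<T> list, Random rand) {
        return removeAt(list, rand.nextInt(list.size()));
    }
}
```

Removing a known element (rather than a known index) still needs a way to find its position, which is where the extra HashSet or an index map comes in.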

Is there a better way to calculate the values in a Map?

I just finished the main part of the current data structures project, and am working on collecting the statistics. One requirement is that a count of all the references within the TreeMap be recorded.
This Map contains a 31,000+ nodes where a String is mapped to a TreeSet of indeterminate size.
I need to traverse the map and keep a running count of the number of items in the set.
Originally my idea was this:
Set<String> keySet = lyricWords.keySet();
Iterator<String> iter = keySet.iterator();
String current = iter.next();
while (iter.hasNext()) {
    runCount += lyricWords.get(current).size();
}
The runtime for this is far too long to be acceptable. Is there a more efficient way to do this on the final structure? I could keep a count as the map is built, but the professor wants the numbers to be based on the final structure itself.
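I'm not sure, but you probably have an infinite loop: iter.next() is called only once, before the loop, so hasNext() never becomes false. Drop that call and use this as the loop body instead:
runCount += lyricWords.get(iter.next()).size();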
for (Map.Entry<String, TreeSet> e: lyricWords.entrySet()) {
runCount+= e.getValue().size();
}
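If the keys aren't needed at all, iterating over values() is slightly simpler still. A sketch, where ReferenceCounter is a made-up name and the element type of the sets is left open because the question doesn't state it:

```java
import java.util.Collection;
import java.util.Map;

class ReferenceCounter {
    // Sums the sizes of all the sets in one pass over the map's values.
    static int totalReferences(Map<String, ? extends Collection<?>> lyricWords) {
        int runCount = 0;
        for (Collection<?> refs : lyricWords.values()) {
            runCount += refs.size();
        }
        return runCount;
    }
}
```

TreeSet.size() is a constant-time call, so the whole count is a single pass over the 31,000 or so entries and should finish almost instantly.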
I don't see a problem with keeping a count as the map is built.
The count will be correct at the end, and you won't have to incur the cost of iterating through the entire thing again.
I think that the tree can and should keep track of its size.
This isn't of much use to you since this is an assignment you're working on, but this is an example where a data structure specifically designed for mapping keys to multiple values shows how much better it is than a Map<T, Collection<V>>.
Guava's Multimap collection type keeps track of the total number of entries it contains, so if you were using a TreeMultimap<String, Foo> rather than a TreeMap<String, TreeSet<Foo>> you could just call multimap.size() to get the number you're looking for.
By the way, the Multimap implementations store a running total of the number of entries which is updated when entries are added to or removed from it. You might be able to do this by doing some fancy stuff with subclassing the TreeMap and wrapping the TreeSets that are added to it, but it would be quite challenging to make it all work properly I think.
