How to select a random key from a HashMap in Java? - java

I'm working with a large ArrayList<HashMap<A,B>>, and I repeatedly need to select a random key from a random HashMap (and do some stuff with it). Selecting the random HashMap is trivial, but how should I select a random key from within that HashMap?
Speed is important (I need to do this 10000 times and the HashMaps are large), so just selecting a random number k in [0,9999] and then calling .next() on the iterator k times is really not an option. Similarly, converting the HashMap to an array or ArrayList on every random pick is really not an option. Please read this before replying.
Technically I feel that this should be possible, since the HashMap stores its keys in an Entry[] internally, and selecting at random from an array is easy, but I can't figure out how to access this Entry[]. So any ideas to access the internal Entry[] are more than welcome. Other solutions (as long as they don't consume linear time in the hashmap size) are also welcome of course.
Note: heuristics are fine, so if there's a method that excludes 1% of the elements (e.g. because of multi-filled buckets) that's no problem at all.

Off the top of my head:
List<A> keysAsArray = new ArrayList<A>(map.keySet());
Random r = new Random();
then just:
map.get(keysAsArray.get(r.nextInt(keysAsArray.size())));

I managed to find a solution without performance loss. I will post it here since it may help other people -- and potentially answer several open questions on this topic (I'll search for these later).
What you need is a second, custom Set-like data structure to store the keys -- not a list, as some suggested here. List-like data structures are too expensive to remove items from. The operations needed are adding/removing elements in constant time (to keep it up to date with the HashMap) and a procedure to select a random element. The following class MySet does exactly this:
class MySet<A> {
    ArrayList<A> contents = new ArrayList<A>();
    HashMap<A,Integer> indices = new HashMap<A,Integer>();
    Random R = new Random();

    //selects a random element in constant time
    A randomKey() {
        return contents.get(R.nextInt(contents.size()));
    }

    //adds a new element in constant time
    void add(A a) {
        indices.put(a, contents.size());
        contents.add(a);
    }

    //removes an element in constant time
    void remove(A a) {
        int index = indices.get(a);
        contents.set(index, contents.get(contents.size()-1));
        indices.put(contents.get(index), index);
        contents.remove(contents.size()-1);
        indices.remove(a);
    }
}
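A brief usage sketch (my own illustration; it assumes every key you put into or remove from the HashMap is mirrored into this structure):
MySet<String> keys = new MySet<String>();
keys.add("alpha");               // call alongside map.put("alpha", ...)
keys.add("beta");
String pick = keys.randomKey();  // uniform over the current keys
keys.remove("alpha");            // call alongside map.remove("alpha")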

You need access to the underlying entry table.
// defined statically; note this ties the code to the JDK's internal HashMap layout
static final Field table;
static {
    try {
        table = HashMap.class.getDeclaredField("table");
        table.setAccessible(true);
    } catch (NoSuchFieldException e) {
        throw new AssertionError(e);
    }
}
static final Random rand = new Random();

public Entry randomEntry(HashMap map) throws IllegalAccessException {
    Entry[] entries = (Entry[]) table.get(map);
    int start = rand.nextInt(entries.length);
    for (int i = 0; i < entries.length; i++) {
        int idx = (start + i) % entries.length;
        Entry entry = entries[idx];
        if (entry != null) return entry;
    }
    return null;
}
This still has to traverse the entries to find one that is present, so the worst case is O(n), but the typical behaviour is O(1).

Sounds like you should consider either an ancillary List of keys or a real object, not a Map, to store in your list.

As @Alberto Di Gioacchino pointed out, there is a bug in the accepted solution's removal operation. This is how I fixed it:
class MySet<A> {
    ArrayList<A> contents = new ArrayList<A>();
    HashMap<A,Integer> indices = new HashMap<A,Integer>();
    Random R = new Random();

    //selects a random element in constant time
    A randomKey() {
        return contents.get(R.nextInt(contents.size()));
    }

    //adds a new element in constant time
    void add(A item) {
        indices.put(item, contents.size());
        contents.add(item);
    }

    //removes an element in constant time
    void remove(A item) {
        int index = indices.get(item);
        contents.set(index, contents.get(contents.size()-1));
        indices.put(contents.get(index), index);
        contents.remove(contents.size()-1);
        indices.remove(item);
    }
}

I'm assuming you are using a HashMap because you need to look something up by key at a later date?
If that's not the case, then just change your HashMap to an Array/ArrayList.
If it is the case, why not store your objects in a Map AND an ArrayList, so you can look them up either randomly or by key?
Alternatively, could you use a TreeMap instead of a HashMap? I don't know what type your key is, but you could use TreeMap.floorKey() in conjunction with some key randomizer.
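For illustration only, a minimal sketch of that floorKey() idea, assuming the keys are doubles drawn roughly uniformly from [0,1); the pick is only as uniform as the key distribution, and the variable names here are made up:
TreeMap<Double, String> map = new TreeMap<>();
// ... populated elsewhere ...
Random rnd = new Random();
double probe = rnd.nextDouble();
Double key = map.floorKey(probe);   // greatest key <= probe
if (key == null) {
    key = map.ceilingKey(probe);    // wrap around if probe is below the smallest key
}
String value = (key != null) ? map.get(key) : null;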

After spending some time on this, I came to the conclusion that you need to create a model which is backed by a List<Map<A, B>> and a List<A> to maintain your keys. Keep the List<Map<A, B>> and List<A> private and just provide operations/methods to the caller. In this way, you will have full control over the implementation, and the actual objects will be safer from external changes.
Btw, your questions lead me to,
Why does the java.util.Set<V> interface not provide a get(Object o) method?, and
Bimap: I was trying to be clever but, of course, its values() method also returns Set.
This example, IndexedSet, may give you an idea about how-to.
[edited]
This class, SetUniqueList, might help you if you decide to create your own model. It explicitly states that it wraps the list, not copies it. So, I think, we can do something like,
List<A> list = new ArrayList(map.keySet());
SetUniqueList unikList = new SetUniqueList(list, map.keySet());
// Now unikList should reflect all the changes to the map keys
...
// Then you can do
unikList.get(i);
Note: I didn't try this myself. Will do that later (rushing home).

Since Java 8, there is an O(log(N)) approach with O(log(N)) additional memory: create a Spliterator via map.entrySet().spliterator(), make log(map.size()) trySplit() calls and choose either the first or the second half randomly. When there are, say, fewer than 10 elements left in a Spliterator, dump them into a list and make a random pick.
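A rough sketch of that idea follows; the helper name, the cut-off of 10, and the use of java.util.Random are my own choices, and since trySplit() makes no exact-halving guarantee, the pick is only approximately uniform:
static <K, V> Map.Entry<K, V> randomEntry(Map<K, V> map, Random rnd) {
    Spliterator<Map.Entry<K, V>> current = map.entrySet().spliterator();
    while (current.estimateSize() > 10) {
        Spliterator<Map.Entry<K, V>> prefix = current.trySplit();
        if (prefix == null) break;               // cannot split any further
        if (rnd.nextBoolean()) current = prefix; // keep a randomly chosen half
    }
    List<Map.Entry<K, V>> rest = new ArrayList<>();
    current.forEachRemaining(rest::add);
    return rest.get(rnd.nextInt(rest.size()));
}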

If you absolutely need to access the Entry array in HashMap, you can use reflection. But then your program will depend on that concrete implementation of HashMap.
As proposed, you can keep a separate list of keys for each map. You would not keep deep copies of the keys, so the actual memory overhead wouldn't be that big.
A third approach is to implement your own Map, one that keeps its keys in a list instead of a set.

How about wrapping the HashMap in another implementation of Map? The other map maintains a List of keys, and on put() it does:
if (inner.put(key, value) == null) listOfKeys.add(key);
(I assume that null values aren't permitted; if they are, use containsKey, but that's slower.)
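A minimal sketch of that wrapper idea, with made-up class and field names, delegating only the calls needed here (a real implementation would forward the remaining Map methods, and remove() would also have to update the key list):
class RandomKeyMap<K, V> {
    private final HashMap<K, V> inner = new HashMap<>();
    private final ArrayList<K> listOfKeys = new ArrayList<>();
    private final Random rnd = new Random();

    public V put(K key, V value) {
        V old = inner.put(key, value);
        if (old == null) listOfKeys.add(key);  // new key (assumes null values are not used)
        return old;
    }

    public V get(K key) {
        return inner.get(key);
    }

    public K randomKey() {
        return listOfKeys.get(rnd.nextInt(listOfKeys.size()));
    }
}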

Related

How to shuffle a Hashtable in Java

I have a constructor in a class where I initialise a Hashtable consisting of Integer keys and lists of Strings. I fill the keys with numbers from 0 to whatever is needed, and their corresponding values are Vectors that I create in the constructor beforehand. What I need is to rearrange them randomly (just shuffle them in any way possible), so that when I use getQuest() I get a random key-value pair that has happened to end up in the last spot of the hashtable, and delete it at the same time.
public class Quest {
    private Hashtable<Integer, List<String>> quests;

    public Quest()
    {
        this.quests = new Hashtable<>();
        List<String> mushrooms = new Vector<>();
        mushrooms.add("Choose a mushroom, but be careful not to be poisoned");
        mushrooms.add("mushroom1");
        mushrooms.add("mushroom2");
        mushrooms.add("mushroom3");
        List<String> weapons = new Vector<>();
        weapons.add("Choose a weapon that will help you survive");
        weapons.add("knife");
        weapons.add("rifle");
        weapons.add("sword");
        this.quests.put(0, mushrooms);
        this.quests.put(1, weapons);
    }

    public List<String> getQuest()
    {
        return quests.remove(quests.size());
    }
}
Also, in case this is not a good idea, would it be better to use rand or something like that and get a random pair from somewhere in the hashtable to return (while also deleting it, of course)?
You don’t “shuffle” a Hashtable or Map - the elements are not ordered.
That said, an iterator will generally return the same order each time, so in order to change that, put the keys into an ArrayList and use Collections.shuffle on that. Iterate over that shuffled list and access the elements in the table/map via their key.
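A short sketch of that shuffle-the-keys approach against the quests table from the question (the loop body is just an assumption about what you do with each quest):
List<Integer> keys = new ArrayList<>(quests.keySet());
Collections.shuffle(keys);
for (Integer key : keys) {
    List<String> quest = quests.get(key);
    // ... use the quest here, and quests.remove(key) if it should be consumed ...
}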
Use a list instead of a hashtable, and remove an item at a random index.
Also, if doing it your way, this line:
return quests.remove(quests.size());
should be
return quests.remove(quests.size()-1);
You might also want to check if quests.size() > 0...
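A minimal sketch of the list-based variant suggested above (field names assumed; here quests is a List rather than the original Hashtable):
private final List<List<String>> quests = new ArrayList<>();
private final Random rnd = new Random();

public List<String> getQuest() {
    if (quests.isEmpty()) return null;                 // or throw, as appropriate
    return quests.remove(rnd.nextInt(quests.size()));  // remove and return a random quest
}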

Comparison Error when Storing values in a List, Boolean Map

I have a fully working version of MineSweeper implemented in Java. However, I am trying to add an additional feature that updates a Map to store the indexes of the locations of the mines within a 2D array. For example, if location [x][y] holds a mine, I am storing a linked list containing x and y, which maps to a boolean that is true to indicate that the space holds a mine. (This feature is seemingly trivial, but I am just doing this to practice with Collections in Java.)
My relevant private instance variables include:
public class World { ...
private LinkedList<Integer> index;
private Map<LinkedList<Integer>, Boolean> revealed;
"index" is the list to be stored in the map as the key for each boolean.
In my constructor I have:
public World() { ...
    tileArr = new Tile[worldWidth][worldHeight];
    revealed = new TreeMap<LinkedList<Integer>, Boolean>();
    index = new LinkedList<Integer>();
    ... }
Now, in the method in which I place the mines, I have the following:
private void placeBomb() {
    int x = ran.nextInt(worldWidth);  //Random stream
    int y = ran.nextInt(worldHeight); //Random stream
    if (!tileArr[x][y].isBomb()) {
        tileArr[x][y].setBomb(true);
        index.add(x);       //ADDED COMPONENT
        index.add(y);
        revealed.put(index, true);
        index.remove(x);
        index.remove(y);    //END OF ADDED COMPONENT
    } else placeBomb();
}
Without the marked added component my program runs fine, and I have a fully working game. However, this addition gives me the following error.
Exception in thread "main" java.lang.ClassCastException: java.util.LinkedList cannot be cast to java.lang.Comparable
If anyone could help point out where this error might be, it would be very helpful! This is solely for additional practice with collections and is not required to run the game.
There are actually about three issues here: one that you know about, one that you don't, and a third which is just that using a LinkedList as a map key is clunky.
The ClassCastException happens because TreeMap is a sorted map and requires that every key in it implement the Comparable interface, or else you have to provide a custom Comparator. LinkedList doesn't implement Comparable, so you get an exception. The solution here could be to use a different map, like HashMap, or you could write a custom Comparator.
A custom Comparator could be like this:
revealed = new TreeMap<List<Integer>, Boolean>(
        // sort by x value first (the explicit lambda parameter type is needed for inference)
        Comparator.comparing((List<Integer> list) -> list.get(0))
                  // then sort by y if both x values are the same
                  .thenComparing(list -> list.get(1))
);
(And I felt compelled to include this, which is a more robust example that isn't dependent on specific elements at specific indexes):
revealed = new TreeMap<>(new Comparator<List<Integer>>() {
    @Override
    public int compare(List<Integer> lhs, List<Integer> rhs) {
        int sizeComp = Integer.compare(lhs.size(), rhs.size());
        if (sizeComp != 0) {
            return sizeComp;
        }
        Iterator<Integer> lhsIter = lhs.iterator();
        Iterator<Integer> rhsIter = rhs.iterator();
        while (lhsIter.hasNext() && rhsIter.hasNext()) {
            int intComp = lhsIter.next().compareTo(rhsIter.next());
            if (intComp != 0) {
                return intComp;
            }
        }
        return 0;
    }
});
The issue that you don't know about is that you're only ever adding one LinkedList to the map:
index.add(x);
index.add(y);
// putting index in to the map
// without making a copy
revealed.put(index, true);
// modifying index immediately
// afterwards
index.remove(x);
index.remove(y);
This is unspecified behavior, because you put the key in, then modify it. The documentation for Map says the following about this:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
What will actually happen (for TreeMap) is that you are always erasing the previous mapping. (For example, the first time you call put, let's say x=0 and y=0. Then the next time around, you set the list so that x=1 and y=1. This also modifies the list inside the map, so that when put is called, it finds there was already a key with x=1 and y=1 and replaces the mapping.)
So you could fix this by saying something like either of the following:
// copying the List called index
revealed.put(new LinkedList<>(index), true);
// this makes more sense to me
revealed.put(Arrays.asList(x, y), true);
However, this leads me to the 3rd point.
There are better ways to do this, if you want practice with collections. One way would be to use a Map<Integer, Map<Integer, Boolean>>, like this:
Map<Integer, Map<Integer, Boolean>> revealed = new HashMap<>();
{
    revealed.computeIfAbsent(x, k -> new HashMap<>()).put(y, true);
    // the preceding line is the same as saying
    // Map<Integer, Boolean> yMap = revealed.get(x);
    // if (yMap == null) {
    //     yMap = new HashMap<>();
    //     revealed.put(x, yMap);
    // }
    // yMap.put(y, true);
}
That is basically like a 2D array, but with a HashMap. (It could make sense if you had a very, very large game board.)
And judging by your description, it sounds like you already know that you could just make a boolean isRevealed; variable in the Tile class.
The spec of TreeMap tells me this:
The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
A Java LinkedList cannot be compared just like that. You have to give the map a way to compare them, or just use another type of map that does not need sorting.

What is the fastest way to find orphans between two large (size ~900K ) Vectors of Strings in Java?

I'm currently working on a Java program that is required to handle large amounts of data. I have two Vectors...
Vector collectionA = new Vector();
Vector collectionB = new Vector();
...and both of them will contain around 900,000 elements during processing.
I need to find all items in collectionB that are not contained in collectionA. Right now, this is how I'm doing it:
for (int i = 0; i < collectionA.size(); i++) {
    if (!collectionB.contains(collectionA.elementAt(i))) {
        // do stuff if orphan is found
    }
}
But this causes the program to run for many hours, which is unacceptable.
Is there any way to tune this so that I can cut my running time significantly?
I think I've read once that using ArrayList instead of Vector is faster. Would using ArrayLists instead of Vectors help for this issue?
Use a HashSet for the lookups.
Explanation:
Currently your program has to test every item in collectionB to see if it is equal to the item in collectionA that it is currently handling (the contains() method will need to check each item).
You should do:
Set<String> set = new HashSet<String>(collectionB);
for (Iterator i = collectionA.iterator(); i.hasNext(); ) {
    if (!set.contains(i.next())) {
        // handle
    }
}
Using the HashSet will help, because the set will calculate a hash for each element and store the element in a bucket associated with a range of hash values. When checking whether an item is in the set, the hash value of the item will directly identify the bucket the item should be in. Now only the items in that bucket have to be checked.
Using a SortedSet like TreeSet would also be an improvement over Vector, since to find an item, only the position it would be in has to be checked, instead of all positions. Which Set implementation would perform best depends on the data.
If ordering of the elements doesn't matter, I would go for HashSets, and do it as follows:
Set<String> a = new HashSet<>();
Set<String> b = new HashSet<>();
// ...
b.removeAll(a);
So in essence, you're removing from set b all the elements that are in set a, leaving the asymmetric set difference. Note that the removeAll method does modify set b, so if that's not what you want, you would need to make a copy first.
To find out whether HashSet or TreeSet is more efficient for this type of operation, I ran the below code with both types, and used Guava's Stopwatch to measure execution time.
@Test
public void perf() {
    Set<String> setA = new HashSet<>();
    Set<String> setB = new HashSet<>();
    for (int i = 0; i < 900000; i++) {
        String uuidA = UUID.randomUUID().toString();
        String uuidB = UUID.randomUUID().toString();
        setA.add(uuidA);
        setB.add(uuidB);
    }
    Stopwatch stopwatch = Stopwatch.createStarted();
    setB.removeAll(setA);
    System.out.println(stopwatch.elapsed(TimeUnit.MILLISECONDS));
}
On my modest development machine, using Oracle JDK 7, the TreeSet variant is about 4 times slower (~450ms) than the HashSet variant (~105ms).

Reset all values in hashmap without iterating?

I am trying to reset all values in a HashMap to some default value if a condition fails.
Currently I am doing this by iterating over all the keys and individually resetting the values. Is there any possible way to set the same value for all the keys without iterating?
Something like:
hm.putAll("some val") //hm is hashmap object
You can't avoid iterating but if you're using java-8, you could use the replaceAll method which will do that for you.
Replaces each entry's value with the result of invoking the given function on that entry's key and value, until all entries have been processed or the function throws an exception.
m.replaceAll((k,v) -> yourDefaultValue);
Basically it iterates through each node of the table the map holds and assigns to each value the result of applying the function.
@Override
public void replaceAll(BiFunction<? super K, ? super V, ? extends V> function) {
    Node<K,V>[] tab;
    if (function == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                e.value = function.apply(e.key, e.value); //<-- here
            }
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}
Example:
public static void main(String[] args) {
    Map<String, Integer> m = new HashMap<>();
    m.put("1", 1);
    m.put("2", 2);
    System.out.println(m);
    m.replaceAll((k,v) -> null);
    System.out.println(m);
}
Output:
{1=1, 2=2}
{1=null, 2=null}
You can't avoid iterating in some fashion.
You could get the values via Map.values() and iterate over those. You'll bypass the lookup by key and it's probably the most efficient solution (although I suspect generally that would save you relatively little, and perhaps it's not the most obvious to a casual reader of your code)
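If the values are mutable objects, that idea could look like this minimal sketch (MyValue and reset() are illustrative names, not from the question):
for (MyValue v : hm.values()) {
    v.reset();   // reset each value in place; no key lookups involved
}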
IMHO you should create your own data structure that extends a Map implementation. Then you can write your resetAll() method that assigns the default value. Walking the map and setting each value in place is quick, and the map will have the same structure before and after the reset.
Only, be careful with concurrent threads. Maybe you should use ConcurrentHashMap.
public class MyMap<K,V> extends ConcurrentHashMap<K, V> {
    public void resetAll(V value) {
        Iterator<Entry<K, V>> it = this.entrySet().iterator();
        while (it.hasNext()) {
            Entry<K, V> pairs = it.next();
            pairs.setValue(value);
        }
    }
}
Regards
If you're willing to make a copy of it (a hashmap with default values), you can first clear your hashmap and then copy over the default values:
hm.clear();
hm.putAll(defaultMap);
It is not possible to apply an operation to all values in a collection in less than O(n) time, however if your objection is truly with iteration itself, there are some possible alternatives, notably functional programming.
This is most easily done with the Guava library (or natively in Java 8) and its functional programming utilities. Guava's Maps.transformValues() provides a view of the map, with the provided function applied. This means that the call returns in O(1) time, unlike your iteration, but that the computation is done on the fly whenever you .get() from the returned map. This is obviously a tradeoff - if you only need to .get() certain elements from the transformed map, you save time by avoiding computing unnecessary values. On the other hand, if you know you'll later hit every element at least once, using this behavior means you'll actually waste time. In essence, this approach is O(k) where k is the number of lookups you plan to do. If k is always less than n, then using the transformation approach is optimal.
Read the caveat at the top of that page carefully, however; iteration is a simple, easy, and generally efficient way to work with the members of a map. You should only try to optimize past that when absolutely necessary.
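As a rough sketch of that lazy-view idea (assuming Guava's com.google.common.collect.Maps is on the classpath and a map with Integer values; the names are illustrative):
Map<String, Integer> defaults = Maps.transformValues(hm, v -> 0);
// defaults.get(key) now yields 0 for every key present in hm,
// computed lazily on each lookup rather than eagerly for all entries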
Assuming that your problem is not with doing the iteration yourself, but with the fact that O(n) is going on at some point, I would suggest a couple of alternative approaches. Bear in mind I have no idea what you are using this for, so it might not make any sense to you.
Case A: If your set of keys is known and fixed beforehand, keep a copy (not a reference, an actual clone) somewhere with the values reset to the one you want. Then on that condition you mention, simply switch the references to use the default one.
Case B: If the keys change over time, use the idea from case A but add a new entry with the default value for every new key added (or remove accordingly). Your updates should hardly notice, but you can still switch back to the default in O(1).
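A rough sketch of Case A, with illustrative names (knownKeys and the default value 0 are assumptions):
Map<String, Integer> defaults = new HashMap<>();
for (String key : knownKeys) {
    defaults.put(key, 0);                // pristine map holding the default values, built once
}
Map<String, Integer> working = new HashMap<>(defaults);
// ... later, when the condition fails:
working = defaults;                      // O(1) reference switch
// (if `working` will be written to afterwards, take a fresh copy instead:
//  working = new HashMap<>(defaults);   which is O(n) but a cheap entry copy)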

Count the occurrences of items in ArrayList

I have a java.util.ArrayList<Item> and an Item object.
Now, I want to obtain the number of times the Item is stored in the arraylist.
I know that I can do an arrayList.contains() check, but it returns true irrespective of whether it contains one or more Items.
Q1. How can I find the number of times the Item is stored in the list?
Q2. Also, if the list contains more than one such Item, how can I determine the indices of the other Items, given that arrayList.indexOf(item) returns the index of only the first Item every time?
You can use Collections class:
public static int frequency(Collection<?> c, Object o)
Returns the number of elements in the specified collection equal to the specified object. More formally, returns the number of elements e in the collection such that (o == null ? e == null : o.equals(e)).
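For Q1, that is a one-liner:
int count = Collections.frequency(arrayList, item);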
If you need to count occurrences in a long list many times, I suggest you use a HashMap to store the counters and update them while you insert new items into the list. This avoids recalculating the counts on demand, but of course you won't have indices.
HashMap<Item, Integer> counters = new HashMap<Item, Integer>(5000);
ArrayList<Item> items = new ArrayList<Item>(5000);

void insert(Item newEl)
{
    if (counters.containsKey(newEl))
        counters.put(newEl, counters.get(newEl) + 1);
    else
        counters.put(newEl, 1);
    items.add(newEl);
}
A final hint: you can use another collections framework (like Apache Commons Collections) and use a Bag data structure, which is described as
Defines a collection that counts the number of times an object appears in the collection.
So, exactly what you need.
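A minimal sketch with the Commons Collections Bag (assuming commons-collections4 is on the classpath; keeping it in sync with the list is again your job):
Bag<Item> bag = new HashBag<>();
bag.add(item);                 // mirror every list insertion
int n = bag.getCount(item);    // number of occurrences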
This is easy to do by hand.
public int countNumberEqual(ArrayList<Item> itemList, Item itemToCheck) {
    int count = 0;
    for (Item i : itemList) {
        if (i.equals(itemToCheck)) {
            count++;
        }
    }
    return count;
}
Keep in mind that if you don't override equals in your Item class, this method will use object identity (as this is the implementation of Object.equals()).
Edit: Regarding your second question (please try to limit posts to one question apiece), you can do this by hand as well.
public List<Integer> indices(ArrayList<Item> items, Item itemToCheck) {
    ArrayList<Integer> ret = new ArrayList<Integer>();
    for (int i = 0; i < items.size(); i++) {
        if (items.get(i).equals(itemToCheck)) {
            ret.add(i);
        }
    }
    return ret;
}
As the other respondents have already said, if you're firmly committed to storing your items in an unordered ArrayList, then counting items will take O(n) time, where n is the number of items in the list. Here at SO, we give advice but we don't do magic!
As I just hinted, if the list gets searched a lot more than it's modified, it might make sense to keep it sorted. If your list is sorted then you can find your item in O(log n) time, which is a lot quicker; and if your compareTo (or Comparator) is consistent with equals, all the identical items will be right next to each other.
Another possibility would be to create and maintain two data structures in parallel. You could use a HashMap containing your items as keys and their counts as values. You'd be obligated to update this second structure any time your list changes, but item count lookups would be O(1).
I could be wrong, but it seems to me like the data structure you actually want might be a Multiset (from google-collections/guava) rather than a List. It allows multiples, unlike Set, but doesn't actually care about the order. Given that, it has a int count(Object element) method that does exactly what you want. And since it isn't a list and has implementations backed by a HashMap, getting the count is considerably more efficient.
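A short sketch of that (assuming Guava's Multiset is on the classpath; the bookkeeping of keeping it in step with your additions is up to you):
Multiset<Item> items = HashMultiset.create();
items.add(item);                        // mirror every insertion
int occurrences = items.count(item);    // expected O(1)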
Thanks for all your nice suggestions. But the code below is really very useful, as we don't have any search method on List that can give the number of occurrences.
void insert(Item newEl)
{
    if (counters.containsKey(newEl))
        counters.put(newEl, counters.get(newEl) + 1);
    else
        counters.put(newEl, 1);
    items.add(newEl);
}
Thanks to Jack. Good posting.
Thanks,
Binod Suman
http://binodsuman.blogspot.com
I know this is an old post, but since I did not see a hash map solution, I decided to add pseudo code using a hash map for anyone that needs it in the future. Assuming an ArrayList and Float data types.
Map<Float, Float> hm = new HashMap<>();
for (float k : Arralistentry) {
    Float j = hm.get(k);
    hm.put(k, (j == null ? 1 : j + 1));
}
for (Map.Entry<Float, Float> value : hm.entrySet()) {
    System.out.println("\n" + value.getKey() + " occurs : " + value.getValue() + " times");
}
