How to get a few random keys from a HashMap - Java

I have a map with titles. I want to print 10 random keys from my HashMap.
For example, my map (String, Object) contains 100 pairs: "A, new Object(...)", "B, ...", "C, ..." etc.
I want to get 10 random keys from this map and append them to one string.
So my string should look like: "A\nD\nB".

A quick way to get 10 random keys without repetition is to put the keys in a list and shuffle it with Collections.shuffle.
Map<String, Object> map = ...yourmap
ArrayList<String> keys = new ArrayList<>(map.keySet());
Collections.shuffle(keys);
List<String> randomTenKeys = keys.subList(0, 10);
Creating a list of all keys and shuffling it is not the most efficient thing you can do. You can do it in a single pass with a reservoir sampling algorithm. I haven't looked into it but you can probably find an implementation in some Apache or Guava library.
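For what it's worth, the single-pass idea could look roughly like the following. This is only a minimal sketch of reservoir sampling (the classic "Algorithm R"), not a library implementation; the class name ReservoirSample and the method sample are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ReservoirSample {
    // Picks up to k items uniformly at random from source in a single pass.
    // (Hypothetical helper, not part of any library.)
    static <T> List<T> sample(Iterable<T> source, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : source) {
            seen++;
            if (reservoir.size() < k) {
                // The first k items fill the reservoir.
                reservoir.add(item);
            } else {
                // Later items replace a random slot with probability k / seen.
                int j = rnd.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        List<String> randomTenKeys = sample(map.keySet(), 10, new Random());
        System.out.println(String.join("\n", randomTenKeys));
    }
}
```

Note that this only avoids copying all keys; the selected keys are a uniform random subset, but their order in the result is not shuffled. If the map has fewer than k keys, all of them are returned.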

Joni's answer is quite good and short, but here is a fully working example if you'd like one. I split your problem into two methods: one returns a list of randomly selected keys, and the other joins the keys in whichever way you like. You could combine the two methods into one, but it's better to keep them separate.
import java.util.*;
import java.util.stream.IntStream;
public class Test {
public static void main(String [] args){
Map<String, Object> map = new HashMap<>();
//You can use for loop instead to make a map of String, Integer.
IntStream.rangeClosed(0, 9).forEach(i -> map.put(i +"", i));//Map of 10 numbers.
List<String> keys = getRandomKeys(map, 3);
String allKeys = combineKeys(keys, "\n");
System.out.println(allKeys);
}
public static List<String> getRandomKeys(Map<String, Object> map, int keyCount) {
List<String> keys = new ArrayList<>(map.keySet());
for(int i = 0; i < map.size()-keyCount; i++){
int idx = (int) ( Math.random() * keys.size() );
keys.remove(idx);
}
return keys;
}
public static String combineKeys(List<String> keys, String separator){
if (keys.isEmpty()) return "";//nothing to join; also avoids keys.get(-1) below.
String all = "";
for(int i = 0; i < keys.size() - 1; i++){
all = all + keys.get(i) + separator;
}
all += keys.get(keys.size()-1);//last element does not need separator.
return all;
}
}

A HashMap stores its entries in no guaranteed order, but that order is not random: it depends on the keys' hash codes and stays the same on every iteration as long as the map is unchanged.
If an arbitrary order is enough, you can iterate directly:
for (Map.Entry<String, Object> entry : map.entrySet())
str.append(entry.getKey() + " " + entry.getValue());
However, if you want a new order every time, you need to shuffle your data.
To shuffle, copy all the keys into an array or list,
then shuffle that list and iterate over it to get the values from the HashMap.

This is a complementary answer to Joni's answer. Use String.join to join the randomTenKeys.
Given below is Joni's answer:
Map<String, Object> map = ...yourmap
ArrayList<String> keys = new ArrayList<>(map.keySet());
Collections.shuffle(keys);
List<String> randomTenKeys = keys.subList(0, 10);
and the complementary answer is:
String joinedKeys = String.join("\n", randomTenKeys);

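// Copy the keys into a list: a Set has no get(int), and removing from keySet() would also modify the map.
List<String> keys = new ArrayList<>(myMap.keySet());
String combined = "";
int count = Math.min(10, keys.size());
for (int i=0; i<count; i++)
{
int random = (int)(Math.random() * keys.size());
String key = keys.remove(random); // remove(int) returns the removed key, so it cannot be picked twice
combined += key + "\n";
}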

Related

Getting Objects with specified values from hashtable Java

I have a hashtable with (String, Object). I have to segregate all objects by the length of the key String and create an array of arrays of Strings with the same length. Can someone guide me how could I accomplish that?
My code so far:
Set<String> keys = words.keySet();
ArrayList<ArrayList<Word>> outer = new ArrayList<ArrayList<Word>>();
ArrayList<Word> inner = new ArrayList<Word>();
for(String key: keys) {
for (int i=0; i< 15; i++) {
if (key.length() == i) {
inner.add(words.get(key));
}
outer.add(i, inner);
}
}
The way you're looping is inefficient since you may not have many words of certain sizes so you'll be needlessly checking the length of every single word against i for each length. You can just go through your list of words once and use a map to associate words with the keys representing their lengths, then collate the lists at the end.
Try this:
Map<Integer, List<String>> sizeMap = new HashMap<>();
for (String key: keys) {
int length = key.length();
if (sizeMap.containsKey(length)) {
// If we already have a list initialized, add the word
List<String> mWords = sizeMap.get(length);
mWords.add(key);
} else {
// Otherwise, create the list, add the word to it, and store it so later lookups don't hit null
List<String> mWords = new ArrayList<>();
mWords.add(key);
sizeMap.put(length, mWords);
}
}
// Convert the map to a list of lists
for (List<String> sizeGrouping : sizeMap.values()) {
outer.add(sizeGrouping);
}
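The same single-pass grouping can be written a bit more compactly with Map.computeIfAbsent (Java 8+). This is just a sketch under the assumption that grouping the key strings themselves is what's wanted; the class name GroupByLength and the method groupByLength are made up, and the TreeMap is my own choice so that the groups come out ordered by length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByLength {
    // Groups keys by their length in one pass (hypothetical helper).
    static List<List<String>> groupByLength(Iterable<String> keys) {
        // TreeMap keeps the groups sorted by length.
        Map<Integer, List<String>> sizeMap = new TreeMap<>();
        for (String key : keys) {
            // Create the list for this length on first use, then add the word.
            sizeMap.computeIfAbsent(key.length(), len -> new ArrayList<>()).add(key);
        }
        // Convert the map to a list of lists.
        return new ArrayList<>(sizeMap.values());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "bb", "c", "ddd", "ee");
        System.out.println(groupByLength(keys)); // prints [[a, c], [bb, ee], [ddd]]
    }
}
```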

How to Sort a list of strings and find the 1000 most common values in java

In Java (with or without external libraries) I need to take a list of approximately 500,000 values and find the 1000 most frequently occurring ones, keeping the complexity to a minimum.
What I've tried so far: a hash map. The natural layout is key = string, value = count, but then getting the top 1000 has poor complexity. The reverse layout (key = count, value = string) doesn't work well either, because insertion becomes expensive: I have to search for my string, remove it, and re-insert it one count higher.
I've also tried a binary search tree, but it has the same problem of what to sort on, the count or the string. If it's sorted on the string, getting the top 1000 by count is slow; if it's sorted on the count, insertion is slow.
I could sort the list first (by string) and then iterate over it, keeping a count until the string changes. But what data structure should I use to keep track of the top 1000?
Thanks
I would first create a Map<String, Long> to store the frequency of each word. Then, I'd sort this map by value in descending order and finally I'd keep the first 1000 entries.
In code:
List<String> top1000Words = listOfWords.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.sorted(Map.Entry.<String, Long>comparingByValue().reversed())
.limit(1000)
.map(Map.Entry::getKey)
.collect(Collectors.toList());
You might find it cleaner to separate the above into 2 steps: first collecting to the map of frequencies and then sorting its entries by value and keeping the first 1000 entries.
I'd separate this into three phases:
1. Count word occurrences (e.g. by using a HashMap<String, Integer>)
2. Sort the results (e.g. by converting the map into a list of entries and ordering by value descending)
3. Output the top 1000 entries of the sorted results
The sorting will be slow if the counts are small (e.g. if you've actually got 500,000 separate words) but if you're expecting lots of duplicate words, it should be fine.
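The three phases above could be sketched without streams roughly as follows. The class name TopWords and the method topN are made up for illustration, and ties between equal counts come out in no particular order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopWords {
    // Returns the n most frequent words (hypothetical helper).
    static List<String> topN(List<String> words, int n) {
        // Phase 1: count occurrences.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Phase 2: sort the entries by count, descending.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        // Phase 3: keep the keys of the first n entries.
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("aa", "bb", "cc", "bb", "bb", "aa");
        System.out.println(topN(words, 2)); // prints [bb, aa]
    }
}
```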
I have had this question open for a few days now and have decided to rebel against Federico's elegant Java 8 answer and submit the least Java 8 answer possible.
The following code makes use of a helper class that associates a tally with a string.
import java.util.*;
public class TopOccurringValues {
static HashMap<String, StringCount> stringCounts = new HashMap<>();
// set low for demo. Change to 1000 (or whatever)
static final int TOP_NUMBER_TO_COLLECT = 10;
public static void main(String[] args) {
// load your strings in here
List<String> strings = loadStrings();
// tally up string occurrences
for (String string: strings) {
StringCount stringCount = stringCounts.get(string);
if (stringCount == null) {
stringCount = new StringCount(string);
}
stringCount.increment();
stringCounts.put(string, stringCount);
}
// sort which have most
ArrayList<StringCount> sortedCounts = new ArrayList<>(stringCounts.values());
Collections.sort(sortedCounts);
// collect the top occurring strings
ArrayList<String> topCollection = new ArrayList<>();
int upperBound = Math.min(TOP_NUMBER_TO_COLLECT, sortedCounts.size());
System.out.println("string\tcount");
for (int i = 0; i < upperBound; i++) {
StringCount stringCount = sortedCounts.get(i);
topCollection.add(stringCount.string);
System.out.println(stringCount.string + "\t" + stringCount.count);
}
}
// in this demo, strings are randomly generated numbers.
private static List<String> loadStrings() {
Random random = new Random(1);
ArrayList<String> randomStrings = new ArrayList<>();
for (int i = 0; i < 5000000; i++) {
randomStrings.add(String.valueOf(Math.round(random.nextGaussian() * 1000)));
}
return randomStrings;
}
static class StringCount implements Comparable<StringCount> {
int count = 0;
String string;
StringCount(String string) {this.string = string;}
void increment() {count++;}
@Override
public int compareTo(StringCount o) {return o.count - count;}
}
}
55 lines of code! It's like reverse code golf. The String generator creates 5 million strings instead of 500,000 because: why not?
string count
-89 2108
70 2107
77 2085
-4 2077
36 2077
65 2072
-154 2067
-172 2064
194 2063
-143 2062
The randomly generated strings are Gaussian values scaled by 1000 and rounded, so most of them fall between -999 and 999 (though values outside that range are possible), and the numbers closest to 0 occur most often.
The solution I chose was to first make a hash map with key-value pairs of <String, Integer>. I got the counts by iterating over a linked list and inserting each key-value pair; before insertion I would check for existence and, if the key was already there, increase its count. That part was quite straightforward.
For the next part, sorting by value, I used Guava, a library published by Google. Its multimap made it easy to sort by value instead of key: it in a sense reverses the hash and allows multiple values to be mapped to one key, so that I can have all of my top 1000, as opposed to some solutions mentioned above, which didn't allow that and would leave me with just one value per key.
The last step was to iterate over the multimap (backwards) to get the 1000 most frequent occurrences.
Have a look at the code of the function if you're interested:
private static void FindNMostFrequentOccurences(ArrayList profileName,int n) {
HashMap<String, Integer> hmap = new HashMap<String, Integer>();
//iterate through our data
for(int i = 0; i< profileName.size(); i++){
String current_id = profileName.get(i).toString();
if(hmap.get(current_id) == null){
hmap.put(current_id, 1);
} else {
int current_count = hmap.get(current_id);
current_count += 1;
hmap.put(current_id, current_count);
}
}
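// The keys must iterate in ascending order for getLast() below to return the highest count,
// so use tree-ordered keys rather than the hash-ordered ArrayListMultimap.
ListMultimap<Integer, String> multimap = MultimapBuilder.treeKeys().arrayListValues().build();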
hmap.entrySet().forEach(entry -> {
multimap.put(entry.getValue(), entry.getKey());
});
for (int i = 0; i < n; i++){
if (!multimap.isEmpty()){
int lastKey = Iterables.getLast(multimap.keys());
String lastValue = Iterables.getLast(multimap.values());
multimap.remove(lastKey, lastValue);
System.out.println(i+1+": "+lastValue+", Occurences: "+lastKey);
}
}
}
You can do that with the java stream API :
List<String> input = Arrays.asList(new String[]{"aa", "bb", "cc", "bb", "bb", "aa"});
// First we compute a map of word -> occurrences
final Map<String, Long> collect = input.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
// Here we sort the map and collect the first 1000 entries
final List<Map.Entry<String, Long>> entries = new ArrayList<>(collect.entrySet());
final List<Map.Entry<String, Long>> result = entries.stream()
.sorted(Comparator.comparing(Map.Entry::getValue, Comparator.reverseOrder()))
.limit(1000)
.collect(Collectors.toList());
result.forEach(System.out::println);

String[] or ArrayList better as Key in HashMap?

So I need to choose between
HashMap<String[], Object>
HashMap<ArrayList<String>,Object>
My input Parameter is: ArrayList<String> in.
The whole ArrayList<String> in cannot be the key, since it contains elements which are not supposed to act like a primary key in a database. I do know that the first n elements of the incoming ArrayList<String> in are supposed to be the primary keys.
Which one would be faster?
Scenario:
HashMap<ArrayList<String>, Object> hmAL = new HashMap<>();
HashMap<String[], Object> hmSA = new HashMap<>();
ArrayList<String> in = new ArrayList<>();
fillWithStuff(in);
//Which one would be faster?
getObject(in,hmAL,5);
getObject(in,hmSA,5);
With Option 1:
private Object getObject(ArrayList<String> in, HashMap<ArrayList<String>, Object> hm, int n){
return hm.get(in.subList(0,n));
}
With Option 2:
private Object getObject(ArrayList<String> in, HashMap<String[], Object> hm, int n){
String[] temp = new String[n];
for(int i=0; i<n; i++)
temp[i]=in.get(i);
return hm.get(temp);
}
Considering:
Which is faster: shortening the list, or copying to an array?
I'm wondering which hash (since it is a HashMap) would be faster: hashing an ArrayList, or an equal-sized array. Or doesn't it make any difference?
Using String[] is not a good idea because arrays do not override hashCode() or equals(). This means that if you have 2 string arrays which are different objects but hold the exact same values, the map will not find the entry.
The ArrayList implementation of hashCode(), by contrast, combines the hash codes of its string elements, so the lookup in a map would succeed. So I'd go with the list.
That said, I would rather build a key myself based on the objects in the list.
Dealing with copying only
The subList method does not require any copying at all. It simply returns a view onto the original list, as the List.subList contract specifies, so creating the key is cheaper than copying element by element into an array. Hashing the view still has to visit all n elements, though.
Dealing with the method as a whole
If you look at the whole method, your choice is no longer a choice. If you want the method to function, you will have to use the first implementation. An array's hashCode() does not look at the elements inside it, only at the identity of the array. Because you are creating the array inside your method, Map.get() will necessarily return null.
On the other hand, the List.hashCode() method runs a hash over all of the contained elements, meaning that it will successfully match if all of the contained elements are the same.
Your choice is clear.
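A small demo of the difference, as a sketch rather than a benchmark. The class and method names are made up; the stored list key is a copy rather than a subList view, which is my own precaution so the key cannot change if the source list does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyLookupDemo {
    // Looks up a freshly built array with the same contents as the stored key.
    static boolean arrayKeyFound() {
        Map<String[], Object> byArray = new HashMap<>();
        byArray.put(new String[] {"a", "b"}, "value");
        // Arrays use identity-based equals()/hashCode(), so this never matches.
        return byArray.containsKey(new String[] {"a", "b"});
    }

    // Looks up a subList view whose elements equal the stored list key.
    static boolean listKeyFound() {
        Map<List<String>, Object> byList = new HashMap<>();
        byList.put(new ArrayList<>(Arrays.asList("a", "b")), "value");
        List<String> in = Arrays.asList("a", "b", "c");
        // List equals()/hashCode() are element-based, so the view matches.
        return byList.containsKey(in.subList(0, 2));
    }

    public static void main(String[] args) {
        System.out.println(arrayKeyFound()); // prints false
        System.out.println(listKeyFound());  // prints true
    }
}
```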
Just to add to the two answers above: I tested this in Java 7 and found that, on average, the list version is about 50 times faster, with 2,000,000 total elements of which 1,000,000 take part in calculating the hash code, i.e. act as primary keys (a hypothetical number). Below is the program.
import java.util.ArrayList;
import java.util.HashMap;
public class TestHashing {
public static void main(String[] args) {
HashMap<ArrayList<String>, Object> hmAL = new HashMap<>();
HashMap<String[], Object> hmSA = new HashMap<>();
ArrayList<String> in = new ArrayList<>();
fillWithStuff(in);
// Which one would be faster?
long start = System.nanoTime();
getObject(in, hmAL, 1000000);
long end = System.nanoTime();
long firstTime = (end-start);
System.out.println("firstTime :: "+ firstTime);
start = System.nanoTime();
getObject1(in, hmSA, 1000000);
end = System.nanoTime();
long secondTime = (end-start);
System.out.println("secondTime :: "+ secondTime);
System.out.println("First is faster by "+ secondTime/firstTime);
}
private static void fillWithStuff(ArrayList<String> in) {
for(int i =0; i< 2000000; i++) {
in.add(i+"");
}
}
private static Object getObject(ArrayList<String> in,
HashMap<ArrayList<String>, Object> hm, int n) {
return hm.get(in.subList(0, n));
}
private static Object getObject1(ArrayList<String> in, HashMap<String[], Object> hm, int n){
String[] temp = new String[n];
for(int i=0; i<n; i++)
temp[i]=in.get(i);
return hm.get(temp);
}
}
Output
firstTime :: 218000
secondTime :: 11627000
First is faster by 53

Compare Lists of Pairs to find similars

Movie1{{'hello',5},{'foo',3}}
Movie2{{'hi',2},{'foo',2}}
While testing, I am using 2 movies, each with around 20 unique words stored as pairs of word and frequency.
public ArrayList<Pair<String, Integer>> getWordsAndFrequency() {
String[] keys = description.split(" ");
String[] uniqueKeys;
int count = 0;
uniqueKeys = getUniqueKeys(keys);
for (String key : uniqueKeys) {
if (null == key) {
break;
}
for (String s : keys) {
if (key.equals(s)) {
count++;
}
}
words.add(Pair.of(key, count));
count = 0;
}
sortWords(words);
return words;
}
Your bug is that your getWordsAndFrequency() method adds more entries to words on every call, so each time you call it the word list gets longer and longer. To fix this, calculate the words and frequencies once and add these Pairs to the list, then have getWordsAndFrequency() just return the list rather than calculating it every time.
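Roughly what I mean, as a sketch only. I don't know your actual class or which Pair type you use, so this assumes a hypothetical Movie class that gets its description in the constructor and uses Map.Entry as a stand-in for Pair; sorting is left out.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Movie {
    // Computed once in the constructor and never appended to afterwards.
    private final List<Map.Entry<String, Integer>> words;

    Movie(String description) {
        // Count each word in a single pass, keeping first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : description.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            pairs.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        this.words = Collections.unmodifiableList(pairs);
    }

    // Just returns the precomputed list, so repeated calls cannot grow it.
    List<Map.Entry<String, Integer>> getWordsAndFrequency() {
        return words;
    }

    public static void main(String[] args) {
        Movie movie = new Movie("foo hello foo foo");
        System.out.println(movie.getWordsAndFrequency()); // prints [foo=3, hello=1]
        System.out.println(movie.getWordsAndFrequency().size()); // prints 2
    }
}
```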
Can you put the data (that is currently stored in an arraylist of pairs) in a hashmap?
You can then compute the intersection of the sets of keywords between two movies and add their scores
For example:
Map<String, Integer> keyWordsMovie1 = movie1.getWordsAndFrequency();
Map<String, Integer> keyWordsMovie2 = movie2.getWordsAndFrequency();
Set<String> intersection = new HashSet<String>(keyWordsMovie1.keySet()); //starts as the set of all keywords in movie1
intersection.retainAll(keyWordsMovie2.keySet());
for (String keyWord : intersection){
int freq1 = keyWordsMovie1.get(keyWord);
int freq2 = keyWordsMovie2.get(keyWord);
//you now have the frequencies of the keyword in both movies
}

How to put key/values into a HashMap from StringBuilder using loop? - Java

I'm using several techniques here, so it's hard to find help online.
I need to populate a HashMap<String, String> with values I take from part of a StringBuilder, then take the keys and add them into an ArrayList<String>, then print the list. But when I print, I get a list full of nulls. I don't know why, I thought it would print the values I got from the StringBuilder. It should print: Values taken from the hashmap keys: ABCDEFGHI. (The reason I used StringBuilder is because String is immutable, is this correct thinking too?)
Also, I figured using a loop to print the list is okay, since my keys are actually numbers.
I've never created a HashMap before, so maybe I'm missing something. Thanks for your help.
Here is my code:
// create HashMap from String
StringBuilder alphaNum = new StringBuilder("1A2B3C4D5E6F7G8H9I");
Map<String, String> myAlphaNum = new HashMap<String, String>(9);
// for loop puts key and values in map, taken from String
for (int i = 0; i < alphaNum.length();)
{
myAlphaNum.put(alphaNum.substring(i, ++i), alphaNum.substring(i, ++i));
}
if (myAlphaNum.containsKey(1))
System.out.println("Key is there.");
else
System.out.println("Key is null.");
// create ArrayList, add values to it using map keys
ArrayList<String> arrayList = new ArrayList<String>();
// for loop gets the "number" keys from HashMap to get the "letter" values
for (int j = 1; j <= myAlphaNum.size(); j++)
arrayList.add(myAlphaNum.get(j));
System.out.print("Values taken from the hashmap keys: ");
for (String list : arrayList)
System.out.print(list);
Console:
Key is null.
Values taken from the hashmap keys: nullnullnullnullnullnullnullnullnull
You are using containsKey/get with an Integer as parameter, while your map keys are defined as String. That's why you got null.
I would recommend to use a Map<Integer, String> myAlphaNum = new HashMap<Integer, String>(9); and in your loop myAlphaNum.put(Integer.parseInt(alphaNum.substring(i, ++i)), alphaNum.substring(i, ++i));. Then you'll get your desired output.
You could also use the ArrayList constructor that takes a Collection as a parameter (or just print myAlphaNum.values() directly).
// create ArrayList, add values to it using map keys
ArrayList<String> arrayList = new ArrayList<String>(myAlphaNum.values());
System.out.println(arrayList); //[A, B, C, D, E, F, G, H, I]
myAlphaNum has keys of type String, so passing an int to get (myAlphaNum.get(j)) will always return null.
There are several ways to iterate over the values (or keys or entries) of the map.
For example (assuming you only care about the values) :
for (String value : myAlphaNum.values())
arrayList.add(value);
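To see the key-type mismatch in isolation, here is a small sketch. The class name KeyTypeDemo and the method build are made up, and the input string is shortened to three pairs.

```java
import java.util.HashMap;
import java.util.Map;

class KeyTypeDemo {
    // Builds {"1"="A", "2"="B", "3"="C"} from alternating key/value characters.
    static Map<String, String> build() {
        Map<String, String> m = new HashMap<>();
        String alphaNum = "1A2B3C";
        for (int i = 0; i < alphaNum.length(); i += 2) {
            m.put(alphaNum.substring(i, i + 1), alphaNum.substring(i + 1, i + 2));
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = build();
        // get() accepts any Object, so an int compiles, but Integer 1 never equals String "1".
        System.out.println(m.get(1));                 // prints null
        System.out.println(m.get("1"));               // prints A
        System.out.println(m.get(String.valueOf(2))); // prints B
    }
}
```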
// create HashMap from String
StringBuilder alphaNum = new StringBuilder("1A2B3C4D5E6F7G8H9I");
Map<String, String> myAlphaNum = new HashMap<String, String>(9);
// for loop puts key and values in map, taken from String
for (int i = 0; i < alphaNum.length();)
{
myAlphaNum.put(alphaNum.substring(i, ++i), alphaNum.substring(i, ++i));
}
System.out.println(myAlphaNum);
if (myAlphaNum.containsKey("1"))
System.out.println("Key is there.");
else
System.out.println("Key is null.");
// create ArrayList, add values to it using map keys
ArrayList<String> arrayList = new ArrayList<String>();
// for loop gets the "number" keys from HashMap to get the "letter" values
for (int j = 1; j <= myAlphaNum.size(); j++)
arrayList.add(myAlphaNum.get(j+""));
System.out.print("Values taken from the hashmap keys: ");
for (String list : arrayList)
System.out.print(list);
You can try the above code. You stored the keys as Strings but retrieved them with an Integer, so the lookup won't return anything.
