Find most occurring String within ArrayList

I am kind of new to Java and wanted to ask how I can find the most occurring String within an ArrayList.
I have two classes.
One holds game activities and the winner for those games.
The other class is supposed to be the Event that holds all the games.
For the event class, I have to find the person with the most game activities won. I made an ArrayList in the events class that holds all the winners from the games. Now I need to find the name that occurs most often in the ArrayList and output the String.
private ArrayList<Game> games;
public String getEventWinner(){
ArrayList<String> winner;
winner = new ArrayList<String>();
for(Game game : games)
{
winner.add(game.getWinner());
}
return eventWinner;
}
That gets me all the winners from the games in an ArrayList, but now I do not know how to proceed and couldn't find any answer online. Could somebody lead me in the right direction?

Calculate the frequencies of the strings into a map, find the entry with max value, and return it.
With Stream API this could look like this:
public static String getEventWinner(List<Game> games) {
return games.stream()
.map(Game::getWinner) // Stream<String>
.collect(Collectors.groupingBy(
winner -> winner, LinkedHashMap::new,
Collectors.summingInt(x -> 1)
)) // build the frequency map
.entrySet().stream()
.max(Map.Entry.comparingByValue()) // Optional<Map.Entry<String, Integer>>
.map(Map.Entry::getKey) // the key - winner name
.orElse(null);
}
Here LinkedHashMap serves as the tie breaker: it preserves insertion order, so when several people have the same number of wins, the one who appeared earlier in the game list is selected as the winner.
Test:
System.out.println(getEventWinner(Arrays.asList(
new Game("John"), new Game("Zach"), new Game("Zach"),
new Game("Chad"), new Game("John"), new Game("Jack")
)));
// output John
If the winner should instead be the person who reached the maximum number of wins first, a loop-based solution can be used:
public static String getEventWinnerFirst(List<Game> games) {
Map<String, Integer> winMap = new HashMap<>();
String resultWinner = null;
int max = -1;
for (Game game : games) {
String winner = game.getWinner();
int tempSum = winMap.merge(winner, 1, Integer::sum);
if (max < tempSum) {
resultWinner = winner;
max = tempSum;
}
}
return resultWinner;
}
For the same input data, the output will be Zach because he reaches two wins earlier in the list than John does.
Update 2
It may be possible to find the earliest achiever of the max result using Stream but a temporary map needs to be created to store the number of wins:
public static String getEventWinner(List<Game> games) {
Map<String, Integer> tmp = new HashMap<>();
return games.stream()
.map(Game::getWinner) // Stream<String>
.map(winner -> Map.entry(winner, tmp.merge(winner, 1, Integer::sum))) // store sum in tmp map
.collect(Collectors.maxBy(Map.Entry.comparingByValue()))
.map(Map.Entry::getKey) // the key - winner name
.orElse(null);
}

Stream APIs are great but can be a bit cryptic for the uninitiated. I find that non streaming code is often more readable.
Here is a solution in that spirit:
List<String> winners = Arrays.asList("a","b","c","c","b","b");
Map<String,Integer> entries = new HashMap<>();
String mostFrequent=null;
int mostFrequentCount=0;
for (String winner : winners) {
Integer count = entries.get(winner);
if (count == null) count = 0;
count = count + 1; // include the current win
entries.put(winner, count);
if (count>=mostFrequentCount)
{
mostFrequentCount=count;
mostFrequent=winner;
}
}
System.out.println("Most frequent = " + mostFrequent + " # of wins = " + mostFrequentCount );

Related

How to Sort a list of strings and find the 1000 most common values in java

In Java (with or without external libraries) I need to take a list of approximately 500,000 values and find the 1000 most frequently occurring ones (the modes), keeping the complexity to a minimum.
What I've tried so far: a hash map. But it would have to be backwards (key = count, value = string); otherwise, getting the top 1000 would have terrible complexity. The backwards layout doesn't work well either, because insertion becomes expensive: I have to search for my string so that I can remove it and re-insert it one count higher.
I've also tried a binary search tree, but that has the same problem of what to sort on: the count or the string. If it's sorted on the string, getting the top 1000 by count is slow; if it's sorted on the count, insertion is slow.
I could sort the list first (by string) and then iterate over it, keeping a count until the string changes. But what data structure should I use to keep track of the top 1000?
Thanks

I would first create a Map<String, Long> to store the frequency of each word. Then, I'd sort this map by value in descending order and finally I'd keep the first 1000 entries.
In code:
List<String> top1000Words = listOfWords.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.sorted(Map.Entry.<String, Long>comparingByValue().reversed()) // explicit type arguments are needed for reversed() to compile
.limit(1000)
.map(Map.Entry::getKey)
.collect(Collectors.toList());
You might find it cleaner to separate the above into 2 steps: first collecting to the map of frequencies and then sorting its entries by value and keeping the first 1000 entries.
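The two-step variant mentioned above could look roughly like this. It is only a sketch: the class name TopWordsTwoSteps and the method names countWords and topN are made up for illustration, and the sample list in main is invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopWordsTwoSteps {

    // Step 1: count how often each word occurs.
    static Map<String, Long> countWords(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Step 2: sort the entries by count, highest first, and keep the first n keys.
    static List<String> topN(Map<String, Long> frequencies, int n) {
        return frequencies.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented sample data: b occurs 3 times, c twice, a once.
        List<String> words = Arrays.asList("a", "b", "c", "c", "b", "b");
        System.out.println(topN(countWords(words), 2)); // prints [b, c]
    }
}
```

Keeping the frequency map around as its own value also makes it easy to inspect or reuse the counts before deciding how many entries to keep.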

I'd separate this into three phases:
1. Count word occurrences (e.g. by using a HashMap<String, Integer>)
2. Sort the results (e.g. by converting the map into a list of entries and ordering by value descending)
3. Output the top 1000 entries of the sorted results
The sorting will be slow if the counts are small (e.g. if you've actually got 500,000 separate words) but if you're expecting lots of duplicate words, it should be fine.
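If sorting every distinct word does turn out to be the slow part, one common alternative (not what the answer above describes, just a different technique) is to keep a min-heap bounded to the top n entries, so only n entries are ever ordered at once. A minimal sketch, assuming the words are already in a List<String>; the class name TopWordsHeap and the method topN are hypothetical:

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopWordsHeap {

    // Returns the n most frequent words, most frequent first.
    // Words with equal counts come back in no particular order.
    static List<String> topN(List<String> words, int n) {
        // Phase 1: count occurrences.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }

        // Phase 2: min-heap ordered by count that never holds more than n entries.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the currently least frequent entry
            }
        }

        // Phase 3: the heap yields ascending counts, so build the result back to front.
        LinkedList<String> result = new LinkedList<>();
        while (!heap.isEmpty()) {
            result.addFirst(heap.poll().getKey());
        }
        return result;
    }
}
```

With d distinct words this does O(d log n) heap work instead of an O(d log d) sort, which only matters when n is much smaller than d.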

I have had this question open for a few days now and have decided to rebel against Federico's elegant Java 8 answer and submit the least Java 8 answer possible.
The following code makes use of a helper class that associates a tally with a string.
public class TopOccurringValues {
static HashMap<String, StringCount> stringCounts = new HashMap<>();
// set low for demo. Change to 1000 (or whatever)
static final int TOP_NUMBER_TO_COLLECT = 10;
public static void main(String[] args) {
// load your strings in here
List<String> strings = loadStrings();
// tally up string occurrences
for (String string: strings) {
StringCount stringCount = stringCounts.get(string);
if (stringCount == null) {
stringCount = new StringCount(string);
}
stringCount.increment();
stringCounts.put(string, stringCount);
}
// sort which have most
ArrayList<StringCount> sortedCounts = new ArrayList<>(stringCounts.values());
Collections.sort(sortedCounts);
// collect the top occurring strings
ArrayList<String> topCollection = new ArrayList<>();
int upperBound = Math.min(TOP_NUMBER_TO_COLLECT, sortedCounts.size());
System.out.println("string\tcount");
for (int i = 0; i < upperBound; i++) {
StringCount stringCount = sortedCounts.get(i);
topCollection.add(stringCount.string);
System.out.println(stringCount.string + "\t" + stringCount.count);
}
}
// in this demo, strings are randomly generated numbers.
private static List<String> loadStrings() {
Random random = new Random(1);
ArrayList<String> randomStrings = new ArrayList<>();
for (int i = 0; i < 5000000; i++) {
randomStrings.add(String.valueOf(Math.round(random.nextGaussian() * 1000)));
}
return randomStrings;
}
static class StringCount implements Comparable<StringCount> {
int count = 0;
String string;
StringCount(String string) {this.string = string;}
void increment() {count++;}
@Override
public int compareTo(StringCount o) {return o.count - count;}
}
}
55 lines of code! It's like reverse code golf. The String generator creates 5 million strings instead of 500,000 because: why not?
string count
-89 2108
70 2107
77 2085
-4 2077
36 2077
65 2072
-154 2067
-172 2064
194 2063
-143 2062
The randomly generated strings are Gaussian values scaled by 1000 and rounded, so most of them fall between about -3000 and 3000, and numbers closer to 0 get higher counts.

The solution I chose was to first make a hash map with key-value pairs of type <String, Integer>. I got the counts by iterating over a linked list and inserting each key-value pair; before insertion I would check whether the key already existed and, if so, increase its count. That part was quite straightforward.
For the next part, where I needed to sort by value, I used Guava, a library published by Google, which made it very easy to sort by value instead of key using what it calls a multimap. In a sense it reverses the hash and allows multiple values to be mapped to one key, so I can get all of my top 1000, as opposed to some solutions mentioned above, which didn't allow that and would give me just one value per key.
The last step was to iterate over the multimap (backwards) to get the 1000 most frequent occurrences.
Have a look at the code of the function if you're interested:
private static void FindNMostFrequentOccurences(ArrayList profileName,int n) {
HashMap<String, Integer> hmap = new HashMap<String, Integer>();
//iterate through our data
for(int i = 0; i< profileName.size(); i++){
String current_id = profileName.get(i).toString();
if(hmap.get(current_id) == null){
hmap.put(current_id, 1);
} else {
int current_count = hmap.get(current_id);
current_count += 1;
hmap.put(current_id, current_count);
}
}
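// Tree-ordered keys guarantee that the last key is the highest count
// (ArrayListMultimap makes no promise about key iteration order).
ListMultimap<Integer, String> multimap = MultimapBuilder.treeKeys().arrayListValues().build();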
hmap.entrySet().forEach(entry -> {
multimap.put(entry.getValue(), entry.getKey());
});
for (int i = 0; i < n; i++){
if (!multimap.isEmpty()){
int lastKey = Iterables.getLast(multimap.keys());
String lastValue = Iterables.getLast(multimap.values());
multimap.remove(lastKey, lastValue);
System.out.println(i+1+": "+lastValue+", Occurences: "+lastKey);
}
}
}

You can do that with the Java Stream API:
List<String> input = Arrays.asList(new String[]{"aa", "bb", "cc", "bb", "bb", "aa"});
// First we compute a map of word -> occurrences
final Map<String, Long> collect = input.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
// Here we sort the map and collect the first 1000 entries
final List<Map.Entry<String, Long>> entries = new ArrayList<>(collect.entrySet());
final List<Map.Entry<String, Long>> result = entries.stream()
.sorted(Comparator.comparing(Map.Entry::getValue, Comparator.reverseOrder()))
.limit(1000)
.collect(Collectors.toList());
result.forEach(System.out::println);

how to find the duplicates in ArrayList using hashmap in java?

My program reads large text files (several MB each) that contain a source IP and a destination IP per line (for example 192.168.125.10,112.25.2.1). Here read is an ArrayList that holds the data.
I have generated unique ids (uid, of type int) from srcip and destip, and I am now storing them in
static ArrayList<Integer[]> prev = new ArrayList<Integer[]>();
where the array is:
static Integer[] multi1;
multi1 = new Integer[]{(int)uid,count,flag};
I have to print all the uids with their counts (frequencies) using a HashMap.
Please suggest a solution.
for (ArrayList<String> read : readFiles.values())
{
if(file_count<=2)
{
for(int i=0 ; i<read.size() ; i++)
{
String str1=read.get(i).split(",")[0];//get only srcIP
String str2=read.get(i).split(",")[1];//get only destIP
StringTokenizer tokenizer1=new StringTokenizer(str1,".");
StringTokenizer tokenizer2=new StringTokenizer(str2,".");
if(tokenizer1.hasMoreTokens()&&tokenizer2.hasMoreTokens())
{
sip_oct1=Integer.parseInt(tokenizer1.nextToken());
sip_oct2=Integer.parseInt(tokenizer1.nextToken());
sip_oct3=Integer.parseInt(tokenizer1.nextToken());
sip_oct4=Integer.parseInt(tokenizer1.nextToken());
dip_oct1=Integer.parseInt(tokenizer2.nextToken());
dip_oct2=Integer.parseInt(tokenizer2.nextToken());
dip_oct3=Integer.parseInt(tokenizer2.nextToken());
dip_oct4=Integer.parseInt(tokenizer2.nextToken());
uid=uniqueIdGenerator(sip_oct1,sip_oct2,sip_oct3,sip_oct4,dip_oct1,dip_oct2,dip_oct3,dip_oct4);
}
multi1 = new Integer[]{(int)uid,count,flag};
prev.add(multi1);
System.out.println(prev.get(i)[0]);//getting uids from prev
Map<ArrayList<Integer []> , Integer> map = new HashMap<ArrayList<Integer[]>, Integer>();
for (int j=0 ; j<prev.size() ; j++)
{
Integer temp=map.get(prev.get(i)[0]);
count = map.get(temp);
map.put(temp, (count == null) ? 1 : count++);
}
printMap(map);
System.out.println("uids--->"+prev.get(i)[0]+" Count--- >"+count+" flag--->"+prev.get(i)[2]);
}
}
file_count++;
}
}
public static void printMap(Map<ArrayList<Integer[]>, Integer> map)
{
for (Entry<ArrayList<Integer[]>, Integer> entry : map.entrySet())
{
System.out.println(" Value : "+ entry.getValue()+"key : "+entry.getKey());
}
}
public static double uniqueIdGenerator(int oc1,int oc2,int oc3,int oc4,int oc5,int oc6,int oc7,int oc8)
{
int a,b;
double c;
a=((oc1*10+oc2)*10+oc3)*10+oc4;
b=((oc5*10+oc6)*10+oc7)*10+oc8;
c= Math.log(a)+Math.log(b);
return Math.round(c*1000);
}

Now understanding what you want, there are (at least) 2 ways of doing this.
1st: Make a list of the uids. Then make a second list where each element holds a value (your uid) and a count. I was thinking of a HashMap, but there you cannot easily change the count. Maybe an ArrayList of lists with 2 values each.
Then loop over your list of uids and check with a second for loop whether the uid is already in the second list. If it is, add one to its count. If it is not, add it to the list.
2nd: Do the same thing, but with classes (very Java). Then you can put even more info into the class ;)
Hope this helps!
*edit: @RC. indeed gives cleaner code.
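For reference, a HashMap-based tally of the kind the question asks for can be sketched as follows. This is an illustration only and not the code the edit above refers to: it assumes the uids have already been extracted into a plain List<Integer>, and the class name UidFrequency, the method countUids, and the sample values are all made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UidFrequency {

    // Counts how many times each uid occurs; keys keep their first-seen order.
    static Map<Integer, Integer> countUids(List<Integer> uids) {
        Map<Integer, Integer> frequencies = new LinkedHashMap<>();
        for (Integer uid : uids) {
            // merge stores 1 for a new uid, otherwise adds 1 to the stored count
            frequencies.merge(uid, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Invented sample uids, standing in for the values taken from prev.
        List<Integer> uids = Arrays.asList(14275, 13900, 14275, 14275, 13900, 15010);
        for (Map.Entry<Integer, Integer> e : countUids(uids).entrySet()) {
            System.out.println("uid " + e.getKey() + " count " + e.getValue());
        }
    }
}
```

Map.merge (Java 8 and later) updates the count in one call, so no second loop over the list is needed.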

Converting singly linked list to a map

I have been given an assignment to upgrade an existing program:
Figure out how to recode the qualifying exam problem using a Map for each terminal line, on the assumption that the size of the problem is dominated by the number of input lines, not the 500 terminal lines.
The program takes in a text file that has number, name. The number is the PC number and the name is the user who logged on. The program returns the user for each pc that logged on the most. Here is the existing code
public class LineUsageData {
SinglyLinkedList<Usage> singly = new SinglyLinkedList<Usage>();
//function to add a user to the linked list or to increment count by 1
public void addObservation(Usage usage){
for(int i = 0; i < singly.size(); ++i){
if(usage.getName().equals(singly.get(i).getName())){
singly.get(i).incrementCount(1);
return;
}
}
singly.add(usage);
}
//returns the user with the most connections to the PC
public String getMaxUsage(){
int tempHigh = 0;
int high = 0;
String userAndCount = "";
for(int i = 0; i < singly.size(); ++i){//goes through list and keeps highest
tempHigh = singly.get(i).getCount();
if(tempHigh > high){
high = tempHigh;
userAndCount = singly.get(i).getName() + " " + singly.get(i).getCount();
}
}
return userAndCount;
}
}
I am having trouble on the theoretical side. We can use a HashMap or a TreeMap. I am trying to think through how I would form a map that holds the list of users for each PC. I can reuse the Usage object, which holds the name and the count of the user; I am not supposed to alter that object, though.

When checking if Usage is present in the list you perform a linear search each time (O(N)). If you replace your list with the Map<String,Usage>, you'll be able to search for name in sublinear time. TreeMap has O(log N) time for search and update, HashMap has amortized O(1)(constant) time.
So, the most effective data structure in this case is HashMap.
import java.util.*;
public class LineUsageData {
Map<String, Usage> map = new HashMap<String, Usage>();
//function to add a user to the map or to increment count by 1
public void addObservation(Usage usage) {
Usage existentUsage = map.get(usage.getName());
if (existentUsage == null) {
map.put(usage.getName(), usage);
} else {
existentUsage.incrementCount(1);
}
}
//returns the user with the most connections to the PC
public String getMaxUsage() {
Usage maxUsage = null;
for (Usage usage : map.values()) {
if (maxUsage == null || usage.getCount() > maxUsage.getCount()) {
maxUsage = usage;
}
}
return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
}
// alternative version that uses Collections.max
public String getMaxUsageAlt() {
Usage maxUsage = map.isEmpty() ? null :
Collections.max(map.values(), new Comparator<Usage>() {
@Override
public int compare(Usage o1, Usage o2) {
return o1.getCount() - o2.getCount();
}
});
return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
}
}
A Map can also be iterated in time proportional to its size, so you can use the same procedure to find the maximum element in it. I gave you two options: either the manual approach or the Collections.max utility method.
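To connect this to the assignment's "a Map for each terminal line", here is one possible way to wire it up. It is a sketch under assumptions: it counts with plain Integer values rather than the Usage class (whose full definition isn't shown), and the names TerminalUsage, addObservation(int, String) and getMaxUsage(int) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TerminalUsage {

    // line number -> (user name -> login count); one inner map per terminal line
    private final Map<Integer, Map<String, Integer>> usageByLine = new HashMap<>();

    // Records one login of the given user on the given terminal line.
    void addObservation(int line, String user) {
        usageByLine.computeIfAbsent(line, k -> new HashMap<>())
                   .merge(user, 1, Integer::sum);
    }

    // Returns "name count" for the most frequent user on the line,
    // or null if nothing was recorded for that line.
    String getMaxUsage(int line) {
        Map<String, Integer> counts = usageByLine.get(line);
        if (counts == null) {
            return null;
        }
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best.getKey() + " " + best.getValue();
    }
}
```

Each input line then costs one lookup for the terminal and one for the user, so the total work grows with the number of input lines rather than with the length of any per-terminal list.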

In simple words: you use a LinkedList (singly or doubly linked) when you have a list of items that you usually plan to traverse,
and a Map implementation when you have "dictionary-like" entries, where a key corresponds to a value and you plan to access the value using the key.
In order to convert your SinglyLinkedList to a HashMap or TreeMap, you need to find out which property of your item will be used as the key (it must be a property with unique values).
Assuming you are using the name property from your Usage class, you can do this
(a simple example):
//You could also use TreeMap, depending on your needs.
Map<String, Usage> usageMap = new HashMap<String, Usage>();
//Iterate through your SinglyLinkedList.
for(Usage usage : singly) {
//Add all items to the Map
usageMap.put(usage.getName(), usage);
}
//Access a value using its name as the key of the Map.
Usage accessedUsage = usageMap.get("AUsageName");
Also note that:
Map<String, Usage> usageMap = new HashMap<>();
is valid too, thanks to diamond type inference (Java 7 and later).

I solved this offline and didn't get a chance to see the answers, both of which look very helpful. Sorry about that, Nick and Aivean, and thanks for the responses. Here is the code I ended up writing to get this to work.
public class LineUsageData {
Map<Integer, Usage> map = new HashMap<Integer, Usage>();
int hash = 0;
public void addObservation(Usage usage){
hash = usage.getName().hashCode();
System.out.println(hash);
while((map.get(hash)) != null){
if(map.get(hash).getName().equals(usage.name)){
map.get(hash).count++;
return;
}else{
hash++;
}
}
map.put(hash, usage);
}
public String getMaxUsage(){
String str = "";
int tempHigh = 0;
int high = 0;
//for loop
for(Integer key : map.keySet()){
tempHigh = map.get(key).getCount();
if(tempHigh > high){
high = tempHigh;
str = map.get(key).getName() + " " + map.get(key).getCount();
}
}
return str;
}
}

Compare Lists of Pairs to find similars

Movie1{{'hello',5},{'foo',3}}
Movie2{{'hi',2},{'foo',2}}
I am testing with 2 movies, each of which has around 20 unique words grouped in pairs of word and frequency.
public ArrayList<Pair<String, Integer>> getWordsAndFrequency() {
String[] keys = description.split(" ");
String[] uniqueKeys;
int count = 0;
uniqueKeys = getUniqueKeys(keys);
for (String key : uniqueKeys) {
if (null == key) {
break;
}
for (String s : keys) {
if (key.equals(s)) {
count++;
}
}
words.add(Pair.of(key, count));
count = 0;
}
sortWords(words);
return words;
}

Your bug is that getWordsAndFrequency() adds more entries to words every time it runs, so each call makes the word list longer and longer. To fix this, calculate the words and frequencies once, add those Pairs to the list, and then just return the list from getWordsAndFrequency() rather than recalculating it on every call.
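A minimal sketch of that "compute once, then just return" idea is below. The question's Pair type and its getUniqueKeys/sortWords helpers aren't shown, so this version makes assumptions: it uses Map.Entry<String, Integer> in place of Pair, skips the sorting step, and the Movie class here is a stripped-down stand-in for the real one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Movie {

    private final String description;
    // Stays null until the first request, then holds the cached result.
    private List<Map.Entry<String, Integer>> words;

    Movie(String description) {
        this.description = description;
    }

    List<Map.Entry<String, Integer>> getWordsAndFrequency() {
        if (words == null) {
            // Count each word once; LinkedHashMap keeps first-seen order.
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String key : description.split(" ")) {
                counts.merge(key, 1, Integer::sum);
            }
            words = new ArrayList<>(counts.entrySet());
        }
        // Later calls return the same list without adding to it.
        return words;
    }
}
```

Calling the getter any number of times now yields the same list with the same size, which is what the comparison between two movies relies on.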

Can you put the data (that is currently stored in an arraylist of pairs) in a hashmap?
You can then compute the intersection of the sets of keywords between two movies and add their scores
For example:
Map<String, Integer> keyWordsMovie1 = movie1.getWordsAndFrequency();
Map<String, Integer> keyWordsMovie2 = movie2.getWordsAndFrequency();
Set<String> intersection = new HashSet<String>(keyWordsMovie1.keySet()); //start with all keywords in movie1
intersection.retainAll(keyWordsMovie2.keySet());
for (String keyWord : intersection){
int freq1 = keyWordsMovie1.get(keyWord);
int freq2 = keyWordsMovie2.get(keyWord);
//you now have the frequencies of the keyword in both movies
}

How to Sort a Map<String, List<Object>> by the Key with the most values (that are not numeric) assigned to it

I have been working with Maps and I am baffled by how to get my program to work effectively. I can iterate over the map, get the keys and values, and sort them in alphabetical and reverse alphabetical order quite easily, and I have used custom comparators for this. However, I am now trying to sort the map so that the key with the most values comes first. The values are lists of objects I have created, and the scenario can be thought of like this.
There is an Atlas (like a catalog) that has lots of towns (the keys, of type String), each of which contains shops (a List). I want to sort this so that the town with the most shops is displayed first, in descending order, with a secondary alphabetical sort on the town name, and return a String representing the result.
So far I have used the Comparator interface, with a separate class each for alphabetical and reverse alphabetical order, and I wish to follow the same pattern for learning purposes. However, this has me completely stumped.
Example:
class Atlas {
Map<String, List<Shop>> atlas = new HashMap<String, List<Shop>>();
void addShop(Shop shop){
// pseudocode:
// if (atlas already contains the shop's town) {
//     get the town and add the shop to it
// } else {
//     add the town as the key and the shop as the value in the list
// }
}
List<Shop> getAllShopsFromTheGivenTown(String givenTown){
// pseudocode:
// if (atlas contains givenTown) {
//     return the shops of givenTown from the map
// } else {
//     return an empty ArrayList
// }
}
public String returnAllTownsAndShopsAlphbetically(){
String tmpString = "";
List<String> keys = new LinkedList<String>(atlas.keySet());
TownComparatorAtoZ tc = new TownComparatorAtoZ();
Collections.sort(keys, tc);
for(String town : keys){
List<Shop> shops = new LinkedList<Shop>(atlas.get(town));
ShopComparatorAtoZ sc = new ShopComparatorAtoZ();
Collections.sort(shops, sc);
for(Shop shop : shops){
if(tmpString.isEmpty()){
tmpString = tmpString + town + ": " + shop.getName();
}
else if(tmpString.contains(town)){
tmpString = tmpString + ", " + shop.getName();
}
else{
tmpString = tmpString + " | " + town + ": " + shop.getName(); }
}
}
return tmpString;
}
}
As can be seen above, this returns things alphabetically (although it is not the cleanest or most efficient code, and it will be reworked to use a StringBuilder). However, I am wondering how I can use a comparator to achieve what I am after. If someone could provide a code snippet with an explanation of what it actually does, I would be grateful: it is more about understanding how to do it than about getting a lump of code to copy and paste, but I need to see it in code to understand it.
So the output I want is something like
manchester: m&s, h&m, schuch | birmingham: game, body shop | liverpool: sports

You can try something like this:
public static Map<String, List<Shop>> mySortedMap(final Map<String, List<Shop>> orig)
{
final Comparator<String> c = new Comparator<String>()
{
@Override
public int compare(final String o1, final String o2)
{
// Compare the sizes of the lists, largest first. If they are the
// same, compare the keys themselves.
final int sizeCompare = Integer.compare(orig.get(o2).size(), orig.get(o1).size());
return sizeCompare != 0 ? sizeCompare : o1.compareTo(o2);
}
};
final Map<String, List<Shop>> ret = new TreeMap<String, List<Shop>>(c);
ret.putAll(orig);
return ret;
}
Explanation: TreeMap is the basic implementation of a SortedMap, and it can take a comparator of key values as an argument (if no comparator is passed as an argument, natural ordering of the keys prevails). Here we create an ad hoc comparator comparing the list sizes of the original map passed as an argument, and if the sizes are equal, it compares the keys themselves. Finally, we inject all elements from the origin map into it, and return it.

What if you try something like the following:
private static final Comparator<Map.Entry<String, List<Shop>>> CountThenAtoZ =
new Comparator<Map.Entry<String, List<Shop>>>() {
@Override
public int compare(Map.Entry<String, List<Shop>> x, Map.Entry<String, List<Shop>> y) {
// Compare shop count first, largest first. If equal, compare keys alphabetically.
int cmp = Integer.compare(y.getValue().size(), x.getValue().size());
return cmp != 0 ? cmp : x.getKey().compareTo(y.getKey());
}
};
...
public String returnAllTownsAndShopsAlphbetically() {
List<Map.Entry<String, List<Shop>>> entries = new ArrayList<>(atlas.entrySet());
Collections.sort(entries, CountThenAtoZ);
String result = "";
boolean firstTown = true;
for (Map.Entry<String, List<Shop>> entry : entries) {
if (!firstTown) result += " | "; else firstTown = false;
result += entry.getKey() + ": ";
boolean firstShop = true;
TreeSet<Shop> sortedShops = new TreeSet<>(new ShopComparatorAtoZ());
sortedShops.addAll(entry.getValue());
for (Shop shop : sortedShops) {
if (!firstShop) result += ", "; else firstShop = false;
result += shop.getName();
}
}
return result;
}
The way this works is to first create a list of the atlas entries in exactly the order we want. We need access to both the keys and their associated values to build the correct ordering, so sorting a List of Map.Entry instances is the most convenient.
We then walk the sorted list to build the resulting String, making sure to sort the shops alphabetically before adding them to the String.
