Sort Java Map output in ascending order [duplicate] - java

I made this quick code for a class and I got everything to work fine as far as reading the text file and printing it out, but I can't figure out how to get it to print out in ascending order. The goal is to read a file and print out the number of times that word appears and sort it by the number of times it appears.
public class Analyser {
public static void main(String[] args) throws IOException {
File file = new File("src/txt.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
Map<String, Long> counts = new HashMap<>();
while ((line = br.readLine()) != null) {
String[] words = line.split("[\\s.;,?:!()\"]+");
for (String word : words) {
word = word.trim();
if (word.length() > 0) {
if (counts.containsKey(word)) {
counts.put(word, counts.get(word) + 1);
} else {
counts.put(word, 1L);
}
}
}
}
for (Map.Entry<String, Long> entry : counts.entrySet()) {
System.out.println(entry.getKey() + " : " + entry.getValue());
}
br.close();
}
}

A HashMap has no defined order. Maps which implement the SortedMap interface, such as TreeMap, do have an order defined by a Comparator, but they're ordered by keys, not values. And then there's LinkedHashMap, which maintains insertion order (or access order, if you want that instead). You could make use of LinkedHashMap if you make sure to insert the elements in the desired order. That, of course, means you still need to iterate your original map in the desired order somehow.
But I think, in your case, it does not matter so much that the map has the order you want. You only want order when you print the results. And you only do this once, so there's no need for some kind of caching of the order. So, I would probably just use an intermediate list or a stream.
Example using list:
List<Map.Entry<String, Long>> list = new ArrayList<>(counts.entrySet());
list.sort(Map.Entry.comparingByValue());
for (Map.Entry<String, Long> entry : list) {
System.out.println(entry.getKey() + " : " + entry.getValue());
}
Example using stream:
counts.entrySet()
.stream()
.sorted(Map.Entry.comparingByValue())
.forEachOrdered(entry -> System.out.println(entry.getKey() + " : " + entry.getValue()));
The Map.Entry.comparingByValue() call returns a Comparator which is used to sort the elements. Specifically, it will sort by the natural order of the values of the entries. A Long's natural order is from smallest to largest. If you want a different order, then pass whatever Comparator you need to get that order (which may involve writing your own implementation). Though if you simply want to sort from largest to smallest (i.e., reversed natural order), just use Map.Entry.comparingByValue(Comparator.reverseOrder()) instead.
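To make the "largest to smallest" variant concrete, here is a minimal sketch. The class name DescendingCounts and the method sortedByCountDesc are made up for illustration, and breaking ties alphabetically is my own assumption (the question does not say how equal counts should be ordered):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DescendingCounts {

    // Returns the entries ordered by count, largest first. Entries with the
    // same count are ordered alphabetically by word (an assumption made here
    // so that the output is deterministic).
    static List<Map.Entry<String, Long>> sortedByCountDesc(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> list = new ArrayList<>(counts.entrySet());
        list.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        return list;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("the", 3L);
        counts.put("cat", 1L);
        counts.put("sat", 1L);
        for (Map.Entry<String, Long> entry : sortedByCountDesc(counts)) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```

The explicit type witness (Map.Entry.<String, Long>comparingByValue) is there because the compiler generally cannot infer the type arguments once another call such as thenComparing is chained onto the result.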
I also recommend, at least in "real" code, that you use try-with-resources instead of manually closing the reader. For example:
File file = new File("src/txt.txt");
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
// use 'br' here (the remainder of your code, minus the 'br.close()' call)
}
When that try block exits, the resource referenced by br will be automatically closed for you, and in a safer way than simply calling br.close().
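Putting the pieces together, a sketch of the whole program might look like the following. The class name WordCounter and the method countWords are hypothetical, and a StringReader stands in for the FileReader so the example runs without a file; counts.merge is a shorter equivalent of the containsKey/put logic in the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class WordCounter {

    // Counts words from any Reader. The reader is wrapped in a BufferedReader
    // inside try-with-resources, so it is closed even if readLine() throws.
    static Map<String, Long> countWords(Reader source) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                // same delimiter pattern as in the question
                for (String word : line.split("[\\s.;,?:!()\"]+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1L, Long::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for new FileReader("src/txt.txt").
        Map<String, Long> counts = countWords(new StringReader("a b a\nc a b"));
        counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .forEachOrdered(e -> System.out.println(e.getKey() + " : " + e.getValue()));
    }
}
```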

Since a TreeMap only sorts on the key and not the values, it won't work, as you want to sort based on the count, which is a value. Since you have already created your map, the easiest way is to sort it and print as follows:
Stream the entrySet() and then sort using the Entry's comparingByValue comparator.
counts.entrySet().stream()
.sorted(Entry.comparingByValue())
.forEach(System.out::println);
If you want to retain the sorted order in a different map, you can do it like this and then print the new map. Using a LinkedHashMap preserves the order.
sort as before
but collect in a map
first argument is the entry's key
second is the entry's value.
third is a BinaryOperator (Not used here but syntactically required)
fourth is a supplier which makes the map a LinkedHashMap
Map<String, Long> sortedMap =
counts.entrySet().stream().sorted(Entry.comparingByValue())
.collect(Collectors.toMap(Entry::getKey,
Entry::getValue, (a, b) -> a,
LinkedHashMap::new));
// now print the map
sortedMap.entrySet().forEach(System.out::println);

Related

Problem sorting ConcurrentHashMap by values using java.util.Collections.sort() in Java

I have this code which prints me a list of words sorted by keys (alphabetically) from counts, my ConcurrentHashMap which stores words as keys and their frequencies as values.
// Method to create a stopword list with the most frequent words from the lemmas key in the json file
private static List<String> StopWordsFile(ConcurrentHashMap<String, String> lemmas) {
// counts stores each word and its frequency
ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
// corpus is an array list for all the individual words
ArrayList<String> corpus = new ArrayList<String>();
for (Entry<String, String> entry : lemmas.entrySet()) {
String line = entry.getValue().toLowerCase();
line = line.replaceAll("\\p{Punct}", " ");
line = line.replaceAll("\\d+"," ");
line = line.replaceAll("\\s+", " ");
line = line.trim();
String[] value = line.split(" ");
List<String> words = new ArrayList<String>(Arrays.asList(value));
corpus.addAll(words);
}
// count all the words in the corpus and store each word with its frequency in counts
for (String word : corpus) {
if (counts.keySet().contains(word)) {
counts.put(word, counts.get(word) + 1);
} else {counts.put(word, 1);}
}
// Create a list to store all the words with their frequency and sort it by values.
List<Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
List<String> stopwordslist = new ArrayList<>(counts.keySet()); // this works but counts.values() gives an error
Collections.sort(stopwordslist);
System.out.println("List after sorting: " +stopwordslist);
So the output is:
List after sorting: [a, abruptly, absent, abstractmap, accept,...]
How can I sort them by values as well? when I use
List stopwordslist = new ArrayList<>(counts.values());
I get an error,
- Cannot infer type arguments for ArrayList<>
I guess that is because ArrayList can store < String > but not <String,Integer> and it gets confused.
I have also tried to do it with a custom Comparator like so:
Comparator<Entry<String, Integer>> valueComparator = new Comparator<Entry<String,Integer>>() {
@Override
public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
String v1 = e1.getValue();
String v2 = e2.getValue();
return v1.compareTo(v2);
}
};
List<Entry<String, Integer>> stopwordslist = new ArrayList<Entry<String, Integer>>();
// sorting HashMap by values using comparator
Collections.sort(counts, valueComparator)
which gives me another error,
The method sort(List<T>, Comparator<? super T>) in the type Collections is not applicable for the arguments (ConcurrentHashMap<String,Integer>, Comparator<Map.Entry<String,Integer>>)
how can I sort my list by values?
my expected output is something like
[the, of, value, v, key, to, given, a, k, map, in, for, this, returns, if, is, super, null, specified, u, function, and, ...]
Let’s go through all the issues of your code
Name conventions. Method names should start with a lowercase letter.
Unnecessary use of ConcurrentHashMap. For a purely local use like within your method, an ordinary HashMap will do. For parameters, just use the Map interface, to allow the caller to use whatever Map implementation will fit.
Unnecessarily iterating over the entrySet(). When you’re only interested in the values, you don’t need to use entrySet() and call getValue() on every entry; you can iterate over values() in the first place. Likewise, you would use keySet() when you’re interested in the keys only. Only iterate over entrySet() when you need key and value (or want to perform updates).
Don’t replace pattern matches by spaces, to split by the spaces afterwards. Specify the (combined) pattern directly to split, i.e. line.split("[\\p{Punct}\\d\\s]+").
Don’t use List<String> words = new ArrayList<String>(Arrays.asList(value)); unless you specifically need the features of an ArrayList. Otherwise, just use List<String> words = Arrays.asList(value);
But when the only thing you’re doing with the list, is addAll to another collection, you can use Collections.addAll(corpus, value); without the List detour.
Don’t use counts.keySet().contains(word) as you can simply use counts.containsKey(word). But you can simplify the entire
if (counts.containsKey(word)) {
counts.put(word, counts.get(word) + 1);
} else {counts.put(word, 1);}
to
counts.merge(word, 1, Integer::sum);
The points above yield
ArrayList<String> corpus = new ArrayList<>();
for(String line: lemmas.values()) {
String[] value = line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+");
Collections.addAll(corpus, value);
}
for (String word : corpus) {
counts.merge(word, 1, Integer::sum);
}
But there is no point in performing two loops, the first only to store everything into a potentially large list, to iterate over it a single time. You can perform the second loop’s operation right in the first (resp. only) loop and get rid of the list.
for(String line: lemmas.values()) {
for(String word: line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
counts.merge(word, 1, Integer::sum);
}
}
You already acknowledged that you can’t sort a map, by copying the map into a list and sorting the list in your first variant. In the second variant, you created a List<Entry<String, Integer>> but then, you didn’t use it at all but rather tried to pass the map to sort. (By the way, since Java 8, you can invoke sort directly on a List, no need to call Collections.sort).
You have to keep copying the map data into a list and sorting the list. For example,
List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
list.sort(Map.Entry.comparingByValue());
Now, you have to decide whether you change the return type to List<Map.Entry<String, Integer>> or copy the keys of the sorted entries to a new list.
Taking all points together and staying with the original return type, the fixed code looks like
private static List<String> stopWordsFile(Map<String, String> lemmas) {
Map<String, Integer> counts = new HashMap<>();
for(String line: lemmas.values()) {
for(String word: line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
counts.merge(word, 1, Integer::sum);
}
}
List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
list.sort(Map.Entry.comparingByValue());
List<String> stopwordslist = new ArrayList<>();
for(Map.Entry<String, Integer> e: list) stopwordslist.add(e.getKey());
// System.out.println("List after sorting: " + stopwordslist);
return stopwordslist;
}
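Note that the expected output in the question starts with the most frequent words, whereas comparingByValue() on its own sorts from least to most frequent. A possible variant is sketched below, with assumptions stated: the class name StopWords, the method name mostFrequent and the limit parameter are hypothetical and not part of the original method. It also skips the empty string that split can return as its first element when a line starts with a delimiter.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StopWords {

    // Returns up to 'limit' words, most frequent first; words with equal
    // counts are ordered alphabetically (an assumption, for determinism).
    // 'limit' is a hypothetical parameter, not part of the original method.
    static List<String> mostFrequent(Map<String, String> lemmas, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lemmas.values()) {
            for (String word : line.toLowerCase().split("[\\p{Punct}\\d\\s]+")) {
                // split() yields a leading "" when the line starts with a delimiter
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> lemmas = new HashMap<>();
        lemmas.put("doc1", "The key of the map.");
        lemmas.put("doc2", "(the value) of 2 keys");
        System.out.println(mostFrequent(lemmas, 3)); // prints [the, of, key]
    }
}
```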

Print count of each word in a list in alphabetical order

I want to print the count of each word in an ArrayList in alphabetical order.
I have implemented the code but for an element "hihi" in the list, I want the count should be 2 instead of 1. How to achieve that?
{//some code
Collections.sort(list);
System.out.println("Words with the count");
Map<String, Long> st1=new TreeMap<>();
for(String k : list){
st1.put(k,st1.getOrDefault(k, 0L)+1);
}
for(String k : st1.keySet()){
System.out.println(k+": "+st1.get(k));
}
//end of class}
Order is not guaranteed in a HashSet by design.
From the docs:
It makes no guarantees as to the iteration order of the set;
If you're wanting to use a Set you need to use a set that guarantees order. LinkedHashSet guarantees insertion order.
Look at using TreeSet over HashSet. TreeSet will order the elements using their natural ordering. You can pass in a Comparator to the constructor of the TreeSet if you would like different sort order.
You should note that your current code is a little inefficient in that it has to iterate through the entire list for every single word to get the frequency count. You would be better off processing the list up front and counting the frequency and storing this in a map. For example:
Map<String, Long> frequency = new TreeMap<>();
for (String word : list) {
frequency.put(word, frequency.getOrDefault(word, 0L) + 1);
}
for (String word : frequency.keySet()) {
System.out.println(word + ": " + frequency.get(word));
}
You can also do this using streams as follows.
list.stream()
.collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()))
.forEach((k, v) -> System.out.println(k + ": " + v));

How to read strings off of .txt file and sort them into an ArrayList based on the number of occurrences?

I have a program that reads a .txt file, creates a HashMap containing each unique string and its number of occurrences, and I would like to create an ArrayList that displays these unique strings in descending order in terms of their number of appearances.
Currently, my program sorts in descending order from an alphabetical standpoint (using ASCII values I assume).
How can I sort this in descending order in terms of their number of appearances?
Here's the relevant part of the code:
Scanner in = new Scanner(new File("C:/Users/ahz9187/Desktop/counter.txt"));
while(in.hasNext()){
String string = in.next();
//makes sure unique strings are not repeated - adds a new unit if new, updates the count if repeated
if(map.containsKey(string)){
Integer count = (Integer)map.get(string);
map.put(string, new Integer(count.intValue()+1));
} else{
map.put(string, new Integer(1));
}
}
System.out.println(map);
//places units of map into an arrayList which is then sorted
//Using ArrayList because length does not need to be designated - can take in the units of HashMap 'map' regardless of length
ArrayList arraylist = new ArrayList(map.keySet());
Collections.sort(arraylist); //this method sorts in ascending order
//Outputs the list in reverse alphabetical (or descending) order, case sensitive
for(int i = arraylist.size()-1; i >= 0; i--){
String key = (String)arraylist.get(i);
Integer count = (Integer)map.get(key);
System.out.println(key + " --> " + count);
}
In Java 8:
public static void main(final String[] args) throws IOException {
final Path path = Paths.get("C:", "Users", "ahz9187", "Desktop", "counter.txt");
try (final Stream<String> lines = Files.lines(path)) {
final Map<String, Integer> count = lines.
collect(HashMap::new, (m, v) -> m.merge(v, 1, Integer::sum), Map::putAll);
final List<String> ordered = count.entrySet().stream().
sorted((l, r) -> Integer.compare(l.getValue(), r.getValue())).
map(Entry::getKey).
collect(Collectors.toList());
ordered.forEach(System.out::println);
}
}
First read the file using the Files.lines method, which gives you a Stream<String> of the lines.
Now collect the lines into a Map<String, Integer> using the Map.merge method which takes a key and a value and also a lambda that is applied to the old value and the new value if the key is already present.
You now have your counts.
Now take a Stream of the entrySet of the Map and sort that by the value of each Entry and then take the key. Collect that to a List. You now have a List of your strings sorted by count, smallest first; to get descending order, swap l and r in the comparison.
Now simply use forEach to print them.
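One caveat: the code above counts whole lines, while the Scanner loop in the question counts whitespace-separated tokens. If tokens are what you want, a sketch along the following lines may be closer. The class name TokenFrequency and the method byDescendingCount are made up for illustration, Stream.of stands in for Files.lines(path), and the alphabetical tie-break is an assumption:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TokenFrequency {

    // Splits each line on whitespace, counts the tokens, and returns them
    // ordered by descending count; ties are broken alphabetically.
    static List<String> byDescendingCount(Stream<String> lines) {
        Map<String, Long> counts = lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(token -> !token.isEmpty()) // blank lines produce one ""
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stream.of(...) stands in for Files.lines(path).
        System.out.println(byDescendingCount(Stream.of("b a b", "c b a"))); // prints [b, a, c]
    }
}
```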
If still using Java 7 you can use the Map to provide the sort order:
final Map<String, Integer> counts = /*from somewhere*/
final List<String> sorted = new ArrayList<>(counts.keySet());
Collections.sort(sorted, new Comparator<String>() {
@Override
public int compare(final String o1, final String o2) {
return counts.get(o1).compareTo(counts.get(o2));
}
});
You haven't shown the declaration of your map, but for the purpose of this answer I'm assuming that your map is declared like this:
Map<String,Integer> map = new HashMap<String,Integer>();
You need to use a Comparator in the call to sort, but it needs to compare by the count, while remembering the string. So you need to put objects in the list that have both the string and the count.
One type that provides this capability, and that is easily available from the Map.entrySet method, is the type Map.Entry.
The last part rewritten with Map.Entry and a Comparator:
ArrayList<Map.Entry<String,Integer>> arraylist = new ArrayList<Map.Entry<String,Integer>>(map.entrySet());
Collections.sort(arraylist, new Comparator<Map.Entry<String,Integer>>() {
@Override
public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
// Compares by count in descending order; compareTo avoids the overflow risk of subtraction
return e2.getValue().compareTo(e1.getValue());
}
});
// Outputs the list in descending order of count
for (Map.Entry<String,Integer> entry : arraylist) {
System.out.println(entry.getKey() + " --> " + entry.getValue());
}

Why is the remove function not working for hashmaps?

I am working on a simple project that obtains data from an input file, gets what it needs and prints it to a file. I am basically getting word frequency so each key is a string and the value is its frequency in the document. The problem however, is that I need to print out these values to a file in descending order of frequency. After making my hashmap, this is the part of my program that sorts it and writes it to a file.
//Hashmap I create
Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
int valueMax = -1;
//function to sort hashmap
while (map.isEmpty() == false){
for (Entry<String, Integer> entry: map.entrySet()){
if (entry.getValue() > valueMax){
max = entry.getKey();
System.out.println("max: " + max);
valueMax = entry.getValue();
System.out.println("value: " + valueMax);
}
}
map.remove(max);
out.write(max + "\t" + valueMax + "\n");
System.out.println(max + "\t" + valueMax);
}
When I run this i get:
t 9
t 9
t 9
t 9
t 9
....
so it appears the remove function is not working as it keeps getting the same value. I'm thinking i have an issue with a scope rule or I just don't understand hashmaps very well.
If anyone knows of a better way to sort a hashmap and print it, I would welcome a suggestion.
thanks
Your code doesn't work because on every subsequent iteration, entry.getValue() > valueMax is never true because you don't reset valueMax on re-entry into the while loop.
You don't need to muck around with double-looping over a concurrently accessible map though.
ConcurrentSkipListMap has a lastKey method that returns the greatest key and doesn't require iteration over the entire map.
From your code it looks like you aren't resetting valueMax at the end of your loop. This means the first time round the loop you'll find the maximum but you'll never find any subsequent values because you'll still be comparing to the overall maximum.
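For illustration, the reset described above could look like the sketch below. The class name RepeatedMax and the method drainByDescendingCount are hypothetical, a plain HashMap replaces the ConcurrentHashMap, and the lines are collected into a list instead of being written to the question's out writer. It assumes all counts are non-negative, as word frequencies are:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepeatedMax {

    // Repeatedly removes the entry with the largest count, resetting the
    // running maximum before each scan. Empties the map that is passed in.
    static List<String> drainByDescendingCount(Map<String, Integer> map) {
        List<String> lines = new ArrayList<>();
        while (!map.isEmpty()) {
            String max = null;
            int valueMax = -1; // reset on every pass, unlike the original loop
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                if (entry.getValue() > valueMax) {
                    max = entry.getKey();
                    valueMax = entry.getValue();
                }
            }
            map.remove(max);
            lines.add(max + "\t" + valueMax);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("t", 9);
        map.put("a", 4);
        map.put("x", 7);
        drainByDescendingCount(map).forEach(System.out::println);
    }
}
```

This scans the whole map once per removed entry, so it is quadratic in the number of words; copying the entries into a list and sorting that list once is usually the better choice for larger inputs.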
Hashmap: no order. You can use ArrayList, which implements List to have an order.
Take a look : http://docs.oracle.com/javase/7/docs/api/java/util/ArrayList.html
I guess it's because there are no entries left in the map with a value greater than valueMax, so the condition if (entry.getValue() > valueMax) is never true.
Additionally, there's TreeMap, which holds its contents sorted, so you can just iterate over its entrySet() w/o any additional logic.
What about something like (not tested)
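final Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
final Comparator<String> comparator = new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        // Compare by count first; fall back to the word itself so that
        // two words with the same count are not treated as the same key.
        int byCount = map.get(o1).compareTo(map.get(o2));
        return byCount != 0 ? byCount : o1.compareTo(o2);
    }
};
final TreeMap<String, Integer> sortedMap = new TreeMap<String, Integer>(comparator);
sortedMap.putAll(map);
System.out.println(sortedMap);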

Sorted map not outputting sorted. Do I understand maps.

I have been reading up on maps and understand some of the differences in tree maps and hash, sorted maps. I was trying to get a map to be sorted when outputting it.
What I needed to be able to do was:
Take a text file and read in the content.
Break it into separate words. Use the words as the key and the value as how many times the key occurs in the txt file.
If the word is at the end of a sentence I am to make it a separate key. E.g., my and my. are two separate keys.
My problem is that no matter if I declare it as a tree, hash or sorted map, I can't get it to output/iterate through in an ordered way. I wanted it to output with the highest occurring value first, but I can't even get it to output with the key in any order.
public static Map<String, Integer> createDictionary(String _filename)
{
TreeMap<String, Integer> dictionary = new TreeMap<String, Integer>(); // Changed Hash to _______
try {
FileReader myFileReader=new FileReader(_filename); // File reader stream open
BufferedReader myBuffReader=new BufferedReader(myFileReader);
String str = "\0";
while (str != null) { // While there are still strings in the file
str = myBuffReader.readLine(); // We read a line into the str variable
if (str != null) { // Make sure its not the last line/EOF
// System.out.println(str); // Used for testing.
StringTokenizer myTokenStr=new StringTokenizer(str," \t"); // Create a StringToken obj from the string
while (myTokenStr.hasMoreTokens()) {
String tokStr = myTokenStr.nextToken(); // Each token is put into an individual string
// System.out.println(tokStr);
if (dictionary.containsKey(tokStr)) {
int value = dictionary.get(tokStr); // Add one to the integer value
// dictionary.remove(tokStr); // Was doing this way but just using put method works
// dictionary.put(tokStr, value + 1);
dictionary.put(tokStr, value + 1);
}
else {
dictionary.put(tokStr, 1); // Add the string as the key with an int value of one for the value
}
}
}
}
myBuffReader.close(); // Close stream
myFileReader.close(); // Close stream
}
catch (FileNotFoundException e) {
System.out.println("File Not Found");
}
catch (IOException e) { }
// System.out.println(dictionary.entrySet());
return dictionary;
}
Your map is sorted alphabetically, not by number of occurrences. You need to postprocess the map after the initial parsing. I would suggest:
Parse file into HashMap<String, Integer>
Iterate through HashMap, and add elements to a TreeMap<Integer, Set<String> > (see below).
Output the TreeMap.
You can achieve step 2. by something like:
TreeMap<Integer, Set<String> > treeMap = new TreeMap<Integer, Set<String> > ();
for (Map.Entry<String, Integer> entry: hashMap.entrySet()) {
    Set<String> set = treeMap.get(entry.getValue());
    if (set == null) {
        set = new TreeSet<String>();
        treeMap.put(entry.getValue(), set);
    }
    set.add(entry.getKey());
}
Using TreeSet here sorts the words with same number of occurrences alphabetically, you could use any other Set or List though.
For descending order in step 3.:
for (Map.Entry<Integer, Set<String> > entry: treeMap.descendingMap().entrySet())
for (String word: entry.getValue())
System.out.println(String.format("%d: %s", entry.getKey(), word));
That should do it.
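On Java 8 or later, steps 2 and 3 can be written a little more compactly. The following is only a sketch: the class name CountBuckets and the method bucketByCount are hypothetical, and instead of calling descendingMap() it builds the TreeMap with a reversed comparator so that plain iteration already starts at the highest count.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CountBuckets {

    // Inverts word -> count into count -> words, with the largest count first.
    // Words sharing a count are kept in a TreeSet, i.e. in alphabetical order.
    static TreeMap<Integer, Set<String>> bucketByCount(Map<String, Integer> counts) {
        TreeMap<Integer, Set<String>> buckets = new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            buckets.computeIfAbsent(entry.getValue(), k -> new TreeSet<>()).add(entry.getKey());
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("my", 3);
        counts.put("my.", 1);
        counts.put("the", 3);
        counts.put("a", 1);
        bucketByCount(counts).forEach((count, words) -> {
            for (String word : words) {
                System.out.println(count + ": " + word);
            }
        });
    }
}
```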
This is the documentation for TreeMap, lifted from its Javadoc:
public class TreeMap<K,V> extends AbstractMap<K,V>
implements NavigableMap<K,V>, Cloneable, Serializable
A Red-Black tree based NavigableMap implementation. The map is sorted according
to the natural ordering of its keys, or by a Comparator provided at map creation
time, depending on which constructor is used.
In your case, the keys would be strings, and you should expect that iteration will reveal the map to be sorted according to their 'natural order'. Here's an example of the output generated by a TreeMap consisting of String keys and Integer values:
Map<String, Integer> map = new TreeMap<String, Integer>();
map.put("Hello", Integer.valueOf(8));
map.put("Abraham", Integer.valueOf(81));
map.put("Smell", Integer.valueOf(-1));
map.put("Carpet", Integer.valueOf(4));
map.put("Sex", Integer.valueOf(23));
for(String key: map.keySet()) {
System.out.printf("Map entry %s: %d\n", key, map.get(key));
}
Output:
Map entry Abraham: 81
Map entry Carpet: 4
Map entry Hello: 8
Map entry Sex: 23
Map entry Smell: -1
As you can see, iterating over the map's keys produces an ordered result. This order is defined by the natural order of String. Unfortunately, you cannot implement a SortedMap that sorts on values, which is what I believe you want to do. You can, however, sort the entries in the Map outside of it. See more details in this other SO post: TreeMap sort by value.
Map is a kind of messy abstraction for this sort of thing, but I'm going to throw out Guava's Multiset as a way to address this use case, as it's expressly designed for "counting occurrences of things."
In particular,
return Multisets.copyHighestCountFirst(HashMultiset.copyOf(listOfWords));
returns a Multiset that iterates over elements in order of descending frequency in listOfWords.
There are many questions on SO, by the way, relating to ordering maps by values instead of keys, but I prefer this solution.
