Can you please give me advice on how to print word occurrences from the most frequent to the least frequent?
I've tried several approaches and settled on a Map, which gets me closest to the result I want.
public class InputOutput {
private String wordsFrequency() {
StringBuilder result = new StringBuilder();
try {
Map<String, Integer> map = new HashMap<>();
BufferedReader reader = new BufferedReader(new FileReader("words.txt"));
String words;
while ((words = reader.readLine()) != null) {
Scanner scan = new Scanner(words);
while (scan.hasNext()) {
String word = scan.next();
if (map.containsKey(word))
map.put(word, map.get(word) + 1);
else
map.put(word, 1);
}
scan.close();
}
reader.close();
Set<Entry<String, Integer>> entrySet = map.entrySet();
for (Entry<String, Integer> entry : entrySet) {
result.append(entry.getKey()).append("\t").append(entry.getValue()).append("\n");
}
} catch (IOException e) {
e.printStackTrace();
}
return result.toString();
}
public static void main(String[] args) {
InputOutput requestedData = new InputOutput();
System.out.println(requestedData.wordsFrequency());
}
}
File contents:
the day is sunny the the
the sunny is is is is is is
Expected output:
is 7
the 4
sunny 2
day 1
The output I'm getting:
the 4
is 7
sunny 2
day 1
List<Map.Entry<String, Integer>> frequencies = new ArrayList<>(map.entrySet());
frequencies.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
A List can be sorted in place with a Comparator; alternatively, a TreeSet (a SortedSet) keeps its elements ordered by a Comparator. Here the comparator orders the entries by their Comparable value, in reverse.
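A minimal, self-contained sketch of the list-sorting approach, using the sample text from the question. The class and method names (SortByFrequency, sortedByCount) are made up for illustration, and breaking ties alphabetically is an assumption, since the question does not say how equal counts should be ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByFrequency {

    // Returns the entries of the map ordered from highest to lowest count.
    // Ties are broken alphabetically by word (an assumption, see above).
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        String text = "the day is sunny the the the sunny is is is is is is";
        for (String word : text.split(" ")) {
            counts.merge(word, 1, Integer::sum); // start at 1, otherwise add 1
        }
        for (Map.Entry<String, Integer> e : sortedByCount(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the sample text this prints is 7, the 4, sunny 2, day 1, one pair per line.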
I'm sure there's a cleaner way to do it, but without using streams here's what I came up with:
String src = "the day is sunny the the the sunny is is is is is is";
try (Scanner scanner = new Scanner(new StringReader(src))) {
Map<String, Integer> map = new HashMap<>();
while (scanner.hasNext()) {
String word = scanner.next();
map.merge(word, 1, Integer::sum); // start at 1, otherwise add 1 to the current count
}
Map<Integer, Collection<String>> cntMap = new TreeMap<>(Comparator.reverseOrder());
for (Entry<String, Integer> entry : map.entrySet()) {
Collection<String> list = cntMap.get(entry.getValue());
if (list == null) {
list = new TreeSet<>();
cntMap.put(entry.getValue(), list);
}
list.add(entry.getKey());
}
for (Entry<Integer, Collection<String>> entry : cntMap.entrySet()) {
System.out.println(entry.getValue() + " : " + entry.getKey());
}
}
You already have your data; here is how to get it in reverse sorted order.
Declare a SortedSet with a comparator that compares the values of the entries,
then add the entries to the SortedSet, and they will be sorted as they are added.
Entry.comparingByValue(Comparator.reverseOrder()) sorts on the count in descending order. Because a TreeSet treats elements that compare as equal as duplicates, the word is used as a tie-breaker so that entries with the same count are not dropped.
SortedSet<Entry<String,Integer>> set
= new TreeSet<>(Entry.<String,Integer>comparingByValue(Comparator.reverseOrder())
.thenComparing(Entry.comparingByKey()));
set.addAll(map.entrySet());
Then print them.
set.forEach(e-> System.out.printf("%-7s : %d%n", e.getKey(), e.getValue()));
For your data, this would print
is : 7
the : 4
sunny : 2
day : 1
The issues with the code you've provided:
If an exception occurs, the reader will not be closed. Moreover, even if all the data is read successfully but an exception occurs while closing the reader, you'll lose the data, because the lines of code responsible for processing the map will not be executed. Use try-with-resources to ensure that your resources are properly closed.
Don't cram too much logic into one method. There are at least two responsibilities, and they should reside in separate methods, as the Single responsibility principle suggests.
Instead of utilizing Scanner you can split the line that has been read from a file.
And your current logic lacks sorting. That's why your current and expected output don't match.
You can generate a map Map<String, Integer> representing the frequency of each word.
Then create a list of entries of this map, sort it based on values in descending order.
And finally turn the sorted list of entries into a list of strings which you can print.
private static Map<String, Integer> wordsFrequency(String file) {
Map<String, Integer> frequencies = new HashMap<>();
try (var reader = Files.newBufferedReader(Path.of(file))) {
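String line;
while ((line = reader.readLine()) != null) { // read every line, not only the first one
for (String word : line.trim().split("\\s+")) {
if (word.isEmpty()) continue; // skip blank lines
// frequencies.merge(word, 1, Integer::sum); // an equivalent of the 2 lines below
int count = frequencies.getOrDefault(word, 0);
frequencies.put(word, count + 1);
}
}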
} catch (IOException e) {
e.printStackTrace();
}
return frequencies;
}
public static List<String> mapToSortedList(Map<String, Integer> map) {
List<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
// sorting the list of entries
entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
List<String> result = new ArrayList<>();
for (Map.Entry<String, Integer> entry :entries) {
result.add(entry.getKey() + " " + entry.getValue());
}
return result;
}
public static void main(String[] args) {
mapToSortedList(wordsFrequency("filePath.txt")).forEach(System.out::println);
}
Related
How can I count the same Strings from an array and write them out in the console?
The order of the items should correspond to the order of the first appearance of each item. If there are two or more items of a kind, add an "s" to the item name.
String[] array = {"Apple","Banana","Apple","Peanut","Banana","Orange","Apple","Peanut"};
Output:
3 Apples
2 Bananas
2 Peanuts
1 Orange
I tried this:
String[] input = new String[1000];
Scanner sIn = new Scanner(System.in);
int counter =0;
String inputString = "start";
while(inputString.equals("stop")==false){
inputString = sIn.nextLine();
input[counter]=inputString;
counter++;
}
List<String> asList = Arrays.asList(input);
Map<String, Integer> map = new HashMap<String, Integer>();
for (String s : input) {
map.put(s, Collections.frequency(asList, s));
}
System.out.println(map);
But I don't know how to get the elements out of the Map and sort them like I would like.
You can use a Map to put your result, here is a simple example:
public static void main(String args[]){
String[] array = {"Apple","Banana","Apple","Peanut","Banana","Orange","Apple","Peanut"};
Map<String, Integer> result = new HashMap<>();
for(String s : array){
if(result.containsKey(s)){
//if the map contain this key then just increment your count
result.put(s, result.get(s)+1);
}else{
//else just create a new node with 1
result.put(s, 1);
}
}
System.out.println(result);
}
Use Java streams groupingBy and collect the results into a Map<String, Long> as shown below:
String[] array = {"Apple","Banana","Apple","Peanut","Banana","Orange","Apple", "Peanut"};
Map<String, Long> map = Stream.of(array)
.collect(Collectors.groupingBy(Function.identity(), // group by array element
Collectors.counting())); // count the number of occurrences
System.out.println(map); // print the resulting Map
Java 8 would allow a pretty elegant way of doing this with groupingBy and counting. Using a LinkedHashMap instead of the default map should handle the ordering:
Arrays.stream(array)
.collect(Collectors.groupingBy(Function.identity(),
LinkedHashMap::new,
Collectors.counting()))
.entrySet()
.forEach(e -> System.out.println(e.getValue() +
"\t" +
e.getKey() +
(e.getValue() > 1 ? "s" : "")));
Use Java 8:
Map<String, Long> myMap = Stream.of(array).collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
I have finished my code to find the top 20 words after much searching, but the result is not in descending order.
I need to add code that sorts the list by frequency in descending order. If two words have the same count:
{ 'cat' => 43, 'c' => 43 }
the output should be
c
cat
My code is :
public static void main(String[] args) throws IOException{
String delimiters = ".;_?>*/";
String[] result = new String[20];
List<String> listArray = new ArrayList<String>();
Map<String, Integer> map = new HashMap<String, Integer>();
FileReader fileR = new FileReader("D:/test.txt");
BufferedReader bufferedR = new BufferedReader(fileR);
String line;
while ((line = bufferedR.readLine()) != null) {
StringTokenizer sToken = new StringTokenizer(line, delimiters);
while (sToken.hasMoreTokens()) {
String token = sToken.nextToken().trim().toLowerCase();
if (map.containsKey(token)) {
int val = map.get(token);
val++;
map.put(token, val);
} else{
map.put(token, 1);
}
}
}
bufferedR.close();
for(int i=0;i<result.length;i++){
int mValue=0;
String wKey="";
for(Map.Entry<String,Integer> entry:map.entrySet()){
if(entry.getValue()>mValu){
mValue=entry.getValue();
wKey=entry.getKey();
}
}
map.remove(wKey);
result[i]=wKey;
}
for (int i = 0 ; i<result.length;i++){
System.out.println(result[i]);
}
}
}
While researching this subject I found this code, but I don't know how to fit it into my code:
List<Map.Entry<String, Integer>> entries = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
return Integer.compare(b.getValue(), a.getValue());
}
});
Or is there a better way to get the frequencies in descending order?
Thanks for help.
You can do:
import static java.util.stream.Collectors.toList;
import java.util.List;
import java.util.Map;
Map<String, Integer> map = // ...
List<String> ss = map.entrySet().stream()
.sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
.thenComparing(Map.Entry.comparingByKey()))
.map(e -> e.getKey())
.collect(toList());
What you need to do is sort the map's entries. From what I understand, you want the top 20 tokens, and if two tokens have the same count, they should appear in lexicographical order.
My solution would be to sort the entries first by key (the token in your case) and then sort them by value.
This way the alphabetical order of the tokens stays intact among equal counts, and the entries end up in the order you want for the output.
Caution
Make sure that the sorting algorithm you use is stable, like merge sort (Collections.sort and List.sort are stable); otherwise the above solution won't work. A typical Quicksort, for example, is not stable.
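The two-pass idea could be sketched roughly like this. It relies on the second sort being stable, which List.sort guarantees. The class name TwoPassSort, the helper topWords and the sample counts are hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TwoPassSort {

    // Hypothetical helper: returns up to `limit` words, most frequent first,
    // with words of equal count in alphabetical order.
    static List<String> topWords(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Pass 1: alphabetical order by word.
        entries.sort(Map.Entry.comparingByKey());
        // Pass 2: descending count. List.sort is stable, so words with equal
        // counts keep the alphabetical order established in pass 1.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        int end = Math.min(limit, entries.size());
        for (Map.Entry<String, Integer> e : entries.subList(0, end)) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data (made up): "c" and "cat" share the same count.
        Map<String, Integer> counts = Map.of("cat", 43, "c", 43, "dog", 50, "ant", 2);
        System.out.println(topWords(counts, 3)); // [dog, c, cat]
    }
}
```

A single comparator that compares by count and then by key gives the same order in one pass; the two-pass form is shown only because it matches the description above.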
This is what I have tried, but I get the feeling that it is not right or not the best-performing approach. Is there a better way to search for and fetch the duplicate values from a Map, or for that matter from any collection? And is there a better way to traverse a collection?
public class SearchDuplicates{
public static void main(String[] args) {
Map<Integer, String> directory=new HashMap<Integer, String>();
Map<Integer, String> repeatedEntries=new HashMap<Integer, String>();
// adding data
directory.put(1,"john");
directory.put(2,"michael");
directory.put(3,"mike");
directory.put(4,"anna");
directory.put(5,"julie");
directory.put(6,"simon");
directory.put(7,"tim");
directory.put(8,"ashley");
directory.put(9,"john");
directory.put(10,"michael");
directory.put(11,"mike");
directory.put(12,"anna");
directory.put(13,"julie");
directory.put(14,"simon");
directory.put(15,"tim");
directory.put(16,"ashley");
for(int i=1;i<=directory.size();i++) {
String result=directory.get(i);
for(int j=1;j<=directory.size();j++) {
if(j<i && result.equals(directory.get(j))) { // compare Strings with equals, not ==
repeatedEntries.put(j, result);
}
}
System.out.println(result);
}
for(Entry<Integer, String> entry : repeatedEntries.entrySet()) {
System.out.println("repeated "+entry.getValue());
}
}
}
Any help would be appreciated. Thanks in advance
You can use a Set to determine whether entries are duplicate. Also, repeatedEntries might as well be a Set, since the keys are meaningless:
Map<Integer, String> directory=new HashMap<Integer, String>();
Set<String> repeatedEntries=new HashSet<String>();
Set<String> seen = new HashSet<String>();
// ... initialize directory, then:
for(int j=1;j<=directory.size();j++){
String val = directory.get(j);
if (!seen.add(val)) {
// if add failed, then val was already seen
repeatedEntries.add(val);
}
}
At the cost of extra memory, this does the job in linear time (instead of quadratic time of your current algorithm).
EDIT: Here's a version of the loop that doesn't rely on the keys being consecutive integers starting at 1:
for (String val : directory.values()) {
if (!seen.add(val)) {
// if add failed, then val was already seen
repeatedEntries.add(val);
}
}
That will detect duplicate values for any Map, regardless of the keys.
You can use this to find the word count:
Map<String, Integer> repeatedEntries = new HashMap<String, Integer>();
for (String w : directory.values()) {
Integer n = repeatedEntries.get(w);
n = (n == null) ? 1 : ++n;
repeatedEntries.put(w, n);
}
and this to print the stats
for (Entry<String, Integer> e : repeatedEntries.entrySet()) {
System.out.println(e);
}
List and Vector have a method contains(Object o), which returns a boolean indicating whether the object exists in the collection.
You can use Collections.frequency to find all possible duplicates in any collection:
Collections.frequency(list, "a")
A more generic way, which prints the frequency of every distinct element:
Set<String> uniqueSet = new HashSet<String>(list);
for (String temp : uniqueSet) {
System.out.println(temp + ": " + Collections.frequency(list, temp));
}
I have a file that contains strings in the form of key/value pairs, such as a person and a count, for example:
"Reggy, 15"
"Jenny, 20"
"Reggy, 4"
"Jenny, 5"
and in the output I should have the count values summed per key, so for our example the output would be
"Reggy, 19"
"Jenny, 25"
Here is my approach:
Read each line and, for each line, get the key and count using a Scanner with , as the delimiter.
Now check whether the key has been seen before: if so, add the current value to the previous value; if not, store the current value as the value in the HashMap.
Sample Implementation:
public static void main(final String[] argv) {
final File file = new File("C:\\Users\\rachel\\Desktop\\keyCount.txt");
try {
final Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
if (scanner.hasNext(".*,")) {
String key;
final String value;
key = scanner.next(".*,").trim();
if (!(scanner.hasNext())) {
// pick a better exception to throw
throw new Error("Missing value for key: " + key);
}
key = key.substring(0, key.length() - 1);
value = scanner.next();
System.out.println("key = " + key + " value = " + value);
}
}
} catch (final FileNotFoundException ex) {
ex.printStackTrace();
}
}
The part I am not clear about is how to split the key/value pair while reading it in, and how to build the HashMap from it.
Also, is the approach I am suggesting an optimal one, or is there a way to improve the performance further?
Since this is almost certainly a learning exercise, I'll stay away from writing code, letting you have all the fun.
Create a HashMap<String,Integer>. Every time you see a key/value pair, check whether the hash map has a value for the key (use containsKey(key)). If it does, get the old value using get(key), add the new value, and store the result back using put(key, newValue). If the key is not there yet, add a new entry, again using put. Don't forget to make an int out of the String value (use Integer.valueOf(value) for that).
As far as optimizing goes, any optimization at this point would be premature: it does not even work! However, it's hard to get much faster than a single loop that you have, which is also rather straightforward.
Try this:
Map<String, Long> map = new HashMap<String, Long>();
while (scanner.hasNextLine()) {
if (scanner.hasNext(".*,")) {
....
if(map.containsKey(key))
map.put(key, map.get(key) + Long.valueOf(value));
else
map.put(key, Long.valueOf(value));
}
}
Simplest way I can think about splitting the values:
BufferedReader reader = new BufferedReader(new FileReader(file));
Map<String, Integer> mapping = new HashMap<String,Integer>();
String currentLine;
while ((currentLine = reader.readLine()) != null) {
String[] pair = currentLine.split(",");
if(pair.length != 2){ //could be less strict
throw new DataFormatException();
}
String key = pair[0].trim();
int value = Integer.parseInt(pair[1].trim()); // trim the space after the comma
if(mapping.containsKey(key)){
value += mapping.get(key);
}
mapping.put(key,value);
}
It is most likely not the most efficient way in terms of performance, but it is pretty straightforward. Scanner is usually used for parsing, but the parsing here isn't complex; it is just a split of strings.
For reading in, personally, I'd use:
Scanner.nextLine(), String.split(","), and Integer.valueOf(value)
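Putting those three calls together might look like the sketch below. The names KeyCountSum and sumCounts are made up, the input is passed as an in-memory string instead of the file from the question, and silently skipping lines that don't split into exactly two parts is an assumption about how bad input should be handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class KeyCountSum {

    // Sums the counts per key from lines of the form "name, number".
    // Lines that don't split into exactly two parts are skipped (assumption).
    static Map<String, Integer> sumCounts(Scanner scanner) {
        Map<String, Integer> totals = new LinkedHashMap<>(); // keeps first-seen order
        while (scanner.hasNextLine()) {
            String[] pair = scanner.nextLine().split(",");
            if (pair.length != 2) {
                continue;
            }
            String key = pair[0].trim();
            int value = Integer.valueOf(pair[1].trim());
            totals.merge(key, value, Integer::sum); // add to the running total
        }
        return totals;
    }

    public static void main(String[] args) {
        String data = "Reggy, 15\nJenny, 20\nReggy, 4\nJenny, 5\n";
        try (Scanner scanner = new Scanner(data)) {
            sumCounts(scanner).forEach((k, v) -> System.out.println(k + ", " + v));
        }
    }
}
```

For the sample data this prints Reggy, 19 and Jenny, 25. A non-numeric count would still throw a NumberFormatException; catching it is left out to keep the sketch short.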
Kind of late, but here is a clean solution with O(n) time complexity. It avoids sorting the arrays:
public class Solution {
public static void main(String[] args) {
// Anagram
String str1 = "School master";
String str2 = "The classroom";
char strChar1[] = str1.replaceAll("[\\s]", "").toLowerCase().toCharArray();
char strChar2[] = str2.replaceAll("[\\s]", "").toLowerCase().toCharArray();
HashMap<Character, Integer> map = new HashMap<Character, Integer>();
for (char c : strChar1) {
if(map.containsKey(c)){
int value=map.get(c)+1;
map.put(c, value);
}else{
map.put(c, 1);
}
}
for (char c : strChar2) {
if(map.containsKey(c)){
int value=map.get(c)-1;
map.put(c, value);
}else{
map.put(c, 1);
}
}
boolean anagram = true;
for (char c : map.keySet()) {
if (map.get(c) != 0) {
anagram = false; // some character count does not balance out
break;
}
}
System.out.println(anagram ? "Is anagram" : "Not anagram");
}
}
public Map<String, Integer> mergeMaps(@NonNull final Map<String, Integer> mapOne,
@NonNull final Map<String, Integer> mapTwo) {
return Stream.of(mapOne.entrySet(), mapTwo.entrySet())
.flatMap(Collection::stream)
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
}
I want to retrieve k,v-pairs from a HashMap.
The entries look like this:
a = 3,4
b = 5,6
and so on. I need combinations of these values.
a=3, b=5
a=3, b=6
a=4, b=5
a=4, b=6
I don't know how many keys there are or how many entries each value has. With entrySet I can get the values, but not the combinations. It looks like a job for recursion, but how?
Here's my code:
HashMap<String, String[]> map = new HashMap<String, String[]>();
BufferedReader file = new BufferedReader(new FileReader("test.txt"));
String str;
while ((str = file.readLine()) != null) {
// ... logic
map.put(key, value);
}
System.out.println("number of keys: " + map.size());
for (Map.Entry<String, String[]> entry : map.entrySet()) {
for (String value : entry.getValue()) {
System.out.println(entry.getKey() + ": " + value);
}
}
file.close();
You can try the following code:
public void mapPermute(Map<String, String[]> map, String currentPermutation) {
String key = map.keySet().iterator().next(); // get the topmost key
// base case
if (map.size() == 1) {
for (String value : map.get(key)) {
System.out.println(currentPermutation + key + "=" + value);
}
} else {
// recursive case
Map<String, String[]> subMap = new HashMap<String, String[]>(map);
for (String value : subMap.remove(key)) {
mapPermute(subMap, currentPermutation + key + "=" + value + ", ");
}
}
}
No guarantees on memory efficiency or speed. If you want to preserve the order of the keys in the map, you will have to pass in a TreeMap and change the code to use a TreeMap under the recursive case.
As the base case suggests, I'm assuming you have at least one entry in your map.
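For the ordered case, here is one possible variant (not the code above, just a sketch along the same lines): it copies the map into a TreeMap once, recurses over an index into the sorted key list instead of copying the map at every level, and collects the combinations into a list. The names OrderedCombinations, combine and combinations are made up, and an empty map simply yields an empty list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderedCombinations {

    // Appends every combination to `out`, e.g. "a=3, b=5".
    // `keys` fixes the order in which the keys are visited.
    static void combine(Map<String, String[]> map, List<String> keys, int index,
                        String prefix, List<String> out) {
        if (index == keys.size()) {
            out.add(prefix); // one value chosen for every key
            return;
        }
        String key = keys.get(index);
        for (String value : map.get(key)) {
            String pair = key + "=" + value;
            String next = prefix.isEmpty() ? pair : prefix + ", " + pair;
            combine(map, keys, index + 1, next, out);
        }
    }

    static List<String> combinations(Map<String, String[]> map) {
        List<String> out = new ArrayList<>();
        if (map.isEmpty()) {
            return out; // nothing to combine
        }
        Map<String, String[]> sorted = new TreeMap<>(map); // sorted key order
        List<String> keys = new ArrayList<>(sorted.keySet());
        combine(sorted, keys, 0, "", out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String[]> map = Map.of(
                "b", new String[]{"5", "6"},
                "a", new String[]{"3", "4"});
        combinations(map).forEach(System.out::println);
    }
}
```

With the sample map this prints a=3, b=5, then a=3, b=6, then a=4, b=5, then a=4, b=6, each on its own line.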
You can obtain a Cartesian product of map key-value combinations using a map and reduce approach.
Map<String, String[]> map = Map.of(
"a", new String[]{"3", "4"},
"b", new String[]{"5", "6"});
List<Map<String, String>> comb = map.entrySet().stream()
// Stream<List<Map<String,String>>>
.map(e -> Arrays.stream(e.getValue())
.map(v -> Map.of(e.getKey(), v))
.collect(Collectors.toList()))
// summation of pairs of list into a single list
.reduce((list1, list2) -> list1.stream()
// combinations of inner maps
.flatMap(map1 -> list2.stream()
// concatenate into a single map
.map(map2 -> {
Map<String, String> m = new HashMap<>();
m.putAll(map1);
m.putAll(map2);
return m;
}))
// list of combinations
.collect(Collectors.toList()))
// otherwise, an empty list
.orElse(Collections.emptyList());
// output, order may vary
comb.forEach(System.out::println);
Output, order may vary:
{a=3, b=5}
{a=3, b=6}
{a=4, b=5}
{a=4, b=6}
See also: Cartesian product of map values
It looks to me like you really want a Multimap. In particular, Guava's ArrayListMultimap allows duplicate entries:
ArrayListMultimap<String, String> map = ArrayListMultimap.create();
// Pseudocode for filling the multimap:
// for each line in file:
//     parse key k
//     for each value in line:
//         parse value v
//         map.put(k, v);
for (Map.Entry<String, String> entry : map.entries()) {
String key = entry.getKey();
String value = entry.getValue();
}
If you want a cartesian product of maps, you could compute that directly using recursion, or you could iterate over the maps: create a list of iterators and iterate odometer-style; when iterator N reaches its end, advance iterator N+1 and reset iterators 1..N.
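The odometer-style iteration could be sketched as follows. This version uses index counters instead of iterators, and it lets the last position roll fastest (the mirror image of the description above) so that the output order matches the question. The class name Odometer and the method product are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Odometer {

    // Iterative Cartesian product: one position counter per input list,
    // advanced like the wheels of an odometer.
    static <T> List<List<T>> product(List<List<T>> lists) {
        List<List<T>> result = new ArrayList<>();
        if (lists.isEmpty()) {
            return result; // no lists, no combinations
        }
        for (List<T> list : lists) {
            if (list.isEmpty()) {
                return result; // an empty list rules out every combination
            }
        }
        int[] pos = new int[lists.size()];
        while (true) {
            // Read off the current wheel positions as one combination.
            List<T> combo = new ArrayList<>();
            for (int i = 0; i < pos.length; i++) {
                combo.add(lists.get(i).get(pos[i]));
            }
            result.add(combo);
            // Advance the last wheel; on overflow reset it and carry to the left.
            int wheel = pos.length - 1;
            while (wheel >= 0 && ++pos[wheel] == lists.get(wheel).size()) {
                pos[wheel] = 0;
                wheel--;
            }
            if (wheel < 0) {
                return result; // every wheel rolled over: done
            }
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> values = Arrays.asList(
                Arrays.asList(3, 4),
                Arrays.asList(5, 6));
        System.out.println(product(values)); // [[3, 5], [3, 6], [4, 5], [4, 6]]
    }
}
```

To print key=value pairs, keep the keys in a list with the same order as the value lists and pair them up by index.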
Just poked around and found this SO question.
So I'd recommend you use Guava's Sets.cartesianProduct for the Cartesian product. Here's my poking-around code, which you could adapt to your input logic:
String key1 = "a";
Set<Integer> values1 = Sets.newLinkedHashSet(Arrays.asList(1, 2, 3, 4));
String key2 = "b";
Set<Integer> values2 = Sets.newLinkedHashSet(Arrays.asList(5, 6, 7));
String key3 = "c";
Set<Integer> values3 = Sets.newLinkedHashSet(Arrays.asList(8, 9));
List<String> keys = Arrays.asList(key1, key2, key3);
Set<List<Integer>> product = Sets.cartesianProduct(values1, values2, values3);
for (List<Integer> values : product) {
for (int i = 0; i < keys.size(); ++i) {
String key = keys.get(i);
int value = values.get(i);
System.out.print(key + "=" + value + "; ");
}
System.out.println();
}