Java 8 Stream to determine a maximum count in a text file - java

For my assignment I have to replace for loops with streams that count the frequency of words in a text document, and I am having trouble figuring the TODO part out.
String filename = "SophieSallyJack.txt";
if (args.length == 1) {
filename = args[0];
}
Map<String, Integer> wordFrequency = new TreeMap<>();
List<String> incoming = Utilities.readAFile(filename);
wordFrequency = incoming.stream()
.map(String::toLowerCase)
.filter(word -> !word.trim().isEmpty())
.collect(Collectors.toMap(word -> word, word -> 1, (a, b) -> a + b, TreeMap::new));
int maxCnt = 0;
// TODO add a single statement that uses streams to determine maxCnt
for (String word : incoming) {
Integer cnt = wordFrequency.get(word);
if (cnt != null) {
if (cnt > maxCnt) {
maxCnt = cnt;
}
}
}
System.out.print("Words that appear " + maxCnt + " times:");
I have tried this:
wordFrequency = incoming.parallelStream().
collect(Collectors.toConcurrentMap(w -> w, w -> 1, Integer::sum));
But that is not right and I'm not sure how to incorporate maxCnt into the stream.

Assuming you have all the words extracted from the file in a List<String>, the count for each word can be computed like this:
Map<String, Long> wordToCountMap = words.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
The most frequent word can then be computed from the above map like so:
Entry<String, Long> mostFreequentWord = wordToCountMap.entrySet().stream()
.max(Map.Entry.comparingByValue())
.orElse(new AbstractMap.SimpleEntry<>("Invalid", 0l));
You may chain the above two pipelines together if you wish, like this:
Entry<String, Long> mostFreequentWord = words.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.max(Map.Entry.comparingByValue())
.orElse(new AbstractMap.SimpleEntry<>("Invalid", 0l));
Update
As per the following discussion it is always good to return an Optional from your computation like so,
Optional<Entry<String, Long>> mostFreequentWord = words.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.max(Map.Entry.comparingByValue());

Well, you have done almost everything you needed with that TreeMap. Note, though, that a TreeMap is ordered by key, so its lastEntry method returns the alphabetically last word, not the word with the highest frequency; you still have to search the values for the maximum.
The TreeMap is also not optimal here, since it keeps the keys sorted on every insert and you don't need sorted data, you need the max. Building a TreeMap is O(n log n) overall, while building a HashMap is O(n) on average.
So instead of that TreeMap, all you need to change is to use a HashMap:
wordFrequency = incoming.stream()
.map(String::toLowerCase)
.filter(word -> !word.trim().isEmpty())
.collect(Collectors.toMap(
Function.identity(),
word -> 1,
(a, b) -> a + b,
HashMap::new));
Once you have this Map, you need to find max - this operation is O(n) in general and could be achieved with a stream or without one:
Collections.max(wordFrequency.entrySet(), Map.Entry.comparingByValue())
This approach gives you O(n) for building the HashMap and O(n) for finding the max, thus O(n) overall, so it's faster than using a TreeMap.
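The HashMap-plus-Collections.max idea above can be sketched as a small helper. The class and method names below (MaxEntrySketch, countWords, maxCount) are hypothetical, chosen only for illustration. One detail worth knowing: Collections.max throws NoSuchElementException on an empty collection, so the sketch guards against an empty map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class MaxEntrySketch {

    // Counts lower-cased, non-blank words in a HashMap (no ordering needed).
    static Map<String, Integer> countWords(List<String> words) {
        return words.stream()
                .filter(word -> !word.trim().isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toMap(Function.identity(), word -> 1, Integer::sum, HashMap::new));
    }

    // Returns the highest count, or 0 for an empty map.
    // Collections.max would throw NoSuchElementException on an empty entry set.
    static int maxCount(Map<String, Integer> frequency) {
        if (frequency.isEmpty()) {
            return 0;
        }
        return Collections.max(frequency.entrySet(), Map.Entry.comparingByValue()).getValue();
    }
}
```

Calling maxCount(countWords(incoming)) would then replace the for loop from the question.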

Ok, first of all, your wordFrequency line can make use of Collectors#groupingBy and Collectors#counting instead of writing your own accumulator:
List<String> incoming = Arrays.asList("monkey", "dog", "MONKEY", "DOG", "giraffe", "giraffe", "giraffe", "Monkey");
wordFrequency = incoming.stream()
.filter(word -> !word.trim().isEmpty()) // filter first, so we don't lowercase empty strings
.map(String::toLowerCase)
.collect(Collectors.groupingBy(s -> s, Collectors.counting()));
Now that we got that out of the way... Your TODO line says use streams to determine maxCnt. You can do that easily by using max with naturalOrder:
int maxCnt = wordFrequency.values()
.stream()
.max(Comparator.naturalOrder())
.orElse(0L)
.intValue();
However, your comments make me think that what you actually want is a one-liner to print the most frequent words (all of them), i.e. the words that have maxCnt as value in wordFrequency. So what we need is to "reverse" the map, grouping the words by count, and then pick the entry with highest count:
wordFrequency.entrySet().stream() // {monkey=3, dog=2, giraffe=3}
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toList()))).entrySet().stream() // reverse map: {3=[monkey, giraffe], 2=[dog]}
.max(Comparator.comparingLong(Map.Entry::getKey)) // maxCnt and all words with it: 3=[monkey, giraffe]
.ifPresent(e -> {
System.out.println("Words that appear " + e.getKey() + " times: " + e.getValue());
});
This solution prints all the words with maxCnt, instead of just one:
Words that appear 3 times: [monkey, giraffe].
Of course, you can concatenate the statements to get one big do-it-all statement, like this:
incoming.stream() // [monkey, dog, MONKEY, DOG, giraffe, giraffe, giraffe, Monkey]
.filter(word -> !word.trim().isEmpty()) // filter first, so we don't lowercase empty strings
.map(String::toLowerCase)
.collect(groupingBy(s -> s, counting())).entrySet().stream() // {monkey=3, dog=2, giraffe=3}
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toList()))).entrySet().stream() // reverse map: {3=[monkey, giraffe], 2=[dog]}
.max(Comparator.comparingLong(Map.Entry::getKey)) // maxCnt and all words with it: 3=[monkey, giraffe]
.ifPresent(e -> {
System.out.println("Words that appear " + e.getKey() + " times: " + e.getValue());
});
But now we're stretching the meaning of "one statement" :)

By piecing together information I was able to successfully replace the for loop with
int maxCnt = wordFrequency.values().stream().max(Comparator.naturalOrder()).get();
System.out.print("Words that appear " + maxCnt + " times:");
I appreciate all the help.
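One caveat with the statement above: calling get() on the Optional throws NoSuchElementException if wordFrequency is empty (for example, an empty input file). A minimal sketch of a variant that falls back to 0 instead, assuming a Map<String, Integer> as in the question; the class name MaxCountSketch is made up for this example:

```java
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class MaxCountSketch {

    // Returns the largest value in the map, or 0 when the map is empty.
    static int maxCount(Map<String, Integer> wordFrequency) {
        return wordFrequency.values().stream()
                .mapToInt(Integer::intValue) // unbox to an IntStream
                .max()                       // OptionalInt, empty for an empty map
                .orElse(0);
    }
}
```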

Related

Streams use for map computation from list with counter

I have the following for loop which I would like to replace by a simple Java 8 stream statement:
List<String> words = new ArrayList<>(Arrays.asList("a", "b", "c"));
Map<String, Long> wordToNumber = new LinkedHashMap<>();
Long index = 1L;
for (String word : words) {
wordToNumber.put(word, index++);
}
I basically want a sorted map (by insertion order) of each word to its number (which is incremented at each for loop by 1), but done simpler, if possible with Java 8 streams.
Map<String, Long> wordToNumber =
IntStream.range(0, words.size())
.boxed()
.collect(Collectors.toMap(
words::get,
x -> Long.valueOf(x) + 1,
(left, right) -> { throw new RuntimeException();},
LinkedHashMap::new
));
You can replace that (left, right) -> { throw new RuntimeException();} depending on how you want to merge two elements.
The following should work (though it's not clear why Long is needed because the size of List is int)
Map<String, Long> map = IntStream.range(0, words.size())
.boxed().collect(Collectors.toMap(words::get, Long::valueOf));
The code above works if there's no duplicate in the words list.
If duplicate words are possible, a merge function needs to be provided to select which index should be stored in the map (first or last)
Map<String, Long> map = IntStream.range(0, words.size())
.boxed().collect(
Collectors.toMap(words::get, Long::valueOf,
(w1, w2) -> w2, // keep the index of the last word as in the initial code
LinkedHashMap::new // keep insertion order
));
Similarly, the map can be built by streaming words and using external variable to increment the index (AtomicLong and getAndIncrement() may be used instead of long[]):
long[] index = {1L};
Map<String, Long> map = words.stream()
.collect(
Collectors.toMap(word -> word, word -> index[0]++,
(w1, w2) -> w2, // keep the index of the last word
LinkedHashMap::new // keep insertion order
));
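The counter-based variant above mutates state from inside the lambda, so it is only safe on a sequential stream. As a rough sketch, the index-based approach can reproduce the 1-based numbering of the original loop without any external counter. The class name WordPositionSketch is hypothetical, and the "last occurrence wins" merge is an assumption that mirrors what repeated put calls do in the question's loop.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, for illustration only.
final class WordPositionSketch {

    // Maps each word to its 1-based position; for duplicates the last position wins.
    static Map<String, Long> positions(List<String> words) {
        return IntStream.range(0, words.size())
                .boxed()
                .collect(Collectors.toMap(
                        words::get,             // key: the word at index i
                        i -> i + 1L,            // value: 1-based position as a Long
                        (first, last) -> last,  // duplicate word: keep the later position
                        LinkedHashMap::new));   // keep first-insertion order of the keys
    }
}
```

For the input a, b, c, b this yields {a=1, b=4, c=3}.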
A slightly different solution. Integer::max is the merge function, which gets called if the same word appears more than once. In this case it picks the last position, since that is effectively what the code sample in the question does.
@Test
public void testWordPosition() {
List<String> words = Arrays.asList("a", "b", "c", "b");
AtomicInteger index = new AtomicInteger();
Map<String, Integer> map = words.stream()
.map(w -> new AbstractMap.SimpleEntry<>(w, index.incrementAndGet()))
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::max));
System.out.println(map);
}
Output:
{a=1, b=4, c=3}
Edit:
Incorporating Alex's suggestions in the comments, it becomes:
@Test
public void testWordPosition() {
List<String> words = Arrays.asList("a", "b", "c", "b");
AtomicLong index = new AtomicLong();
Map<String, Long> map = words.stream()
.collect(Collectors.toMap(w -> w, w -> index.incrementAndGet(), Long::max));
System.out.println(map);
}
I basically want a sorted map (by insertion order) of each word to its
number (which is incremented at each for loop by 1), but done simpler,
if possible with Java 8 streams.
You can do it concisely using the following Stream:
AtomicLong index = new AtomicLong(1);
words.stream().forEach(word -> wordToNumber.put(word, index.getAndIncrement()));
Personally, I think that either
Map<String, Long> wordToNumber = new LinkedHashMap<>();
for(int i = 0; i < words.size(); i++){
wordToNumber.put(words.get(i), (long) (i + 1));
}
or
Map<String, Long> wordToNumber = new LinkedHashMap<>();
long index = 1L;
for (String word : words) {
wordToNumber.put(word, index++);
}
is simple enough.

Count unique chars and validate String in some cases using Java Stream

I'm trying to write a method that validates a String. If the string has the same number of occurrences of every char, like "aabb", "abcabc" or "abc", it is valid; if it contains one extra symbol, like "ababa" or "aab", it is also valid. All other cases are invalid.
Update: sorry, I forgot to mention cases like abcabcab -> a-3, b-3, c-2 -> 2 extra symbols (a, b) -> invalid. My code doesn't cover such cases.
Space is a symbol, and capital letters are different from small letters. Now I have this, but it looks ambiguous (especially the last two methods):
public boolean validate(String line) {
List<Long> keys = countMatches(countChars(line));
int matchNum = keys.size();
if (matchNum < 2) return true;
return matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1;
}
When counting the unique symbol entries I'd like to get a List<Long>, but I don't know how:
private Map<Character, Long> countChars(String line) {
return line.chars()
.mapToObj(c -> (char) c)
.collect(groupingBy(Function.identity(), HashMap::new, counting()));
}
private List<Long> countMatches(Map<Character, Long> countedEntries) {
return new ArrayList<>(countedEntries.values()
.stream()
.collect(groupingBy(Function.identity(), HashMap::new, counting()))
.keySet());
}
How can I optimize the methods above? I just need a List<Long>, but I have to create a map.
As I could observe, you are looking for distinct frequencies using those two methods. You can merge that into one method to use a single stream pipeline as below :
private List<Long> distinctFrequencies(String line) {
return line.chars().mapToObj(c -> (char) c)
.collect(Collectors.groupingBy(Function.identity(),
Collectors.counting()))
.values().stream()
.distinct()
.collect(Collectors.toList());
}
Of course, all you need to change in your validate method now is the assignment
List<Long> keys = distinctFrequencies(line);
On further thought, if you wish to reuse the Map<Character, Long> countChars method elsewhere as well, you could change the distinct-frequencies method to use it:
private List<Long> distinctFrequencies(String line) {
return countChars(line).values()
.stream()
.distinct()
.collect(Collectors.toList());
}
You could check whether every char in a string has the same occurrence count using the Stream API like this:
boolean valid = "aabbccded".chars()
.boxed()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.values().stream()
.reduce((a, b) -> a.equals(b) ? a : -1L) // equals, not ==: these are boxed Longs
.map(v -> v > 0)
.get();
EDIT:
After reading the comments, I believe I now understand the requirement:
a string is considered valid if all chars in it have the same occurrence count, like aabb,
or if there is a single extra character, like abb.
The string abcabcab is invalid: it has 3 a, 3 b and 2 c, so there is 1 extra a and 1 extra b, which is one too many. Hence you can't perform the validation with a list of frequencies alone; you also need to know how many characters share each frequency, i.e. a Map.
Here is a new attempt:
TreeMap<Long, Long> map = "abcabcab".chars()
.boxed()
.collect(groupingBy(Function.identity(), counting()))
.values().stream()
.collect(groupingBy(Function.identity(), TreeMap::new, counting()));
boolean valid = map.size() == 1 || // there is only a single char length
( map.size() == 2 && // there are two and there is only 1 extra char
((map.lastKey() - map.firstKey()) * map.lastEntry().getValue() <= 1));
The whole validation could be executed in a single statement by using the Collectors.collectingAndThen method that @Nikolas used in his answer, or you could use a reduction as well:
boolean valid = "aabcc".chars()
.boxed()
.collect(groupingBy(Function.identity(), counting()))
.values().stream()
.collect(groupingBy(Function.identity(), TreeMap::new, counting()))
.entrySet().stream()
.reduce((min, high) -> {
min.setValue((min.getKey() - high.getKey()) * high.getValue()); // min.getKey is the min char length
return min; // high.getKey is a higher char length
// high.getValue is occurrence count of higher char length
}) // this is always negative
.map(min -> min.getValue() >= -1)
.get();
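The count-of-counts check from this answer can be wrapped in a reusable method. The following is a minimal sketch; the class name FrequencyValidatorSketch and the method name validate are illustrative and not part of the original answer. Note that, as written, an empty string produces an empty map and is therefore reported as invalid; whether that is desired is not stated in the question.

```java
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class FrequencyValidatorSketch {

    static boolean validate(String line) {
        // Key: an occurrence count; value: how many distinct chars have that count.
        // Example: "abcabcab" -> a=3, b=3, c=2 -> {2=1, 3=2}
        TreeMap<Long, Long> countOfCounts = line.chars()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));

        if (countOfCounts.size() == 1) {
            return true; // every char occurs equally often
        }
        // Exactly two distinct counts, and only one char exceeds the lower count by one.
        return countOfCounts.size() == 2
                && (countOfCounts.lastKey() - countOfCounts.firstKey())
                        * countOfCounts.lastEntry().getValue() <= 1;
    }
}
```

With the examples from the question, "aabb", "abcabc", "abc", "ababa" and "aab" are accepted, while "abcabcab" is rejected.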
Use Collectors.collectingAndThen, a collector that takes a downstream Collector and a finisher Function that maps the result.
Use Collectors.groupingBy and Collectors.counting to get the frequency of each character in the String.
// Results in Map<Integer, Long>
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
Use map -> new HashSet<>(map.values()).size() == 1 to check whether all frequencies are equal: if so, there is exactly one distinct value.
Wrapping these two in Collectors.collectingAndThen looks like:
String line = "aabbccdeed";
boolean isValid = line.chars() // IntStream of characters
.boxed() // boxed as Stream<Integer>
.collect(Collectors.collectingAndThen( // finisher's result type
Collectors.groupingBy( // grouped Map<Integer, Long>
Function.identity(), // ... of each character
Collectors.counting()), // ... frequency
map -> new HashSet<>(map.values()).size() == 1 // checks the frequencies
));
// aabbccded -> false
// aabbccdeed -> true
You can do it like this:
First, count the occurrences of every character.
Then find the minimum occurrence count.
Finally, sum the differences between each count and that minimum (minValue); the string is valid if the sum is less than or equal to one.
public static boolean validate(String line) {
Map<Character, Long> map = line.chars()
.mapToObj(c -> (char) c)
.collect(groupingBy(Function.identity(), Collectors.counting()));
long minValue = map.values().stream().min(Long::compareTo).orElse(0L);
return map.values().stream().mapToLong(a -> Math.abs(a - minValue)).sum() <= 1;
}

Lambda to populate Map

I am trying to fill up a map with words and the number of their occurrences. I am trying to write a lambda to do it, like so:
Consumer<String> wordCount = word -> map.computeIfAbsent(word, (w) -> (new Integer(1) + 1).intValue());
map is Map<String, Integer>. It should just insert the word in the map as a key if it is absent and if it is present it should increase its integer value by 1. This one is not correct syntax-wise.
You can't increment the count using computeIfAbsent, since it will only be computed the first time.
You probably meant:
map.compute(word, (w, i) -> i == null ? 1 : i + 1);
This is what Collectors are for.
Assuming you have some Stream<String> words:
Map<String, Long> countedWords = words
.collect(Collectors
.groupingBy(
Function.identity(),
Collectors.counting()));
It doesn't compile because you can't call a method on a primitive:
new Integer(1) -> 1 // unboxing was applied
(1 + 1).intValue() // incorrect
I would write it with Map#put and Map#getOrDefault:
Consumer<String> consumer = word -> map.put(word, map.getOrDefault(word, 0) + 1);
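Another option, not mentioned in the answers above, is Map.merge, which handles both the "absent" and the "present" case in one call. A minimal sketch follows; the class name MergeCountSketch and the method name count are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class MergeCountSketch {

    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> map = new HashMap<>();
        // merge stores 1 for a new key; otherwise it combines the old value and 1 via Integer::sum
        words.forEach(word -> map.merge(word, 1, Integer::sum));
        return map;
    }
}
```

The lambda passed to forEach has the same shape as the Consumer<String> the question asks for.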

split string and store it into HashMap java 8

I want to split below string and store it into HashMap.
String responseString = "name~peter-add~mumbai-md~v-refNo~";
First I split the string using the hyphen (-) delimiter and store the tokens in an ArrayList as below:
public static List<String> getTokenizeString(String delimitedString, char separator) {
final Splitter splitter = Splitter.on(separator).trimResults();
final Iterable<String> tokens = splitter.split(delimitedString);
final List<String> tokenList = new ArrayList<String>();
for(String token: tokens){
tokenList.add(token);
}
return tokenList;
}
List<String> list = MyClass.getTokenizeString(responseString, '-');
and then I use the code below to convert it to a Map using a stream.
Map<String, String> map = list.stream()
.collect(Collectors.toMap(k -> k.split("~")[0], v -> v.split("~")[1]));
The stream collector doesn't work, as there is no value for refNo.
It works correctly if I have an even number of elements in the ArrayList.
Is there any way to handle this? Also, please suggest how I can do both tasks with Java 8 streams (I don't want to use the getTokenizeString() method).
Unless Splitter is doing any magic, the getTokenizeString method is obsolete here. You can perform the entire processing as a single operation:
Map<String,String> map = Pattern.compile("\\s*-\\s*")
.splitAsStream(responseString.trim())
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length>1? a[1]: ""));
By using the regular expression \s*-\s* as separator, you are considering white-space as part of the separator, hence implicitly trimming the entries. There’s only one initial trim operation before processing the entries, to ensure that there is no white-space before the first or after the last entry.
Then, simply split the entries in a map step before collecting into a Map.
First of all, you don't have to split the same String twice.
Second of all, check the length of the array to determine if a value is present for a given key.
Map<String, String> map =
list.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
This is assuming you want to put the key with an empty String value if a key has no corresponding value (Collectors.toMap does not accept null values).
Or you can skip the list variable :
Map<String, String> map1 =
MyClass.getTokenizeString(responseString, '-')
.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
private final String dataSheet = "103343262,6478342944, 103426540,84528784843, 103278808,263716791426, 103426733,27736529279, "
+ "103426000,27718159078, 103218982,19855201547, 103427376,27717278645, "
+ "103243034,81667273413";
final int chunk = 2;
AtomicInteger counter = new AtomicInteger();
Map<String, String> pairs = Arrays.stream(dataSheet.split(","))
.map(String::trim)
.collect(Collectors.groupingBy(i -> counter.getAndIncrement() / chunk))
.values()
.stream()
.collect(toMap(k -> k.get(0), v -> v.get(1)));
result:
pairs =
"103218982" -> "19855201547"
"103278808" -> "263716791426"
"103243034" -> "81667273413"
"103426733" -> "27736529279"
"103426540" -> "84528784843"
"103427376" -> "27717278645"
"103426000" -> "27718159078"
"103343262" -> "6478342944"
We need to group every 2 elements into key/value pairs, so we partition the list into chunks of 2. The expression (counter.getAndIncrement() / 2) yields the same number for every 2 consecutive calls, e.g.:
IntStream.range(0,6).forEach((i)->System.out.println(counter.getAndIncrement()/2));
prints:
0
0
1
1
2
2
You may use the same idea to partition list into chunks.
Another short way to do it:
String responseString = "name~peter-add~mumbai-md~v-refNo~";
Map<String, String> collect = Arrays.stream(responseString.split("-"))
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
System.out.println(collect);
First you split the String on -, then you map each piece with map(s -> s.split("~", 2)) to create a Stream<String[]> like [name, peter][add, mumbai][md, v][refNo, ], and finally you collect it with toMap, where a[0] becomes the key and a[1] the value.
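The split-then-collect approach from the answers above can be packaged as one method. This is only a sketch: the class name KeyValueParserSketch and the method name parse are hypothetical, and two choices are assumptions of this example rather than requirements from the question: a LinkedHashMap is used so the keys keep their input order, and a repeated key keeps its last value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class KeyValueParserSketch {

    // Parses "k1~v1-k2~v2-k3~" into a map; a key without a value maps to "".
    static Map<String, String> parse(String response) {
        return Arrays.stream(response.split("-"))
                .map(entry -> entry.split("~", 2))           // limit 2 keeps a trailing empty value
                .collect(Collectors.toMap(
                        parts -> parts[0],
                        parts -> parts.length > 1 ? parts[1] : "",
                        (first, second) -> second,           // assumption: last duplicate wins
                        LinkedHashMap::new));                // assumption: preserve input order
    }
}
```

For the string from the question, parse returns {name=peter, add=mumbai, md=v, refNo=}.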

grouping List of Objects and counting using Java collection

Which Java Collection class is better to group the list of objects?
I have a list of messages from users like below:
aaa hi
bbb hello
ccc Gm
aaa Can?
CCC yes
ddd No
From this list of message object I want to count and display aaa(2)+bbb(1)+ccc(2)+ddd(1). Any code help?
You can use Map<String, Integer> where the keys represent the individual strings, and the map value is the counter for each one.
So you can do something like:
// as you want a sorted list of keys, you should use a TreeMap
Map<String, Integer> stringsWithCount = new TreeMap<>();
for (String userinput : str) {
// wherever your input comes from: turn it into lower case,
// so that "ccc" and "CCC" go for the same counter
String item = userinput.toLowerCase();
if (stringsWithCount.containsKey(item)) {
stringsWithCount.put(item, stringsWithCount.get(item) + 1);
} else {
stringsWithCount.put(item, 1);
}
}
And then you can iterate the map when done:
for (Entry<String, Integer> entry : stringsWithCount.entrySet()) {
and build your result string.
That was the old-school implementation; if you want to be fancy and surprise your teachers, you can go for the Java 8 lambda/stream solution
(which I wouldn't recommend unless you really invest the time to completely understand it; also, it is untested on my side):
Arrays.stream(someListOrArrayContainingItems)
.collect(Collectors
.groupingBy(s -> s, TreeMap::new, Collectors.counting()))
.entrySet()
.stream()
.flatMap(e -> Stream.of(e.getKey(), String.valueOf(e.getValue())))
.collect(Collectors.joining())
Putting the pieces together from a couple of the other answers, adapting to your code from the other question and fixing a few trivial errors:
// as you want a sorted list of keys, you should use a TreeMap
Map<String, Integer> stringsWithCount = new TreeMap<>();
for (Message msg : convinfo.messages) {
// where ever your input comes from: turn it into lower case,
// so that "ccc" and "CCC" go for the same counter
String item = msg.userName.toLowerCase();
if (stringsWithCount.containsKey(item)) {
stringsWithCount.put(item, stringsWithCount.get(item) + 1);
} else {
stringsWithCount.put(item, 1);
}
}
String result = stringsWithCount
.entrySet()
.stream()
.map(entry -> entry.getKey() + '(' + entry.getValue() + ')')
.collect(Collectors.joining("+"));
System.out.println(result);
This prints:
aaa(2)+bbb(1)+ccc(2)+ddd(1)
You need a Multiset from Guava. That collection type is tailor-made for this kind of task:
// TreeMultiset keeps the elements sorted; HashMultiset.create() works if order does not matter
Multiset<String> multiset = TreeMultiset.create();
for (String line : lines) { // somehow you read the lines
multiset.add(line.split(" ")[0].toLowerCase());
}
boolean first = true;
for (Multiset.Entry<String> entry : multiset.entrySet()) {
if (!first) {
System.out.print("+");
}
first = false;
System.out.print(entry.getElement() + "(" + entry.getCount() + ")");
}
Assuming that you use Java 8, it could be something like this using the Stream API:
List<Message> messages = ...;
// Convert your list as a Stream
// Extract only the login from the Message Object
// Lowercase the login to be able to group ccc and CCC together
// Group by login using TreeMap::new as supplier to sort the result alphabetically
// Convert each entry into login(count)
// Join with a +
String result =
messages.stream()
.map(Message::getLogin)
.map(String::toLowerCase)
.collect(
Collectors.groupingBy(
Function.identity(), TreeMap::new, Collectors.counting()
)
)
.entrySet()
.stream()
.map(entry -> entry.getKey() + '(' + entry.getValue() + ')')
.collect(Collectors.joining("+"));
System.out.println(result);
Output:
aaa(2)+bbb(1)+ccc(2)+ddd(1)
If you want to group your messages by login and have the result as a collection, you can proceed as follows:
Map<String, List<Message>> groupedMessages =
messages.stream()
.collect(
Collectors.groupingBy(
message -> message.getLogin().toLowerCase(),
TreeMap::new,
Collectors.toList()
)
);
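To make the Stream-based answer above easy to try out, here is a self-contained sketch. The nested Message class is a hypothetical stand-in (the question does not show the real class); only its getLogin() accessor matters for the counting.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class MessageCountSketch {

    // Minimal stand-in for the real message type, which is not shown in the question.
    static final class Message {
        private final String login;
        private final String text;

        Message(String login, String text) {
            this.login = login;
            this.text = text;
        }

        String getLogin() { return login; }
        String getText() { return text; }
    }

    // Builds a summary such as "aaa(2)+bbb(1)", sorted by lower-cased login.
    static String summarize(List<Message> messages) {
        return messages.stream()
                .map(Message::getLogin)
                .map(String::toLowerCase) // "ccc" and "CCC" share one counter
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()))
                .entrySet().stream()
                .map(entry -> entry.getKey() + "(" + entry.getValue() + ")")
                .collect(Collectors.joining("+"));
    }
}
```

For the six sample messages in the question, summarize returns aaa(2)+bbb(1)+ccc(2)+ddd(1).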
