How to count words in Map via Stream

I'm working with a List<String> that contains a large text. The text looks like this:
List<String> lines = Arrays.asList("The first line", "The second line", "Some words can repeat", "The first the second"); //etc
I need to count the words in it and produce this output:
first - 2
line - 2
second - 2
repeat - 1
some - 1
words - 1
Words shorter than 4 characters should be skipped, which is why "the" and "can" are not in the output. This is a simplified example; in the real task a word counts as rare if it occurs fewer than 20 times, and rare words should be skipped as well. The map should then be sorted by key in alphabetical order.
I may use only streams, without "if", "while" or "for" constructs.
What I have implemented:
Map<String, Integer> wordCount = Stream.of(lines)
.flatMap(Collection::stream)
.flatMap(str -> Arrays.stream(str.split("\\p{Punct}| |[0-9]|…|«|»|“|„")))
.filter(str -> (str.length() >= 4))
.collect(Collectors.toMap(
i -> i.toLowerCase(),
i -> 1,
(a, b) -> java.lang.Integer.sum(a, b))
);
wordCount now contains a Map of the words and their counts. But how can I skip the rare words? Should I create a new stream? If so, how do I get at the values of the Map? I tried this, but it's not correct:
String result = Stream.of(wordCount)
.filter(i -> (Map.Entry::getValue > 10));
My calculations should return a String:
"word" - number of entries
Thank you!

Given the stream you have already written:
List<String> lines = Arrays.asList(
"For the rabbit, it was a bad day.",
"An Antillean rabbit is very abundant.",
"She put the rabbit back in the cage and closed the door securely, then ran away.",
"The rabbit tired of her inquisition and hopped away a few steps.",
"The Dean took the rabbit and went out of the house and away."
);
Map<String, Integer> wordCounts = Stream.of(lines)
.flatMap(Collection::stream)
.flatMap(str -> Arrays.stream(str.split("\\p{Punct}| |[0-9]|…|«|»|“|„")))
.filter(str -> (str.length() >= 4))
.collect(Collectors.toMap(
String::toLowerCase,
i -> 1,
Integer::sum)
);
System.out.println("Original:" + wordCounts);
Original output:
Original:{dean=1, took=1, door=1, very=1, went=1, away=3, antillean=1, abundant=1, tired=1, back=1, then=1, house=1, steps=1, hopped=1, inquisition=1, cage=1, securely=1, rabbit=5, closed=1}
You can do:
String results = wordCounts.entrySet()
.stream()
.filter(wordToCount -> wordToCount.getValue() > 2) // 2 is rare
.sorted(Map.Entry.comparingByKey())
.map(wordCount -> wordCount.getKey() + " - " + wordCount.getValue())
.collect(Collectors.joining(", "));
System.out.println(results);
Filtered output:
away - 3, rabbit - 5

You can't exclude words that occur less often than the rarity threshold until you have computed the frequency count.
Here is how I might go about it:
Do the frequency count (I chose to do it slightly differently than you).
Then stream the entrySet of the map and filter out values below a certain frequency.
Then rebuild the map as a TreeMap to sort the words in lexical order.
List<String> list = Arrays.asList(....);
int wordRarity = 10; // minimum frequency to accept
int wordLength = 4; // minimum word length to accept
Map<String, Long> map = list.stream()
.flatMap(str -> Arrays.stream(
str.split("\\p{Punct}|\\s+|[0-9]|…|«|»|“|„")))
.filter(str -> str.length() >= wordLength)
.collect(Collectors.groupingBy(String::toLowerCase,
Collectors.counting()))
// here is where the rare words are filtered out.
.entrySet().stream().filter(e -> e.getValue() >= wordRarity)
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
(a, b) -> a, TreeMap::new));
Note that the (a, b) -> a lambda is a merge function for handling duplicate keys; it is never called here because the keys are already unique. Unfortunately, toMap has no overload that accepts a map Supplier without also accepting a merge function.
The easiest way to print them is as follows:
map.entrySet().forEach(e -> System.out.printf("%s - %s%n",
e.getKey(), e.getValue()));
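If the unused merge function feels awkward, one possible alternative is to count straight into a TreeMap with the three-argument groupingBy and then drop the rare entries in place. This is only a sketch: the class name WordFrequency, the method frequentWords, the simplified split pattern and the thresholds are all made up for illustration. Note that removeIf is a separate, non-stream step, so it may not satisfy a strict "streams only" requirement.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordFrequency {

    // Hypothetical helper: counts words of at least minLength characters and
    // keeps only those that occur at least minCount times, sorted by key.
    static Map<String, Long> frequentWords(List<String> lines, int minLength, long minCount) {
        TreeMap<String, Long> counts = lines.stream()
                // simplified pattern (an assumption): split on punctuation, whitespace and digits
                .flatMap(line -> Arrays.stream(line.split("[\\p{Punct}\\s\\d]+")))
                .filter(word -> word.length() >= minLength)
                // the map factory makes groupingBy build a sorted TreeMap directly
                .collect(Collectors.groupingBy(String::toLowerCase, TreeMap::new, Collectors.counting()));
        // remove the rare words from the finished map
        counts.values().removeIf(count -> count < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "The first line", "The second line", "Some words can repeat", "The first the second");
        // prints: first - 2, line - 2, second - 2 (one per line)
        frequentWords(lines, 4, 2).forEach((word, count) -> System.out.println(word + " - " + count));
    }
}
```

With a threshold of 1 instead of 2, the same call would also return repeat, some and words.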

Related

Creating a Map from an Array of int - Operator '%' cannot be applied to 'java.lang.Object'

I have code like this, which is supposed to create a Map from an array of integers. The key represents the number of digits.
public static Map<Integer, List<String>> groupByDigitNumbersArray(int[] x) {
return Arrays.stream(x) // array to stream
.filter(n -> n >= 0) // filter negative numbers
.collect(Collectors.groupingBy(n -> Integer.toString((Integer) n).length(), // group by number of digits
Collectors.mapping(d -> (d % 2 == 0 ? "e" : "o") + d,
Collectors.toList()))); // if even e odd o add to list
}
The problem is in the line with mapping().
I'm getting an error:
Operator '%' cannot be applied to 'java.lang.Object', 'int'
Does someone know how to solve this?

The flavor of collect() that expects a Collector as an argument isn't available with primitive streams. Even without a modulus operator %, your code will not compile - comment out the downstream collector of groupingBy() to see what I'm talking about.
You need to apply the boxed() operation to convert the IntStream into a stream of objects, Stream<Integer>.
Your method might look like this:
public static Map<Integer, List<String>> groupByDigitNumbersArray(int[] x) {
return Arrays.stream(x) // creates a stream over the given array
.filter(n -> n >= 0) // retain positive numbers and zero
.boxed() // <- converting IntStream into a Stream<Integer>
.collect(Collectors.groupingBy(
n -> String.valueOf(n).length(), // group by number of digits
Collectors.mapping(d -> (d % 2 == 0 ? "e" : "o") + d, // if even concatenate 'e', if odd 'o'
Collectors.toList()))); // collect to list
}
I've changed the classifier function of groupingBy() to be more readable.
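As a possible variation, the labels can be built while the values are still primitive ints, via mapToObj, so that % never sees an Object. This is a sketch rather than a replacement for the method above; the class and method names are invented, and grouping by label length minus one relies on every label having exactly one prefix character.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DigitGroupsWithoutBoxing {

    // Hypothetical variant: label first, then group the labels.
    static Map<Integer, List<String>> groupByDigits(int[] numbers) {
        return Arrays.stream(numbers)
                .filter(n -> n >= 0)
                // n is still a primitive int here, so % works without a cast
                .mapToObj(n -> (n % 2 == 0 ? "e" : "o") + n)
                // each label is one prefix character followed by the digits
                .collect(Collectors.groupingBy(label -> label.length() - 1));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> groups = groupByDigits(new int[] {1, 22, 3, 44, 100, -5});
        System.out.println(groups.get(1)); // [o1, o3]
        System.out.println(groups.get(2)); // [e22, e44]
        System.out.println(groups.get(3)); // [e100]
    }
}
```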

Finding 1st free "index" using java streams

I need to find the first free index in my file system, with a stream of names as the source.
Consider the list: ["New2", "New4", "New0", "New1", ...]
The first unused index among those is 3.
int index = 0;
try (IntStream indexes = names.stream()
.filter(name -> name.startsWith("New"))
.mapToInt(Integer::parseInt)
.distinct()
.sorted())
{
// I was thinking about building a stream of possible indexes, removing the existing ones from the try-with-resources block, and getting .min().
IntStream.rangeClosed(0, 10)... // Idk what to do.
}
Can someone help me find the right syntax for my idea, or propose a better solution?

The most efficient way is to collect into a BitSet:
int first = names.stream()
.filter(name -> name.startsWith("New"))
.mapToInt(s -> Integer.parseInt(s.substring(3)))
.collect(BitSet::new, BitSet::set, BitSet::or).nextClearBit(0);
Note that the bits are intrinsically sorted and distinct. Also, there will always be a “free” index. If there is no gap between 0 and the maximum number, the next free will be maximum+1, if there are no matching elements at all, the next free will be zero.
Starting with Java 9, we can be even more efficient with
int first = names.stream()
.filter(name -> name.startsWith("New"))
.mapToInt(s -> Integer.parseInt(s, 3, s.length(), 10))
.collect(BitSet::new, BitSet::set, BitSet::or).nextClearBit(0);
which parses the relevant part of the string directly, saving the substring operation.
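The three cases described above (a gap, no gap, no matching names) can be checked with a small self-contained sketch. The class name FirstFreeIndex and the sample lists are assumptions, and the code expects every name starting with "New" to be followed by a valid non-negative number.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

class FirstFreeIndex {

    // Collects the used indexes into a BitSet and asks for the first clear bit.
    static int firstFree(List<String> names) {
        return names.stream()
                .filter(name -> name.startsWith("New"))
                .mapToInt(name -> Integer.parseInt(name.substring(3)))
                .collect(BitSet::new, BitSet::set, BitSet::or)
                .nextClearBit(0);
    }

    public static void main(String[] args) {
        System.out.println(firstFree(Arrays.asList("New2", "New4", "New0", "New1"))); // 3 (gap)
        System.out.println(firstFree(Arrays.asList("New0", "New1", "Old7")));         // 2 (maximum + 1)
        System.out.println(firstFree(Collections.<String>emptyList()));               // 0 (no matches)
    }
}
```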

You could:
Extract the numeric part from each name
Store the used indexes in a set
Iterate over the range from 0 until the size of the list
The first index not in the used set is available
For example like this:
List<String> names = Arrays.asList("New2", "New4", "New0", "New1");
Set<Integer> taken = names.stream()
.map(s -> s.replaceAll("\\D+", ""))
.map(Integer::parseInt)
.collect(Collectors.toSet());
int first = IntStream.range(0, names.size())
.filter(index -> !taken.contains(index))
.findFirst()
.orElse(names.size());

For the fun of it, if you know you have up to 63 entries...
private static int firstMissing(List<Long> input) {
if (!input.contains(0L)) {
return 0;
}
long firstMissing = Long.lowestOneBit(~input.stream().reduce(1L, (i, j) -> i | 1L << j));
int result = 0;
while (firstMissing != 0) {
++result;
firstMissing = firstMissing >> 1;
}
return result - 1;
}
That's what @Holger did (+1 from me), but without the extra penalty of using BitSet.
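The same bit trick can probably be written more compactly with Long.numberOfTrailingZeros, which removes both the special case for 0 and the counting loop. This is a sketch under the same restriction as above (indexes must fit into the bits of one long); the class name is made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FirstMissingBit {

    // Sets one bit per used index, then finds the lowest bit that is still zero.
    static int firstMissing(List<Long> input) {
        long used = input.stream()
                .mapToLong(index -> 1L << index)
                .reduce(0L, (a, b) -> a | b);
        // the lowest zero bit of 'used' is the lowest one bit of '~used'
        return Long.numberOfTrailingZeros(~used);
    }

    public static void main(String[] args) {
        System.out.println(firstMissing(Arrays.asList(2L, 4L, 0L, 1L))); // 3
        System.out.println(firstMissing(Arrays.asList(1L, 2L)));         // 0
        System.out.println(firstMissing(Collections.<Long>emptyList())); // 0
    }
}
```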

Count occurrences of names in a List and print most popular ones out [duplicate]

So I have a list
List<String> names = new ArrayList<String>();
names.add("Mike");
names.add("Matthew");
names.add("Kelly");
names.add("Elon");
names.add("Paul");
names.add("Paul");
names.add("Paul");
names.add("Paul");
names.add("Kelly");
I need to count all the names and then print out the 3 most popular ones in descending order
Output:
Paul : 4
Kelly : 2
Mike : 1
What have I tried?
I have tried everything from the most basic things I have learned up to Maps, TreeMaps and HashMaps. With the last three I had some success, but I could not put the results into descending order. I found some tutorials via Google, but they were all quite complicated. Yes, I could just copy them and get my code working, but I would prefer to learn from it.
Do you have any suggestions for the clearest approach? I have never worked with maps before, so I do not know much about them at the moment.
In the end the output should look like this:
Output:
Paul : 44,44%
Kelly : 22,22%
Mike : 11,11%

You can do it using Java 8:
// create a map with the name as key and the number of times that name occurs as value
Map<String, Long> nameWithVlaues = names.stream()
.collect(Collectors.groupingBy(s -> s,
Collectors.counting()));
// stream over the key set of that map
nameWithVlaues.keySet()
.stream()
// sort the names by their counts in the map, highest first
.sorted((val1, val2) -> nameWithVlaues.get(val2).compareTo(nameWithVlaues.get(val1)))
// iterate over the stream in sorted order
.forEachOrdered(name -> {
// percentage of people with this name (integer division, so the fraction is dropped)
Long percent = (nameWithVlaues.get(name) * 100 / names.size());
// print it
System.out.println(name + " : " + percent + "%");
});
Without the comments it seems clearer :D
Map<String, Long> nameWithVlaues = names.stream()
.collect(Collectors.groupingBy(s -> s,
Collectors.counting()));
nameWithVlaues.keySet()
.stream()
.sorted((val1, val2) -> nameWithVlaues.get(val2).compareTo(nameWithVlaues.get(val1)))
.forEachOrdered(name -> {
Long percent = (nameWithVlaues.get(name) * 100 / names.size());
System.out.println(name + " : " + percent + "%");
});
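The question also asks for only the top three names and for percentages with two decimals. One way that might be sketched is shown below. The class name TopNames is invented, Locale.GERMANY is an assumption made only to get the decimal comma from the expected output, and the LinkedHashMap is there so that names with equal counts keep their first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopNames {

    static List<String> topThree(List<String> names) {
        // count each name, remembering the order in which names first appear
        Map<String, Long> counts = names.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                // highest count first; the stable sort keeps first-seen order for ties
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(3)
                .map(e -> String.format(Locale.GERMANY, "%s : %.2f%%",
                        e.getKey(), 100.0 * e.getValue() / names.size()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Mike", "Matthew", "Kelly", "Elon", "Paul", "Paul", "Paul", "Paul", "Kelly");
        // prints: Paul : 44,44%  Kelly : 22,22%  Mike : 11,11% (one per line)
        topThree(names).forEach(System.out::println);
    }
}
```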

Another solution using Java 8 could look like this:
// create a comparator that compares two values by the number of their occurrences in the list
Comparator<String> comparatorOfValues = (val1, val2) -> {
Long countVal1 = countIteration(val1, names);
Long countVal2 = countIteration(val2, names);
return - countVal1.compareTo(countVal2);
};
// mapping function that formats the result like this: NAME : 50%
Function<String, String> mapingFunction = name -> {
return name + " : " + countIteration(name, names) * 100 / names.size() + "%";
};
// apply the comparator and the mapping function to the stream of names and collect the result as a list
List<String> result2 = names.stream()
.distinct()
.sorted(comparatorOfValues)
.map(mapingFunction)
.collect(Collectors.toList());
result2.forEach(System.out::println);
And the function that counts the number of occurrences in a list:
// counts how many values in the collection match the name
public static Long countIteration(String name, Collection<String> collection) {
return collection.stream()
.filter(val -> name.equals(val))
.count();
}

There are several ways of doing this. For starters, you could code the following:
Using a hash table like structure (HashMap), you count the number of times each name has occurred (frequency of each name) in the list.
Now that you have the map, you can iterate over all its entries (i.e. key-value or name-frequency) pairs and choose that key (or name) that has highest frequency. Remember the simple, linear search algorithm while keeping the maximum you have seen so far? You can find the percentage in the next exercise (leave it for now). This would print, e.g. {Paul: 4}. You should not forget to remove this entry from the Map once you are done iterating.
Now you know how to get to the next most frequent entry, right?
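The steps above could be sketched roughly as follows, using plain loops only. The class and method names are hypothetical, and a LinkedHashMap is used instead of a plain HashMap (an assumption) so that ties are resolved in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopNamesLoop {

    static List<String> mostFrequent(List<String> names, int howMany) {
        // Step 1: build the frequency table.
        Map<String, Integer> frequency = new LinkedHashMap<>();
        for (String name : names) {
            frequency.merge(name, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        // Steps 2 and 3: repeatedly find the current maximum, record it, remove it.
        for (int round = 0; round < howMany && !frequency.isEmpty(); round++) {
            String bestName = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
                if (entry.getValue() > bestCount) { // linear search for the maximum
                    bestName = entry.getKey();
                    bestCount = entry.getValue();
                }
            }
            result.add(bestName + " : " + bestCount);
            frequency.remove(bestName); // removed only after the iteration has finished
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Mike", "Matthew", "Kelly", "Elon", "Paul", "Paul", "Paul", "Paul", "Kelly");
        System.out.println(mostFrequent(names, 3)); // [Paul : 4, Kelly : 2, Mike : 1]
    }
}
```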

How do I keep track of input words for both line and placement?

In Java, I'm working on a program that reads a given text file and records words for the number of times they appear, and every spot in which they appear (in the format "lineNumber, wordNumber").
Though my methods for using the information are solid, I'm having trouble coming up with an algorithm that properly counts both the lines and the placements (beyond the words in the first line).
For example, if the text is
hello there
who are you hello
The word objects would be given the information
hello appearances: 2 [1-1] [2-4]
there appearances: 1 [1-2]
who appearances: 1 [2-1]
are appearances: 1 [2-2]
you appearances: 1 [2-3]
Here's a basic version of what I have:
lineNumber = 0;
wordNumber = 0;
while (inputFile.hasNextLine())
{
lineNumber++;
while (inputFile.hasNext())
{
wordNumber++;
word = inputFile.next();
//an algorithm to remove cases that aren't letters goes here
Word w = new Word(word);
w.setAppearance(lineNumber, wordNumber);
}
}
But of course the problem with this approach is that hasNext() conflicts with hasNextLine(): hasNext() apparently moves on to the next line of the text file automatically, so lineNumber never gets a chance to increment, and every word after line 1 is recorded incorrectly.
How could I fix this? If this is complex enough that I'd need another import, what should I use?

You don't need two while statements. Read the entire line, then use String.split to get the words from that line (split on the space character).
Also, this might help for reading line by line.
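A minimal sketch of that idea, assuming words are separated by whitespace: read one line at a time with nextLine() and split it, so the line counter advances exactly once per line. The class name WordPositions and the choice to store positions as "[line-word]" strings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class WordPositions {

    // Maps each word to the list of its "[line-word]" positions.
    static Map<String, List<String>> positions(Scanner input) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        int lineNumber = 0;
        while (input.hasNextLine()) {
            lineNumber++;
            String[] words = input.nextLine().trim().split("\\s+");
            for (int i = 0; i < words.length; i++) {
                if (words[i].isEmpty()) {
                    continue; // a blank line splits into a single empty string
                }
                result.computeIfAbsent(words[i], key -> new ArrayList<>())
                        .add("[" + lineNumber + "-" + (i + 1) + "]");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("hello there\nwho are you hello");
        // first line printed: hello appearances: 2 [1-1] [2-4]
        positions(input).forEach((word, where) ->
                System.out.println(word + " appearances: " + where.size() + " " + String.join(" ", where)));
    }
}
```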

Firstly, no need for the outer while - delete it.
Secondly, no need for the Word class - delete it.
Next, you need a structure that can store multiple values for each word. A suitable structure would be a Map<String, List<Map.Entry<Integer, Integer>>>.
This code does the whole job in a few lines:
Map<String, List<Map.Entry<Integer, Integer>>> map = new HashMap<>();
// read whole lines, so that lineNumber advances once per line rather than once per word
for (int lineNumber = 1; inputFile.hasNextLine(); lineNumber++) {
int wordNumber = 0;
for (String word : inputFile.nextLine().split(" "))
map.merge(word, new LinkedList<>(Arrays.asList(
new AbstractMap.SimpleEntry<>(lineNumber, ++wordNumber))),
(a, b) -> {a.addAll(b); return a;});
}
map.entrySet().stream().map(e -> String.format("%s appearances: %d %s",
e.getKey(), e.getValue().size(), e.getValue().stream()
.map(d -> String.format("[%d-%d]", d.getKey(),d.getValue())).collect(Collectors.joining(" "))))
.forEach(System.out::println);
Here's some test code:
Scanner inputFile = new Scanner(new ByteArrayInputStream("foo bar baz foo foo\nbar foo bar\nfoo foo".getBytes()));
Map<String, List<Map.Entry<Integer, Integer>>> map = new HashMap<>();
for (int lineNumber = 1; inputFile.hasNextLine(); lineNumber++) {
int wordNumber = 0;
for (String word : inputFile.nextLine().split(" "))
map.merge(word, new LinkedList<>(Arrays.asList(
new AbstractMap.SimpleEntry<>(lineNumber, ++wordNumber))),
(a, b) -> {a.addAll(b); return a;});
}
map.entrySet().stream().map(e -> String.format("%s appearances: %d %s",
e.getKey(), e.getValue().size(), e.getValue().stream()
.map(d -> String.format("[%d-%d]", d.getKey(),d.getValue())).collect(Collectors.joining(" "))))
.forEach(System.out::println);
Output:
bar appearances: 3 [1-2] [2-1] [2-3]
foo appearances: 6 [1-1] [1-4] [1-5] [2-2] [3-1] [3-2]
baz appearances: 1 [1-3]

Checking range of List in forEach lambda loop Java 8

I want to learn how I can check the range of a list in Java 8.
For example,
My Code:
List<String> objList = new ArrayList<>();
objList.add("Peter");
objList.add("James");
objList.add("Bart");
objList.stream().map((s) -> s + ",").forEach(System.out::print);
My output is
Peter,James,Bart,
but I want to know how I can get rid of the last ,
Note: I know I must use filter here, yet I do not know how. I also know there is another way to solve this, which is as follows:
String result = objList.stream()
.map(Person::getFirstName)
.collect(Collectors.joining(","));
Yet I want to know how to check the range and get rid of the trailing , in my first code.

There's no direct way to get the index of a stream item while you're processing the items themselves. There are several alternatives, though.
One way is to run the stream over the indexes and then get the elements from the list. For each element index it maps i to the i'th element and appends a "," for all indexes except the last:
IntStream.range(0, objList.size())
.mapToObj(i -> objList.get(i) + (i < objList.size()-1 ? "," : ""))
.forEach(System.out::print);
A second, more concise variation is to special case the first element instead of the last one:
IntStream.range(0, objList.size())
.mapToObj(i -> (i > 0 ? "," : "") + objList.get(i))
.forEach(System.out::print);
A third way is to use the particular reduce operation that is applied "between" each two adjacent elements. The problem with this technique is that it does O(n^2) copying, which will become quite slow for large streams.
System.out.println(objList.stream().reduce((a,b) -> a + "," + b).get());
A fourth way is to special-case the last element by limiting the stream to length n-1. This requires a separate print statement, which isn't as pretty though:
objList.stream()
.limit(objList.size()-1)
.map(s -> s + ",")
.forEach(System.out::print);
System.out.print(objList.get(objList.size()-1));
A fifth way is similar to the above, special-casing the first element instead of the last:
System.out.print(objList.get(0));
objList.stream()
.skip(1)
.map(s -> "," + s)
.forEach(System.out::print);
Really, though, the point of the joining collector is to do this ugly and irritating special-casing for you, so you don't have to do it yourself.
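For comparison, here is a short sketch of what the joining collector gives you without any special-casing, including for an empty list. The class and method names are invented; for a plain list of strings, String.join is an equivalent shortcut.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class JoinNames {

    // The collector inserts the separator only between elements.
    static String joinWithCommas(List<String> items) {
        return items.stream().collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> objList = Arrays.asList("Peter", "James", "Bart");
        System.out.println(joinWithCommas(objList));   // Peter,James,Bart
        System.out.println(String.join(",", objList)); // Peter,James,Bart
        System.out.println(joinWithCommas(Collections.<String>emptyList()).isEmpty()); // true
    }
}
```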

You could do this:
objList.stream().flatMap((s) -> Stream.of(s, ','))
.limit(objList.size() * 2 - 1).forEach(System.out::print);
flatMap replaces each element of the original stream with the elements in the streams returned from the mapping function.
So if your stream was originally
"Peter" - "James" - "Bart"
The above mapping function changes it to
"Peter" - "," - "James" - "," - "Bart" - ","
Then the limit removes the last "," by truncating the stream to at most the length that is passed to it, which in this case is the size of the flat-mapped stream minus 1. Before limit, the stream held 2 * the size of the list elements, because flatMap doubled its length.
Note that this will throw an IllegalArgumentException if the list is empty, because the value passed to limit will be -1. You should check for this first if that is a possibility.
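One possible way to guard against the empty-list case is to clamp the limit at zero. This is a sketch: the helper name interleave is made up, and it collects into a String instead of printing so that the result is easy to check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InterleaveCommas {

    static String interleave(List<String> items) {
        return items.stream()
                .flatMap(s -> Stream.of(s, ","))
                // clamp at 0 so an empty list does not pass -1 to limit()
                .limit(Math.max(0, items.size() * 2L - 1))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(interleave(Arrays.asList("Peter", "James", "Bart")));   // Peter,James,Bart
        System.out.println(interleave(Collections.<String>emptyList()).isEmpty()); // true
    }
}
```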

What about:
String concat = objList.stream().reduce((a, b) -> a + "," + b).orElse("");
System.out.println(concat);

We can also use the limit and skip methods of the Stream API for this problem. Here is my attempt:
returnData.stream().limit(returnData.size()-1).forEach(s->System.out.print(s+","));
returnData.stream().skip(returnData.size()-1).forEach(s->System.out.print(s));
Here returnData is a List of Integers with the values 2, 4, 7, 14. The output is 2,4,7,14

objList.stream().filter(s -> !s.equals("Bart"))
This reduces the stream to the strings that are NOT equal to "Bart".
And this produces a stream containing the last value of a map:
Map<Integer, String> map = new HashMap<>();
map.put(0, "a");
map.put(1, "c");
map.put(2, "d");
Integer lastIndex = map.keySet().size() - 1;
Stream<String> lastValueStream = map.values().stream().filter(s -> s.equals(map.get(lastIndex)));

Try this:
int[] counter = new int[]{0};
objList.stream()
.map(f -> (counter[0]++ > 0? "," : "") + f)
.forEach(System.out::print);
