Count unique chars and validate String in some cases using Java Stream

I'm trying to write a method that validates a String. If the string contains the same number of every char, like "aabb", "abcabc" or "abc", it is valid; if it contains one extra symbol, like "ababa" or "aab", it is also valid. All other cases are invalid.
Update: sorry, I forgot to mention cases like abcabcab -> a: 3, b: 3, c: 2 -> 2 extra symbols (a, b) -> invalid. My code doesn't cover such cases.
A space counts as a symbol, and uppercase letters are different from lowercase letters. This is what I have now, but it looks convoluted (especially the last two methods):
public boolean validate(String line) {
List<Long> keys = countMatches(countChars(line));
int matchNum = keys.size();
if (matchNum < 2) return true;
return matchNum == 2 && Math.abs(keys.get(0) - keys.get(1)) == 1;
}
When counting the unique symbols I'd like to get a List<Long> directly, but I don't know how:
private Map<Character, Long> countChars(String line) {
return line.chars()
.mapToObj(c -> (char) c)
.collect(groupingBy(Function.identity(), HashMap::new, counting()));
}
private List<Long> countMatches(Map<Character, Long> countedEntries) {
return new ArrayList<>(countedEntries.values()
.stream()
.collect(groupingBy(Function.identity(), HashMap::new, counting()))
.keySet());
}
How can I optimize the methods above? I only need a List<Long>, but I have to create a map first.

As far as I can see, you are computing the distinct frequencies with those two methods. You can merge them into one method that uses a single stream pipeline:
private List<Long> distinctFrequencies(String line) {
return line.chars().mapToObj(c -> (char) c)
.collect(Collectors.groupingBy(Function.identity(),
Collectors.counting()))
.values().stream()
.distinct()
.collect(Collectors.toList());
}
Then the only thing you need to change in your validate method is the assignment:
List<Long> keys = distinctFrequencies(line);
If you also want to reuse countChars (which returns a Map<Character, Long>) elsewhere, you can build distinctFrequencies on top of it instead:
private List<Long> distinctFrequencies(String line) {
return countChars(line).values()
.stream()
.distinct()
.collect(Collectors.toList());
}

You can check whether every char in a string has the same occurrence count using the Stream API like this:
boolean valid = "aabbccded".chars()
.boxed()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.values().stream()
.reduce((a, b) -> a.equals(b) ? a : -1L) // compare Long values with equals(), not ==
.map(v -> v > 0)
.orElse(true); // an empty string has no differing counts
EDIT:
After reading the comments, I believe I now understand the requirement:
a string is considered valid if all chars in it have the same occurrence count, like aabb,
or if there is a single extra character, like abb.
The string abcabcab is invalid, as it has 3 a, 3 b and 2 c, and thus 1 extra a and 1 extra b, which is too much.
Hence you can't perform the validation with a list of distinct frequencies alone; you also need to know how many chars have each frequency, i.e. a Map from frequency to number of chars.
Here is a new attempt:
TreeMap<Long, Long> map = "abcabcab".chars()
.boxed()
.collect(groupingBy(Function.identity(), counting()))
.values().stream()
.collect(groupingBy(Function.identity(), TreeMap::new, counting()));
boolean valid = map.size() == 1 || // there is only a single char length
( map.size() == 2 && // there are two and there is only 1 extra char
((map.lastKey() - map.firstKey()) * map.lastEntry().getValue() <= 1));
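As a sketch, the same frequency-of-frequencies check can be wrapped into a reusable method. The class name FrequencyValidator and the method name isValid are hypothetical, and the rule encoded is the one described in this answer: at most two distinct counts, where the higher count exceeds the lower one by exactly one and only one char has the higher count.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class FrequencyValidator {

    // Valid if every char occurs equally often, or if exactly one char
    // occurs exactly once more than all the others.
    static boolean isValid(String line) {
        // char -> how often it occurs, e.g. "abcabcab" -> {a=3, b=3, c=2}
        Map<Integer, Long> charCounts = line.chars()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // count -> how many chars have that count, sorted by count,
        // e.g. {a=3, b=3, c=2} -> {2=1, 3=2}
        TreeMap<Long, Long> countOfCounts = charCounts.values().stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));

        if (countOfCounts.size() == 1) {
            return true; // all chars occur equally often
        }
        if (countOfCounts.size() != 2) {
            return false; // empty string, or three or more distinct counts
        }
        long gap = countOfCounts.lastKey() - countOfCounts.firstKey();
        long charsWithHigherCount = countOfCounts.lastEntry().getValue();
        // only valid if the gap is 1 and a single char has the higher count
        return gap * charsWithHigherCount <= 1;
    }
}
```

For example, isValid("abcabcab") returns false, because two chars share the higher count 3. As in the snippet above, an empty string yields false.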
The whole validation could be done in a single statement using Collectors.collectingAndThen, which @Nikolas used in his answer, or with a reduction:
boolean valid = "aabcc".chars()
.boxed()
.collect(groupingBy(Function.identity(), counting())) // char -> count
.values().stream()
.collect(groupingBy(Function.identity(), TreeMap::new, counting())) // count -> number of chars with that count
.entrySet().stream()
.reduce((min, high) -> {
// min.getKey() is the minimum char count,
// high.getKey() is a higher char count,
// high.getValue() is how many chars have that higher count;
// the product stored below is therefore always negative
min.setValue((min.getKey() - high.getKey()) * high.getValue());
return min;
})
.map(min -> min.getValue() >= -1)
.get();

Use Collectors.collectingAndThen, a collector that wraps a downstream Collector and applies a finisher Function to its result.
Use Collectors.groupingBy and Collectors.counting to get the frequency of each character in the String:
// Results in Map<Integer, Long>
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
Use the finisher map -> new HashSet<>(map.values()).size() == 1 to check whether all frequencies are equal: if they are, there is exactly one distinct value.
Wrapping these two in Collectors.collectingAndThen looks like this:
String line = "aabbccdeed";
boolean isValid = line.chars() // IntStream of characters
.boxed() // boxed as Stream<Integer>
.collect(Collectors.collectingAndThen( // finisher's result type
Collectors.groupingBy( // grouped Map<Integer, Long>
Function.identity(), // ... of each character
Collectors.counting()), // ... frequency
map -> new HashSet<>(map.values()).size() == 1 // checks the frequencies
));
// aabbccded -> false
// aabbccdeed -> true

You can do it like this:
first, count the occurrences of every character;
then find the minimum occurrence count;
finally, sum the differences between each count and that minimum (minValue): the string is valid if this sum is less than or equal to one.
public static boolean validate(String line) {
Map<Character, Long> map = line.chars()
.mapToObj(c -> (char) c)
.collect(groupingBy(Function.identity(), Collectors.counting()));
long minValue = map.values().stream().min(Long::compareTo).orElse(0L);
return map.values().stream().mapToLong(a -> Math.abs(a - minValue)).sum() <= 1;
}
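The two passes over the counts can be folded into one with LongSummaryStatistics, because the sum of (count - min) over all chars equals sum - min * (number of distinct chars). This is a minimal sketch of that variant (MinDifferenceValidator is a hypothetical name); it encodes the same rule as the validate method above, not a different one.

```java
import java.util.LongSummaryStatistics;
import java.util.function.Function;
import java.util.stream.Collectors;

class MinDifferenceValidator {

    // Valid if the counts exceed the minimum count by at most one in total.
    static boolean validate(String line) {
        // one pass over the per-char counts collects count, sum and min
        LongSummaryStatistics stats = line.chars()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
        if (stats.getCount() == 0) {
            return true; // empty string, consistent with orElse(0L) above
        }
        // sum of (count - min) == sum - min * number of distinct chars
        return stats.getSum() - stats.getMin() * stats.getCount() <= 1;
    }
}
```

For abcabcab the counts are 3, 3 and 2, so 8 - 2 * 3 = 2 and the string is rejected.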

Related

Java Cannot convert from String to Int [duplicate]

public static int construction(String myString) {
Set<Character> set = new HashSet<>();
int count = myString.chars() // returns IntStream
.mapToObj(c -> (char)c) // Stream<Character> why is this required?
.mapToInt(c -> (set.add(c) == true ? 1 : 0)) // IntStream
.sum();
return count;
}
The above code will not compile without:
.mapToObj(c -> (char)c)
// <Character> Stream<Character> java.util.stream.IntStream.mapToObj(IntFunction<? extends Character> mapper)
If I remove it, I get the following error:
The method mapToInt((<no type> c) -> {}) is undefined for the type IntStream
Can someone explain this? It seems like I am starting with an IntStream, converting it to a Stream of Characters and then back to an IntStream.
The method CharSequence::chars returns an IntStream, which doesn't provide a method converting to int such as mapToInt (its elements already are ints), but it does provide mapToObj. Therefore use IntStream::map(IntUnaryOperator mapper), which both takes and returns an int, since IntUnaryOperator does the same job as Function<Integer, Integer> or UnaryOperator<Integer>:
int count = myString.chars() // IntStream
.map(c -> (set.add((char) c) ? 1 : 0)) // IntStream
.sum();
long count = myString.chars() // IntStream
.filter(c -> set.add((char) c)) // IntStream
.count();
Also, using Set<Integer> helps you to avoid conversion to a Character:
Set<Integer> set = new HashSet<>();
int count = myString.chars() // IntStream
.map(c -> (set.add(c) ? 1 : 0)) // IntStream
.sum();
long count = myString.chars() // IntStream
.filter(set::add) // IntStream
.count();
However, regardless of what you are trying to achieve, your code is wrong in principle. See "Stateless behaviors" in the java.util.stream package documentation:
Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful.
Consider the following snippet instead, whose result does not depend on a stateful operation such as Set::add:
long count = myString.chars() // IntStream
.distinct() // IntStream
.count();
You can also collect to a set and then take the size without using an explicit map.
It does not require using external state to contain the characters.
long count = str.chars().boxed().collect(Collectors.toSet()).size();
But IMHO the more direct approach already mentioned is cleaner and the one I would prefer:
long count = str.chars().distinct().count();
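Both stateless versions can be packaged as small helpers. The following sketch (class and method names are hypothetical) also shows a codePoints() variant, because chars() yields UTF-16 code units, so a character outside the Basic Multilingual Plane, such as an emoji, is counted as two surrogate chars.

```java
class DistinctChars {

    // Counts distinct UTF-16 chars without any external mutable state.
    static long countDistinctChars(String s) {
        return s.chars().distinct().count();
    }

    // Counts distinct Unicode code points, so a surrogate pair counts once.
    static long countDistinctCodePoints(String s) {
        return s.codePoints().distinct().count();
    }
}
```

For "hello", countDistinctChars returns 4.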
Because String.chars() already returns an IntStream, and IntStream has no mapToInt method.
You could use a filter and then count instead. As pointed out in the comments, count() returns a long, and c must be cast to char before it can be added to a Set<Character>:
long count = myString.chars()
.filter(c -> set.add((char) c))
.count();

Concatenating repeating String n times to themself basing on value from Collectors.groupingBy(K, V)

Let's say I have a method that finds the possible substrings of a String and returns one that occurs more than once; otherwise it returns -1. For example, for abcdabcd it returns abcd. My current solution is pretty close to ideal, but instead of abcd I want to return the concatenated occurrences, abcdabcd, based on the Collectors.groupingBy value, because according to the Key : Value pairs "abcd" occurred twice.
public static String StringPeriods(String str) {
List<String> substrings = new ArrayList<>();
for (int i = 0; i < str.length(); i++) {
for (int j = i + 1; j <= str.length(); j++) {
substrings.add(str.substring(i, j));
}
}
return substrings.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet()
.stream()
.filter(stringLongEntry -> stringLongEntry.getValue() > 1)
.map(Entry::getKey)
.//magic here
.findFirst()
.orElse("-1");
}
Moreover, I would like to avoid reopening the stream, preferring concat() or another simple solution. I would be grateful for suggestions on how to reach this goal.
You can solve your problem in multiple steps:
// Step 1: group by counting
Map<String, Long> grouping = substrings.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
// Step 2: find the max value
Long maxValue = grouping.entrySet().stream()
.max(Map.Entry.comparingByValue())
.get()
.getValue();
// Step 3: keep the entries that have the max value, take the one with the longest key,
// and finally repeat that String maxValue times
return grouping.entrySet()
.stream()
.filter(entry -> entry.getValue().equals(maxValue)) // compare Long values with equals(), not ==
.max(Map.Entry.comparingByKey(Comparator.comparingInt(String::length)))
.map(entry -> entry.getKey().repeat(maxValue.intValue()))
.get();
I don't see why you use orElse("-1") here: without the filter, every non-empty string yields at least one entry, so you can just use get() at the end. If you want to handle empty strings, check for them at the start of your method:
if (str.isEmpty()) {
return "-1";
}
Note: I used String.repeat, which exists since Java 11; if you are using an older version, there are many other ways to repeat a string.
Or, as @Holger mentioned, you can do it in one shot:
return substrings.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet()
.stream()
.max(Map.Entry.<String, Long>comparingByValue().thenComparingInt(e -> e.getKey().length()))
.map(entry -> entry.getKey().repeat(entry.getValue().intValue()))
.get();
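Combined with the empty-string guard above, a self-contained version of the one-shot approach might look like this sketch. It assumes Java 11+ for String.repeat; the class StringPeriods and the lower-case method name stringPeriods are only naming choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StringPeriods {

    static String stringPeriods(String str) {
        if (str.isEmpty()) {
            return "-1";
        }
        // all substrings, including repeated ones
        List<String> substrings = new ArrayList<>();
        for (int i = 0; i < str.length(); i++) {
            for (int j = i + 1; j <= str.length(); j++) {
                substrings.add(str.substring(i, j));
            }
        }
        return substrings.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                // highest count first, then the longest substring among equal counts
                .max(Map.Entry.<String, Long>comparingByValue()
                        .thenComparingInt(e -> e.getKey().length()))
                // repeat the winner as often as it occurred (String.repeat is Java 11+)
                .map(e -> e.getKey().repeat(e.getValue().intValue()))
                .get(); // safe: a non-empty string has at least one substring
    }
}
```

Be aware that this returns the whole input repeated once when no substring occurs more than once (e.g. "abc" yields "abc"), not "-1". If you need "-1" in that case, add a filter(e -> e.getValue() > 1) step before max and end with orElse("-1") instead of get().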

Proper usage of Streams in Java

I've a use-case where I need to parse key-value pairs (separated by =) and put these key-value pairs in a LinkedHashMap.
I want to ignore the following kinds of Strings:
key is empty or contains only spaces
value is empty or contains only spaces
those Strings which don't contain a =.
I have solved it both in imperative style and with streams.
The following are the 2 variants:
Solution in imperative style (a for loop and lots of ifs):
public static Map<String, String> getMap1(String[] array) {
Map<String, String> map = new LinkedHashMap<>();
for (int i = 0; i < array.length; i++) {
String currentString = array[i];
int index = currentString.indexOf('=');
// ignoring strings that don't contain '='
if (index == -1) continue;
String key = currentString.substring(0, index).trim();
String value = currentString.substring(index + 1).trim();
// ignoring strings with empty key or value
if (key.length() == 0 || value.length() == 0) continue;
map.put(key, value);
}
return map;
}
Solution using Streams (pretty clean code):
public static Map<String, String> getMap(String[] array) {
return Arrays.stream(array)
.filter(s -> s.indexOf('=') != -1) // ignore strings that don't contain '='
.filter(s -> s.substring(0, s.indexOf('=')).trim().length() != 0) // key should be present
.filter(s -> s.substring(s.indexOf('=') + 1).trim().length() != 0) // value should be present
.collect(Collectors.toMap(
s -> s.substring(0, s.indexOf('=')).trim(),
s -> s.substring(s.indexOf('=') + 1).trim(),
(first, second) -> second,
LinkedHashMap::new));
}
I'm worried because in the stream version I call indexOf several times per string (and for big strings I end up recomputing the same thing again and again).
Is there a way to avoid recomputing indexOf while keeping the code clean? (I know clean code is very subjective, but I don't want to open multiple streams, or loop through the original string array pre-computing the indices of = and then reuse them.)
Combining the filters into a single filter seems to be an option, but that would make my predicate pretty ugly.
(This is a result of my idle musing where I wish to learn/improve).
What about this:
String[] array = {"aaa2=asdas","aaa=asdasd"};
LinkedHashMap<String, String> aaa = Arrays.stream(array)
.map(s -> s.split("=", 2))
.filter(s -> s.length == 2) // ignore strings that don't contain '='
.map(s -> new String[] {s[0].trim(), s[1].trim()}) // trim key and value instead of mutating in peek
.filter(s -> s[0].length() != 0) // key should be present
.filter(s -> s[1].length() != 0) // value should be present
.collect(Collectors.toMap(
s -> s[0],
s -> s[1],
(first, second) -> second,
LinkedHashMap::new));
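I'd use split instead of indexOf, and StringUtils.isNotBlank from Apache Commons Lang to check that your keys and values are not blank. A split limit of 2 keeps values that themselves contain =, and the merge function with LinkedHashMap::new preserves insertion order and tolerates duplicate keys:
public static Map<String, String> getMap(String[] array) {
return Arrays.stream(array)
.map(s -> s.split("=", 2))
.filter(s -> s.length == 2 && isNotBlank(s[0]) && isNotBlank(s[1]))
.collect(Collectors.toMap(
s -> s[0].trim(),
s -> s[1].trim(),
(first, second) -> second,
LinkedHashMap::new));
}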
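If you would rather keep indexOf, a single flatMap step can compute it once per string and emit either zero or one trimmed key/value entry. This is a sketch under the question's rules (ignore strings without =, blank keys or blank values; later duplicates win; insertion order kept); the class name KeyValueParser is hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class KeyValueParser {

    static Map<String, String> getMap(String[] array) {
        return Arrays.stream(array)
                // indexOf is called exactly once per input string
                .flatMap(s -> {
                    int index = s.indexOf('=');
                    if (index == -1) {
                        return Stream.<Map.Entry<String, String>>empty(); // no '=': skip
                    }
                    return Stream.<Map.Entry<String, String>>of(new SimpleImmutableEntry<>(
                            s.substring(0, index).trim(),
                            s.substring(index + 1).trim()));
                })
                // drop blank keys or values (they are already trimmed)
                .filter(e -> !e.getKey().isEmpty() && !e.getValue().isEmpty())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> second, // later duplicates win, as in getMap1
                        LinkedHashMap::new));
    }
}
```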

Java 8 Stream to determine a maximum count in a text file

For my assignment I have to replace for loops that count the frequency of words in a text document with streams, and I am having trouble figuring out the TODO part.
String filename = "SophieSallyJack.txt";
if (args.length == 1) {
filename = args[0];
}
Map<String, Integer> wordFrequency = new TreeMap<>();
List<String> incoming = Utilities.readAFile(filename);
wordFrequency = incoming.stream()
.map(String::toLowerCase)
.filter(word -> !word.trim().isEmpty())
.collect(Collectors.toMap(word -> word, word -> 1, (a, b) -> a + b, TreeMap::new));
int maxCnt = 0;
// TODO add a single statement that uses streams to determine maxCnt
for (String word : incoming) {
Integer cnt = wordFrequency.get(word);
if (cnt != null) {
if (cnt > maxCnt) {
maxCnt = cnt;
}
}
}
System.out.print("Words that appear " + maxCnt + " times:");
I have tried this:
wordFrequency = incoming.parallelStream().
collect(Collectors.toConcurrentMap(w -> w, w -> 1, Integer::sum));
But that is not right and I'm not sure how to incorporate maxCnt into the stream.
Assuming you have all the words extracted from a file in a List<String>, the count of each word can be computed like this:
Map<String, Long> wordToCountMap = words.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
The most frequent word can then be computed from the above map like so:
Entry<String, Long> mostFrequentWord = wordToCountMap.entrySet().stream()
.max(Map.Entry.comparingByValue())
.orElse(new AbstractMap.SimpleEntry<>("Invalid", 0L));
You may chain the two pipelines together if you wish:
Entry<String, Long> mostFrequentWord = words.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.max(Map.Entry.comparingByValue())
.orElse(new AbstractMap.SimpleEntry<>("Invalid", 0L));
Update
As per the discussion in the comments, it is better to return an Optional from your computation:
Optional<Entry<String, Long>> mostFrequentWord = words.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.max(Map.Entry.comparingByValue());
Well, you have done almost everything you needed with that TreeMap. Be aware, though, that a TreeMap is sorted by its keys, here the words, so its lastEntry method returns the alphabetically last word, not the one with the highest frequency; you still need to find the maximum among the values.
Sorting is also unnecessary here: TreeMap keeps its keys sorted on every insert, which costs O(log n) per insertion and O(n log n) in total, while inserting into a HashMap is O(1) on average, O(n) in total.
So instead of that TreeMap, use a HashMap:
wordFrequency = incoming.stream()
.map(String::toLowerCase)
.filter(word -> !word.trim().isEmpty())
.collect(Collectors.toMap(
Function.identity(),
word -> 1,
(a, b) -> a + b,
HashMap::new));
Once you have this Map, you need to find the max. This operation is O(n) and can be done with or without a stream:
Collections.max(wordFrequency.entrySet(), Map.Entry.comparingByValue())
This approach gives you O(n) for the HashMap inserts and O(n) for finding the max, so O(n) overall, which is faster than the O(n log n) TreeMap approach.
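A minimal sketch of the HashMap approach together with the max lookup, assuming the Map<String, Integer> from the question; MaxWordCount and its method names are hypothetical. It guards the empty case because Collections.max throws NoSuchElementException on an empty collection.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MaxWordCount {

    // word -> number of occurrences, case-insensitive, ignoring blank entries
    static Map<String, Integer> countWords(List<String> incoming) {
        return incoming.stream()
                .map(String::toLowerCase)
                .filter(word -> !word.trim().isEmpty())
                .collect(Collectors.toMap(Function.identity(), word -> 1, Integer::sum, HashMap::new));
    }

    // highest count in the map, or 0 if the map is empty
    static int maxCount(Map<String, Integer> wordFrequency) {
        if (wordFrequency.isEmpty()) {
            return 0; // Collections.max would throw NoSuchElementException
        }
        return Collections.max(wordFrequency.entrySet(), Map.Entry.comparingByValue()).getValue();
    }
}
```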
Ok, first of all, your wordFrequency line can make use of Collectors#groupingBy and Collectors#counting instead of writing your own accumulator:
List<String> incoming = Arrays.asList("monkey", "dog", "MONKEY", "DOG", "giraffe", "giraffe", "giraffe", "Monkey");
Map<String, Long> wordFrequency = incoming.stream()
.filter(word -> !word.trim().isEmpty()) // filter first, so we don't lowercase empty strings
.map(String::toLowerCase)
.collect(Collectors.groupingBy(s -> s, Collectors.counting()));
Now that we got that out of the way... Your TODO line says use streams to determine maxCnt. You can do that easily by using max with naturalOrder:
int maxCnt = wordFrequency.values()
.stream()
.max(Comparator.naturalOrder())
.orElse(0L)
.intValue();
However, your comments make me think that what you actually want is a one-liner to print the most frequent words (all of them), i.e. the words that have maxCnt as value in wordFrequency. So what we need is to "reverse" the map, grouping the words by count, and then pick the entry with highest count:
wordFrequency.entrySet().stream() // {monkey=3, dog=2, giraffe=3}
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toList()))).entrySet().stream() // reverse map: {3=[monkey, giraffe], 2=[dog]}
.max(Comparator.comparingLong(Map.Entry::getKey)) // maxCnt and all words with it: 3=[monkey, giraffe]
.ifPresent(e -> {
System.out.println("Words that appear " + e.getKey() + " times: " + e.getValue());
});
This solution prints all the words with maxCnt, instead of just one:
Words that appear 3 times: [monkey, giraffe].
Of course, you can concatenate the statements to get one big do-it-all statement, like this:
incoming.stream() // [monkey, dog, MONKEY, DOG, giraffe, giraffe, giraffe, Monkey]
.filter(word -> !word.trim().isEmpty()) // filter first, so we don't lowercase empty strings
.map(String::toLowerCase)
.collect(groupingBy(s -> s, counting())).entrySet().stream() // {monkey=3, dog=2, giraffe=3}
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toList()))).entrySet().stream() // reverse map: {3=[monkey, giraffe], 2=[dog]}
.max(Comparator.comparingLong(Map.Entry::getKey)) // maxCnt and all words with it: 3=[monkey, giraffe]
.ifPresent(e -> {
System.out.println("Words that appear " + e.getKey() + " times: " + e.getValue());
});
But now we're stretching the meaning of "one statement" :)
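Because the reversed map is keyed by count, collecting it into a TreeMap lets lastEntry() pick the highest count directly, without a separate max step. A sketch of that variant follows (MostFrequentWords is a hypothetical name); the order of the words inside the list depends on the iteration order of the intermediate map, so don't rely on it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class MostFrequentWords {

    // Returns (highest count -> words with that count), or null if there are no words.
    static Map.Entry<Long, List<String>> mostFrequent(List<String> incoming) {
        TreeMap<Long, List<String>> byCount = incoming.stream()
                .filter(word -> !word.trim().isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .collect(Collectors.groupingBy(
                        (Map.Entry<String, Long> e) -> e.getValue(), // key: the count
                        TreeMap::new,                                // sorted by count
                        Collectors.mapping((Map.Entry<String, Long> e) -> e.getKey(),
                                Collectors.toList())));
        return byCount.lastEntry(); // highest count, or null for an empty map
    }
}
```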
By piecing the information together, I was able to replace the for loop with:
int maxCnt = wordFrequency.values().stream().max(Comparator.naturalOrder()).get();
System.out.print("Words that appear " + maxCnt + " times:");
I appreciate all the help.

split string and store it into HashMap java 8

I want to split the string below and store it in a HashMap.
String responseString = "name~peter-add~mumbai-md~v-refNo~";
First I split the string on the hyphen (-) delimiter and store the tokens in an ArrayList:
public static List<String> getTokenizeString(String delimitedString, char separator) {
final Splitter splitter = Splitter.on(separator).trimResults();
final Iterable<String> tokens = splitter.split(delimitedString);
final List<String> tokenList = new ArrayList<String>();
for(String token: tokens){
tokenList.add(token);
}
return tokenList;
}
List<String> list = MyClass.getTokenizeString(responseString, '-');
Then I use the code below to convert it to a HashMap with a stream:
Map<String, String> map = list.stream()
.collect(Collectors.toMap(k -> k.split("~")[0], v -> v.split("~")[1]));
The collector doesn't work because there is no value for refNo: "refNo~".split("~") returns just ["refNo"], since trailing empty strings are removed, so v.split("~")[1] throws an ArrayIndexOutOfBoundsException.
It works correctly if I have an even number of elements in the ArrayList.
Is there any way to handle this? Also, how can I do both tasks with a Java 8 stream (I don't want to use the getTokenizeString() method)?
Unless Splitter is doing any magic, the getTokenizeString method is obsolete here. You can perform the entire processing as a single operation:
Map<String,String> map = Pattern.compile("\\s*-\\s*")
.splitAsStream(responseString.trim())
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length>1? a[1]: ""));
By using the regular expression \s*-\s* as separator, you are considering white-space as part of the separator, hence implicitly trimming the entries. There’s only one initial trim operation before processing the entries, to ensure that there is no white-space before the first or after the last entry.
Then, simply split the entries in a map step before collecting into a Map.
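A self-contained sketch of this approach, assuming the response format from the question. The class name ResponseParser and the ENTRY_SEPARATOR constant are my own choices; the constant means the pattern is compiled once rather than on every call. As with any toMap without a merge function, a duplicate key would throw an IllegalStateException.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ResponseParser {

    // hyphen separator, swallowing any surrounding white-space
    private static final Pattern ENTRY_SEPARATOR = Pattern.compile("\\s*-\\s*");

    static Map<String, String> parse(String responseString) {
        return ENTRY_SEPARATOR.splitAsStream(responseString.trim())
                .map(s -> s.split("~", 2)) // limit 2 keeps a trailing empty value
                .collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
    }
}
```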
First of all, you don't have to split the same String twice.
Second of all, check the length of the array to determine if a value is present for a given key.
Map<String, String> map =
list.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
This assumes you want to put the key with an empty string as its value if a key has no corresponding value.
Or you can skip the list variable:
Map<String, String> map1 =
MyClass.getTokenizeString(responseString, '-')
.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
private final String dataSheet = "103343262,6478342944, 103426540,84528784843, 103278808,263716791426, 103426733,27736529279, "
+ "103426000,27718159078, 103218982,19855201547, 103427376,27717278645, "
+ "103243034,81667273413";
final int chunk = 2;
AtomicInteger counter = new AtomicInteger();
Map<String, String> pairs = Arrays.stream(dataSheet.split(","))
.map(String::trim)
.collect(Collectors.groupingBy(i -> counter.getAndIncrement() / chunk))
.values()
.stream()
.collect(toMap(k -> k.get(0), v -> v.get(1)));
result:
pairs =
"103218982" -> "19855201547"
"103278808" -> "263716791426"
"103243034" -> "81667273413"
"103426733" -> "27736529279"
"103426540" -> "84528784843"
"103427376" -> "27717278645"
"103426000" -> "27718159078"
"103343262" -> "6478342944"
We need to group every 2 elements into a key-value pair, so we partition the list into chunks of 2: counter.getAndIncrement() / 2 yields the same number for every 2 consecutive calls (this relies on the stream being sequential), e.g. with a fresh counter:
IntStream.range(0,6).forEach((i)->System.out.println(counter.getAndIncrement()/2));
prints:
0
0
1
1
2
2
You may use the same idea to partition a list into chunks.
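The shared AtomicInteger only works because the stream is sequential; a stateless alternative is to pair tokens by index. The following sketch (PairParser and toPairs are hypothetical names) uses IntStream.range over the pair indices and keeps insertion order with a LinkedHashMap; a trailing token without a partner is ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PairParser {

    // Pairs up consecutive tokens: tokens[0] -> tokens[1], tokens[2] -> tokens[3], ...
    static Map<String, String> toPairs(String dataSheet) {
        String[] tokens = Arrays.stream(dataSheet.split(","))
                .map(String::trim)
                .toArray(String[]::new);
        // index i addresses the i-th pair, so no external mutable counter is needed
        return IntStream.range(0, tokens.length / 2)
                .boxed()
                .collect(Collectors.toMap(
                        i -> tokens[2 * i],
                        i -> tokens[2 * i + 1],
                        (first, second) -> second,
                        LinkedHashMap::new));
    }
}
```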
Another short way to do it:
String responseString = "name~peter-add~mumbai-md~v-refNo~";
Map<String, String> collect = Arrays.stream(responseString.split("-"))
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
System.out.println(collect);
First you split the String on -, then map each part with s.split("~", 2) to create a Stream<String[]> like [name, peter], [add, mumbai], [md, v], [refNo, ] (the limit 2 keeps the trailing empty value), and finally collect it with toMap, where a[0] becomes the key and a[1] the value.
