Get all values if Keys contain same substring in Java HashMap - java

I am trying to get all Values from Keys that contain the same substring. For example:
If a Key's string is 'AAABBB'
and another Key's string is 'XXXBBB'
I want to get the Value from both of those Keys. (Since BBB matches)
Each half of a key is 3 characters long: the prefix is substring(0, 3) and the suffix is substring(3, 6), and only these halves are compared.
For example: AAABBB
(AAA is the prefix and BBB is the suffix.)
(The relationship AABBBA is ignored because AAB and BBA do not match.)
I'm trying to avoid using nested for loops because my algorithm will run very slow at O(N^2). I'm working on finding a solution with 1 HashMap and 1 for loop.
Map<String, Integer> map = new HashMap<>();
map.put("AAABBB", 1);
map.put("CCCPPP", 2);
map.put("XXXBBB", 3);
map.put("AAAOOO", 4);
for (Entry<String, Integer> entry : map.entrySet()) {
    String prefix = entry.getKey().substring(0, 3);
    String suffix = entry.getKey().substring(3, 6);
    // "ANY_SUBSTRING" is a placeholder for "any 3-character prefix"; this is the part I don't know how to write
    if (map.containsKey("ANY_SUBSTRING" + suffix)) {
        System.out.println(map.get("ANY_SUBSTRING" + suffix));
    }
}
Output: (1,3)
AAABBB => 1
XXXBBB => 3

I have the following approach with streams:
Define a function to extract the suffix or prefix of each key of your map.
Stream your map's entry set and group by prefix/suffix.
Filter out the groups whose prefix/suffix is not shared by at least two keys.
Using your example map and assuming each key has length 6:
Map<String,Integer> map = new HashMap<>();
map.put("AAABBB", 1);
map.put("CCCPPP", 2);
map.put("XXXBBB", 3);
map.put("AAAOOO",4);
Function<Entry<String, Integer>,String> prefix = e -> e.getKey().substring(0,3);
Function<Entry<String, Integer>,String> suffix = e -> e.getKey().substring(3);
Map<String,List<Integer>> resultBySuffix =
map.entrySet().stream()
.collect(Collectors.groupingBy( suffix ,
Collectors.mapping(Entry::getValue, Collectors.toList())
)).entrySet().stream()
.filter(e -> e.getValue().size() > 1)
.collect(Collectors.toMap(Entry::getKey, Entry::getValue));
System.out.println(resultBySuffix);
Map<String,List<Integer>> resultByPrefix =
map.entrySet().stream()
.collect(Collectors.groupingBy( prefix ,
Collectors.mapping(Entry::getValue, Collectors.toList())
)).entrySet().stream()
.filter(e -> e.getValue().size() > 1)
.collect(Collectors.toMap(Entry::getKey, Entry::getValue));
System.out.println(resultByPrefix);
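I haven't measured the time complexity of the above, but each pipeline makes a single pass over the entries and groups them with a hash map, so it should be roughly O(N). I think you can see what is going on (in terms of readability).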
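For comparison, the same grouping idea can be sketched with one HashMap and one loop over the entries, which is closer to what the question asks for. This is only a sketch under the question's assumption that every key is exactly 6 characters long; the class name SuffixGroups and the method name sharedSuffixes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SuffixGroups {

    // Groups the values of `map` by the last three characters of each key
    // and keeps only the groups that contain at least two values.
    static Map<String, List<Integer>> sharedSuffixes(Map<String, Integer> map) {
        Map<String, List<Integer>> bySuffix = new HashMap<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            String suffix = e.getKey().substring(3, 6); // assumes 6-character keys
            bySuffix.computeIfAbsent(suffix, k -> new ArrayList<>()).add(e.getValue());
        }
        // Drop suffixes that occur only once; this is a pass over the groups, not over pairs of keys.
        bySuffix.values().removeIf(values -> values.size() < 2);
        return bySuffix;
    }

    public static void main(String[] args) {
        // A TreeMap is used here only so that the iteration order is predictable.
        Map<String, Integer> map = new TreeMap<>();
        map.put("AAABBB", 1);
        map.put("CCCPPP", 2);
        map.put("XXXBBB", 3);
        map.put("AAAOOO", 4);
        System.out.println(sharedSuffixes(map)); // prints {BBB=[1, 3]}
    }
}
```

Grouping by prefix works the same way with substring(0, 3). Each key is visited once, so the work grows linearly with the number of keys instead of quadratically.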

Related

Split String into Map using Java Streams

I want to split the following String and store it into a Map.
String = "key_a:<value_a1>\r\n\r\nkey_b:<value_b1>\r\n\r\nkey_c:<value_c1, value_c2, value_c3>"
The string can have line breaks in between the pairs. A key can have multiple values that are separated by a , and begin with a < and end with a >.
Now this String needs to be converted to a Map<String, List<String>>.
The structure of the map should look like this:
key_a={value_a1},
key_b={value_b1},
key_c={value_c1, value_c2, value_c3}
I currently only have the logic for splitting apart the different key-value-pairs from each other, but I don't know how to implement the logic that splits the values apart from each other, removes the brackets and maps the attributes.
String strBody = "key_a:<value_a1>\r\n\r\nkey_b:<value_b1>\r\n\r\nkey_c:<value_c1, value_c2, value_c3>";
Map<String, List<String>> map = Pattern.compile("\\r?\\n")
    .splitAsStream(strBody)
    .map(s -> s.split(":"))
    // ...logic for splitting values apart from each other, removing <> brackets and storing it in the map
You can filter for the arrays that have exactly two elements and then use Collectors.groupingBy together with Collectors.mapping to group the elements into a Map:
Map<String, List<String>> map = Pattern.compile("\\r?\\n")
.splitAsStream(strBody)
.map(s -> s.split(":"))
.filter(arr -> arr.length == 2)
.collect(Collectors.groupingBy(arr -> arr[0],
Collectors.mapping(arr -> arr[1].replaceAll("[<>]", ""),
Collectors.toList())));
An additional approach which also splits the list of values:
Map<String,List<String>> result =
Pattern.compile("[\\r\\n]+")
.splitAsStream(strBody)
.map(s -> s.split(":"))
.map(arr -> new AbstractMap.SimpleEntry<>(
arr[0],
Arrays.asList(arr[1].replaceAll("[<>]", "").split("\\s*,\\s*"))))
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
Your input uses two consecutive \r\n sequences to separate the entries, so you need to split on the double line break; otherwise you get empty entries, which you then have to filter out.
I'd remove the angle brackets from the string before processing it in the stream.
After that, only the collection step remains.
Map<String, String> map = Pattern.compile("\\r?\\n\\r?\\n")
.splitAsStream(strBody.replaceAll("[<>]",""))
.map(s -> s.split(":"))
.collect(Collectors.toMap(e -> e[0], e-> e[1]));
Try this.
String strBody = "key_a:<value_a1>\r\n\r\nkey_b:<value_b1>\r\n\r\nkey_c:<value_c1, value_c2, value_c3>";
Map<String, List<String>> result = Arrays.stream(strBody.split("\\R\\R"))
.map(e -> e.split(":", 2))
.collect(Collectors.toMap(a -> a[0],
a -> List.of(a[1].replaceAll("^<|>$", "").split("\\s*,\\s*"))));
System.out.println(result);
output
{key_c=[value_c1, value_c2, value_c3], key_b=[value_b1], key_a=[value_a1]}
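The output above comes back in HashMap order rather than input order. If the order of the keys matters, one option is to collect into a LinkedHashMap. The following is a minimal sketch of that variant; the class name BodyParser, the method name parse, and the choice to let a later duplicate key overwrite an earlier one are assumptions made for the example, not something stated in the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BodyParser {

    // One or more line breaks (\r\n or \n) separate the entries.
    private static final Pattern ENTRY_BREAK = Pattern.compile("(\\r?\\n)+");

    static Map<String, List<String>> parse(String body) {
        return ENTRY_BREAK.splitAsStream(body)
            .filter(line -> !line.isEmpty())
            .map(line -> line.split(":", 2))      // limit 2: a ':' inside the values is kept
            .filter(parts -> parts.length == 2)   // skip lines without a ':'
            .collect(Collectors.toMap(
                parts -> parts[0].trim(),
                // strip the surrounding <>, then split the values on commas
                parts -> Arrays.asList(parts[1].replaceAll("^\\s*<|>\\s*$", "").split("\\s*,\\s*")),
                (first, second) -> second,        // assumption: the later duplicate key wins
                LinkedHashMap::new));             // keeps the keys in input order
    }

    public static void main(String[] args) {
        String strBody = "key_a:<value_a1>\r\n\r\nkey_b:<value_b1>\r\n\r\nkey_c:<value_c1, value_c2, value_c3>";
        System.out.println(parse(strBody));
        // prints {key_a=[value_a1], key_b=[value_b1], key_c=[value_c1, value_c2, value_c3]}
    }
}
```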

Java 8 Stream to determine a maximum count in a text file

For my assignment I have to replace for loops with streams that count the frequency of words in a text document, and I am having trouble figuring the TODO part out.
String filename = "SophieSallyJack.txt";
if (args.length == 1) {
filename = args[0];
}
Map<String, Integer> wordFrequency = new TreeMap<>();
List<String> incoming = Utilities.readAFile(filename);
wordFrequency = incoming.stream()
.map(String::toLowerCase)
.filter(word -> !word.trim().isEmpty())
.collect(Collectors.toMap(word -> word, word -> 1, (a, b) -> a + b, TreeMap::new));
int maxCnt = 0;
// TODO add a single statement that uses streams to determine maxCnt
for (String word : incoming) {
Integer cnt = wordFrequency.get(word);
if (cnt != null) {
if (cnt > maxCnt) {
maxCnt = cnt;
}
}
}
System.out.print("Words that appear " + maxCnt + " times:");
I have tried this:
wordFrequency = incoming.parallelStream().
collect(Collectors.toConcurrentMap(w -> w, w -> 1, Integer::sum));
But that is not right and I'm not sure how to incorporate maxCnt into the stream.
Assuming you have all the words extracted from the file in a List<String>, the count for each word can be computed like this:
Map<String, Long> wordToCountMap = words.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
The most frequent word can then be computed from the above map like so:
Entry<String, Long> mostFrequentWord = wordToCountMap.entrySet().stream()
    .max(Map.Entry.comparingByValue())
    .orElse(new AbstractMap.SimpleEntry<>("Invalid", 0L));
You may combine the above two pipelines into one if you wish, like this:
Entry<String, Long> mostFrequentWord = words.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .entrySet().stream()
    .max(Map.Entry.comparingByValue())
    .orElse(new AbstractMap.SimpleEntry<>("Invalid", 0L));
Update
As per the following discussion, it is generally better to return an Optional from your computation instead of a made-up default entry, like so:
Optional<Entry<String, Long>> mostFrequentWord = words.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .entrySet().stream()
    .max(Map.Entry.comparingByValue());
Well, you have done almost everything you need with that TreeMap. Be aware, though, that its lastEntry method will not give you the most frequent word: a TreeMap is ordered by key, so lastEntry returns the alphabetically last word, and you still have to search the values for the maximum.
The TreeMap is also not optimal here, since it keeps the keys sorted on every insert and you don't need sorted data, you need the max. Building the TreeMap is O(n log n) overall, while filling a HashMap is O(n) on average.
So instead of using that TreeMap, all you need to change is to use a HashMap:
wordFrequency = incoming.stream()
.map(String::toLowerCase)
.filter(word -> !word.trim().isEmpty())
.collect(Collectors.toMap(
Function.identity(),
word -> 1,
(a, b) -> a + b,
HashMap::new));
Once you have this Map, you need to find max - this operation is O(n) in general and could be achieved with a stream or without one:
Collections.max(wordFrequency.entrySet(), Map.Entry.comparingByValue())
This approach gives you O(n) for the HashMap inserts and O(n) for finding the max, thus O(n) overall, so it is faster than the TreeMap.
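Put together, the HashMap-then-max idea could look like the sketch below. It uses a small in-memory word list instead of the file from the question, and the class name MaxCount and the method names frequencies and maxCount are hypothetical. The body of maxCount is the kind of single stream statement the TODO asks for.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MaxCount {

    // Counts how often each non-blank word occurs, ignoring case.
    static Map<String, Integer> frequencies(List<String> words) {
        return words.stream()
            .map(String::toLowerCase)
            .filter(word -> !word.trim().isEmpty())
            .collect(Collectors.toMap(Function.identity(), word -> 1, Integer::sum, HashMap::new));
    }

    // The largest count in the map, or 0 if the map is empty.
    static int maxCount(Map<String, Integer> wordFrequency) {
        return wordFrequency.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        List<String> incoming = Arrays.asList(
            "monkey", "dog", "MONKEY", "DOG", "giraffe", "giraffe", "giraffe", "Monkey");
        Map<String, Integer> wordFrequency = frequencies(incoming);
        int maxCnt = maxCount(wordFrequency);
        System.out.println("Words that appear " + maxCnt + " times:"); // prints "Words that appear 3 times:"
    }
}
```

Using orElse(0) instead of get() avoids an exception when the input contains no words at all.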
Ok, first of all, your wordFrequency line can make use of Collectors#groupingBy and Collectors#counting instead of writing your own accumulator:
List<String> incoming = Arrays.asList("monkey", "dog", "MONKEY", "DOG", "giraffe", "giraffe", "giraffe", "Monkey");
wordFrequency = incoming.stream()
.filter(word -> !word.trim().isEmpty()) // filter first, so we don't lowercase empty strings
.map(String::toLowerCase)
.collect(Collectors.groupingBy(s -> s, Collectors.counting()));
Now that we got that out of the way... Your TODO line says use streams to determine maxCnt. You can do that easily by using max with naturalOrder:
int maxCnt = wordFrequency.values()
.stream()
.max(Comparator.naturalOrder())
.orElse(0L)
.intValue();
However, your comments make me think that what you actually want is a one-liner to print the most frequent words (all of them), i.e. the words that have maxCnt as value in wordFrequency. So what we need is to "reverse" the map, grouping the words by count, and then pick the entry with highest count:
wordFrequency.entrySet().stream() // {monkey=3, dog=2, giraffe=3}
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toList()))).entrySet().stream() // reverse map: {3=[monkey, giraffe], 2=[dog]}
.max(Comparator.comparingLong(Map.Entry::getKey)) // maxCnt and all words with it: 3=[monkey, giraffe]
.ifPresent(e -> {
System.out.println("Words that appear " + e.getKey() + " times: " + e.getValue());
});
This solution prints all the words with maxCnt, instead of just one:
Words that appear 3 times: [monkey, giraffe].
Of course, you can concatenate the statements to get one big do-it-all statement, like this:
incoming.stream() // [monkey, dog, MONKEY, DOG, giraffe, giraffe, giraffe, Monkey]
.filter(word -> !word.trim().isEmpty()) // filter first, so we don't lowercase empty strings
.map(String::toLowerCase)
.collect(groupingBy(s -> s, counting())).entrySet().stream() // {monkey=3, dog=2, giraffe=3}
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toList()))).entrySet().stream() // reverse map: {3=[monkey, giraffe], 2=[dog]}
.max(Comparator.comparingLong(Map.Entry::getKey)) // maxCnt and all words with it: 3=[monkey, giraffe]
.ifPresent(e -> {
System.out.println("Words that appear " + e.getKey() + " times: " + e.getValue());
});
But now we're stretching the meaning of "one statement" :)
By piecing together information I was able to successfully replace the for loop with
int maxCnt = wordFrequency.values().stream().max(Comparator.naturalOrder()).get();
System.out.print("Words that appear " + maxCnt + " times:");
I appreciate all the help.

split string and store it into HashMap java 8

I want to split the string below and store it in a HashMap.
String responseString = "name~peter-add~mumbai-md~v-refNo~";
First I split the string on the hyphen delimiter (-) and store the tokens in an ArrayList as below:
public static List<String> getTokenizeString(String delimitedString, char separator) {
final Splitter splitter = Splitter.on(separator).trimResults();
final Iterable<String> tokens = splitter.split(delimitedString);
final List<String> tokenList = new ArrayList<String>();
for(String token: tokens){
tokenList.add(token);
}
return tokenList;
}
List<String> list = MyClass.getTokenizeString(responseString, '-');
and then I use the code below to convert it to a HashMap using a stream.
Map<String, String> map = list.stream()
    .collect(Collectors.toMap(k -> k.split("~")[0], v -> v.split("~")[1]));
The stream collector doesn't work because there is no value for refNo.
It works correctly if I have an even number of elements in the ArrayList.
Is there any way to handle this? Also, please suggest how I can do these two tasks with Java 8 streams (I don't want to use the getTokenizeString() method).
Unless Splitter is doing any magic, the getTokenizeString method is obsolete here. You can perform the entire processing as a single operation:
Map<String,String> map = Pattern.compile("\\s*-\\s*")
.splitAsStream(responseString.trim())
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length>1? a[1]: ""));
By using the regular expression \s*-\s* as separator, you are considering white-space as part of the separator, hence implicitly trimming the entries. There’s only one initial trim operation before processing the entries, to ensure that there is no white-space before the first or after the last entry.
Then, simply split the entries in a map step before collecting into a Map.
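As a runnable illustration of the steps just described, here is a small sketch wrapped in a class. The class name TildePairs and the method name parse are invented for the example. It also shows why the limit of 2 in split("~", 2) is useful: a value that itself contains ~ stays in one piece.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TildePairs {

    // Entries are separated by '-', optionally surrounded by white-space.
    private static final Pattern ENTRY_SEPARATOR = Pattern.compile("\\s*-\\s*");

    static Map<String, String> parse(String response) {
        return ENTRY_SEPARATOR.splitAsStream(response.trim())
            .map(entry -> entry.split("~", 2))   // at most two parts: key and value
            .collect(Collectors.toMap(
                parts -> parts[0],
                parts -> parts.length > 1 ? parts[1] : "")); // missing value becomes ""
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("name~peter-add~mumbai-md~v-refNo~");
        System.out.println(map.get("name"));            // prints peter
        System.out.println(map.get("refNo").isEmpty()); // prints true

        // A '~' inside the value is preserved because of the split limit.
        System.out.println(parse("path~a~b").get("path")); // prints a~b
    }
}
```

Note that Collectors.toMap throws an IllegalStateException when the same key appears twice, so input with repeated keys would need a merge function as a third argument.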
First of all, you don't have to split the same String twice.
Second of all, check the length of the array to determine if a value is present for a given key.
Map<String, String> map =
    list.stream()
        .map(s -> s.split("~"))
        .collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
This assumes you want to store the key with an empty-string value when it has no corresponding value. (Collectors.toMap does not accept null values; it throws a NullPointerException.)
Or you can skip the list variable:
Map<String, String> map1 =
    MyClass.getTokenizeString(responseString, '-')
        .stream()
        .map(s -> s.split("~"))
        .collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
private final String dataSheet = "103343262,6478342944, 103426540,84528784843, 103278808,263716791426, 103426733,27736529279, "
    + "103426000,27718159078, 103218982,19855201547, 103427376,27717278645, "
    + "103243034,81667273413";
final int chunk = 2;
// the counter is shared state, so this only works on a sequential stream
AtomicInteger counter = new AtomicInteger();
Map<String, String> pairs = Arrays.stream(dataSheet.split(","))
    .map(String::trim)
    // elements 0 and 1 go to group 0, elements 2 and 3 to group 1, and so on
    .collect(Collectors.groupingBy(i -> counter.getAndIncrement() / chunk))
    .values()
    .stream()
    .collect(Collectors.toMap(k -> k.get(0), v -> v.get(1)));
result:
pairs =
"103218982" -> "19855201547"
"103278808" -> "263716791426"
"103243034" -> "81667273413"
"103426733" -> "27736529279"
"103426540" -> "84528784843"
"103427376" -> "27717278645"
"103426000" -> "27718159078"
"103343262" -> "6478342944"
We need to group every 2 elements into a key/value pair, so we partition the list into chunks of 2: counter.getAndIncrement() / 2 yields the same number for every two consecutive calls, e.g.:
IntStream.range(0,6).forEach((i)->System.out.println(counter.getAndIncrement()/2));
prints:
0
0
1
1
2
2
You may use the same idea to partition list into chunks.
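A general chunking helper can be sketched without the shared counter by deriving each chunk from its index instead. This is a swapped-in variant of the idea above, not the code from the answer; the class name Chunks and the methods partition and toPairs are hypothetical, and skipping a trailing key that has no value is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Chunks {

    // Splits `items` into consecutive sublists of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        int chunkCount = (items.size() + size - 1) / size; // round up
        return IntStream.range(0, chunkCount)
            .mapToObj(i -> items.subList(i * size, Math.min(items.size(), (i + 1) * size)))
            .collect(Collectors.toList());
    }

    // Turns [k1, v1, k2, v2, ...] into {k1=v1, k2=v2, ...}.
    // Assumption: a trailing key without a value is skipped.
    static Map<String, String> toPairs(List<String> flat) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (List<String> chunk : partition(flat, 2)) {
            if (chunk.size() == 2) {
                pairs.put(chunk.get(0), chunk.get(1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5), 2));
        // prints [[1, 2], [3, 4], [5]]
        System.out.println(toPairs(Arrays.asList("103343262", "6478342944", "103426540", "84528784843")));
        // prints {103343262=6478342944, 103426540=84528784843}
    }
}
```

Because no state is mutated inside the stream, this version does not depend on the stream being sequential.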
Another short way to do :
String responseString = "name~peter-add~mumbai-md~v-refNo~";
Map<String, String> collect = Arrays.stream(responseString.split("-"))
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
System.out.println(collect);
First you split the String on -, then you map each piece with map(s -> s.split("~", 2)) to create a Stream<String[]> like [name, peter][add, mumbai][md, v][refNo, ], and finally you collect it with toMap, where a[0] becomes the key and a[1] becomes the value.

grouping List of Objects and counting using Java collection

Which Java Collection class is better to group the list of objects?
I have a list of messages from users like below:
aaa hi
bbb hello
ccc Gm
aaa Can?
CCC yes
ddd No
From this list of message object I want to count and display aaa(2)+bbb(1)+ccc(2)+ddd(1). Any code help?
You can use Map<String, Integer> where the keys represent the individual strings, and the map value is the counter for each one.
So you can do something like:
// as you want a sorted list of keys, you should use a TreeMap
Map<String, Integer> stringsWithCount = new TreeMap<>();
for (String userInput : str) {
    // wherever your input comes from: turn it into lower case,
    // so that "ccc" and "CCC" go to the same counter
    String item = userInput.toLowerCase();
    if (stringsWithCount.containsKey(item)) {
        stringsWithCount.put(item, stringsWithCount.get(item) + 1);
    } else {
        stringsWithCount.put(item, 1); // the first occurrence counts as 1
    }
}
And then you can iterate the map when done:
for (Entry<String, Integer> entry : stringsWithCount.entrySet()) {
and build your result string.
That was the old-school implementation; if you want to be fancy and surprise your teachers, you can go for the Java 8 lambda/stream solution.
(I wouldn't recommend that unless you really invest the time to completely understand the following solution, as it is untested on my side.)
Arrays.stream(someListOrArrayContainingItems)
.collect(Collectors
.groupingBy(s -> s, TreeMap::new, Collectors.counting()))
.entrySet()
.stream()
.flatMap(e -> Stream.of(e.getKey(), String.valueOf(e.getValue())))
.collect(Collectors.joining())
Putting the pieces together from a couple of the other answers, adapting to your code from the other question and fixing a few trivial errors:
// as you want a sorted list of keys, you should use a TreeMap
Map<String, Integer> stringsWithCount = new TreeMap<>();
for (Message msg : convinfo.messages) {
// where ever your input comes from: turn it into lower case,
// so that "ccc" and "CCC" go for the same counter
String item = msg.userName.toLowerCase();
if (stringsWithCount.containsKey(item)) {
stringsWithCount.put(item, stringsWithCount.get(item) + 1);
} else {
stringsWithCount.put(item, 1);
}
}
String result = stringsWithCount
.entrySet()
.stream()
.map(entry -> entry.getKey() + '(' + entry.getValue() + ')')
.collect(Collectors.joining("+"));
System.out.println(result);
This prints:
aaa(2)+bbb(1)+ccc(2)+ddd(1)
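The containsKey/put branch in the loop can also be collapsed into a single call to Map.merge, which is available since Java 8. The sketch below assumes the user names have already been pulled out of the messages into a plain List<String>; the class name UserCounts and the method name summarize are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class UserCounts {

    // Builds a string such as "aaa(2)+bbb(1)" from a list of user names.
    static String summarize(List<String> userNames) {
        // TreeMap keeps the keys sorted alphabetically.
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : userNames) {
            // merge stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(name.toLowerCase(), 1, Integer::sum);
        }
        return counts.entrySet().stream()
            .map(entry -> entry.getKey() + "(" + entry.getValue() + ")")
            .collect(Collectors.joining("+"));
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("aaa", "bbb", "ccc", "aaa", "CCC", "ddd");
        System.out.println(summarize(users)); // prints aaa(2)+bbb(1)+ccc(2)+ddd(1)
    }
}
```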
You need a Multiset from Guava. That collection type is tailor-made for this kind of task:
Multiset<String> multiset = TreeMultiset.create(); // Multiset is an interface; TreeMultiset keeps the elements sorted
for (String line : lines) { // somehow you read the lines
    multiset.add(line.split(" ")[0].toLowerCase());
}
boolean first = true;
for (Multiset.Entry<String> entry : multiset.entrySet()) {
    if (!first) {
        System.out.print("+");
    }
    first = false;
    System.out.print(entry.getElement() + "(" + entry.getCount() + ")");
}
Assuming that you use Java 8, it could be something like this using the Stream API:
List<Message> messages = ...;
// Convert your list as a Stream
// Extract only the login from the Message Object
// Lowercase the login to be able to group ccc and CCC together
// Group by login using TreeMap::new as supplier to sort the result alphabetically
// Convert each entry into login(count)
// Join with a +
String result =
messages.stream()
.map(Message::getLogin)
.map(String::toLowerCase)
.collect(
Collectors.groupingBy(
Function.identity(), TreeMap::new, Collectors.counting()
)
)
.entrySet()
.stream()
.map(entry -> entry.getKey() + '(' + entry.getValue() + ')')
.collect(Collectors.joining("+"));
System.out.println(result);
Output:
aaa(2)+bbb(1)+ccc(2)+ddd(1)
If you want to group your messages by login and get the result as a collection, you can proceed as follows:
Map<String, List<Message>> groupedMessages =
messages.stream()
.collect(
Collectors.groupingBy(
message -> message.getLogin().toLowerCase(),
TreeMap::new,
Collectors.toList()
)
);

String manipulation in Java 8 Streams

I have a stream of Strings like-
Token1:Token2:Token3
Here ':' is the delimiter character. The Token3 string may itself contain the delimiter character, or it may be absent.
We have to convert this stream into a map with Token1 as the key and an array of two strings as the value: array[0] = Token2 and array[1] = Token3 if Token3 is present, else null.
I have tried something like-
return Arrays.stream(inputArray)
.map( elem -> elem.split(":"))
.filter( elem -> elem.length==2 )
.collect(Collectors.toMap( e-> e[0], e -> {e[1],e[2]}));
But it didn't work. Besides that, it does not handle the case where Token3 is absent or contains the delimiter character.
How can I accomplish it in Java8 lambda expressions?
You can map every input string to the regex Matcher, then leave only those which actually match and collect via toMap collector using Matcher.group() method:
Map<String, String[]> map = Arrays.stream(inputArray)
.map(Pattern.compile("([^:]++):([^:]++):?(.+)?")::matcher)
.filter(Matcher::matches)
.collect(Collectors.toMap(m -> m.group(1), m -> new String[] {m.group(2), m.group(3)}));
Full test:
String[] inputArray = {"Token1:Token2:Token3:other",
"foo:bar:baz:qux", "test:test"};
Map<String, String[]> map = Arrays.stream(inputArray)
.map(Pattern.compile("([^:]++):([^:]++):?(.+)?")::matcher)
.filter(Matcher::matches)
.collect(Collectors.toMap(m -> m.group(1), m -> new String[] {m.group(2), m.group(3)}));
map.forEach((k, v) -> {
System.out.println(k+" => "+Arrays.toString(v));
});
Output:
test => [test, null]
foo => [bar, baz:qux]
Token1 => [Token2, Token3:other]
The same problem can be solved with String.split as well. You just need to use the two-argument version of split and specify the maximum number of parts you want:
Map<String, String[]> map = Arrays.stream(inputArray)
.map(elem -> elem.split(":", 3)) // 3 means that no more than 3 parts are necessary
.filter(elem -> elem.length >= 2)
.collect(Collectors.toMap(m -> m[0],
m -> new String[] {m[1], m.length > 2 ? m[2] : null}));
The result is the same.
You can achieve what you want with the following:
return Arrays.stream(inputArray)
.map(elem -> elem.split(":", 3)) // split into at most 3 parts
.filter(arr -> arr.length >= 2) // discard invalid input (?)
.collect(Collectors.toMap(arr -> arr[0], arr -> Arrays.copyOfRange(arr, 1, 3))); // will add null as the second element if the array length is 2
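To see the edge cases in one place, the split-with-limit approach can be wrapped in a small runnable sketch. The class name TokenMap and the method name parse are hypothetical, and the extra input "lonely" (a line with no delimiter at all) is added only to show that such lines are dropped by the filter.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class TokenMap {

    static Map<String, String[]> parse(String[] input) {
        return Arrays.stream(input)
            .map(line -> line.split(":", 3))        // Token3 keeps any further ':' characters
            .filter(parts -> parts.length >= 2)     // drop lines without a Token2
            .collect(Collectors.toMap(
                parts -> parts[0],
                // copyOfRange pads with null when the source array has only two elements
                parts -> Arrays.copyOfRange(parts, 1, 3)));
    }

    public static void main(String[] args) {
        String[] inputArray = {"Token1:Token2:Token3:other", "test:test", "lonely"};
        Map<String, String[]> map = parse(inputArray);
        System.out.println(Arrays.toString(map.get("Token1"))); // prints [Token2, Token3:other]
        System.out.println(Arrays.toString(map.get("test")));   // prints [test, null]
        System.out.println(map.containsKey("lonely"));          // prints false
    }
}
```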
