JavaPairRDD convert key-value into key-list - java

I have a JavaPairRDD containing (Key, Value) pairs that I want to group by key, so that the "second column" becomes a list of all values seen for that key. I am currently using the groupByKey() function, which groups the keys correctly but converts my values to an Iterable of Long. That is,
Key1 Iterable<Long>
Key2 Iterable<Long>
...
Is there any way to force this function to use a List of Longs instead of an Iterable object?
Key1 List<Long>
Key2 List<Long>
...
I read something about a function called combineByKey(), but I don't think it fits this use case. I probably need to use reduceByKey, but I can't see how. It should be something like this:
myRDD.reduceByKey((a,b) -> new ArrayList<Long>()) //and add b to a
In the end, I want to combine values to obtain a Key n, List<Long> RDD.
Thank you for your time.

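You can try something like this:
JavaPairRDD<String, List<Long>> keyValuePairs = rdd
.mapToPair(t -> new Tuple2<String, List<Long>>(t._1, new ArrayList<>(Collections.singletonList(t._2))))
.reduceByKey((a, b) -> {
a.addAll(b); // a is a mutable ArrayList, so addAll is supported
return a;
});
First, mapToPair wraps each value in a one-element ArrayList of Long. Then reduceByKey merges the lists for each key with addAll. The String key type is only an assumption here, so substitute your own. The snippet needs java.util.ArrayList, java.util.Collections, java.util.List and scala.Tuple2.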

Related

Generate a Map from a list using Streams with Java 8

I have a list of String.
I want to store each string as key and the string's length as value in a Map (say HashMap).
I'm not able to achieve it.
List<String> ls = Arrays.asList("James", "Sam", "Scot", "Elich");
Map<String,Integer> map = new HashMap<>();
Function<String, Map<String, Integer>> fs = new Function<>() {
@Override
public Map<String, Integer> apply(String s) {
map.put(s,s.length());
return map;
}
};
Map<String, Integer> nmap = ls
.stream()
.map(fs)
.collect(Collectors.toMap()); //Lost here
System.out.println(nmap);
All strings are unique.
There's no need to wrap every string in its own map, as your function does.
Instead, pass the proper arguments to Collectors.toMap():
keyMapper - a function that extracts a key from the stream element.
valueMapper - a function that generates a value from the stream element.
Since the stream element itself should be the key, we can use Function.identity(), which is more descriptive than the lambda str -> str but does precisely the same thing.
Map<String,Integer> lengthByStr = ls.stream()
.collect(Collectors.toMap(
Function.identity(), // extracting a key
String::length // extracting a value
));
If the source list might contain duplicates, you need to provide a third argument, mergeFunction, which is responsible for resolving values that collide on the same key.
Map<String,Integer> lengthByStr = ls.stream()
.collect(Collectors.toMap(
Function.identity(), // key
String::length, // value
(left, right) -> left // resolving duplicates
));
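To see why the merge function matters, here is a small self-contained sketch. The class name ToMapDuplicates and the method names strict and lenient are made up for illustration; only Collectors.toMap and Function.identity come from the JDK. Without a merge function, toMap throws an IllegalStateException on the first duplicate key; with one, a single entry survives.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original question or answer.
public class ToMapDuplicates {

    // Two-argument toMap: throws IllegalStateException if a key occurs twice.
    static Map<String, Integer> strict(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(Function.identity(), String::length));
    }

    // Three-argument toMap: the merge function decides which value survives.
    static Map<String, Integer> lenient(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(
                        Function.identity(),      // key: the string itself
                        String::length,           // value: its length
                        (left, right) -> left));  // on a duplicate key, keep the first value
    }

    public static void main(String[] args) {
        List<String> withDuplicate = List.of("James", "Sam", "Sam");
        System.out.println(lenient(withDuplicate).size()); // prints 2
        try {
            strict(withDuplicate);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```

Since the value here is derived from the key, both sides of a collision are equal and it does not matter which one the merge function keeps.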
You said there would be no duplicate strings. But if one slips through, you can use distinct() (which internally uses a set) to ensure it doesn't cause issues.
a -> a is a lambda that returns its argument, so each stream element becomes its own key.
distinct() removes any duplicate strings.
Map<String, Integer> result = names.stream().distinct()
.collect(Collectors.toMap(a -> a, String::length));
If you only want the length of a String, you can get it directly with someString.length(). But suppose you want a map of all the Strings keyed by their length. You can do that using Collectors.groupingBy(), which by default collects the elements that share a key into a list. In this case, the shared key is the length of the String.
Use the length of the string as the key.
The value will be a List<String> holding all strings of that length.
List<String> names = List.of("James", "Sam", "Scot",
"Elich", "lucy", "Jennifer","Bob", "Joe", "William");
Map<Integer, List<String>> lengthMap = names.stream()
.distinct()
.collect(Collectors.groupingBy(String::length));
lengthMap.entrySet().forEach(System.out::println);
prints
3=[Sam, Bob, Joe]
4=[Scot, lucy]
5=[James, Elich]
7=[William]
8=[Jennifer]

Java 8 Need advice with stream

I have a List:
class DummyClass {
List<String> rname;
String name;
}
The values in my List look like this:
list.add(new DummyClass(Arrays.asList("a","b"),"apple"));
list.add(new DummyClass(Arrays.asList("a","b"),"banana"));
list.add(new DummyClass(Arrays.asList("a","c"),"orange"));
list.add(new DummyClass(null,"apple"));
I want to convert the above List into a Map<String, Set<String>>, where each key is an element of rname and the value is the Set of name fields that go with it.
{
"a"-> ["apple", "orange", "banana"],
"b"-> ["apple", "banana"]
"c" -> ["orange"]
}
I am trying to use a Java stream and am getting a NullPointerException. Can someone please guide me?
Map<String, Set<String>> map =
list.stream()
.collect(Collectors.groupingBy(DummyClass::rname,
Collectors.mapping(DummyClass::getName,
Collectors.toSet())));
I am not able to process each element of the rname list (e.g. Arrays.asList("a","b")) in the stream.
There is a flaw here:
Collectors.groupingBy(DummyClass::rname,
Collectors.mapping(DummyClass::getName,
Collectors.toSet())))
where I am processing the entire list as one key rather than each element. Should I use another stream?
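Two things go wrong here. First, groupingBy does not allow null keys (like many of the newer collection utilities, e.g. Map.of), so the element whose rname is null causes the NullPointerException. You can either filter the nulls out, or first map them to a placeholder string and then group. Second, groupingBy(DummyClass::rname, ...) uses the whole list as a single key, so flatMap over rname to get one (rname element, name) pair per entry and group those instead. Assuming DummyClass has getRname() and getName() getters:
Map<String, Set<String>> map =
list.stream().filter(v -> v.getRname() != null)
.flatMap(v -> v.getRname().stream().map(r -> Map.entry(r, v.getName())))
.collect(Collectors.groupingBy(Map.Entry::getKey,
Collectors.mapping(Map.Entry::getValue,
Collectors.toSet())));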
Or, if you don't want to drop the elements whose rname is null, give them a placeholder key under which they can all be grouped. Here flatMap turns each element into one (rname element, name) entry, and getRname() is an assumed getter:
Map<String, Set<String>> map =
list.stream()
.flatMap(v -> (v.getRname() == null ? List.of("null") : v.getRname()).stream()
.map(r -> Map.entry(r, v.getName())))
.collect(Collectors.groupingBy(Map.Entry::getKey,
Collectors.mapping(Map.Entry::getValue,
Collectors.toSet())));
I'm writing this on a mobile without an editor, so treat the code as untested :)

Getting multiple list of properties from a List of Objects in Java 8

Considering I have a list of objects List<Emp> where Emp has 3 properties name, id, and age. What is the fastest way to get 3 lists like List<String> names, List<String> ids, and List<Integer> ages.
The simplest I could think of is to iterate over the entire list and keep adding to these 3 lists. But, I was wondering if there is an easier way to do it with Java 8 streams?
Thanks in advance.
It's a very interesting question, however, there is no dedicated collector to handle such use case.
All you can do is use three iterations (streams), one per property:
List<String> names = employees.stream().map(Emp::getName).collect(Collectors.toList());
List<String> ids = employees.stream().map(Emp::getId).collect(Collectors.toList());
List<Integer> ages = employees.stream().map(Emp::getAge).collect(Collectors.toList());
Edit - write your own collector: you can use the overloaded method Stream::collect(Supplier, BiConsumer, BiConsumer) to implement your own collection logic that does what you need:
Map<String, List<Object>> newMap = employees.stream().collect(
HashMap::new, // Supplier of the Map
(map, emp) -> { // BiConsumer accumulator
map.compute("names", remappingFunction(emp.getName()));
map.compute("ages", remappingFunction(emp.getAge()));
map.compute("ids", remappingFunction(emp.getId()));
},
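(map1, map2) -> map2.forEach((key, list) -> // BiConsumer combiner: merges partial maps (only used by parallel streams)
map1.merge(key, list, (l1, l2) -> { l1.addAll(l2); return l1; }))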
);
In practice, all it does is extract the wanted value (name, age, ...) and add it to the List stored under the corresponding key ("names", "ages", etc.) using Map::compute, which computes a new value from the existing one (null if the key has not been used yet).
The remappingFunction that actually creates a new List or adds a value looks like this:
private static BiFunction<String, List<Object>, List<Object>> remappingFunction(Object object) {
return (key, list) -> {
if (list == null)
list = new ArrayList<>();
list.add(object);
return list;
};
}
Java 8 streams have some API for splitting a list into partitions, such as:
1. Collectors.partitioningBy(..) - which creates two partitions based on a Predicate and returns a Map<Boolean, List<>> with the values;
2. Collectors.groupingBy() - which groups the stream by some key and returns the resulting Map.
But this is not really your case, since you want to put each property of the Emp object into a different List. I'm not sure this can be achieved with that API, except maybe with some dirty workarounds.
So, yes, the cleanest way is to iterate through the Emp list and put the properties into the three Lists manually, as you proposed.
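The manual single-pass approach could look roughly like the sketch below. The Emp class here is only a stand-in with the three properties from the question, and the Columns holder and split method are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class EmpColumns {

    // Hypothetical stand-in for the Emp class from the question.
    static class Emp {
        final String name;
        final String id;
        final int age;

        Emp(String name, String id, int age) {
            this.name = name;
            this.id = id;
            this.age = age;
        }
    }

    // Holder for the three result lists (invented for this sketch).
    static class Columns {
        final List<String> names = new ArrayList<>();
        final List<String> ids = new ArrayList<>();
        final List<Integer> ages = new ArrayList<>();
    }

    // One pass over the input; each property goes to its own list.
    static Columns split(List<Emp> employees) {
        Columns columns = new Columns();
        for (Emp emp : employees) {
            columns.names.add(emp.name);
            columns.ids.add(emp.id);
            columns.ages.add(emp.age);
        }
        return columns;
    }

    public static void main(String[] args) {
        Columns columns = split(List.of(new Emp("Ann", "e1", 30), new Emp("Bob", "e2", 41)));
        System.out.println(columns.names); // prints [Ann, Bob]
        System.out.println(columns.ids);   // prints [e1, e2]
        System.out.println(columns.ages);  // prints [30, 41]
    }
}
```

Compared with three separate streams, this walks the list once and keeps the three result lists index-aligned by construction.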

Extract any value from hashmap (one for each key)

I have a large map with different keys and several values (DepthFeed) associated with each. I would like to get any one value (DepthFeed) per key, so that I can extract the name of the instrument for each key.
I have this map
private static Map<Integer, List<DepthFeed>> mapDepthFeed = new HashMap<>();
From that I would like to do something like the following, but without returning the Integer key set. Instead I want a List<DepthFeed> back (containing one row for each key):
List<DepthFeed> d = mapDepthFeed.values().stream().distinct().collect(Collectors.toList());
Use
List<DepthFeed> result = mapDepthFeed.values().stream()
.filter(list -> !list.isEmpty())
.map(list -> list.get(0))
.collect(Collectors.toList());
This way you will get the first element from each non-empty list stored in map values.
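A runnable version of the same idea is sketched below. DepthFeed is not shown in the question, so plain strings stand in for it, and the generic helper firstOfEach and the class name FirstPerKey are invented for this example. The extra null check is a defensive assumption, not something the question requires.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FirstPerKey {

    // Returns the first element of every non-empty list stored in the map.
    static <K, V> List<V> firstOfEach(Map<K, List<V>> map) {
        return map.values().stream()
                .filter(list -> list != null && !list.isEmpty()) // skip keys without values
                .map(list -> list.get(0))                        // any element would do; take the first
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Strings stand in for DepthFeed objects here.
        Map<Integer, List<String>> feeds = new LinkedHashMap<>();
        feeds.put(1, List.of("ABB bid", "ABB ask"));
        feeds.put(2, List.of());
        feeds.put(3, List.of("ERIC bid"));
        System.out.println(firstOfEach(feeds)); // prints [ABB bid, ERIC bid]
    }
}
```

With a HashMap the order of the result follows the map's iteration order, which is not guaranteed; the LinkedHashMap in main is only there to make the printed output predictable.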

How to get certain information out of arraylist grouped into other lists in Java

I wrote a program that reads multiple (similar) text files from a folder. I'm splitting the content on spaces and storing everything in one ArrayList, which contains data like this:
key1=hello
key2=good
key3=1234
...
key15=repetition
key1=morning
key2=night
key3=5678
...
Now I'm looking for a way to get this information out of the list, grouped by key into other lists. So I'm looking for a way to get a result like this:
keyList1 = {hello,morning}
keyList2 = {good,night}
and so on.
So I have to check every line for a keyword such as "key1", split off the value at the "=", and so on.
I think the data structure that best suits your (described) needs is a Multimap. It is like a map, but it can store more than one value per key.
For example the implementation from the guava project.
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Multimap.html
First, you have to iterate over the arraylist:
final ListMultimap<String, String> multimap = ArrayListMultimap.create(); // ListMultimap, so that get() returns a List
for ( String element : arrayList ) {
String[] splitted = element.split( "=" );
multimap.put( splitted[0], splitted[1] );
}
You get a List of values the following way:
for (String key : multimap.keySet()) {
List<String> values = multimap.get(key);
}
You might want to add some sanity checks for the splitting of your Strings.
(Code is untested)
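If adding Guava is not an option, roughly the same grouping can be sketched with a plain Map<String, List<String>> from the JDK, including one possible form of the sanity check mentioned above. The class and method names are invented for this example, and silently skipping malformed lines is an assumption; you may prefer to log or reject them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyValueGrouping {

    // Groups "key=value" strings by key, preserving first-seen key order.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            // Limit 2: split only at the first '=', so values may contain '='.
            String[] parts = line.split("=", 2);
            if (parts.length != 2 || parts[0].isEmpty()) {
                continue; // sanity check: skip lines without a key or without '='
            }
            grouped.computeIfAbsent(parts[0], key -> new ArrayList<>()).add(parts[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("key1=hello", "key2=good", "garbage", "key1=morning");
        System.out.println(group(lines)); // prints {key1=[hello, morning], key2=[good]}
    }
}
```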
It looks like you may be looking for something like this kind of grouping (assuming you have access to Java 8):
List<String> pairs = Files.readAllLines(Paths.get("input.txt"));
Map<String, List<String>> map = pairs
.stream()
.map(s -> s.split("="))
.collect(
Collectors.groupingBy(
arr -> arr[0],
LinkedHashMap::new,//to preserve order of keys
Collectors.mapping(arr -> arr[1],
Collectors.toList())));
System.out.println(pairs);
System.out.println("---");
System.out.println(map);
Output:
[key1=hello, key2=good, key3=1234, key15=repetition, key1=morning, key2=night, key3=5678]
---
{key1=[hello, morning], key2=[good, night], key3=[1234, 5678], key15=[repetition]}
