How to group into map of arrays? - java

Can a groupingBy operation on a stream produce a map where the values are arrays rather than lists or some other collection type?
For example: I have a class Thing. Things have owners, so Thing has a getOwnerId method. In a stream of things I want to group the things by owner ID so that things with the same owner ID end up in an array together. In other words I want a map like the following where the keys are owner IDs and the values are arrays of things belonging to that owner.
Map<String, Thing[]> mapOfArrays;
In my case, since I need to pass the map values to a library method that requires an array, it would be most convenient to collect into a Map<String, Thing[]>.
Collecting the whole stream into one array is easy (it doesn’t even require an explicit collector):
Thing[] arrayOfThings = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
.toArray(Thing[]::new);
[Belongs to owner1, Belongs to owner2, Belongs to owner1]
Grouping by owner ID is easy too. For example, to group into lists:
Map<String, List<Thing>> mapOfLists = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
.collect(Collectors.groupingBy(Thing::getOwnerId));
{owner1=[Belongs to owner1, Belongs to owner1], owner2=[Belongs to owner2]}
However, this example gives me a map of lists. There are 2-arg and 3-arg groupingBy methods that can give me a map of other collection types (like sets). I figured that if I could pass a collector that collects into an array (similar to the collection into an array in the first snippet above) to the two-arg Collectors.groupingBy(Function<? super T,? extends K>, Collector<? super T,A,D>), I'd be set. However, none of the predefined collectors in the Collectors class seem to do anything with arrays. Am I missing a not-too-complicated way to do this?
For the sake of a complete example, here’s the class I’ve used in the above snippets:
public class Thing {
    private String ownerId;

    public Thing(String ownerId) {
        this.ownerId = ownerId;
    }

    public String getOwnerId() {
        return ownerId;
    }

    @Override
    public String toString() {
        return "Belongs to " + ownerId;
    }
}

Using the collector from this answer by Thomas Pliakas:
Map<String, Thing[]> mapOfArrays = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
        .collect(Collectors.groupingBy(Thing::getOwnerId,
                Collectors.collectingAndThen(Collectors.toList(),
                        tl -> tl.toArray(new Thing[0]))));
The idea is to collect into a list first (a natural intermediate step, since arrays have a fixed size) and then convert to an array before handing the result back to the grouping collector. collectingAndThen can do that through its so-called finisher.
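If this pattern is needed in more than one place, the list-then-finisher step can be pulled out into a small reusable helper. This is a minimal sketch, not part of the original answer; the method name toArrayCollector is my own:
// Reusable collector: gathers elements into a List, then converts it to an array in the finisher.
// (Assumes java.util.function.IntFunction and java.util.stream.Collector/Collectors are imported.)
static <T> Collector<T, ?, T[]> toArrayCollector(IntFunction<T[]> arrayFactory) {
    return Collectors.collectingAndThen(Collectors.toList(),
            list -> list.toArray(arrayFactory.apply(0)));
}
With such a helper, the grouping step becomes:
.collect(Collectors.groupingBy(Thing::getOwnerId, toArrayCollector(Thing[]::new)))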
To print the result for inspection:
mapOfArrays.forEach((k, v) -> System.out.println(k + '=' + Arrays.toString(v)));
owner1=[Belongs to owner1, Belongs to owner1]
owner2=[Belongs to owner2]
Edit: With thanks to Aomine for the link: Using new Thing[0] as argument to toArray was inspired by Arrays of Wisdom of the Ancients. It seems that on Intel CPUs in the end using new Thing[0] is faster than using new Thing[tl.size()]. I was surprised.

You could group first and then use a subsequent toMap (this assumes static imports of Collectors.groupingBy and Collectors.toMap):
Map<String, Thing[]> result = source.stream()
        .collect(groupingBy(Thing::getOwnerId))
        .entrySet()
        .stream()
        .collect(toMap(Map.Entry::getKey,
                e -> e.getValue().toArray(new Thing[0])));

Probably obvious but you could have done it via:
Map<String, Thing[]> mapOfArrays = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
        .collect(Collectors.toMap(
                Thing::getOwnerId,
                x -> new Thing[]{x},
                (left, right) -> {
                    Thing[] newA = new Thing[left.length + right.length];
                    System.arraycopy(left, 0, newA, 0, left.length);
                    System.arraycopy(right, 0, newA, left.length, right.length);
                    return newA;
                }
        ));

Related

Getting multiple list of properties from a List of Objects in Java 8

Consider that I have a list of objects List<Emp>, where Emp has 3 properties: name, id, and age. What is the fastest way to get 3 lists such as List<String> names, List<String> ids, and List<Integer> ages?
The simplest approach I could think of is to iterate over the entire list and keep adding to these 3 lists. But I was wondering if there is an easier way to do it with Java 8 streams?
Thanks in advance.
It's a very interesting question; however, there is no dedicated collector to handle such a use case.
All you can do is use 3 iterations (streams), one per property:
List<String> names = employees.stream().map(Emp::name).collect(Collectors.toList());
List<String> ids = employees.stream().map(Emp::id).collect(Collectors.toList());
List<Integer> ages = employees.stream().map(Emp::age).collect(Collectors.toList());
Edit - writing your own collector: you can use the overloaded method Stream::collect(Supplier, BiConsumer, BiConsumer) to implement your own collection logic that does what you need:
Map<String, List<Object>> newMap = employees.stream().collect(
        HashMap::new,                       // Supplier of the Map
        (map, emp) -> {                     // BiConsumer accumulator
            map.compute("names", remappingFunction(emp.getName()));
            map.compute("ages", remappingFunction(emp.getAge()));
            map.compute("ids", remappingFunction(emp.getId()));
        },
        (map1, map2) -> {}                  // BiConsumer combiner (a no-op: fine for sequential
                                            // streams, but entries would be lost if run in parallel)
);
Practically, all it does is extract the wanted value (name, age, ...) and add it to the List stored under the corresponding key ("names", "ages", etc.) using Map::compute, which computes a new value based on the existing one (null if the key has not been used yet).
The remappingFunction that actually creates a new List or adds a value looks like:
private static BiFunction<String, List<Object>, List<Object>> remappingFunction(Object object) {
    return (key, list) -> {
        if (list == null)
            list = new ArrayList<>();
        list.add(object);
        return list;
    };
}
The Java 8 Stream API has some support for splitting a list into partitions, such as:
1. Collectors.partitioningBy(..), which creates two partitions based on some Predicate and returns a Map<Boolean, List<>> of the values;
2. Collectors.groupingBy(), which groups the stream by some key and returns the resulting Map.
But this is not really your case, since you want to put all the properties of the Emp object into different Lists. I'm not sure this can be achieved with that API, maybe only with some dirty workarounds.
So, yes, the cleanest way is to iterate through the Emp list and put all the properties into the three Lists manually, as you proposed.
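For reference, the plain loop recommended above might look like this (a sketch; it assumes the getters are getName(), getId() and getAge(), and that the id is a String as in the question, neither of which is spelled out):
List<String> names = new ArrayList<>();
List<String> ids = new ArrayList<>();
List<Integer> ages = new ArrayList<>();
for (Emp emp : employees) {
    // one pass over the source list, filling all three target lists
    names.add(emp.getName());
    ids.add(emp.getId());
    ages.add(emp.getAge());
}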

Intersection of two collections of different objects types java 8

I have two lists of objects:
List<SampleClassOne> listOne;
List<SampleClassTwo> listTwo;
SampleClassOne:
public class SampleClassOne {
    private String myFirstProperty;
    // omitting getters/setters
}
SampleClassTwo:
public class SampleClassTwo {
    private String myOtherProperty;
    // omitting getters/setters
}
RootSampleClass:
public class RootSampleClass {
    private SampleClassOne classOne;
    private SampleClassTwo classTwo;
    // omitting getters/setters
}
Now I would like to merge the two lists into a new list of type RootSampleClass based on this condition:
if (classOneObject.getMyFirstProperty().equals(classTwoObject.getMyOtherProperty())) {
    // create a new RootSampleClass based on classOneObject and classTwoObject and add it to another collection
}
Pseudo code:
foreach(one: collectionOne){
    foreach(two: collectionTwo){
        if(one.getMyFirstProperty().equals(two.getMyOtherProperty())){
            collectionThree.add(new RootSampleClass(one, two));
        }
    }
}
I am interested in Java 8. I would like the best performance here, which is why I am asking for an existing solution rather than writing a custom foreach.
A direct equivalent to the nested loops is
List<RootSampleClass> result = listOne.stream()
    .flatMap(one -> listTwo.stream()
        .filter(two -> one.getMyFirstProperty().equals(two.getMyOtherProperty()))
        .map(two -> new RootSampleClass(one, two)))
    .collect(Collectors.toList());
with an emphasis on direct equivalent, which includes the bad performance of doing n×m operations.
A better solution is to convert one of the lists into a data structure supporting an efficient lookup, e.g. a hash map. This consideration is independent of which API you use. Since you asked for the Stream API, you can do it like this:
Map<String, List<SampleClassOne>> tmp = listOne.stream()
    .collect(Collectors.groupingBy(SampleClassOne::getMyFirstProperty));
List<RootSampleClass> result = listTwo.stream()
    .flatMap(two -> tmp.getOrDefault(two.getMyOtherProperty(), Collections.emptyList())
        .stream().map(one -> new RootSampleClass(one, two)))
    .collect(Collectors.toList());
Note that both solutions will create all possible pairings in case a property value occurs multiple times within either or both lists. If the property values are unique within each list, like IDs, you can use the following solution:
Map<String, SampleClassOne> tmp = listOne.stream()
    .collect(Collectors.toMap(SampleClassOne::getMyFirstProperty, Function.identity()));
List<RootSampleClass> result = listTwo.stream()
    .flatMap(two -> Optional.ofNullable(tmp.get(two.getMyOtherProperty()))
        .map(one -> Stream.of(new RootSampleClass(one, two))).orElse(null))
    .collect(Collectors.toList());
If you don’t mind potentially performing double lookups, you could replace the last solution with the following more readable code:
Map<String, SampleClassOne> tmp = listOne.stream()
    .collect(Collectors.toMap(SampleClassOne::getMyFirstProperty, Function.identity()));
List<RootSampleClass> result = listTwo.stream()
    .filter(two -> tmp.containsKey(two.getMyOtherProperty()))
    .map(two -> new RootSampleClass(tmp.get(two.getMyOtherProperty()), two))
    .collect(Collectors.toList());

Java 8 list to map with stream

I have a List<Item> collection.
I need to convert it into Map<Integer, Item>
The key of the map must be the index of the item in the collection.
I cannot figure out how to do this with streams.
Something like:
items.stream().collect(Collectors.toMap(...));
Any help?
As this question has been identified as a possible duplicate, I need to add that my concrete problem was how to get the position of an item in the list and use it as the key.
You can create a Stream of the indices using an IntStream and then convert them to a Map:
Map<Integer, Item> map =
    IntStream.range(0, items.size())
             .boxed()
             .collect(Collectors.toMap(i -> i, i -> items.get(i)));
One more solution, just for completeness, is to use a custom collector:
public static <T> Collector<T, ?, Map<Integer, T>> toMap() {
    return Collector.of(HashMap::new, (map, t) -> map.put(map.size(), t),
            (m1, m2) -> {
                int s = m1.size();
                m2.forEach((k, v) -> m1.put(k + s, v));
                return m1;
            });
}
Usage:
Map<Integer, Item> map = items.stream().collect(toMap());
This solution is parallel-friendly and does not depend on the source (you can use a list without random access, or Files.lines(), or whatever).
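For instance, the same collector could number the lines of a file (a sketch, not from the original answer; the file name is hypothetical and the surrounding method is assumed to declare throws IOException):
// Assumes java.nio.file.Files and java.nio.file.Paths are imported.
Map<Integer, String> numberedLines;
try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
    numberedLines = lines.collect(toMap());   // the custom toMap() collector from above
}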
Don't feel like you have to do everything in/with the stream. I would just do:
AtomicInteger index = new AtomicInteger();
items.stream().collect(Collectors.toMap(i -> index.getAndIncrement(), i -> i));
As long as you don't parallelise the stream this will work and it avoids potentially expensive and/or problematic (in the case of duplicates) get() and indexOf() operations.
(You cannot use a regular int variable in place of the AtomicInteger because variables used from outside a lambda expression must be effectively final. Note that when uncontested (as in this case), AtomicInteger is very fast and won't pose a performance problem. But if it worries you, you can use a non-thread-safe counter, as sketched below.)
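Such a non-thread-safe counter can be as simple as a one-element array used as a mutable holder (a sketch of the idea mentioned above, strictly for sequential streams):
int[] index = {0};   // effectively final holder, but not safe for parallel streams
Map<Integer, Item> map = items.stream()
        .collect(Collectors.toMap(i -> index[0]++, i -> i));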
This is an updated answer, and it has none of the problems mentioned in the comments:
Map<Integer, Item> outputMap = IntStream.range(0, inputList.size())
        .boxed()
        .collect(Collectors.toMap(Function.identity(), i -> inputList.get(i)));
Using a third party library (protonpack for example, but there are others) you can zip the value with its index and voila:
StreamUtils.zipWithIndex(items.stream())
.collect(Collectors.toMap(Indexed::getIndex, Indexed::getValue));
although getIndex returns a long, so you may need to cast it using something similar to:
i -> Integer.valueOf((int) i.getIndex())
Eran's answer is usually the best approach for random-access lists.
If your List isn't random access, or if you have a Stream instead of a List, you can use forEachOrdered:
Stream<Item> stream = ... ;
Map<Integer, Item> map = new HashMap<>();
AtomicInteger index = new AtomicInteger();
stream.forEachOrdered(item -> map.put(index.getAndIncrement(), item));
This is safe even if the stream is parallel, even though the destination map is not thread-safe and is operated on as a side effect. The forEachOrdered guarantees that items are processed one at a time, in order. For this reason it's unlikely that any speedup will result from running in parallel. (There might be some speedup if there are expensive operations in the pipeline before the forEachOrdered.)

How to create a List<T> from Map<K,V> and List<K> of keys?

Using Java 8 lambdas, what's the "best" way to effectively create a new List<T> given a List<K> of possible keys and a Map<K,V>? This is the scenario where you are given a List of possible Map keys and are expected to generate a List<T> where T is some type that is constructed based on some aspect of V, the map value types.
I've explored a few and don't feel comfortable claiming one way is better than another (with maybe one exception -- see code). I'll clarify "best" as a combination of code clarity and runtime efficiency. These are what I came up with. I'm sure someone can do better, which is one aspect of this question. I don't like the filter aspect of most as it means needing to create intermediate structures and multiple passes over the names List. Right now, I'm opting for Example 6 -- a plain 'ol loop. (NOTE: Some cryptic thoughts are in the code comments, especially "need to reference externally..." This means external from the lambda.)
public class Java8Mapping {
    private final Map<String,Wongo> nameToWongoMap = new HashMap<>();

    public Java8Mapping(){
        List<String> names = Arrays.asList("abbey","normal","hans","delbrook");
        List<String> types = Arrays.asList("crazy","boring","shocking","dead");
        for(int i=0; i<names.size(); i++){
            nameToWongoMap.put(names.get(i),new Wongo(names.get(i),types.get(i)));
        }
    }

    public static void main(String[] args) {
        System.out.println("in main");
        Java8Mapping j = new Java8Mapping();
        List<String> testNames = Arrays.asList("abbey", "froderick","igor");
        System.out.println(j.getBongosExample1(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample2(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample3(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample4(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample5(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample6(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
    }
    private static class Wongo{
        String name;
        String type;
        public Wongo(String s, String t){name=s;type=t;}
        @Override public String toString(){return "Wongo{name="+name+", type="+type+"}";}
    }

    private static class Bongo{
        Wongo wongo;
        public Bongo(Wongo w){wongo = w;}
        @Override public String toString(){ return "Bongo{wongo="+wongo+"}";}
    }
    // 1: Create a list externally and add items inside 'forEach'.
    // Needs to externally reference Map and List
    public List<Bongo> getBongosExample1(List<String> names){
        final List<Bongo> listOne = new ArrayList<>();
        names.forEach(s -> {
            Wongo w = nameToWongoMap.get(s);
            if(w != null) {
                listOne.add(new Bongo(nameToWongoMap.get(s)));
            }
        });
        return listOne;
    }
    // 2: Use stream().map().collect()
    // Needs to externally reference Map
    public List<Bongo> getBongosExample2(List<String> names){
        return names.stream()
                .filter(s -> nameToWongoMap.get(s) != null)
                .map(s -> new Bongo(nameToWongoMap.get(s)))
                .collect(Collectors.toList());
    }
    // 3: Create custom Collector
    // Needs to externally reference Map
    public List<Bongo> getBongosExample3(List<String> names){
        Function<List<Wongo>,List<Bongo>> finisher = list -> list.stream().map(Bongo::new).collect(Collectors.toList());
        Collector<String,List<Wongo>,List<Bongo>> bongoCollector =
                Collector.of(ArrayList::new,getAccumulator(),getCombiner(),finisher, Characteristics.UNORDERED);
        return names.stream().collect(bongoCollector);
    }

    // example 3 helper code
    private BiConsumer<List<Wongo>,String> getAccumulator(){
        return (list,string) -> {
            Wongo w = nameToWongoMap.get(string);
            if(w != null){
                list.add(w);
            }
        };
    }

    // example 3 helper code
    private BinaryOperator<List<Wongo>> getCombiner(){
        return (l1,l2) -> {
            l1.addAll(l2);
            return l1;
        };
    }
    // 4: Use internal Bongo creation facility
    public List<Bongo> getBongosExample4(List<String> names){
        return names.stream().filter(s->nameToWongoMap.get(s) != null).map(s-> new Bongo(nameToWongoMap.get(s))).collect(Collectors.toList());
    }

    // 5: Stream the Map EntrySet. This avoids referring to anything outside of the stream,
    // but bypasses the lookup benefit from Map.
    public List<Bongo> getBongosExample5(List<String> names){
        return nameToWongoMap.entrySet().stream().filter(e->names.contains(e.getKey())).map(e -> new Bongo(e.getValue())).collect(Collectors.toList());
    }
    // 6: Plain-ol-java loop
    public List<Bongo> getBongosExample6(List<String> names){
        List<Bongo> bongos = new ArrayList<>();
        for(String s : names){
            Wongo w = nameToWongoMap.get(s);
            if(w != null){
                bongos.add(new Bongo(w));
            }
        }
        return bongos;
    }
}
If nameToWongoMap is an instance variable, you can't really avoid a capturing lambda.
You can clean up the stream by splitting up the operations a little more:
return names.stream()
        .map(n -> nameToWongoMap.get(n))
        .filter(w -> w != null)
        .map(w -> new Bongo(w))
        .collect(toList());
return names.stream()
        .map(nameToWongoMap::get)
        .filter(Objects::nonNull)
        .map(Bongo::new)
        .collect(toList());
That way you don't call get twice.
This is very much like the for loop, except that, for example, it could theoretically be parallelized as long as nameToWongoMap isn't mutated concurrently.
I don't like the filter aspect of most as it means needing to create intermediate structures and multiple passes over the names List.
There are no intermediate structures and there is only one pass over the List. A stream pipeline says "for each element...do this sequence of operations". Each element is visited once and the pipeline is applied.
Here are some relevant quotes from the java.util.stream package description:
A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state.
Radiodef's answer pretty much nailed it, I think. The solution given there:
return names.stream()
        .map(nameToWongoMap::get)
        .filter(Objects::nonNull)
        .map(Bongo::new)
        .collect(toList());
is probably about the best that can be done in Java 8.
I did want to mention a small wrinkle in this, though. The Map.get call returns null if the name isn't present in the map, and this is subsequently filtered out. There's nothing wrong with this per se, though it does bake null-means-not-present semantics into the pipeline structure.
In some sense we'd want a mapper pipeline operation that has a choice of returning zero or one elements. A way to do this with streams is with flatMap. The flatmapper function can return an arbitrary number of elements into the stream, but in this case we want just zero or one. Here's how to do that:
return names.stream()
        .flatMap(name -> {
            Wongo w = nameToWongoMap.get(name);
            return w == null ? Stream.empty() : Stream.of(w);
        })
        .map(Bongo::new)
        .collect(toList());
I admit this is pretty clunky and so I wouldn't recommend doing this. A slightly better but somewhat obscure approach is this:
return names.stream()
        .flatMap(name -> Optional.ofNullable(nameToWongoMap.get(name))
                .map(Stream::of).orElseGet(Stream::empty))
        .map(Bongo::new)
        .collect(toList());
but I'm still not sure I'd recommend this as it stands.
The use of flatMap does point to another approach, though. If you have a more complicated policy of how to deal with the not-present case, you could refactor this into a helper function that returns a Stream containing the result or an empty Stream if there's no result.
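Such a helper might look like the following (a sketch of the refactoring hinted at above; the method name lookupWongo is my own):
// Returns a one-element Stream if the name is mapped, an empty Stream otherwise.
private Stream<Wongo> lookupWongo(String name) {
    Wongo w = nameToWongoMap.get(name);
    return w == null ? Stream.empty() : Stream.of(w);
}
The pipeline then becomes:
return names.stream()
        .flatMap(this::lookupWongo)
        .map(Bongo::new)
        .collect(toList());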
Finally, JDK 9 -- still under development as of this writing -- has added Stream.ofNullable which is useful in exactly these situations:
return names.stream()
        .flatMap(name -> Stream.ofNullable(nameToWongoMap.get(name)))
        .map(Bongo::new)
        .collect(toList());
As an aside, JDK 9 has also added Optional.stream which creates a zero-or-one stream from an Optional. This is useful in cases where you want to call an Optional-returning function from within flatMap. See this answer and this answer for more discussion.
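On JDK 9+, combining Optional.ofNullable with Optional.stream would look roughly like this (a sketch, not part of the original answer):
return names.stream()
        .flatMap(name -> Optional.ofNullable(nameToWongoMap.get(name)).stream())
        .map(Bongo::new)
        .collect(toList());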
One approach I didn't see is retainAll:
public List<Bongo> getBongos(List<String> names) {
    Map<String, Wongo> copy = new HashMap<>(nameToWongoMap);
    copy.keySet().retainAll(names);
    return copy.values().stream().map(Bongo::new).collect(
            Collectors.toList());
}
The extra Map is a minimal performance hit, since it's just copying pointers to objects, not the objects themselves.

Java 8 Streams: Map the same object multiple times based on different properties

I was presented with an interesting problem by a colleague of mine, and I was unable to find a neat and pretty Java 8 solution. The problem is to stream through a list of POJOs and then collect them in a map based on multiple properties; the mapping causes the same POJO to occur multiple times.
Imagine the following POJO:
private static class Customer {
    public String first;
    public String last;
    public Customer(String first, String last) {
        this.first = first;
        this.last = last;
    }
    public String toString() {
        return "Customer(" + first + " " + last + ")";
    }
}
Set it up as a List<Customer>:
// The list of customers
List<Customer> customers = Arrays.asList(
new Customer("Johnny", "Puma"),
new Customer("Super", "Mac"));
Alternative 1: Use a Map outside of the "stream" (or rather outside forEach).
// Alt 1: not pretty since the resulting map is "outside" of
// the stream. If parallel streams are used it must be
// ConcurrentHashMap
Map<String, Customer> res1 = new HashMap<>();
customers.stream().forEach(c -> {
res1.put(c.first, c);
res1.put(c.last, c);
});
Alternative 2: Create map entries and stream them, then flatMap them. IMO it is a bit too verbose and not so easy to read.
// Alt 2: A bit verbose and "new AbstractMap.SimpleEntry" feels as
// a "hard" dependency to AbstractMap
Map<String, Customer> res2 =
    customers.stream()
             .map(p -> {
                 Map.Entry<String, Customer> firstEntry = new AbstractMap.SimpleEntry<>(p.first, p);
                 Map.Entry<String, Customer> lastEntry = new AbstractMap.SimpleEntry<>(p.last, p);
                 return Stream.of(firstEntry, lastEntry);
             })
             .flatMap(Function.identity())
             .collect(Collectors.toMap(
                 Map.Entry::getKey, Map.Entry::getValue));
Alternative 3: This is another one that I came up with the "prettiest" code so far but it uses the three-arg version of reduce and the third parameter is a bit dodgy as found in this question: Purpose of third argument to 'reduce' function in Java 8 functional programming. Furthermore, reduce does not seem like a good fit for this problem since it is mutating and parallel streams may not work with the approach below.
// Alt 3: using reduce. Not so pretty
Map<String, Customer> res3 = customers.stream().reduce(
        new HashMap<>(),
        (m, p) -> {
            m.put(p.first, p);
            m.put(p.last, p);
            return m;
        }, (m1, m2) -> m2 /* <- NOT USED UNLESS PARALLEL */);
If the resulting maps are printed like this:
System.out.println(res1);
System.out.println(res2);
System.out.println(res3);
The result would be:
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
So, now to my question: how should I, in an orderly Java 8 fashion, stream through the List<Customer> and then somehow collect it as a Map<String, Customer> where the whole thing is split over two keys (first AND last), i.e. the Customer is mapped twice? I do not want to use any 3rd-party libraries, and I do not want to use a map outside of the stream as in alt 1. Are there any other nice alternatives?
The full code can be found on hastebin for simple copy-paste to get the whole thing running.
I think your alternatives 2 and 3 can be re-written to be more clear:
Alternative 2:
Map<String, Customer> res2 = customers.stream()
    .flatMap(
        c -> Stream.of(c.first, c.last)
             .map(k -> new AbstractMap.SimpleImmutableEntry<>(k, c))
    ).collect(toMap(Map.Entry::getKey, Map.Entry::getValue));
Alternative 3: Your code abuses reduce by mutating the HashMap. To do mutable reduction, use collect:
Map<String, Customer> res3 = customers.stream()
    .collect(
        HashMap::new,
        (m, c) -> { m.put(c.first, c); m.put(c.last, c); },
        HashMap::putAll
    );
Note that these are not identical. Alternative 2 will throw an exception if there are duplicate keys while Alternative 3 will silently overwrite the entries.
If overwriting entries in case of duplicate keys is what you want, I would personally prefer Alternative 3. It is immediately clear to me what it does. It most closely resembles the iterative solution. I would expect it to be more performant as Alternative 2 has to do a bunch of allocations per customer with all that flatmapping.
However, Alternative 2 has a huge advantage over Alternative 3 by separating the production of entries from their aggregation. This gives you a great deal of flexibility. For example, if you want to change Alternative 2 to overwrite entries on duplicate keys instead of throwing an exception, you would simply add (a,b) -> b to toMap(...). If you decide you want to collect matching entries into a list, all you would have to do is replace toMap(...) with groupingBy(...), etc.
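For illustration, the two variations mentioned above would look roughly like this (sketches based on Alternative 2, assuming static imports of the relevant Collectors methods):
// Overwrite entries on duplicate keys instead of throwing:
Map<String, Customer> overwriting = customers.stream()
    .flatMap(c -> Stream.of(c.first, c.last)
        .map(k -> new AbstractMap.SimpleImmutableEntry<>(k, c)))
    .collect(toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> b));

// Collect all customers sharing a key into a list instead:
Map<String, List<Customer>> grouped = customers.stream()
    .flatMap(c -> Stream.of(c.first, c.last)
        .map(k -> new AbstractMap.SimpleImmutableEntry<>(k, c)))
    .collect(groupingBy(Map.Entry::getKey,
        mapping(Map.Entry::getValue, toList())));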
