Java Streams: Combining two collections into a map

I have two Collections, a list of warehouse ids and a collection of widgets. Widgets exist in multiple warehouses in varying quantities:
List<Long> warehouseIds;
List<Widget> widgets;
Here's an example definition of the classes:
public class Widget {
    public Collection<Stock> getStocks();
}

public class Stock {
    public Long getWarehouseId();
    public Integer getQuantity();
}
I want to use the Streams API to create a Map, where the warehouse ID is the key, and the value is a list of Widgets with the smallest quantity at a particular warehouse. Because multiple widgets could have the same quantity, we return a list.
For example, Warehouse 111 has 5 qty of Widget A, 5 of Widget B, and 8 of Widget C.
Warehouse 222 has 0 qty of Widget A, 5 of Widget B, and 5 of Widget C
The Map returned would have the following entries:
111 => ['WidgetA', 'WidgetB']
222 => ['WidgetA']
Starting the setup of the Map with keys seems pretty easy, but I don't know how to structure the downstream reduction:
warehouseIds.stream().collect(Collectors.groupingBy(
    Function.identity(),
    HashMap::new,
    ???...
I think the problem I'm having is reducing Widgets based on the stock warehouse Id, and not knowing how to return a Collector to create this list of Widgets. Here's how I would currently get the list of widgets with the smallest stock at a particular warehouse (represented by someWarehouseId):
widgets.stream().collect(Collectors.groupingBy(
    (Widget w) ->
        w.getStocks()
            // for a specific warehouse (compare Longs with equals, not ==)
            .stream().filter(stock -> stock.getWarehouseId().equals(someWarehouseId))
            // Get the total stock quantity for a widget
            .collect(Collectors.summingInt(Stock::getQuantity)),
    // Use a tree map so the keys are sorted
    TreeMap::new,
    // Get the first entry
    Collectors.toList())).firstEntry().getValue();
Separating this into two tasks using forEach on the warehouse list would make this job easy, but I am wondering if I can do this in a 'one-liner'.

To tackle this problem, we need a more suitable approach than using a TreeMap to select the values with the smallest quantities.
Consider the following approach:
We make a Stream<Widget> of our initial widgets. We will need to do some processing on the stocks of each widget, but we'll also need to keep the widget around. Let's flatMap that Stream<Widget> into a Stream<Map.Entry<Stock, Widget>>: that new Stream will be composed of each Stock that we have, with its corresponding Widget.
We filter those elements to only keep the Map.Entry<Stock, Widget> where the stock has a warehouseId contained in the warehouseIds list.
Now, we need to group that Stream according to the warehouseId of each Stock. So we use Collectors.groupingBy(classifier, downstream) where the classifier returns that warehouseId.
The downstream collector collects the elements that are classified to the same key. In this case, for the Map.Entry<Stock, Widget> elements that were classified to the same warehouseId, we need to keep only those whose stock has the lowest quantity. There is no built-in collector for this, so let's use MoreCollectors.minAll(comparator, downstream) from the StreamEx library. If you prefer not to use the library, I've extracted its code into this answer and will use that.
The comparator simply compares the quantity of each stock in the Map.Entry<Stock, Widget>. This makes sure that we'll keep elements with the lowest quantity for a fixed warehouseId. The downstream collector is used to reduce the collected elements. In this case, we only want to keep the widget, so we use Collectors.mapping(mapper, downstream) where the mapper returns the widget from the Map.Entry<Stock, Widget> and the downstream collectors collect into a list with Collectors.toList().
Sample code:
Map<Long, List<Widget>> map =
    widgets.stream()
        .flatMap(w -> w.getStocks().stream().map(s -> new AbstractMap.SimpleEntry<>(s, w)))
        .filter(e -> warehouseIds.contains(e.getKey().getWarehouseId()))
        .collect(Collectors.groupingBy(
            e -> e.getKey().getWarehouseId(),
            minAll(
                Comparator.comparingInt(e -> e.getKey().getQuantity()),
                Collectors.mapping(e -> e.getValue(), Collectors.toList())
            )
        ));
with the following minAll collector:
public static <T, A, D> Collector<T, ?, D> minAll(Comparator<? super T> comparator, Collector<T, A, D> downstream) {
    return maxAll(comparator.reversed(), downstream);
}

public static <T, A, D> Collector<T, ?, D> maxAll(Comparator<? super T> comparator, Collector<? super T, A, D> downstream) {
    final class PairBox<U, V> {
        public U a;
        public V b;

        PairBox(U a, V b) {
            this.a = a;
            this.b = b;
        }
    }
    Supplier<A> downstreamSupplier = downstream.supplier();
    BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
    BinaryOperator<A> downstreamCombiner = downstream.combiner();
    // The box pairs the downstream accumulation (a) with the current best element (b)
    Supplier<PairBox<A, T>> supplier = () -> new PairBox<>(downstreamSupplier.get(), null);
    BiConsumer<PairBox<A, T>, T> accumulator = (acc, t) -> {
        if (acc.b == null) {
            downstreamAccumulator.accept(acc.a, t);
            acc.b = t;
        } else {
            int cmp = comparator.compare(t, acc.b);
            if (cmp > 0) {
                // t is strictly greater: restart the downstream accumulation
                acc.a = downstreamSupplier.get();
                acc.b = t;
            }
            if (cmp >= 0)
                downstreamAccumulator.accept(acc.a, t);
        }
    };
    BinaryOperator<PairBox<A, T>> combiner = (acc1, acc2) -> {
        if (acc2.b == null) {
            return acc1;
        }
        if (acc1.b == null) {
            return acc2;
        }
        int cmp = comparator.compare(acc1.b, acc2.b);
        if (cmp > 0) {
            return acc1;
        }
        if (cmp < 0) {
            return acc2;
        }
        // Equal maxima: merge both downstream accumulations
        acc1.a = downstreamCombiner.apply(acc1.a, acc2.a);
        return acc1;
    };
    Function<PairBox<A, T>, D> finisher = acc -> downstream.finisher().apply(acc.a);
    return Collector.of(supplier, accumulator, combiner, finisher);
}
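As a quick sanity check, the collector above reproduces the example from the question. Note that the constructors and the name field below are assumptions for illustration, since the question only shows getters:

// Hypothetical setup mirroring the question's example
Widget a = new Widget("WidgetA", List.of(new Stock(111L, 5), new Stock(222L, 0)));
Widget b = new Widget("WidgetB", List.of(new Stock(111L, 5), new Stock(222L, 5)));
Widget c = new Widget("WidgetC", List.of(new Stock(111L, 8), new Stock(222L, 5)));
List<Widget> widgets = List.of(a, b, c);
List<Long> warehouseIds = List.of(111L, 222L);
// Collecting as shown above should yield:
// {111=[WidgetA, WidgetB], 222=[WidgetA]}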

Related

How to insert two records in map at a time using RxJava and Lambda expressions

I have a list of records, where each record has two primary keys, Primary Id and Alternate Id. I want to build a map through which I can access the processed records using either the Primary Id or the Alternate Id, using RxJava operations.
Current implementation:
ImmutableMap.Builder<String, Record> mapBuilder = new ImmutableMap.Builder<>();
fetchRecords()
    .forEach(
        record -> {
            Record parsedRecord = doSomething(record);
            mapBuilder.put(parsedRecord.getPrimaryId(), parsedRecord);
            mapBuilder.put(parsedRecord.getAlternativeId(), parsedRecord);
        });
return mapBuilder.build();
How I want it to look:
fetchRecords().stream()
    .map(doSomething)
    .collect(Collectors.toMap(RecordData::getPrimaryId, Function.identity()));
// Need to add this as well
    .collect(Collectors.toMap(RecordData::getAlternativeId, Function.identity()));
Just wanted to know if there is a way to add the alternate id-to-record mapping as well, in a single pass over fetchRecords().
I'm not familiar with rx-java, but this might provide a starting point. Create a custom collector to add both keys. This is a very basic collector and does not handle duplicate keys other than letting the last one provided win. I simply used the record feature (previewed in Java 14 and 15, finalized in Java 16) to create an immutable "class". This would work the same way for a regular class.
record ParsedRecord(String getPrimaryId,
        String getAlternativeId, String someValue) {
    @Override
    public String toString() {
        return someValue;
    }
}
Map<String, ParsedRecord> map = records.stream()
    .collect(twoKeys(ParsedRecord::getPrimaryId,
        ParsedRecord::getAlternativeId, p -> p));
map.entrySet().forEach(System.out::println);
Prints the following:
A=value1
B=value1
C=value2
D=value2
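For reference, the records list is not shown in the answer; an assumed setup like this would produce that output:

List<ParsedRecord> records = List.of(
        new ParsedRecord("A", "B", "value1"),
        new ParsedRecord("C", "D", "value2"));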
Here is the collector. It is essentially a simplified version of toMap that takes two keys instead of one.
private static <T, K, V> Collector<T, ?, Map<K, V>> twoKeys(
        Function<T, K> keyMapper1, Function<T, K> keyMapper2,
        Function<T, V> valueMapper) {
    return Collector.of(
        () -> new HashMap<K, V>(),
        (m, r) -> {
            V v = valueMapper.apply(r);
            m.put(keyMapper1.apply(r), v);
            m.put(keyMapper2.apply(r), v);
        }, (m1, m2) -> {
            m1.putAll(m2);
            return m1;
        }, Characteristics.UNORDERED);
}
Or just keep it simple and efficient.
Map<String, ParsedRecord> map = new HashMap<>();
for (ParsedRecord pr : records) {
    map.put(pr.getPrimaryId(), pr);
    map.put(pr.getAlternativeId(), pr);
}

How to write a reduce function for a generic ArrayList<T> via streams

I am tasked with implementing a custom class that mimics the properties of a Set (without using any Set data structures). As such, I chose to store the elements of type T in an ArrayList, as a field of the class.
class CustomSet<T> {
    private final ArrayList<T> elementList;

    CustomSet() { // when initialising an empty set
        this.elementList = new ArrayList<>();
    }

    private CustomSet(ArrayList<T> otherElementList) { // when passing in a list to construct the set
        this.elementList = otherElementList;
    }

    static <T> CustomSet<T> of(T... elem) {
        ArrayList<T> set = new ArrayList<>();
        for (T e : elem) {
            set.add(e);
        }
        return new CustomSet<T>(set);
    }

    // typical add, clear, remove functions for the set, which can be implemented with ArrayList methods
}
One of my tasks is to implement a reduce function which takes in a seed and a binary operator and returns the reduced value (without the use of explicit loops, i.e. using streams), such that the following is observed:
CustomSet<Integer> thisSet = CustomSet.of(1, 2, 3, 4, 5, 6);
thisSet.reduce(0, (subtotal, element) -> subtotal + element); // outputs 21
CustomSet<String> otherSet = CustomSet.of("a", "b", "c", "d", "e");
otherSet.reduce("", (partialString, element) -> partialString + element); // outputs "abcde"
I have tried to write my code as such
<U> U reduce(U identity, BiFunction<U, ? super T, U> acc) {
    return elementList.stream().reduce(identity, acc, (x, y) -> x + y);
    // error: Operator "+" cannot be applied to 'U', 'U'
}
However, it runs into the error above. How do I solve this?
EDIT
It would be the same as converting the code below to one which uses streams
<U> U reduce(U identity, BiFunction<U, ? super T, U> acc) {
    for (T ele : elementList) {
        identity = acc.apply(identity, ele);
    }
    return identity;
}
A reduction function must be associative. A BiFunction<U, ? super T, U> acc cannot be associative in general. That's why Java's Stream API requires a compatible BinaryOperator<U> as the third argument of its three-arg reduce method. A plus operator is not a compatible third argument, and it is impossible to derive a valid third argument from the second; otherwise, there would be no need to insist on that third argument.
We could say your task is not a real Reduction but a Left-Fold operation, which is often confused with Reduction.
There is no direct support for Left Fold operations in the Stream API and while it’s possible to build it without an explicit loop, it’s not a clean operation, but rather a loop in disguise.
<U> U leftFold(U identity, BiFunction<U, ? super T, U> acc) {
    // single-element list as a mutable holder; replaceAll applies acc once per element
    List<U> value = Arrays.asList(identity);
    elementList.forEach(t -> value.replaceAll(u -> acc.apply(u, t)));
    return value.get(0);
}
I propose this reduce method signature
<U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner) {
return elementList.stream().reduce(identity, accumulator, combiner);
}
And use it like this
CustomSet<Integer> thisSet = CustomSet.of(1, 2, 3, 4, 5, 6);
thisSet.reduce(0, (subtotal, element) -> subtotal + element, Integer::sum); // outputs 21
CustomSet<String> otherSet = CustomSet.of("a", "b", "c", "d", "e");
otherSet.reduce("", (partialString, element) -> partialString + element, (a, b) -> a + b); // outputs "abcde"

Join two lists of Java objects in a stream

I have two lists of two classes, where id and months are common:
public class NameProperties {
    private String id;
    private Integer name;
    private Integer months;
}

public class NameEntries {
    private String id;
    private Integer retailId;
    private Integer months;
}

List<NameProperties> namePropertiesList = new ArrayList<>();
List<NameEntries> nameEntriesList = new ArrayList<>();
Now I want to JOIN the two lists (like SQL does, joining ON month and id from the two results) and return the data in a new list where month and id are the same in the given two lists.
If I iterate over only one list and search the other for each element, the nested iteration becomes a size issue.
I have tried to do it in many ways, but is there a stream way?
The general idea has been sketched in the comments: iterate one list, create a map whose keys are the attributes you want to join by, then iterate the other list and check if there's an entry in the map. If there is, get the value from the map and create a new object from the value of the map and the actual element of the list.
It's better to create the map from the list with the higher number of joined elements. Why? Because searching a map is O(1), no matter the size of the map. So, if you create the map from the list with the higher number of joined elements, then when you iterate the other, smaller list, you'll be iterating over fewer elements.
Putting all this in code:
public static <B, S, J, R> List<R> join(
        List<B> bigger,
        List<S> smaller,
        Function<B, J> biggerKeyExtractor,
        Function<S, J> smallerKeyExtractor,
        BiFunction<B, S, R> joiner) {
    Map<J, List<B>> map = new LinkedHashMap<>();
    bigger.forEach(b ->
        map.computeIfAbsent(
                biggerKeyExtractor.apply(b),
                k -> new ArrayList<>())
            .add(b));
    List<R> result = new ArrayList<>();
    smaller.forEach(s -> {
        J key = smallerKeyExtractor.apply(s);
        List<B> bs = map.get(key);
        if (bs != null) {
            bs.forEach(b -> {
                R r = joiner.apply(b, s);
                result.add(r);
            });
        }
    });
    return result;
}
This is a generic method that joins bigger List<B> and smaller List<S> by J join keys (in your case, as the join key is a composite of String and Integer types, J will be List<Object>). It takes care of duplicates and returns a result List<R>. The method receives both lists, functions that will extract the join keys from each list and a joiner function that will create new result R elements from joined B and S elements.
Note that the map is actually a multimap. This is because there might be duplicates as per the biggerKeyExtractor join function. We use Map.computeIfAbsent to create this multimap.
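In isolation, the computeIfAbsent multimap idiom looks like this (a generic illustration, not tied to the types above):

Map<String, List<Integer>> multimap = new HashMap<>();
multimap.computeIfAbsent("key", k -> new ArrayList<>()).add(1); // creates the list
multimap.computeIfAbsent("key", k -> new ArrayList<>()).add(2); // reuses the existing list
// multimap is now {key=[1, 2]}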
You should create a class like this to store joined results:
public class JoinedResult {
    private final NameProperties properties;
    private final NameEntries entries;

    public JoinedResult(NameProperties properties, NameEntries entries) {
        this.properties = properties;
        this.entries = entries;
    }

    // TODO getters
}
Or, if you are on Java 16+ (or Java 14+ with preview features enabled), you might just use a record:
public record JoinedResult(NameProperties properties, NameEntries entries) { }
Actually, any Pair class out there will do; you could even use Map.Entry.
With the result class (or record) in place, you should call the join method this way:
long propertiesSize = namePropertiesList.stream()
    .map(p -> Arrays.asList(p.getMonths(), p.getId()))
    .distinct()
    .count();
long entriesSize = nameEntriesList.stream()
    .map(e -> Arrays.asList(e.getMonths(), e.getId()))
    .distinct()
    .count();
List<JoinedResult> result = propertiesSize > entriesSize ?
    join(namePropertiesList,
        nameEntriesList,
        p -> Arrays.asList(p.getMonths(), p.getId()),
        e -> Arrays.asList(e.getMonths(), e.getId()),
        JoinedResult::new) :
    join(nameEntriesList,
        namePropertiesList,
        e -> Arrays.asList(e.getMonths(), e.getId()),
        p -> Arrays.asList(p.getMonths(), p.getId()),
        (e, p) -> new JoinedResult(p, e));
The key is to use generics and call the join method with the right arguments (they are flipped, as per the join keys size comparison).
Note 1: we can use List<Object> as the key of the map because all Java lists implement equals and hashCode consistently, so they can safely be used as map keys (see the illustration below).
Note 2: if you are on Java 9+, you should use List.of instead of Arrays.asList.
Note 3: I haven't checked for null or invalid arguments.
Note 4: there is room for improvement, e.g. the key extractor functions could be memoized, the join keys could be reused instead of calculated more than once, and the multimap could hold plain values for single elements and lists only for duplicates, etc.
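Regarding Note 1, here is a quick standalone illustration of why equal lists address the same map entry:

// Two distinct list instances with equal elements are equal and hash alike
List<Object> k1 = Arrays.asList(3, "id42");
List<Object> k2 = Arrays.asList(3, "id42");
System.out.println(k1.equals(k2)); // true
System.out.println(k1.hashCode() == k2.hashCode()); // true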
If performance and nesting (as discussed) are not too much of a concern, you could employ something along the lines of a cross join with filtering:
Result holder class
public class Tuple<A, B> {
    public final A a;
    public final B b;

    public Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }
}
Join with a predicate:
public static <A, B> List<Tuple<A, B>> joinOn(
        List<A> l1,
        List<B> l2,
        Predicate<Tuple<A, B>> predicate) {
    return l1.stream()
        .flatMap(a -> l2.stream().map(b -> new Tuple<>(a, b)))
        .filter(predicate)
        .collect(Collectors.toList());
}
Call it like this:
List<Tuple<NameProperties, NameEntries>> joined = joinOn(
    properties,
    names,
    t -> Objects.equals(t.a.id, t.b.id) && Objects.equals(t.a.months, t.b.months)
);

How to use own reduce method to distinct list

The distinct method should call the reduce method with an empty list as identity. How can I use the accumulator to check whether a value of the old list is already in the new list?
@Override
public <R> R reduce(R identity, BiFunction<R, ? super E, R> accumulator) {
    for (E value : this) {
        identity = accumulator.apply(identity, value);
    }
    return identity;
}
@Override
public List<E> distinct() {
    List<E> list = new LinkedList<E>();
    return reduce(list, (a, b) -> );
}
You should use contains to check if an element is in the list. If it is, don't add it to the accumulator, otherwise, do add it.
return reduce(list, (acc, element) -> {
    if (!acc.contains(element)) {
        acc.add(element);
    }
    return acc;
});
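Putting it together, the complete distinct() from the question's skeleton would read like this (a sketch based on the accumulator above):

@Override
public List<E> distinct() {
    List<E> list = new LinkedList<E>();
    return reduce(list, (acc, element) -> {
        if (!acc.contains(element)) {
            acc.add(element);
        }
        return acc;
    });
}

Note that contains on a LinkedList is O(n), so this distinct is O(n²) overall; that is the price of building it on top of a plain list.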

Change data in an immutable way with Java stream

Consider this code:
Function<BigDecimal, BigDecimal> func1 = x -> x; // This could be anything
Function<BigDecimal, BigDecimal> func2 = y -> y; // This could be anything
Map<Integer,BigDecimal> data = new HashMap<>();
Map<Integer, BigDecimal> newData =
    data.entrySet().stream()
        .collect(Collectors.toMap(Entry::getKey, i -> func1.apply(i.getValue())));
List<BigDecimal> list =
    newData.entrySet().stream()
        .map(i -> func2.apply(i.getValue()))
        .collect(Collectors.toList());
Basically, what I'm doing is updating a HashMap with func1, then applying a second transformation with func2 and saving the twice-transformed values in a list.
I did it all in an immutable way, generating the new objects newData and list.
MY QUESTION:
Is it possible to do that by streaming the original HashMap (data) only once?
I tried this:
Function<BigDecimal,BigDecimal> func1 = x -> x;
Function<BigDecimal,BigDecimal> func2 = y -> y;
Map<Integer,BigDecimal> data = new HashMap<>();
List<BigDecimal> list = new ArrayList<>();
Map<Integer, BigDecimal> newData =
    data.entrySet().stream().collect(Collectors.toMap(
        Entry::getKey,
        i -> {
            BigDecimal newValue = func1.apply(i.getValue());
            // SIDE EFFECT!!!!!!!
            list.add(func2.apply(newValue));
            return newValue;
        }));
but doing so introduces a side effect (the update of list), so I lose the 'immutable way' requirement.
This seems like an ideal use case for the upcoming Collectors.teeing method in JDK 12. Here's the webrev and here's the CSR. You can use it as follows:
Map.Entry<Map<Integer, BigDecimal>, List<BigDecimal>> result = data.entrySet().stream()
    .collect(Collectors.teeing(
        Collectors.toMap(
            Map.Entry::getKey,
            i -> func1.apply(i.getValue())),
        Collectors.mapping(
            i -> func1.andThen(func2).apply(i.getValue()),
            Collectors.toList()),
        Map::entry));
Collectors.teeing collects to two different collectors and then merges both partial results into the final result. For this final step I'm using JDK 9's Map.entry(K k, V v) static method, but I could have used any other container, e.g. a Pair or Tuple2, etc.
For the first collector I'm using your exact code to collect to a Map, while for the second collector I'm using Collectors.mapping along with Collectors.toList, using Function.andThen to compose your func1 and func2 functions for the mapping step.
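For clarity, Function.andThen composes left to right, so the mapper above applies func1 first and then func2:

// func1.andThen(func2).apply(x) is equivalent to func2.apply(func1.apply(x))
BigDecimal composed = func1.andThen(func2).apply(someValue); // someValue is just a placeholder here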
EDIT: If you cannot wait until JDK 12 is released, you could use this code meanwhile:
public static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
        Collector<? super T, A1, R1> downstream1,
        Collector<? super T, A2, R2> downstream2,
        BiFunction<? super R1, ? super R2, R> merger) {
    class Acc {
        A1 acc1 = downstream1.supplier().get();
        A2 acc2 = downstream2.supplier().get();

        void accumulate(T t) {
            downstream1.accumulator().accept(acc1, t);
            downstream2.accumulator().accept(acc2, t);
        }

        Acc combine(Acc other) {
            acc1 = downstream1.combiner().apply(acc1, other.acc1);
            acc2 = downstream2.combiner().apply(acc2, other.acc2);
            return this;
        }

        R applyMerger() {
            R1 r1 = downstream1.finisher().apply(acc1);
            R2 r2 = downstream2.finisher().apply(acc2);
            return merger.apply(r1, r2);
        }
    }
    return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger);
}
Note: The characteristics of the downstream collectors are not considered when creating the returned collector (left as an exercise).
EDIT 2: Your solution is absolutely OK, even though it uses two streams. My solution above streams the original map only once, but it applies func1 to all the values twice. If func1 is expensive, you might consider memoizing it (i.e. caching its results, so that whenever it's called again with the same input, you return the result from the cache instead of computing it again). Or you might also first apply func1 to the values of the original map, and then collect with Collectors.teeing.
Memoizing is easy. Just declare this utility method:
public static <T, R> Function<T, R> memoize(Function<T, R> f) {
    Map<T, R> cache = new HashMap<>(); // or ConcurrentHashMap
    return k -> cache.computeIfAbsent(k, f);
}
And then use it as follows:
Function<BigDecimal, BigDecimal> func1 = memoize(x -> x); //This could be anything
Now you can use this memoized func1 and it will work exactly as before, except that it will return results from the cache when its apply method is invoked with an argument that has been previously used.
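For instance, here is a small demonstration of the caching behavior (the function below is made up just to trace the calls):

Function<BigDecimal, BigDecimal> traced = memoize(x -> {
    System.out.println("computing " + x);
    return x.add(BigDecimal.ONE);
});
traced.apply(BigDecimal.TEN); // prints "computing 10", returns 11
traced.apply(BigDecimal.TEN); // cache hit: prints nothing, returns 11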
The other solution would be to apply func1 first and then collect:
Map.Entry<Map<Integer, BigDecimal>, List<BigDecimal>> result = data.entrySet().stream()
    .map(i -> Map.entry(i.getKey(), func1.apply(i.getValue())))
    .collect(Collectors.teeing(
        Collectors.toMap(
            Map.Entry::getKey,
            Map.Entry::getValue),
        Collectors.mapping(
            i -> func2.apply(i.getValue()),
            Collectors.toList()),
        Map::entry));
Again, I'm using jdk9's Map.entry(K k, V v) static method.
Your code can be simplified. Your goal is to apply these functions to all the BigDecimal values in the Map. You can get all those values using Map::values, which returns a Collection, and stream over it directly (assuming data already contains some entries):

List<BigDecimal> list = data.values().stream()
    .map(func1)
    .map(func2)
    .collect(Collectors.toList());

I discourage you from iterating over all the entries (Set<Entry<Integer, BigDecimal>>) since you only need to work with the values.
Try it this way. It returns an Object[] of length 2, where the first element is the map and the second is the list:
Map<Integer, BigDecimal> data = new HashMap<>();
data.put(1, BigDecimal.valueOf(30));
data.put(2, BigDecimal.valueOf(40));
data.put(3, BigDecimal.valueOf(50));
Function<BigDecimal, BigDecimal> func1 = x -> x.add(BigDecimal.valueOf(10)); // This could be anything
Function<BigDecimal, BigDecimal> func2 = y -> y.add(BigDecimal.valueOf(-20)); // This could be anything
Object[] o = data.entrySet().stream()
    .map(AbstractMap.SimpleEntry::new)
    .map(entry -> {
        entry.setValue(func1.apply(entry.getValue()));
        return entry;
    })
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue),
        a -> {
            List<BigDecimal> bigDecimals = a.values().stream().map(func2).collect(Collectors.toList());
            return new Object[] { a, bigDecimals };
        }));
Map<Integer, BigDecimal> func1Map = (Map<Integer, BigDecimal>) o[0];
List<BigDecimal> func2List = (List<BigDecimal>) o[1];
System.out.println("Original Map: " + data);
System.out.println("func1 map: " + func1Map);
System.out.println("func1+func2 list: " + func2List);
Output:
Original Map: {1=30, 2=40, 3=50}
func1 map: {1=40, 2=50, 3=60}
func1+func2 list: [20, 30, 40]
