Join two lists of Java objects with streams - java

I have two lists of two classes where id and months are common fields:
public class NameProperties {
    private String id;
    private Integer name;
    private Integer months;
}
public class NameEntries {
    private String id;
    private Integer retailId;
    private Integer months;
}
List<NameProperties> namePropertiesList = new ArrayList<>();
List<NameEntries> nameEntriesList = new ArrayList<>();
Now I want to JOIN the two lists (like SQL does, JOIN ON month and id coming from the two results) and return the data in a new list where month and id are the same in the given two lists.
If I start iterating over one list and checking each element in the other, the nested iteration becomes a problem for large sizes.
I have tried to do it in many ways, but is there a stream way?

The general idea has been sketched in the comments: iterate one list, create a map whose keys are the attributes you want to join by, then iterate the other list and check if there's an entry in the map. If there is, get the value from the map and create a new object from the value of the map and the actual element of the list.
It's better to create the map from the list with the higher number of joined elements. Why? Because looking up a key in a hash map is expected O(1), no matter the size of the map. So, if you create the map from the larger list, then when you iterate the second (smaller) list, you'll be iterating over fewer elements.
Putting all this in code:
public static <B, S, J, R> List<R> join(
        List<B> bigger,
        List<S> smaller,
        Function<B, J> biggerKeyExtractor,
        Function<S, J> smallerKeyExtractor,
        BiFunction<B, S, R> joiner) {

    // Multimap: join key -> all elements of the bigger list with that key
    Map<J, List<B>> map = new LinkedHashMap<>();
    bigger.forEach(b ->
            map.computeIfAbsent(
                    biggerKeyExtractor.apply(b),
                    k -> new ArrayList<>())
               .add(b));

    List<R> result = new ArrayList<>();
    smaller.forEach(s -> {
        J key = smallerKeyExtractor.apply(s);
        List<B> bs = map.get(key);
        if (bs != null) {
            bs.forEach(b -> {
                R r = joiner.apply(b, s);
                result.add(r);
            });
        }
    });
    return result;
}
This is a generic method that joins bigger List<B> and smaller List<S> by J join keys (in your case, as the join key is a composite of String and Integer types, J will be List<Object>). It takes care of duplicates and returns a result List<R>. The method receives both lists, functions that will extract the join keys from each list and a joiner function that will create new result R elements from joined B and S elements.
Note that the map is actually a multimap. This is because there might be duplicates as per the biggerKeyExtractor join function. We use Map.computeIfAbsent to create this multimap.
You should create a class like this to store joined results:
public class JoinedResult {
    private final NameProperties properties;
    private final NameEntries entries;

    public JoinedResult(NameProperties properties, NameEntries entries) {
        this.properties = properties;
        this.entries = entries;
    }
    // TODO getters
}
Or, if you are in Java 14+, you might just use a record:
public record JoinedResult(NameProperties properties, NameEntries entries) { }
Or actually, any Pair class out there will do, or you could even use Map.Entry.
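For instance, a minimal sketch with Map.entry (Java 9+) as the joiner, skipping the size comparison shown below for brevity:
// Hypothetical variant: reuse Map.Entry instead of a dedicated result class.
List<Map.Entry<NameProperties, NameEntries>> joined = join(
        namePropertiesList,
        nameEntriesList,
        p -> List.of(p.getMonths(), p.getId()),
        e -> List.of(e.getMonths(), e.getId()),
        Map::entry);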
With the result class (or record) in place, you should call the join method this way:
long propertiesSize = namePropertiesList.stream()
        .map(p -> Arrays.asList(p.getMonths(), p.getId()))
        .distinct()
        .count();
long entriesSize = nameEntriesList.stream()
        .map(e -> Arrays.asList(e.getMonths(), e.getId()))
        .distinct()
        .count();
List<JoinedResult> result = propertiesSize > entriesSize ?
        join(namePropertiesList,
             nameEntriesList,
             p -> Arrays.asList(p.getMonths(), p.getId()),
             e -> Arrays.asList(e.getMonths(), e.getId()),
             JoinedResult::new) :
        join(nameEntriesList,
             namePropertiesList,
             e -> Arrays.asList(e.getMonths(), e.getId()),
             p -> Arrays.asList(p.getMonths(), p.getId()),
             (e, p) -> new JoinedResult(p, e));
The key is to use generics and call the join method with the right arguments (they are flipped, as per the join keys size comparison).
Note 1: we can use List<Object> as the key of the map because all Java lists implement equals and hashCode consistently (thus they can safely be used as map keys).
Note 2: if you are on Java 9+, you should use List.of instead of Arrays.asList.
Note 3: I haven't checked for null or invalid arguments.
Note 4: there is room for improvement, e.g. the key extractor functions could be memoized, the join keys could be reused instead of calculated more than once, and the multimap could hold plain Object values for single elements and lists only for duplicates, etc.

If performance and nesting (as discussed) are not too much of a concern, you could employ something along the lines of a cross join with filtering:
Result holder class
public class Tuple<A, B> {
    public final A a;
    public final B b;

    public Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }
}
Join with a predicate:
public static <A, B> List<Tuple<A, B>> joinOn(
        List<A> l1,
        List<B> l2,
        Predicate<Tuple<A, B>> predicate) {
    return l1.stream()
            .flatMap(a -> l2.stream().map(b -> new Tuple<>(a, b)))
            .filter(predicate)
            .collect(Collectors.toList());
}
Call it like this:
List<Tuple<NameProperties, NameEntries>> joined = joinOn(
        properties,
        names,
        t -> Objects.equals(t.a.id, t.b.id) && Objects.equals(t.a.months, t.b.months)
);

Related

How to fetch 3 objects having the highest values from a List with Stream API

I have a method like this:
public String mostExpensiveItems() {
    List<Entry> myList = getList();
    List<Double> expensive = myList.stream()
            .map(Entry::getAmount)
            .sorted(Comparator.reverseOrder())
            .limit(3)
            .toList();
    return "";
}
This method needs to return the product IDs of the 3 most expensive items as a string like this:
"item1, item2, item3"
I should be able to use only streams and I got stuck here. I should be able to sort the items by value then get the product IDs, but I can't seem to make it work.
Entry class
public class Entry {
    private String productId;
    private LocalDate date;
    private String state;
    private String category;
    private Double amount;

    public Entry(LocalDate orderDate, String state, String productId, String category, Double sales) {
        this.date = orderDate;
        this.productId = productId;
        this.state = state;
        this.category = category;
        this.amount = sales;
    }

    public String getProductId() {
        return productId;
    }
}
Assuming product ID is inside Entry, it can be something like this.
public String mostExpensiveItems() {
    List<Entry> myList = getList();
    List<String> expensive = myList.stream()
            .sorted(Comparator.comparing(Entry::getAmount).reversed())
            .limit(3)
            .map(Entry::getProductId)
            .toList();
    return String.join(", ", expensive);
}
NB: I didn't test this out yet, but this should be able to convey the idea.
You don't need to sort all the given data for this task; sorting is overkill when you only need to fetch the 3 largest values.
Sorting the whole data set costs O(n log n) time. Meanwhile, this task can be done in a single pass through the list, maintaining only the 3 largest values encountered so far, in sorted order. The time complexity will be very close to linear.
To implement the partial sorting with streams, you can define a custom collector (an object that is responsible for accumulating the data from the stream).
You can create a custom collector either inline by using one of the versions of the static method Collector.of() or by creating a class that implements the Collector interface.
These are parameters that you need to provide while defining a custom collector:
Supplier Supplier<A> provides a mutable container which stores elements of the stream. In this case, because we need to perform a partial sorting, a PriorityQueue will be handy as the mutable container.
Accumulator BiConsumer<A,T> defines how to add elements into the container provided by the supplier. For this task, the accumulator needs to guarantee that the queue will not exceed the given size, by rejecting values that are smaller than the lowest value previously added to the queue and by removing the lowest value when the size limit has been reached and a new value needs to be added.
Combiner BinaryOperator<A> combiner() establishes a rule on how to merge two containers obtained while executing the stream in parallel. Here the combiner relies on the same logic that was described for the accumulator.
Finisher Function<A,R> produces the final result by transforming the mutable container. The finisher function in the code below turns the queue into an immutable list.
Characteristics provide additional information; Collector.Characteristics.UNORDERED, which is used in this case, denotes that the order in which partial results of the reduction are produced during parallel execution is not significant, which can improve the performance of this collector with parallel streams.
Note that with Collector.of() only the supplier, accumulator and combiner are mandatory; the other parameters are provided only if needed.
The method below, which generates the collector, is more reusable if we give it a generic type parameter and have it expect a comparator as an argument (used in the constructor of the PriorityQueue and while adding elements to the queue).
Custom collector:
public static <T> Collector<T, ?, List<T>> getMaxN(int size, Comparator<T> comparator) {
    return Collector.of(
            () -> new PriorityQueue<>(comparator),
            (Queue<T> queue, T next) -> tryAdd(queue, next, comparator, size),
            (Queue<T> left, Queue<T> right) -> {
                right.forEach(next -> tryAdd(left, next, comparator, size));
                return left;
            },
            (Queue<T> queue) -> queue.stream().toList(),
            Collector.Characteristics.UNORDERED);
}

public static <T> void tryAdd(Queue<T> queue, T next, Comparator<T> comparator, int size) {
    // if the queue is full and the next value is greater than the smallest
    // element in the queue, the smallest element needs to be removed first
    if (queue.size() == size && comparator.compare(queue.element(), next) < 0) queue.remove();
    if (queue.size() < size) queue.add(next);
}
Stream:
public static <T> String getMostExpensive(List<T> list, Function<T, String> function,
Comparator<T> comparator, int limit) {
return list.stream()
.collect(getMaxN(limit, comparator))
.stream()
.map(function)
.collect(Collectors.joining(", "));
}
main() - demo with a dummy Entry class whose constructor takes only the product id and the amount.
public static void main(String[] args) {
List<Entry> entries =
List.of(new Entry("item1", 2.6), new Entry("item2", 3.5), new Entry("item3", 5.7),
new Entry("item4", 1.9), new Entry("item5", 3.2), new Entry("item6", 9.5),
new Entry("item7", 7.2), new Entry("item8", 8.1), new Entry("item9", 7.9));
System.out.println(getMostExpensive(entries, Entry::getProductId,
Comparator.comparingDouble(Entry::getAmount), 3));
}
Output
[item9, item6, item8] // the final result is not sorted: PriorityQueue keeps its elements only partially ordered (full sorting happens only as elements are dequeued); if these values are requested in sorted order, that can be done by changing the finisher function
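For illustration, a minimal sketch of such a finisher, meant to replace the finisher lambda inside getMaxN above (my untested variant, not from the original answer):
// Hypothetical finisher that returns the results largest-first:
(Queue<T> queue) -> queue.stream()
        .sorted(comparator.reversed())
        .toList()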

How to avoid multiple Streams with Java 8

I have the below code:
trainResponse.getIds().stream()
        .filter(id -> id.getType().equalsIgnoreCase("Company"))
        .findFirst()
        .ifPresent(id -> {
            domainResp.setId(id.getId());
        });
trainResponse.getIds().stream()
        .filter(id -> id.getType().equalsIgnoreCase("Private"))
        .findFirst()
        .ifPresent(id ->
            domainResp.setPrivateId(id.getId())
        );
Here I'm iterating/streaming the list of Id objects 2 times.
The only difference between the two streams is in the filter() operation.
How to achieve it in single iteration, and what is the best approach (in terms of time and space complexity) to do this?
You can achieve that with the Stream API in one pass through the given data and without increasing memory consumption (i.e. the result will contain only the ids having the required attributes).
For that, you can create a custom Collector that expects as its parameters a Collection of attributes to look for and a Function responsible for extracting the attribute from a stream element.
That's how this generic collector could be implemented:
/**
 * @param <T> - the type of stream elements
 * @param <F> - the type of the key (a field of the stream element)
 */
class CollectByKey<T, F> implements Collector<T, Map<F, T>, Map<F, T>> {
    private final Set<F> keys;
    private final Function<T, F> keyExtractor;

    public CollectByKey(Collection<F> keys, Function<T, F> keyExtractor) {
        this.keys = new HashSet<>(keys);
        this.keyExtractor = keyExtractor;
    }

    @Override
    public Supplier<Map<F, T>> supplier() {
        return HashMap::new;
    }

    @Override
    public BiConsumer<Map<F, T>, T> accumulator() {
        return this::tryAdd;
    }

    private void tryAdd(Map<F, T> map, T item) {
        F key = keyExtractor.apply(item);
        if (keys.remove(key)) {
            map.put(key, item);
        }
    }

    @Override
    public BinaryOperator<Map<F, T>> combiner() {
        return this::tryCombine;
    }

    private Map<F, T> tryCombine(Map<F, T> left, Map<F, T> right) {
        right.forEach(left::putIfAbsent);
        return left;
    }

    @Override
    public Function<Map<F, T>, Map<F, T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }
}
main() - demo (dummy Id class is not shown)
public class CustomCollectorByGivenAttributes {
    public static void main(String[] args) {
        List<Id> ids = List.of(new Id(1, "Company"), new Id(2, "Fizz"),
                new Id(3, "Private"), new Id(4, "Buzz"));
        Map<String, Id> idByType = ids.stream()
                .collect(new CollectByKey<>(List.of("Company", "Private"), Id::getType));
        idByType.forEach((k, v) -> {
            if (k.equalsIgnoreCase("Company")) domainResp.setId(v);
            if (k.equalsIgnoreCase("Private")) domainResp.setPrivateId(v);
        });
        System.out.println(idByType.keySet()); // printing keys - added for demo purposes
    }
}
Output
[Company, Private]
Note that after the set of keys becomes empty (i.e. all the required data has been fetched), further elements of the stream get ignored, but the stream still has to be traversed to the end.
IMO, the two streams solution is the most readable. And it may even be the most efficient solution using streams.
IMO, the best way to avoid multiple streams is to use a classical loop. For example:
// There may be bugs ...
boolean seenCompany = false;
boolean seenPrivate = false;
for (Id id : getIds()) {
    if (!seenCompany && id.getType().equalsIgnoreCase("Company")) {
        domainResp.setId(id.getId());
        seenCompany = true;
    } else if (!seenPrivate && id.getType().equalsIgnoreCase("Private")) {
        domainResp.setPrivateId(id.getId());
        seenPrivate = true;
    }
    if (seenCompany && seenPrivate) {
        break;
    }
}
It is unclear whether that is more efficient than performing one iteration or two iterations with streams. It will depend on the class returned by getIds() and the cost of the iteration code.
The complicated stuff with two flags is how you replicate the short-circuiting behavior of findFirst() in your two-stream solution. I don't know if it is possible to do that at all using one stream. If it is, it will involve some pretty cunning code.
But as you can see, your original solution with two streams is clearly easier to understand than the above.
The main point of using streams is to make your code simpler. It is not about efficiency. When you try to do complicated things to make the streams more efficient, you are probably defeating the (true) purpose of using streams in the first place.
For your list of ids, you could just use a map, then assign the values after retrieving them, if present.
Map<String, Integer> seen = new HashMap<>();
for (Id id : ids) {
    if (seen.size() == 2) {
        break;
    }
    seen.computeIfAbsent(id.getType().toLowerCase(), v -> id.getId());
}
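The assignment step isn't shown above; a minimal sketch of what it could look like (my illustration, assuming the domainResp setters accept these ids):
// Hypothetical retrieval of the collected ids; keys were lower-cased in the loop.
Optional.ofNullable(seen.get("company")).ifPresent(domainResp::setId);
Optional.ofNullable(seen.get("private")).ifPresent(domainResp::setPrivateId);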
If you want to test it, you can use the following:
record Id(String getType, int getId) {
    @Override
    public String toString() {
        return String.format("[%s,%s]", getType, getId);
    }
}
Random r = new Random();
List<Id> ids = r.ints(20, 1, 100)
.mapToObj(id -> new Id(
r.nextBoolean() ? "Company" : "Private", id))
.toList();
Edited to allow only certain types to be checked.
If you have more than two types but only want to check certain ones, you can do it as follows.
The process is the same except that you have a Set of allowed types.
You simply check that you are processing one of those types by using contains.
Map<String, Integer> seen = new HashMap<>();
Set<String> allowedTypes = Set.of("company", "private");
for (Id id : ids) {
    String type = id.getType().toLowerCase(); // lower-cased so it matches the allowed set and keeps the seen keys consistent
    if (allowedTypes.contains(type)) {
        if (seen.size() == allowedTypes.size()) {
            break;
        }
        seen.computeIfAbsent(type, v -> id.getId());
    }
}
Testing is similar, except that additional types need to be included:
create a list of some types that could be present,
and build a list of Ids from them as before.
Notice that the size of allowedTypes replaces the value 2, to permit more than two types to be checked before exiting the loop.
List<String> possibleTypes =
List.of("Company", "Type1", "Private", "Type2");
Random r = new Random();
List<Id> ids =
r.ints(30, 1, 100)
.mapToObj(id -> new Id(possibleTypes.get(
r.nextInt((possibleTypes.size()))),
id))
.toList();
You can group by type and check the resulting map.
I suppose the type of ids is IdType.
Map<String, List<IdType>> map = trainResponse.getIds()
.stream()
.collect(Collectors.groupingBy(
id -> id.getType().toLowerCase()));
Optional.ofNullable(map.get("company")).ifPresent(ids -> domainResp.setId(ids.get(0).getId()));
Optional.ofNullable(map.get("private")).ifPresent(ids -> domainResp.setPrivateId(ids.get(0).getId()));
I'd recommend a traditional for loop. In addition to being easily scalable, this prevents you from traversing the collection multiple times.
Your code looks like something that'll be generalised in the future, hence my generic approach.
Here's some pseudo code (with errors, just for the sake of illustration):
Set<String> matches = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
for (Id id : trainResponse.getIds()) {
    if (!matches.add(id.getType())) {
        continue;
    }
    switch (id.getType().toLowerCase()) {
        case "company":
            domainResp.setId(id.getId());
            break;
        case "private":
            ...
    }
}
Something along these lines might work, though it would go through the whole stream and won't stop at the first occurrence.
But assuming a small stream and only one Id per type, why not?
Map<String, Consumer<String>> setters = new HashMap<>();
setters.put("Company", domainResp::setId);
setters.put("Private", domainResp::setPrivateId);
trainResponse.getIds().forEach(id -> {
    if (setters.containsKey(id.getType())) {
        setters.get(id.getType()).accept(id.getId());
    }
});
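If the case-insensitive matching from the question matters, a possible tweak (my sketch, not from the answer) is a map with a case-insensitive comparator:
// Hypothetical case-insensitive variant of the setters map.
Map<String, Consumer<String>> setters = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
setters.put("Company", domainResp::setId);
setters.put("Private", domainResp::setPrivateId);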
We can use Collectors.filtering, available from Java 9 onwards, to collect the values based on a condition.
For this scenario, I have changed the code like below:
final Map<String, String> results = trainResponse.getIds()
        .stream()
        .collect(Collectors.filtering(
                id -> id.getType().equals("Company") || id.getType().equals("Private"),
                Collectors.toMap(Id::getType, Id::getId, (first, second) -> first)));
And then get each id from the results map.
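That last step isn't spelled out; a minimal sketch of it (my illustration, assuming the domainResp setters take these id values):
// Hypothetical retrieval from the collected map.
String companyId = results.get("Company");
if (companyId != null) domainResp.setId(companyId);
String privateId = results.get("Private");
if (privateId != null) domainResp.setPrivateId(privateId);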

find the largest 3 shops using java stream

I have a list of shop objects that are grouped by the item they have.
class Shop {
    String shopName;
    String item;
    int size;
    ...
}
How can I get a list of the 3 biggest shops (or n biggest shops) for each item?
i.e. suppose I have
Shop("Walmart", "Hammer", 100);
Shop("Target", "Scissor", 30);
Shop("Walgreens", "Hammer", 300);
Shop("Glens", "Hammer", 500);
Shop("Walmart", "Scissor", 75);
Shop("Toms", "Hammer", 150);
I want to return a list of the top 3 shops grouped by item.
I grouped the items, but I am not sure how to iterate through the resulting Map or entry set...
public class Shop {
    int size;
    String item;
    String name;

    public Shop(int size, String item, String name) {
        this.size = size;
        this.item = item;
        this.name = name;
    }

    // Return a list of the top 3 largest shops by item
    public static void main(String[] args) {
        List<Shop> shops = new LinkedList<Shop>();
        Comparator<Shop> shopComparator = new Comparator<Shop>() {
            @Override
            public int compare(Shop f1, Shop f2) {
                return f1.getSize() < f2.getSize() ? 1 : -1;
            }
        };
        shops.stream().collect(groupingBy(Shop::getItem))
                .entrySet()
                .stream()
                .filter(entry -> entry.getValue().stream().map ) // incomplete
                .forEach(item -> item.getValue()) // Stuck here
                ;
    }
}
The most important thing that you can learn about streams is that they aren't inherently "better" than equivalent approaches by any measure. Sometimes, they make code more readable, other times, less so. Use them to clarify your code, and avoid them when they obfuscate it.
This is a case where your code will be far more readable by using a collector for this purpose. Coding your own is fairly easy, and if you really want to understand streams better, I recommend it as a simple learning exercise.
Here, I'm using MoreCollectors.greatest() from the StreamEx library:
Comparator<Shop> bySize = Comparator.comparingInt(Shop::getSize);
Map<String, List<Shop>> biggestByItem
= shops.stream().collect(groupingBy(Shop::getItem, greatest(bySize, 3)));
This isn't better because it's shorter, or because it is faster and uses constant memory; it's better because complexity is factored out of the code, and hidden behind meaningful names that explain the behavior. Instead of littering your application with complex pipelines that need to be read, tested, and maintained independently, you have written (or referenced) a reusable collector with a clear behavior.
As I mentioned, there is a bit of a learning curve in understanding how the pieces of a Collector work together, but it's worth studying. Here's a possible implementation for a similar collector:
public static <T> Collector<T, ?, List<T>> top(int limit, Comparator<? super T> order) {
    if (limit < 1) throw new IndexOutOfBoundsException(limit);
    Objects.requireNonNull(order);
    Supplier<Queue<T>> supplier = () -> new PriorityQueue<>(order);
    BiConsumer<Queue<T>, T> accumulator = (q, e) -> collect(order, limit, q, e);
    BinaryOperator<Queue<T>> combiner = (q1, q2) -> {
        q2.forEach(e -> collect(order, limit, q1, e));
        return q1;
    };
    Function<Queue<T>, List<T>> finisher = q -> {
        // drain the queue so elements come out in ascending order
        // (copying a PriorityQueue directly would not yield sorted order)
        List<T> list = new ArrayList<>(q.size());
        while (!q.isEmpty()) {
            list.add(q.remove());
        }
        Collections.reverse(list);
        return list;
    };
    return Collector.of(supplier, accumulator, combiner, finisher, Collector.Characteristics.UNORDERED);
}

private static <T> void collect(Comparator<? super T> order, int limit, Queue<T> q, T e) {
    if (q.size() < limit) {
        q.add(e);
    } else if (order.compare(e, q.peek()) > 0) {
        q.remove();
        q.add(e);
    }
}
Given this factory, it's trivial to create others that give you bottom(3, bySize), etc.
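For instance, a minimal sketch of such a companion factory (my illustration, building on the top() method above):
// Hypothetical: the `limit` smallest elements, reusing top() with a reversed order.
public static <T> Collector<T, ?, List<T>> bottom(int limit, Comparator<? super T> order) {
    return top(limit, order.reversed());
}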
You may be interested in this related question and its answers.
Well, you could take the following steps:
With groupingBy(Shop::getItem), you can create a map which groups by the item, so your result would be a Map<String, List<Shop>>, where the list contains all shops with that item.
Now we need to sort the List<Shop> in reversed order, so the top items of the list are the shops with the largest size. In order to do this, we could use collectingAndThen as downstream collector to groupingBy.
Collectors.collectingAndThen(Collectors.toList(), finisherFunction);
Our finisher function should sort the list:
list -> {
Collections.sort(list, Comparator.comparing(Shop::size).reversed());
return list;
}
This would result in a Map<String, List<Shop>>, where the list is sorted, highest size first.
Now the only thing we need to do is limit the list size to 3. We could use subList. subList throws an exception if the requested range exceeds the list bounds, so we need to use Math.min(3, list.size()) to account for lists with fewer than 3 items.
list -> {
Collections.sort(list, Comparator.comparing(Shop::size).reversed());
return list.subList(0, Math.min(3, list.size()));
}
The whole code then looks like this:
shops.stream()
        .collect(groupingBy(Shop::item, Collectors.collectingAndThen(Collectors.toList(), list -> {
            Collections.sort(list, Comparator.comparing(Shop::size).reversed());
            return list.subList(0, Math.min(3, list.size()));
        })));
Instead of 'manually' sorting the list and limiting it to 3, you could create a small class which automatically does this, both limiting and sorting the list upon adding elements.
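A minimal sketch of what such a class could look like (my illustration, not part of the answer; simple but not optimal, since it re-sorts on every add):
// Hypothetical self-limiting container: keeps at most `limit` elements,
// largest first according to the comparator, trimming overflow on each add.
class TopList<T> {
    private final int limit;
    private final Comparator<? super T> order;
    private final List<T> elements = new ArrayList<>();

    TopList(int limit, Comparator<? super T> order) {
        this.limit = limit;
        this.order = order;
    }

    void add(T t) {
        elements.add(t);
        elements.sort(order.reversed()); // largest first
        if (elements.size() > limit) {
            elements.remove(elements.size() - 1); // drop the smallest
        }
    }

    List<T> asList() {
        return List.copyOf(elements);
    }
}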
Not as fancy as MC Emperor but it seems to work.
I started from the part you already did:
shops.stream().collect(Collectors.groupingBy(Shop::getItem))
        .entrySet().stream().map(entry -> {
            entry.setValue(entry.getValue().stream()
                    .sorted(Comparator.comparingInt(s -> -s.size)) // negated size: descending order
                    .limit(3) // only keep top 3
                    .collect(Collectors.toList()));
            return entry;
        }).forEach(item -> {
            System.out.println(item.getKey() + ":" + item.getValue());
        });
You can use groupingBy along with limit to get desired result:
import static java.util.stream.Collectors.*;
// Define the sort logic: descending by size (comparingInt alone would sort ascending)
Comparator<Shop> sortBySize = Comparator.comparingInt(Shop::getSize).reversed();
int limit = 3; // top n items
var itemToTopNShopsMap = list.stream().collect(
collectingAndThen(groupingBy(Shop::getItem),
itemToShopsMap -> getTopNShops(sortBySize, itemToShopsMap, limit)));
static Map<String, List<Shop>> getTopNShops(Comparator<Shop> sortBy, Map<String, List<Shop>> inMap, int limit) {
    var returningMap = new HashMap<String, List<Shop>>();
    for (var i : inMap.entrySet()) {
        returningMap.put(i.getKey(), i.getValue().stream().sorted(sortBy).limit(limit).collect(toList()));
    }
    return returningMap;
}
We took the following steps:
Group the list by item.
For each grouping, i.e. an item-to-list-of-shops entry, sort the list of shops by the predefined sort logic and keep only (limit) the top n results.
Note:
In the static method getTopNShops, mutation of the source map is avoided. We could have written this method as a stream, but the stream version may have been less readable than the foreach loop (see the sketch below).
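For comparison, a possible stream version of getTopNShops (my sketch; same behavior, arguably less readable):
// Hypothetical stream rewrite of the loop above.
static Map<String, List<Shop>> getTopNShops(Comparator<Shop> sortBy,
        Map<String, List<Shop>> inMap, int limit) {
    return inMap.entrySet().stream()
            .collect(toMap(
                    Map.Entry::getKey,
                    e -> e.getValue().stream().sorted(sortBy).limit(limit).collect(toList())));
}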

Java Streams: Combining two collections into a map

I have two Collections, a list of warehouse ids and a collection of widgets. Widgets exist in multiple warehouses in varying quantities:
List<Long> warehouseIds;
List<Widget> widgets;
Here's an example definition of the classes:
public class Widget {
public Collection<Stock> getStocks();
}
public class Stock {
public Long getWarehouseId();
public Integer getQuantity();
}
I want to use the Streams API to create a Map, where the warehouse ID is the key, and the value is a list of Widgets with the smallest quantity at a particular warehouse. Because multiple widgets could have the same quantity, we return a list.
For example, Warehouse 111 has 5 qty of Widget A, 5 of Widget B, and 8 of Widget C.
Warehouse 222 has 0 qty of Widget A, 5 of Widget B, and 5 of Widget C
The Map returned would have the following entries:
111 => ['WidgetA', 'WidgetB']
222 => ['WidgetA']
Starting the setup of the Map with keys seems pretty easy, but I don't know how to structure the downstream reduction:
warehouseIds.stream().collect(Collectors.groupingBy(
        Function.identity(),
        HashMap::new,
        ???...
I think the problem I'm having is reducing Widgets based on the stock warehouse Id, and not knowing how to return a Collector to create this list of Widgets. Here's how I would currently get the list of widgets with the smallest stock at a particular warehouse (represented by someWarehouseId):
widgets.stream().collect(Collectors.groupingBy(
        (Widget w) -> w.getStocks()
                // for a specific warehouse
                .stream().filter(stock -> stock.getWarehouseId().equals(someWarehouseId))
                // Get the quantity of stocks for a widget
                .collect(Collectors.summingInt(Stock::getQuantity)),
        // Use a tree map so the keys are sorted
        TreeMap::new,
        // Get the first entry
        Collectors.toList())).firstEntry().getValue();
Separating this into two tasks using forEach on the warehouse list would make this job easy, but I am wondering if I can do this in a 'one-liner'.
To tackle this problem, we need a more appropriate approach than using a TreeMap to select the values having the smallest quantities.
Consider the following approach:
We make a Stream<Widget> of our initial widgets. We will need to do some processing on the stocks of each widget, but we'll also need to keep the widget around. Let's flatMap that Stream<Widget> into a Stream<Map.Entry<Stock, Widget>>: that new Stream will be composed of each Stock that we have, with its corresponding Widget.
We filter those elements to only keep the Map.Entry<Stock, Widget> where the stock has a warehouseId contained in the warehouseIds list.
Now, we need to group that Stream according to the warehouseId of each Stock. So we use Collectors.groupingBy(classifier, downstream) where the classifier returns that warehouseId.
The downstream collector collects the elements that are classified to the same key. In this case, for the Map.Entry<Stock, Widget> elements that were classified to the same warehouseId, we need to keep only those where the stock has the lowest quantity. There is no built-in collector for this, so let's use MoreCollectors.minAll(comparator, downstream) from the StreamEx library. If you prefer not to use the library, I've extracted its code into this answer and will use that.
The comparator simply compares the quantity of each stock in the Map.Entry<Stock, Widget>. This makes sure that we'll keep elements with the lowest quantity for a fixed warehouseId. The downstream collector is used to reduce the collected elements. In this case, we only want to keep the widget, so we use Collectors.mapping(mapper, downstream) where the mapper returns the widget from the Map.Entry<Stock, Widget> and the downstream collectors collect into a list with Collectors.toList().
Sample code:
Map<Long, List<Widget>> map =
widgets.stream()
.flatMap(w -> w.getStocks().stream().map(s -> new AbstractMap.SimpleEntry<>(s, w)))
.filter(e -> warehouseIds.contains(e.getKey().getWarehouseId()))
.collect(Collectors.groupingBy(
e -> e.getKey().getWarehouseId(),
minAll(
Comparator.comparingInt(e -> e.getKey().getQuantity()),
Collectors.mapping(e -> e.getValue(), Collectors.toList())
)
));
with the following minAll collector:
public static <T, A, D> Collector<T, ?, D> minAll(Comparator<? super T> comparator, Collector<T, A, D> downstream) {
    return maxAll(comparator.reversed(), downstream);
}

public static <T, A, D> Collector<T, ?, D> maxAll(Comparator<? super T> comparator, Collector<? super T, A, D> downstream) {
    final class PairBox<U, V> {
        public U a;
        public V b;

        PairBox(U a, V b) {
            this.a = a;
            this.b = b;
        }
    }
    Supplier<A> downstreamSupplier = downstream.supplier();
    BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
    BinaryOperator<A> downstreamCombiner = downstream.combiner();
    Supplier<PairBox<A, T>> supplier = () -> new PairBox<>(downstreamSupplier.get(), null);
    BiConsumer<PairBox<A, T>, T> accumulator = (acc, t) -> {
        if (acc.b == null) {
            downstreamAccumulator.accept(acc.a, t);
            acc.b = t;
        } else {
            int cmp = comparator.compare(t, acc.b);
            if (cmp > 0) {
                acc.a = downstreamSupplier.get();
                acc.b = t;
            }
            if (cmp >= 0)
                downstreamAccumulator.accept(acc.a, t);
        }
    };
    BinaryOperator<PairBox<A, T>> combiner = (acc1, acc2) -> {
        if (acc2.b == null) {
            return acc1;
        }
        if (acc1.b == null) {
            return acc2;
        }
        int cmp = comparator.compare(acc1.b, acc2.b);
        if (cmp > 0) {
            return acc1;
        }
        if (cmp < 0) {
            return acc2;
        }
        acc1.a = downstreamCombiner.apply(acc1.a, acc2.a);
        return acc1;
    };
    Function<PairBox<A, T>, D> finisher = acc -> downstream.finisher().apply(acc.a);
    return Collector.of(supplier, accumulator, combiner, finisher);
}

How to eliminate duplicate entries within a stream based on an own equality class

I have a similar problem to the one described here, but with two differences: first, I use the stream API, and second, I already have equals() and hashCode() methods. But within the stream, equality of Blogs is, in this context, not the same as defined in the Blog class.
Collection<Blog> elements = x.stream()
... // a lot of filter and map stuff
.peek(p -> sysout(p)) // a stream of Blog
.? // how to remove duplicates - .distinct() doesn't work
I have a class with an equality method, let's call it ContextBlogEqual, with the method
public boolean equal(Blog a, Blog b);
Is there any way to remove all duplicate entries with my current stream approach, based on the ContextBlogEqual#equal method?
I already thought about grouping, but this doesn't work either, because the reason blogA and blogB are equal isn't just one parameter. Also, I have no idea how I could use .reduce(..), because there is usually more than one element left.
In essence, you either have to define hashCode to make your data work with a hash table, or a total order to make it work with a binary search tree.
For hash tables, you'll need to declare a wrapper class which overrides equals and hashCode.
For binary trees, you can define a Comparator<Blog> which respects your equality definition and adds an arbitrary, but consistent, ordering criterion. Then you can collect into a new TreeSet<Blog>(yourComparator), as sketched below.
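A minimal sketch of the TreeSet route (my illustration: the field names assume the Blog class shown in the next answer, and context equality is assumed to mean "same name and id"):
// Hypothetical comparator matching the contextual equality, ordering
// by each relevant field in a consistent (if arbitrary) way.
Comparator<Blog> contextOrder = Comparator.comparing((Blog b) -> b.name)
        .thenComparingInt(b -> b.id);
Set<Blog> distinct = x.stream()
        // ... the filter and map stuff from the question ...
        .collect(Collectors.toCollection(() -> new TreeSet<>(contextOrder)));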
First, please note that an equal(Blog, Blog) method is not enough for most scenarios, as you would need to pairwise compare all the entries, which is not efficient. It's better to define a function which extracts a new key from the blog entry. For example, let's consider the following Blog class:
static class Blog {
    final String name;
    final int id;
    final long time;

    public Blog(String name, int id, long time) {
        this.name = name;
        this.id = id;
        this.time = time;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id, time);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        Blog other = (Blog) obj;
        return id == other.id && time == other.time && Objects.equals(name, other.name);
    }

    public String toString() {
        return name + ":" + id + ":" + time;
    }
}
Let's have some test data:
List<Blog> blogs = Arrays.asList(new Blog("foo", 1, 1234),
new Blog("bar", 2, 1345), new Blog("foo", 1, 1345),
new Blog("bar", 2, 1345));
List<Blog> distinctBlogs = blogs.stream().distinct().collect(Collectors.toList());
System.out.println(distinctBlogs);
Here distinctBlogs contains three entries: [foo:1:1234, bar:2:1345, foo:1:1345]. Suppose that this is undesired because we don't want to compare the time field. The simplest way to create a new key is to use Arrays.asList:
Function<Blog, Object> keyExtractor = b -> Arrays.asList(b.name, b.id);
The resulting keys already have proper equals and hashCode implementations.
Now, if you are fine with a terminal operation, you may create a custom collector like this:
List<Blog> distinctByNameId = blogs.stream().collect(
        Collectors.collectingAndThen(Collectors.toMap(
                keyExtractor, Function.identity(),
                (a, b) -> a, LinkedHashMap::new),
                map -> new ArrayList<>(map.values())));
System.out.println(distinctByNameId);
Here we use keyExtractor to generate the keys, and the merge function is (a, b) -> a, which means the previously added entry is kept when a repeating key appears. We use LinkedHashMap to preserve the order (omit this parameter if you don't care about order). Finally, we dump the map values into a new ArrayList. You can move such collector creation to a separate method and generalize it:
public static <T> Collector<T, ?, List<T>> distinctBy(
        Function<? super T, ?> keyExtractor) {
    return Collectors.collectingAndThen(
            Collectors.toMap(keyExtractor, Function.identity(), (a, b) -> a, LinkedHashMap::new),
            map -> new ArrayList<>(map.values()));
}
This way the usage will be simpler:
List<Blog> distinctByNameId = blogs.stream()
.collect(distinctBy(b -> Arrays.asList(b.name, b.id)));
Essentially, you'll need a helper method like this one:
static <T, U> Stream<T> distinct(
        Stream<T> stream,
        Function<? super T, ? extends U> keyExtractor
) {
    final Map<U, String> seen = new ConcurrentHashMap<>();
    return stream.filter(t -> seen.put(keyExtractor.apply(t), "") == null);
}
It takes a Stream, and returns a new Stream that contains only distinct values given the keyExtractor. An example:
class O {
    final int i;

    O(int i) {
        this.i = i;
    }

    @Override
    public String toString() {
        return "O(" + i + ")";
    }
}
distinct(Stream.of(new O(1), new O(1), new O(2)), o -> o.i)
.forEach(System.out::println);
This yields
O(1)
O(2)
Disclaimer
As commented by Tagir Valeev here and in this similar answer by Stuart Marks, this approach has flaws. The operation as implemented here...
is unstable for ordered parallel streams
is not optimal for sequential streams
violates the stateless predicate constraint on Stream.filter()
Wrapping the above in your own library
You can of course extend Stream with your own functionality and implement this new distinct() function in there, e.g. like jOOλ or Javaslang do:
Seq.of(new O(1), new O(1), new O(2))
.distinct(o -> o.i)
.forEach(System.out::println);
