How to use own reduce method to distinct list - java

The distinct method should call the reduce method with an empty list as the identity. How can I use the accumulator to check whether a value of the old list is already in the new list?
@Override
public <R> R reduce(R identity, BiFunction<R, ? super E, R> accumulator) {
    for (E value : this) {
        identity = accumulator.apply(identity, value);
    }
    return identity;
}

@Override
public List<E> distinct() {
    List<E> list = new LinkedList<E>();
    return reduce(list, (a, b) -> /* ??? */);
}

You should use contains to check whether an element is already in the list. If it is, don't add it to the accumulator; otherwise, add it.
return reduce(list, (acc, element) -> {
    if (!acc.contains(element)) {
        acc.add(element);
    }
    return acc;
});
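Putting the pieces together in a runnable sketch (using a plain static helper instead of the asker's custom list class, so the class and method names here are just for illustration):

```java
import java.util.LinkedList;
import java.util.List;
import java.util.function.BiFunction;

public class DistinctViaReduce {
    // Stand-in for the asker's reduce: fold the accumulator over each element.
    static <E, R> R reduce(Iterable<E> source, R identity, BiFunction<R, ? super E, R> accumulator) {
        for (E value : source) {
            identity = accumulator.apply(identity, value);
        }
        return identity;
    }

    // distinct expressed via reduce, exactly as in the answer above.
    static <E> List<E> distinct(Iterable<E> source) {
        return reduce(source, new LinkedList<E>(), (acc, element) -> {
            if (!acc.contains(element)) {
                acc.add(element);
            }
            return acc;
        });
    }

    public static void main(String[] args) {
        System.out.println(distinct(List.of(1, 2, 2, 3, 1))); // [1, 2, 3]
    }
}
```

Note that `contains` makes this quadratic overall; that is acceptable for the exercise but a `Set`-based check would be faster for large lists.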


How to write a reduce function for a generic ArrayList<T> via streams

I am tasked to implement a custom class to mimic the properties of a Set (without using any of the Set data structures). As such, I chose to store the elements of type T in an ArrayList as a field of the class.
class CustomSet<T> {
    private final ArrayList<T> elementList;

    CustomSet() { // when initialising an empty set
        this.elementList = new ArrayList<>();
    }

    private CustomSet(ArrayList<T> otherElementList) { // when passing in a list to construct the set
        this.elementList = otherElementList;
    }

    static <T> CustomSet<T> of(T... elem) {
        ArrayList<T> set = new ArrayList<>();
        for (T e : elem) {
            set.add(e);
        }
        return new CustomSet<T>(set);
    }

    // typical add, clear, remove functions for the set, which can be done with methods in ArrayList
}
One of my tasks is to implement a reduce function which takes in a seed and a binary operator and returns the reduced value (without the use of explicit loops, i.e. use streams), such that the following is observed:
CustomSet<Integer> thisSet = CustomSet.of(1,2,3,4,5,6);
thisSet.reduce(0, (subtotal, element) -> subtotal + element) // outputs 21
CustomSet<String> otherSet = CustomSet.of("a", "b", "c", "d", "e");
otherSet.reduce("", (partialString, element) -> partialString + element); // outputs "abcde"
I have tried to write my code as such:
<U> U reduce(U identity, BiFunction<U, ? super T, U> acc) {
    return elementList.stream().reduce(identity, acc, (x, y) -> x + y);
    // error: Operator '+' cannot be applied to 'U', 'U'
}
However, it produces the compile error shown above. How do I solve this?
EDIT
It would be the same as converting the code below to one which uses streams:
<U> U reduce(U identity, BiFunction<U, ? super T, U> acc) {
    for (T ele : elementList) {
        identity = acc.apply(identity, ele);
    }
    return identity;
}
A reduction function must be associative. A BiFunction<U, ? super T, U> acc cannot be associative in general. That's why Java's Stream API requires a compatible BinaryOperator<U> for its three-arg reduce method. A plus operator is not a compatible third argument, and it is impossible in general to derive a valid combiner from the accumulator; otherwise, there would have been no need to insist on that third argument.
We could say, your task is not a real Reduction but a Left-Fold operation that is often confused with Reduction.
There is no direct support for Left Fold operations in the Stream API and while it’s possible to build it without an explicit loop, it’s not a clean operation, but rather a loop in disguise.
<U> U leftFold(U identity, BiFunction<U, ? super T, U> acc) {
    List<U> value = Arrays.asList(identity);
    elementList.forEach(t -> value.replaceAll(u -> acc.apply(u, t)));
    return value.get(0);
}
I propose this reduce method signature instead:
<U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner) {
    return elementList.stream().reduce(identity, accumulator, combiner);
}
And use it like this:
CustomSet<Integer> thisSet = CustomSet.of(1, 2, 3, 4, 5, 6);
thisSet.reduce(0, (subtotal, element) -> subtotal + element, Integer::sum); // outputs 21
CustomSet<String> otherSet = CustomSet.of("a", "b", "c", "d", "e");
otherSet.reduce("", (partialString, element) -> partialString + element, (a, b) -> a + b); // outputs "abcde"
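A minimal self-contained version of the proposed signature, with just enough of CustomSet to run the two examples from the question (the wrapper class name is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class CustomSetReduce {
    static class CustomSet<T> {
        private final ArrayList<T> elementList;

        private CustomSet(ArrayList<T> elementList) {
            this.elementList = elementList;
        }

        @SafeVarargs
        static <T> CustomSet<T> of(T... elem) {
            return new CustomSet<>(new ArrayList<>(Arrays.asList(elem)));
        }

        // Three-arg reduce: the combiner is required so parallel streams can merge partial results.
        <U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner) {
            return elementList.stream().reduce(identity, accumulator, combiner);
        }
    }

    public static void main(String[] args) {
        CustomSet<Integer> thisSet = CustomSet.of(1, 2, 3, 4, 5, 6);
        System.out.println(thisSet.reduce(0, (subtotal, element) -> subtotal + element, Integer::sum)); // 21
        CustomSet<String> otherSet = CustomSet.of("a", "b", "c", "d", "e");
        System.out.println(otherSet.reduce("", (partialString, element) -> partialString + element, (a, b) -> a + b)); // abcde
    }
}
```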

Java 8 Streams: Collapse/abstract streams parts

Say I have this Stream:
list.stream()
    .map(fn1)     // part 1
    .map(fn2)     //
    .filter(fn3)  //
    .flatMap(fn4) // part 2
    .map(fn5)     //
    .filter(fn6)  //
    .map(fn7)     //
    .collect(Collectors.toList())
How can I make it look like:
list.stream()
    .map(fnPart1)
    .map(fnPart2)
    .collect(Collectors.toList())
Without manually unwinding the fnX parts and putting them together (for maintenance reasons, I want to keep them untouched, and express the fnPartX with them).
You could express and compose it with functions:
Function<Stream<T1>, Stream<T2>> fnPart1 =
    s -> s.map(fn1)
          .map(fn2)
          .filter(fn3);

Function<Stream<T2>, Stream<T3>> fnPart2 =
    s -> s.flatMap(fn4)
          .map(fn5)
          .filter(fn6)
          .map(fn7);
fnPart1.andThen(fnPart2).apply(list.stream()).collect(Collectors.toList());
The input and output types of the functions have to match accordingly.
This can be the basis for a more complex composition construct such as:
public class Composer<T> {
    private final T element;

    private Composer(T element) {
        this.element = element;
    }

    public <T2> Composer<T2> andThen(Function<? super T, ? extends T2> f) {
        return new Composer<>(f.apply(element));
    }

    public T get() {
        return element;
    }

    public static <T> Composer<T> of(T element) {
        return new Composer<T>(element);
    }
}
This can be used like this:
Composer.of(list.stream())
    .andThen(fnPart1)
    .andThen(fnPart2)
    .get()
    .collect(Collectors.toList());
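For instance, with concrete stages (the stage bodies below are made up purely to have compilable types):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ComposeDemo {
    public static void main(String[] args) {
        // Hypothetical part 1: double each int, then render as a string.
        Function<Stream<Integer>, Stream<String>> fnPart1 =
            s -> s.map(i -> i * 2)
                  .map(String::valueOf);
        // Hypothetical part 2: keep only single-digit results.
        Function<Stream<String>, Stream<String>> fnPart2 =
            s -> s.filter(str -> str.length() == 1);

        List<String> result = fnPart1.andThen(fnPart2)
            .apply(Stream.of(1, 2, 3, 4, 5))
            .collect(Collectors.toList());
        System.out.println(result); // [2, 4, 6, 8]
    }
}
```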
You have to use flatMap, not map. I don't know what your types are, so I've called them T1, T2, etc.
list.stream()
    .flatMap(this::fnPart1)
    .flatMap(this::fnPart2)
    .collect(Collectors.toList())

Stream<T2> fnPart1(T1 t1) {
    return Stream.of(t1).map(fn1).map(fn2).filter(fn3);
}

Stream<T3> fnPart2(T2 t2) {
    return Stream.of(t2).flatMap(fn4).map(fn5).filter(fn6).map(fn7);
}
Of course you could remove some of the stream operations:
Stream<T2> fnPart1(T1 t1) {
    return Stream.of(fn2(fn1(t1))).filter(fn3);
}

Stream<T3> fnPart2(T2 t2) {
    return fn4(t2).map(fn5).filter(fn6).map(fn7);
}
Further simplification is possible since fnPart1 and fnPart2 are just dealing with single elements.
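A compilable sketch of this per-element flatMap approach, with placeholder stages standing in for fn1..fn7:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapCompose {
    // Hypothetical part 1: increment, then render as a string.
    static Stream<String> fnPart1(Integer i) {
        return Stream.of(i).map(n -> n + 1).map(String::valueOf);
    }

    // Hypothetical part 2: drop the value "3" (a possibly-empty per-element stream).
    static Stream<String> fnPart2(String s) {
        return Stream.of(s).filter(str -> !str.equals("3"));
    }

    public static void main(String[] args) {
        List<String> out = Stream.of(1, 2, 3)
            .flatMap(FlatMapCompose::fnPart1)
            .flatMap(FlatMapCompose::fnPart2)
            .collect(Collectors.toList());
        System.out.println(out); // [2, 4]
    }
}
```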

How to efficiently compute the maximum value of a collection after applying some function

Suppose you have a method like this that computes the maximum of a Collection for some ToIntFunction:
static <T> void foo1(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    if (collection.isEmpty())
        throw new NoSuchElementException();
    int max = Integer.MIN_VALUE;
    T maxT = null;
    for (T t : collection) {
        int result = function.applyAsInt(t);
        if (result >= max) {
            max = result;
            maxT = t;
        }
    }
    // do something with maxT
}
With Java 8, this could be translated into
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = collection.stream()
                       .max(Comparator.comparingInt(function))
                       .get();
    // do something with maxT
}
A disadvantage with the new version is that function.applyAsInt is invoked repeatedly for the same value of T. (Specifically if the collection has size n, foo1 invokes applyAsInt n times whereas foo2 invokes it 2n - 2 times).
Disadvantages of the first approach are that the code is less clear and you can't modify it to use parallelism.
Suppose you wanted to do this using parallel streams and only invoke applyAsInt once per element. Can this be written in a simple way?
You can use a custom collector that keeps a running pair of the maximum value and the maximum element:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    class Pair {
        int max = Integer.MIN_VALUE;
        T maxT = null;
    }
    T maxT = collection.stream().collect(Collector.of(
        Pair::new,
        (p, t) -> {
            int result = function.applyAsInt(t);
            if (result >= p.max) {
                p.max = result;
                p.maxT = t;
            }
        },
        (p1, p2) -> p2.max > p1.max ? p2 : p1,
        p -> p.maxT
    ));
    // do something with maxT
}
One advantage is that this creates a single Pair intermediate object that is used throughout the collecting process. Each time an element is accepted, this holder is updated with the new maximum. The finisher operation just returns the maximum element and discards the maximum value.
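Here is the same collector exercised on a concrete input (strings compared by length; the input data is made up for illustration):

```java
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;

public class MaxByCollector {
    public static void main(String[] args) {
        List<String> words = List.of("a", "bbb", "cc");
        ToIntFunction<String> function = String::length;

        // Mutable holder for the running (max value, max element) pair.
        class Pair {
            int max = Integer.MIN_VALUE;
            String maxT = null;
        }

        String maxT = words.stream().collect(Collector.of(
            Pair::new,
            (p, t) -> {                       // accumulator: applyAsInt is called once per element
                int result = function.applyAsInt(t);
                if (result >= p.max) {
                    p.max = result;
                    p.maxT = t;
                }
            },
            (p1, p2) -> p2.max > p1.max ? p2 : p1,  // combiner for parallel streams
            p -> p.maxT));                          // finisher: discard the cached value
        System.out.println(maxT); // bbb
    }
}
```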
As I stated in the comments, I would suggest introducing an intermediate data structure like:
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    if (collection.isEmpty()) {
        throw new IllegalArgumentException();
    }
    class Pair {
        final T value;
        final int result;

        public Pair(T value, int result) {
            this.value = value;
            this.result = result;
        }

        public T getValue() {
            return value;
        }

        public int getResult() {
            return result;
        }
    }
    T maxT = collection.stream().map(t -> new Pair(t, function.applyAsInt(t)))
                       .max(Comparator.comparingInt(Pair::getResult)).get().getValue();
    // do something with maxT
}
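The same idea, runnable, with a small record in place of the hand-written Pair class (Java 16+; input data made up for illustration):

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

public class PairMaxDemo {
    // Record pairing each element with its precomputed function result.
    record Pair<T>(T value, int result) {}

    public static void main(String[] args) {
        List<String> words = List.of("a", "bbb", "cc");
        ToIntFunction<String> function = String::length;

        String maxT = words.stream()
            .map(t -> new Pair<>(t, function.applyAsInt(t))) // one applyAsInt call per element
            .max(Comparator.comparingInt(Pair::result))
            .get()
            .value();
        System.out.println(maxT); // bbb
    }
}
```

The trade-off is one short-lived Pair allocation per element in exchange for calling the function exactly n times.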
Another way would be to use a memoized version of the function:
static <T> void foo2(Collection<? extends T> collection,
                     ToIntFunction<? super T> function, T defaultValue) {
    T maxT = collection.parallelStream()
                       .max(Comparator.comparingInt(ToIntMemoizer.memoize(function)))
                       .orElse(defaultValue);
    // do something with maxT
}
Where ToIntMemoizer.memoize(function) code would be as follows:
public class ToIntMemoizer<T> {
    private final Map<T, Integer> cache = new ConcurrentHashMap<>();

    private ToIntMemoizer() {
    }

    private ToIntFunction<T> doMemoize(ToIntFunction<T> function) {
        return input -> cache.computeIfAbsent(input, function::apply);
    }

    public static <T> ToIntFunction<T> memoize(ToIntFunction<T> function) {
        return new ToIntMemoizer<T>().doMemoize(function);
    }
}
This uses a ConcurrentHashMap to cache already computed results. If you don't need to support parallelism, a plain HashMap works just as well.
One disadvantage is that the result of the function needs to be boxed/unboxed. On the other hand, as the function is memoized, a result will be computed only once for each repeated element of the collection. Then, if the function is invoked with a repeated input value, the result will be returned from the cache.
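The caching behaviour can be seen with a counter (the names below are illustrative, and the memoizer is inlined rather than using the ToIntMemoizer class above):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

public class MemoizeDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // "Expensive" function that counts how often it actually runs.
        ToIntFunction<String> expensive = s -> {
            calls.incrementAndGet();
            return s.length();
        };

        // Memoized wrapper: compute on first miss, serve from cache afterwards.
        Map<String, Integer> cache = new ConcurrentHashMap<>();
        ToIntFunction<String> memoized = s -> cache.computeIfAbsent(s, expensive::apply);

        memoized.applyAsInt("hello");
        memoized.applyAsInt("hello");
        System.out.println(memoized.applyAsInt("hello")); // 5
        System.out.println("calls=" + calls.get());       // calls=1
    }
}
```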
If you don't mind using a third-party library, my StreamEx library optimizes all these cases in special methods like maxByInt and so on. So you can simply use:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = StreamEx.of(collection).parallel()
                     .maxByInt(function)
                     .get();
    // do something with maxT
}
The implementation uses reduce with a mutable container. This probably abuses the API a little, but it works fine for sequential and parallel streams and, unlike the collect solution, defers the container allocation to the first accumulated element (thus no container is allocated if a parallel subtask covers no elements, which occurs quite often if you have a filtering operation upstream).

Java Streams: Combining two collections into a map

I have two Collections, a list of warehouse ids and a collection of widgets. Widgets exist in multiple warehouses in varying quantities:
List<Long> warehouseIds;
List<Widget> widgets;
Here's an example definition of the classes:
public class Widget {
    public Collection<Stock> getStocks();
}

public class Stock {
    public Long getWarehouseId();
    public Integer getQuantity();
}
I want to use the Streams API to create a Map, where the warehouse ID is the key, and the value is a list of Widgets with the smallest quantity at a particular warehouse. Because multiple widgets could have the same quantity, we return a list.
For example, Warehouse 111 has 5 qty of Widget A, 5 of Widget B, and 8 of Widget C.
Warehouse 222 has 0 qty of Widget A, 5 of Widget B, and 5 of Widget C
The Map returned would have the following entries:
111 => ['WidgetA', 'WidgetB']
222 => ['WidgetA']
Starting the setup of the Map with keys seems pretty easy, but I don't know how to structure the downstream reduction:
warehouseIds.stream().collect(Collectors.groupingBy(
    Function.identity(),
    HashMap::new,
    ???...
I think the problem I'm having is reducing Widgets based on the stock warehouse Id, and not knowing how to return a Collector to create this list of Widgets. Here's how I would currently get the list of widgets with the smallest stock at a particular warehouse (represented by someWarehouseId):
widgets.stream().collect(Collectors.groupingBy(
    (Widget w) ->
        w.getStocks()
         // for a specific warehouse
         .stream().filter(stock -> stock.getWarehouseId().equals(someWarehouseId))
         // get the total quantity of stocks for a widget
         .collect(Collectors.summingInt(Stock::getQuantity)),
    // use a TreeMap so the keys are sorted
    TreeMap::new,
    // get the first entry
    Collectors.toList())).firstEntry().getValue();
Separating this into two tasks using forEach on the warehouse list would make this job easy, but I am wondering if I can do this in a 'one-liner'.
To tackle this problem, we need a more appropriate approach than using a TreeMap to select the values having the smallest quantities.
Consider the following approach:
We make a Stream<Widget> of our initial widgets. We will need to do some processing on the stocks of each widget, but we'll also need to keep the widget around. Let's flatMap that Stream<Widget> into a Stream<Map.Entry<Stock, Widget>>: that new Stream will be composed of each Stock that we have, with its corresponding Widget.
We filter those elements to only keep the Map.Entry<Stock, Widget> where the stock has a warehouseId contained in the warehouseIds list.
Now, we need to group that Stream according to the warehouseId of each Stock. So we use Collectors.groupingBy(classifier, downstream) where the classifier returns that warehouseId.
The downstream collector collects elements that are classified to the same key. In this case, for the Map.Entry<Stock, Widget> elements that were classified to the same warehouseId, we need to keep only those where the stock has the lowest quantity. There are no built-in collectors for this, let's use MoreCollectors.minAll(comparator, downstream) from the StreamEx library. If you prefer not to use the library, I've extracted its code into this answer and will use that.
The comparator simply compares the quantity of each stock in the Map.Entry<Stock, Widget>. This makes sure that we'll keep elements with the lowest quantity for a fixed warehouseId. The downstream collector is used to reduce the collected elements. In this case, we only want to keep the widget, so we use Collectors.mapping(mapper, downstream) where the mapper returns the widget from the Map.Entry<Stock, Widget> and the downstream collectors collect into a list with Collectors.toList().
Sample code:
Map<Long, List<Widget>> map =
    widgets.stream()
           .flatMap(w -> w.getStocks().stream().map(s -> new AbstractMap.SimpleEntry<>(s, w)))
           .filter(e -> warehouseIds.contains(e.getKey().getWarehouseId()))
           .collect(Collectors.groupingBy(
               e -> e.getKey().getWarehouseId(),
               minAll(
                   Comparator.comparingInt(e -> e.getKey().getQuantity()),
                   Collectors.mapping(e -> e.getValue(), Collectors.toList())
               )
           ));
with the following minAll collector:
public static <T, A, D> Collector<T, ?, D> minAll(Comparator<? super T> comparator, Collector<T, A, D> downstream) {
    return maxAll(comparator.reversed(), downstream);
}

public static <T, A, D> Collector<T, ?, D> maxAll(Comparator<? super T> comparator, Collector<? super T, A, D> downstream) {
    final class PairBox<U, V> {
        public U a;
        public V b;

        PairBox(U a, V b) {
            this.a = a;
            this.b = b;
        }
    }
    Supplier<A> downstreamSupplier = downstream.supplier();
    BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
    BinaryOperator<A> downstreamCombiner = downstream.combiner();
    Supplier<PairBox<A, T>> supplier = () -> new PairBox<>(downstreamSupplier.get(), null);
    BiConsumer<PairBox<A, T>, T> accumulator = (acc, t) -> {
        if (acc.b == null) {
            downstreamAccumulator.accept(acc.a, t);
            acc.b = t;
        } else {
            int cmp = comparator.compare(t, acc.b);
            if (cmp > 0) {
                acc.a = downstreamSupplier.get();
                acc.b = t;
            }
            if (cmp >= 0)
                downstreamAccumulator.accept(acc.a, t);
        }
    };
    BinaryOperator<PairBox<A, T>> combiner = (acc1, acc2) -> {
        if (acc2.b == null) {
            return acc1;
        }
        if (acc1.b == null) {
            return acc2;
        }
        int cmp = comparator.compare(acc1.b, acc2.b);
        if (cmp > 0) {
            return acc1;
        }
        if (cmp < 0) {
            return acc2;
        }
        acc1.a = downstreamCombiner.apply(acc1.a, acc2.a);
        return acc1;
    };
    Function<PairBox<A, T>, D> finisher = acc -> downstream.finisher().apply(acc.a);
    return Collector.of(supplier, accumulator, combiner, finisher);
}

Remove duplicates from a list of objects based on property in Java 8 [duplicate]

This question already has answers here:
Java 8 Distinct by property
(34 answers)
Closed 3 years ago.
I am trying to remove duplicates from a List of objects based on some property. Can we do it in a simple way using Java 8?
List<Employee> employee
Can we remove duplicates from it based on the id property of Employee? I have seen posts about removing duplicate strings from an ArrayList of String.
You can get a stream from the List and put it in a TreeSet, to which you provide a custom comparator that compares the id uniquely. Then, if you really need a list, you can put this collection back into an ArrayList.
import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;
...
List<Employee> unique = employee.stream()
    .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                               ArrayList::new));
Given the example:
List<Employee> employee = Arrays.asList(new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
It will output:
[Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
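A runnable version of this example (using a record for brevity, Java 16+, so the toString output differs slightly from the output shown above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

public class DistinctById {
    record Employee(int id, String name) {}

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
            new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

        // TreeSet ordered (and deduplicated) by id; the first element with each id wins,
        // because TreeSet.add is a no-op when the comparator finds an equal element.
        List<Employee> unique = employees.stream()
            .collect(collectingAndThen(
                toCollection(() -> new TreeSet<>(comparingInt(Employee::id))),
                ArrayList::new));

        System.out.println(unique); // John (id=1) and Alice (id=2); Bob is dropped
    }
}
```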
Another idea could be to use a wrapper that wraps an employee and bases its equals and hashCode methods on the id:
class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}
Then you wrap each instance, call distinct(), unwrap them and collect the result in a list.
List<Employee> unique = employee.stream()
    .map(WrapperEmployee::new)
    .distinct()
    .map(WrapperEmployee::unwrap)
    .collect(Collectors.toList());
In fact, I think you can make this wrapper generic by providing a function that will do the comparison:
public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        @SuppressWarnings("unchecked")
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(equalityFunction.apply(this.t), that.equalityFunction.apply(that.t));
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}
and the mapping will be:
.map(e -> new Wrapper<>(e, Employee::getId))
The easiest way to do it directly in the list is:
HashSet<Object> seen = new HashSet<>();
employee.removeIf(e -> !seen.add(e.getId()));
removeIf will remove an element if it meets the specified criteria.
Set.add will return false if it did not modify the Set, i.e. the Set already contains the value.
Combining these two, it will remove all elements (employees) whose id has been encountered before.
Of course, it only works if the list supports removal of elements.
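For example (a record-based sketch; note the elements are copied into an ArrayList first, since Arrays.asList and List.of do not support removal):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class RemoveIfDemo {
    record Employee(int id, String name) {}

    public static void main(String[] args) {
        // A mutable list: removeIf would throw on Arrays.asList(...) or List.of(...).
        List<Employee> employees = new ArrayList<>(List.of(
            new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice")));

        HashSet<Object> seen = new HashSet<>();
        // Remove every employee whose id was already seen earlier in the list.
        employees.removeIf(e -> !seen.add(e.id()));

        System.out.println(employees); // John (id=1) and Alice (id=2) remain; Bob is removed
    }
}
```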
If you can make use of equals, then filter the list by using distinct within a stream (see answers above). If you cannot, or don't want to, override the equals method, you can filter the stream in the following way for any property, e.g. for the name property (likewise for id etc.):
Set<String> nameSet = new HashSet<>();
List<Employee> employeesDistinctByName = employees.stream()
    .filter(e -> nameSet.add(e.getName()))
    .collect(Collectors.toList());
Another solution is to use a Predicate, then you can use this in any filter:
public static <T> Predicate<T> distinctBy(Function<? super T, ?> f) {
    Set<Object> objects = ConcurrentHashMap.newKeySet(); // the JDK has no ConcurrentHashSet class
    return t -> objects.add(f.apply(t));
}
Then simply reuse the predicate anywhere:
employees.stream().filter(distinctBy(Employee::getId));
Note: the JavaDoc of filter says it takes a stateless Predicate. This predicate is stateful, but since the backing set is concurrent, in practice it works even if the stream is parallel.
About other solutions:
1) Using .collect(Collectors.toConcurrentMap(..)).values() is a good solution, but it's annoying if you want to sort and keep the order.
2) employee.removeIf(e -> !seen.add(e.getId())) is also a very good solution. But we need to make sure the collection supports removeIf; for example, it will throw an exception if we construct the collection with Arrays.asList(..).
Try this code:
Collection<Employee> nonDuplicatedEmployees = employees.stream()
    .<Map<Integer, Employee>>collect(HashMap::new, (m, e) -> m.put(e.getId(), e), Map::putAll)
    .values();
This worked for me:
list.stream().distinct().collect(Collectors.toList());
You need to override equals (and hashCode), of course.
If order does not matter and when it's more performant to run in parallel, Collect to a Map and then get values:
employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values()
There are a lot of good answers here, but I didn't find one about using the reduce method. So for your case, you can apply it in the following way:
List<Employee> employeeList = employees.stream()
    .reduce(new ArrayList<>(), (List<Employee> accumulator, Employee employee) -> {
        if (accumulator.stream().noneMatch(emp -> emp.getId().equals(employee.getId()))) {
            accumulator.add(employee);
        }
        return accumulator;
    }, (acc1, acc2) -> {
        acc1.addAll(acc2);
        return acc1;
    });
Another simple version:
BiFunction<TreeSet<Employee>, List<Employee>, TreeSet<Employee>> appendTree =
    (tree, list) -> { tree.addAll(list); return tree; };
TreeSet<Employee> outputList =
    appendTree.apply(new TreeSet<>(Comparator.comparing(Employee::getId)), employees);
