Java 8 streams groupby and count multiple properties - java

I have an object Process that has a date and a boolean error indicator. I want to get a count of total processes and a count of processes with errors for each date. So for example Jun 01 will have counts 2, 1; Jun 02 will have 1, 0 and Jun 03 1, 1. The only way I have been able to do this is streaming twice to get the counts. I have tried implementing a custom collector but haven't been successful. Is there an elegant solution instead of my kludgy method?
final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
final List<Process> processes = new ArrayList<>();
processes.add(new Process(sdf.parse("2016-06-01"), false));
processes.add(new Process(sdf.parse("2016-06-01"), true));
processes.add(new Process(sdf.parse("2016-06-02"), false));
processes.add(new Process(sdf.parse("2016-06-03"), true));
System.out.println(processes.stream()
.collect(
Collectors.groupingBy(Process::getDate, Collectors.counting()) ));
System.out.println(processes.stream().filter(order -> order.isHasError())
.collect(
Collectors.groupingBy(Process::getDate, Collectors.counting()) ));
private class Process {
private Date date;
private boolean hasError;
public Process(Date date, boolean hasError) {
this.date = date;
this.hasError = hasError;
}
public Date getDate() {
return date;
}
public boolean isHasError() {
return hasError;
}
}
Code after @glee8e's solution and @Holger's tips:
Collector<Process, Result, Result> ProcessCollector = Collector.of(
Result::new,
(r, p) -> {
r.increment(0);
if (p.isHasError()) {
r.increment(1);
}
}, (r1, r2) -> {
r1.add(0, r2.get(0));
r1.add(1, r2.get(1));
return r1;
});
Map<Date, Result> results = processes.stream().collect(groupingBy(Process::getDate, ProcessCollector));
results.entrySet().stream().sorted(Comparator.comparing(Entry::getKey)).forEach(entry -> System.out
.println(String.format("date = %s, %s", sdf.format(entry.getKey()), entry.getValue())));
private class Result {
private AtomicIntegerArray array = new AtomicIntegerArray(2);
public int get(int index) {
return array.get(index);
}
public void increment(int index) {
array.getAndIncrement(index);
}
public void add(int index, int delta) {
array.addAndGet(index, delta);
}
@Override
public String toString() {
return String.format("totalProcesses = %d, totalErrors = %d", array.get(0), array.get(1));
}
}

It is preferable to add a POJO to hold the result; otherwise the combiner function may look a bit obscure. I declared the POJO as public, but you can change that if you would rather hide it.
public class Result {
public int total, error;
}
Main code:
// Add it somewhere in this file.
private static final Set<Collector.Characteristics> CH_ID = Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
//...
// This is main processing code
processes.stream().collect(groupingBy(Process::getDate, new Collector<Process, Result, Result>() {
@Override
public Supplier<Result> supplier() {
return Result::new;
}
@Override
public BiConsumer<Result, Process> accumulator() {
// The result container comes first, the stream element second.
return (r, p) -> {
r.total++;
if (p.isHasError())
r.error++;
};
}
@Override
public BinaryOperator<Result> combiner() {
return (r1, r2) -> {
r1.total += r2.total;
r1.error += r2.error;
return r1;
};
}
@Override
public Function<Result, Result> finisher() {
return Function.identity();
}
@Override
public Set<Characteristics> characteristics() {
return CH_ID;
}
}));
PS: I assume you have import static java.util.stream.Collectors.*;
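For a shorter variant of the same idea, Collector.of can build the collector from lambdas. The sketch below is only an illustration, not the code from the question: Proc is a hypothetical stand-in for Process that keeps the date as a String, and an int[2] array (total at index 0, errors at index 1) plays the role of Result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class CountsPerDateSketch {

    // Hypothetical stand-in for Process; the date is a plain String here.
    static final class Proc {
        private final String date;
        private final boolean hasError;

        Proc(String date, boolean hasError) {
            this.date = date;
            this.hasError = hasError;
        }

        String getDate() { return date; }
        boolean isHasError() { return hasError; }
    }

    // One pass: counts[0] is the total per date, counts[1] the number with errors.
    static Map<String, int[]> countsByDate(List<Proc> procs) {
        Collector<Proc, int[], int[]> totalAndErrors = Collector.of(
                () -> new int[2],
                (counts, p) -> {
                    counts[0]++;
                    if (p.isHasError()) {
                        counts[1]++;
                    }
                },
                (a, b) -> {
                    a[0] += b[0];
                    a[1] += b[1];
                    return a;
                });
        // TreeMap keeps the dates sorted for printing.
        return procs.stream()
                .collect(Collectors.groupingBy(Proc::getDate, TreeMap::new, totalAndErrors));
    }

    public static void main(String[] args) {
        List<Proc> procs = Arrays.asList(
                new Proc("2016-06-01", false),
                new Proc("2016-06-01", true),
                new Proc("2016-06-02", false),
                new Proc("2016-06-03", true));
        countsByDate(procs).forEach((date, c) ->
                System.out.println(date + ": total = " + c[0] + ", errors = " + c[1]));
    }
}
```

The array keeps the sketch short; a named class such as Result reads better once there are more than two counters.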

Related

Flink: getResult() not getting called in AggregateFunction

I am trying to create a TumblingWindow on a stream of continuous data and create aggregates within the window. But for some reason, the getResult() does not get called.
public class MyAggregator implements AggregateFunction<Event, MyMetrics, MyMetrics> {
@Override
public MyMetrics createAccumulator() {
return new MyMetrics(0L, 0L);
}
@Override
public MyMetrics add(Event value, MyMetrics accumulator) {
Instant previousValue = ...;
if (previousValue != null) {
Long myWay = ...;
accumulator.setMyWay(myWay);
}
return accumulator;
}
@Override
public MyMetrics getResult(MyMetrics accumulator) {
System.out.println("Inside getResult()");
return accumulator;
}
@Override
public MyMetrics merge(MyMetrics acc1, MyMetrics acc2) {
return new MyMetrics(
acc1.getMyWay() + acc2.getMyWay());
}
}
Note: event.getClientTime() returns an Instant object.
private WatermarkStrategy getWatermarkStrategy() {
return WatermarkStrategy
.<MyEvent>forBoundedOutOfOrderness(Duration.ofMinutes(10))
.withTimestampAssigner(
(event, timestamp) ->
event.getClientTime().toEpochMilli()
);
}
public static void main(String[] args) {
DataStream<MyEvent> watermarkedData = actuals
.assignTimestampsAndWatermarks(
getWatermarkStrategy()
).name("addWatermark");
final OutputTag<MyEvent> lateOutputTag = new OutputTag<MyEvent>("late-data"){};
SingleOutputStreamOperator<OutputModel> output_data = watermarkedData
.keyBy("input_key")
.window(TumblingEventTimeWindows.of(Time.hours(1)))
.sideOutputLateData(lateOutputTag)
.aggregate(
new MyAggregator()
).name("AggregationRollUp");
output_data.addSink(new PrintSinkFunction<>());
}
Any pointers as to what I am missing here would be helpful.
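First, check the timestamps of your data to see whether they meet the window's trigger condition. An event-time window only fires once the watermark passes the end of the window, and with a 10-minute out-of-orderness bound the watermark trails the largest timestamp seen so far by 10 minutes.
Second, you can test this by reducing the window size from 1 h to 1 min and the watermark's out-of-orderness bound from 10 min to 30 s.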
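To see why getResult() may never run, it can help to model the arithmetic. The sketch below is plain Java, not Flink API, and all names in it are made up. It assumes, as I understand Flink's bounded-out-of-orderness strategy and its event-time trigger, that the watermark is the largest timestamp seen so far minus the bound minus 1 ms, and that a window fires once the watermark reaches the window's last millisecond.

```java
final class WindowFiringSketch {

    // Start of the tumbling window that contains the timestamp
    // (no window offset, non-negative timestamps assumed).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Assumed watermark rule: largest timestamp seen, minus the bound, minus 1 ms.
    static long watermark(long maxSeenTimestampMs, long boundMs) {
        return maxSeenTimestampMs - boundMs - 1;
    }

    // True once the watermark has reached the last millisecond of the window
    // that contains eventTimestampMs.
    static boolean windowFires(long eventTimestampMs, long maxSeenTimestampMs,
                               long windowSizeMs, long boundMs) {
        long windowEnd = windowStart(eventTimestampMs, windowSizeMs) + windowSizeMs;
        return watermark(maxSeenTimestampMs, boundMs) >= windowEnd - 1;
    }

    public static void main(String[] args) {
        long oneHour = 60 * 60 * 1000L;
        long tenMinutes = 10 * 60 * 1000L;
        // An event at t = 0 sits in the window [0, 1 h).
        // Largest timestamp seen is 1 h + 5 min: not far enough past the window end.
        System.out.println(windowFires(0L, oneHour + 5 * 60 * 1000L, oneHour, tenMinutes));
        // Largest timestamp seen is 1 h + 10 min: the watermark reaches the window end.
        System.out.println(windowFires(0L, oneHour + tenMinutes, oneHour, tenMinutes));
    }
}
```

Under these assumptions the stream has to deliver an event stamped at least 1 h 10 min after the start of a window before that window produces a result, which is why shrinking both values makes the test quicker.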

Passing BiPredicate to Stream for Comparing List of Objects

A person can only complete a list of journeys if the journey timetables do not overlap. For example, this list should return true because the times don't overlap.
Journey 1: "2019-09-10 21:00" --> "2019-09-10 21:10"
Journey 2: "2019-08-11 22:10" --> "2019-08-11 22:20"
Journey 3: "2019-09-10 21:30" --> "2019-09-10 22:00"
I have created a predicate that checks if journey times overlap. I want to use this BiPredicate in a stream. What is the correct approach to this problem?
public class Journey {
public static void main(String[] args) throws Exception {
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("y-M-d H:m");
ArrayList<Route> routes = new ArrayList<>();
// This example should return true because there is no overlap between the routes.
routes.add(new Route(simpleDateFormat.parse("2019-09-10 21:00"), simpleDateFormat.parse("2019-09-10 21:10")));
routes.add(new Route(simpleDateFormat.parse("2019-08-11 22:10"), simpleDateFormat.parse("2019-08-11 22:20")));
routes.add(new Route(simpleDateFormat.parse("2019-09-10 21:30"), simpleDateFormat.parse("2019-09-10 22:00")));
boolean result = travelAllRoutes(routes);
System.out.println(result);
}
public static boolean travelAllRoutes(List<Route> routes) {
BiPredicate<Route, Route> predicate = (r1, r2) -> r1.getEndJourney().before(r2.getStartJourney());
// boolean result = routes.stream(); // use predicate here
return result;
}
}
class Route {
private Date startJourney, endJourney;
public Route(Date startJourney, Date endJourney) {
this.startJourney = startJourney;
this.endJourney = endJourney;
}
public Date getStartJourney() {
return startJourney;
}
public void setStartJourney(Date startJourney) {
this.startJourney = startJourney;
}
public Date getEndJourney() {
return endJourney;
}
public void setEndJourney(Date endJourney) {
this.endJourney = endJourney;
}
}
Don't use a Stream here; it isn't useful in this case. A simple for loop is perfect:
public static boolean travelAllRoutes(List<Route> routes) {
Route lastRoute = null;
routes.sort(Comparator.comparing(Route::getStartJourney));
for (Route r : routes) {
if (lastRoute == null) {
lastRoute = r;
continue;
}
if (lastRoute.getEndJourney().after(r.getStartJourney()))
return false;
lastRoute = r;
}
return true;
}
}
I'd also suggest using the java.time API (LocalDateTime, since these values carry a time of day) instead of the old java.util.Date.
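If you still want the BiPredicate in a stream, one option is to sort a copy of the list and test each adjacent pair by index. This is only a sketch: Leg is a hypothetical stand-in for Route that uses LocalDateTime, and the predicate mirrors the loop above (a journey may start exactly when the previous one ends).

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.IntStream;

final class RouteOverlapSketch {

    // Hypothetical stand-in for Route, using java.time instead of java.util.Date.
    static final class Leg {
        private final LocalDateTime start;
        private final LocalDateTime end;

        Leg(LocalDateTime start, LocalDateTime end) {
            this.start = start;
            this.end = end;
        }

        LocalDateTime getStart() { return start; }
        LocalDateTime getEnd() { return end; }
    }

    static boolean travelAllRoutes(List<Leg> legs) {
        // Sort a copy so the caller's list is left untouched.
        List<Leg> sorted = new ArrayList<>(legs);
        sorted.sort(Comparator.comparing(Leg::getStart));

        // True when the earlier leg has ended by the time the later one starts.
        BiPredicate<Leg, Leg> noOverlap = (a, b) -> !a.getEnd().isAfter(b.getStart());

        // Compare each leg with its successor; an empty range means "no conflicts".
        return IntStream.range(0, sorted.size() - 1)
                .allMatch(i -> noOverlap.test(sorted.get(i), sorted.get(i + 1)));
    }
}
```

Whether this reads better than the loop is a matter of taste; the index-based stream mainly shows where the predicate would plug in.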

Custom Collector for Collectors.groupingBy doesn't work as expected

Consider the simple class Foo:
public class Foo {
public Float v1;
public Float v2;
public String name;
public Foo(String name, Float v1, Float v2) {
this.name = name;
this.v1 = v1;
this.v2 = v2;
}
public String getName() {
return name;
}
}
Now, I have a collection of Foos and I'd like to group them by Foo::getName. I wrote a custom Collector to do that but it doesn't seem to work as expected. More precisely, combiner() never gets called. Why?
public class Main {
public static void main(String[] args) {
List<Foo> foos = new ArrayList<>();
foos.add(new Foo("blue", 2f, 2f));
foos.add(new Foo("blue", 2f, 3f));
foos.add(new Foo("green", 3f, 4f));
Map<String, Float> fooGroups = foos.stream().collect(Collectors.groupingBy(Foo::getName, new FooCollector()));
System.out.println(fooGroups);
}
private static class FooCollector implements Collector<Foo, Float, Float> {
@Override
public Supplier<Float> supplier() {
return () -> new Float(0);
}
@Override
public BiConsumer<Float, Foo> accumulator() {
return (v, foo) -> v += foo.v1 * foo.v2;
}
@Override
public BinaryOperator<Float> combiner() {
return (v1, v2) -> v1 + v2;
}
@Override
public Function<Float, Float> finisher() {
return Function.identity();
}
@Override
public Set<Characteristics> characteristics() {
Set<Characteristics> characteristics = new TreeSet<>();
return characteristics;
}
}
}
First, the combiner function does not need to get called if you aren't using multiple threads (parallel stream). The combiner gets called to combine the results of the operation on chunks of your stream. There is no parallelism here so the combiner doesn't need to be called.
You are getting zero values because of your accumulator function. The expression
v += foo.v1 * foo.v2;
will replace v with a new Float object. The original accumulator object is not modified; it is still 0f. Besides, Float, like other numeric wrapper types (and String) is immutable and cannot be changed.
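A minimal demonstration of that effect, outside of any collector (the class and method names are made up for the illustration):

```java
import java.util.function.BiConsumer;

final class ReassignDemo {

    // Returns the caller's value after running an accumulator written like the one above.
    static float valueAfterAccumulate() {
        Float acc = 0f;
        // "v += x" only rebinds the lambda's own parameter v to a new Float.
        BiConsumer<Float, Float> accumulator = (v, x) -> v += x;
        accumulator.accept(acc, 6f);
        return acc; // still 0.0f: the caller's reference never changed
    }

    public static void main(String[] args) {
        System.out.println(valueAfterAccumulate());
    }
}
```

The collector holds on to the object returned by the supplier, so any "update" that produces a new object is simply lost.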
You need some other kind of accumulator object that is mutable.
class FloatAcc {
private Float total;
public FloatAcc(Float initial) {
total = initial;
}
public void accumulate(Float item) {
total += item;
}
public Float get() {
return total;
}
}
Then you can modify your custom Collector to use FloatAcc. Supply a new FloatAcc, call accumulate in the accumulator function, etc.
class FooCollector implements Collector<Foo, FloatAcc, Float> {
@Override
public Supplier<FloatAcc> supplier() {
return () -> new FloatAcc(0f);
}
@Override
public BiConsumer<FloatAcc, Foo> accumulator() {
return (v, foo) -> v.accumulate(foo.v1 * foo.v2);
}
@Override
public BinaryOperator<FloatAcc> combiner() {
return (v1, v2) -> {
v1.accumulate(v2.get());
return v1;
};
}
@Override
public Function<FloatAcc, Float> finisher() {
return FloatAcc::get;
}
@Override
public Set<Characteristics> characteristics() {
Set<Characteristics> characteristics = new TreeSet<>();
return characteristics;
}
}
With these changes I get what you're expecting:
{green=12.0, blue=10.0}
You have an explanation as to why the current collector does not work from rgettman.
It is worth checking to see what helper methods exist to create custom collectors. For example, this entire collector can be defined far more concisely as:
reducing(0.f, v -> v.v1 * v.v2, (a, b) -> a + b)
It is not always possible to use methods like these; but the conciseness (and, presumably, the well-testedness) should make them the first choice when possible.
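Plugged into groupingBy, the reducing variant might look like the sketch below. Foo is repeated so that the example compiles on its own; ReducingSketch and sumOfProductsByName are names made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ReducingSketch {

    // Same shape as the Foo class from the question.
    static final class Foo {
        final Float v1;
        final Float v2;
        final String name;

        Foo(String name, Float v1, Float v2) {
            this.name = name;
            this.v1 = v1;
            this.v2 = v2;
        }

        String getName() { return name; }
    }

    // Groups by name and sums v1 * v2 within each group.
    static Map<String, Float> sumOfProductsByName(List<Foo> foos) {
        return foos.stream().collect(Collectors.groupingBy(
                Foo::getName,
                Collectors.reducing(0f, foo -> foo.v1 * foo.v2, Float::sum)));
    }

    public static void main(String[] args) {
        List<Foo> foos = Arrays.asList(
                new Foo("blue", 2f, 2f),
                new Foo("blue", 2f, 3f),
                new Foo("green", 3f, 4f));
        System.out.println(sumOfProductsByName(foos));
    }
}
```

reducing works with immutable values because it combines them with a function that returns the new value, rather than expecting the container to be mutated in place.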

How to calculate the processing time in rx

For the following flow, I am wondering how I can calculate the time it takes to process all the data in forEach(...).
Observable
.from(1,2,3)
.flatMap(it -> {})
.toBlocking()
.forEach(it -> {/* some parsing logic here */});
EDIT
After reading this tutorial, "Leaving the Monad", I feel the simple solution would be the following. Let me know if I missed something.
List items = Observable
.from(1,2,3)
.flatMap(it -> {})
.toList()
.toBlocking()
.single();
long startTime = System.currentTimeMillis();
for(Object it : items)
{
//some parsing here
}
long processingTime = System.currentTimeMillis() - startTime;
One option is to create an Observable which will output the timings. You can do this by wrapping your computation with Observable#using:
public class TimerExample {
public static void main(String[] args) {
final PublishSubject<Long> timings = PublishSubject.create();
final Observable<List<Integer>> list = Observable
.just(1, 2, 3)
.flatMap(TimerExample::longRunningComputation)
.toList();
final Observable<List<Integer>> timed
= Observable.using(() -> new Timer(timings), (t) -> list, Timer::time);
timings.subscribe(time -> System.out.println("Time: " + time + "ms"));
List<Integer> ints = timed.toBlocking().last();
System.out.println("ints: " + Joiner.on(", ").join(ints));
ints = timed.toBlocking().last();
System.out.println("ints: " + Joiner.on(", ").join(ints));
}
private static Observable<Integer> longRunningComputation(Integer i) {
return Observable.timer(1, TimeUnit.SECONDS).map(ignored -> i);
}
public static class Timer {
private final long startTime;
private final Observer<Long> timings;
public Timer(Observer<Long> timings) {
this.startTime = System.currentTimeMillis();
this.timings = timings;
}
public void time() {
timings.onNext(System.currentTimeMillis() - startTime);
}
}
}
In this case the timings are printed to the console, but you can do with them as you please:
Time: 1089ms
ints: 2, 1, 3
Time: 1003ms
ints: 1, 3, 2
I think this is what you want. Starting from your code, I split the production of values, Observable.range (which should match the Observable.just in your sample), from the pipeline to measure; in this case I added some fake computation.
The idea is to wrap the pipeline you want to measure in a flatMap and attach a stopwatch to it inside that single flatMap.
Observable.range(1, 10_000)
.nest()
.flatMap(
o -> {
Observable<Integer> pipelineToMeasure = o.flatMap(i -> {
Random random = new Random(73);
try {
TimeUnit.MILLISECONDS.sleep(random.nextInt(5));
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
return Observable.just(i);
});
Stopwatch measure = Stopwatch.createUnstarted();
return pipelineToMeasure
.doOnSubscribe(measure::start)
.doOnTerminate(() -> {
measure.stop();
System.out.println(measure);
});
}
)
.toBlocking()
.forEach(System.out::println);
To avoid confusion: I used nest so that I don't have to recreate the Observable myself in the outer flatMap.
I'm also using the Stopwatch from the Guava library.
To give more information, here's one possible way to measure in the forEach statement when blocking:
MeasurableAction1<Integer> measuring = MeasurableAction1.measure(System.out::println);
Observable
.just(1, 2, 3)
.flatMap(Observable::just)
.toBlocking()
.forEach(measuring.start());
measuring.stop().elapsed(TimeUnit.SECONDS);
And the measuring class:
private static class MeasurableAction1<T> implements Action1<T> {
private Stopwatch measure = Stopwatch.createUnstarted();
private Action1<? super T> action;
public MeasurableAction1(Action1<? super T> action) {
this.action = action;
}
@Override
public void call(T t) {
action.call(t);
}
public MeasurableAction1<T> start() {
measure.start();
return this;
}
public MeasurableAction1<T> stop() {
measure.stop();
return this;
}
public long elapsed(TimeUnit desiredUnit) {
return measure.elapsed(desiredUnit);
}
public static <T> MeasurableAction1<T> measure(Action1<? super T> action) {
return new MeasurableAction1<>(action);
}
}
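The same wrapping idea can be tried without RxJava or Guava. The sketch below is a different technique from the class above, named plainly: TimedConsumer (a made-up name) wraps a java.util.function.Consumer and adds up only the time spent inside the wrapped action, using System.nanoTime(), instead of measuring the wall-clock span between start() and stop().

```java
import java.util.Arrays;
import java.util.function.Consumer;

final class TimedConsumer<T> implements Consumer<T> {

    private final Consumer<? super T> delegate;
    private long elapsedNanos;
    private int calls;

    TimedConsumer(Consumer<? super T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(T item) {
        long start = System.nanoTime();
        try {
            delegate.accept(item);
        } finally {
            // Count the call even if the delegate throws.
            elapsedNanos += System.nanoTime() - start;
            calls++;
        }
    }

    long elapsedNanos() { return elapsedNanos; }

    int calls() { return calls; }

    public static void main(String[] args) {
        TimedConsumer<Integer> timed = new TimedConsumer<>(i -> System.out.println(i));
        Arrays.asList(1, 2, 3).forEach(timed);
        System.out.println(timed.calls() + " calls took " + timed.elapsedNanos() + " ns");
    }
}
```

This version is not thread-safe; it assumes the items are delivered from a single thread, as they are in a blocking forEach.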
Better still, do it without blocking, using a subscriber. Note that .subscribe offers more options than the .forEach alias (whether blocking or not):
Observable
.just(1, 2, 3)
.flatMap(Observable::just)
.subscribe(MeasuringSubscriber.measuringSubscriber(
System.out::println,
System.out::println,
System.out::println
));
And the subscriber:
private static class MeasuringSubscriber<T> extends Subscriber<T> {
private Stopwatch measure = Stopwatch.createUnstarted();
private Action1<? super T> onNext;
private final Action1<Throwable> onError;
private final Action0 onComplete;
public MeasuringSubscriber(Action1<? super T> onNext, Action1<Throwable> onError, Action0 onComplete) {
this.onNext = onNext;
this.onError = onError;
this.onComplete = onComplete;
}
@Override
public void onCompleted() {
try {
onComplete.call();
} finally {
stopAndPrintMeasure();
}
}
@Override
public void onError(Throwable e) {
try {
onError.call(e);
} finally {
stopAndPrintMeasure();
}
}
@Override
public void onNext(T item) {
onNext.call(item);
}
@Override
public void onStart() {
measure.start();
super.onStart();
}
private void stopAndPrintMeasure() {
measure.stop();
System.out.println("took " + measure);
}
private static <T> MeasuringSubscriber<T> measuringSubscriber(final Action1<? super T> onNext, final Action1<Throwable> onError, final Action0 onComplete) {
return new MeasuringSubscriber<>(onNext, onError, onComplete);
}
}

How can I combine the results of two separate WS calls in Play framework?

I have a controller method that sends two web service requests at the same time; each call immediately returns a promise. What I want to do now is combine the results of the two web service calls into a single result returned to the user. The code I have so far is:
public static Promise<Result> search(String searchTerms) {
final Promise<List<SearchResult>> result1 = webserviceOne(searchTerms);
final Promise<List<SearchResult>> result2 = webserviceTwo(searchTerms);
return result1.flatMap(
new Function<Promise<List<SearchResult>>, Promise<Result>>() {
public Promise<Result> apply(Promise<List<SearchResult>> res1) {
return result2.flatMap(
new Function<Promise<List<SearchResult>>, Result>() {
public Result apply(Promise<List<SearchResult>> res2) {
//TODO: Here I want to combine the two lists of results and return a JSON response
}
}
);
}
}
);
}
How do I do this? I'm finding it really hard to find decent documentation for this sort of thing.
Something like this should do it:
public static Promise<Result> search(String searchTerms) {
final Promise<List<SearchResult>> result1 = webserviceOne(searchTerms);
final Promise<List<SearchResult>> result2 = webserviceTwo(searchTerms);
return result1.flatMap(
new Function<List<SearchResult>, Promise<Result>>() {
public Promise<Result> apply(final List<SearchResult> res1) {
// The inner call only transforms a value, so map is enough here.
return result2.map(
new Function<List<SearchResult>, Result>() {
public Result apply(List<SearchResult> res2) {
List<SearchResult> newList = new ArrayList<SearchResult>(res1);
newList.addAll(res2);
return ok(toJson(newList));
}
}
);
}
}
);
}
@Override
public Zone buildZone(final GeoPoint center, final int distance) {
Promise<List<Street>> streetPromise = Promise.promise(
new Function0<List<Street>>() {
public List<Street> apply() {
return streetRepository.findByLocation(center.getGeom(), distance);
}
}
);
Promise<List<Place>> placePromise = Promise.promise(
new Function0<List<Place>>() {
public List<Place> apply() {
return placeService.findByLocation(center, distance);
}
}
);
Promise<Zone> result = Promise.sequence(streetPromise, placePromise).map(
new Function<List<List<? extends Object>>, Zone>() {
@Override
public Zone apply(List<List<? extends Object>> lists) throws Throwable {
return new Zone((List<Street>) lists.get(0), (List<Place>) lists.get(1));
}
}
);
return result.get(10, TimeUnit.SECONDS);
}
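For comparison, the same "wait for both, then merge" step can be written with the JDK's CompletableFuture. This is a swapped-in API, not Play's Promise, and the class and method names are made up for the sketch; plain strings stand in for SearchResult.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class CombineFuturesSketch {

    // Completes when both inputs have completed, with the two lists concatenated.
    static CompletableFuture<List<String>> combine(
            CompletableFuture<List<String>> first,
            CompletableFuture<List<String>> second) {
        return first.thenCombine(second, (a, b) -> {
            List<String> merged = new ArrayList<>(a);
            merged.addAll(b);
            return merged;
        });
    }

    public static void main(String[] args) {
        // Both "calls" start right away and run independently of each other.
        CompletableFuture<List<String>> one =
                CompletableFuture.supplyAsync(() -> Arrays.asList("a", "b"));
        CompletableFuture<List<String>> two =
                CompletableFuture.supplyAsync(() -> Arrays.asList("c"));
        System.out.println(combine(one, two).join());
    }
}
```

As in the Promise versions above, both requests are already in flight before they are combined, so the total wait is roughly that of the slower call.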
