Flink: getResult() not getting called in AggregateFunction

I am trying to create a TumblingWindow on a stream of continuous data and create aggregates within the window. But for some reason, the getResult() does not get called.
public class MyAggregator implements AggregateFunction<Event, MyMetrics, MyMetrics> {
@Override
public MyMetrics createAccumulator() {
return new MyMetrics(0L, 0L);
}
@Override
public MyMetrics add(Event value, MyMetrics accumulator) {
Instant previousValue = ...;
if (previousValue != null) {
Long myWay = ...;
accumulator.setMyWay(myWay);
}
return accumulator;
}
@Override
public MyMetrics getResult(MyMetrics accumulator) {
System.out.println("Inside getResult()");
return accumulator;
}
@Override
public MyMetrics merge(MyMetrics acc1, MyMetrics acc2) {
return new MyMetrics(
acc1.getMyWay() + acc2.getMyWay());
}
}
Note: event.getClientTime() returns an Instant object.
private static WatermarkStrategy<MyEvent> getWatermarkStrategy() {
return WatermarkStrategy
.<MyEvent>forBoundedOutOfOrderness(Duration.ofMinutes(10))
.withTimestampAssigner(
(event, timestamp) ->
event.getClientTime().toEpochMilli()
);
}
public static void main(String[] args) {
DataStream<MyEvent> watermarkedData = actuals
.assignTimestampsAndWatermarks(
getWatermarkStrategy()
).name("addWatermark");
final OutputTag<MyEvent> lateOutputTag = new OutputTag<MyEvent>("late-data"){};
SingleOutputStreamOperator<OutputModel> output_data = watermarkedData
.keyBy("input_key")
.window(TumblingEventTimeWindows.of(Time.hours(1)))
.sideOutputLateData(lateOutputTag)
.aggregate(
new MyAggregator()
).name("AggregationRollUp");
output_data.addSink(new PrintSinkFunction<>());
}
Any pointers as to what I am missing here would be helpful.

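First, check the timestamps of your data to see whether they meet the window trigger conditions. An event-time window only fires, and getResult() is only called, once the watermark passes the end of the window; with a 1h window and a 10min out-of-orderness bound, that requires an event whose timestamp is at least 10 minutes past the end of the window.
Second, as a test, try reducing the window size from 1h to 1min and the watermark bound from 10min to 30s.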
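To make the trigger condition concrete, here is a minimal plain-Java sketch (no Flink dependency) of when a tumbling event-time window would fire. It assumes the usual behaviour of a bounded-out-of-orderness watermark (largest timestamp seen, minus the bound, minus 1 ms) and of the default event-time trigger (fire once the watermark reaches the window's last millisecond). The class and method names are made up for illustration and are not part of the Flink API.

```java
import java.time.Duration;

// Simplified model of event-time window firing; this is not Flink code.
class WindowFiringSketch {

    // Start of the epoch-aligned tumbling window containing a non-negative timestamp.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Assumed watermark formula for a bounded-out-of-orderness strategy.
    static long watermark(long maxSeenTimestampMs, long boundMs) {
        return maxSeenTimestampMs - boundMs - 1;
    }

    // The window fires once the watermark reaches its last millisecond.
    static boolean fires(long windowStartMs, long windowSizeMs, long watermarkMs) {
        return watermarkMs >= windowStartMs + windowSizeMs - 1;
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        long bound = Duration.ofMinutes(10).toMillis();

        // First event at minute 5 falls into the window [0 min, 60 min).
        long start = windowStart(Duration.ofMinutes(5).toMillis(), hour);

        // Later events that push the watermark forward.
        long[] laterEvents = {
            Duration.ofMinutes(59).toMillis(),
            Duration.ofMinutes(65).toMillis(),
            Duration.ofMinutes(70).toMillis()
        };
        for (long ts : laterEvents) {
            boolean fired = fires(start, hour, watermark(ts, bound));
            System.out.println("max timestamp " + Duration.ofMillis(ts).toMinutes()
                + " min -> window fires: " + fired);
        }
    }
}
```

Under these assumptions only the event at minute 70 makes the window fire. If the source stops producing events before that point, or one parallel input stays idle so the overall watermark cannot advance, the aggregate result is never emitted.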

Related

Passing BiPredicate to Stream for Comparing List of Objects

A person can only complete a list of journeys if the journey timetables do not overlap. For example, this list should return true because the times don't overlap.
Journey 1: "2019-09-10 21:00" --> "2019-09-10 21:10"
Journey 2: "2019-08-11 22:10" --> "2019-08-11 22:20"
Journey 3: "2019-09-10 21:30" --> "2019-09-10 22:00"
I have created a predicate that checks if journey times overlap. I want to use this BiPredicate in a stream. What is the correct approach to this problem?
public class Journey {
public static void main(String[] args) throws Exception {
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("y-M-d H:m");
ArrayList<Route> routes = new ArrayList<>();
// This example should return true because there is no overlap between the routes.
routes.add(new Route(simpleDateFormat.parse("2019-09-10 21:00"), simpleDateFormat.parse("2019-09-10 21:10")));
routes.add(new Route(simpleDateFormat.parse("2019-08-11 22:10"), simpleDateFormat.parse("2019-08-11 22:20")));
routes.add(new Route(simpleDateFormat.parse("2019-09-10 21:30"), simpleDateFormat.parse("2019-09-10 22:00")));
boolean result = travelAllRoutes(routes);
System.out.println(result);
}
public static boolean travelAllRoutes(List<Route> routes) {
BiPredicate<Route, Route> predicate = (r1, r2) -> r1.getEndJourney().before(r2.getStartJourney());
// boolean result = routes.stream(); // use predicate here
return result;
}
}
class Route {
private Date startJourney, endJourney;
public Route(Date startJourney, Date endJourney) {
this.startJourney = startJourney;
this.endJourney = endJourney;
}
public Date getStartJourney() {
return startJourney;
}
public void setStartJourney(Date startJourney) {
this.startJourney = startJourney;
}
public Date getEndJourney() {
return endJourney;
}
public void setEndJourney(Date endJourney) {
this.endJourney = endJourney;
}
}

Don't use a Stream; it isn't useful here. A simple for loop is perfect:
public static boolean travelAllRoutes(List<Route> routes) {
Route lastRoute = null;
routes.sort(Comparator.comparing(Route::getStartJourney));
for (Route r : routes) {
if (lastRoute == null) {
lastRoute = r;
continue;
}
if (lastRoute.getEndJourney().after(r.getStartJourney()))
return false;
lastRoute = r;
}
return true;
}
I'd also suggest using java.time.LocalDateTime (the values carry a time of day, so LocalDate alone would not be enough) instead of the old java.util.Date.
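If you still want to apply the BiPredicate inside a stream, as the question asks, one option is to sort by start time and test each pair of neighbours with an IntStream over the indices. The following is only a sketch: Trip is a hypothetical stand-in for Route that uses java.time.LocalDateTime, and the helper names are invented for the example.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class JourneyOverlapSketch {

    // Hypothetical replacement for Route, based on java.time.
    static final class Trip {
        final LocalDateTime start;
        final LocalDateTime end;

        Trip(LocalDateTime start, LocalDateTime end) {
            this.start = start;
            this.end = end;
        }
    }

    // Convenience factory parsing ISO-8601 strings such as "2019-09-10T21:00".
    static Trip trip(String start, String end) {
        return new Trip(LocalDateTime.parse(start), LocalDateTime.parse(end));
    }

    // True when the earlier trip ends after the later one starts.
    static final BiPredicate<Trip, Trip> OVERLAPS =
        (earlier, later) -> earlier.end.isAfter(later.start);

    static boolean canTravelAll(List<Trip> trips) {
        // Sort a copy so the caller's list is left untouched.
        List<Trip> sorted = trips.stream()
            .sorted(Comparator.comparing((Trip t) -> t.start))
            .collect(Collectors.toList());
        // Once sorted by start time, only neighbouring trips can be the first to overlap.
        return IntStream.range(1, sorted.size())
            .noneMatch(i -> OVERLAPS.test(sorted.get(i - 1), sorted.get(i)));
    }

    public static void main(String[] args) {
        List<Trip> fine = Arrays.asList(
            trip("2019-09-10T21:00", "2019-09-10T21:10"),
            trip("2019-08-11T22:10", "2019-08-11T22:20"),
            trip("2019-09-10T21:30", "2019-09-10T22:00"));
        List<Trip> clashing = Arrays.asList(
            trip("2019-09-10T21:00", "2019-09-10T21:10"),
            trip("2019-09-10T21:05", "2019-09-10T21:20"));
        System.out.println(canTravelAll(fine));      // true
        System.out.println(canTravelAll(clashing));  // false
    }
}
```

The loop version above is arguably easier to read; the stream version mainly avoids mutating the input list and keeps the overlap rule in a reusable predicate.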

Java 8 streams groupby and count multiple properties

I have an object Process that has a date and a boolean error indicator. I want to get a count of total processes and a count of processes with errors for each date. So for example Jun 01 will have counts 2, 1; Jun 02 will have 1, 0 and Jun 03 1, 1. The only way I have been able to do this is streaming twice to get the counts. I have tried implementing a custom collector but haven't been successful. Is there an elegant solution instead of my kludgy method?
final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
final List<Process> processes = new ArrayList<>();
processes.add(new Process(sdf.parse("2016-06-01"), false));
processes.add(new Process(sdf.parse("2016-06-01"), true));
processes.add(new Process(sdf.parse("2016-06-02"), false));
processes.add(new Process(sdf.parse("2016-06-03"), true));
System.out.println(processes.stream()
.collect(
Collectors.groupingBy(Process::getDate, Collectors.counting()) ));
System.out.println(processes.stream().filter(order -> order.isHasError())
.collect(
Collectors.groupingBy(Process::getDate, Collectors.counting()) ));
private class Process {
private Date date;
private boolean hasError;
public Process(Date date, boolean hasError) {
this.date = date;
this.hasError = hasError;
}
public Date getDate() {
return date;
}
public boolean isHasError() {
return hasError;
}
}
Code after @glee8e's solution and @Holger's tips
Collector<Process, Result, Result> ProcessCollector = Collector.of(
Result::new,
(r, p) -> {
r.increment(0);
if (p.isHasError()) {
r.increment(1);
}
}, (r1, r2) -> {
r1.add(0, r2.get(0));
r1.add(1, r2.get(1));
return r1;
});
Map<Date, Result> results = processes.stream().collect(groupingBy(Process::getDate, ProcessCollector));
results.entrySet().stream().sorted(Comparator.comparing(Entry::getKey)).forEach(entry -> System.out
.println(String.format("date = %s, %s", sdf.format(entry.getKey()), entry.getValue())));
private class Result {
private AtomicIntegerArray array = new AtomicIntegerArray(2);
public int get(int index) {
return array.get(index);
}
public void increment(int index) {
array.getAndIncrement(index);
}
public void add(int index, int delta) {
array.addAndGet(index, delta);
}
@Override
public String toString() {
return String.format("totalProcesses = %d, totalErrors = %d", array.get(0), array.get(1));
}
}

It is preferable to add a POJO to store the result; otherwise the combiner function may look a bit obscure. I declared the POJO as public, but you can change that if you would rather hide it.
public class Result {
public int total, error;
}
Main code:
// Add it somewhere in this file.
private static final Set<Collector.Characteristics> CH_ID = Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
//...
// This is main processing code
Map<Date, Result> results = processes.stream().collect(groupingBy(Process::getDate, new Collector<Process, Result, Result>() {
@Override
public Supplier<Result> supplier() {
return Result::new;
}
@Override
public BiConsumer<Result, Process> accumulator() {
// The accumulator receives the container first, then the stream element.
return (r, p) -> {
r.total++;
if (p.isHasError())
r.error++;
};
}
@Override
public BinaryOperator<Result> combiner() {
return (r1, r2) -> {
r1.total += r2.total;
r1.error += r2.error;
return r1;
};
}
@Override
public Function<Result, Result> finisher() {
return Function.identity();
}
@Override
public Set<Characteristics> characteristics() {
return CH_ID;
}
}));
PS: I assume you have import static java.util.stream.Collectors.*;
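For comparison, the same single-pass idea can be written more compactly with Collector.of and a two-element array as the mutable container. This is a self-contained sketch under a few assumptions: Proc is a hypothetical simplified version of Process keyed by java.time.LocalDate, and a TreeMap is used only so that the dates come out sorted.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ErrorCountSketch {

    // Hypothetical simplified Process.
    static final class Proc {
        final LocalDate date;
        final boolean hasError;

        Proc(LocalDate date, boolean hasError) {
            this.date = date;
            this.hasError = hasError;
        }
    }

    // Accumulates {total, errors} in a long[2] in a single pass.
    static final Collector<Proc, long[], long[]> TOTAL_AND_ERRORS = Collector.of(
        () -> new long[2],
        (acc, p) -> {
            acc[0]++;
            if (p.hasError) {
                acc[1]++;
            }
        },
        (a, b) -> {
            a[0] += b[0];
            a[1] += b[1];
            return a;
        });

    static Map<LocalDate, long[]> countByDate(List<Proc> procs) {
        return procs.stream()
            .collect(Collectors.groupingBy(p -> p.date, TreeMap::new, TOTAL_AND_ERRORS));
    }

    public static void main(String[] args) {
        List<Proc> procs = Arrays.asList(
            new Proc(LocalDate.of(2016, 6, 1), false),
            new Proc(LocalDate.of(2016, 6, 1), true),
            new Proc(LocalDate.of(2016, 6, 2), false),
            new Proc(LocalDate.of(2016, 6, 3), true));
        countByDate(procs).forEach((date, counts) ->
            System.out.println(date + ": total=" + counts[0] + ", errors=" + counts[1]));
    }
}
```

With the sample data from the question this prints 2/1 for June 1, 1/0 for June 2 and 1/1 for June 3. A named result class, as in the answer above, reads better than a bare array once more than two counters are involved.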

How to calculate the processing time in rx

For the following flow, I am wondering how I can calculate the time it takes to process all the data in forEach(...).
Observable
.from(1,2,3)
.flatMap(it -> {})
.toBlocking()
.forEach(it -> {/* some parsing logic here */});
EDIT
After reading this tutorial: Leaving the Monad, I feel the simple solution would be to do the following. Let me know if I missed something
List items = Observable
.from(1,2,3)
.flatMap(it -> {})
.toList()
.toBlocking()
.single();
long startTime = System.currentTimeMillis();
for(Object it : items)
{
//some parsing here
}
long processingTime = System.currentTimeMillis() - startTime;
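The idea in the edit (materialise the items first, then time only the loop) does not depend on RxJava, so it can be illustrated with a small library-free sketch. The helper name timeForEach is made up for this example, and System.nanoTime() is used instead of currentTimeMillis() because it is intended for measuring elapsed time.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class LoopTimingSketch {

    // Runs the action over every item and returns the elapsed time in nanoseconds.
    static <T> long timeForEach(List<T> items, Consumer<? super T> action) {
        long start = System.nanoTime();
        for (T item : items) {
            action.accept(item);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in for the list obtained from the Observable.
        List<Integer> items = IntStream.rangeClosed(1, 3).boxed().collect(Collectors.toList());

        long nanos = timeForEach(items, it -> {
            try {
                Thread.sleep(20); // placeholder for the real parsing work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("processing took " + TimeUnit.NANOSECONDS.toMillis(nanos) + " ms");
    }
}
```

Note that this measures only the processing loop, not the time the Observable needs to emit the items.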

One option is to create an Observable which will output the timings. You can do this by wrapping your computation with Observable#using:
public class TimerExample {
public static void main(String[] args) {
final PublishSubject<Long> timings = PublishSubject.create();
final Observable<List<Integer>> list = Observable
.just(1, 2, 3)
.flatMap(TimerExample::longRunningComputation)
.toList();
final Observable<List<Integer>> timed
= Observable.using(() -> new Timer(timings), (t) -> list, Timer::time);
timings.subscribe(time -> System.out.println("Time: " + time + "ms"));
List<Integer> ints = timed.toBlocking().last();
System.out.println("ints: " + Joiner.on(", ").join(ints));
ints = timed.toBlocking().last();
System.out.println("ints: " + Joiner.on(", ").join(ints));
}
private static Observable<Integer> longRunningComputation(Integer i) {
return Observable.timer(1, TimeUnit.SECONDS).map(ignored -> i);
}
public static class Timer {
private final long startTime;
private final Observer<Long> timings;
public Timer(Observer<Long> timings) {
this.startTime = System.currentTimeMillis();
this.timings = timings;
}
public void time() {
timings.onNext(System.currentTimeMillis() - startTime);
}
}
}
In this case the timings are printed to the console, but you can do with them as you please:
Time: 1089ms
ints: 2, 1, 3
Time: 1003ms
ints: 1, 3, 2

I think this is what you want. Starting from your code, I split the production of values, Observable.range (which stands in for the Observable.just in your sample), from the pipeline to measure; in this case I added some fake computation.
The idea is to wrap the pipeline you want to measure in a flatMap and attach a stopwatch inside that single flatMap.
Observable.range(1, 10_000)
.nest()
.flatMap(
o -> {
Observable<Integer> pipelineToMeasure = o.flatMap(i -> {
Random random = new Random(73);
try {
TimeUnit.MILLISECONDS.sleep(random.nextInt(5));
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
return Observable.just(i);
});
Stopwatch measure = Stopwatch.createUnstarted();
return pipelineToMeasure
.doOnSubscribe(measure::start)
.doOnTerminate(() -> {
measure.stop();
System.out.println(measure);
});
}
)
.toBlocking()
.forEach(System.out::println);
To avoid confusion, I used nest so that I don't have to recreate the Observable myself in the outer flatMap.
I'm also using the Stopwatch from the Guava library.
To give more information, here is one possible way to measure inside the forEach statement when blocking.
MeasurableAction1<Integer> measuring = MeasurableAction1.measure(System.out::println);
Observable
.just(1, 2, 3)
.flatMap(Observable::just)
.toBlocking()
.forEach(measuring.start());
measuring.stop().elapsed(TimeUnit.SECONDS);
And the measuring class:
private static class MeasurableAction1<T> implements Action1<T> {
private Stopwatch measure = Stopwatch.createUnstarted();
private Action1<? super T> action;
public MeasurableAction1(Action1<? super T> action) {
this.action = action;
}
@Override
public void call(T t) {
action.call(t);
}
public MeasurableAction1<T> start() {
measure.start();
return this;
}
public MeasurableAction1<T> stop() {
measure.stop();
return this;
}
public long elapsed(TimeUnit desiredUnit) {
return measure.elapsed(desiredUnit);
}
public static <T> MeasurableAction1<T> measure(Action1<? super T> action) {
return new MeasurableAction1<>(action);
}
}
Better still, measure without blocking by using a subscriber. Note that .subscribe offers more options than the .forEach alias (whether blocking or not):
Observable
.just(1, 2, 3)
.flatMap(Observable::just)
.subscribe(MeasuringSubscriber.measuringSubscriber(
System.out::println,
System.out::println,
System.out::println
));
And the subscriber:
private static class MeasuringSubscriber<T> extends Subscriber<T> {
private Stopwatch measure = Stopwatch.createUnstarted();
private Action1<? super T> onNext;
private final Action1<Throwable> onError;
private final Action0 onComplete;
public MeasuringSubscriber(Action1<? super T> onNext, Action1<Throwable> onError, Action0 onComplete) {
this.onNext = onNext;
this.onError = onError;
this.onComplete = onComplete;
}
@Override
public void onCompleted() {
try {
onComplete.call();
} finally {
stopAndPrintMeasure();
}
}
@Override
public void onError(Throwable e) {
try {
onError.call(e);
} finally {
stopAndPrintMeasure();
}
}
@Override
public void onNext(T item) {
onNext.call(item);
}
@Override
public void onStart() {
measure.start();
super.onStart();
}
private void stopAndPrintMeasure() {
measure.stop();
System.out.println("took " + measure);
}
private static <T> MeasuringSubscriber<T> measuringSubscriber(final Action1<? super T> onNext, final Action1<Throwable> onError, final Action0 onComplete) {
return new MeasuringSubscriber<>(onNext, onError, onComplete);
}
}

How can I combine the results of two separate WS calls in Play framework?

I have a controller method which sends two web service requests at the same time and immediately gets back a promise for each of them. Now I want to combine the results of the two web service calls into a single result returned to the user. The code I have so far is:
public static Promise<Result> search(String searchTerms) {
final Promise<List<SearchResult>> result1 = webserviceOne(searchTerms);
final Promise<List<SearchResult>> result2 = webserviceTwo(searchTerms);
return result1.flatMap(
new Function<Promise<List<SearchResult>>, Promise<Result>>() {
public Promise<Result> apply(Promise<List<SearchResult>> res1) {
return result2.flatMap(
new Function<Promise<List<SearchResult>>, Result>() {
public Result apply(Promise<List<SearchResult>> res2) {
//TODO: Here I want to combine the two lists of results and return a JSON response
}
}
);
}
}
);
}
How do I do this? I'm finding it really hard to find decent documentation for this sort of thing.

Something like this should do it:
public static Promise<Result> search(String searchTerms) {
final Promise<List<SearchResult>> result1 = webserviceOne(searchTerms);
final Promise<List<SearchResult>> result2 = webserviceTwo(searchTerms);
// flatMap receives the resolved list, not the promise itself.
return result1.flatMap(
new Function<List<SearchResult>, Promise<Result>>() {
public Promise<Result> apply(final List<SearchResult> res1) {
// The inner step returns a plain Result, so map is used instead of flatMap.
return result2.map(
new Function<List<SearchResult>, Result>() {
public Result apply(List<SearchResult> res2) {
List<SearchResult> newList = new ArrayList<SearchResult>(res1);
newList.addAll(res2);
return ok(toJson(newList));
}
}
);
}
}
);
}

@Override
public Zone buildZone(final GeoPoint center, final int distance) {
Promise<List<Street>> streetPromise = Promise.promise(
new Function0<List<Street>>() {
public List<Street> apply() {
return streetRepository.findByLocation(center.getGeom(), distance);
}
}
);
Promise<List<Place>> placePromise = Promise.promise(
new Function0<List<Place>>() {
public List<Place> apply() {
return placeService.findByLocation(center, distance);
}
}
);
Promise<Zone> result = Promise.sequence(streetPromise, placePromise).map(
new Function<List<List<? extends Object>>, Zone>() {
@Override
public Zone apply(List<List<? extends Object>> lists) throws Throwable {
return new Zone((List<Street>) lists.get(0), (List<Place>) lists.get(1));
}
}
);
return result.get(10, TimeUnit.SECONDS);
}
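The answers above use the F.Promise API of older Play releases. As a library-free illustration of the same "start both calls, then combine" pattern, here is a sketch built on java.util.concurrent.CompletableFuture and its thenCombine method. The two webservice methods are hypothetical stand-ins that return canned strings instead of calling a real service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class CombineCallsSketch {

    // Hypothetical stand-ins for the two web service calls.
    static CompletableFuture<List<String>> webserviceOne(String terms) {
        return CompletableFuture.supplyAsync(() -> Arrays.asList(terms + "-a1", terms + "-a2"));
    }

    static CompletableFuture<List<String>> webserviceTwo(String terms) {
        return CompletableFuture.supplyAsync(() -> Arrays.asList(terms + "-b1"));
    }

    // Both calls are started before they are combined, so they can run concurrently.
    static CompletableFuture<List<String>> search(String terms) {
        CompletableFuture<List<String>> first = webserviceOne(terms);
        CompletableFuture<List<String>> second = webserviceTwo(terms);
        return first.thenCombine(second, (res1, res2) -> {
            List<String> merged = new ArrayList<>(res1);
            merged.addAll(res2);
            return merged;
        });
    }

    public static void main(String[] args) {
        // join() blocks only here, for the demo; a controller would return the future.
        System.out.println(search("q").join()); // [q-a1, q-a2, q-b1]
    }
}
```

thenCombine waits for both futures and passes both results to the merge function, which avoids the nested callbacks of the flatMap version.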

Supplying call latency as an IntStream

I am trying to make use of Java 8 and streams, and one of the things I am trying to replace is a system we have where we
Use an aspect to measure call latency (per configured period of time) to our web services and then
Feed those results into a complex event processor (Esper) so that
We can send out alert notifications
So, one step at a time. For the first step, I need to produce a stream (I think) that allows me to feed those latency numbers into existing listeners. I understand that getting the next number in the series might have to wait until there is a call.
How can I do that? Here is the latency aspect with comments.
public class ProfilingAspect {
private ProfilingAction action;
public ProfilingAspect(ProfilingAction action) {
this.action = action;
}
public Object doAroundAdvice(ProceedingJoinPoint jp) throws Throwable{
long startTime = System.currentTimeMillis();
Object retVal = null;
Throwable error = null;
try{
retVal = jp.proceed();
}catch (Throwable t){
error = t;
}
Class withinType = jp.getSourceLocation().getWithinType();
String methodName = jp.getSignature().getName();
long endTime = System.currentTimeMillis();
long runningTime = endTime - startTime;
// Let the IntStream know we have a new latency. Or really, we have an object
// stream with all this extra data
action.perform(withinType, methodName, jp.getArgs(), runningTime, error);
if( error != null ){
throw error;
}
return retVal;
}
}

OK, I have a working example. It doesn't handle the situation where I have to buffer up results if the stream isn't being read fast enough, though. I am open to improvements.
public class LatencySupplier implements Supplier<SomeFancyObject> {
private Random r = new Random();
@Override
public SomeFancyObject get() {
try {
Thread.sleep(100 + r.nextInt(1000));
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
return new SomeFancyObject(10 + r.nextInt(1000));
}
}
public class SomeFancyObject {
private static String[] someGroups = {"Group1","Group2","Group3"};
private final String group;
private int value;
public SomeFancyObject(int value) {
this.value = value;
this.group = WSRandom.selectOne(someGroups);
}
public String getGroup() {
return group;
}
public int getValue() {
return value;
}
@Override
public String toString() {
return value + "";
}
}
My next step is to create a stream by time so I can do avg/5 min, etc.
public class Sample {
public static void main(String[] args) throws InterruptedException {
Stream<SomeFancyObject> latencyStream = Stream.generate(new LatencySupplier());
Map<Object,List<SomeFancyObject>> collect = latencyStream.limit(10).collect(Collectors.groupingBy(sfo -> sfo.getGroup()));
System.out.println(collect);
Object o = new Object();
synchronized (o){
o.wait();
}
}
}
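One possible way to address the buffering gap mentioned above is to let the producer (for example the aspect's action) write into a BlockingQueue and have the stream's supplier take from it. The following is only a sketch under that assumption; the class and method names are invented, and plain Long latencies stand in for SomeFancyObject.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BufferedLatencySketch {

    // Unbounded buffer: producers never block, readers wait when it is empty.
    private final BlockingQueue<Long> buffer = new LinkedBlockingQueue<>();

    // Called by the producer, e.g. from the profiling action.
    void record(long latencyMs) {
        buffer.add(latencyMs);
    }

    // Infinite stream; producing each element blocks until a latency is available.
    Stream<Long> stream() {
        return Stream.generate(() -> {
            try {
                return buffer.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a latency", e);
            }
        });
    }

    public static void main(String[] args) {
        BufferedLatencySketch latencies = new BufferedLatencySketch();

        // Simulated producer running on another thread.
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= 5; i++) {
                latencies.record(i * 10);
            }
        });
        producer.start();

        List<Long> firstFive = latencies.stream().limit(5).collect(Collectors.toList());
        System.out.println(firstFive);
    }
}
```

The stream is infinite, so it needs a short-circuiting operation such as limit, and an unbounded queue can grow without limit if the reader falls far behind; a bounded queue with an explicit overflow policy may be a better fit in production.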
