Kafka KStreams Issue in Aggregation with Time Window - java

I have an issue with KStreams aggregation and windows. I want to aggregate records with the same key into a list, as long as they fall inside a time window.
I have chosen SessionWindows because I have to work with a moving window inside a session: say record A arrives at 10:00:00; then every other record with the same key that arrives
inside the 10-second window (until 10:00:10) falls into the same session, bearing in mind that if one arrives at 10:00:03, the window moves to 10:00:13 (+10s).
That leaves us with a moving window of +10s from the last record received for a given key.
Now the problem: I want to obtain the last aggregated result. I have used .suppress() to indicate that I don't want any intermediate results, just the last one when the window closes. This
is not working as expected: while it doesn't send any intermediate aggregated results, when the time window ends I don't get the final result either. I have noticed that in order to receive it I need to publish another
message to the topic, which in my case is impossible.
Reading about .suppress() I have come to the conclusion that it may not be the way to achieve what I want, which is why my question is: how can I force the window to close and emit the latest aggregated result?
@StreamListener(ExtractContractBinding.RECEIVE_PAGE)
@SendTo(ExtractCommunicationBinding.AGGREGATED_PAGES)
public KStream<String, List<Records>> aggregatePages(KStream<?, Record> input) {
    return input.map(this::getRecord)
        .groupBy(keyOfElement)
        .windowedBy(SessionWindows.with(Duration.ofSeconds(10L)).grace(Duration.ofSeconds(10L)))
        .aggregate(...do stuff...)
        .suppress(Suppressed.untilWindowCloses(unbounded()))
        .toStream()
        .map(this::createAggregatedResult);
}

In short, the reason this happens is that in Kafka Streams, as in most other stream processing engines that compute aggregations, time is based on event time.
https://kafka.apache.org/0101/documentation/streams#streams_time
In other words, the window cannot close until a new message arrives with a timestamp beyond your time window plus the grace period that accounts for late-arriving messages.
Moreover, based on some unit tests I’ve been writing recently, I’m inclined to believe that the second message needs to land in the same partition as the previous message for event time to move forward. In practice, when you run in production and presumably process hundreds of messages per second, this becomes unnoticeable.
Let me also add that you can implement a custom timestamp extractor, which gives you fine-grained control over which time window a particular message lands in.
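For illustration, a custom timestamp extractor is just an implementation of TimestampExtractor. A minimal sketch, assuming the question's Record payload exposes a hypothetical getTimestamp() accessor returning epoch milliseconds:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Sketch: derive event time from a field of the payload instead of the broker/producer timestamp.
public class RecordTimestampExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        Object value = record.value();
        if (value instanceof Record) {              // 'Record' is the question's domain type
            return ((Record) value).getTimestamp(); // hypothetical accessor, epoch millis
        }
        // Fall back to the previously observed partition time for unexpected payloads.
        return partitionTime;
    }
}

You can wire it in either globally via the default.timestamp.extractor config or per source topic with Consumed.with(...).withTimestampExtractor(...).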
how can I force the window to close and send the latest aggregated calculated result?
To finally answer your question, it’s not possible to force the time window to close without emitting an extra message to the source topic.

Related

Kafka Streams Session Windows with Punctuator

I'm building a Kafka Streams application where I want to make use of Session Windows.
Say my session is configured as follows:
// Inactivity gap is 5 seconds
// Grace period is 1 second
Duration inactivityGapDuration = Duration.ofSeconds(5);
Duration graceDuration = Duration.ofSeconds(1);
KStream<Windowed<String>, EventData> windowedListKStream = groupedStream
    .windowedBy(SessionWindows.ofInactivityGapAndGrace(inactivityGapDuration, graceDuration))
    .aggregate(...)
    .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
    .toStream();
And given the following stream events:
Event Key | Time
A         | 10
B         | 12
Based on reading the docs and experiments I expect this will create 2 session windows: one with key A and one with key B.
Now say I receive this next event:
Event Key | Time
B         | 20
This will close the window with key B, but the window with key A will remain open. That is to say, when an event for a given key is received, only the stream time for the windows that have that key will advance. Is my understanding here correct?
If so, then this behavior is not exactly what I need. What I need is if I never see another event with key A then for the key A window to eventually close.
I think this is where the Punctuator can come in. However, if I read the docs correctly then I would need to basically re-implement the Session Window logic using the Processor API if I want to add a Punctuator. As far as I can tell I can't inject a Punctuator event into the session window DSL implementation in order to move the stream time along.
If all of the above is correct, then this seems like a big lift for what seems like a simple operation. Am I missing some other feature that would make this a simpler implementation?
Thank you!

Apache Beam - KafkaIO Sliding Window processing

I have a Beam pipeline that reads Kafka Avro messages using the Java SDK. The pipeline receives the messages and tries to create a sliding window:
PCollection<AvroMessage> message_timestamped =
    messageValues.apply(
        "append event time for PCollection records",
        WithTimestamps.of((AvroMessage rec) -> new Instant(rec.getTime())));

PCollection<AvroMessage> messages_Windowed =
    message_timestamped.apply(
        Window.<AvroMessage>into(
                SlidingWindows.of(Duration.standardMinutes(2))
                    .every(Duration.standardMinutes(1)))
            .discardingFiredPanes());
Does the window get emitted after 2 minutes, or is a trigger configuration necessary? I tried to access the window pane information as part of a ParDo, but it gets triggered for each received message and doesn't wait to accumulate messages for the configured 2 minutes. What kind of trigger is required (after 2 minutes, process only the current window's messages)?
Do I need to include any specific configuration to run with unbounded Kafka messages?
I have used a timestamp policy to take the message timestamp during the KafkaIO read operation:
.withTimestampPolicyFactory(
(tp, previousWaterMark) -> new CustomFieldTimePolicy(previousWaterMark))
It is important to consider that windows and triggers have very different purposes:
Windows are based on the timestamps in the data, not on when they arrive or when they are processed. I find the best way to think about "windows" is as a secondary key. When data is unbounded/infinite, you need one of the grouping keys to have an "end" - a timestamp when you can say they are "done". Windows provide this "end". If you want to control how your data is aggregated, use windows.
Triggers are a way to try to control how output flows through your pipeline. They are not closely related to your business logic. If you want to manage the flow of data, use triggers.
To answer your specific questions:
Windows do not wait. An element that arrives may be assigned to a window that is "done" 1ms after it arrives. This is just fine.
Since you have not changed the default trigger, you will get one output with all of the elements for a window.
You also do not need discardingFiredPanes. Your configuration only produces one output per aggregation, so this has no effect.
But there is actually a problem that you will want to fix: the watermark (this controls when a window is "done") is determined by the source. Using WithTimestamps does not change the watermark. You will need to specify the timestamp in the KafkaIO transform, using withTimestampPolicyFactory. Otherwise, the watermark will move according to the publish time and may declare data late or drop data.
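For reference, a custom timestamp policy along the lines of the CustomFieldTimePolicy referenced in the question might look roughly like this (a minimal sketch, assuming a String key, the question's AvroMessage value with its getTime() field, and a watermark that simply tracks the latest observed event time):

import java.util.Optional;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.TimestampPolicy;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

// Sketch: event time comes from a field of the Avro payload; the watermark advances
// to the latest event time seen so far on this partition.
public class CustomFieldTimePolicy extends TimestampPolicy<String, AvroMessage> {

    private Instant currentWatermark;

    public CustomFieldTimePolicy(Optional<Instant> previousWatermark) {
        this.currentWatermark = previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
    }

    @Override
    public Instant getTimestampForRecord(PartitionContext ctx, KafkaRecord<String, AvroMessage> record) {
        Instant eventTime = new Instant(record.getKV().getValue().getTime()); // getTime() from the question
        if (eventTime.isAfter(currentWatermark)) {
            currentWatermark = eventTime;
        }
        return eventTime;
    }

    @Override
    public Instant getWatermark(PartitionContext ctx) {
        return currentWatermark;
    }
}

It is then plugged in exactly as the question shows, via .withTimestampPolicyFactory((tp, previousWatermark) -> new CustomFieldTimePolicy(previousWatermark)).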

How to create streaming Beam pipeline that is triggered once and only once in a fixed interval

I need to create an Apache Beam (Java) streaming job that should start once (and only once) every 60 seconds.
I got it working correctly using DirectRunner by using GenerateSequence, Window, and Combine.
However when I run it on Google Dataflow, sometimes it is triggered more than once within the 60 seconds window. I am guessing it has something to do with delays and out of order messages.
Pipeline pipeline = Pipeline.create(options);
pipeline
    // Generate a tick every 15 seconds
    .apply("Ticker", GenerateSequence.from(0).withRate(1, Duration.standardSeconds(15)))
    // Just to check if individual ticks are being generated once every 15 seconds
    .apply(ParDo.of(new DoFn<Long, Long>() {
      @ProcessElement
      public void processElement(@Element Long tick, OutputReceiver<Long> out) {
        ZonedDateTime currentInstant = Instant.now().atZone(ZoneId.of("Asia/Jakarta"));
        LOG.warn("-" + tick + "-" + currentInstant.toString());
        out.output(tick);
      }
    }))
    // 60 second window
    .apply("Window", Window.<Long>into(FixedWindows.of(Duration.standardSeconds(60))))
    // Emit once per 60 seconds
    .apply("Combine window into one", Combine.globally(Count.<Long>combineFn()).withoutDefaults())
    .apply("START", ParDo.of(new DoFn<Long, ZonedDateTime>() {
      @ProcessElement
      public void processElement(@Element Long count, OutputReceiver<ZonedDateTime> out) {
        ZonedDateTime currentInstant = Instant.now().atZone(ZoneId.of("Asia/Jakarta"));
        // LOG just to check
        // This log is sometimes printed more than once within 60 seconds
        LOG.warn("x" + count + "-" + currentInstant.toString());
        out.output(currentInstant);
      }
    }));
It works most of the time, except once every 5 or 10 minutes at random I see two outputs in the same minute. How do I ensure "START" above runs once every 60 seconds? Thanks.
Short answer: you can't currently; the Beam model is focused on event-time processing and correct handling of late data.
Workaround: you can define a processing-time timer, but you will have to deal with the outputs, the timer handling, and late data manually, see this or this.
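For illustration, a processing-time timer lives in a stateful DoFn and might look roughly like this (a minimal sketch over a keyed input; the names are made up, and buffering, output timestamps, and late-data handling are left to you):

import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Sketch: buffer ticks per key and flush a count on a processing-time timer.
public class FlushEveryMinuteFn extends DoFn<KV<String, Long>, Long> {

    @StateId("buffer")
    private final StateSpec<BagState<Long>> bufferSpec = StateSpecs.bag();

    @TimerId("flush")
    private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

    @ProcessElement
    public void process(
            @Element KV<String, Long> element,
            @StateId("buffer") BagState<Long> buffer,
            @TimerId("flush") Timer flushTimer) {
        buffer.add(element.getValue());
        // (Re)arms the timer 60 seconds of processing time after the latest element;
        // arm it differently if you need a strict fixed cadence.
        flushTimer.offset(Duration.standardSeconds(60)).setRelative();
    }

    @OnTimer("flush")
    public void onFlush(@StateId("buffer") BagState<Long> buffer, OnTimerContext context) {
        long count = 0;
        for (Long ignored : buffer.read()) {
            count++;
        }
        context.output(count);
        buffer.clear();
    }
}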
More details:
Windows and triggers in Beam are usually defined in event time, not in processing time. This way if you have late data coming after you already emitted the results for a window, late data still ends up in the correct window and results can be re-calculated for that window. Beam model allows you to express that logic and most of its functionality is tailored for that.
This also means that there is usually no requirement for a Beam pipeline to emit results at some specific real-world time, e.g. it doesn't make sense to say things like "aggregate the events that belong to some window based on the data in the events themselves, and then output that window every minute". The Beam runner aggregates the data for the window, possibly waits for late data, and then emits results as soon as it deems right. The condition for when the data is ready to be emitted is specified by a trigger. But that's just that: a condition for when the window data is ready to be emitted; it doesn't actually force the runner to emit it. So the runner can emit at any point in time after the trigger condition is met, and the results are still going to be correct, i.e. if more events have arrived since the trigger condition was met, only the ones that belong to a concrete window will be processed in that window.
Event-time windowing doesn't mix well with processing-time triggering, and there are no convenient primitives (triggers/windows) in Beam to deal with processing time in the presence of late data. In this model, if you use a trigger that only fires once, you lose the late data, and you still don't have a way to define a robust processing-time trigger. To build something like that you would have to be able to specify things like the real-life point in time from which to start measuring processing time, and you would have to deal with differences in processing time and delays that can happen across a large fleet of worker machines. This just isn't part of Beam at the moment.
There are efforts in Beam community that will enable this use case, e.g. sink triggers and retractions that will allow you to define your pipeline in event-time space but remove the need for complex event-time triggers. The results could be either immediately updated/recalculated and emitted, or the trigger can be specified at a sink like "I want the output table to be updated every minute". And the results will be updated and recalculated for late data automatically without your involvement. These efforts are far from completion though at this point, so your best bet currently is either using one of the existing triggers or manually handling everything with timers.

Kafka Stream count on time window not reporting zero values

I'm using Kafka Streams to calculate how many events occurred in the last 3 minutes, using a hopping time window:
public class ViewCountAggregator {

    void buildStream(KStreamBuilder builder) {
        final Serde<String> stringSerde = Serdes.String();
        final Serde<Long> longSerde = Serdes.Long();

        KStream<String, String> views = builder.stream(stringSerde, stringSerde, "streams-view-count-input");
        KStream<String, Long> viewCount = views
            .groupBy((key, value) -> value)
            .count(TimeWindows.of(TimeUnit.MINUTES.toMillis(3)).advanceBy(TimeUnit.MINUTES.toMillis(1)))
            .toStream()
            .map((key, value) -> new KeyValue<>(key.key(), value));

        viewCount.to(stringSerde, longSerde, "streams-view-count-output");
    }

    public static void main(String[] args) throws Exception {
        // some not so important initialization code
        ...
    }
}
When running a consumer and pushing some messages to the input topic, it receives the following updates as time passes:
single 1
single 1
single 1
five 1
five 4
five 5
five 4
five 1
Which is almost correct, but it never receives updates for:
single 0
five 0
Without it my consumer that updates a counter will never set it back to zero when there are no events for a longer period of time. I'm expecting consumed messages to look like this:
single 1
single 1
single 1
single 0
five 1
five 4
five 5
five 4
five 1
five 0
Is there some configuration option / argument I'm missing that would help me achieve such behavior?
Which is almost correct, but it never receives updates for:
First, the computed output is correct.
Second, why is it correct:
If you apply a windowed aggregate, only those windows that have actual content are created (all other systems I am familiar with would produce the same output). Thus, if for some key there is no data for a period longer than the window size, no window is instantiated, and thus there is no count at all.
The reason for not instantiating windows without content is quite simple: the processor cannot know all keys. In your example, you have two keys, but maybe later on a third key comes up. Would you expect to get <thirdKey,0> from the beginning? Also, as data streams are infinite in nature, keys might go away and never reappear. If you remembered all seen keys, and emitted <key,0> when there is no data for a key that disappeared, would you emit <key,0> forever?
I don't want to say that your expected result/semantics doesn't make sense. It's just a very specific use case of yours and not applicable in general; hence, stream processors don't implement it.
Third: What can you do?
There are multiple options:
Your consumer can keep track of which keys it has seen, use the embedded record timestamps to figure out if a key is "missing", and then set the counter to zero for that key (for this, it might also help to remove the map step and preserve the Windowed<K> type for the key, so that the consumer knows which window a record belongs to).
Add a stateful transform() step in your Streams application that does the same thing as described in (1). For this, it might be helpful to register a punctuation callback (see the sketch below).
Approach (2) should make it easier to track keys, as you can attach a state store to your transform step and thus don't need to deal with state (and failure/recovery) in your downstream consumer.
However, the tricky part for both approaches is still deciding when a key is missing, i.e., how long you wait until you produce <key,0>. Note that data might arrive late (aka out-of-order), and even if you did emit <key,0>, a late-arriving record might produce a <key,1> message after your code emitted the <key,0> record. But maybe this is not really an issue for your case, as it seems you use the latest window only anyway.
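A minimal sketch of option (2), assuming a recent Kafka Streams version (the StreamsBuilder-era API rather than the KStreamBuilder shown in the question) and a hypothetical key-value store named "last-seen-store" that remembers when each key was last updated:

import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Sketch: remember the latest update time per key and periodically emit <key, 0L>
// for keys that have been silent longer than the window size.
public class ZeroFillTransformer
        implements Transformer<Windowed<String>, Long, KeyValue<String, Long>> {

    private ProcessorContext context;
    private KeyValueStore<String, Long> lastSeenStore; // last-seen timestamp per key

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        this.lastSeenStore = (KeyValueStore<String, Long>) context.getStateStore("last-seen-store");
        // Every minute of wall-clock time, emit 0 for keys with no recent updates.
        // Note: this keeps emitting 0 each punctuation until the key reappears; refine as needed.
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            try (KeyValueIterator<String, Long> iter = lastSeenStore.all()) {
                while (iter.hasNext()) {
                    KeyValue<String, Long> entry = iter.next();
                    if (timestamp - entry.value > Duration.ofMinutes(3).toMillis()) {
                        context.forward(entry.key, 0L);
                    }
                }
            }
        });
    }

    @Override
    public KeyValue<String, Long> transform(Windowed<String> key, Long count) {
        lastSeenStore.put(key.key(), context.timestamp()); // record when this key was last updated
        return KeyValue.pair(key.key(), count);
    }

    @Override
    public void close() {}
}

You would register the store with builder.addStateStore(...) (e.g. Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("last-seen-store"), Serdes.String(), Serdes.Long())) and call .transform(ZeroFillTransformer::new, "last-seen-store") in place of the map step; the 3-minute threshold simply mirrors the window size from the question.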
Last but not least, one more comment: it seems that you are using only the latest count and that newer windows overwrite older windows in your downstream consumer. Thus, it might be worth exploring "Interactive Queries" to tap into the state of your count operator directly instead of consuming the topic and updating some other state. This might allow you to redesign and simplify your downstream application significantly. Check out the docs and a very good blog post about Interactive Queries for more details.
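For illustration, querying the windowed count store directly might look roughly like this (a sketch assuming a newer Streams API, a running KafkaStreams instance, and that the count was materialized under the hypothetical store name "view-counts-store"):

import java.time.Duration;
import java.time.Instant;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class ViewCountQuery {

    // Prints the counts of the windows covering the last 3 minutes for one key.
    static void printRecentCounts(KafkaStreams streams, String key) {
        ReadOnlyWindowStore<String, Long> store = streams.store(
            StoreQueryParameters.fromNameAndType("view-counts-store",
                QueryableStoreTypes.<String, Long>windowStore()));

        Instant now = Instant.now();
        try (WindowStoreIterator<Long> iter = store.fetch(key, now.minus(Duration.ofMinutes(3)), now)) {
            while (iter.hasNext()) {
                KeyValue<Long, Long> entry = iter.next(); // key = window start (epoch ms), value = count
                System.out.println(key + " @ " + entry.key + " -> " + entry.value);
            }
        }
    }
}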

Parallelism and Failover of a Sequential Data

Good time guys!
We have a pretty straightforward application adapter: once every 30 seconds it reads records from a database (we can't write to it) of one system, converts each of these records into an internal format, performs filtering, enrichment, ..., and finally transforms the resulting, let's say, entities into an XML format and sends them via JMS to another system. Nothing new.
Let's add some spice here: records in the database are sequential (that means their identifiers are generated by a sequence), and when it is time to read a new bunch of records, we get a last-processed-sequence-number -- which is stored in our internal database and updated each time the next record is processed (sent to JMS) -- and start reading from that record (+1).
The problem is that our customers gave us an NFR: processing of a read record bunch must not last longer than 30 seconds. Since there are a lot of steps in the workflow (some of them pretty long-running), and it is possible to get a pretty big batch of records, and since we process them one by one, it can take more than 30 seconds.
Because of all the above I want to ask 2 questions:
1) Is there an approach to parallel processing of sequential data, maybe with one or several intermediate stores, or the Disruptor pattern, or something CQRS-like, or notification-based, or ..., that makes working in such a system possible?
2) A general one. I need to store a last-processed-number and send an entity to JMS. If I save the number to the database and then some problem arises with JMS, on an application restart my adapter will think that it successfully sent the entity, which is not true, and it won't ever be received. If I send the entity and after that try to save the number to the database and get an exception, on an application restart a reprocessing will be performed, which will lead to duplicates in JMS. I'm not sure that XA transactions will help here, or some kind of last-resource gambit...
Could somebody, please, share experience or ideas?
Thanks in advance!
1) 30 seconds is a long time and you can do a lot in that time, especially with more than one CPU. Without specifics I can only say it is likely you can make it faster if you profile it and use more CPUs.
2) You can update the database before you send, and listen to the JMS queue yourself to confirm the message was received by the broker.
Dimitry - I don't know the details around your problem, so I'm just going to make a set of assumptions. I hope it will trigger an idea that leads to the solution, at least.
Here goes:
Grab your list of items to process.
Store the last id (and maybe the starting id)
Process each item on a different thread (suggest using Tasks).
Record any failed item in a local failed queue.
When you grab the next bunch, ensure you process the failed queue first.
Have a way of determining a max number of retries and a way of moving/marking it as permanently failed.
Not sure if that was what you were after. NServiceBus has a retry process where the gap between each retry gets longer up to a point, then it is marked as failed.
Folks, we finally ended up with the following solution. We implemented a kind of Actor Model. The idea is the following.
There are two main (internal) database tables in our application, let's call them READ_DATA_INFO, which contains the last-read-record-number of the 'source' external system, and DUMPED_DATA, which stores metadata about each record read from the source system. This is how it all works: every n (a configurable property) seconds a service bus reads the last processed identifier of the source system and sends a request to the source system to get new records from it. If there are several new records, they are wrapped in a DumpRecordBunchMessage message and sent to a DumpActor class. This class begins a transaction that comprises two operations: update the last-read-record-number (the READ_DATA_INFO table) and save the metadata about each record (the DUMPED_DATA table); each dumped record gets the 'NEW' status (when a record is successfully processed, it gets the 'COMPLETED' status; otherwise the 'FAILED' status). In case of a successful transaction commit, each of those records is wrapped in a RecordMessage message class and sent to the next processing actor; otherwise those records are just skipped - they will be reread after the next n seconds.
There are three interesting points:
an application's disaster recovery. What if our application is stopped somehow in the middle of processing? No problem: at application startup (a @PostConstruct-marked method) we find all the records with the 'NEW' status in the DUMPED_DATA table and, with the help of the stored metadata, restore them from the source system.
parallel processing. After all records are successfully dumped, they become independent, which means they can be processed in parallel. We introduced several mechanisms for parallelism and load balancing. The simplest one is a round-robin approach. Each processing actor consists of a parent actor (load balancer) and a configurable set of its child actors (workers). When a new message arrives in the parent actor's queue, it dispatches it to the next worker.
duplicate record prevention. This is the most interesting one. Let's assume that we read data every 5 seconds. If there is an actor with a long-running operation, it is possible to have several attempts to read from the source system's database starting from the same last-read-record number. Thus there would potentially be a lot of duplicate records dumped and processed. In order to prevent this we added a CAS-like check on the DumpActor's messages: if the last-read-record number from a message is equal to the one from the DUMPED_DATA table, the message should be processed (no messages were processed before it); otherwise the message is rejected. Rather simple, but powerful.
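For illustration only, the CAS-like check boils down to something like the sketch below; the class, method, and parameter names are hypothetical, and the actual transactional dump into READ_DATA_INFO / DUMPED_DATA is only described in comments:

// Hypothetical sketch of the DumpActor's duplicate check: a bunch is accepted only if
// the last-read-record-number it was built from matches the number currently stored,
// i.e. no other bunch read from the same starting point has been dumped yet.
public final class DumpDuplicateCheck {

    /**
     * @param messageLastReadNumber last-read-record-number carried by the incoming bunch message
     * @param storedLastReadNumber  last-read-record-number currently stored by the application
     * @return true if the bunch should be dumped and processed, false if it is rejected as a duplicate
     */
    public static boolean shouldProcess(long messageLastReadNumber, long storedLastReadNumber) {
        if (messageLastReadNumber != storedLastReadNumber) {
            return false; // another bunch from the same starting point was already dumped
        }
        // Accepted: in the real application this is followed by one transaction that updates the
        // last-read-record-number and inserts the per-record metadata with status NEW, after which
        // RecordMessage instances are forwarded to the processing actors.
        return true;
    }
}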
I hope this overview will help somebody. Have a good time!
