Apache Flink: Windowed ReduceFunction is never executed - java

Below is a code snippet where I'm using a tumbling event-time window:
DataStream<OHLC> ohlcStream = stockStream
        .assignTimestampsAndWatermarks(new TimestampExtractor())
        .map(new mapStockToOhlc())
        .keyBy((KeySelector<OHLC, Long>) o -> o.getMinuteKey())
        .timeWindow(Time.seconds(60))
        .reduce(new myAggFunction());
Unfortunately, it looks like the reduce function is never executed. If I use the code above without windowing, the reduce function works fine. Below is the code for the TimestampExtractor. The 30-second watermark delay serves just as a testing value, but the one-minute tumbling window is what I actually want.
public static class TimestampExtractor implements AssignerWithPeriodicWatermarks<StockTrade> {

    @Nullable
    @Override
    public Watermark getCurrentWatermark() {
        return new Watermark(System.currentTimeMillis() - 30000);
    }

    @Override
    public long extractTimestamp(StockTrade stockTrade, long l) {
        BigDecimal bd = new BigDecimal(stockTrade.getTime());
        // bd contains a milliseconds timestamp, e.g. 1498658629.036
        return bd.longValue();
    }
}
bd.longValue() returns the seconds timestamp 1498658629, as my window is also defined in seconds.
When I used bd.longValue()/60, which returns a minutes timestamp, the reduce function was called. My output file then contains all records for each reduce operation:
{time=1498717692.000, minuteTime=24978628, n=1, open=2248.0}
{time=1498717692.000, minuteTime=24978628, n=2, open=2248.0}
...
{time=1498717692.000, minuteTime=24978628, n=8, open=2248.0}
So, can anyone explain to me what is happening? Thanks a lot.

Normally watermarks should be relative to the timestamps in your data, and should not be based on the system clock. One of the great things about working with event time is that the same application can be used to reprocess historic data or to process current data, but that's not possible if you compare your timestamps to the system clock, as you've done here.
A watermark can be thought of as a statement that all data with timestamps smaller than the watermark have already arrived. Or in other words, any data with a timestamp less than the current watermark will be considered late. My guess is that you are not seeing any results because your watermarks are causing all of your data to be considered late, and the window operator is dropping all this late data.
I suggest you use a BoundedOutOfOrdernessTimestampExtractor instead. It works by keeping track of the max timestamp seen so far in the data stream and subtracting the delay from that max timestamp, rather than from the system clock. The source code, in case you're curious.
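A minimal sketch of what that could look like here (TradeTimestampExtractor is a made-up name, and it assumes the extracted timestamps should be epoch milliseconds, which is what Flink expects):
public static class TradeTimestampExtractor
        extends BoundedOutOfOrdernessTimestampExtractor<StockTrade> {

    public TradeTimestampExtractor() {
        // the watermark trails the max timestamp seen so far by 30 seconds
        super(Time.seconds(30));
    }

    @Override
    public long extractTimestamp(StockTrade stockTrade) {
        // getTime() returns seconds with a millisecond fraction, e.g. 1498658629.036,
        // so shift three decimal places to get epoch milliseconds
        return new BigDecimal(stockTrade.getTime()).movePointRight(3).longValue();
    }
}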

Related

Filtering duplicates out of an infinite DataStream with windows

I want to filter out duplicates in Flink from an infinite DataStream. I know the duplicates arise only in a small time window (max 10 seconds). I found a promising approach that is pretty simple here. But it doesn't work. It uses a keyed DataStream and returns only the first message of every window.
This is my window code:
DataStream<Row> outputStream = inputStream
        .keyBy(new MyKeySelector())
        .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.minutes(5)))
        .process(new DuplicateFilter());
MyKeySelector() is just a class that selects the first two attributes of the Row message as the key. This key works as a primary key and ensures that only messages with the same key are assigned to the same window (classic keyed-stream behaviour).
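For context, a minimal sketch of what such a key selector could look like (the Row field positions are assumptions based on the description):
public class MyKeySelector implements KeySelector<Row, Tuple2<String, String>> {
    @Override
    public Tuple2<String, String> getKey(Row row) {
        // the first two attributes of the Row form the composite key
        return Tuple2.of(row.getField(0).toString(), row.getField(1).toString());
    }
}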
This is the DuplicateFilter class, which is very similar to the proposed answer to the above-mentioned question. I only used the newer process() function instead of apply().
public class DuplicateFilter extends ProcessWindowFunction<Row, Row, Tuple2<String, String>, TimeWindow> {

    private static final Logger LOG = LoggerFactory.getLogger(DuplicateFilter.class);

    @Override
    public void process(Tuple2<String, String> key, Context context, Iterable<Row> iterable, Collector<Row> collector) throws Exception {
        // this is just for debugging and can be ignored
        int count = 0;
        for (Row record : iterable) {
            LOG.info("Row number {}: {}", count, record);
            count++;
        }
        LOG.info("first Row: {}", iterable.iterator().next());
        collector.collect(iterable.iterator().next()); // output only the first message in this window
    }
}
My messages arrive at intervals of at most one second, so a 30-second window should handle that well. But messages which arrive less than one second apart are assigned to different windows. From the logs, I can see that it works correctly only very rarely.
Has someone got an idea or another approach for this task? Please let me know if you need more information.
Flink's time windows are aligned to the clock, rather than to the events, so two events that are close together in time can be assigned to different windows. Windows are often not very well suited for deduplication, but you might get good results if you use session windows.
Personally, I would use a keyed flatmap (or a process function), and use state TTL (or timers) to clear the state for keys once it's no longer needed.
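A minimal sketch of that keyed-flatmap approach, using a ValueState with a TTL so state for idle keys is cleared automatically (class and variable names are illustrative):
public class DeduplicateFn extends RichFlatMapFunction<Row, Row> {

    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        // clear state for a key 10 seconds after it was written,
        // matching the stated maximum duplicate window
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(org.apache.flink.api.common.time.Time.seconds(10))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();
        ValueStateDescriptor<Boolean> descriptor =
                new ValueStateDescriptor<>("seen", Types.BOOLEAN);
        descriptor.enableTimeToLive(ttlConfig);
        seen = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void flatMap(Row value, Collector<Row> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            out.collect(value); // emit only the first record seen for this key
        }
    }
}
It would be wired in as inputStream.keyBy(new MyKeySelector()).flatMap(new DeduplicateFn()).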
You can also do deduplication with Flink SQL: https://ci.apache.org/projects/flink/flink-docs-stable/docs/dev/table/sql/queries/deduplication/ (but you would need to set an idle state retention interval).

Why is a particular Guava Stopwatch.elapsed() call much later than others? (output in post)

I am working on a small game project and want to track time in order to process physics. After looking through different approaches, I first decided to use Java's Instant and Duration classes and have now switched over to Guava's Stopwatch implementation. However, in my snippet, both approaches show a big gap at the second call of runtime.elapsed(). That doesn't seem like a big problem in the long run, but why does it happen?
I have tried running the code below both in focus and as a Thread, on Windows and on Linux (Ubuntu 18.04), and the result stays the same - the exact values differ, but the gap occurs. I am using the IntelliJ IDEA environment with JDK 11.
Snippet from Main:
public static void main(String[] args) {
    MassObject[] planets = {
            new Spaceship(10, 0, 6378000)
    };
    planets[0].run();
}
This is part of my class MassObject extends Thread:
public void run() {
    // I am using StringBuilder to eliminate flushing delays.
    StringBuilder output = new StringBuilder();
    Stopwatch runtime = Stopwatch.createStarted();
    // massObjectList = static List<MassObject>;
    for (MassObject b : massObjectList) {
        if (b != this) calculateGravity(this, b);
    }
    for (int i = 0; i < 10; i++) {
        output.append(runtime.elapsed().getNano()).append("\n");
    }
    System.out.println(output);
}
Stdout:
30700
1807000
1808900
1811600
1812400
1813300
1830200
1833200
1834500
1835500
Thanks for your help.
You're calling Duration.getNano() on the Duration returned by elapsed(), which isn't what you want.
The internal representation of a Duration is a number of seconds plus a nano offset for whatever additional fraction of a whole second there is in the duration. Duration.getNano() returns that nano offset, and should almost never be called unless you're also calling Duration.getSeconds().
The method you probably want to be calling is toNanos(), which converts the whole duration to a number of nanoseconds.
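A quick illustration of the difference, using a made-up duration:
import java.time.Duration;

Duration d = Duration.ofSeconds(2, 345_000_000); // 2.345 seconds
d.getNano();  // 345000000  -> only the fractional-second offset
d.toNanos();  // 2345000000 -> the whole duration in nanoseconds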
Edit: In this case that doesn't explain what you're seeing because it does appear that the nano offsets being printed are probably all within the same second, but it's still the case that you shouldn't be using getNano().
The actual issue is probably some combination of classloading or extra work that has to happen during the first call, and/or JIT improving performance of future calls (though I don't think looping 10 times is necessarily enough that you'd see much of any change from JIT).

Understanding Kafka stream groupBy and window

I am not able to understand the concept of groupBy/groupByKey and windowing in Kafka streaming. My goal is to aggregate stream data over some time period (e.g. 5 seconds). My streaming data looks something like:
{"value":0,"time":1533875665509}
{"value":10,"time":1533875667511}
{"value":8,"time":1533875669512}
The time is in milliseconds (epoch). Here my timestamp is in the message itself and not in the key. And I want to average the values over 5-second windows.
Here is the code that I am trying, but I can't seem to get it to work:
builder.<String, String>stream("my_topic")
        .map((key, val) -> {
            TimeVal tv = TimeVal.fromJson(val);
            return new KeyValue<Long, Double>(tv.time, tv.value);
        })
        .groupByKey(Serialized.with(Serdes.Long(), Serdes.Double()))
        .windowedBy(TimeWindows.of(5000))
        .count()
        .toStream()
        .foreach((key, val) -> System.out.println(key + " " + val));
This code does not print anything even though the topic is generating messages every two seconds. When I press Ctrl+C, it prints something like
[1533877059029#1533877055000/1533877060000] 1
[1533877061031#1533877060000/1533877065000] 1
[1533877063034#1533877060000/1533877065000] 1
[1533877065035#1533877065000/1533877070000] 1
[1533877067039#1533877065000/1533877070000] 1
This output does not make sense to me.
Related code:
public class MessageTimeExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
        String str = (String) record.value();
        TimeVal tv = TimeVal.fromJson(str);
        return tv.time;
    }
}
public class TimeVal {
    final public long time;
    final public double value;

    public TimeVal(long tm, double val) {
        this.time = tm;
        this.value = val;
    }

    public static TimeVal fromJson(String val) {
        Gson gson = new GsonBuilder().create();
        TimeVal tv = gson.fromJson(val, TimeVal.class);
        return tv;
    }
}
Questions:
Why do you need to pass a serializer/deserializer to groupBy? Some of the overloads also take a ValueStore; what is that? When grouped, what does the data look like in the grouped stream?
How is the windowed stream related to the grouped stream?
I was expecting the above to print in a streaming way, i.e. buffer for 5 seconds, then count, then print. It only prints once I press Ctrl+C on the command prompt, i.e. it prints and then exits.
It seems you don't have keys in your input data (correct me if this is wrong), and it further seems that you want to do a global aggregation?
In general, grouping is for splitting a stream into sub-streams. Those sub-streams are built by key (i.e., one logical sub-stream per key). In your code snippet you set the timestamp as the key, and thus generate a sub-stream per timestamp. I assume this is not intended.
If you want to do a global aggregation, you will need to map all records to a single sub-stream, i.e., assign the same key to all records in groupBy(). Note that global aggregations don't scale, as the aggregation must be computed by a single thread. Thus, this will only work for small workloads.
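A sketch of that remapping based on the code from the question (the constant key "ALL" is an arbitrary choice, so every record lands in the same sub-stream):
builder.<String, String>stream("my_topic")
        .map((key, val) -> {
            TimeVal tv = TimeVal.fromJson(val);
            return new KeyValue<>("ALL", tv.value); // same key for every record
        })
        .groupByKey(Serialized.with(Serdes.String(), Serdes.Double()))
        .windowedBy(TimeWindows.of(5000))
        .count()
        .toStream()
        .foreach((key, val) -> System.out.println(key + " " + val));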
Windowing is applied to each generated sub-stream to build the windows, and the aggregation is computed per window. The windows are built based on the timestamp returned by the TimestampExtractor. It seems you already have an implementation that extracts the timestamp from the value for this purpose.
This code does not print anything even though the topic is generating messages every two seconds. When I press Ctrl+C, it prints something like
By default, Kafka Streams uses some internal caching, and the cache is flushed on commit -- this happens every 30 seconds by default, or when you stop your application. You would need to disable caching to see results earlier (cf. https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html)
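For example, a minimal sketch of disabling the cache in the streams configuration (assuming the usual Properties-based setup):
Properties props = new Properties();
// setting the cache size to zero forwards each update immediately
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);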
Why do you need to pass a serializer/deserializer to groupBy?
Because data needs to be redistributed, and this happens via a topic in Kafka. Note that Kafka Streams is built for a distributed setup, with multiple instances of the same application running in parallel to scale out horizontally.
Btw: you might also be interested in this blog post about the execution model of Kafka Streams: https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/
It seems like you misunderstand the nature of the window DSL.
It works on the internal message timestamps handled by the Kafka platform, not on arbitrary properties in your specific message type that encode time information. Also, this window does not group into intervals - it is a sliding window. It means any aggregation you get is for the last 5 seconds before the current message.
Also, you need the same key for all elements that should be combined into the same group, for example, null. In your example the key is a timestamp, which is more or less unique per entry, so there will be only a single element in each group.

SlidingWindows for slow data (big intervals) on Apache Beam

I am working with the Chicago Traffic Tracker dataset, where new data is published every 15 minutes. When new data is available, it represents records that are off by 10-15 minutes from "real time" (for example, look at _last_updt).
For example, at 00:20 I get data timestamped 00:10; at 00:35 I get data from 00:20; at 00:50 I get data from 00:40. So the interval at which I get new data is fixed (every 15 minutes), although the interval between timestamps changes slightly.
I am trying to consume this data on Dataflow (Apache Beam), and for that I am playing with sliding windows. My idea is to collect and work on 4 consecutive datapoints (4 x 15min = 60min), and ideally update my calculation of sums/averages as soon as a new datapoint is available. For that, I started with the code:
PCollection<TrafficData> trafficData = input
        .apply("MapIntoSlidingWindows", Window.<TrafficData>into(
                SlidingWindows.of(Duration.standardMinutes(60)) // (4x15)
                        .every(Duration.standardMinutes(15)))   // interval to get new data
                .triggering(AfterWatermark
                        .pastEndOfWindow()
                        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()))
                .withAllowedLateness(Duration.ZERO)
                .accumulatingFiredPanes());
Unfortunately, it looks like when I receive a new datapoint from my input, I do not get a new (updated) result from the GroupByKey that comes after.
Is something wrong with my SlidingWindows? Or am I missing something else?
One issue may be that the watermark is going past the end of the window and dropping all later elements. You may try allowing a few minutes of lateness after the watermark passes:
PCollection<TrafficData> trafficData = input
        .apply("MapIntoSlidingWindows", Window.<TrafficData>into(
                SlidingWindows.of(Duration.standardMinutes(60)) // (4x15)
                        .every(Duration.standardMinutes(15)))   // interval to get new data
                .triggering(AfterWatermark
                        .pastEndOfWindow()
                        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane())
                        .withLateFirings(AfterProcessingTime.pastFirstElementInPane()))
                .withAllowedLateness(Duration.standardMinutes(15))
                .accumulatingFiredPanes());
Let me know if this helps at all.
So @Pablo (from my understanding) gave the correct answer. But I had some suggestions that would not fit in a comment.
I wanted to ask whether you really need sliding windows. From what I can tell, fixed windows would do the job for you and be computationally simpler as well. Since you are using accumulating fired panes, you don't need a sliding window, since your next DoFn will already be computing an average from the accumulated panes.
As for the code, I made changes to the early and late firing logic. I also suggest increasing the window size. Since you know the data comes every 15 minutes, you should be closing the window after 15 minutes rather than exactly on 15 minutes. But you also don't want to pick a size whose multiples collide with multiples of 15 (like 20, which collides at 60 minutes), so pick a number that is co-prime to 15, for example 19. Also allow for late entries.
PCollection<TrafficData> trafficData = input
        .apply("MapIntoFixedWindows", Window.<TrafficData>into(
                FixedWindows.of(Duration.standardMinutes(19)))
                .triggering(AfterWatermark.pastEndOfWindow()
                        // fire the moment you see an element
                        .withEarlyFirings(AfterPane.elementCountAtLeast(1))
                        // this line is optional since you already have a past-end-of-window
                        // trigger and an early firing, but just in case
                        .withLateFirings(AfterProcessingTime.pastFirstElementInPane()))
                .withAllowedLateness(Duration.standardMinutes(60))
                .accumulatingFiredPanes());
Let me know if that solves your issue!
EDIT
So, I could not work out how you computed the example above, so I am using a generic one. Below is a generic averaging function:
public class AverageFn extends CombineFn<Integer, AverageFn.Accum, Double> {

    public static class Accum {
        int sum = 0;
        int count = 0;
    }

    @Override
    public Accum createAccumulator() { return new Accum(); }

    @Override
    public Accum addInput(Accum accum, Integer input) {
        accum.sum += input;
        accum.count++;
        return accum;
    }

    @Override
    public Accum mergeAccumulators(Iterable<Accum> accums) {
        Accum merged = createAccumulator();
        for (Accum accum : accums) {
            merged.sum += accum.sum;
            merged.count += accum.count;
        }
        return merged;
    }

    @Override
    public Double extractOutput(Accum accum) {
        return ((double) accum.sum) / accum.count;
    }
}
In order to run it you would add the line:
PCollection<Double> average = trafficData.apply(Combine.globally(new AverageFn()));
Since you are currently using accumulating firing triggers, this would be the simplest way to code a solution.
HOWEVER, if you want to use a discarding-fired-panes window, you would need to use a PCollectionView to store the previous average and pass it as a side input to the next computation in order to keep track of the values. This is a little more complex to code, but would definitely improve performance, since constant work is done every window, unlike with accumulating firing.
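A rough sketch of that side-input wiring, with illustrative names and the windowing details omitted (it assumes a PCollection<Integer> named values, matching the generic AverageFn above):
// compute the average once and expose it as a singleton side input
final PCollectionView<Double> previousAverage =
        values.apply(Combine.globally(new AverageFn()).asSingletonView());

// in a later step, read the previous average alongside each new element
PCollection<Double> updated = values.apply(
        ParDo.of(new DoFn<Integer, Double>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                double prev = c.sideInput(previousAverage);
                // combine c.element() with prev here instead of
                // re-reading all accumulated panes
            }
        }).withSideInputs(previousAverage));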
Does this make enough sense for you to write your own function for the discarding-fired-panes case?

How to obtain current TAI time?

How can I obtain the current TAI time in milliseconds in Linux using either Java or C++?
The reason I need this is to be able to accurately take timestamps over a long period of time (on the order of years) and still be able to compare them, without worrying about leap seconds. It is possible for multiple measurements to take place during a leap second and all measurements need to be unambiguous, monotonically increasing, and linearly increasing. This will be a dedicated Linux server. This is for a scientific project which needs precision of about .5 seconds.
I do not currently wish to invest in a GPS timekeeper and hope to use NTP to pool.ntp.org in order to keep the system clock on track.
I have looked into the following solutions:
Java 8 or the ThreeTen Project
The only way to obtain a TAIInstant is to use an Instant and then convert it which, according to the specs, "Conversion from an Instant will not be completely accurate near a leap second in accordance with UTC-SLS." That in and of itself is not a big deal (in fact, using UTC-SLS would also be acceptable). However, using now() in the Instant class also seems to just be a wrapper for System.currentTimeMillis(), which makes me think that during the leap second, the time will still be ambiguous and the project will not actually give me TAI time. The Java 8 specifications also state:
Implementations of the Java time-scale using the JSR-310 API are not required to provide any clock that is sub-second accurate, or that progresses monotonically or smoothly. Implementations are therefore not required to actually perform the UTC-SLS slew or to otherwise be aware of leap seconds.
Using a right/? timezone
This seems like it would work; however, I am not sure whether the implementation is smart enough to keep working during a leap second, or whether System.currentTimeMillis() would even give TAI time. In other words, would the underlying implementation still use UTC, thus giving an ambiguous time during the leap second which is then converted to TAI, or does using a right/ timezone actually make System.currentTimeMillis() return TAI time always (i.e. even during a leap second)?
Using CLOCK_TAI
I tried using CLOCK_TAI in the Linux kernel but found it to be completely identical to CLOCK_REALTIME in my test:
Code:
#include <iostream>
#include <time.h>

long sec(int clock)
{
    struct timespec gettime_now;
    clock_gettime(clock, &gettime_now);
    return gettime_now.tv_sec;
}

int main()
{
    std::cout << sec(0) << std::endl;  // CLOCK_REALTIME
    std::cout << sec(1) << std::endl;  // CLOCK_MONOTONIC
    std::cout << sec(11) << std::endl; // CLOCK_TAI
    return 0;
}
The output was simply:
1427744797
6896
1427744797
Using CLOCK_MONOTONIC
The problem with this is that the timestamps need to remain valid and comparable even if the computer restarts.
CLOCK_REALTIME and CLOCK_TAI return the same value because the kernel parameter tai_offset is zero.
You can check this by calling adjtimex(timex tmx) and reading the value. I think ntpd will set it if it is new enough (>4.2.6) and has a leap-second file. It may also be able to get it from upstream servers, but I haven't been able to verify that. The adjtimex() call can set tai_offset manually when run as root. You will need a newish man page for adjtimex to see the parameters to set. My Debian man page was too old, but the command worked.
In addition to the correct accepted answer, I would also mention the free Java library Time4J (min version v4.1) as a possible solution, because
I have written it to fill a gap in the Java world (java.time cannot do everything),
other answers given so far only talk about C++ (but you also asked for Java),
it works according to the same principles described by @user3427419.
It uses a monotonic clock based on System.nanoTime(), but even allows custom implementations via the interface TickProvider. For calibration, you can either use net.time4j.SystemClock.MONOTONIC, or an SNTP clock named SntpConnector, which just needs some simple configuration to connect to any NTP time server you want. And thanks to the built-in leap-second table, Time4J can even show you the announced leap second at the end of this month - in ISO-8601 notation, or even as a formatted local timestamp string in any timezone (using the i18n-module).
A recalibration (in the case of NTP, a reconnect) of the clocks is possible, meaning the clocks can be adapted to intermediate time adjustments (although I strongly recommend not doing so during your measurements or during a leap second). Although such a reconnect of an SNTP clock would normally cause the time to step back in some cases, Time4J tries to apply a smoothing algorithm (if activated in the clock configuration) to ensure monotone behaviour. Detailed documentation is available online.
Example:
// Step 0: configure your clock
String ntpServer = "ptbtime1.ptb.de";
SntpConnector clock = new SntpConnector(ntpServer);
// Step 1: Timestamp start of the program and associate it with a counter
clock.connect();
// Step 2: Use the counter for sequential measurements at fixed intervals
Moment m = clock.currentTime();
System.out.println(m); // possible output = 2015-06-30T23:59:60,123456789Z
// Step 3: Timestamp new counter value(s) as necessary to keep your data adequately synced
clock.connect();
I doubt that any C++-based solution is simpler. More code demonstrations can be studied on DZone.
Update (answer to question in comment):
A slightly simplified solution for automatically downloading the IETF resource with new leap seconds and translating it into a Time4J-specific format might look like this:
URL url = new URL("https://www.ietf.org/timezones/data/leap-seconds.list");
BufferedReader br =
        new BufferedReader(
                new InputStreamReader(url.openStream(), "US-ASCII"));
String line;
PlainDate expires = null;
Moment ntpEpoch = PlainTimestamp.of(1900, 1, 1, 0, 0).atUTC();
List<PlainDate> events = new ArrayList<PlainDate>();

try {
    while ((line = br.readLine()) != null) {
        if (line.startsWith("##")) {
            long expraw = Long.parseLong(line.substring(2).trim());
            expires = ntpEpoch.plus(expraw, TimeUnit.SECONDS)
                    .toZonalTimestamp(ZonalOffset.UTC).toDate();
            continue;
        } else if (line.startsWith("#")) {
            continue; // comment line
        }

        // this works for some foreseeable future
        long epoch = Long.parseLong(line.substring(0, 10));

        // this is no leap second,
        // but just the official introduction of the modern UTC scale
        if (epoch == 2272060800L) {
            continue;
        }

        // -1 because we don't want to associate
        // the leap second with the following day
        PlainDate event =
                ntpEpoch.plus(epoch - 1, TimeUnit.SECONDS)
                        .toZonalTimestamp(ZonalOffset.UTC).toDate();
        events.add(event); // we don't assume any negative leap seconds here, for simplicity
    }
} finally {
    br.close();
}

// now let's write the result into time4j-format
// use a location relative to the class path of the main program (see below)
String path = "C:/work/leapseconds.txt";
Writer writer = new FileWriter(new File(path));
String sep = System.getProperty("line.separator");

try {
    for (PlainDate event : events) {
        writer.write(event + ", +" + sep);
    }
    writer.write("#expires=" + expires + sep);
} finally {
    writer.close();
}

System.out.println(
        "Leap second file was successfully written from IETF-resource.");

// And finally, we can start the main program in a separate process
// with the system property "net.time4j.scale.leapseconds.path"
// set to our leapsecond file path (must be relative to class path)
Some notes:
I recommend writing this code as a subprogram called by a simple batch program, to avoid making the main program dependent on internet connectivity. The batch file would finally call the main program with the mentioned system property. If you set this property, the leap seconds will be read from the file specified there, and any tzdata-module that happens to be available would then stop yielding any concurrent leap-second information.
The reason I need this is to be able to accurately take timestamps over a long period of time (on the order of years) and still be able to compare them, without worrying about leap seconds. It is possible for multiple measurements to take place during a leap second and all measurements need to be unambiguous, monotonically increasing, and linearly increasing.
Then your design is suboptimal. You cannot use wall-clock time and somehow muddle through leap seconds. This comes up often enough, and people keep falling into the same trap of timestamping measurements with the wall clock.
Timestamp start of the program and associate it with a counter
Use the counter for sequential measurements at fixed intervals
Timestamp new counter value(s) as necessary to keep your data adequately synced
If you avoid timestamping during the one second in which a leap second can occur (midnight!), you are home free, because those timestamps can be adjusted later.
Now if you insist on using TAI without a counter, all you need is a table of the leap seconds that must be accounted for. Then just use monotonic time. There are also libraries that can do this for you, but they may be out of date, so you'll have to maintain them yourself:
http://skarnet.org/software/skalibs/libstddjb/tai.html
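A minimal Java sketch of the counter-based approach described above: anchor one wall-clock reading plus a known TAI-UTC offset at startup, then advance using System.nanoTime() so leap seconds and clock adjustments cannot disturb later measurements. The class name and the offset parameter are illustrative; the offset must come from a maintained leap-second table.
public final class TaiClock {
    private final long anchorTaiMillis; // TAI at startup, in milliseconds
    private final long anchorNanos;     // monotonic counter reading at startup

    public TaiClock(long taiMinusUtcMillis) {
        // one wall-clock read at startup; never consulted again afterwards
        this.anchorTaiMillis = System.currentTimeMillis() + taiMinusUtcMillis;
        this.anchorNanos = System.nanoTime();
    }

    public long currentTaiMillis() {
        return anchorTaiMillis + (System.nanoTime() - anchorNanos) / 1_000_000;
    }
}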
You would have to implement a TAI clock based on C++ std::steady_clock or similar. To synchronize your TAI clock, you could rely on GPS or NTP.
Option TAI from NTP: your TAI implementation would need knowledge of leap seconds. The NTP protocol or the resources it references are probably the most reliable sources of current and future leap seconds.
Option TAI from GPS: the GPS clock has a fixed offset to TAI, so you do not have to mess with leap seconds.
