I am trying to run a windowed aggregation query on a data stream that contains over 40 attributes in Flink. The stream's schema contains an epoch timestamp which I want to use for the WatermarkStrategy so I can actually define tumbling windows over it.
I know from the docs that you can define a timestamp using the SQL API in a CREATE TABLE query by first applying TO_TIMESTAMP_LTZ to the epoch values to convert them to a proper timestamp, which can then be used in the following WATERMARK FOR statement. Since I have a really huge schema though, I do NOT want to deserialize and provide the schema by writing out the complete CREATE TABLE statement containing all columns, BUT by using a custom class generated (with protoc) from the proto file that contains the schema. As far as I know, this is only possible by providing a deserializer for the KafkaSourceBuilder and calling the returns function of the stream with the generated class. Which means that I have to define the table using the Stream API.
Inspired by the answer to this question, I do it like this:
WatermarkStrategy watermarkStrategy = WatermarkStrategy
.<Row>forBoundedOutOfOrderness(Duration.ofSeconds(10))
.withTimestampAssigner( (event, ts) -> (Long) event.getField("ts"));
tableEnv.createTemporaryView(
"bidevents",
stream
.returns(BiddingEvent.BidEvent.class)
.map(e -> Row.of(
e.getTracking().getCampaign().getId(),
e.getTracking().getAuction().getId(),
Timestamp.from(Instant.ofEpochSecond(e.getTimestamp().getMilliseconds() / 1000))
)
)
.returns(Types.ROW_NAMED(new String[] {"campaign_id", "auction_id", "ts"}, Types.STRING, Types.STRING, Types.SQL_TIMESTAMP))
.assignTimestampsAndWatermarks(watermarkStrategy)
);
tableEnv.executeSql("DESCRIBE bidevents").print();
Table resultTable = tableEnv.sqlQuery("" +
"SELECT " +
" TUMBLE_START(ts, INTERVAL '1' DAY) AS window_start, " +
" TUMBLE_END(ts, INTERVAL '1' DAY) AS window_end, " +
" campaign_id, " +
" count(distinct auction_id) auctions " +
"FROM bidevents " +
"GROUP BY TUMBLE(ts, INTERVAL '1' DAY), campaign_id");
DataStream<Row> resultStream = tableEnv.toDataStream(resultTable);
resultStream.print();
env.execute();
I get this error:
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Window aggregate can only be defined over a time attribute column, but TIMESTAMP(9) encountered.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.1.jar:1.15.1]
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.1.jar:1.15.1]
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.1.jar:1.15.1]
at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.1.jar:1.15.1]
This seems kind of logical, since in line 3 I cast a java.sql.Timestamp to a Long value, which it is not (although the stack trace does not indicate that an error occurred during the cast). But when I do not convert the epoch (a Long) to a Timestamp during the map statement, I get this exception:
"Cannot apply '$TUMBLE' to arguments of type '$TUMBLE(<BIGINT>, <INTERVAL DAY>)'"
How can I assign the watermark AFTER the map-statement and use the column in the later SQL Query to create a tumbling window?
===== UPDATE =====
Thanks to a comment from David, I understand that I need the column to be of type TIMESTAMP(p) with precision p <= 3. To my understanding this means that my timestamp may not be more precise than full milliseconds. So I tried different ways to create Java timestamps (java.sql.Timestamp and java.time.LocalDateTime) which correspond to the Flink timestamps.
Some examples are:
1. Trying to convert epochs into a LocalDateTime by setting nanoseconds (the 2nd parameter of ofEpochSecond) to 0:
LocalDateTime.ofEpochSecond(e.getTimestamp().getMilliseconds() / 1000, 0, ZoneOffset.UTC )
2. After reading the answer from Svend in this question, who uses LocalDateTime.parse on timestamps that look like "2021-11-16T08:19:30.123", I tried this:
LocalDateTime.parse(
DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").format(
LocalDateTime.ofInstant(
Instant.ofEpochSecond(e.getTimestamp().getMilliseconds() / 1000),
ZoneId.systemDefault()
)
)
)
As you can see, the timestamps even have only seconds granularity (which I checked by looking at the printed output of the stream I created), which I assume should mean they have a precision of 0. But when I use this stream to define a table/view, it once again has the type TIMESTAMP(9).
3. I also tried it with SQL timestamps:
new Timestamp(e.getTimestamp().getMilliseconds() )
This also did not change anything. I somehow always end up with a precision of 9.
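For what it's worth, a plain-JDK check (no Flink involved; the epoch value below is made up) shows that a LocalDateTime built from epoch milliseconds can never carry more than millisecond precision, so the TIMESTAMP(9) apparently comes from how Flink maps LocalDateTime by default, not from the value itself:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class PrecisionCheck {
    public static void main(String[] args) {
        long epochMillis = 1_668_585_570_123L; // hypothetical epoch timestamp in ms
        LocalDateTime ldt = LocalDateTime.ofInstant(
                Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
        // The nano-of-second field is always a whole number of milliseconds here
        System.out.println(ldt.getNano());                  // 123000000
        System.out.println(ldt.getNano() % 1_000_000 == 0); // true
    }
}
```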
Can somebody please help me how I can fix this?
OK, I found the solution to the problem. If you have a stream containing a timestamp which you want to declare as the event-time column for watermarks, you can use this function:
Table inputTable = tableEnv.fromDataStream(
stream,
Schema.newBuilder()
.column("campaign_id", "STRING")
.column("auction_id", "STRING")
.column("ts", "TIMESTAMP(3)")
.watermark("ts", "SOURCE_WATERMARK()")
.build()
);
The important part is that you can "cast" the timestamp ts from TIMESTAMP(9) down to TIMESTAMP(3) (or any other precision below 4), and that you can declare the column to carry the watermark.
Another point that seems important to me: only timestamps of type java.time.LocalDateTime actually worked for later use as watermarks for tumbling windows.
Any other attempt to influence the precision of the timestamps by constructing java.sql.Timestamp or java.time.LocalDateTime differently failed. This seemed to be the only viable way.
I came across this issue while writing a test case where I had to get a range of records between a range of timestamps –– using H2 embedded database with spring-data-jpa.
The original issue is located at: Fetching records BETWEEN two java.time.Instant instances in Spring Data Query
I have the timestamps as java.time.Instant instances.
If the user gives no start-end timestamps, I go ahead and plug in Instant.MIN and Instant.MAX respectively.
What perplexes me is that the following test-case passes:
@Test
public void test_date_min_max_instants_timestamps() {
Timestamp past = new Timestamp(Long.MIN_VALUE);
Timestamp future = new Timestamp(Long.MAX_VALUE);
Timestamp present = Timestamp.from(Instant.now());
assertTrue(present.after(past));
assertTrue(future.after(past));
assertTrue(future.after(present));
assertTrue(present.before(future));
assertTrue(past.before(present));
assertTrue(past.before(future));
}
but, the following test-case fails:
@Test
public void test_instant_ranges() throws InterruptedException {
Timestamp past = Timestamp.from(Instant.MIN);
Timestamp future = Timestamp.from(Instant.MAX);
Timestamp present = Timestamp.from(Instant.now());
assertTrue(present.after(past));
assertTrue(future.after(past));
assertTrue(future.after(present));
assertTrue(present.before(future));
assertTrue(past.before(present));
assertTrue(past.before(future));
}
Furthermore, if the past and future are not MIN/MAX values, but normal values instead, the result is as expected.
Any idea why java.sql.Timestamp behaves like this?
Also, if the time represented by the Instant is too big for a Timestamp, shouldn't it fail instead?
P.S. If this question has already been asked, could someone link the original, since I haven't been able to find it?
Edit: Added the debug information I mentioned in the comment section, so that we have everything in one place.
For the Timestamp instances made from Instant.MIN and Instant.MAX, I had the following values:
past = 169108098-07-03 21:51:43.0
future = 169104627-12-11 11:08:15.999999999
present = 2018-07-23 10:46:50.842
and for the Timestamp instances made from Long.MIN_VALUE and Long.MAX_VALUE, I got:
past = 292278994-08-16 23:12:55.192
future = 292278994-08-16 23:12:55.807
present = 2018-07-23 10:49:54.281
To clarify my question: instead of failing silently or using a different value internally, the Timestamp should fail explicitly. Currently it doesn't.
This is a known bug in the Timestamp class and its conversion from Instant. It was registered in the Java bug database in January 2015, three and a half years ago (and is still open with no decided fix version). See the link to the official bug report at the bottom.
Expected behaviour is clear
The documentation of Timestamp.from(Instant) is pretty clear about this:
Instant can store points on the time-line further in the future and
further in the past than Date. In this scenario, this method will
throw an exception.
So yes, an exception should be thrown.
It’s straightforward to reproduce the bug
On my Java 10 I have reproduced a couple of examples where the conversion silently gives an incorrect result rather than throwing an exception. One example is:
Instant i = LocalDate.of(-400_000_000, Month.JUNE, 14)
.atStartOfDay(ZoneId.of("Africa/Cairo"))
.toInstant();
Timestamp ts = Timestamp.from(i);
System.out.println("" + i + " -> " + ts + " -> " + ts.toInstant());
This prints:
-400000000-06-13T21:54:51Z -> 184554049-09-14 14:20:42.0 -> +184554049-09-14T12:20:42Z
The former conversion is very obviously wrong: a time in the far past has been converted into a time in the far (though not quite as far) future (the conversion back to Instant seems to be correct).
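Until the bug is fixed, one defensive option is to range-check the Instant before converting. toTimestampChecked below is a hypothetical helper, not a JDK method:

```java
import java.sql.Timestamp;
import java.time.Instant;

public class SafeConversion {
    // Largest/smallest epoch seconds whose millisecond value still fits in a long
    private static final long MAX_SECONDS = Long.MAX_VALUE / 1000;
    private static final long MIN_SECONDS = Long.MIN_VALUE / 1000;

    // Hypothetical guard: reject Instants that Timestamp cannot represent
    static Timestamp toTimestampChecked(Instant i) {
        long s = i.getEpochSecond();
        if (s > MAX_SECONDS || s < MIN_SECONDS) {
            throw new IllegalArgumentException("Instant out of Timestamp range: " + i);
        }
        return Timestamp.from(i); // safe: seconds * 1000 cannot overflow here
    }

    public static void main(String[] args) {
        System.out.println(toTimestampChecked(Instant.now()) != null); // true
        try {
            toTimestampChecked(Instant.MAX);
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected"); // Instant.MAX is out of range
        }
    }
}
```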
Appendix: JDK source code
For the curious here is the implementation of the conversion method:
public static Timestamp from(Instant instant) {
try {
Timestamp stamp = new Timestamp(instant.getEpochSecond() * MILLIS_PER_SECOND);
stamp.nanos = instant.getNano();
return stamp;
} catch (ArithmeticException ex) {
throw new IllegalArgumentException(ex);
}
}
It may seem that the author had expected that an arithmetic overflow in the multiplication would cause an ArithmeticException. It does not.
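A quick demonstration (the divisor 500 is just chosen so that the product overflows): plain long multiplication wraps around silently, while Math.multiplyExact does throw:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long seconds = Long.MAX_VALUE / 500; // large enough that seconds * 1000 overflows
        long wrapped = seconds * 1000;       // wraps around silently, no exception
        System.out.println(wrapped < 0);     // true: the sign even flips
        try {
            Math.multiplyExact(seconds, 1000);
        } catch (ArithmeticException ex) {
            System.out.println("overflow detected"); // this branch is taken
        }
    }
}
```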
Links
JDK bug JDK-8068958: Timestamp.from(Instant) should throw when conversion is not possible
Documentation of Timestamp.from
I want to retrieve the record whose date field value is closest to a given date. How should I proceed?
Below is the table,
id | employeeid | region | startdate  | enddate
 1 | 1234       | abc    | 2014-11-24 | 2015-01-17
 2 | 1234       | xyz    | 2015-01-18 | 9999-12-31
Here, I should retrieve the record whose enddate is closest to the startdate of another record, say '2015-01-18', so it should retrieve the 1st record. I tried the following queries:
1.
SELECT l.region
FROM ABC.location l where l.EmployeeId=1234
ORDER BY ABS( DATEDIFF('2015-01-18',l.Enddate) );
2.
SELECT l.region
FROM ABC.location l where l.EmployeeId=1234
ORDER BY ABS( DATEDIFF(l.Enddate,'2015-01-18') );
But neither of them is working. Kindly help me with this.
Thanks,
Poorna.
You might want to try this:
Query query = session.createQuery("SELECT l.region, ABS( DATEDIFF('2015-01-18',l.Enddate) ) as resultDiff FROM ABC.location l where l.EmployeeId=1234 ORDER BY resultDiff");
query.setFirstResult(0);
query.setMaxResults(1);
List result = query.list();
Well, Unix timestamps are expressed as a number of seconds since 01 Jan 1970, so if you subtract one from the other you get the difference in seconds. The difference in days is then simply a matter of dividing by the number of seconds in a day:
(date_modified - date_submitted) / (24*60*60)
or
(date_modified - date_submitted) / 86400
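In application code the same arithmetic is a one-liner; the timestamp values here are made up:

```java
public class UnixDayDiff {
    public static void main(String[] args) {
        long dateSubmitted = 1_415_000_000L;              // hypothetical Unix timestamp (seconds)
        long dateModified  = dateSubmitted + 3 * 86_400L; // three days later
        long days = (dateModified - dateSubmitted) / 86_400L;
        System.out.println(days); // prints 3
    }
}
```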
Then take the minimum of them.
Refer to this question, it may be helpful: Selecting the minimum difference between two dates in Oracle when the dates are represented as UNIX timestamps
Using Google's "electric meter" example from a few years back, we would have:
MeterID (Datastore Key) | MeterDate (Date) | ReceivedDate (Date) | Reading (double)
Presuming we received updated info (say, an out-of-calibration/busted meter, etc.) and put in a new row with the same MeterID and MeterDate, using a window function to grab the newest ReceivedDate for each ID+MeterDate pair would only cost more if there are multiple records for that pair, right?
Sadly, we are flying without a SQL expert, but it seems like the query should look like:
SELECT
meterDate,
NTH_VALUE(reading, 1) OVER (PARTITION BY meterDate ORDER BY receivedDate DESC) AS reading
FROM [BogusBQ:TableID]
WHERE meterID = {ID}
AND meterDate BETWEEN {startDate} AND {endDate}
Am I missing anything else major here? Would adding 'AND NOT IS_NAN(reading)' cause the Window Function to return the next row, or nothing? (Then we could use NaN to signify "deleted".)
Your SQL looks good. A couple of pieces of advice:
- I would use FIRST_VALUE to be a bit more explicit, but otherwise it should work.
- If you can, use NULL instead of NaN. Or better yet, add a new BOOLEAN column to mark deleted rows.
I made an Oracle package like the one below, and I will pass a String parameter like '2014-11-05'.
--SEARCH 2014 11 04
FUNCTION SEARCHMYPAGE(v_created_after IN DATE, v_created_before IN DATE)
return CURSORTYPE is rtn_cursor CURSORTYPE;
BEGIN
OPEN
rtn_cursor FOR
select
news_id
from
(
select
news_id,
news_title, news_desc,
created, news_cd
from
news
)
where
1=1
AND (created BETWEEN decode(v_created_after, '', to_date('2000-01-01', 'YYYY-MM-DD'), to_date(v_created_after, 'YYYY-MM-DD'))
AND (decode(v_created_before, '', sysdate, to_date(v_created_before, 'YYYY-MM-DD')) + 0.999999));
return rtn_cursor ;
END SEARCHMYPAGE;
I confirmed my parameter in the Eclipse console messages, since I am working in the Eclipse IDE.
I have contents which were created 2014-10-29 ~ 2014-10-31.
When I pass '2014-11-01' as created_after, it returns 0 records. (But I expected all contents, since every one of them was created between 10-29 and 10-31.)
Can you find anything wrong with my function?
Thanks :D
create function search_my_page(p_created_after in date, p_created_before in date)
return cursortype
is rtn_cursor cursortype;
begin
open rtn_cursor for
select news_id
from news
where created between
nvl(p_created_after, date '1234-01-01')
and
nvl(p_created_before, sysdate) + interval '1' day - interval '1' second;
return rtn_cursor;
end search_my_page;
/
Changes:
Re-wrote the predicates - there was a misplaced parenthesis changing the meaning.
Replaced to_date with date literals and variables. Since you're already using ANSI date format, might as well use literals. And date variables do not need to be cast to dates.
Replace DECODE with simpler NVL.
Removed extra parentheses.
Renamed v_ to p_. It's typical to use p_ to mean "parameter" and v for "(local) variable".
Removed extra inline view. Normally inline views are underused, in this case it doesn't seem to help much.
Removed unnecessary 1=1.
Replaced 0.99999 with date intervals, to make the math clearer.
Changed to lower case (this ain't COBOL), added underscores to function name.
Changed 2000-01-01 to 1234-01-01. If you use a magic value it should look unusual - don't try to hide it.
I need to count the number of days between 2 dates in JPA.
For example :
CriteriaBuilder.construct(
MyCustomBean.class
myBean.get(MyBean_.beginDate), //Expression<Date>
myBean.get(MyBean_.endDate), //Expression<Date>
myDiffExpr(myBean) //How to write this expression from the 2 Expression<Date>?
);
So far, I tried :
CriteriaBuilder.diff(), but it does not compile because this method expects some N extends Number, and Date does not extend Number.
I tried to extend the PostgreSQL82Dialect (as my target database is PostgreSQL) :
public class MyDialect extends PostgreSQL82Dialect {
public MyDialect() {
super();
registerFunction("datediff",
//In PostgreSQL, date2 - date1 returns the number of days between them.
new SQLFunctionTemplate(StandardBasicTypes.LONG, " (?2 - ?1) "));
}
}
This compiles and the request succeeds but the returned result is not consistent (78 days between today and tomorrow).
How would you do this?
It looks like you are looking for a solution with JPQL to perform queries like SELECT p FROM Period p WHERE datediff(p.to, p.from) > 10.
I'm afraid there is no such functionality in JPQL, so I recommend using native SQL. Your idea of extending the Dialect with Hibernate's SQLFunctionTemplate was very clever. I'd rather change it to use DATE_PART('day', end - start), as this is the way to get the difference in days between dates with PostgreSQL.
You might also define your function in PostgreSQL and using it with criteria function().
CREATE OR REPLACE FUNCTION datediff(TIMESTAMP, TIMESTAMP) RETURNS integer AS 'SELECT CAST(DATE_PART(''day'', $1 - $2) AS integer);' LANGUAGE sql;
cb.function("datediff", Integer.class, end, start);
JPA 2.1 provides for use of "FUNCTION(funcName, args)" in JPQL statements. That allows such handling.
I finally found that the problem comes from the fact that the order of the parameters is not the one I expected :
/*
*(?2 - ?1) is actually equivalent to (? - ?).
* Hence, when I expect it to evaluate (date2 - date1),
* it will actually be evaluated to (date1 - date2)
*/
new SQLFunctionTemplate(StandardBasicTypes.LONG, " (?2 - ?1) "));
I opened a new question in order to find out whether this behavior is a bug or a feature:
1) CriteriaBuilder.diff(). but it does not compile because this method expects some N extends Number and the Date does not extend Number.
Try to use the number of milliseconds for each date, as shown below.
Date date = new Date(); // use your required date
long milliseconds = date.getTime(); // returns the number of milliseconds since 1 Jan 1970 GMT
Long is a Number in Java, and thanks to autoboxing you can use it here. Maybe this can help.
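A minimal sketch of that idea: take getTime() on both dates and convert the millisecond difference to days (TimeUnit does the division; the dates are made up):

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class DateDiffDays {
    public static void main(String[] args) {
        Date beginDate = new Date(1_700_000_000_000L);                     // hypothetical begin date
        Date endDate   = new Date(1_700_000_000_000L + 10L * 86_400_000L); // 10 days later
        long diffMillis = endDate.getTime() - beginDate.getTime();
        long days = TimeUnit.MILLISECONDS.toDays(diffMillis);
        System.out.println(days); // prints 10
    }
}
```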