Apache Flink 1.5.2: Rowtime timestamp is null - Java

I am running some queries with the following code:
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<Row> ds = SourceHelp.builder().env(env).consumer010(MyKafka.builder().build().kafkaWithWaterMark2())
.rowTypeInfo(MyRowType.builder().build().typeInfo())
.build().source4();
//,proctime.proctime,rowtime.rowtime
String sql1 = "select a,b,max(rowtime)as rowtime from user_device group by a,b";
DataStream<Row> ds2 = TableHelp.builder().tableEnv(tableEnv).tableName("user_device").fields("a,b,rowtime.rowtime")
.rowTypeInfo(MyRowType.builder().build().typeInfo13())
.sql(sql1).in(ds).build().result();
ds2.print();
// String sql2 = "select a,count(b) as b from user_device2 group by a";
String sql2 = "select a,count(b) as b,HOP_END(rowtime,INTERVAL '5' SECOND,INTERVAL '30' SECOND) as c from user_device2 group by HOP(rowtime, INTERVAL '5' SECOND, INTERVAL '30' SECOND),a";
DataStream<Row> ds3 = TableHelp.builder().tableEnv(tableEnv).tableName("user_device2").fields("a,b,rowtime.rowtime")
.rowTypeInfo(MyRowType.builder().build().typeInfo14())
.sql(sql2).in(ds2).build().result();
ds3.print();
env.execute("test");
Note: for sql1 I use the max function on rowtime; it does not work, and the following exception is thrown:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.RuntimeException: Rowtime timestamp is null. Please make sure that a proper TimestampAssigner is defined and the stream environment uses the EventTime time characteristic.
    at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:625)
    at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
    at com.aicaigroup.water.WaterTest.testRowtimeWithMoreSqls5(WaterTest.java:158)
    at com.aicaigroup.water.WaterTest.main(WaterTest.java:20)
Caused by: java.lang.RuntimeException: Rowtime timestamp is null. Please make sure that a proper TimestampAssigner is defined and the stream environment uses the EventTime time characteristic.
    at DataStreamSourceConversion$24.processElement(Unknown Source)
    at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:67)
    at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:558)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:533)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:513)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:628)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:581)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:679)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:657)
    at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
    at com.aicaigroup.TableHelp$1.processElement(TableHelp.java:42)
    at com.aicaigroup.TableHelp$1.processElement(TableHelp.java:39)
    at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:558)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:533)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:513)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:679)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:657)
    at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:558)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:533)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:513)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:679)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:657)
    at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
    at org.apache.flink.table.runtime.aggregate.GroupAggProcessFunction.processElement(GroupAggProcessFunction.scala:151)
    at org.apache.flink.table.runtime.aggregate.GroupAggProcessFunction.processElement(GroupAggProcessFunction.scala:39)
    at org.apache.flink.streaming.api.operators.LegacyKeyedProcessOperator.processElement(LegacyKeyedProcessOperator.java:88)
    at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
    at java.lang.Thread.run(Thread.java:748)
2018-09-17 09:51:53.679 [Kafka 0.10 Fetcher for Source: Custom Source -> Map -> from: (a, b, rowtime) -> select: (a, b, CAST(rowtime) AS rowtime) (2/8)] INFO o.a.kafka.clients.consumer.internals.AbstractCoordinator - Discovered coordinator 172.16.11.91:9092 (id: 2147483647 rack: null) for group test.
Then I tried to simplify sql1 to "select a,b,rowtime from user_device", and that works.
So how can I fix the error? The first SQL needs to use GROUP BY, and the second SQL needs to window on rowtime with a time window. Thanks.

I started with Flink at 1.6 and ran into a similar problem to yours.
I solved it with these steps:
Use assignTimestampsAndWatermarks with the standard BoundedOutOfOrdernessTimestampExtractor implementation. Override its extractTimestamp function to pull the event-time value out of each record, and pass the allowed out-of-orderness interval to the constructor.
Append ,proctime.proctime,rowtime.rowtime to the end of the field list (I am using fromDataStream in Flink 1.6 to convert the stream into a table).
If you want to use an existing field as the rowtime, declare it directly: for example, if the data source fields are "a,clicktime,c", declare them as "a,clicktime.rowtime,c".
A sketch of these steps is shown below. I hope this helps.
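A minimal, self-contained example, assuming the Flink 1.6 Java API; MyEvent, the field names, and the window sizes are illustrative, and the fromElements source stands in for the Kafka consumer:

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RowtimeExample {

    // Illustrative event POJO; eventTime holds the event timestamp in epoch millis.
    public static class MyEvent {
        public String a;
        public String b;
        public long eventTime;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // Stand-in for the Kafka source; the watermark assignment is the important part.
        DataStream<MyEvent> events = env
                .fromElements(new MyEvent())
                .assignTimestampsAndWatermarks(
                        // allow records to arrive up to 5 seconds out of order
                        new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(5)) {
                            @Override
                            public long extractTimestamp(MyEvent element) {
                                return element.eventTime;
                            }
                        });

        // Append proctime.proctime / rowtime.rowtime when registering the stream as a table.
        tableEnv.registerDataStream("user_device", events,
                "a, b, proctime.proctime, rowtime.rowtime");

        Table result = tableEnv.sqlQuery(
                "SELECT a, COUNT(b) AS b, "
              + "HOP_END(rowtime, INTERVAL '5' SECOND, INTERVAL '30' SECOND) AS c "
              + "FROM user_device "
              + "GROUP BY HOP(rowtime, INTERVAL '5' SECOND, INTERVAL '30' SECOND), a");

        tableEnv.toAppendStream(result, Row.class).print();
        env.execute("rowtime example");
    }
}

The key point is that rowtime.rowtime only works when the records already carry event-time timestamps, which is exactly what assignTimestampsAndWatermarks provides; without it you get the "Rowtime timestamp is null" error from the question.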

Related

How can I look at the SQL query really used by myBatis?

A query created by MyBatis fails with ORA-00933, "SQL command not properly ended".
All the advice I found on the net says that there is some error in the SQL syntax. On the other hand, if I write the SQL by hand, Oracle SQL Developer accepts it as correct.
Obviously, I am writing it by hand differently than MyBatis does, and I need to check that. But how can I see the SQL query that MyBatis really sends to the DB?
The MyBatis version used: 3.0
I am much more interested in being able to find such problems myself than in getting help with every single problem. But some people here think the question must contain the code, so here it is (mapper.xml):
<sql id="pracovisteSql">
/* caution: only works up to a maximum of 3 levels in cis_pracoviste */
cis_pracoviste A
join cis_pracoviste B
on (A.stupen_rizeni in (0,1) and B.kod_nadrizeneho = A.kod_pracoviste) or
(A.stupen_rizeni = 2 and A.kod_pracoviste = B.kod_pracoviste)
join cis_pracoviste C
on (B.stupen_rizeni = 1 and C.kod_nadrizeneho = B.kod_pracoviste) or
(B.stupen_rizeni = 2 and C.kod_pracoviste = B.kod_pracoviste)
</sql>
<sql id="organizaceSql">
WITH organizace
AS (
SELECT a.kod_pracoviste as AKP, a.nazev as ANZ, a.stupen_rizeni as AST, a.kod_nadrizeneho as ANR,
b.kod_pracoviste as BKP, b.nazev as BNZ, b.stupen_rizeni as BST, b.kod_nadrizeneho as BNR,
c.kod_pracoviste as CKP, c.nazev as CNZ, c.stupen_rizeni as CST, c.kod_nadrizeneho as CNR
from
<include refid="pracovisteSql"/>
)
</sql>
<sql id="zahajeniOdDo">
(r01.dat_zahajeni between to_date(#{mesicRokOd}, 'MMYYYY') and to_date(#{mesicRokDo, 'MMYYYY'))
</sql>
<select id="getReportSR02Sql1"
parameterType="amcssz.spr.srv.main.dto.reports.ReportSR02QueryDTO"
resultType="amcssz.spr.srv.main.dto.reports.ReportSR02Sql1DTO">
<include refid="organizaceSql"/>
SELECT Count(Distinct (r01.id_r01_rizeni)) as pocetVRSP
From organizace
Left join r01_rizeni r01
on organizace.ckp = r01.kod_pracoviste and
r01.je_stornovano = 0 and
<include refid="zahajeniOdDo"/> and
r01.kod_skup_rizeni = 'VRSP' /* r01.kod_rizeni in ('VRSPUC', 'VRSPSR', 'VRSPPE', 'VRSPJI') */
Join r02_stavrizeni r02
on R01.ID_R01_RIZENI = R02.ID_R01_RIZENI and /* changed 17 Jan 2020 */
R02.JE_AKTUALNI = '1' and
R02.KOD_STAV_RIZENI != 'STR'
Join r08_ukon r08
on R01.ID_R01_RIZENI = R08.ID_R01_RIZENI and /* changed 17 Jan 2020 */
R08.KOD_UKON IN ('1','14','23','32') and
R08.JE_STORNOVAN = '0'
Join d02_obalka d02
on R08.ID_R08_UKON = D02.ID_R08_UKON and
d02.dat_doruceni IS NOT NULL
Where organizace.AKP = #{kodPracoviste} and
r01.kod_skup_rizeni is Not Null
Group by r01.kod_skup_rizeni
Order by 1;
</select>
You can configure your log4j logging level so that MyBatis logs the SQL it executes, e.g. log4j.logger.org.mybatis.example=DEBUG (using your own mapper namespace).
The full documentation can be found here.
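As a sketch, the same level can also be set programmatically with log4j 1.x; the namespace org.mybatis.example is a placeholder for your own mapper namespace, and in practice you would put the equivalent line in log4j.properties:

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class MyBatisSqlLogging {
    public static void main(String[] args) {
        // Make sure log output actually reaches the console.
        Logger.getRootLogger().addAppender(
                new ConsoleAppender(new PatternLayout("%d %-5p [%c] %m%n")));

        // DEBUG on the mapper namespace makes MyBatis log the prepared SQL
        // and the bound parameter values for every statement in that namespace.
        Logger.getLogger("org.mybatis.example").setLevel(Level.DEBUG);
    }
}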
If you use IntelliJ IDEA, try adding the mybatis-log-plugin plugin. The executed SQL will be output in the console.

Dataset API of Spark giving different result as compared to DataFrame

I am using Spark 2.1 and have one Hive table in ORC format; the following is the schema.
col_name data_type
tuid string
puid string
ts string
dt string
source string
peer string
# Partition Information
# col_name data_type
dt string
source string
peer string
# Detailed Table Information
Database: test
Owner: test
Create Time: Tue Nov 22 15:25:53 GMT 2016
Last Access Time: Thu Jan 01 00:00:00 GMT 1970
Location: hdfs://apps/hive/warehouse/nis.db/dmp_puid_tuid
Table Type: MANAGED
Table Parameters:
transient_lastDdlTime 1479828353
SORTBUCKETCOLSPREFIX TRUE
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Storage Desc Parameters:
serialization.format 1
When I apply a filter on this table using the partition columns, it works fine and reads only the specific partitions.
val puid = spark.read.table("nis.dmp_puid_tuid")
.as(Encoders.bean(classOf[DmpPuidTuid]))
.filter( """peer = "AggregateKnowledge" and dt = "20170403"""")
and this is my physical plan for this query
== Physical Plan ==
HiveTableScan [tuid#1025, puid#1026, ts#1027, dt#1022, source#1023, peer#1024], MetastoreRelation nis, dmp_puid_tuid, [isnotnull(peer#1024), isnotnull(dt#1022),
(peer#1024 = AggregateKnowledge), (dt#1022 = 20170403)]
But when I use the code below, it reads the entire table into Spark:
val puid = spark.read.table("nis.dmp_puid_tuid")
.as(Encoders.bean(classOf[DmpPuidTuid]))
.filter( tp => tp.getPeer().equals("AggregateKnowledge") && Integer.valueOf(tp.getDt()) >= 20170403)
Physical plan for the above Dataset:
== Physical Plan ==
*Filter <function1>.apply
+- HiveTableScan [tuid#1058, puid#1059, ts#1060, dt#1055, source#1056, peer#1057], MetastoreRelation nis, dmp_puid_tuid
Note: DmpPuidTuid is a Java bean class.
When you pass a Scala function to filter, you prevent the Spark optimizer from seeing which columns and predicates the filter actually uses, because the optimizer does not try to look inside the compiled code of the function; as a result it cannot push the filter down to the scan or prune partitions. If you pass a column expression instead, such as col("peer") === "AggregateKnowledge" && col("dt").cast(IntegerType) >= 20170403, the optimizer can see which columns and values are required and adjust the plan (including partition pruning) accordingly.
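For illustration, a column-expression version in the Java API (Spark 2.1 is assumed, and DmpPuidTuid is the bean class from the question):

import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

public class PartitionPruningExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("pruning")
                .enableHiveSupport()
                .getOrCreate();

        // Column expressions stay visible to Catalyst, so the partition filters
        // (peer, dt) can be pushed into the HiveTableScan instead of being applied
        // by an opaque lambda after a full scan.
        Dataset<DmpPuidTuid> puid = spark.read().table("nis.dmp_puid_tuid")
                .filter(col("peer").equalTo("AggregateKnowledge")
                        .and(col("dt").cast(DataTypes.IntegerType).geq(20170403)))
                .as(Encoders.bean(DmpPuidTuid.class));

        puid.explain();  // the plan should now show the predicates on the scan again
    }
}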

writeConcern is not being set to Acknowledged in MongoDB

private val DATABASE:String = config.getString("db.dbname")
private val SERVER:ServerAddress = {
val hostName=config.getString("db.hostname")
val port=config.getString("db.port").toInt
new ServerAddress(hostName,port)
}
val connectionMongo = MongoConnection(SERVER)
def collectionMongo(name:String) = connectionMongo(DATABASE)(name)
val result:WriteResult = collectionMongo("pgroup")
.insert(new BasicDBObject("_id",privateArtGroup.getUuid)
.append("ArtGroupStatus",privateArtGroup.artGroupStatus.toString())
.append("isNew",privateArtGroup.isNew), WriteConcern.Acknowledged)
log.info("what is the write concern " + collectionMongo(pgroup).getWriteConcern)
log.info("what is the write concern "+collectionMongo(pgroup).getWriteConcern)
I am setting the WriteConcern to Acknowledged, but it is not being applied.
The log statement prints the following, which is how I know it is not set:
What is the write concern: WriteConcern{w=0, wTimeout=null ms, fsync=null, journal=null}
Why is w=0? It should be w=1.
I am using Casbah v3.1.1.
val result:WriteResult = collectionMongo("pgroup")
.insert(new BasicDBObject("_id",privateArtGroup.getUuid)
.append("ArtGroupStatus",privateArtGroup.artGroupStatus.toString())
.append("isNew",privateArtGroup.isNew), WriteConcern.Acknowledged)
WriteConcern.Acknowledged - Write operations that use this write concern will wait for acknowledgement from the primary server before returning.
w: 1 - Requests acknowledgement that the write operation has propagated to the standalone mongod or the primary in a replica set.
The reason why w=0: the write concern is passed only to that single insert operation, not set on the collection. Once the insert has executed with the Acknowledged write concern, its job is done, while collectionMongo("pgroup").getWriteConcern still reports the collection's own default. That could be why you are getting w=0.
Still, I couldn't figure out why the reported default is w=0, given that w: 1 is in general the default write concern for MongoDB.
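As an illustration of setting the write concern on the collection itself, a sketch with the MongoDB legacy Java driver that Casbah wraps (database, host, and document values are placeholders):

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;
import com.mongodb.WriteResult;

public class WriteConcernExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("mydb");                      // legacy API that Casbah wraps
        DBCollection pgroup = db.getCollection("pgroup");

        // Set the default write concern on the collection itself, so that
        // getWriteConcern() (and writes without an explicit concern) reflect it.
        pgroup.setWriteConcern(WriteConcern.ACKNOWLEDGED);

        WriteResult result = pgroup.insert(
                new BasicDBObject("_id", "some-id").append("isNew", true),
                WriteConcern.ACKNOWLEDGED);                // per-operation concern still works too

        System.out.println(pgroup.getWriteConcern());      // now reports the Acknowledged concern
    }
}

If Casbah exposes the same collection-level setter (it delegates to DBCollection), doing this once after obtaining collectionMongo("pgroup") should make getWriteConcern report the acknowledged concern.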

How to avoid automatic truncation of GROUP_CONCAT result

I am trying to retrieve a long result set, and I find that the result data is being truncated automatically. I also noticed that the limit of the result string is 1024 characters. How can I avoid this and get the whole data?
I am accessing tables through JDBC, if that matters.
Code:
SELECT OLL.EMAIL_ID,COUNT(1) AS TOTAL_ORDERS,OLL.SHIPPING_NAME ,SUM(OFF.UNIT_PRICE*OFF.QUANTITY) AS TOTAL_GMV,
MIN(ODD.CREATION_DATE) AS FIRST_PURCHASE,MAX(ODD.CREATION_DATE) AS LAST_PURCHASE,GROUP_CONCAT(OFF.PAYMENT_MODE) AS PAY_MODES,
GROUP_CONCAT(OFLL.PRODUCT_ID)AS PRODUCT_IDS,LEFT(GROUP_CONCAT(OFLL.PRODUCT_NAME),2048) AS PRODUCTS,OLL.BRAND_NAME,GROUP_CONCAT(OFF.GCTYPE) AS GCTYPE,
OLL.SHIPPING_CITY,OLL.SHIPPING_STATE,GROUP_CONCAT(OFF.UNIT_PRICE-OFF.UNIT_SHIPPING_PRICE) AS PRODUCT_PRICE,GROUP_CONCAT(CEILING(OFF.QUANTITY*OFF.UNIT_PRICE)) AS PRODUCT_GMV,GROUP_CONCAT(OFLL.CHANNEL) AS SALES_CHANNEL
FROM (
SELECT COUNT(1), OL.ORDER_ID, OL.EMAIL_ID, OFL.CREATED_BY,OL.SELLER_CITY,OL.SHIPPING_STATE
FROM (SELECT CREATION_DATE,ORDER_ID,ORDER_ITEM_SEQ_ID FROM ORDER_DATES WHERE CREATION_DATE
BETWEEN '2015-01-30 23:55:00' AND '2015-01-30 23:59:59') OD
INNER JOIN ORDER_LOGISTICS OL ON OD.ORDER_ID = OL.ORDER_ID AND OD.ORDER_ITEM_SEQ_ID = OL.ORDER_ITEM_SEQ_ID
INNER JOIN ORDER_FINANCE OF ON OL.ORDER_ID = OF.ORDER_ID AND OL.ORDER_ITEM_SEQ_ID = OF.ORDER_ITEM_SEQ_ID
INNER JOIN ORDER_FILTERS OFL ON OL.ORDER_ID = OFL.ORDER_ID AND OL.ORDER_ITEM_SEQ_ID = OFL.ORDER_ITEM_SEQ_ID
INNER JOIN ORDER_STATUS OS ON OL.ORDER_ID = OS.ORDER_ID AND OL.ORDER_ITEM_SEQ_ID = OS.ORDER_ITEM_SEQ_ID
WHERE (OF.PAYMENT_MODE='Cash On Delivery' OR OS.PAYMENT_STATUS='Received')AND (OFL.PRODUCT_ID IN ('B4333897','B5163012','B5654542') OR OF.UNIT_PRICE-OF.UNIT_SHIPPING_PRICE='1.00') AND
OFL.CHANNEL IN ('Web Channel', 'Mobile Web Channel') AND
OL.EMAIL_ID IS NOT NULL GROUP BY OL.EMAIL_ID) AAA, ORDER_LOGISTICS OLL, ORDER_FILTERS OFLL,
ORDER_FINANCE OFF,ORDER_DATES ODD WHERE AAA.EMAIL_ID=OLL.EMAIL_ID AND
OLL.ORDER_ID=OFLL.ORDER_ID AND OLL.ORDER_ITEM_SEQ_ID=OFLL.ORDER_ITEM_SEQ_ID AND (OFLL.PRODUCT_ID IN ('B4333897','B5163012','B5654542') OR OFF.UNIT_PRICE-OFF.UNIT_SHIPPING_PRICE='1.00') AND
OFLL.ORDER_ID=ODD.ORDER_ID AND OFF.ORDER_ID=OLL.ORDER_ID AND OFF.ORDER_ITEM_SEQ_ID=OLL.ORDER_ITEM_SEQ_ID
GROUP BY OLL.EMAIL_ID
Since I am using GROUP_CONCAT(), the result is lengthy, and that is where it gets truncated.
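Assuming this is MySQL, where GROUP_CONCAT output is capped by the group_concat_max_len session variable (1024 bytes by default), here is a sketch of raising the limit for the same JDBC session before running the report query; the connection details and the new limit are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GroupConcatLimitExample {
    public static void main(String[] args) throws Exception {
        // Connection URL and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/orders", "user", "password");
             Statement st = conn.createStatement()) {

            // Raise the GROUP_CONCAT limit for this session only (value in bytes).
            st.execute("SET SESSION group_concat_max_len = 1000000");

            // Verify the new limit; the report query can then run on this same session.
            try (ResultSet rs = st.executeQuery(
                    "SHOW SESSION VARIABLES LIKE 'group_concat_max_len'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }
}

The setting is per session, so it must be issued on the same Connection that later executes the GROUP_CONCAT query (or be set globally in the server configuration).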

JDBC: DB2 query with WITH clause and two host variables does not work

I've optimized my DB2 query with a WITH clause. Now it is fast, but it no longer works under JDBC. Does anyone have an idea? Thanks!
WITH tmp AS (SELECT ID_EINSENDUNG, BEURTEILUNG, VERSANDDAT_BVD, LASTUSER
FROM BVDT.TEINSENDUNG_BVD WHERE VERSANDDAT_BVD =
'2008-02-26' --HOST VARIABLE 1
)
SELECT ID_EINSENDUNG, VERSANDDAT_BVD FROM tmp WHERE ID_EINSENDUNG >
4100 --HOST VARIABLE 2
Error Message: ERRORCODE=-4461, SQLSTATE=42815 SQLState: 42815
ErrorCode: -4461
Java Code:
public DBCursor searchViewLandwirtOhrmarke() throws Exception {
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("WITH tmp AS " +
"(SELECT ID_EINSENDUNG, BEURTEILUNG, VERSANDDAT_BVD, LASTUSER FROM BVDT.TEINSENDUNG_BVD WHERE VERSANDDAT_BVD = ?) " +
"SELECT ID_EINSENDUNG, VERSANDDAT_BVD FROM tmp WHERE ID_EINSENDUNG > ?");
prepareStatement(stringBuilder.toString());
ps.setDate(1, DateUtils.getSQLDate("26.02.2008"));
ps.setInt(2,new Integer(4100));
executeCursorSelect();
return this;
}
public EinsendungBvd nextViewLandwirtOhrmarke() throws Exception {
if (endFetch()) {
return null;
}
EinsendungBvd result = new EinsendungBvd(dbConn);
result.setId_einsendung(new Integer(rs.getInt(1)));
if (rs.wasNull()) {
result.setId_einsendung(null);
}
result.setVersanddat_bvd(rs.getDate(11));
return result;
}
Looks like a date format problem to me. Try changing your date in ps.setDate(1, DateUtils.getSQLDate("26.02.2008")); to ps.setDate(1, DateUtils.getSQLDate("2008-02-26"));. I don't know the DateUtils internals, so this is just a wild guess (see here for a list of SQLSTATE/ERRORCODE values).
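A self-contained sketch of that suggestion, binding the date in ISO form directly as a java.sql.Date so DateUtils is taken out of the picture (connection URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class WithClauseParameterTest {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/BVDT", "user", "password")) {

            String sql = "WITH tmp AS (SELECT ID_EINSENDUNG, BEURTEILUNG, VERSANDDAT_BVD, LASTUSER "
                       + "FROM BVDT.TEINSENDUNG_BVD WHERE VERSANDDAT_BVD = ?) "
                       + "SELECT ID_EINSENDUNG, VERSANDDAT_BVD FROM tmp WHERE ID_EINSENDUNG > ?";

            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                // java.sql.Date.valueOf expects the ISO form yyyy-mm-dd,
                // which removes any ambiguity about how "26.02.2008" is parsed.
                ps.setDate(1, Date.valueOf("2008-02-26"));
                ps.setInt(2, 4100);

                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getInt(1) + " " + rs.getDate(2));
                    }
                }
            }
        }
    }
}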
Looking at IBM DB2 LUW version 9.7 Error codes issued by the IBM Data Server Driver for JDBC and SQLJ for error code -4461:
-4461 text-from-getMessage 42815
Explanation: The specified value is invalid or out of range.
User response: Call SQLException.getMessage to retrieve specific information about the problem.
Things to try:
Log the value of SQLException.getMessage().
Verify the data types of VERSANDDAT_BVD and ID_EINSENDUNG.
Test the query with one parameter at a time to isolate which parameter is the offender.
Verify that DateUtils.getSQLDate() is parsing your date string correctly; compare the result of Date.getTime() on the Date from DateUtils.getSQLDate() and on the Date from SimpleDateFormat.parse(), as in the sketch below.
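A small sketch of that last check; since DateUtils.getSQLDate is not shown in the question, a SimpleDateFormat stand-in is used here for the "26.02.2008" parsing:

import java.sql.Date;
import java.text.SimpleDateFormat;

public class DateParseCheck {
    public static void main(String[] args) throws Exception {
        // Stand-in for DateUtils.getSQLDate("26.02.2008"): parse the dotted date string.
        SimpleDateFormat dottedFormat = new SimpleDateFormat("dd.MM.yyyy");
        Date fromHelper = new Date(dottedFormat.parse("26.02.2008").getTime());

        // Reference value built from the ISO form used directly in the SQL.
        Date fromIso = Date.valueOf("2008-02-26");

        // If the helper parses correctly, both millisecond values should be identical.
        System.out.println("helper: " + fromHelper.getTime());
        System.out.println("iso   : " + fromIso.getTime());
        System.out.println("equal : " + (fromHelper.getTime() == fromIso.getTime()));
    }
}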
