Can't get results from Flink SQL query - Java

I'm facing a problem where I don't get any results from my query in Flink SQL.
I have some information stored in two Kafka topics; I want to register them as two tables and perform a join between them in a streaming fashion.
These are my Flink instructions:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
// configure Kafka consumer
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // Broker default host:port
props.setProperty("group.id", "flink-consumer"); // Consumer group ID
FlinkKafkaConsumer011<Blocks> flinkBlocksConsumer = new FlinkKafkaConsumer011<>(args[0], new BlocksSchema(), props);
flinkBlocksConsumer.setStartFromEarliest();
FlinkKafkaConsumer011<Transactions> flinkTransactionsConsumer = new FlinkKafkaConsumer011<>(args[1], new TransactionsSchema(), props);
flinkTransactionsConsumer.setStartFromEarliest();
DataStream<Blocks> blocks = env.addSource(flinkBlocksConsumer);
DataStream<Transactions> transactions = env.addSource(flinkTransactionsConsumer);
tableEnv.registerDataStream("blocksTable", blocks);
tableEnv.registerDataStream("transactionsTable", transactions);
Here is my SQL query:
Table sqlResult
= tableEnv.sqlQuery(
"SELECT block_timestamp,count(tx_hash) " +
"FROM blocksTable " +
"JOIN transactionsTable " +
"ON blocksTable.block_hash=transactionsTable.tx_hash " +
"GROUP BY blocksTable.block_timestamp");
DataStream<Test> resultStream = tableEnv
.toRetractStream(sqlResult,Row.class)
.map(t -> {
Row r = t.f1;
String field2 = r.getField(0).toString();
long count = Long.valueOf(r.getField(1).toString());
return new Test(field2,count);
})
.returns(Test.class);
Then I print the results:
resultStream.print();
But I don't get any results; my program just hangs.
For serialization and deserialization, here is the schema class for my Test type, which stores the result of my query (two fields, a String and a long, for the block_timestamp and the count respectively):
public class TestSchema implements DeserializationSchema<Test>, SerializationSchema<Test> {
@Override
public Test deserialize(byte[] message) throws IOException {
return Test.fromString(new String(message));
}
@Override
public boolean isEndOfStream(Test nextElement) {
return false;
}
@Override
public byte[] serialize(Test element) {
return element.toString().getBytes();
}
@Override
public TypeInformation<Test> getProducedType() {
return TypeInformation.of(Test.class);
}
}
The same principle applies to the BlocksSchema and TransactionsSchema classes.
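For reference, here is a minimal sketch of what the Test POJO could look like (the field names and the comma-separated string format are assumptions on my part, matching the query output above):
public class Test {
    public String blockTimestamp; // block_timestamp from the query
    public long count;            // count(tx_hash) from the query

    public Test() {}              // Flink POJOs need a public no-arg constructor

    public Test(String blockTimestamp, long count) {
        this.blockTimestamp = blockTimestamp;
        this.count = count;
    }

    public static Test fromString(String s) {
        // Assumed serialized form: "<block_timestamp>,<count>"
        String[] parts = s.split(",");
        return new Test(parts[0], Long.parseLong(parts[1]));
    }

    @Override
    public String toString() {
        return blockTimestamp + "," + count;
    }
}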
Do you know why I can't get the result of my query? Should I test with a batch ExecutionEnvironment instead?

Related

How do I create a very simple rule using Apache Calcite and use it on Apache Flink?

I have this application in Flink which uses the Table API to print data from a source. The official documentation of Flink says that the Table API uses Calcite at its core to translate and optimize query plans. It doesn't describe this in much depth, so I went to the source code and tried to copy some of the code from there. But, as far as I saw, they use Calcite rules as well.
What if I want to implement my own rule? Is it possible? How do I implement a simple rule in Calcite to change the parameter of a filter, for example?
Here is my code:
public class HelloWorldCalcitePlanTableAPI {
private static final Logger logger = LoggerFactory.getLogger(HelloWorldCalcitePlanTableAPI.class);
private static final String TICKETS_STATION_01_PLATFORM_01 = "TicketsStation01Plat01";
public static void main(String[] args) throws Exception {
new HelloWorldCalcitePlanTableAPI("127.0.0.1", "127.0.0.1");
}
public HelloWorldCalcitePlanTableAPI(String ipAddressSource01, String ipAddressSink) throws Exception {
// Start streaming from fake data source sensors
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, tableConfig);
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
// Calcite configuration file to change the query execution plan
// CalciteConfig cc = tableEnv.getConfig().getCalciteConfig();
CalciteConfig cc = new CalciteConfigBuilder()
.addNormRuleSet(RuleSets.ofList(MyFilterReduceExpressionRule.FILTER_INSTANCE))
.replaceDecoRuleSet(RuleSets.ofList(MyDataStreamRule.INSTANCE))
.build();
tableEnv.getConfig().setCalciteConfig(cc);
// obtain query configuration from TableEnvironment
StreamQueryConfig qConfig = tableEnv.queryConfig();
qConfig.withIdleStateRetentionTime(Time.minutes(30), Time.hours(2));
// Register Data Source Stream tables in the table environment
tableEnv.registerTableSource(TICKETS_STATION_01_PLATFORM_01,
new MqttSensorTableSource(ipAddressSource01, TOPIC_STATION_01_PLAT_01_TICKETS));
Table result = tableEnv.scan(TICKETS_STATION_01_PLATFORM_01)
.filter(VALUE + " >= 50 && " + VALUE + " <= 100 && " + VALUE + " >= 50")
;
tableEnv.toAppendStream(result, Row.class).print();
System.out.println("Execution plan ........................ ");
System.out.println(env.getExecutionPlan());
System.out.println("Plan explaination ........................ ");
System.out.println(tableEnv.explain(result));
System.out.println("........................ ");
System.out.println("NormRuleSet: " + cc.getNormRuleSet().isDefined());
System.out.println("LogicalOptRuleSet: " + cc.getLogicalOptRuleSet().isDefined());
System.out.println("PhysicalOptRuleSet: " + cc.getPhysicalOptRuleSet().isDefined());
System.out.println("DecoRuleSet: " + cc.getDecoRuleSet().isDefined());
// @formatter:on
env.execute("HelloWorldCalcitePlanTableAPI");
}
}
public class MyDataStreamRule extends RelOptRule {
public static final MyDataStreamRule INSTANCE = new MyDataStreamRule(operand(DataStreamRel.class, none()), "MyDataStreamRule");
public MyDataStreamRule(RelOptRuleOperand operand, String description) {
super(operand, "MyDataStreamRule:" + description);
}
public MyDataStreamRule(RelBuilderFactory relBuilderFactory) {
super(operand(DataStreamRel.class, any()), relBuilderFactory, null);
}
public void onMatch(RelOptRuleCall call) {
DataStreamRel dataStreamRel = (DataStreamRel) call.rel(0);
System.out.println("======================= MyDataStreamRule.onMatch ====================");
}
}
public class MyFilterReduceExpressionRule extends RelOptRule {
public static final MyFilterReduceExpressionRule FILTER_INSTANCE = new MyFilterReduceExpressionRule(
operand(LogicalFilter.class, none()), "MyFilterReduceExpressionRule");
public MyFilterReduceExpressionRule(RelOptRuleOperand operand, String description) {
super(operand, "MyFilterReduceExpressionRule:" + description);
}
public MyFilterReduceExpressionRule(RelBuilderFactory relBuilderFactory) {
super(operand(LogicalFilter.class, any()), relBuilderFactory, null);
}
public MyFilterReduceExpressionRule(RelOptRuleOperand operand) {
super(operand);
}
@Override
public void onMatch(RelOptRuleCall arg0) {
System.out.println("======================= MyFilterReduceExpressionRule.onMatch ====================");
}
}
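For example, here is a minimal sketch of an onMatch that actually rewrites the filter condition (this part is an assumption on my side, using Calcite's LogicalFilter and RexBuilder APIs; the TRUE literal is only an illustrative replacement condition):
@Override
public void onMatch(RelOptRuleCall call) {
    LogicalFilter filter = call.rel(0);
    RexBuilder rexBuilder = filter.getCluster().getRexBuilder();
    // Illustrative rewrite: replace the original predicate with TRUE,
    // effectively dropping the filter condition.
    RexNode alwaysTrue = rexBuilder.makeLiteral(true);
    call.transformTo(LogicalFilter.create(filter.getInput(), alwaysTrue));
}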

How to call a method that returns a list in another method and delete items in this list using iterator

Hey guys, I'm new to this and I would appreciate any help.
I want to call getTenantList() from my save function below and clear the list using an iterator before doing my save. Below is the code in my controller:
package controllers;
public class TenantController extends AppController {
Tenant tenant;
FacilityUnit unit;
// list tenants in selected facility
public Result listTenant() {
return ok(Json.toJson(getTenantList()));
}
private List<Tenant> getTenantList() {
List<Tenant> tenants = Tenant.find
.fetch("unit.facility")
.where().eq("unit.facility", currentFacility())
.findList();
return tenants;
}
public Result saveTenant() {
JsonNode submissionNode = request().body().asJson();
JsonNode itemsArray = submissionNode.get("items");
//clear tenant
// create the new tenant
if (itemsArray.isArray()) {
for (JsonNode itemNode : itemsArray) {
JsonNode tenantNode = itemNode.get("tenant");
String tenantId = tenantNode.get("id").asText();
JsonNode unitNode = itemNode.get("unit");
String unitId = unitNode.get("id").asText();
System.out.println("##### Tenant ID IS " + tenantId);
System.out.println("##### unit ID IS " + unitId);
// Tenant.find.where().eq("tenant.id",
// tenant.getTenant().getId()).eq("unit.id", unit.getId()
// ).delete();
// Util.isNotEmpty() &&
if (Util.isNotEmpty(tenantId) && Util.isNotEmpty(unitId)) {
// these two are the minimal criteria for a tenant
Tenant tenant = new Tenant();
tenant.setTenant(Person.find.byId(tenantId));
tenant.setUnit(FacilityUnit.find.byId(unitId));
tenant.save();
System.out.println("##### SAVED A TENANT");
}
}
}
System.out.println("##### DONE");
return ok(infoMessage("Update of " + tenant.getTenant() + " successful"));
}
Iterate over the Tenants returned by getTenantList() and call delete() on each of them to clear them.
For example:
private List<Tenant> getTenantList() {
List<Tenant> tenants = Tenant.find
.fetch("unit.facility")
.where().eq("unit.facility", currentFacility())
.findList();
return tenants;
}
public Result saveTenant() {
// Do something...
// ...
// Get the tenants list that we want to clear before saving.
// Best way to do this is loop over it and call delete.
for (Tenant tenant : getTenantList()) {
tenant.delete();
}
// Do some more things...
// ...
}
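If you specifically want to clear the list with an explicit Iterator, as the question title suggests, a minimal equivalent sketch (assuming the same Tenant model) is:
Iterator<Tenant> it = getTenantList().iterator();
while (it.hasNext()) {
    Tenant tenant = it.next();
    tenant.delete(); // remove the row from the database
    it.remove();     // remove the entry from the in-memory list
}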

Spring JDBC & MySQL - Getting EmptyResultDataAccessException when using queryForObject() but data exists

My code is:
public AppMetaDBO getAppMetaRecord() throws SQLException {
JdbcOperations jdbcOperations = new JdbcTemplate(dataSource);
String query = "SELECT * FROM " + TABLE_APP_META;
logger.config("Executing SQL query:[" + query + "]");
return jdbcOperations.queryForObject(query, new AppMetaRowMapper());
}
Where AppMetaRowMapper is:
public class AppMetaRowMapper implements org.springframework.jdbc.core.RowMapper<AppMetaDBO> {
@Override
public AppMetaDBO mapRow(ResultSet resultSet, int i) throws SQLException {
return new AppMetaDBO(resultSet.getDouble(COL_LAST_SUPPORTED_VER));
}
}
And AppMetaDBO is simply:
@Data
@AllArgsConstructor
public class AppMetaDBO implements Serializable {
private double last_supported_version;
}
This throws an EmptyResultDataAccessException even though the DB table contains data.
It is important to note that the same server is able to write to the DB with no issues, and can even read from a different table with no issues.
The code that reads without issues is very similar and works fine:
UserDBO result = null;
try {
JdbcOperations jdbcOperations = new JdbcTemplate(dataSource);
String query = "SELECT *" + " FROM " + TABLE_USERS + " WHERE " + COL_UID + "=" + quote(uid);
logger.config("Executing SQL query:[" + query + "]");
result = jdbcOperations.queryForObject(query, new UserDboRowMapper());
} catch(EmptyResultDataAccessException ignored) {}
return result;
It is also important to note that the DB server is on a remote machine; on the local machine these issues are non-existent.
Please help, I have absolutely no idea what I am doing wrong.
Thanks in advance.
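As a debugging step, one could run the same statement through query(), which returns an empty list instead of throwing, to check whether this DataSource actually sees any rows in that table (a sketch using the standard JdbcTemplate API, not code from the original post):
List<AppMetaDBO> rows = jdbcOperations.query(query, new AppMetaRowMapper());
logger.config("Rows visible through this DataSource: " + rows.size());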

Efficient Spark Cassandra Java join

I've got two tables:
my_keyspace.name with columns:
name (string) - partition key
timestamp (date) - second part of partition key
id (int) - third part of partition key
my_keyspace.data with columns:
timestamp (date) - partition key
id (int) - second part of partition key
data (string)
I'm trying to join on timestamp and id from the name table. I do this by getting all timestamps and ids associated with a given name and then retrieving the data from the data table for those entries.
It's really fast to do this in CQL. I expected Spark Cassandra to be equally fast at it, but instead it seems to be doing a full table scan. This might be because it doesn't know which fields are part of the partition/primary key, but I can't seem to find a way to tell it the mappings.
How can I make this join as efficient as it should be? Here's my code sample:
private static void notSoEfficientJoin() {
SparkConf conf = new SparkConf().setAppName("Simple Application")
.setMaster("local[*]")
.set("spark.cassandra.connection.host", "localhost")
.set("spark.driver.allowMultipleContexts", "true");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaPairRDD<DataKey, NameRow> nameIndexRDD = javaFunctions(sc).cassandraTable("my_keyspace", "name", mapRowTo(NameRow.class)).where("name = 'John'")
.keyBy(new Function<NameRow, DataKey>() {
@Override
public DataKey call(NameRow v1) throws Exception {
return new DataKey(v1.timestamp, v1.id);
}
});
JavaPairRDD<DataKey, DataRow> dataRDD = javaFunctions(sc).cassandraTable("my_keyspace", "data", mapRowTo(DataRow.class))
.keyBy(new Function<DataRow, DataKey>() {
@Override
public DataKey call(DataRow v1) throws Exception {
return new DataKey(v1.timestamp, v1.id);
}
});
JavaRDD<String> cassandraRowsRDD = nameIndexRDD.join(dataRDD)
.map(new Function<Tuple2<DataKey, Tuple2<NameRow, DataRow>>, String>() {
@Override
public String call(Tuple2<DataKey, Tuple2<NameRow, DataRow>> v1) throws Exception {
NameRow nameRow = v1._2()._1();
DataRow dataRow = v1._2()._2();
return nameRow + " " + dataRow;
}
});
List<String> collect = cassandraRowsRDD.collect();
}
The way to do this join more efficiently is to actually invoke joinWithCassandraTable. This can be done by wrapping the results with another javaFunctions call:
private static void moreEfficientJoin() {
SparkConf conf = new SparkConf().setAppName("Simple Application")
.setMaster("local[*]")
.set("spark.cassandra.connection.host", "localhost")
.set("spark.driver.allowMultipleContexts", "true");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<DataKey> nameIndexRDD = sc.parallelize(javaFunctions(sc).cassandraTable("my_keyspace", "name", mapRowTo(DataKey.class))
.where("name = 'John'")
.collect());
JavaRDD<Data> dataRDD = javaFunctions(nameIndexRDD).joinWithCassandraTable("my_keyspace", "data", allColumns, someColumns("timestamp", "id"), mapRowTo(Data.class), mapToRow(DataKey.class))
.map(new Function<Tuple2<DataKey, Data>, Data>() {
@Override
public Data call(Tuple2<DataKey, Data> v1) throws Exception {
return v1._2();
}
});
List<Data> data = dataRDD.collect();
}
The important thing is to wrap a JavaRDD with javaFunctions. That also makes it possible to avoid the collect() and sc.parallelize() calls on nameIndexRDD, as sketched below.
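A minimal sketch of that variant (assuming the same DataKey and Data mapping classes as above), with the name-table RDD wrapped directly instead of being collected and re-parallelized:
JavaRDD<DataKey> nameIndexRDD = javaFunctions(sc)
        .cassandraTable("my_keyspace", "name", mapRowTo(DataKey.class))
        .where("name = 'John'");
JavaRDD<Data> dataRDD = javaFunctions(nameIndexRDD)
        .joinWithCassandraTable("my_keyspace", "data", allColumns,
                someColumns("timestamp", "id"), mapRowTo(Data.class), mapToRow(DataKey.class))
        .map(new Function<Tuple2<DataKey, Data>, Data>() {
            @Override
            public Data call(Tuple2<DataKey, Data> v1) throws Exception {
                // Keep only the Data side of each joined pair, as in the example above.
                return v1._2();
            }
        });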

Can not fix "Unknown table" exception from JOOQ query

I am having trouble getting data from a database that I know exists and whose format I know.
In the code snippet below, the "if (conn != null)" block is just a test to verify that the database name, table name, etc. are all correct, and they DO verify.
The last line below is what generates the exception.
public static HashMap<Integer, String> getNetworkMapFromRemote(DSLContext dslRemote, Connection conn, Logger logger) {
HashMap<Integer,String> remoteMap = new HashMap<Integer, String>();
// conn is only used for test purposes
if (conn != null) {
// test to be sure database is ok
try
{
ResultSet rs = conn.createStatement().executeQuery("SELECT networkid, name FROM network");
while (rs.next()) {
System.out.println("TEST: nwid " + rs.getString(1) + " name " + rs.getString(2));
}
rs.close();
}
catch ( SQLException se )
{
logger.trace("getNetworksForDevices SqlException: " + se.toString());
}
}
// ----------- JOOQ problem section ------------------------
Network nR = Network.NETWORK.as("network");
// THE FOLLOWING LINE GENERATES THE UNKNOWN TABLE
Result<Record2<Integer, String>> result = dslRemote.select( nR.NETWORKID, nR.NAME ).fetch();
This is the output:
TEST: nwid 1 name Network 1
org.jooq.exception.DataAccessException: SQL [select `network`.`NetworkId`, `network`.`Name` from dual]; Unknown table 'network' in field list
at org.jooq.impl.Utils.translate(Utils.java:1288)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:495)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:327)
at org.jooq.impl.AbstractResultQuery.fetch(AbstractResultQuery.java:330)
at org.jooq.impl.SelectImpl.fetch(SelectImpl.java:2256)
at com.nvi.kpiserver.remote.KpiCollectorUtil.getNetworkMapFromRemote(KpiCollectorUtil.java:328)
at com.nvi.kpiserver.remote.KpiCollectorUtilTest.testUpdateKpiNetworksForRemoteIntravue(KpiCollectorUtilTest.java:61)
.................
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown table 'network' in field list
.................
For the sake of completeness, here is part of the jOOQ-generated class file for Network:
package com.wbcnvi.intravue.generated.tables;
@javax.annotation.Generated(value = { "http://www.jooq.org", "3.3.1" },
comments = "This class is generated by jOOQ")
@java.lang.SuppressWarnings({ "all", "unchecked", "rawtypes" })
public class Network extends org.jooq.impl.TableImpl<com.wbcnvi.intravue.generated.tables.records.NetworkRecord> {
private static final long serialVersionUID = 1729023198;
public static final com.wbcnvi.intravue.generated.tables.Network NETWORK = new com.wbcnvi.intravue.generated.tables.Network();
@Override
public java.lang.Class<com.wbcnvi.intravue.generated.tables.records.NetworkRecord> getRecordType() {
return com.wbcnvi.intravue.generated.tables.records.NetworkRecord.class;
}
public final org.jooq.TableField<com.wbcnvi.intravue.generated.tables.records.NetworkRecord, java.lang.Integer> NWID = createField("NwId", org.jooq.impl.SQLDataType.INTEGER.nullable(false), this, "");
public final org.jooq.TableField<com.wbcnvi.intravue.generated.tables.records.NetworkRecord, java.lang.Integer> NETWORKID = createField("NetworkId", org.jooq.impl.SQLDataType.INTEGER.nullable(false).defaulted(true), this, "");
public final org.jooq.TableField<com.wbcnvi.intravue.generated.tables.records.NetworkRecord, java.lang.String> NAME = createField("Name", org.jooq.impl.SQLDataType.CHAR.length(40).nullable(false).defaulted(true), this, "");
public final org.jooq.TableField<com.wbcnvi.intravue.generated.tables.records.NetworkRecord, java.lang.Integer> USECOUNT = createField("UseCount", org.jooq.impl.SQLDataType.INTEGER.nullable(false).defaulted(true), this, "");
public final org.jooq.TableField<com.wbcnvi.intravue.generated.tables.records.NetworkRecord, java.lang.Integer> NETGROUP = createField("NetGroup", org.jooq.impl.SQLDataType.INTEGER.nullable(false).defaulted(true), this, "");
public final org.jooq.TableField<com.wbcnvi.intravue.generated.tables.records.NetworkRecord, java.lang.String> AGENT = createField("Agent", org.jooq.impl.SQLDataType.CHAR.length(16), this, "");
public Network() {
this("network", null);
}
public Network(java.lang.String alias) {
this(alias, com.wbcnvi.intravue.generated.tables.Network.NETWORK);
}
..........
Based on the "unknown table" exception I thought there was a problem with connecting to the wrong database or the wrong server, but the console output is correct for the JDBC query.
Any thoughts are appreciated; perhaps something else is the root cause, or the DSLContext is not valid (but I would think that would generate a different exception).
The answer ends up being simple: I did not include the .from() method.
Result<Record2<Integer, String>> result = dslRemote.select( nR.NETWORKID, nR.NAME )
.from(nR)
.fetch();
That is why the table was unknown: without the from() clause, jOOQ rendered the query against dual (as shown in the exception message), so MySQL could not resolve the network table in the field list.
