I have a SQL query that I would like to parse and evaluate. I have parsed the SQL using JSQLParser. Now I need to evaluate the WHERE clause in the SQL. I would like to do it in Flink as part of the filter function, basically stream.filter(Predicate<Row>). The Predicate<Row> is what I need to derive from the evaluation of the SQL's WHERE clause and apply to each streaming record.
Ex: SELECT COLUMN FROM TABLE WHERE (ac IS NOT NULL AND ac = 'On')
I would like to parse the above query and, given a streaming record with, say, ac = on, run the above expression evaluation on that record.
Any thoughts on how I can do it?
I would like to try expression evaluation with DFS, but I am a bit confused about how to go about it. Any help is appreciated!
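For reference, this is roughly the kind of recursive (DFS-style) evaluation I have in mind, sketched with JSQLParser's expression classes and a plain Map standing in for Flink's Row; only a couple of operators are handled and the class and method names on my side are illustrative only:

import java.util.Map;
import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.expression.Expression;
import net.sf.jsqlparser.expression.StringValue;
import net.sf.jsqlparser.expression.operators.conditional.AndExpression;
import net.sf.jsqlparser.expression.operators.relational.EqualsTo;
import net.sf.jsqlparser.expression.operators.relational.IsNullExpression;
import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import net.sf.jsqlparser.schema.Column;

public class WhereClauseEvaluator {

    // Walk the parsed WHERE expression depth-first and evaluate it against one record.
    static boolean evaluate(Expression expr, Map<String, Object> record) {
        if (expr instanceof AndExpression) {
            AndExpression and = (AndExpression) expr;
            return evaluate(and.getLeftExpression(), record)
                && evaluate(and.getRightExpression(), record);
        }
        if (expr instanceof IsNullExpression) {
            IsNullExpression isNull = (IsNullExpression) expr;
            Object value = resolve(isNull.getLeftExpression(), record);
            return isNull.isNot() ? value != null : value == null;
        }
        if (expr instanceof EqualsTo) {
            EqualsTo eq = (EqualsTo) expr;
            Object left = resolve(eq.getLeftExpression(), record);
            Object right = resolve(eq.getRightExpression(), record);
            return left != null && left.equals(right);
        }
        throw new UnsupportedOperationException("Unsupported expression: " + expr);
    }

    // Resolve a leaf (column reference or string literal) to a concrete value.
    static Object resolve(Expression expr, Map<String, Object> record) {
        if (expr instanceof Column) {
            return record.get(((Column) expr).getColumnName());
        }
        if (expr instanceof StringValue) {
            return ((StringValue) expr).getValue();
        }
        throw new UnsupportedOperationException("Unsupported operand: " + expr);
    }

    public static void main(String[] args) throws JSQLParserException {
        Expression where = CCJSqlParserUtil.parseCondExpression("ac IS NOT NULL AND ac = 'On'");
        System.out.println(evaluate(where, Map.of("ac", "On"))); // prints true
    }
}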
If the SQL query is known at compile time, it's more straightforward to do this by integrating Flink SQL (via the Table API) into your DataStream application. See the docs for more info and examples.
The overall approach would be to convert your DataStream into a dynamic Table (which can be done automatically if the stream is a convenient type, such as a POJO), apply the SQL query to it, and then (if necessary) convert the resulting Table back to a DataStream.
Or maybe just implement the entire application with the Table API if you don't need any of the features that are unique to DataStreams.
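A minimal sketch of that approach might look like the following (assuming a recent Flink version where StreamTableEnvironment offers createTemporaryView and toDataStream, and a simple POJO with the fields used in the WHERE clause; the names are illustrative):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class SqlFilterJob {

    // A POJO type; its public fields become the columns of the dynamic table.
    public static class Event {
        public String ac;
        public String payload;
        public Event() {}
        public Event(String ac, String payload) { this.ac = ac; this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        DataStream<Event> events = env.fromElements(new Event("On", "a"), new Event(null, "b"));

        // Register the stream as a dynamic table and apply the SQL query to it.
        tableEnv.createTemporaryView("events", events);
        Table filtered = tableEnv.sqlQuery(
            "SELECT * FROM events WHERE ac IS NOT NULL AND ac = 'On'");

        // Convert the resulting Table back to a DataStream if necessary.
        DataStream<Event> out = tableEnv.toDataStream(filtered, Event.class);
        out.print();

        env.execute("sql-filter");
    }
}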
On the other hand, if the query is dynamic and isn't provided until runtime, you'll need to pursue something like what you've proposed. FWIW, others with similar requirements have used dynamic languages with JVM-based runtimes, such as Javascript via Rhino, or Groovy. The overall approach is to use a BroadcastProcessFunction, with the dynamic code being broadcast into the operator.
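A compressed sketch of that pattern is below; it assumes the filter arrives as a script string on a broadcast side stream and that a JSR-223 engine (e.g. Groovy) is on the classpath, and the names are illustrative rather than a complete job:

import java.util.HashMap;
import java.util.Map;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.SimpleBindings;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class DynamicFilter
        extends BroadcastProcessFunction<Map<String, Object>, String, Map<String, Object>> {

    // Broadcast state holding the latest filter script under a single key.
    public static final MapStateDescriptor<String, String> FILTER_DESCRIPTOR =
        new MapStateDescriptor<>("filter", Types.STRING, Types.STRING);

    @Override
    public void processElement(Map<String, Object> record,
                               ReadOnlyContext ctx,
                               Collector<Map<String, Object>> out) throws Exception {
        String script = ctx.getBroadcastState(FILTER_DESCRIPTOR).get("current");
        if (script == null) {
            return; // no filter received yet; drop (or forward) as you prefer
        }
        // For brevity a new engine is created per element; a real job would
        // cache the engine and the compiled script, e.g. in open().
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("groovy");
        Object result = engine.eval(script, new SimpleBindings(new HashMap<>(record)));
        if (Boolean.TRUE.equals(result)) {
            out.collect(record);
        }
    }

    @Override
    public void processBroadcastElement(String script,
                                        Context ctx,
                                        Collector<Map<String, Object>> out) throws Exception {
        // Store or replace the current filter whenever a new script is broadcast.
        ctx.getBroadcastState(FILTER_DESCRIPTOR).put("current", script);
    }
}

Wiring it up would then look roughly like events.connect(scripts.broadcast(DynamicFilter.FILTER_DESCRIPTOR)).process(new DynamicFilter()).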
I have a Korma based software stack that constructs fairly complex queries against a MySQL database. I noticed that when I am querying for datetime columns, the type that I get back from the Korma query changes depending on the syntax of the SQL query being generated. I've traced this down to the level of clojure.java.jdbc/query. If the form of the query is like this:
select modified from docs order by modified desc limit 10
then I get back maps corresponding to each database row in which :modified is a java.sql.Timestamp. However, sometimes our query generator generates more complex union queries, such that we need to apply an order by ... limit ... constraint to the final result of the union. Korma does this by wrapping the query in parentheses. Even with only a single subquery--i.e., a simple parenthesized select--so long as we add an "outer" order by ..., the type of :modified changes.
(select modified from docs order by modified desc limit 10) order by modified desc
In this case, clojure.java.jdbc/query returns :modified values as strings. Some of our higher level code isn't expecting this, and gets exceptions.
We're using a fork of Korma, which is using an old (0.3.7) version of clojure.java.jdbc. I can't tell if the culprit is clojure.java.jdbc or java.jdbc or MySQL. Anyone seen this and have ideas on how to fix it?
Moving to the latest jdbc in a similar situation changed several other things for us and was a decidedly "non-trivial" task. I would suggest getting off of a Korma fork soon and then debugging this.
For us, the changes centered on what Korma returned from update calls, which changed between versions of the backing jdbc library. It was well worth getting current, even though it's a moderately painful process.
Getting current with jdbc will give you fresh new problems!
Best of luck with this :-) These things tend to be fairly specific to the DB server you are using.
Other options are to have a policy of always specifying an order-by parameter, or to build a library that coerces the strings into dates. Both of these carry some long-term technical debt.
Is the following possible for the BaseX database?
insert one or more XQuery functions into the database
call BaseX from Java, specifying which function to call, and receive a response
Or perhaps in a worse case:
Have a single file with all the XQuery functions I wish to define (this is certainly possible)
Somehow select a single function from that file and execute the query
At the moment I have a number of files, each of which contains a single XQuery function. It is a bad solution; I would like to find a more elegant one.
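One direction I have been considering, sketched below: keep all the functions in one XQuery library module and, from Java, run a tiny query that imports the module and calls just the function I want. I am assuming BaseX's embedded command API here (Context and the XQuery command), and the module URI, file name, and function name are made up, so the details may need adjusting to the BaseX version in use.

import org.basex.core.Context;
import org.basex.core.cmd.XQuery;

public class CallStoredFunction {
    public static void main(String[] args) throws Exception {
        Context context = new Context();
        try {
            // Import the library module that defines all the functions,
            // then call only the one we are interested in.
            String query =
                "import module namespace f = 'http://example.com/functions' "
              + "at 'functions.xqm'; "
              + "f:my-function('some-argument')";

            String result = new XQuery(query).execute(context);
            System.out.println(result);
        } finally {
            context.close();
        }
    }
}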
I've just tested my application under the profiler and found out that SQL strings use about 30% of my memory! This is bizarre.
There are a lot of strings like this stored in app memory. These are SQL queries generated by Hibernate; note the different numbers and trailing underscores:
select avatardata0_.Id as Id4305_0_,...... where avatardata0_.Id=? for update
select avatardata0_.Id as Id4347_0_,...... where avatardata0_.Id=? for update
Here is the part I can't understand. Why does hibernate have to generate different sql strings with different identifiers like "Id4305_0_" for each query? Why can't it use one query string for all identical queries? Is this some kind of trick to bypass query caching?
I would greatly appreciate if someone would describe me why it happening and how to avoid such resource wasting.
UPDATE
OK, I found it. I was wrong to assume a memory leak; it was my fault. Hibernate is working as intended.
My app created 121(!) SessionFactories in 10 threads; they produced about 2300 instances of SingleTableEntityPersister. Each SingleTableEntityPersister generates about 15 SQL queries with different identifiers, so Hibernate was forced to generate about 345.000 different SQL queries. Everything is fine, nothing weird :)
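For anyone who runs into the same thing: the usual fix is to build the SessionFactory once and share it across threads (it is designed to be thread-safe), instead of creating one per thread or per request. A minimal sketch, with illustrative names (in Hibernate 4+ the bootstrap goes through a ServiceRegistry, so adjust to your version):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public final class HibernateUtil {

    // One SessionFactory for the whole application: expensive to build,
    // meant to be shared. Sessions, by contrast, are cheap and short-lived.
    private static final SessionFactory SESSION_FACTORY =
        new Configuration().configure().buildSessionFactory();

    private HibernateUtil() {}

    public static SessionFactory getSessionFactory() {
        return SESSION_FACTORY;
    }
}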
There is a logic behind the query string that Hibernate generates. Its primary aim is to get unique aliases for table and column names.
From your query,
select avatardata0_.Id as Id4305_0_,...... where avatardata0_.Id=?
avatardata0_ ==> avatardata is the alias of the table, and 0_ is appended to indicate it is the first table in the query. So if it were the second table (or entity) in the query, it would have been shown as avatardata1_. The same logic is used for the column aliases.
So, this way all the possible conflicts are avoided.
You are seeing these queries because you have turned on the show_sql flag in the configuration. It is intended for debugging queries. Once your application is working, you are supposed to turn it off.
Read more on the API docs here.
I am not sure about the memory consumption part, but you could repeat your tests with the above flag turned off and see if there is any improvement.
Assuming you are using SQL Server, you might want to check the parameter type declaration for '?', making sure it results in the same fixed-length declaration every time.
Dynamic-length parameters result in separate execution plans for each query, which can consume a lot of resources. What we see as the same procedure gets interpreted by SQL Server as a different query, rendering a separate execution plan.
Thus,
exec myprocedure @p1 varchar(3)='foo'
and
exec myprocedure @p1 varchar(6)='foobar'
would result in different plans, simply because the declarations of @p1 differ in size.
There is a lot to know about this behaviour. If the above applies to you, I would recommend you read up on 'parameter sniffing'.
No... you can write your own common query in Hibernate. The idea is to map to the table and fetch the records through that mapping, so the same common query is used for every database. Create a common query like this:
Example :
select t.Id as Id4305_0_,...... from t where t.Id=?
I am building an application at work and need some advice. I have a somewhat unique problem in which I need to gather data housed in an MS SQL Server and transplant it to a MySQL server every 15 minutes.
I have done this previously in C# with a DataGrid, but now I am trying to build a Java version that I can run on an Ubuntu server, and I cannot find a similar model for Java.
Just to give a little background
When I pull the data from the MS SQL Server, it always has 9 columns, but could have anywhere from 0 - 1000 rows.
Before blindly inserting into the MySQL server, I do manipulate some of the data:
I convert a time column to CST based on a STATE column
I strip some characters to prevent SQL injection
I tried using the ResultSet, but I am having issues with the "forward only result sets" rules.
What would be the best data structure to hold that information, manipulate it, and then parse it to insert later into mySQL?
This sounds like a job for PreparedStatements!
Defined here: http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html
Quick example: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
PreparedStatements allow you to batch up sets of data before pushing them into the target database. They also let you use the PreparedStatement.setString method, which handles escaping characters for you.
For the time conversion thing, I would retrieve the STATE value from the row and then retrieve the time value. Before calling PreparedStatement.setDate, convert the time to CST if necessary.
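To make that concrete, here is a rough sketch of the read-manipulate-batch-insert loop (the connection strings, table, and column names are invented, and the CST conversion is reduced to a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;

public class TransferJob {

    public static void main(String[] args) throws Exception {
        try (Connection source = DriverManager.getConnection(
                 "jdbc:sqlserver://mssql-host;databaseName=src", "user", "pass");
             Connection target = DriverManager.getConnection(
                 "jdbc:mysql://mysql-host/dst", "user", "pass");
             Statement select = source.createStatement();
             ResultSet rows = select.executeQuery(
                 "SELECT id, state, event_time, payload FROM source_table");
             PreparedStatement insert = target.prepareStatement(
                 "INSERT INTO target_table (id, state, event_time, payload) VALUES (?, ?, ?, ?)")) {

            target.setAutoCommit(false);
            while (rows.next()) {
                String state = rows.getString("state");
                Timestamp time = rows.getTimestamp("event_time");

                // Placeholder for the STATE-based CST conversion described above.
                Timestamp cstTime = convertToCst(time, state);

                insert.setLong(1, rows.getLong("id"));
                insert.setString(2, state);           // setString handles escaping
                insert.setTimestamp(3, cstTime);
                insert.setString(4, rows.getString("payload"));
                insert.addBatch();                    // batch instead of row-by-row inserts
            }
            insert.executeBatch();
            target.commit();
        }
    }

    private static Timestamp convertToCst(Timestamp time, String state) {
        // Real logic would shift the timestamp based on the state's time zone.
        return time;
    }
}

Note that a plain forward-only ResultSet is enough here, since each source row is read once and goes straight into the batch, so no intermediate data structure is strictly required.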
I don't think you would need all the overhead that an ORM tool requires.
You could consider using an ORM technology like Hibernate. This might seem a little heavyweight at first, but it means you can maintain the various table mappings for various databases with ease as well as having the power of Java's RegEx lib for any manipulation requirements.
So you'd have a Java class that represents the source table (with its Hibernate mapping) and another Java class that represents the target table and lastly a conversion utility class that does any manipulation of that data. Hibernate takes care of the CRUD SQL for you, so no need to worry about Database specific SQL (as long as you get the mapping correct).
It also lessens the SQL injection problem.
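A rough illustration of that layout (the entity, table, and column names are invented, and the same mapping could equally live in hbm.xml files):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Entity mapped to the MS SQL source table; a twin class with its own
// @Table mapping would represent the MySQL target table, and a small
// converter class would copy and adjust the fields between the two.
@Entity
@Table(name = "source_table")
public class SourceRecord {

    @Id
    private Long id;

    @Column(name = "state")
    private String state;

    @Column(name = "event_time")
    private java.sql.Timestamp eventTime;

    // getters and setters omitted for brevity
}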
We have an application where the user is allowed to enter expressions for performing calculations on the fields of a database table. The calculations allow various types of functions (math, logic, string, date, etc.), for example MAX(col1, col2, col3).
Note that these expressions can get complex by having nested functions, for example:
IF(LENGTH(StringColumn)=0, MAX(col1, col2, 32), MIN(col1, col2, col3)) > LENGTH(col2)
One way we have implemented this is with a JavaCC parser that parses the user-entered expressions and generates a tree-like data structure. The tree is then walked in Java, and SQL queries are generated for each of the functions used in the expressions. Finally, after the queries are generated for each user-entered expression, Java executes the query using a simple database call.
A major problem with this framework is that database issues have to be handled in Java. By database issues I mean database limitations or any performance optimizations. One limitation in Microsoft SQL Server is that only 10 nested CASE WHEN statements are allowed. This means that, while parsing, the Java code needs to estimate how many CASE WHENs the query string will have before it is translated.
Similarly, if there are any SQL performance optimizations to be done, handling them in Java is simply not logical.
Does anyone know about any better design approaches for this problem?
Rather than reimplement a very SQL-like language that gets translated to SQL, have your users query the database with SQL.
I would look into Hibernate and its HQL query language.
In response to the poster above, I think it would be a bad idea to let your users query the database with SQL directly, as you'd be opening yourself up to SQL injection attacks.
Some time ago I wrote a Java applet with dynamic filter routines; there I translated the SQL statements to JavaScript statements and executed them with JavaScript's exec function.
You could have a look at JPA 2.0 Criteria API or Hibernate Criteria API
JPA 2.0 provides the so-called Criteria API (http://stackoverflow.com/questions/2602757/creating-queries-using-criteria-api-jpa-2-0).
Hibernate has its own Criteria API (which predates JPA 2.0), but it is different from the JPA 2.0 Criteria API. (http://www.ibm.com/developerworks/java/library/j-typesafejpa/)
The aim of both Criteria APIs is to provide a way to create SQL queries at runtime in a more pleasant way than concatenating strings. (http://docs.jboss.org/hibernate/core/3.3/reference/en/html/querycriteria.html)
(The JPA 2.0 Criteria API has an extra feature: it provides some code generation that makes it possible to write queries in a compile-time-safe way. (http://docs.jboss.org/hibernate/core/3.3/reference/en/html/querycriteria.html))
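A small illustration of the Criteria style, using the JPA 2.0 API (the entity and its fields are made up; the point is that the WHERE clause is assembled from objects at runtime instead of by string concatenation):

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;

@Entity
class MyEntity {
    @Id Long id;
    String col1;
    Integer col2;
}

public class DynamicQueryExample {

    // Build the WHERE clause at runtime, e.g. driven by what the user entered.
    public static List<MyEntity> findMatching(EntityManager em, boolean includeCol2Check) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<MyEntity> query = cb.createQuery(MyEntity.class);
        Root<MyEntity> root = query.from(MyEntity.class);

        Predicate condition = cb.isNotNull(root.get("col1"));
        if (includeCol2Check) {
            // Predicates compose, so nesting and optional conditions are easy.
            condition = cb.and(condition, cb.gt(root.<Integer>get("col2"), 0));
        }
        query.select(root).where(condition);
        return em.createQuery(query).getResultList();
    }
}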
Another approach I could think of was to look for language recognizers supported by the database (which is Oracle in my case). Similar to what we currently use in Java (i.e., JavaCC), if a comparable framework is supported by the database, then the intermediate string could be parsed and translated into an SQL query there.
The intermediate string I refer to here is similar to the user-entered string but may not be exactly the same (e.g., column names could be transformed to actual physical column names).
Any thoughts (pros and cons) about this approach? Also any suggestions on language recognizers in Oracle would be highly appreciated.
Thank you.