I am developing and maintaining a database-abstraction library called jOOQ, which aims to "internalise" SQL as an external DSL into Java. The goal of this endeavour is to allow for type-safely constructing and executing all possible SQL syntax elements of the most popular RDBMS. jOOQ's internal DSL is becoming more and more complex, and I'd like to get a formal hold of it. The idea is that I would like to be able to have some sort of formal definition of SQL as input, e.g.
select ::= subquery [ for-update-clause ]
subquery ::= SELECT [ { ALL | DISTINCT | UNIQUE } ] select-list
[ FROM table-reference ] ..
select-list ::= expression [ [ AS ] alias ] [, expression ... ]
expression ::= ...
alias ::= ...
table-reference ::= ...
The input could also be defined in XML or any other descriptive meta-language. Once I have that input, I'd like to generate from that input a set of Java interfaces, that model the defined syntax in Java. Example interfaces would be:
// The first "step" of query creation is modelled with this interface
interface Select0 {
// The various SELECT keywords are modelled with methods
// returning the subsequent generated syntax-element
Select1 select(Expression...);
Select1 selectAll(Expression...);
Select1 selectDistinct(Expression...);
Select1 selectUnique(Expression...);
}
// The second "step" of query creation is optional, hence it
// inherits from the third "step"
interface Select1 extends Select2 {
// Here a FROM clause may be added optionally
Select2 from(TableReference...);
}
// To keep it simple, the third "step" is the last for this example
interface Select2 extends SelectEnd {
// WHERE, CONNECT BY, PIVOT, UNPIVOT, GROUP BY, HAVING, ORDER BY, etc...
}
With the above interfaces, it will be possible to construct SQL queries in Java, like jOOQ already allows to do today:
create.select(ONE, TWO).from(TABLE)...
create.selectDistinct(ONE, TWO).from(TABLE)...
// etc...
Also, I'd like to exclude some syntax elements for some specific builds. E.g. when I build jOOQ for exclusive use with MySQL, there is no need to support for the SQL MERGE statement.
Is there any existing library implementing such a general approach in order to formally internalise and external DSL to Java? Or should I roll my own?
What you are really trying to do is to translate generic SQL into calls on your internal APIs. Seems reasonable.
To do that, you need a parser for "generic SQL", and a means to generate code from that parser. Typically you need the parser to build an abstract syntax tree, you pretty much need a symbol table (so that you know what things are table names, what are column names, and whether those column names are from table A or table B, so somewhere you need access to the SQL DDL that define the data model.... which requires you parse SQL again :).
With the AST and the symbol table, you can generate code a lot of ways, but a simple method is to walk the AST an translate constructs as you encounter them. This won't allow for building optimized queries; that requires more complex code generation, but I'd expect it to be adequate if supported by appropriate API functions you supply.
Actual code generation could be done by just printing Java text; if you go the ANTLR route you'll have to do something like this. An alternative is to literally transform the SQL code fragments (as ASTs) into Java code fragments (as ASTs). The latter scheme gives you more control (you can actually do transformations on Java code fragments if they are ASTs) and if done right, can be checked by a code generation tool.
Our DMS Software Reengineering Toolkit would be a good foundation for this. DMS provides an ecosystem for building translation tools, including robust parser machinery, symbol table support, surface-syntax pattern-directed translation rules, and has generic SQL (SQL 2011, the standard) as an available, tested parser, as well as Java, to be used on the code generation side.
Related
I have a problem.
I have the following query:
SELECT
Agents.Owner,
Orders.*
FROM
Orders
INNER JOIN Agents ON Agents.id = Orders.agentid
WHERE
Agents.botstate = 'Active' AND Orders.state = 'Active' AND(
Orders.status = 'Failed' OR Orders.status = 'Processing' AND Orders.DateTimeInProgressMicro < DATE_SUB(NOW(), INTERVAL 10 SECOND))
ORDER BY
Orders.agentid
But now I need to convert this to JOOQ language. This is what I came up with:
create.select()
.from(DSL.table("Orders"))
.join(DSL.table("Agents"))
.on(DSL.table("Agents").field("Id").eq(DSL.table("Orders").field("AgentId")))
.where(DSL.table("Agents").field("botstate").eq("Active")
.and(DSL.table("Orders").field("state").eq("Active"))
.and((DSL.table("Orders").field("status").eq("Failed"))
.or(DSL.table("Orders").field("status").eq("Processing")))).fetch().sortAsc(DSL.table("Orders").field("AgentId"));
Now the first problem is that it doesn't like all the .eq() statements, because it gives me the error:
Cannot resolve method: eq(Java.lang.String). And my second problem is that I don't know how to write this statement in JOOQ: Orders.DateTimeInProgressMicro < DATE_SUB(NOW(), INTERVAL 10 SECOND).
The first problem is caused by the fact that I can't just use:
.on(Agents.Id).eq(Orders.AgentId)
But instead I need to enter for every table:
DSL.table("table_name")
And for every column:
DSL.field("column_name")
Without that it doesn't recognize my tables and columns
How can I write the SQL in the JOOQ version correctly or an alternative solution is that I can use normal SQL statements?
Why doesn't your code work?
Table.field(String) does not construct a path expression of the form table.field. It tries to dereference a known field from Table. If Table doesn't have any known fields (e.g. in the case of using DSL.table(String), then there are no fields to dereference.
Correct plain SQL API usage
There are two types of API that allow for working with dynamic SQL fragments:
The plain SQL API to construct plain SQL fragments and templates
The Name API to construct identifiers and jOOQ types from identifiers
Most people use these only when generated code isn't possible (see below), or jOOQ is missing some support for vendor-specific functionality (e.g. some built-in function).
Here's how to write your query with each:
Plain SQL API
The advantage of this API is that you can use arbitrary SQL fragments including vendor specific function calls that are unknown to jOOQ. There's a certain risk of running into syntax errors, SQL injection (!), and simple data type problems, because jOOQ won't know the data types unless you tell jOOQ explicitly
// as always, this static import is implied:
import static org.jooq.impl.DSL.*;
And then:
create.select()
.from("orders") // or table("orders")
.join("agents") // or table("agents")
.on(field("agents.id").eq(field("orders.id")))
.where(field("agents.botstate").eq("Active"))
.and(field("orders.state").eq("Active"))
.and(field("orders.status").in("Failed", "Processing"))
.orderBy(field("orders.agentid"))
.fetch();
Sometimes it is useful to tell jOOQ about data types explicitly, e.g. when using these expressions in SELECT, or when creating bind variables:
// Use the default SQLDataType for a Java class
field("agents.id", Integer.class);
// Use an explicit SQLDataType
field("agents.id", SQLDataType.INTEGER);
Name API
This API allows for constructing identifiers (by default quoted, but you can configure that, or use unquotedName()). If the identifiers are quoted, the SQL injection risk is avoided, but then in most dialects, you need to get case sensitivity right.
create.select()
.from(table(name("orders")))
.join(table(name("agents")))
.on(field(name("agents", "id")).eq(field(name("orders", "id"))))
.where(field(name("agents", "botstate")).eq("Active"))
.and(field(name("orders", "state")).eq("Active"))
.and(field(name("orders", "status")).in("Failed", "Processing"))
.orderBy(field(name("orders", "agentid")))
.fetch();
Using the code generator
Some use cases prevent using jOOQ's code generator, e.g. when working with dynamic schemas that are only known at runtime. In all other cases, it is very strongly recommended to use the code generator. Not only will building your SQL statements with jOOQ be much easier in general, you will also not run into problems like the one you're presenting here.
Your query would read:
create.select()
.from(ORDERS)
.join(AGENTS)
.on(AGENTS.ID.eq(ORDERS.ID))
.where(AGENTS.BOTSTATE.eq("Active"))
.and(ORDERS.STATE.eq("Active"))
.and(ORDERS.STATUS.in("Failed", "Processing"))
.orderBy(ORDERS.AGENTID)
.fetch();
Benefits:
All tables and columns are type checked by your Java compiler
You can use IDE auto completion on your schema objects
You never run into SQL injection problems or syntax errors
Your code stops compiling as soon as you rename a column, or change a data type, etc.
When fetching your data, you already know the data type as well
Your bind variables are bound using the correct type without you having to specify it explicitly
Remember that both the plain SQL API and the identifier API were built for cases where the schema is not known at compile time, or schema elements need to be accessed dynamically for any other reason. They are low level APIs, to be avoided when code generation is an option.
I have a Java, GraphQL, Hibernate, PostgreSQL, QueryDSL application that queries a very large PostgreSQL table with over 275 columns.
I've created a GraphQL schema with the 25 most popular columns as query-able fields. I'd like to add a generic "field" input type that consists of a name (the db column name + "_" + operation (like gte, gt, contains, etc.) and a value (the value the user is searching for).
So when the user (in GraphiQL) enters something like (field:{name:"age_gt", value:"50"}) as a search input to the GraphQL query, I can come up with: "age > 50".
All that works fine, but when it's time to create the Predicate and add it to the whole query ( booleanBuilder.and(new Predicate) ), I cannot figure out how to create a Predicate that just contains a raw String of SQL ("age > 50").
I've created several Predicates the "right" way using my entity POJO tied to Hibernate and the jpa generated "Q" object. But I need the ability to add one or more Predicates that are just a String of SQL. I'm not even sure if the ability exists, the documentation for QueryDSL Predicates is non-existent.
I'm thinking PredicateOperation() might be the answer, but again, no documentation and I cannot find any examples online.
My apologies for not posting code, all my stuff is behind a firewall on a different network so there's no cut and paste to my internet machine.
In Hibernate its possible to inject arbitrary SQL using custom functions or the FUNCTION-function (introduced in JPA 2.1). In QueryDSL its possible to inject arbitrary JPQL/HQL through TemplateExpressions. Combined you get:
Expressions.numberTemplate("FUNCTION('SUM', {0}), x)
However, age > 50 as expression is probably valid JPQL as well, so one can just write:
Expressions.numberTemplate("SUM(age)")
Either way, its probably best to create a visitor that traverses the GraphQL query and creates the proper expression in QueryDSL, as TemplateExpressions are prone to SQL injection.
Im exposing REST services through use of Sprint MVC 4.0 framework and I try following Odata specification for the Query Parameters such as $filter, $search and $orderBy. Each of these contains expressions that I need to parse, build abstract syntax trees and validate. They are all retrieved as String.
I do not need all the constructions that are defined in the Odata grammer (http://docs.oasis-open.org/odata/odata/v4.0/cos01/abnf/odata-abnf-construction-rules.txt), I just pick the ones that are relevant for my uses cases (very few actually)
I would like some tip on how to parse and build the abstract tree in a easy way and if Odata4j might be used as a Utility library to do this job for me? I would like to avoid dragging bunch of new dependencies to odata4j, since I will only use small piece of the code.
You can certainly use odata4j for building ASTs for query parameters. I've done that for exactly the purposes you cite. I split off the query parameters, and then split again on '&' to get parameters. For each of these I inspect the parameter name ($select, $filter, etc.) and then based on that use the corresponding OptionsQueryParser static method on the value, returning an number, or list, or AST specific to that query parameter. For expression ASTs, look at PrintExpressionVisitor and use that as a pattern for writing your own visitor to walk the AST.
I'm currently evaluating JOOQ because I believe I started reinventing the wheel which looks very close to part of JOOQ :)
Now, while digging in great JOOQ documentation I've found that my use case lies somewhere between Using JOOQ as SQL Builder and Using JOOQ as SQL Builder with Code generation i.e. I would like to:
Create plain SQL strings like it is shown in Using JOOQ as SQL Builder part
Instead of using hard-coded DSL.fieldByName("BOOK","TITLE") constructs, I prefer storing name of a table along with it's column names and types like it's shown in Using JOOQ as SQL Builder with Code generation part
I prefer not to use code generation (at least not on regular basis), but rather creating TableImpl myself when new table is needed.
While digging in manual, I've found how table implementation should look like in chapter Generated tables. However, TableImpl class as well as Table interface should be parameterized with record type and the same goes for TableField class. I believe this is done for easier type inference when directly querying database and retrieving results, though I may be mistaken.
So my questions are:
Is there a guide in manual on how to create Table and TableField implementations? Or I can simply generate them once for my database schema and use generated code as a guideline?
How can I gracefully "discard" record type parameters in implemented classes? First, I thought about using java.lang.Void class as type parameter but then I noticed that only subclasses of Record are allowed... The reason is that I don't need record types at all because I plan to use generated by JOOQ SQL queries in something like Spring JdbcTemplate so mapping is done by myself.
Thanks in advance for any help!
Given your use-case, I'm not sure why you'd like to roll your own Table and TableField implementations rather than using the ones generated by jOOQ. As you stated yourself, you don't have to regenerate that code every time the DB schema changes. Many users will just generate the schema once in a while and then put the generated artefacts under version control. This will help you keep track of newly added changes.
To answer your questions:
Yes, there are some examples around the use of CustomTable. You may also find some people sharing similar experiences on the user group
Yes you can just use Record. Your minimal custom table type would then be:
class X extends TableImpl<Record> {
public X() {
super("x");
}
}
Note that you will be using jOOQ's internal API (TableImpl), which is not officially supported. While I'm positive that it'll work, it might break in the future, e.g. as the super constructor signature might change.
I am having some difficulty structuring the exact Elasticsearch query that I am looking for, specifically using the java api.
It seems like if I construct a fieldsearch using the java api, I can only use a single field and a single term. If I use a querystring, it looks like I can apply an entire query to a set of fields. What I want to do is apply a specific query to one field, and another query to a different field.
This is confusing I know. This is the type of query I would like to construct
(name contains "foo" or name contains "bar") AND ( date equals today)
I am really loving Elasticsearch for it's speed and flexibility, but the docs on http://www.elasticsearch.org/ are kind of tough to parse (I noticed "introduction" and "concepts" have no links, but the API section does) If anyone has some good resources on mastering these queries, I'd love to see them. Thanks!
Sounds like a bool query with 2 must clause:
matchQuery("name", "foo bar")
rangeQuery("date").from("2013-02-05").to("2013-02-06")
Does it help?