Parse SQL query using ANTLR parse tree to MongoDB BSON document in Java

I have a SQL-like query, for example:
Select id,name from employee where age > 30 and department = 'IT' limit 200
The SQL query grammar is defined in an ANTLR4 grammar file. Is there any implementation that converts the parse tree of this query to a BSON document?
The BSON document will then be used to query a MongoDB database.

In one of my previous jobs I did something similar: I took a query (not SQL, but pretty similar) and translated it to a Mongo query with ANTLR.
I don't have code to share, but I can share my thoughts:
Mongo is not SQL compliant, so you can't just take a SQL grammar. What about JOINs and all the relational algebra? What about aggregations, which are pretty tricky in Mongo with its aggregation framework? In the opposite direction, how do you write SQL that gets translated to an "exists" clause in Mongo? There are many things like this, some small, some huge, but the bottom line is that you must be talking about some subset of SQL: a DSL that is allowed as a query language and looks "like" SQL because people are used to SQL.
With that in mind, you should create your own grammar, and ANTLR will generate a lexer/parser for you. You also get a syntax check of the query at runtime for free: ANTLR won't be able to parse a query that is not in the correct format, because some grammar rule will fail. This is another reason not to take SQL "as is".
So far so good. Now you create your own listener / visitor. In my case I opted for creating an object representation of the query, with internal state and everything.
So the query
Select id,name
from employee
where age > 30
and department = 'IT'
limit 200
was translated to objects of these types:
class Query {
private SelectClause select;
private FromClause from;
private WhereClause where;
private Limit limit;
}
class SelectClause {
private List<String> fields;
}
...
class WhereClause {
Condition root;
}
interface Condition {
...
}
class AndCondition implements Condition { // the same for Not, Or
}
For this particular query it's something like:
Query q = new Query(new SelectClause(List.of("id", "name")), new FromClause("employee"), new WhereClause(new AndCondition(new SimpleLeafCondition("age", Operators.GT, 30), new SimpleLeafCondition("department", Operators.EQ, "IT"))), new Limit(200));
Then it's possible to make some optimizations on the query (like embedding where clauses if you need to, or, for example, manipulating the "from" part if you're working in a multi-tenant environment and have different collections for different tenants).
After that, you can apply the "interpreter" design pattern: recursively walk the query objects and "translate" them to a valid Mongo query.
I remember that this step took me something like one day (it was 7 years ago, with Mongo 2 I guess, but still), given the correct structure of objects representing the query, so it should not be that complicated. I'm bringing this up because it looks like it's your primary concern in the question.
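The recursive translation step could be sketched roughly as follows. This is only an illustration under assumptions, not the implementation described above: the names MongoFilterSketch, Condition, LeafCondition, CompositeCondition, Operator and toFilter are hypothetical, and the result is a plain Map<String, Object> shaped like a Mongo filter document, so the sketch runs without the driver on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query-object model; none of these names come from a library.
class MongoFilterSketch {

    enum Operator {
        EQ("$eq"), GT("$gt"), LT("$lt");

        final String mongoName;

        Operator(String mongoName) {
            this.mongoName = mongoName;
        }
    }

    // Interpreter pattern: every node translates itself into a filter fragment.
    interface Condition {
        Map<String, Object> toFilter();
    }

    static final class LeafCondition implements Condition {
        private final String field;
        private final Operator op;
        private final Object value;

        LeafCondition(String field, Operator op, Object value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }

        @Override
        public Map<String, Object> toFilter() {
            // Produces { field: { $op: value } }
            Map<String, Object> opDoc = new LinkedHashMap<>();
            opDoc.put(op.mongoName, value);
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put(field, opDoc);
            return doc;
        }
    }

    static final class CompositeCondition implements Condition {
        private final String mongoOperator; // "$and" or "$or"
        private final List<Condition> children;

        CompositeCondition(String mongoOperator, List<Condition> children) {
            this.mongoOperator = mongoOperator;
            this.children = children;
        }

        @Override
        public Map<String, Object> toFilter() {
            // Produces { $and: [ child, child, ... ] } by recursing into the children.
            List<Map<String, Object>> parts = new ArrayList<>();
            for (Condition child : children) {
                parts.add(child.toFilter());
            }
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put(mongoOperator, parts);
            return doc;
        }
    }

    static Condition and(Condition... conditions) {
        return new CompositeCondition("$and", List.of(conditions));
    }

    static Condition or(Condition... conditions) {
        return new CompositeCondition("$or", List.of(conditions));
    }

    public static void main(String[] args) {
        Condition where = and(
                new LeafCondition("age", Operator.GT, 30),
                new LeafCondition("department", Operator.EQ, "IT"));
        // prints {$and=[{age={$gt=30}}, {department={$eq=IT}}]}
        System.out.println(where.toFilter());
    }
}
```

With the MongoDB Java driver available, such a map could presumably be wrapped in an org.bson.Document (which has a constructor taking a Map) and passed to find. The select, from and limit parts of the query object would map to a projection, a collection name and a cursor limit rather than to the filter itself.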

Related

Optimize JPA dynamic count query

We have a typical method that returns a paginated result, using CriteriaBuilder and performing 2 queries:
one that counts the total number of results
and another one that gives us the subset for the specified page
We have noticed that JPA does not optimize the first query at all, because the generated SQL uses EXISTS (on Oracle).
Java code:
Root<Foo> from = criteriaQuery.from(Foo.class);
//... predicates
CriteriaQuery<Long> countQuery = criteriaBuilder.createQuery(Long.class)
.select(criteriaBuilder.countDistinct(from))
.where(predicates.toArray(new Predicate[predicates.size()]));
Long numberResults = entityManager.createQuery(countQuery).getSingleResult();
SQL generated query:
SELECT COUNT(t0.REFERENCE)
FROM foo t0
WHERE EXISTS (
SELECT t1.REFERENCE
FROM foo t1
WHERE ((((t0.REFERENCE = t1.REFERENCE) AND (t0.VERSION_NUM = t1.VERSION_NUM)) AND (t0.ISSUER = t1.ISSUER)) AND (t1.REFERENCE LIKE ? AND (t1.VERSION_STATUS = ?)))
);
How do I avoid the EXISTS? Is there something wrong with the Java code?

For various reasons (this issue and this related article enumerate some of them), EclipseLink uses EXISTS in its implementation of the countDistinct operation.
Although I can agree with you, be aware that the performance of EXISTS in Oracle depends heavily on the use case, and it doesn't have to be poor. Please consider reviewing this classic entry on Tom Kyte's blog.
So my advice is: keep using the generated code and the corresponding SQL.
If you need or want a different approach, a perhaps more performant way of counting the records could be to fetch the ids of the entities that match the provided predicates (the actual performance depends mostly on these predicates) and count the results in memory, with Java. I mean:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
// I assume reference is String here
CriteriaQuery<String> query = cb.createQuery(String.class);
Root<Foo> root = query.from(Foo.class);
query
.select(root.get("reference"))
.distinct(true)
.where(predicates.toArray(new Predicate[predicates.size()]))
;
List<String> references = entityManager.createQuery(query).getResultList();
int count = references.size();
Although I generally would not advise it, if the amount of data is not large you could even fetch the results once from the database and do the paging in memory with Java; it is straightforward using subList, for instance.
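As a rough illustration of paging in memory with subList, here is a minimal sketch. The class InMemoryPaging and the method page are hypothetical names, pages are assumed to be zero-based, and the list of references stands in for whatever the query returned.

```java
import java.util.ArrayList;
import java.util.List;

class InMemoryPaging {

    // Returns the items of the zero-based page, or an empty list when the page is out of range.
    static <T> List<T> page(List<T> all, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // long arithmetic avoids int overflow
        if (from >= all.size()) {
            return List.of();
        }
        int to = (int) Math.min(from + pageSize, all.size());
        // subList returns a view backed by the source list; copy it so the page is independent.
        return new ArrayList<>(all.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<String> references = List.of("A1", "A2", "A3", "A4", "A5");
        System.out.println("total = " + references.size()); // prints total = 5
        System.out.println(page(references, 1, 2));         // prints [A3, A4]
    }
}
```

The total count is simply the size of the full list, so no separate count query is needed in this variant. The cost is that every matching row is loaded, which is why it only makes sense for small result sets.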
As a final note, AFAIK other JPA providers such as Hibernate implement count in a different way: if switching the JPA provider is an option, you could try that instead.

With or without EXISTS, the query plans are identical. The only optimisation would be to return COUNT() and the result in the same query, easy to do in SQL with "OVER()". But mapping the Foo.class on a view and adding a transient column to contain the count will complicate a lot of other parts of the application, and mapping the result of paginated queries on a new CountedFoo.class will also complicate the solution.

JOOQ multiple select count in one connection with PostgreSQL

I have a table SUBSCRIPTION, I want to run multiple selectCount written with JOOQ in one connection with different predicates to the database.
To do so, I have created a list of queries:
List<Query> countQueries = channels.stream().map(c ->
selectCount().from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.equal(c))
).collect(toList());
And finally, I have launched this list of queries using batch:
using(configuration).batch(countQueries).execute();
I have expected to have the results of the above queries in the return values of execute, but I get an array of integer filled with 0 values.
Is this the right way to run multiple selectCount using JOOQ?
What is the meaning of the integer array returned by the execute method?
I have checked this link, in the JOOQ blog, talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but It's just about SQL queries, no JOOQ dialects.

Comments on your assumptions
You wrote: I have expected to have the results of the above queries in the return values of execute, but I get an array of integer filled with 0 values.
The batch() API can only be used for DML queries (INSERT, UPDATE, DELETE), just like with native JDBC. I mean, you can run the queries as a batch, but you cannot fetch the results this way.
You wrote: I have checked this link, in the JOOQ blog, talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but It's just about SQL queries, no JOOQ dialects.
Plain SQL queries almost always translate quite literally to jOOQ, so you can apply the technique from that article also in your case. In fact, you should! Running so many queries is definitely not a good idea.
Translating that linked query to jOOQ
So, let's look at how to translate that plain SQL example from the link to your case:
Record record =
ctx.select(
channels.stream()
.map(c -> count().filterWhere(CHANNEL.CODE.equal(c)).as(c))
.collect(toList())
)
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // Not strictly necessary, but might speed up things
.fetchOne();
This will produce a single record containing all the count values.
As always, this is assuming the following static import
import static org.jooq.impl.DSL.*;
Using classic GROUP BY
Of course, you can also just use a classic GROUP BY in your particular case. This might even be a bit faster:
Result<?> result =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // This time, you need to filter
.groupBy(CHANNEL.CODE)
.fetch();
This now produces a table with one count value per code. Alternatively, fetch this into a Map<String, Integer>:
Map<String, Integer> map =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels))
.groupBy(CHANNEL.CODE)
.fetchMap(CHANNEL.CODE, count());

Runtime SQL Query Builder

My question is similar to
Is there any good dynamic SQL builder library in Java?
However, there is one important point from that thread:
Querydsl and jOOQ seem to be the most popular and mature choices however there's one thing to be aware of: Both rely on the concept of code generation, where meta classes are generated for database tables and fields. This facilitates a nice, clean DSL but it faces a problem when trying to create queries for databases that are only known at runtime.
Is there any way to create the queries at runtime besides just using plain JDBC + String concatenation?
What I'm looking for is a web application that can be used to build forms to query existing databases. If something like that already exists, links to such a product would be welcome too.

While source code generation for database meta data certainly adds much value to using jOOQ, it is not a prerequisite. Many jOOQ users use jOOQ for the same use-case that you envision. This is also reflected in the jOOQ tutorials, which list using jOOQ without code generation as a perfectly valid use-case. For example:
String sql = create.select(
fieldByName("BOOK","TITLE"),
fieldByName("AUTHOR","FIRST_NAME"),
fieldByName("AUTHOR","LAST_NAME"))
.from(tableByName("BOOK"))
.join(tableByName("AUTHOR"))
.on(fieldByName("BOOK", "AUTHOR_ID").eq(
fieldByName("AUTHOR", "ID")))
.where(fieldByName("BOOK", "PUBLISHED_IN").eq(1948))
.getSQL();
In a similar fashion, bind values can be extracted from any Query using Query.getBindValues().
This approach will still beat plain JDBC + String concatenation for dynamic SQL statements, as you do not need to worry about:
Syntax correctness
Cross-database compatibility
SQL Injection
Bind variable indexing
(Disclaimer: I work for the vendor of jOOQ)

SQLBuilder (http://openhms.sourceforge.net/sqlbuilder/) has been very useful for me.
Some simple examples:
String query1 = new InsertQuery("table1")
.addCustomColumn("s01", "12")
.addCustomColumn("stolbez", 19)
.addCustomColumn("FIRSTNAME", "Alexander")
.addCustomColumn("LASTNAME", "Ivanov")
.toString();
String query2 = new UpdateQuery("table2")
.addCustomSetClause("id", 1)
.addCustomSetClause("FIRSTNAME", "Alexander")
.addCustomSetClause("LASTNAME", "Ivanov")
.toString();
Results:
INSERT INTO table1 (s01,stolbez,FIRSTNAME,LASTNAME) VALUES ('12',19,'Alexander','Ivanov')
UPDATE table2 SET id = 1,FIRSTNAME = 'Alexander',LASTNAME = 'Ivanov'

I have a custom solution that dynamically generates such SQL queries with just 2-3 classes, for a similar requirement. It is a simple approach.
It is described in Creating Dynamic SQL queries in Java.
For simpler use cases, like a dynamic filter condition based on inputs selected in the UI, one can use the simpler approach below and write the query directly in this style:
select t1.id, t1.col1, t1.col2
from table1 t1
where (:col1Value is null or t1.col1 = :col1Value)
and (:col2Value is null or t1.col2 = :col2Value);
Here the values for col1Value or col2Value can be null and the query will still work: a null parameter simply disables its condition.
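A different way to get the same effect is to append a condition only when a value was actually supplied, and to collect the bind values in placeholder order. The sketch below is a minimal illustration under assumptions: DynamicWhereSketch, BuiltQuery and build are hypothetical names, the table and columns are taken from the query above, and column names are checked against a whitelist because they are concatenated into the SQL text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DynamicWhereSketch {

    // SQL text plus bind values in the order of the "?" placeholders.
    static final class BuiltQuery {
        final String sql;
        final List<Object> bindValues;

        BuiltQuery(String sql, List<Object> bindValues) {
            this.sql = sql;
            this.bindValues = bindValues;
        }
    }

    static BuiltQuery build(Set<String> allowedColumns, Map<String, Object> filters) {
        StringBuilder sql = new StringBuilder("select t1.id, t1.col1, t1.col2 from table1 t1");
        List<Object> binds = new ArrayList<>();
        for (Map.Entry<String, Object> filter : filters.entrySet()) {
            // Column names end up in the SQL text, so only accept known ones.
            if (!allowedColumns.contains(filter.getKey())) {
                throw new IllegalArgumentException("Unknown column: " + filter.getKey());
            }
            if (filter.getValue() == null) {
                continue; // no value selected in the UI: skip this condition
            }
            sql.append(binds.isEmpty() ? " where " : " and ");
            sql.append("t1.").append(filter.getKey()).append(" = ?");
            binds.add(filter.getValue());
        }
        return new BuiltQuery(sql.toString(), binds);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order and, unlike Map.of, accepts null values.
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("col1", "abc");
        filters.put("col2", null);
        BuiltQuery query = build(Set.of("col1", "col2"), filters);
        System.out.println(query.sql);        // prints select t1.id, t1.col1, t1.col2 from table1 t1 where t1.col1 = ?
        System.out.println(query.bindValues); // prints [abc]
    }
}
```

The resulting SQL and bind values can then be handed to a PreparedStatement, setting each value at its one-based index. Compared with the ":param is null or ..." style, the generated statement only contains the conditions that are really in use.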

Sort the values by Date in mongodb

I am new to MongoDB and I am trying to sort all my records by date. I have records from mixed sources and I am trying to sort each source separately. For some records I did not set dateCreated when writing to the database; later I noticed this and started adding dateCreated to my records. Say I have a total of 4000 records: the first 1000 don't have dateCreated, and the latest 3000 have that field. I am trying to get the most recently updated records using the dateCreated field. Here is my code.
db.person.find({"source":"Naukri"}&{dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
This code returns some results (from those 1000 records) where I can't see the dateCreated field at all. Moreover, if I change the sort to {dateCreated: -1}, I get results from some other source, not Naukri.
So I need help with these cases:
How do I sort by dateCreated to get the most recently updated record, filtered by source as well?
I am using the Java API to get the records from Mongo. I'd be grateful if someone could show me how to run the same query from Java as well.
I hope my question is clear. Thanks in advance.

From the documentation you will read (and you will, won't you? Nod yes), you will find that the first argument to the find command you are using is what is called a query document. In this document you specify a list of fields and conditions, comma separated, which is the equivalent of an AND condition in a declarative syntax such as SQL.
The problem with your query is that it was not valid: in the shell, & is JavaScript's bitwise AND operator, not a way to combine conditions, so the filter you intended was never applied. The correct syntax would be as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}})
.sort({dateCreated: -1})
.limit(10)
So now this will filter by the value provided for "source" and by whether the "dateCreated" field exists, meaning the field is present in the document (note that $exists also matches fields whose value is null).
I recommend looking at the links below, the first two of which are concerned with structuring MongoDB queries and with the find method and its arguments. All of this functionality carries over to every language driver.
As for the Java API, there are different approaches depending on which you are comfortable with. The API provides a BasicDBObject class, which is more or less equivalent to the JSON document notation and works somewhat like a hash map. For something closer to the shell methods, and more like the approach of some dynamic languages, there is the QueryBuilder class; the last two links give examples and information on it. These allow chaining to make your query more readable.
There are many examples on Stack Overflow alone. I suggest you take a look.
http://docs.mongodb.org/manual/tutorial/query-documents/
http://docs.mongodb.org/manual/reference/method/db.collection.find/
How to do this MongoDB query using java?
http://api.mongodb.org/java/2.2/com/mongodb/QueryBuilder.html

Your query is not correct. Update it as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
In Java, you can do it as follows :
Mongo mongo = ...
DB db = mongo.getDB("yourDbName");
DBCollection coll = db.getCollection("person");
DBObject query = new BasicDBObject();
query.put("source", "Naukri");
query.put("dateCreated", new BasicDBObject("$exists", true));
DBCursor cur = coll.find(query).sort(new BasicDBObject("dateCreated", 1)).limit(10);
while(cur.hasNext()) {
DBObject obj = cur.next();
// Get data from the result object here
}

HibernateSearch query

I am new to Hibernate Search and I am having difficulty forming a Hibernate Search query.
I need to use an IN operator on a list of strings in the query.
Can anybody help me sort out this issue?
My current query looks like this:
String querystring="country:"+profile.getCountry()+" AND religion:"+profile.getReligion()+" AND caste:"+profile.getCaste()+" AND gender:"+profile.getGender()+"AND profession : "+professions+" AND age:["+profile.getFromage()+" TO "+profile.getToage()+"]" ;
Here, professions is a list of strings.
Regards,
Arun

There is no IN operator in the Lucene query language. You will have to expand the list yourself. An alternative to using the query parser is to build a Lucene BooleanQuery and add the different parts of your query to it, for example a RangeQuery. Effectively, the QueryParser creates these lower-level queries for you under the hood. Have a look at the Lucene API and the different subclasses of org.apache.lucene.search.Query. You still have to expand the collection yourself, though.
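Expanding the list by hand for the query parser could look roughly like this. It is a minimal sketch: LuceneInExpansion and anyOf are hypothetical names, and it relies on the classic query parser's field grouping syntax, field:(a OR b). Each value is quoted as a phrase so that values containing spaces stay together; how a quoted value matches still depends on the analyzer configured for the field.

```java
import java.util.List;
import java.util.stream.Collectors;

class LuceneInExpansion {

    // Builds field:("v1" OR "v2" ...) as a stand-in for an SQL-style IN clause.
    static String anyOf(String field, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        return values.stream()
                // Escape backslashes first, then double quotes, before wrapping in quotes.
                .map(v -> "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }

    public static void main(String[] args) {
        List<String> professions = List.of("doctor", "software engineer");
        // prints profession:("doctor" OR "software engineer")
        System.out.println(anyOf("profession", professions));
    }
}
```

The returned fragment can replace the "profession : " + professions part of the query string, joined to the other clauses with AND.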
Last but not least, you could use the Hibernate Search query DSL. Have a look at the online docs of Hibernate Search if you want to know more - http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-query-querydsl

You need to add SELECT, FROM, and WHERE clauses to your query. The conditions are also missing parts. For example, here is a valid query: "SELECT e FROM Employee e WHERE e.country = :country AND e.religion = :religion"...
