Add limit and offset to query that was created from a String

I have query as String like
select name from employee
and want to limit the number of rows with limit and offset.
Is this possible with jOOQ and how do I do that?
Something like:
dsl.fetch("select name from employee").limit(10).offset(10);

Yes, you're close, but you cannot use fetch(sql), because that eagerly executes the query, and by then it is too late to append LIMIT and OFFSET. I generally don't recommend the approach offered by Sergei Petunin, because that way you give the RDBMS less information about what you're going to do. The execution plan and resource allocations are likely to be better if you actually use LIMIT and OFFSET.
There are two ways to do what you want to achieve:
Use the parser
You can use DSLContext.parser() to parse your SQL query and then modify the resulting SelectQuery, or create a derived table from that. Creating a derived table is probably a bit cleaner:
dsl.selectFrom(dsl.parser().parse("select name from employee"))
   .limit(10)
   .offset(10)
   .fetch();
The drawback is that the parser will have to understand your SQL string. Some vendor specific features will no longer be available.
The advantage (starting from jOOQ 3.13) is that you will be able to provide your generated code with attached converters and data type bindings this way, as jOOQ will "know" what the columns are.
Use plain SQL
You were already using plain SQL, but the wrong way. Instead of fetching the data eagerly, just wrap your query in DSL.table() and then use the same approach as above.
When using plain SQL, you have to make sure manually that the resulting SQL is syntactically correct. This includes wrapping your query in parentheses, and possibly aliasing it, depending on the dialect you're using:
dsl.selectFrom(table("(select name from employee)").as("t"))
   .limit(10)
   .offset(10)
   .fetch();

The best thing you can do with a string query is to create a ResultQuery from it. It allows you to limit the maximum amount of rows fetched by the underlying java.sql.Statement:
create.resultQuery("select name from employee").maxRows(10).fetch();
or to fetch lazily and then scroll through the cursor:
create.resultQuery("select name from employee").fetchLazy().fetch(10);
Adding an offset or a limit to a query is only possible using a SelectQuery, but I don't think there's any way to transform a string query to a SelectQuery in jOOQ.
Actually, if you store SQL queries as strings in the database, then you are already in a non-typesafe area, and might as well append OFFSET x LIMIT y directly to a string-based query. Depending on the complexity of your queries, it might work.
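If you do go the pure string route, wrapping the original query in a derived table is usually less fragile than appending to it, because clauses already present at the end of the string cannot collide with the pagination clause. A minimal sketch, assuming a dialect that accepts LIMIT ... OFFSET ... (for example PostgreSQL or MySQL); the class and method names are made up for illustration and are not part of jOOQ:

```java
final class StringPagination {

    // Hypothetical helper: wraps the query in a derived table and appends
    // LIMIT/OFFSET. Assumes the dialect supports "limit n offset m".
    static String paginate(String sql, int limit, int offset) {
        if (limit < 0 || offset < 0) {
            throw new IllegalArgumentException("limit and offset must be non-negative");
        }
        // A trailing semicolon would be invalid inside a subquery, so strip it.
        String inner = sql.trim();
        if (inner.endsWith(";")) {
            inner = inner.substring(0, inner.length() - 1).trim();
        }
        return "select * from (" + inner + ") t limit " + limit + " offset " + offset;
    }

    public static void main(String[] args) {
        System.out.println(paginate("select name from employee", 10, 10));
    }
}
```

Because limit and offset are ints, nothing user-controlled is concatenated here beyond the stored query itself.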

Related

How to use SUM inside COALESCE in JOOQ

Given below is a gist of the query, which I'm able to run successfully in MySQL
SELECT a.*,
COALESCE(SUM(condition1 or condition2), 0) as countColumn
FROM table a
-- left joins with multiple tables
GROUP BY a.id;
Now, I'm trying to use it with JOOQ.
ctx.select(a.asterisk(),
        coalesce(sum("How to get this ?")).as("columnCount"))
   .from(a)
   .leftJoin(b).on(someCondition)
   .leftJoin(c).on(someCondition)
   .leftJoin(d).on(someCondition)
   .leftJoin(e).on(someCondition)
   .groupBy(a.ID);
I'm having a hard time preparing the coalesce() part, and would really appreciate some help.

jOOQ's API is stricter about the distinction between Condition and Field<Boolean>, which means you cannot simply treat booleans as numbers as you can in MySQL. It's usually not a bad idea to be explicit about data types to prevent edge cases, so this strictness isn't necessarily a bad thing.
So, you can transform your booleans to integers as follows:
coalesce(
    sum(
        when(condition1.or(condition2), inline(1))
        .else_(inline(0))
    ),
    inline(0)
)
But even better than that, why not use a standard SQL FILTER clause, which can be emulated in MySQL using a COUNT(CASE ...) aggregate function:
count().filterWhere(condition1.or(condition2))

Differences between using sql IN() with subselect and code-generated string

Imagine we have an sql such as
SELECT something FROM TableName WHERE something NOT IN (SELECT ...);
And the result of the second SELECT is huge.
So what if I replace the second SELECT with a generated string value such as
"a1, a2, a3, ... an", where n is a really big number? Will I get an error that the SQL query is too large? Is this size limited? Is the limit different for the result of the second SELECT and for the generated string?

This depends entirely on your database engine/server. You can adjust database-specific settings to overcome, or at least extend, some of these limits.
But overall I think you should look for solutions like a JOIN instead of subqueries. There are some advantages to that approach.

The Performance Consequences of Parameterizing Constants in PreparedStatement Queries

When using JDBC's PreparedStatements to query Oracle, consider this:
String qry1 = "SELECT col1 FROM table1 WHERE rownum=? AND col2=?";
String qry2 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
String qry3 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=" + someVariable ;
The logic dictates that the value of rownum is always a constant (1 in this example). While the value of col2 is a changing variable.
Question 1: Are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry1 where rownum value is parameterized, over qry2 where rownum's constant value is hardcoded?
Question 2: Ignoring non-performance considerations (such as SQL Injections, readability, etc.), are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry2 over qry3 (in which the value of col2 is explicitly appended, not parameterized).

Answer 1: There are no performance advantages to using qry1 (a softcoded query) over qry2 (a query with reasonable bind variables).
Bind variables improve performance by reducing query parsing; if the bind variable is a constant there is no extra parsing to avoid.
(There are probably some weird examples where adding extra bind variables improves the performance of one specific query. Like with any forecasting program, occasionally if you feed bad information to the Oracle optimizer the result will be better. But it's important to understand that those are exceptional cases.)
Answer 2: There are many performance advantages to using qry2 (a query with reasonable bind variables) over qry3 (a hardcoded query).
Bind variables allow Oracle to reuse a lot of the work that goes into query parsing (query compilation). For example, for each query Oracle needs to check that the user has access to view the relevant tables. With bind variables that work only needs to be done once for all executions of the query.
Bind variables also allow Oracle to use some extra optimization tricks that only occur after the Nth run. For example, Oracle can use cardinality feedback to improve the second execution of a query. When Oracle makes a mistake in a plan, for example if it estimates a join will produce 1 row when it really produces 1 million, it can sometimes record that mistake and use that information to improve the next run. Without bind variables the next run will be different and it won't be able to fix that mistake.
Bind variables also allow for many different plan management features. Sometimes a DBA needs to change an execution plan without changing the text of the query. Features like SQL plan baselines, profiles, outlines, and DBMS_ADVANCED_REWRITE will not work if the query text is constantly changing.
On the other hand, there are a few reasonable cases where it's better to hard-code the queries. Occasionally an Oracle feature like partition pruning cannot understand the expression and it helps to hardcode the value. For large data warehouse queries the extra time to parse a query may be worth it if the query is going to run for a long time anyway.
(Caching is unlikely to affect either scenario. Result caching of a statement is rare; it's much more likely that Oracle will cache only the blocks of the tables used in the statement. The buffer cache probably does not care whether those blocks are accessed by one statement many times or by many statements one time.)
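To make the parsing argument concrete, here is a small sketch that never talks to Oracle. It only counts how many distinct statement texts an application would send for the qry2 style versus the qry3 style; every distinct text is a statement the server has to parse separately. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class StatementTextDemo {

    // qry2 style: the value travels as a bind variable, so the text never changes.
    static String withBind() {
        return "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
    }

    // qry3 style: the value becomes part of the statement text.
    static String withLiteral(int someVariable) {
        return "SELECT col1 FROM table1 WHERE rownum=1 AND col2=" + someVariable;
    }

    // Counts how many distinct statement texts would reach the database.
    static int distinctTexts(List<Integer> values, boolean useBind) {
        Set<String> texts = new LinkedHashSet<>();
        for (int v : values) {
            texts.add(useBind ? withBind() : withLiteral(v));
        }
        return texts.size();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(10, 20, 30, 10);
        System.out.println("bind variable: " + distinctTexts(values, true));
        System.out.println("literals:      " + distinctTexts(values, false));
    }
}
```

With bind variables the count stays at one no matter how many values are used, while with literals it grows with the number of distinct values.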

Better to query once, then organize objects based on returned column value, or query twice with different conditions?

I have a table which I need to query, then organize the returned objects into two different lists based on a column value. I can either query the table once, retrieving the column by which I would differentiate the objects and arrange them by looping through the result set, or I can query twice with two different conditions and avoid the sorting process. Which method is generally better practice?
MY_TABLE
NAME AGE TYPE
John 25 A
Sarah 30 B
Rick 22 A
Susan 43 B
Either SELECT * FROM MY_TABLE, then sort in code based on returned types, or
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'A' followed by
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'B'

Logically, a DB query issued from Java code will be more expensive than a loop within the code, because querying the DB involves several steps such as connecting to the DB, creating the SQL query, firing the query and getting the results back.
Besides, something can go wrong between firing the first and the second query.
With one optimized query and a loop in the code, you can save a lot of time compared to firing two queries.
In your case, you can sort in the query itself if it helps:
SELECT * FROM MY_TABLE ORDER BY TYPE
In the future, if more types are added to your table, you will not need to fire an additional query to retrieve them.
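The in-code split after a single query can be done in one pass. A minimal sketch, assuming the rows have already been read from the result set into a simple holder class; Person and groupByType are illustrative names, not anything from the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitByType {

    // Hypothetical row holder for MY_TABLE (NAME, AGE, TYPE).
    static final class Person {
        final String name;
        final int age;
        final String type;

        Person(String name, int age, String type) {
            this.name = name;
            this.age = age;
            this.type = type;
        }
    }

    // One pass over the rows of a single query, grouping by the TYPE column.
    // New types simply become new map keys; no extra query is needed.
    static Map<String, List<Person>> groupByType(List<Person> rows) {
        Map<String, List<Person>> byType = new LinkedHashMap<>();
        for (Person p : rows) {
            byType.computeIfAbsent(p.type, k -> new ArrayList<>()).add(p);
        }
        return byType;
    }

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(
            new Person("John", 25, "A"),
            new Person("Sarah", 30, "B"),
            new Person("Rick", 22, "A"),
            new Person("Susan", 43, "B"));
        Map<String, List<Person>> byType = groupByType(rows);
        System.out.println("A: " + byType.get("A").size() + ", B: " + byType.get("B").size());
    }
}
```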

It depends heavily on the context. If each list is really huge, I would let the database do the hard part of the job with two queries. Conversely, in a web application using a farm of application servers and a central database, I would use one single query.
For the general use case, IMHO, I would save database resources, because the database is a common point of congestion, and use only one query.
The only objective argument I can find is that the splitting of the list occurs in memory with a hyper simple algorithm and in a single JVM, where each query requires a bit of initialization and may involve disk access or loading of index pages.

In general, one query performs better.
Also, by issuing two queries you can potentially get inconsistent results (which may be fixed with a higher transaction isolation level, though).
In any case I believe you still need to iterate through resultset (either directly or by using framework's methods that return collections).

From the database point of view, you optimally have exactly one statement that fetches exactly everything you need and nothing else. Therefore, your first option is better. But don't generalize that answer in a way that makes you query more data than needed. It's a common mistake for beginners to select all rows from a table (no WHERE clause) and do the filtering in code instead of letting the database do its job.

It also depends on your data volume. For instance, if you have a large data set, doing a select * without any condition might take some time, but if you have an index on your 'TYPE' column, then adding a WHERE clause will reduce the time taken to execute the query. If you are dealing with a small data set, then doing a select * followed by your logic in the Java code is a better approach.

There are four main bottlenecks involved in querying a database.
The query itself - how long the query takes to execute on the server depends on indexes, table sizes etc.
The data volume of the results - there could be hundreds of columns or huge fields and all this data must be serialised and transported across the network to your client.
The processing of the data - java must walk the query results gathering the data it wants.
Maintaining the query - it takes manpower to maintain queries, simple ones cost little but complex ones can be a nightmare.
By careful consideration it should be possible to work out a balance between all four of these factors - it is unlikely that you will get the right answer without doing so.

You can query by two conditions:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B'
This will do both for you at once, and if you want them sorted, you can do the same and just add an ORDER BY clause:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B' ORDER BY TYPE ASC
This will sort the results by type, in ascending order.
EDIT:
I didn't notice that originally you wanted two different lists. In that case, you could just do this query, and then find the index where the type changes from 'A' to 'B' and copy the data into two arrays.

Blacklist filtering data for SQL Keywords

I am trying to validate data before inserting it into the database (PostgreSQL). The data corresponding to email, zip code etc. is easily validated with Apache Commons Validator. But for names I used this:
^[a-zA-Z][ a-zA-Z]{1,30}$
This prevents any special characters from being added as name, but it fails to prevent users from adding DROP or GRANT as a name. As I am using PreparedStatement, I didn't think it was going to be a problem but it is now required that SQL keywords shouldn't go in the db as it may lead to a Second Order SQL Injection.
I thought of blacklisting all SQL keywords (surely, this will prevent Huge Grant from logging into our site. :P) but it seems that there are >64 keywords. Is this (blacklist filtering data for SQL keywords) a proper approach for preventing Second Order SQL Injection? What are my options?
I am using this code:
String sql = "INSERT INTO users (username, password, name) VALUES (?,?,?);";
Connection conn = null;
PreparedStatement ps = null;
try {
    conn = SomeStaticClass.createConnection();
    ps = conn.prepareStatement(sql);
    // All user-supplied values are bound as parameters, never concatenated.
    ps.setString(1, dataBean.getUsername());
    ps.setString(2, dataBean.getPassword());
    ps.setString(3, dataBean.getName());
    ps.execute();
} catch (SQLException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
} finally {
    try {
        if (ps != null) {
            ps.close();
        }
        // Guard against a failed createConnection() leaving conn null.
        if (conn != null) {
            conn.close();
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

Is this a proper approach for this kind of a situation?
No.
SQL injection happens when you assemble SQL queries by concatenating Strings.
The "best practice" approach to preventing SQL injection is to use a PreparedStatement with constant SQL queries that have placeholders for the parameters. Then you use the prepared statement set methods to set values for each of the placeholder parameters. This approach will guarantee that any "nasty" string parameters containing SQL keywords will be interpreted as literal strings.
UPDATE - Using PreparedStatements consistently should protect against second order attacks too ... assuming that you are referring to something like this:
http://download.oracle.com/oll/tutorials/SQLInjection/html/lesson1/les01_tm_attacks2.htm
You just need to make sure that you don't build the SQL query string from anything that could possibly be tainted. Provided you handle any potentially tainted data using placeholders, it doesn't matter where it came from.
(Black listing SQL keywords will help to keep garbage out of your database. But as you mentioned, it can potentially cause damage to legitimate data and impact on your system's usability. I wouldn't do it. It would be better to rely on good programmer discipline ... and thorough code reviews.)

Second order injection only occurs if you store the keywords in the database and then later use them in an unsafe manner. If you use prepared statements and they are properly parameterized it won't occur. Cisco have a good summary of understanding SQL injection:
http://www.cisco.com/web/about/security/intelligence/sql_injection.html
Apart from your example of "Grant" there are also many such as IF, BY, IS, IN, TO that will occur very commonly in English language / names.
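How quickly a keyword blacklist starts rejecting legitimate names can be seen in a few lines. This is only an illustration with a deliberately tiny, incomplete keyword sample; the class and method names are made up:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class KeywordBlacklistDemo {

    // A small, deliberately incomplete sample of SQL keywords.
    static final Set<String> KEYWORDS = new HashSet<>(
        Arrays.asList("DROP", "GRANT", "IF", "BY", "IS", "IN", "TO"));

    // Returns true if any whitespace-separated word of the name is a keyword.
    static boolean rejectedByBlacklist(String name) {
        for (String word : name.trim().split("\\s+")) {
            if (KEYWORDS.contains(word.toUpperCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A perfectly harmless name is rejected because of the word "Grant".
        System.out.println(rejectedByBlacklist("Huge Grant"));
        System.out.println(rejectedByBlacklist("John Smith"));
    }
}
```

The rejected name would have been harmless anyway when bound through a PreparedStatement, so the blacklist costs usability without adding protection.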

It is extremely difficult, if not impossible, to ensure that all data in your database can be used with any script language (like SQL or HTML) without proper escaping in the future. It is impossible to distinguish between "safe" and "unsafe" characters before you know how the characters are going to be used anyway.
Trying to escape and clean all data before it is inserted into the database may lead you to believe that user-generated data in the database is "safe", which is a very dangerous belief. You can only know whether the data is safe when you know how it is going to be used, and you will only know that when you actually use the data (since data in a database can live for a very long time).
The best strategy for avoiding this kind of problem is to always escape all data when you actually use it, either by using PreparedStatement like you do, by properly escaping it when you use it in HTML, by escaping it when you insert it into an email, etc.
I gave some examples in this answer:
How to allow specific characters with OWASP HTML Sanitizer?

Along with using PreparedStatement, you should check the input provided by the user on your web pages.
That gives you two different checks:
1. A check on your web pages, which reduces processing time.
2. If something passes your initial check, the PreparedStatement makes sure the input is treated as data and not as part of the query.
E.g. a user is searching for some item.
The user input is
' OR ITEM in (Select ITEM from SOME_TABLE) OR ITEM = ''
If you build your SQL by concatenating strings, the resulting SQL command is
Select * from TABLE_X WHERE ITEM = '' OR ITEM in (Select ITEM from SOME_TABLE) OR ITEM = ''
So your database is compromised. With a PreparedStatement, on the other hand, the input is bound as a parameter and the user cannot modify the SQL.
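The difference can be reproduced without a database by looking only at the SQL text that would be sent. A minimal sketch with hypothetical class and method names; in real code the second variant would be passed to Connection.prepareStatement and the input bound with PreparedStatement.setString:

```java
final class ConcatenationDemo {

    // Unsafe: the user input becomes part of the SQL text.
    static String concatenated(String userInput) {
        return "Select * from TABLE_X WHERE ITEM = '" + userInput + "'";
    }

    // Safe shape: the SQL text is constant; the input would be supplied
    // separately via PreparedStatement.setString(1, userInput).
    static String parameterized() {
        return "Select * from TABLE_X WHERE ITEM = ?";
    }

    public static void main(String[] args) {
        String input = "' OR ITEM in (Select ITEM from SOME_TABLE) OR ITEM = '";
        // The attacker's text has changed the structure of the WHERE clause.
        System.out.println(concatenated(input));
        // The statement text is the same for every possible input.
        System.out.println(parameterized());
    }
}
```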
