Matcher.quoteReplacement versus PreparedStatement - java

In my project I have to create insert statements based on the tables passed.
So I have two options:
1) Write a PreparedStatement and run it in a batch
2) Create insert into table values(..),(..),(..)
I want to know the reasons to prefer (1) over (2) if I use Matcher.quoteReplacement() to escape the values.
Thanks in advance

Matcher.quoteReplacement isn't intended to quote SQL strings, so you can't rely on it to prevent SQL injection. But even if you had a working quoting function, prepared statements would still be better for several reasons:
You don't need to worry about forgetting to quote input
You're not tempted to take a shortcut and not quote values you "know" are safe
The database can cache the execution plan to avoid parsing similar SQL queries with different parameters, giving a performance boost
The code gets more readable (IMHO)
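For illustration, here is a minimal sketch of option (1), a batched PreparedStatement insert; the table and column names are made up for the example, and the values are bound rather than concatenated, so no manual escaping is needed:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertExample {

    // Inserts all rows in one batch; "my_table" and its columns are assumed names.
    // Each Object[] holds the values for one row: {id, name}.
    static void insertAll(Connection conn, List<Object[]> rows) throws SQLException {
        String sql = "INSERT INTO my_table (id, name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                ps.setInt(1, (Integer) row[0]);    // values are bound, not concatenated,
                ps.setString(2, (String) row[1]);  // so no quoting/escaping is required
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}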

Add limit and offset to query that was created from a String

I have a query as a String, like
select name from employee
and I want to limit the number of rows with limit and offset.
Is this possible with jOOQ and how do I do that?
Something like:
dsl.fetch("select name from employee").limit(10).offset(10);
Yes, you're close, but you cannot use fetch(sql), because that eagerly executes the query, and by then it is too late to append LIMIT and OFFSET. I generally don't recommend the approach offered by Sergei Petunin, because that way you give the RDBMS less information about what you're going to do. The execution plan and resource allocations are likely going to be better if you actually use LIMIT and OFFSET.
There are two ways to do what you want to achieve:
Use the parser
You can use DSLContext.parser() to parse your SQL query and then modify the resulting SelectQuery, or create a derived table from that. Creating a derived table is probably a bit cleaner:
dsl.selectFrom(dsl.parser().parse("select name from employee"))
   .limit(10)
   .offset(10)
   .fetch();
The drawback is that the parser will have to understand your SQL string. Some vendor-specific features will no longer be available.
The advantage (starting from jOOQ 3.13) is that you will be able to provide your generated code with attached converters and data type bindings this way, as jOOQ will "know" what the columns are.
Use plain SQL
You were already using plain SQL, but the wrong way. Instead of fetching the data eagerly, just wrap your query in DSL.table() and then use the same approach as above.
When using plain SQL, you will have to make sure manually that the resulting SQL is syntactically correct. This includes wrapping your query in parentheses and possibly aliasing it, depending on the dialect you're using:
dsl.selectFrom(table("(select name from employee)").as("t"))
   .limit(10)
   .offset(10)
   .fetch();
The best thing you can do with a string query is to create a ResultQuery from it. It allows you to limit the maximum number of rows fetched by the underlying java.sql.Statement:
create.resultQuery("select name from employee").maxRows(10).fetch();
or to fetch lazily and then scroll through the cursor:
create.resultQuery("select name from employee").fetchLazy().fetch(10);
Adding an offset or a limit to a query is only possible using a SelectQuery, but I don't think there's any way to transform a string query into a SelectQuery in jOOQ.
Actually, if you store SQL queries as strings in the database, then you are already in a non-typesafe area, and might as well append OFFSET x LIMIT y directly to a string-based query. Depending on the complexity of your queries, it might work.

The Performance Consequences of Parameterizing Constants in PreparedStatement Queries

When using JDBC's PreparedStatements to query Oracle, consider this:
String qry1 = "SELECT col1 FROM table1 WHERE rownum=? AND col2=?";
String qry2 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
String qry3 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=" + someVariable;
The logic dictates that the value of rownum is always a constant (1 in this example), while the value of col2 is a changing variable.
Question 1: Are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry1 where rownum value is parameterized, over qry2 where rownum's constant value is hardcoded?
Question 2: Ignoring non-performance considerations (such as SQL Injections, readability, etc.), are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry2 over qry3 (in which the value of col2 is explicitly appended, not parameterized).
Answer 1: There are no performance advantages to using qry1 (a softcoded query) over qry2 (a query with reasonable bind variables).
Bind variables improve performance by reducing query parsing; if the bind variable is a constant there is no extra parsing to avoid.
(There are probably some weird examples where adding extra bind variables improves the performance of one specific query. Like with any forecasting program, occasionally if you feed bad information to the Oracle optimizer the result will be better. But it's important to understand that those are exceptional cases.)
Answer 2: There are many performance advantages to using qry2 (a query with reasonable bind variables) over qry3 (a hardcoded query).
Bind variables allow Oracle to re-use a lot of the work that goes into query parsing (query compilation). For example, for each query Oracle needs to check that the user has access to view the relevant tables. With bind variables, that work only needs to be done once for all executions of the query.
Bind variables also allow Oracle to use some extra optimization tricks that only occur after the Nth run. For example, Oracle can use cardinality feedback to improve the second execution of a query. When Oracle makes a mistake in a plan, for example if it estimates a join will produce 1 row when it really produces 1 million, it can sometimes record that mistake and use that information to improve the next run. Without bind variables the next run will be different and it won't be able to fix that mistake.
Bind variables also allow for many different plan management features. Sometimes a DBA needs to change an execution plan without changing the text of the query. Features like SQL plan baselines, profiles, outlines, and DBMS_ADVANCED_REWRITE will not work if the query text is constantly changing.
On the other hand, there are a few reasonable cases where it's better to hard-code the queries. Occasionally an Oracle feature like partition pruning cannot understand the expression and it helps to hardcode the value. For large data warehouse queries the extra time to parse a query may be worth it if the query is going to run for a long time anyway.
(Caching is unlikely to affect either scenario. Result caching of a statement is rare; it's much more likely that Oracle will cache only the blocks of the tables used in the statement. The buffer cache probably does not care whether those blocks are accessed by one statement many times or by many statements one time.)
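As a concrete illustration of qry2, here is a minimal sketch (the table and column names come from the question; the helper method itself is invented for the example):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class BindVariableExample {

    // The constant rownum=1 stays in the SQL text; only the changing value is bound,
    // so the statement text never varies and Oracle can reuse the parsed statement.
    static String findCol1(Connection conn, String someVariable) throws SQLException {
        String qry2 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
        try (PreparedStatement ps = conn.prepareStatement(qry2)) {
            ps.setString(1, someVariable);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("col1") : null;
            }
        }
    }
}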

Oracle JDBC PreparedStatement Ignore Trailing Spaces

I am currently writing a Java web application which interfaces with an Oracle database. I am using PreparedStatements because Hibernate would complicate things too much.
Due to a bug in the program which is writing to the database, the field I need to search for has trailing spaces written to the value. I have surrounded the value with quotation marks to demonstrate the whitespace.
"testFTP_receipt521 "
When I do a select query with SQLDeveloper, I am able to get a result when I run:
...and yfs_organization.ORGANIZATION_KEY='testFTP_receipt521';
(no whitespace)
However, when I use a PreparedStatement, I get no results when I try:
...and yfs_organization.ORGANIZATION_KEY=?");
preparedStatement.setString(1, "testFTP_receipt521");
(no whitespace)
and when I try:
...and yfs_organization.ORGANIZATION_KEY=?");
preparedStatement.setString(1, "testFTP_receipt521 ");
(with whitespace)
Are there any ways that I can query for this result with a PreparedStatement, or should I try another approach?
Thanks for all your help.
Due to a bug in the program which is writing to the database, the field I need to search for has trailing spaces
Maybe, given the circumstances, and if your version of Oracle is recent enough, you might consider adding a virtual column to your table containing the correct value?
ALTER TABLE yfs_organization ADD (
ORGANIZATION_KEY_FIXED VARCHAR(80)
GENERATED ALWAYS AS (TRIM(ORGANIZATION_KEY)) VIRTUAL
);
Then in your code, the only change will be to use the ORGANIZATION_KEY_FIXED to query the DB:
SELECT ID,ORGANIZATION_KEY_FIXED
FROM yfs_organization
WHERE ORGANIZATION_KEY_FIXED='testFTP_receipt521'
(try it on http://sqlfiddle.com/#!4/8251d/1)
This might avoid scattering the code required to work around that bug throughout your application, and it might ease the transition once the bug is fixed.
As an added benefit, you can add indexes on virtual columns if you need to.
Maybe you can use it like this...
...and yfs_organization.ORGANIZATION_KEY like '%testFTP_receipt521%';
This returns all rows that contain 'testFTP_receipt521', regardless of whitespace.
Another thing that I saw in your code, in this part:
...and yfs_organization.ORGANIZATION_KEY=?");
preparedStatement.setString(1, "testFTP_receipt521");
I think this is the correct way:
...and yfs_organization.ORGANIZATION_KEY='?'");
you need to put quotes around the criteria
If you have the ability to modify the query, you can TRIM(...) the column value and perform the comparison. For example:
...and TRIM(yfs_organization.ORGANIZATION_KEY)=?");
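In JDBC, that might look like the following minimal sketch (the SELECT list and the helper method are made up for illustration):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class TrimComparisonExample {

    // Trims the stored value on the database side before comparing, so the
    // trailing spaces in ORGANIZATION_KEY no longer prevent a match.
    static boolean organizationExists(Connection conn, String key) throws SQLException {
        String sql = "SELECT 1 FROM yfs_organization WHERE TRIM(yfs_organization.ORGANIZATION_KEY) = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}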
Hope it helps.

JDBC - Best way for performance of a PreparedStatement with optional criteria

I am trying to improve the performance of my application. For this, I am replacing the current statements with prepared statements to make use of the Oracle cache.
But several of my queries are dynamic (conditional criteria and tables) and I would like to know the best solution to reduce the execution time.
On Stack Overflow, I found two solutions.
The first is to build the query with if statements:
if (a.equals("Test")) {
    whereClause.append("COLUMN1 = ? ");
}
if (b.equals("Test2")) {
    whereClause.append("COLUMN2 = ? ");
}
etc...
The second is to create one query with all criteria but to add "OR var is null"
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
In your opinion, what is the best way for performance, knowing that a request may contain 40/50 criteria and some tables can be optional too? Or is there another solution?
The database I use is Oracle, with a DataSource and JDK 1.5.
Thank you in advance for your help.
First, JDBC and Oracle are capable of handling and caching multiple prepared statements.
First approach:
if (a.equals("Test")) {
    whereClause.append("COLUMN1 = ? ");
}
if (b.equals("Test2")) {
    whereClause.append("COLUMN2 = ? ");
}
will certainly lead to more distinct statements, and thus to more prepared statements, but each of these statements can be optimized at preparation time, as no variable is optional.
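As a minimal sketch of how the first approach is typically wired up, collect the SQL fragments and their values together and bind them in order. The table and column names are assumptions, and try-with-resources (Java 7+) is used for brevity even though the question mentions JDK 1.5, where you would close in a finally block instead:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class DynamicCriteriaExample {

    // Appends only the criteria that are actually present and binds them in order.
    static int countMatches(Connection conn, String a, String b) throws SQLException {
        StringBuilder sql = new StringBuilder("SELECT COUNT(*) FROM my_table WHERE 1=1");
        List<Object> params = new ArrayList<Object>();
        if ("Test".equals(a)) {
            sql.append(" AND COLUMN1 = ?");
            params.add(a);
        }
        if ("Test2".equals(b)) {
            sql.append(" AND COLUMN2 = ?");
            params.add(b);
        }
        try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
            for (int i = 0; i < params.size(); i++) {
                ps.setObject(i + 1, params.get(i)); // JDBC parameters are 1-based
            }
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }
}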
Second approach:
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
will result in exactly one prepared statement, but during prepare the query-plan needs to include the possibility that all parameters are non-null.
So it really becomes a matter of benchmarking and of how many optional parameters there are. If you have too many prepared statements, one might not be cached (but it might also be that it is so seldom used that this does not hurt overall performance).
I also don't know how much overhead the optional parameters in the second approach create, even if they are evaluated for each row because the optimizer could not remove them.
Finally, if you have indexes that cover some combinations of your optional parameters, the first approach should be faster, as during prepare the possible indexes are evaluated and the best one is chosen (query plan).
The biggest win with prepared statements is the query plan (optimization) being done ahead of time. To be honest, I doubt the optimization will be very big if you have 20 criteria like that and only a few are actually used. Why don't you test it and see which is quickest?
Also, I think making a stored procedure out of this could be very advantageous.
I would use a PreparedStatement with a dynamic query without any additional logic. If a query repeats, JDBC will find the previously executed cached statement.
I agree with Evgeniy, PreparedStatements are the way to go. If you have many parameters, the upkeep is easier if you use named parameters instead of question marks. See: http://www.javaworld.com/article/2077706/core-java/named-parameters-for-preparedstatement.html. I have used this with about 20 criteria.

Blacklist filtering data for SQL Keywords

I am trying to validate data before inserting it into the database (PostgreSQL). The data corresponding to email, zip code, etc. is easily validated with the Apache Commons Validator. But for names I used this:
^[a-zA-Z][ a-zA-Z]{1,30}$
This prevents any special characters from being added as a name, but it fails to prevent users from adding DROP or GRANT as a name. As I am using PreparedStatement, I didn't think it was going to be a problem, but it is now required that SQL keywords shouldn't go into the db, as they may lead to a Second Order SQL Injection.
I thought of blacklisting all SQL keywords (surely, this will prevent Hugh Grant from logging into our site. :P) but it seems that there are >64 keywords. Is this (blacklist filtering data for SQL keywords) a proper approach for preventing Second Order SQL Injection? What are my options?
I am using this code:
String sql = "INSERT INTO users (username, password, name) VALUES (?,?,?);";
try {
    conn = SomeStaticClass.createConnection();
    ps = conn.prepareStatement(sql);
    ps.setString(1, dataBean.getUsername());
    ps.setString(2, dataBean.getPassword());
    ps.setString(3, dataBean.getName());
    ps.execute();
} catch (SQLException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
} finally {
    try {
        if (ps != null) {
            ps.close();
        }
        if (conn != null) {
            conn.close();
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
Is this a proper approach for this kind of a situation?
No.
SQL injection happens when you assemble SQL queries by concatenating Strings.
The "best practice" approach to preventing SQL injection is to use a PreparedStatement with constant SQL queries that have placeholders for the parameters. Then you use the prepared statement set methods to set values for each of the placeholder parameters. This approach will guarantee that any "nasty" string parameters containing SQL keywords will be interpreted as literal strings.
UPDATE - Using PreparedStatements consistently should protect against second order attacks too ... assuming that you are referring to something like this:
http://download.oracle.com/oll/tutorials/SQLInjection/html/lesson1/les01_tm_attacks2.htm
You just need to make sure that you don't build the SQL query string from anything that could possibly be tainted. Provided you handle any potentially tainted data using placeholders, it doesn't matter where it came from.
(Black listing SQL keywords will help to keep garbage out of your database. But as you mentioned, it can potentially cause damage to legitimate data and impact on your system's usability. I wouldn't do it. It would be better to rely on good programmer discipline ... and thorough code reviews.)
Second order injection only occurs if you store the keywords in the database and then later use them in an unsafe manner. If you use prepared statements and they are properly parameterized it won't occur. Cisco have a good summary of understanding SQL injection:
http://www.cisco.com/web/about/security/intelligence/sql_injection.html
Apart from your example of "Grant", there are also many keywords such as IF, BY, IS, IN, TO that occur very commonly in English words and names.
It is extremely difficult, if not impossible, to ensure that all data in your database can be used with any script language (like SQL or HTML) without proper escaping in the future. It is impossible to distinguish between "safe" and "unsafe" characters before you know how the characters are going to be used anyway.
Trying to escape and clean all data before it is inserted into the database may lead you to believe that user-generated data in the database is "safe", which is a very dangerous belief. You can only know whether the data is safe when you know how it is going to be used, and you will only know that when you actually use the data (since data in a database can live for a very long time).
The best strategy for avoiding this kind of problem is to always escape all data when you actually use it: by using PreparedStatement like you do, by properly escaping it when you use it in HTML, by escaping it when you insert it into an email, etc.
I gave some examples in this answer:
How to allow specific characters with OWASP HTML Sanitizer?
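For instance, here is a small sketch of escaping at the point of use; it uses Apache Commons Text's HTML escaper rather than the OWASP sanitizer mentioned in the linked answer, and the rendering method is invented for illustration:
import org.apache.commons.text.StringEscapeUtils;

class OutputEscapingExample {

    // The name is stored as-is (bound via PreparedStatement on insert) and only
    // escaped here, at the moment it is actually embedded into HTML.
    static String renderGreeting(String nameFromDb) {
        return "<p>Hello, " + StringEscapeUtils.escapeHtml4(nameFromDb) + "</p>";
    }
}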
Along with using PreparedStatement, you should also check the input provided by the user on your web pages.
So now you have 2 different checks:
1. On your web pages, which will reduce processing time.
2. If something passes your initial check, the PreparedStatement will make sure your query is parsed properly.
E.g. a user is searching for some item.
The user input is:
' OR ITEM in (Select ITEM from SOME_TABLE) OR ITEM = ''
If you are building your SQL by concatenating strings, it will produce the SQL command:
Select * from TABLE_X WHERE ITEM = '' OR ITEM in (Select ITEM from SOME_TABLE) OR ITEM = ''
So your database is hacked. But in the other case, the PreparedStatement will parse your query and will not let the user modify the SQL...
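For comparison, a minimal sketch of the same search done with a bound parameter (TABLE_X and ITEM come from the example above; the helper method is invented for illustration):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ParameterizedSearchExample {

    // The user input is bound as a value, so a string like
    // "' OR ITEM in (Select ITEM from SOME_TABLE) OR ITEM = ''"
    // is treated as a literal search term, not as SQL.
    static boolean itemExists(Connection conn, String userInput) throws SQLException {
        String sql = "SELECT * FROM TABLE_X WHERE ITEM = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, userInput);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}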
