SQL assert - compare two SQL queries in unit tests


I am looking for a way to compare two MySQL queries in a unit test. Do you know of any library that allows this (all of these asserts should pass):
SQLAssert.assertEquals("select id, name from users", "select id, name from users")
SQLAssert.assertEquals("select id, name from users", "select `id`,`name` from `users`")

While running the queries against an in-memory database and comparing results is the best answer, I think there are less-comprehensive and more brittle options that are nevertheless useful.
In practice it's likely that you can place additional constraints on the syntax of the queries. In your example, there are only select statements, a single table, no where clause, and the only query differences are backticks and spaces, so writing a method that normalizes queries with those constraints would probably be doable. Something like:
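private String normalize(String str) {
    return str.replace("`", "")               // drop MySQL identifier quotes
              .replaceAll("\\s*,\\s*", ", ")  // same spacing after every comma
              .replaceAll("\\s+", " ")        // collapse runs of whitespace
              .trim();
}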
These normalized strings can then be compared. This way of doing things is very brittle (and therefore not future-proof), but that doesn't mean it can't provide value in certain circumstances. Sure, there are quite a few valid SQL statements that would break it, but you don't have to deal with the full set of strings that valid SQL entails. You just have to deal with whatever subset of SQL your queries use.
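Wrapped up as the kind of assertion the question asks for, the idea might look like this. It is a minimal, stdlib-only sketch assuming the narrow MySQL subset from the question (backtick-quoted identifiers, queries differing only in whitespace and comma spacing); SqlNormalizer and assertSqlEquals are made-up names, not an existing library.

```java
// Hypothetical helper; SqlNormalizer is an illustrative name, not a real library.
class SqlNormalizer {

    // Normalizes only the narrow MySQL subset from the question:
    // backtick-quoted identifiers, spacing around commas and runs of whitespace.
    // It does not understand string literals, comments, keyword case or reordering.
    static String normalize(String sql) {
        return sql.replace("`", "")
                  .replaceAll("\\s*,\\s*", ", ")
                  .replaceAll("\\s+", " ")
                  .trim();
    }

    // Mirrors the SQLAssert.assertEquals(...) calls from the question.
    static void assertSqlEquals(String expected, String actual) {
        String e = normalize(expected);
        String a = normalize(actual);
        if (!e.equals(a)) {
            throw new AssertionError("expected <" + e + "> but was <" + a + ">");
        }
    }
}
```

A failing assertion here may simply mean the two queries differ in a way this normalization does not handle, which is exactly the brittleness trade-off.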
If your queries are different enough to make this code unreasonable, it might be easier to use a parser library like JSqlParser to parse out the pieces and then navigate the structure to do your comparison. Again, you don't have to support all of SQL, just whatever subset your queries use. Also, the tests don't have to test full logical equivalence to be useful. A test might just make sure that all the tables mentioned in two queries are the same regardless of joins and ordering. That doesn't make them equivalent, but it does guard against a particular kind of error and is more useful than nothing.
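The "same tables" check mentioned here can be sketched without any dependency. Note that this swaps JSqlParser for a crude regular expression, so treat it only as an illustration of a partial check, not as a replacement for a parser; TableNames and its methods are hypothetical names.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Crude stand-in for a real parser such as JSqlParser.
class TableNames {
    // Captures the identifier right after FROM or JOIN, optionally wrapped in `...`, "..." or [...].
    // Comma-separated FROM lists, subqueries and schema-qualified names are not handled correctly.
    private static final Pattern TABLE_REF = Pattern.compile(
            "\\b(?:from|join)\\s+[`\"\\[]?(\\w+)[`\"\\]]?", Pattern.CASE_INSENSITIVE);

    static Set<String> tables(String sql) {
        Set<String> result = new TreeSet<>();
        Matcher m = TABLE_REF.matcher(sql);
        while (m.find()) {
            result.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // True when both queries mention the same tables, regardless of join order or quoting.
    static boolean sameTables(String sql1, String sql2) {
        return tables(sql1).equals(tables(sql2));
    }
}
```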
An example of a situation where this could be useful is if you are doing large groups of refactorings on your query-builders and you want to make sure the end queries are equivalent. In this case you aren't testing the queries themselves but the query-building.
I wouldn't suggest doing this as a regular practice in unit tests, but I think it can be useful in very particular circumstances.

You could use JSqlParser to parse your queries. Then you could use JSqlParser's so-called deparser to get a version of your SQL without additional spaces, tabs and line feeds. From there on, you can use a simple String equality check. You do have to handle the different kinds of quoting, like " or [], yourself, but it works, as the example code shows.
This does not work out of the box if quoting comes into play, or if the order of columns or expressions within your SQL differs. The quoting problem is simple to solve through an extension to the expression deparser.
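import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import net.sf.jsqlparser.schema.Column;
import net.sf.jsqlparser.schema.Table;
import net.sf.jsqlparser.statement.Statement;
import net.sf.jsqlparser.statement.select.Select;
import net.sf.jsqlparser.util.deparser.ExpressionDeParser;
import net.sf.jsqlparser.util.deparser.SelectDeParser;

// CCJSqlParserUtil.parse throws JSQLParserException for invalid SQL
Statement stmt1 = CCJSqlParserUtil.parse("select id, name from users");
Statement stmt2 = CCJSqlParserUtil.parse("select id, name from users");
Statement stmt3 = CCJSqlParserUtil.parse("select `id`,`name` from `users`");
//Equality
System.out.println(stmt1.toString().equals(stmt2.toString()));
// Strip backticks from column names while deparsing
ExpressionDeParser exprDep = new ExpressionDeParser() {
    @Override
    public void visit(Column tableColumn) {
        tableColumn.setColumnName(tableColumn.getColumnName().replace("`", ""));
        super.visit(tableColumn);
    }
};
// Strip backticks from table names while deparsing
SelectDeParser stmtDep = new SelectDeParser() {
    @Override
    public void visit(Table tableName) {
        tableName.setName(tableName.getName().replace("`", ""));
        super.visit(tableName);
    }
};
// Both deparsers must write into the same buffer
exprDep.setBuffer(stmtDep.getBuffer());
stmtDep.setExpressionVisitor(exprDep);
((Select)stmt3).getSelectBody().accept(stmtDep);
String stmt3Txt = stmtDep.getBuffer().toString();
System.out.println(stmt1.toString().equals(stmt3Txt));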

I would suggest that the only way to assert that 2 queries return the same result is to actually run them. Of course, what you don't want to do is have unit tests connect to a real database. There are a number of reasons for this:
The tests can affect what is in the database, and you don't want to introduce a load of test data into a production database
Each test should be self-contained and work the same way each time it is run, which requires the database to be in the same known state at the start of each run. This requires a reset for each test, which is not something to do with a production (or dev environment) database.
With these constraints in mind, I suggest you look into DBUnit, which is designed for database-driven JUnit tests. I also suggest using an in-memory database instead of MySQL for unit tests (the examples use HSQLDB); that way, you can test queries without test data actually being persisted.
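Once both queries run against the in-memory database, the comparison itself is simple. A minimal sketch, assuming you already have a JDBC connection to that database and that neither query has an ORDER BY, so rows are compared as multisets; ResultComparison and its methods are made-up names.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ResultComparison {
    // Reads every row of a JDBC ResultSet into a list of column values (in column order).
    static List<List<Object>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();
        List<List<Object>> rows = new ArrayList<>();
        while (rs.next()) {
            List<Object> row = new ArrayList<>(cols);
            for (int i = 1; i <= cols; i++) {
                row.add(rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Compares two results as multisets of rows: row order is ignored, duplicates are counted.
    static boolean sameRowsIgnoringOrder(List<List<Object>> a, List<List<Object>> b) {
        if (a.size() != b.size()) {
            return false;
        }
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (List<Object> row : a) {
            counts.merge(row, 1, Integer::sum);
        }
        for (List<Object> row : b) {
            Integer c = counts.get(row);
            if (c == null) {
                return false;
            }
            if (c == 1) {
                counts.remove(row);
            } else {
                counts.put(row, c - 1);
            }
        }
        return counts.isEmpty();
    }
}
```

In a test you would read each query's ResultSet with readRows and pass both lists to sameRowsIgnoringOrder. Be aware that getObject may return different Java types (say Integer versus Long) for expressions that are logically equal, which would make rows compare unequal.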

Related

How can one see the SQL statements that jOOQ executes at Compile Time?

I use jOOQ to query/insert/update data from/into a table.
Is there a way to see the SQL statements that jOOQ executes at compile time instead of through run-time logging?
The following answer shows them at run time: How can one see the SQL statements that jOOQ executes?
This tool only converts various SQL dialects. https://www.jooq.org/translate/

Statically evaluating a jOOQ query
While it might be possible to build some IDE plugins that are capable of evaluating some static-ish jOOQ statements, remember that in principle and by design, every jOOQ query is a dynamic SQL query. When you write something as simple as:
Result<?> r = ctx.select(T.A, T.B).from(T).fetch();
What the JVM sees (roughly) is:
Field<?> a = T.A;
Field<?> b = T.B;
Field<?>[] select = { a, b };
SelectFromStep<?> s1 = ctx.select(select);
Table<?> t = T;
SelectWhereStep<?> s2 = s1.from(t);
Result<?> r = s2.fetch();
Of course, no one is using jOOQ this way. The DSL was designed to produce call chains that look almost like SQL through its fluent API design. So, your query looks like it's static SQL (which could be evaluated in an IDE), but it is not. And you will often use the dynamic SQL capabilities, e.g.
Result<?> r = ctx
.select(T.A, T.B)
.from(T)
// Dynamic where clause
.where(someCondition ? T.A.eq(1) : T.B.gt(2))
.fetch();
There's no way an IDE could evaluate all this, including all of your SPI implementations, such as the ExecuteListener or the VisitListener, so again, even if it worked for some cases, it would work poorly for many others.
You'll have to execute your query to see the actual SQL (for that specific execution). Or, you put a breakpoint on your fetch() call, and evaluate the query object upon which fetch() is called in the debugger.
The underlying, actual problem
Whenever I see this question, I think there's an underlying actual problem that manifests in this desire of running the jOOQ query outside of your Java code. The problem is that your code seems to be hard to integration test.
This can't be fixed easily, but it is a good reminder that when you start from scratch, you should make all of your SQL (jOOQ or not) easily integration-testable using:
Something like testcontainers
By separating concerns and moving your SQL logic into an appropriate layer that can be easily integration tested independently of any other logic (UI, etc.)
With such an approach, you will be able to test your jOOQ queries in a much better feedback cycle, in which case you probably won't even think of running the jOOQ query outside of your Java code again, at least most of the time.

Unit test for a large SELECT query with jOOQ

I am using jOOQ for working with a relational database. I have a SELECT query for which I need to write unit tests with mocking. Based on this doc and this post, I need to define my own data provider, which should look something like this:
class MyProvider implements MockDataProvider {
DSLContext create = DSL.using(SQLDialect.MYSQL);
@Override
public MockResult[] execute(MockExecuteContext mockExecuteContext) throws SQLException {
MockResult[] mock = new MockResult[1];
String sql = mockExecuteContext.sql();
if (sql.startsWith("select")) {
Result<Record2<String, String>> result = create.newResult(COL_1, COL_2);
result.add(create.newRecord(COL_1, COL_2)
.values("val1", "val2"));
mock[0] = new MockResult(1, result);
}
return mock;
}
}
where COL_1 and COL_2 are defined as follows:
Field<String> COL_1 = field("Column1", String.class);
Field<String> COL_2 = field("Column2", String.class);
It's quite simple and straightforward when the SELECT is small (as in the above example, just 2 columns). I am wondering how it should be done for complex and large selects. For instance, I have a SELECT statement which selects 30+ columns from multiple table joins. It seems the same approach of
Result<Record_X<String, ...>> result = create.newResult(COL_1, ...);
result.add(create.newRecord(COL_1, ...)
.values("val1", ...));
does not work in case of more than 22 columns.
Any help is appreciated.

Answering your question
There is no such limitation as a maximum of 22 columns. As documented here:
Higher-degree records
jOOQ chose to explicitly support degrees up to 22 to match Scala's typesafe tuple, function and product support. Unlike Scala, however, jOOQ also supports higher degrees without the additional typesafety.
You can still construct a record with more than 22 fields using DSLContext.newRecord(Field...). Now, there is no values(Object...) method on the Record type, because the Record type is the super type of all the Record1 - Record22 types. If such an overload were present, then the type safety on the sub types would be lost, because the values(Object...) method is applicable for all types of arguments. This might be fixed in the future by introducing a new RecordN subtype.
But you can load data into your record with other means, e.g. by calling Record.fromArray(Object...):
Record record = create.newRecord(COL_1, ...);
record.fromArray("val1", ...);
result.add(record);
The values() method is mere convenience (adding type safety) for fromArray().
Disclaimer:
I'm assuming you read the disclaimer on the documentation page you've linked. I'm posting it here anyway for other readers of this question, who might not have read the disclaimer:
Disclaimer: The general idea of mocking a JDBC connection with this jOOQ API is to provide quick workarounds, injection points, etc. using a very simple JDBC abstraction. It is NOT RECOMMENDED to emulate an entire database (including complex state transitions, transactions, locking, etc.) using this mock API. Once you have this requirement, please consider using an actual database instead for integration testing, rather than implementing your test database inside of a MockDataProvider.
It seems you're about to re-implement a database which can "run" any type of query, including a query with 23+ columns, and every time you change the query under test, you will also change this test here. I still recommend you do integration testing instead, using testcontainers or even with H2, which will help cover many more queries than any such unit test approach. Here's a quick example showing how to do that: https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples/jOOQ-testcontainers-example
Also, integration tests will help test query correctness. Unit tests like these will only provide dummy results, irrespective of the actual query. It is likely that such mocks can be implemented much more easily on a higher level than the SQL level, i.e. by mocking the DAO, or repository, or whatever methods, instead.
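Mocking one level above SQL could look like this. A minimal sketch in which the query (jOOQ or otherwise) sits behind a repository interface, so the unit test supplies canned data without any MockDataProvider; UserRepository and UserReport are hypothetical names.

```java
import java.util.List;

// Hypothetical repository; the production implementation would run the jOOQ query.
interface UserRepository {
    List<String> findActiveUserNames();
}

// Hypothetical service under test: it depends only on the interface, not on SQL.
class UserReport {
    private final UserRepository users;

    UserReport(UserRepository users) {
        this.users = users;
    }

    String summary() {
        List<String> names = users.findActiveUserNames();
        return names.size() + " active users: " + String.join(", ", names);
    }
}
```

In a unit test, a lambda such as `() -> List.of("Ann", "Bob")` can stand in for the repository, while the real jOOQ implementation is covered by integration tests against an actual database.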

Using JOOQ, what more do I need to prevent sql injections

How is this a duplicate? I am specifically asking about jOOQ here.
I am using jOOQ in my Java project to handle all my PostgreSQL queries. I read in this article that jOOQ uses prepared statements to execute all queries.
Is it then safe to assume that I don't need to worry about SQL injection or user input when executing my queries?
Do I not need to escape the user input before passing it to jOOQ?
On a side note, which other vulnerabilities to my DB are there when accepting user input (apart from those solved by prepared statements) that I should be careful of?

1) Yes, as long as you use the provided APIs correctly. It is still possible to inject plain SQL queries, though, so be careful.
All methods in the jOOQ API that allow for plain (unescaped, untreated) SQL contain a warning message in their relevant Javadoc:
// This query will use bind values, internally.
create.fetch("SELECT * FROM BOOK WHERE ID = ? AND TITLE = ?", 5, "Animal Farm");
// This query will not use bind values, internally.
create.fetch("SELECT * FROM BOOK WHERE ID = 5 AND TITLE = 'Animal Farm'");
See JOOQ docs here for a more in depth explanation: https://www.jooq.org/doc/3.9/manual/sql-building/bind-values/sql-injection/
2) No, see above.
3) Aside from that, just beware of general DB security issues, such as user authentication/roles and storing sensitive data in an unencrypted format, etc.

Little risk when using jOOQ as intended
When you use jOOQ as intended, then you will run into little risk of SQL injection. The intended usage is:
Using source code generation to generate meta data for your tables / columns, etc.
Using the DSL for type safe embedded SQL
As others have mentioned, jOOQ will always use bind variables and properly escape all inlined values (constants, literals). But again, as others have mentioned, jOOQ still allows for plain SQL templating for those cases where you need to work around a lack of functionality or vendor-specific feature support. In those cases, you have to be as careful as with JDBC and make sure to explicitly use bind variables and avoid string concatenation yourself.
Preventing accidents with the PlainSQLChecker annotation processor
One way to prevent accidental use of plain SQL templating, and to make sure no one on the team uses it without approval, is to use jOOQ's Checker Framework / Error Prone integration and disallow all plain SQL usage by default. With Maven, you could configure this (leaving out the JDK version specific details):
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<annotationProcessors>
<annotationProcessor>org.jooq.checker.PlainSQLChecker</annotationProcessor>
</annotationProcessors>
</configuration>
</plugin>
And now your code using methods like DSL.query(String) won't compile anymore until you explicitly allow it with the @Allow.PlainSQL annotation on the scope of your choice (method, class, package).

It's always possible to write unsafe queries, no matter what language and framework you use.
The naive way of concatenating variables into SQL creates an opportunity for SQL injection:
String unsafeString = "O'Reilly";
create.fetch("SELECT * FROM BOOK WHERE ID = 5 AND TITLE = '"+unsafeString+"'");
// results in SQL syntax error because of unmatched ' marks
Merely using prepared queries does NOT make an unsafe query into a safe query.
Use parameters to separate dynamic values from the SQL query. These are combined within the RDBMS at execution time. There is no way a parameter can cause an SQL injection vulnerability.
String unsafeString = "O'Reilly";
create.fetch("SELECT * FROM BOOK WHERE ID = 5 AND TITLE = ?", unsafeString);
// still OK
When you use parameters, you don't need to do any escaping of the variables. In fact, you must not, because you'll end up with escape symbols in your data.
Parameters are good for combining Java variables into an SQL query, but only in the place of an SQL scalar value. That is, where you would normally use a quoted string literal, quoted date literal, or numeric literal in your SQL, you can replace it with a parameter placeholder.
But you can't use parameters for anything else in SQL:
Table names
Column names
Lists of values, for example for an IN ( ... ) predicate—you must use one ? placeholder per individual value in the list.
SQL expressions
SQL keywords
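The two workarounds implied by this list (allow-listing identifiers, one placeholder per IN value) can be sketched as follows. BOOK and its columns come from the examples in this thread; the allow-list contents, SafeQueryParts and its methods are hypothetical.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class SafeQueryParts {
    // Hypothetical allow-list: column names cannot be bound as parameters,
    // so user input may only pick one of these known identifiers.
    private static final Set<String> SORTABLE = Set.of("id", "title", "published");

    // Returns one "?" per value, e.g. "?, ?, ?" for three values.
    static String placeholders(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("IN () needs at least one value");
        }
        return String.join(", ", Collections.nCopies(count, "?"));
    }

    // Builds the SQL text; values stay as placeholders, the sort column is checked against SORTABLE.
    static String findBooks(List<Integer> ids, String sortColumn) {
        if (!SORTABLE.contains(sortColumn)) {
            throw new IllegalArgumentException("Unsupported sort column: " + sortColumn);
        }
        return "SELECT * FROM BOOK WHERE ID IN (" + placeholders(ids.size()) + ") ORDER BY " + sortColumn;
    }

    // Binds the values in order; JDBC parameter indexes start at 1.
    static void bind(PreparedStatement ps, List<?> values) throws SQLException {
        for (int i = 0; i < values.size(); i++) {
            ps.setObject(i + 1, values.get(i));
        }
    }
}
```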
You might like my presentation SQL Injection Myths and Fallacies (video), or my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming
Re comment from @rehas:
Indeed, using prepared statements does not mean you are using parameters implicitly. I showed an example above (my first example), of concatenating an unsafe variable into an SQL string before it is sent to prepare().
Once the SQL string arrives in the RDBMS server, it has no way of knowing which parts of the string were legitimate and which parts were concatenated from unsafe variables. All it sees is one string containing an SQL statement.
The point of using parameters is to keep the (potentially unsafe) variables separated from the SQL string. Within the RDBMS server, the SQL string (still with parameter placeholders like ?) is parsed. Once it's parsed, it won't be parsed again, so it's safe for strings like "O'Reilly" to be bound to the parameter placeholders without risk of causing mismatched quotes or anything. A parameter is guaranteed to be treated as a single value in the SQL execution, even if the value of the parameter contains characters that would have changed the way the query was parsed, had it been included before prepare().
It's not true that using prepare() means you're always using parameters.
It's accurate to say that using parameters requires use of prepare() and execute() as separate steps. But some frameworks do both steps for you. I'm sure if you were to read the jOOQ source code, you'd see it.

Difference about performance betwen Named parameters and Positional parameters

I know that, compared with putting the values directly into the SQL statement, like this:
Statement command = connection.createStatement();
ResultSet rs = command.executeQuery("SELECT * FROM person WHERE name = '" + nameVar + "'");
doing it with JDBC and positional parameters, like this:
String consulta = "SELECT * from Users WHERE name=? and pass =?";
PreparedStatement sentence = conexion.prepareStatement(consulta);
sentence.setString(1, nameVar);
sentence.setString(2, passVar);
is better for several reasons: it avoids SQL injection, uses less memory (the execution plan is cached only once) and performs better (the same execution plan is not computed again and again).
But if you have lots of "?", it may be difficult to correlate each parameter with its variable, and that can cause errors.
My question is, if there is a difference between doing the positional parameters as above with doing "named parameters" like this:
String consulta = "SELECT * from Users WHERE name=:nameParam and pass =:passParam";
sentence = conexion.prepareStatement(consulta);
sentence.setString("nameParam", nameVar);
sentence.setString("passParam", passVar);
because it is easier and avoids errors.
UPDATE
From the comments, it seems that only JPA/Hibernate supports :parameters. Well, the question remains: is there any difference when using Hibernate?

As you point out, you are building a PreparedStatement so that the DB can cache its execution plan, etc. That is where all the performance gain comes from when the driver actually talks to the DB.
The other benefits are for the code. It is DRY-er (Don't Repeat Yourself). You can prevent SQL injection attacks, as you can validate the params and they can't add "structural changes" to the SQL. You can keep a reference to the PreparedStatement object, so there is no need to re-create it, etc.
But you are doing this in both cases, so the main benefits are the same. Under the covers, an array of params is passed to the driver to run the prepared statement.
Positional params will most likely just index into the array (depending on your driver's implementation), whereas named params maintain a map of name to position and generate the correct array when required.
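The name-to-position bookkeeping can be sketched in a few lines. This is a hypothetical helper, not Hibernate's or any driver's actual implementation; it only shows that the named form boils down to the same positional array, with the mapping computed once per statement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedParameterSql {
    // Matches :name placeholders. It does not skip string literals or PostgreSQL :: casts.
    private static final Pattern NAMED = Pattern.compile(":([A-Za-z_][A-Za-z0-9_]*)");

    final String positionalSql;                 // SQL with every :name replaced by ?
    final Map<String, List<Integer>> positions; // name -> 1-based JDBC parameter indexes

    NamedParameterSql(String sql) {
        Map<String, List<Integer>> pos = new LinkedHashMap<>();
        StringBuffer out = new StringBuffer();
        Matcher m = NAMED.matcher(sql);
        int index = 0;
        while (m.find()) {
            index++;
            pos.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(index);
            m.appendReplacement(out, "?");
        }
        m.appendTail(out);
        this.positionalSql = out.toString();
        this.positions = pos;
    }

    // Builds the positional argument array (what the driver ultimately receives).
    Object[] toPositional(Map<String, ?> values) {
        Object[] args = new Object[positions.values().stream().mapToInt(List::size).sum()];
        for (Map.Entry<String, List<Integer>> e : positions.entrySet()) {
            if (!values.containsKey(e.getKey())) {
                throw new IllegalArgumentException("No value for :" + e.getKey());
            }
            for (int i : e.getValue()) {
                args[i - 1] = values.get(e.getKey());
            }
        }
        return args;
    }
}
```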
Given that the number of params in a SQL statement tends to be small (can usually count them on one or two hands) creating the array is very fast, and negligible compared to the cost of running the query over the network to the DB.
So IMHO use whichever helps you reason about your code best. The performance difference is minuscule.

SQL injection prevention with hibernate

I have existing code where the application generates different SQL depending on a lot of conditions and executes it via Hibernate's session.createSQLQuery(). Here, the parameters are concatenated into the SQL string, which resides in the Java class, by normal string replacement. The problem is that I now need to prevent SQL injection. For that, I would have to use getNamedQuery() and bind the parameters, so that Hibernate takes care of special characters. But moving the SQL strings into an XML file is an overhead, because the SQL is generated conditionally. So I decided to do the special-character validation manually, append the values to the query string and execute it as it is now.
Then I checked the source of PreparedStatement and found that it only does this (and swallows the exception):
byte[] arrayOfByte1 = new byte[0];
try
{
arrayOfByte1 = CharsToBytes(this.OdbcApi.charSet, arrayOfChar);
}
catch (UnsupportedEncodingException localUnsupportedEncodingException) {
}
How can I do the same kind of encoding in the Java class for the parameters before concatenating them into the query string, to eliminate SQL injection? Or is there any way I can keep the SQL string as it is, append parameters and use Hibernate to execute the query?

As far as I can tell, you want to create SQL queries on the fly because the combination of conditions (from the UI, I guess) can be very complicated. That's fine. All you need to control are the parameters that the user supplies. For that, you can, and should, still use Hibernate's createSQLQuery(). That method understands either ? for positional parameters (numbered from the beginning of the query string) or the :param_name syntax, for which you then supply named parameters. You don't need to move anything into an XML file.
Section 16.1.7 has examples.
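Conditionally generated SQL and bound parameters combine well. A minimal sketch: the SQL text grows with whichever conditions are present, while user-supplied values are only collected as named parameters. UserSearch, the users table and its columns are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical search builder: conditions are appended dynamically,
// but values are never concatenated into the SQL text.
class UserSearch {
    private final StringBuilder text = new StringBuilder("select * from users where 1=1");
    private final Map<String, Object> params = new LinkedHashMap<>();

    UserSearch nameEquals(String name) {
        if (name != null) {
            text.append(" and name = :name");
            params.put("name", name);
        }
        return this;
    }

    UserSearch minAge(Integer age) {
        if (age != null) {
            text.append(" and age >= :minAge");
            params.put("minAge", age);
        }
        return this;
    }

    String sql() {
        return text.toString();
    }

    Map<String, Object> params() {
        return params;
    }
}
```

Assuming the Hibernate API used in the question, the string from sql() would be passed to session.createSQLQuery(...) and each map entry bound with setParameter(name, value) on the returned query, so a value like "O'Reilly" needs no manual escaping.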

If you need to assemble custom SQL into a query, I've found writing my own criteria classes that includes the custom SQL works well.
You just need to implement the Criterion interface.
https://docs.jboss.org/hibernate/orm/3.5/api/org/hibernate/criterion/Criterion.html
(See also the Hibernate implementation of 'not null': http://www.grepcode.com/file/repo1.maven.org/maven2/org.hibernate/hibernate/3.2.4.sp1/org/hibernate/criterion/NotNullExpression.java?av=f .)
Then you can simply build up each custom query using the normal hibernate criteria API.
https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/querycriteria.html#querycriteria-creating
Sanitising SQL values properly is painful - try really hard to avoid it! ;-)
