I have a performance issue with calling getInt inside a ResultSetExtractor. getInt is called 20000 times; one call costs 0.15 ms, so the overall cost is 24 seconds while running inside the profiler. The execution of the SQL statements takes around 8 seconds (access over the primary key). I use MySQL driver version 5.1.13, MySQL server 5.1.44 and spring-jdbc 3.1.1.
Do you have any ideas to improve the performance?
mut.getMutEffect()[0]=(rs.getInt("leffect_a") != 0);
mut.getMutEffect()[1]=(rs.getInt("leffect_c") != 0);
...
mut.getMutEffect()[19]=(rs.getInt("leffect_y") != 0);
mut.getMutReliability()[0]=rs.getInt("lreliability_a");
...
mut.getMutReliability()[19]=rs.getInt("lreliability_y");
My schema looks like this:
CREATE TABLE mutation (
...
leffect_a BIT NOT NULL,
lreliability_a TINYINT UNSIGNED NOT NULL,
...
leffect_y BIT NOT NULL,
lreliability_y TINYINT UNSIGNED NOT NULL,
...
) ENGINE=MyISAM;
Edit: Within getInt, the method getIntWithOverflowCheck is called, which seems to be expensive. Is it possible to turn off these checks?
Here are some suggestions:
Set the fetch size to a fairly large number via Statement.setFetchSize(). This should reduce the round-trips to the database server while processing the result set (see the sketch after this list).
Ensure the select statement is optimal by profiling
General table optimization, e.g. are you using the correct datatypes? It looks like you could change leffect_a to a BOOLEAN.
Make sure you aren't returning any unnecessary columns in your SELECT statement.
Use PreparedStatement
Avoid scrollable and updatable result sets (neither is the default)
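For illustration, here is a minimal sketch combining the first and last points. It assumes the mutation table from the question and a hypothetical id column; the method name is made up:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

void readMutation(Connection connection, int mutationId) throws SQLException {
    try (PreparedStatement ps = connection.prepareStatement(
            "SELECT leffect_a, lreliability_a FROM mutation WHERE id = ?")) {
        // MySQL Connector/J 5.1 buffers the whole result set by default and only
        // streams rows when the fetch size is Integer.MIN_VALUE. For drivers that
        // honour it, a larger fetch size means fewer round-trips.
        ps.setFetchSize(1000);
        ps.setInt(1, mutationId);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                boolean effect = rs.getInt(1) != 0;  // read by index, left to right
                int reliability = rs.getInt(2);
                // ... populate the domain object here
            }
        }
    }
}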
Two suggestions:
Store the results of getMutEffect() and getMutReliability() in local variables, as they are used repeatedly. The HotSpot JIT might inline and remove the duplicate expressions, but I think it's clearer not to rely on this.
It might be faster to retrieve the values of the ResultSet using their indices instead of the column names. You could even create a local map of names to indices; strangely, for some JDBC drivers this is faster than letting the ResultSet do the mapping.
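A minimal sketch of both suggestions, assuming Mutation is the domain class behind mut in the question:

import java.sql.ResultSet;
import java.sql.SQLException;

// Per-row extraction; Mutation is assumed to be the type of "mut".
void extractRow(ResultSet rs, Mutation mut) throws SQLException {
    boolean[] mutEffect = mut.getMutEffect();        // locals instead of repeated getters
    int[] mutReliability = mut.getMutReliability();

    // findColumn() maps a name to an index; ideally resolve these once per
    // ResultSet (before the row loop), not once per row.
    int idxEffectA = rs.findColumn("leffect_a");
    int idxReliabilityA = rs.findColumn("lreliability_a");

    mutEffect[0] = rs.getInt(idxEffectA) != 0;
    mutReliability[0] = rs.getInt(idxReliabilityA);
    // ... same pattern for the remaining column pairs
}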
Related
When using JDBC's PreparedStatements to query Oracle, consider this:
String qry1 = "SELECT col1 FROM table1 WHERE rownum=? AND col2=?";
String qry2 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
String qry3 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=" + someVariable ;
The logic dictates that the value of rownum is always a constant (1 in this example), while the value of col2 is a changing variable.
Question 1: Are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry1 where rownum value is parameterized, over qry2 where rownum's constant value is hardcoded?
Question 2: Ignoring non-performance considerations (such as SQL Injections, readability, etc.), are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry2 over qry3 (in which the value of col2 is explicitly appended, not parameterized).
Answer 1: There are no performance advantages to using qry1 (a softcoded query) over qry2 (a query with reasonable bind variables).
Bind variables improve performance by reducing query parsing; if the bind variable is a constant there is no extra parsing to avoid.
(There are probably some weird examples where adding extra bind variables improves the performance of one specific query. Like with any forecasting program, occasionally if you feed bad information to the Oracle optimizer the result will be better. But it's important to understand that those are exceptional cases.)
Answer 2: There are many performance advantages to using qry2 (a query with reasonable bind variables) over qry3 (a hardcoded query).
Bind variables allow Oracle to re-use a lot of the work that goes into query parsing (query compilation). For example, for each query Oracle needs to check that the user has access to view the relevant tables. With bind variables, that work only needs to be done once for all executions of the query.
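As a small illustration (reusing the qry2 text from the question; the loop and method name are made up), the point of bind variables is that one parsed statement serves many executions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// One parse, many executions: only the bind value changes between iterations,
// so Oracle can reuse the cached cursor for the unchanged statement text.
void lookupAll(Connection conn, Iterable<String> col2Values) throws SQLException {
    String qry2 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
    try (PreparedStatement ps = conn.prepareStatement(qry2)) {
        for (String value : col2Values) {
            ps.setString(1, value);              // rebind; the SQL text never changes
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}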
Bind variables also allow Oracle to use some extra optimization tricks that only kick in after the Nth run. For example, Oracle can use cardinality feedback to improve the second execution of a query. When Oracle makes a mistake in a plan, for example estimating that a join will produce 1 row when it really produces 1 million, it can sometimes record that mistake and use that information to improve the next run. Without bind variables the next run is a different statement, and Oracle won't be able to fix that mistake.
Bind variables also allow for many different plan management features. Sometimes a DBA needs to change an execution plan without changing the text of the query. Features like SQL plan baselines, profiles, outlines, and DBMS_ADVANCED_REWRITE will not work if the query text is constantly changing.
On the other hand, there are a few reasonable cases where it's better to hard-code the queries. Occasionally an Oracle feature like partition pruning cannot understand the expression and it helps to hardcode the value. For large data warehouse queries the extra time to parse a query may be worth it if the query is going to run for a long time anyway.
(Caching is unlikely to affect either scenario. Result caching of a statement is rare; it's much more likely that Oracle will cache only the blocks of the tables used in the statement. The buffer cache probably does not care whether those blocks are accessed by one statement many times or by many statements one time.)
We use batch statements when inserting as follows:
BatchBindStep batch = create.batch(create
.insertInto(PERSON, ID, NAME)
.values((Integer) null, null));
for (Person p : peopleToInsert) {
batch.bind(p.getId(), p.getName());
}
batch.execute();
This has worked well in the past when inserting several thousands of objects. However, it raises a few questions:
Is there an upper limit to the number of .bind() calls for a batch?
If so, what does the limit depend on?
It seems to be possible to call .bind() again after having executed .execute(). Will .execute() clear previously bound values?
To clarify the last question: after the following code has executed...
BatchBindStep batch = create.batch(create
.insertInto(PERSON, ID, NAME)
.values((Integer) null, null));
batch.bind(1, "A");
batch.bind(2, "B");
batch.execute();
batch.bind(3, "C");
batch.bind(4, "D");
batch.execute();
which result should I expect?
a)              b)
ID NAME         ID NAME
-------         -------
1  A            1  A
2  B            2  B
3  C            1  A
4  D            2  B
                3  C
                4  D
Unfortunately, neither the Javadoc nor the documentation discuss this particular usage pattern.
(I am asking this particular question because if I .execute() every 1000 binds or so to avoid said limit, I need to know whether I can reuse the batch object for several .execute() calls or not.)
This answer is valid as of jOOQ 3.7
Is there an upper limit to the number of .bind() calls for a batch?
Not in jOOQ, but your JDBC driver / database server might have such limits.
If so, what does the limit depend on?
Several things:
jOOQ keeps an intermediate buffer for all of the bound variables and binds them to a JDBC batch statement all at once. So, your client memory might also impose an upper limit. But jOOQ doesn't have any limits per se.
Your JDBC driver might know such limits (see also this article on how jOOQ handles limits in non-batch statements). Known limits are:
SQLite: 999 bind variables per statement
Ingres 10.1.0: 1024 bind variables per statement
Sybase ASE 15.5: 2000 bind variables per statement
SQL Server 2008: 2100 bind variables per statement
I'm not aware of any such limits in Oracle, but there probably are.
Batch size is not the only thing you should tune when inserting large amounts of data. There are also:
Bulk size, i.e. the number of rows inserted per statement
Batch size, i.e. the number of statements per batch sent to the server
Commit size, i.e. the number of batches committed in a single transaction
Tuning your insertion boils down to tuning all of the above. jOOQ ships with a dedicated importing API where you can tune each of these: http://www.jooq.org/doc/latest/manual/sql-execution/importing
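As a rough plain-JDBC illustration of how the three sizes interact (Person is borrowed from the question, the table name and sizes are arbitrary; jOOQ's importing API wraps this kind of loop for you):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Bulk size 2 (rows per INSERT), batch size 1000 (statements per executeBatch),
// commit size 10 (batches per commit). Assumes an even number of rows for brevity.
void insertPeople(Connection conn, List<Person> people) throws SQLException {
    conn.setAutoCommit(false);
    String sql = "INSERT INTO person (id, name) VALUES (?, ?), (?, ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        int statementsInBatch = 0;
        int batchesInTx = 0;
        for (int i = 0; i + 1 < people.size(); i += 2) {
            Person p1 = people.get(i);
            Person p2 = people.get(i + 1);
            ps.setInt(1, p1.getId());
            ps.setString(2, p1.getName());
            ps.setInt(3, p2.getId());
            ps.setString(4, p2.getName());
            ps.addBatch();
            if (++statementsInBatch == 1000) {   // batch size reached
                ps.executeBatch();
                statementsInBatch = 0;
                if (++batchesInTx == 10) {       // commit size reached
                    conn.commit();
                    batchesInTx = 0;
                }
            }
        }
        if (statementsInBatch > 0) {
            ps.executeBatch();                   // flush the last partial batch
        }
        conn.commit();
    }
}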
You should also consider bypassing SQL for insertions into a loader table, e.g. using Oracle's SQL*Loader. Once you've inserted all data, you can move it to the "real table" using PL/SQL's FORALL statement, which is PL/SQL's version of JDBC's batch statement. This approach will outperform anything you do with JDBC.
It seems to be possible to call .bind() again after having executed .execute(). Will .execute() clear previously bound values?
Currently, execute() will not clear the bind values. You'll need to create a new statement instead. This is unlikely to change, as future jOOQ versions will favour immutability in its API design.
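So, if you want to execute every 1000 binds, a sketch using the same API as in your question would create a fresh batch per chunk:

// Create a new batch for every chunk of at most 1000 rows, since execute()
// does not reset the bound values.
int chunkSize = 1000;
BatchBindStep batch = null;
int bound = 0;
for (Person p : peopleToInsert) {
    if (batch == null) {
        batch = create.batch(create
            .insertInto(PERSON, ID, NAME)
            .values((Integer) null, null));
    }
    batch.bind(p.getId(), p.getName());
    if (++bound == chunkSize) {
        batch.execute();
        batch = null;                  // force a fresh statement for the next chunk
        bound = 0;
    }
}
if (batch != null) {
    batch.execute();                   // flush the remainder
}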
Hello mates, hope you're having a nice day. I have a really weird problem getting data from a MySQL database in Java, in this method:
ResultSet tableExistence;
.
.
.
while (tableExistence.next()) {
System.out.println("before : "+tableExistence.getInt(MyHttpServer.COL_ID));
if(tableExistence.getString(MyHttpServer.COL_STATUS).equals(MyHttpServer.STATUS_AVAILABLE)){
System.out.println("after : "+tableExistence.getInt(MyHttpServer.COL_ID));
...
}
Weirdly, the "before" print shows the correct value of the id, but after the if check, the "after" print returns something like 1234125151231. Any idea what causes this problem?
The ResultSet documentation states: "For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once."
So technically, it's somewhat reasonable. However:
I would actually expect any modern, well-supported database to allow you to access the columns in an arbitrary order, multiple times
Just returning an incorrect value, rather than throwing an exception to indicate the problem, is pretty nasty
One explanation could be that you're fetching a clob or blob: the driver may decide in that case that "left to right only" is more reasonable, to avoid having to keep too much data in memory. It's still not great that it just returns corrupted data, though.
So while this is a possible answer, I would also:
Check that neither the connection nor the ResultSet is being used from multiple threads concurrently
Check that you're using a recent version of the MySQL driver
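In any case, the defensive fix is cheap: read each column exactly once, left to right, into locals before doing anything else. This sketch assumes COL_ID is to the left of COL_STATUS in the SELECT:

while (tableExistence.next()) {
    // Read each column once, in left-to-right order, and keep the values in locals.
    int id = tableExistence.getInt(MyHttpServer.COL_ID);
    String status = tableExistence.getString(MyHttpServer.COL_STATUS);
    System.out.println("before : " + id);
    // Constant-first equals() also avoids a NullPointerException for NULL columns.
    if (MyHttpServer.STATUS_AVAILABLE.equals(status)) {
        System.out.println("after : " + id);
        // ...
    }
}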
I am trying to improve the performance of my application. For this, I am replacing the current statements with prepared statements in order to use the Oracle statement cache.
But several of my queries are dynamic (conditional criteria and tables), and I would like to know the best solution to reduce the execution time.
On Stack Overflow, I found two solutions.
The first is to build the query with if statements:
if(a.equals("Test")) {
whereClause.append ("COLUMN1 = ? ");
}
if(b.equals("Test2")) {
whereClause.append ("COLUMN2 = ? ");
} etc...
The second is to create one query with all criteria but to add "OR var is null"
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
In your opinion, which is the best approach for performance, knowing that a query may contain 40-50 criteria and that some tables can be optional too? Or is there another solution?
The database in use is Oracle, accessed through a DataSource, with JDK 1.5.
Thank you in advance for your help.
First, JDBC and Oracle are capable of handling and caching multiple prepared statements.
First approach:
if(a.equals("Test")) {
whereClause.append ("COLUMN1 = ? ");
}
if(b.equals("Test2")) {
whereClause.append ("COLUMN2 = ? ");
}
will certainly lead to more distinct statements and thus to more prepared statements, but each of these statements can be fully optimized at preparation time, as no variable is optional.
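A sketch of the first approach that keeps the generated SQL and the bind values in sync (table and column names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Build the WHERE clause and collect the bind values in the same pass, so the
// parameter indexes always line up with the generated SQL.
PreparedStatement buildQuery(Connection conn, String a, String b) throws SQLException {
    StringBuilder sql = new StringBuilder("SELECT * FROM my_table WHERE 1=1 ");
    List<Object> params = new ArrayList<Object>();
    if ("Test".equals(a)) {
        sql.append("AND COLUMN1 = ? ");
        params.add(a);
    }
    if ("Test2".equals(b)) {
        sql.append("AND COLUMN2 = ? ");
        params.add(b);
    }
    PreparedStatement ps = conn.prepareStatement(sql.toString());
    for (int i = 0; i < params.size(); i++) {
        ps.setObject(i + 1, params.get(i));
    }
    return ps;
}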
Second approach:
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
will result in exactly one prepared statement, but during the prepare the query plan needs to include the possibility that all parameters are non-null.
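Note that with this pattern each criterion consumes two placeholders, so every value has to be bound twice, as in this sketch (conn, a and b as above; a null marks the criterion as inactive):

// One statement covers all combinations: each optional value is bound twice,
// once for the null check and once for the comparison.
String sql = "SELECT * FROM my_table"
           + " WHERE (? IS NULL OR COLUMN1 = ?) AND (? IS NULL OR COLUMN2 = ?)";
PreparedStatement ps = conn.prepareStatement(sql);
ps.setString(1, a);
ps.setString(2, a);
ps.setString(3, b);
ps.setString(4, b);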
So it really becomes a matter of benchmarking and of how many optional parameters there are. If you have too many prepared statements, one might not be cached (but it might also be that it is so seldom used that this does not disturb overall performance).
I also don't know how much overhead the optional parameters in the second approach create, since the optimizer cannot remove them and they may have to be evaluated for each row.
Finally, if you have indexes prepared that cover some combinations of your optional parameters, the first approach should be faster, because during prepare the possible indexes are evaluated and the best one is chosen (query plan).
The biggest win with prepared statements is that the query plan (optimization) is done ahead of time. To be honest, I doubt the optimization will be very big if you have 20 criteria like that and only a few are actually used. Why don't you test it and see which is quickest?
Also, I think making a stored procedure out of this could be very advantageous.
I would use a PreparedStatement with the dynamic query, without any additional logic. If a query repeats, JDBC will find the previously executed cached statement.
I agree with Evgeniy, PreparedStatements are the way to go. If you have many parameters, the upkeep is easier if you use named parameters instead of question marks. See: http://www.javaworld.com/article/2077706/core-java/named-parameters-for-preparedstatement.html. I have used this with about 20 criteria.
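For illustration, Spring's NamedParameterJdbcTemplate offers the same idea out of the box (the linked article shows a hand-rolled equivalent; table, column and parameter names here are made up):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

// Each criterion appears once under a readable name instead of a positional "?",
// and a name may be referenced several times within the SQL.
List<Map<String, Object>> find(DataSource ds, String a, String b) {
    NamedParameterJdbcTemplate tpl = new NamedParameterJdbcTemplate(ds);
    String sql = "SELECT * FROM my_table"
               + " WHERE (:a IS NULL OR COLUMN1 = :a) AND (:b IS NULL OR COLUMN2 = :b)";
    Map<String, Object> params = new HashMap<String, Object>();
    params.put("a", a);
    params.put("b", b);
    return tpl.queryForList(sql, params);
}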
So I have a MySQL DB in which boolean values are stored as BINARY(1). I was asked to investigate why certain queries were slow even though there was an index on the relevant columns. The issue was that when building the SELECT query, the system used the setBoolean method of PreparedStatement, which, as I understand it, converts the value to a MySQL TINYINT. The query found the correct rows but never used the index, since the index was on a binary column. However, if I instead used the setString method and converted the boolean to a string, namely '0' for false and '1' for true, MySQL was able to use the index and find the wanted rows fast.
Basically, the first query is what I got when using setBoolean and the second when using setString:
SELECT someColumn FROM table WHERE binaryColumn = 1   -- does not use the index
SELECT someColumn FROM table WHERE binaryColumn = '1' -- uses the index
In Java the change was this:
PreparedStatement ps1 = ...
ps1.setBoolean(1, true);
...
PreparedStatement ps2 = ...
ps2.setString(1, "1");
...
My question is simply whether there is a better way to do this. Everything works fine, but for some reason I think the code "smells"; I just can't really explain why.
I always prefer setBoolean, because of the abstraction.
The really interesting point is whether your DB uses the index.
The DB's optimizer uses an index only if it makes sense. If you have 1000 entries and a boolean value that only splits them 50/50, the index makes no sense, especially when it's not the PK. But if you add a further limitation so that only 10 rows are returned, a good optimizer should use the index you specified, possibly a composite index on 2 columns (booleanColumn1, stringColumn1).
MySQL uses TINYINT(1) for the SQL BOOL/BOOLEAN type, so I would change the data type to BOOLEAN, in accordance with standard SQL.
Going by what you describe, the issue should then be resolved. By the way, BIT(1) would be another option.
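A minimal sketch of that migration, assuming the table and column names from the question (the literal name table needs quoting in real MySQL):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical one-time migration: align the column type with setBoolean so
// the index can be used again. BOOLEAN is an alias for TINYINT(1) in MySQL.
void migrateAndQuery(Connection connection) throws SQLException {
    try (Statement stmt = connection.createStatement()) {
        stmt.executeUpdate("ALTER TABLE `table` MODIFY binaryColumn BOOLEAN NOT NULL");
    }
    // After the change, setBoolean matches the column type:
    try (PreparedStatement ps = connection.prepareStatement(
            "SELECT someColumn FROM `table` WHERE binaryColumn = ?")) {
        ps.setBoolean(1, true);
        // ... executeQuery() and process the rows
    }
}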