Optimize oracle query with IN clause - java

I have two queries that use an IN clause, and I populate the PreparedStatement using setLong and setString calls.
Query 1
SELECT A, B FROM TABLE1 WHERE A in (SELECT A FROM TABLE2 WHERE C in (?,?,?) )
Query 2
SELECT A, B FROM TABLE1 WHERE A in (?,?)
I am being told that this creates a unique query for each possible set size and pollutes Oracle's SQL cache. Also, Oracle could choose a different execution plan for each query, since the size is not fixed.
What optimizations could be applied to make it better?
Would it be fine if I create an IN-clause list of size 50 and populate the remaining slots with dummy/redundant values?
If I am not wrong, the SELECT statement inside the IN clause will be difficult to optimize unless it is extracted and its result passed in as a plain list of values.

I am being told that it creates a unique query for each possible set size and pollutes Oracle's SQL cache.
This is correct, assuming that the number of items in the IN list can change between requests. If the number of question marks inside the IN list remains the same, there would be no "pollution" of the cache.
Also, oracle could choose different execution plans for each query here as size is not fixed.
That is correct, too. It's a good thing, though.
What optimizations could be applied to make it better? Would it be fine if I create in-clause list of size 50 and populate remaining ones using dummy/redundant variables?
Absolutely. I have used this trick many times: rather than generating a list of the exact size, I generated lists whose length was divisible by a certain number (I used 16, but 50 is also fine). If the size of the actual list wasn't divisible by 16, I repeated the last item as many times as required to reach the correct length.
The only optimization this achieves is reducing the number of entries in the query plan cache.
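The padding trick can be sketched in plain Java. This is a minimal sketch; the class and helper names (InListPadding, pad, placeholders) are hypothetical, and it relies on the fact that duplicates in an IN list do not change the result of the membership test:

```java
import java.util.ArrayList;
import java.util.List;

public class InListPadding {

    /**
     * Pads a non-empty list by repeating its last value until the size
     * is a multiple of `step`, so only a few statement shapes exist.
     */
    static List<Long> pad(List<Long> values, int step) {
        List<Long> padded = new ArrayList<>(values);
        long last = values.get(values.size() - 1);
        while (padded.size() % step != 0) {
            padded.add(last);
        }
        return padded;
    }

    /** Builds "?,?,?" with one placeholder per value. */
    static String placeholders(int n) {
        return String.join(",", java.util.Collections.nCopies(n, "?"));
    }

    public static void main(String[] args) {
        List<Long> padded = pad(List.of(1L, 2L, 3L), 16);
        String sql = "SELECT A, B FROM TABLE1 WHERE A IN ("
                + placeholders(padded.size()) + ")";
        System.out.println(sql);
        // PreparedStatement binding would then loop:
        // for (int i = 0; i < padded.size(); i++) ps.setLong(i + 1, padded.get(i));
    }
}
```

With a step of 16, any actual list size from 1 to 16 produces the same statement text, so the cursor cache sees at most one entry per multiple of 16 rather than one per distinct list size.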

Related

Jooq limit before join

I have two tables with a 1-n relation, i.e. a table Order which stores orders and a table OrderPosition which stores positions of an order.
When fetching from the DB while joining both tables, I want to limit the number of orders. Limiting after the join of the two tables will obviously not work, as one order may result in multiple records, depending on the number of positions the order has.
This is what I wrote with Jooq now:
final Table<Record> alias = context.select().from(ORDER).limit(1).asTable();
final Result<Record> result = context
    .select()
    .from(alias.join(ORDER_POSITION)
               .on(ORDER_POSITION.ORDER_ID.eq(alias.field(ORDER.ID))))
    .fetch();
This does not seem to limit the number of orders; it returns more than one order. On the other hand, if I replace the join with a leftJoin, the limit works just as intended (also for different limit parameters). I used an H2 DB to test the query (not sure if it matters).
I'm aware of the difference between join and leftJoin, but should the limit work as intended in both cases, or did I miss something?
This is what is generated by Jooq with a join:
select
  "alias_129458832"."ID",
  "alias_129458832"."KEY",
  ...
  "PUBLIC"."ORDER_POSITION"."ID",
  "PUBLIC"."ORDER_POSITION"."ORDER_ID"
from (
  select
    "PUBLIC"."ORDER"."ID",
    "PUBLIC"."ORDER"."KEY",
    ...
  from "PUBLIC"."ORDER"
  limit ?
) "alias_129458832"
join "PUBLIC"."ORDER_POSITION"
  on "PUBLIC"."ORDER_POSITION"."ORDER_ID" = "alias_129458832"."ID"
If I replace the join with a leftJoin, it generates the same query, except that the join is replaced by 'left outer join'.
What I tested
I ran the tests with an in-memory H2 database. I have a couple of orders without any positions, one order with one position, and one order with three positions.
With the join and a limit of 1, I got all 4 joined records (which means it returned both orders); with a limit of 0, I got no records. Each order has a random key, regenerated for each test. If I order the orders by this key with
context.select().from(ORDER).orderBy(ORDER.KEY).limit(4).asTable();
and a limit of 4, I sometimes got no records, sometimes one record (the order with one position), sometimes three records (the order with three positions), and sometimes all 4 records (i.e. both orders).
As mentioned, if I replace the join with a leftJoin, I of course also get the orders without any positions, but then the number of orders specified by the limit is always correct.

Differences between using sql IN() with subselect and code-generated string

Imagine we have an SQL statement such as
SELECT something FROM TableName WHERE something NOT IN (SELECT ...);
And the result of the second SELECT is huge.
So what if I replace the second SELECT with a generated string value such as
"a1, a2, a3, ... an", where n is a really big number? Will I get an error that the SQL query is too large? Is this size limited? Is the limit different for the result of the second SELECT and for a generated string?
This depends entirely on your database engine/server. You can play with database-specific settings to overcome, or at least extend, some of these limits.
But overall I think you should look for solutions like a JOIN instead of subqueries. That approach has some advantages.
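One concrete limit worth knowing: Oracle rejects an IN list with more than 1000 literal expressions (ORA-01795), and other engines have their own statement-size caps. A common workaround, sketched below with hypothetical names (InListChunker, chunk), is to split the generated value list into bounded chunks and run one query per chunk; loading the values into a temporary table and joining against it is usually better still:

```java
import java.util.ArrayList;
import java.util.List;

public class InListChunker {

    /** Splits ids into chunks of at most `maxSize`, so each chunk fits one IN list. */
    static <T> List<List<T>> chunk(List<T> ids, int maxSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += maxSize) {
            chunks.add(ids.subList(i, Math.min(i + maxSize, ids.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) ids.add(i);

        // Three queries of at most 1000 items each, instead of one oversized IN list.
        for (List<Integer> c : chunk(ids, 1000)) {
            System.out.println("... WHERE something NOT IN (<" + c.size() + " binds>)");
        }
    }
}
```

Note that splitting a NOT IN across several queries changes the semantics (each query only excludes its own chunk), so for NOT IN the temporary-table join or NOT EXISTS rewrite is the safer route; chunking works directly for plain IN.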

Wrong order of rows after inserting to Oracle db using Java [duplicate]

In Oracle, what is the default ordering of rows for a select query if no ORDER BY clause is specified?
Is it
the order in which the rows were inserted
there is no default ordering at all
none of the above.
According to Tom Kyte: "Unless and until you add "order by" to a query, you cannot say ANYTHING about the order of the rows returned. Well, short of 'you cannot rely on the order of the rows being returned'."
See this question at asktom.com.
As for ROWNUM, it doesn't physically exist, so it can't be "freed". ROWNUM is assigned after a record is retrieved from a table, which is why "WHERE ROWNUM = 5" will always fail to select any records.
#ammoQ: you might want to read this AskTom article on GROUP BY ordering. In short:
Does a GROUP BY clause in a query guarantee that the output data will be sorted on the GROUP BY columns in order, even if there is NO ORDER BY clause?
and we said...
ABSOLUTELY NOT. It never has, it never did, it never will.
There is no explicit default ordering. For obvious reasons, if you create a new table, insert a few rows and do a "select *" without a "where" clause, it will (very likely) return the rows in the order they were inserted.
But you should never ever rely on a default order happening. If you need a specific order, use an ORDER BY clause. For example, in Oracle versions up to 9i, doing a GROUP BY also caused the rows to be sorted by the group expression(*). In 10g, this behaviour no longer exists! Upgrading Oracle installations has caused me some work because of this.
(*) disclaimer: while this is the behaviour I observed, it was never guaranteed
It has already been said that Oracle is allowed to give you the rows in any order it wants, when you don't specify an ORDER BY clause. Speculating what the order will be when you don't specify the ORDER BY clause is pointless. And relying on it in your code, is a "career limiting move".
A simple example:
SQL> create table t as select level id from dual connect by level <= 10
2 /
Table created.
SQL> select id from t
2 /
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
SQL> delete t where id = 6
2 /
1 row deleted.
SQL> insert into t values (6)
2 /
1 row created.
SQL> select id from t
2 /
ID
----------
1
2
3
4
5
7
8
9
10
6
10 rows selected.
And this is only after a simple delete + insert. And numerous other situations are thinkable: parallel execution, partitions, index-organized tables, to name just a few.
Bottom line, as already very well said by ammoQ: if you need the rows sorted, use an ORDER BY clause.
You absolutely, positively cannot rely on any ordering unless you specify order by. For Oracle in particular, I've actually seen the exact same query (without joins), run twice within a few seconds of each other, on a table that didn't change in the interim, return a wildly different order. This seems to be more likely when the result set is large.
The parallel execution mentioned by Rob van Wijk probably explains this. See also Oracle's Using Parallel Execution doc.
It is also impacted by indexes:
if there is an index, rows may come back in ascending index order;
if there is no index, they may come back in the order they were inserted.
You can influence the order in which data is physically stored in the table with the ORGANIZATION clause of the CREATE TABLE statement (index-organized tables).
Although it should be rownum (your #2), it really isn't guaranteed and you shouldn't trust it 100%.
I believe it uses Oracle's ROWNUM pseudocolumn.
So your #1 is probably right, assuming there were no deletes done that might have freed rownums for later use.
EDIT: As others have said, you really shouldn't rely on this, ever. Besides deletes, there are a lot of different conditions that can affect the default ordering behavior.

Is there an upper limit to the number of bind calls in a JOOQ batch statement?

We use batch statements when inserting as follows:
BatchBindStep batch = create.batch(create
    .insertInto(PERSON, ID, NAME)
    .values((Integer) null, null));

for (Person p : peopleToInsert) {
    batch.bind(p.getId(), p.getName());
}
batch.execute();
This has worked well in the past when inserting several thousand objects. However, it raises a few questions:
Is there an upper limit to the number of .bind() calls for a batch?
If so, what does the limit depend on?
It seems to be possible to call .bind() again after having executed .execute(). Will .execute() clear previously bound values?
To clarify the last question: after the following code has executed...
BatchBindStep batch = create.batch(create
    .insertInto(PERSON, ID, NAME)
    .values((Integer) null, null));

batch.bind(1, "A");
batch.bind(2, "B");
batch.execute();

batch.bind(3, "C");
batch.bind(4, "D");
batch.execute();
which result should I expect?
a)

ID NAME
-------
1  A
2  B
3  C
4  D

b)

ID NAME
-------
1  A
2  B
1  A
2  B
3  C
4  D
Unfortunately, neither the Javadoc nor the documentation discuss this particular usage pattern.
(I am asking this particular question because if I .execute() every 1000 binds or so to avoid said limit, I need to know whether I can reuse the batch objects for several .execute() calls or not.)
This answer is valid as of jOOQ 3.7
Is there an upper limit to the number of .bind() calls for a batch?
Not in jOOQ, but your JDBC driver / database server might have such limits.
If so, what does the limit depend on?
Several things:
jOOQ keeps an intermediate buffer for all of the bound variables and binds them to a JDBC batch statement all at once. So, your client memory might also impose an upper limit. But jOOQ doesn't have any limits per se.
Your JDBC driver might impose such limits (see also this article on how jOOQ handles limits in non-batch statements). Known limits are:
SQLite: 999 bind variables per statement
Ingres 10.1.0: 1024 bind variables per statement
Sybase ASE 15.5: 2000 bind variables per statement
SQL Server 2008: 2100 bind variables per statement
I'm not aware of any such limits in Oracle, but there probably are.
Batch size is not the only thing you should tune when inserting large amounts of data. There are also:
Bulk size, i.e. the number of rows inserted per statement
Batch size, i.e. the number of statements per batch sent to the server
Commit size, i.e. the number of batches committed in a single transaction
Tuning your insertion boils down to tuning all of the above. jOOQ ships with a dedicated importing API where you can tune all of the above: http://www.jooq.org/doc/latest/manual/sql-execution/importing
You should also consider bypassing SQL and inserting into a loader table, e.g. using Oracle's SQL*Loader. Once you've inserted all the data, you can move it to the "real" table using PL/SQL's FORALL statement, which is PL/SQL's version of JDBC's batch statement. This approach will outperform anything you do with JDBC.
It seems to be possible to call .bind() again after having executed .execute(). Will .execute() clear previously bound values?
Currently, execute() will not clear the bind values. You'll need to create a new statement instead. This is unlikely to change, as future jOOQ versions will favour immutability in its API design.
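Since execute() does not clear the bound values, a chunked insert needs a fresh statement per chunk. The driver logic can be sketched generically; the ChunkedBatcher/runInChunks names are hypothetical, and the jOOQ calls appear only as comments because they require a live DSLContext:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedBatcher {

    /**
     * Calls `executeBatch` once per chunk of at most `batchSize` rows and
     * returns the number of batches executed. Each call is expected to
     * build and execute a brand-new batch statement.
     */
    static <T> int runInChunks(List<T> rows, int batchSize, Consumer<List<T>> executeBatch) {
        int batches = 0;
        for (int i = 0; i < rows.size(); i += batchSize) {
            executeBatch.accept(rows.subList(i, Math.min(i + batchSize, rows.size())));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2300; i++) rows.add(i);

        int batches = runInChunks(rows, 1000, chunk -> {
            // jOOQ sketch (not compiled here) -- a new BatchBindStep per chunk:
            // BatchBindStep batch = create.batch(create
            //     .insertInto(PERSON, ID, NAME)
            //     .values((Integer) null, null));
            // chunk.forEach(p -> batch.bind(p.getId(), p.getName()));
            // batch.execute();
        });
        System.out.println(batches + " batches executed");
    }
}
```

Creating the statement anew per chunk sidesteps the question of whether execute() clears bind values entirely, at the cost of re-rendering the SQL once per chunk.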

Better to query once, then organize objects based on returned column value, or query twice with different conditions?

I have a table which I need to query, then organize the returned objects into two different lists based on a column value. I can either query the table once, retrieving the column by which I would differentiate the objects and arrange them by looping through the result set, or I can query twice with two different conditions and avoid the sorting process. Which method is generally better practice?
MY_TABLE
NAME AGE TYPE
John 25 A
Sarah 30 B
Rick 22 A
Susan 43 B
Either SELECT * FROM MY_TABLE, then sort in code based on returned types, or
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'A' followed by
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'B'
Logically, a DB query from Java code will be more expensive than a loop within the code, because querying the DB involves several steps: connecting to the DB, creating the SQL query, firing the query, and getting the results back.
Besides, something can go wrong between firing the first and second query.
With an optimized single query and a loop in the code, you can save a lot of time compared to firing two queries.
In your case, you can sort in the query itself if it helps:
SELECT * FROM MY_TABLE ORDER BY TYPE
If more types are added to your table in the future, you will not need to fire an additional query to retrieve them.
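The single-query-then-split approach can be sketched as follows. The Person record and byType helper are hypothetical; in real code the rows would come from iterating the ResultSet of SELECT * FROM MY_TABLE:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TypePartition {

    record Person(String name, int age, String type) {}

    /** Groups rows fetched by a single query into one list per TYPE value. */
    static Map<String, List<Person>> byType(List<Person> rows) {
        return rows.stream().collect(Collectors.groupingBy(Person::type));
    }

    public static void main(String[] args) {
        // Rows as they would arrive from SELECT * FROM MY_TABLE:
        List<Person> rows = List.of(
            new Person("John", 25, "A"), new Person("Sarah", 30, "B"),
            new Person("Rick", 22, "A"), new Person("Susan", 43, "B"));

        Map<String, List<Person>> lists = byType(rows);
        System.out.println("A: " + lists.get("A").size() + ", B: " + lists.get("B").size());
    }
}
```

A side benefit of groupingBy over two hand-written lists: a third TYPE value appearing later simply becomes another map entry, with no code change.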
It is heavily dependent on the context. If each list is really huge, I would let the database do the hard part of the job with 2 queries. At the other extreme, in a web application using a farm of application servers and a central database, I would use one single query.
For the general use case, IMHO, I would use only one query and conserve database resources, because the database is a common point of congestion.
The only objective argument I can find is that splitting the list occurs in memory, with a hyper-simple algorithm, in a single JVM, whereas each query requires a bit of initialization and may involve disk access or loading of index pages.
In general, one query performs better.
Also, by issuing two queries you can potentially get inconsistent results (though this can be addressed with a higher transaction isolation level).
In any case, I believe you still need to iterate through the result set (either directly or by using framework methods that return collections).
From the database point of view, you optimally have exactly one statement that fetches exactly everything you need and nothing else. Therefore, your first option is better. But don't generalize that answer in way that makes you query more data than needed. It's a common mistake for beginners to select all rows from a table (no where clause) and do the filtering in code instead of letting the database do its job.
It also depends on your data volume. For instance, with a large data set, a SELECT * without any condition might take some time, but if you have an index on your TYPE column, adding a WHERE clause will reduce the query's execution time. With a small data set, a SELECT * followed by the logic in your Java code is the better approach.
There are four main bottlenecks involved in querying a database.
The query itself - how long the query takes to execute on the server depends on indexes, table sizes etc.
The data volume of the results - there could be hundreds of columns or huge fields and all this data must be serialised and transported across the network to your client.
The processing of the data - java must walk the query results gathering the data it wants.
Maintaining the query - it takes manpower to maintain queries, simple ones cost little but complex ones can be a nightmare.
By careful consideration it should be possible to work out a balance between all four of these factors - it is unlikely that you will get the right answer without doing so.
You can query by two conditions:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B'
This will do both for you at once, and if you want the results sorted, you can add an ORDER BY clause:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B' ORDER BY TYPE ASC
This will sort the results by type, in ascending order.
EDIT:
I didn't notice that you originally wanted two different lists. In that case, you could run this query, find the index where the type changes from 'A' to 'B', and copy the data into two arrays.