Getting Scrollable Resultsets from Oracle DB - java

Our team works closely with an Oracle DB server over JDBC. In one of our changes, I'm calling a stored procedure that returns two different ResultSets. At first my implementation assumed they would be scrollable by default.
After that failed, I looked it up on the Internet.
Everything I read said basically the same thing: use the prepareStatement or prepareCall methods with the appropriate TYPE_SCROLL_INSENSITIVE and CONCUR_READ_ONLY arguments. None of these worked.
Again, the stored procedure I use returns two different result sets, and they are extracted through (ResultSet) rs.getObject("name"). In the examples I found, the ResultSet generally comes straight back from .executeQuery.
My question is: do the scrollability/updatability types passed to prepareCall affect this sort of ResultSet? If so, how do I get them?
I know that the JDBC driver can degrade my request for a scrollable ResultSet. How can I tell if my ResultSet was degraded?
On that note, why aren't ResultSets scrollable by default? What are the best practices, and what is "the cost" of their flexibility?

In Oracle, a cursor is a forward-only structure. All the database knows how to do is fetch the next row (well, technically the next n rows). In order to make a ResultSet seem scrollable, you rely on the JDBC driver.
The JDBC driver has two basic approaches to making ResultSet seem scrollable. The first is to save the entire result set in memory as you fetch data just in case you want to go backwards. Functionally, that works but it has potentially catastrophic results on performance and scalability when a query potentially returns a fair amount of data. The first time some piece of code starts chewing up GB of RAM on app servers because a query returned thousands of rows that included a bunch of long comment fields, that JDBC driver will get rightly pilloried as a resource hog.
The more common approach is for the driver to add a key to the query and to use that key to manage the data the driver caches. So, for example, the driver might keep the last 1000 rows in memory in their entirety but only cache the key for the earlier rows so it can go back and re-fetch that data later. That's more complicated to code, and it also requires that the ResultSet has a unique key. Normally, that's done by trying to add a ROWID to the query. That's why, for example, the Oracle JDBC driver specifies that a scrollable or updatable ResultSet cannot use SELECT * but can use SELECT alias.*: the latter makes it possible for the driver to blindly add a ROWID column to the query.
A ResultSet coming from a stored procedure, however, is completely opaque to the driver; it has no way of getting the query that was used to open the ResultSet, so it has no way to add an additional key column or to go back and fetch the data again. If the driver wanted to make the ResultSet scrollable, it would have to go back to caching the entire ResultSet in memory. Technically, that is entirely possible, but very few drivers will do so since it tends to lead to performance problems. It's much safer to downgrade the ResultSet. Most of the time, the application is in a better position to decide whether it is reasonable to cache the entire ResultSet (because it knows the query will only ever return a small amount of data) or whether it can go back and fetch rows again by their natural key.
You can use the getType() and getConcurrency() methods on your ResultSet to determine whether your ResultSet has been downgraded by the driver.
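As a rough sketch of that check (the class and method names here are made up for illustration, not part of any driver API), you can compare what you asked for against what getType() and getConcurrency() report:

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: compares the requested ResultSet type/concurrency
// with what the driver actually delivered.
final class ScrollCheck {

    /** True when the delivered type or concurrency differs from the request. */
    static boolean wasDowngraded(int requestedType, int requestedConcurrency,
                                 int actualType, int actualConcurrency) {
        return actualType != requestedType || actualConcurrency != requestedConcurrency;
    }

    /** Readable name for the standard java.sql.ResultSet type constants. */
    static String describeType(int type) {
        switch (type) {
            case ResultSet.TYPE_FORWARD_ONLY:       return "TYPE_FORWARD_ONLY";
            case ResultSet.TYPE_SCROLL_INSENSITIVE: return "TYPE_SCROLL_INSENSITIVE";
            case ResultSet.TYPE_SCROLL_SENSITIVE:   return "TYPE_SCROLL_SENSITIVE";
            default:                                return "unknown(" + type + ")";
        }
    }

    /** Inspects a live ResultSet and reports a downgrade on stderr. */
    static boolean check(ResultSet rs, int requestedType, int requestedConcurrency)
            throws SQLException {
        boolean downgraded = wasDowngraded(requestedType, requestedConcurrency,
                                           rs.getType(), rs.getConcurrency());
        if (downgraded) {
            System.err.println("Asked for " + describeType(requestedType)
                    + " but got " + describeType(rs.getType()));
        }
        return downgraded;
    }
}
```

You would call check(...) on the ResultSet you pulled out of the stored procedure, passing the same constants you gave to prepareCall.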

Related

JDBC Pagination: vendor specific sql versus result set fetchSize

There are a lot of different tutorials across the internet about pagination with JDBC/iterating over huge result set.
So, basically there are a number of approaches I've found so far:
Vendor specific sql
Scrollable result set (?)
Holding a plain result set in memory and mapping the rows only when necessary (using fetchSize)
The result set fetch size, either set explicitly, or by default equal
to the statement fetch size that was passed to it, determines the
number of rows that are retrieved in any subsequent trips to the
database for that result set. This includes any trips that are still
required to complete the original query, as well as any refetching of
data into the result set. Data can be refetched, either explicitly or
implicitly, to update a scroll-sensitive or
scroll-insensitive/updatable result set.
Cursor (?)
Custom seek method paging implemented by jOOQ
Sorry for mixing all of these together, but I need someone to clear this up for me.
I have a simple task where a service consumer asks for results with a pageNumber and pageSize. It looks like I have two options:
Use vendor-specific SQL
Hold the connection/statement/result set in memory and rely on the JDBC fetchSize
In the latter case I use rxJava-jdbc, and if you look at the producer implementation, it holds the result set; all you do is call request(long n) and another n rows are processed. Of course, everything is hidden under the Observable sugar of rxJava. What I don't like about this approach is that you have to hold the ResultSet between different service calls and have to clean it up if the client forgets to exhaust or close it. (Note: resultSet here is the Java ResultSet class, not the actual data.)
So, what is the recommended way of doing pagination? Is vendor-specific SQL considered slow compared to holding the connection?
I am using Oracle; a scrollable result set is not recommended for huge result sets because it caches the whole result set on the client side.
Keeping resources open for an indefinite time is a bad thing in general. The database will, for example, create a cursor for you to obtain the fetched rows. That cursor and other resources will be kept open until you close the result set. The more queries you do in parallel the more resources will be occupied and at some point the database will reject further requests due to an exhausted resource pool (e.g. there is a limited number of cursors, that can be opened at a time).
Hibernate, for example, uses vendor specific SQL to fetch a "page" and I would do it just like that.
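For Oracle, a minimal sketch of such a vendor-specific page query could look like the following. It uses the classic nested ROWNUM pattern (newer Oracle versions also accept OFFSET ... FETCH). The class name, method names, and the choice to read only the first column are assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for ROWNUM-based paging; pageNumber is 1-based.
final class OraclePaging {

    /** Wraps an already ORDERed query so that only one page is returned. */
    static String pageQuery(String orderedQuery) {
        return "SELECT * FROM ("
             + "SELECT q.*, ROWNUM rn FROM (" + orderedQuery + ") q WHERE ROWNUM <= ?"
             + ") WHERE rn > ?";
    }

    /** Number of rows to skip before the requested page starts. */
    static long firstRowExclusive(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (long) (pageNumber - 1) * pageSize;
    }

    /** Row number of the last row on the requested page. */
    static long lastRowInclusive(int pageNumber, int pageSize) {
        return firstRowExclusive(pageNumber, pageSize) + pageSize;
    }

    /** Runs the page query and returns the first column of each row as text. */
    static List<String> fetchPage(Connection conn, String orderedQuery,
                                  int pageNumber, int pageSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(pageQuery(orderedQuery))) {
            ps.setLong(1, lastRowInclusive(pageNumber, pageSize));
            ps.setLong(2, firstRowExclusive(pageNumber, pageSize));
            try (ResultSet rs = ps.executeQuery()) {
                List<String> rows = new ArrayList<>();
                while (rs.next()) {
                    rows.add(rs.getString(1));
                }
                return rows;
            }
        }
    }
}
```

Every resource is closed before the method returns, so nothing stays open between service calls. The inner query needs a deterministic ORDER BY, otherwise pages can overlap.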
There are many approaches because there are many different use cases.
Do you actually expect users to fetch every page of the result set? Or are they more likely to fetch the first page or two and try something else if the data they're interested in isn't there? If you are Google, for example, you can be pretty confident that people will look at results from the first page, a small number will look at results from the second page, and a tiny fraction of results will come from the third page. It makes perfect sense in that case to use vendor-specific code to request a page of data and only run that for the next page when the user asks for it. If you expect the user to fetch the last page of the result, on the other hand, running a separate query for each page is going to be more expensive than running a single query and doing multiple fetches.
How long do users need to keep the queries open? How many concurrent users? If you're building an internal application that dozens of users will have access to and you expect users to keep cursors open for a few minutes, that might be reasonable. If you are trying to build an application that will have thousands of users that will be paging through a result over a span of hours, keeping resources allocated is a bad idea. If your users are really machines that are going to fetch data and process it in a loop as quickly as possible, a single ResultSet with multiple fetches makes far more sense.
How important is it that no row is missed, that every row is seen exactly once, and that the results across pages are consistent? Multiple fetches from a single cursor guarantee that every row in the result is seen exactly once. Separate paginated queries might not: new data could have been added or removed between queries being executed, your sort might not be fully deterministic, etc.
A scrollable result set caches results on the client side, which requires memory. But PostgreSQL, for example, does this by default and nobody complains. Some databases simply use the client's memory to hold the whole result set. In most cases the database has to process much more data to re-evaluate the query.
Also, you usually have many more clients than database instances.
Also note that query re-execution using rownum, as implemented by Hibernate, does not guarantee correct (consistent) results if data is modified between executions and the default isolation level is used.
It really depends on the use case. Changing Oracle's init parameter for the maximum number of connections (and, depending on the setup, for open cursors) can require a database restart.
So scrollable result sets and cursors can be used only when you can predict the number of (concurrent) users.

What is the right time to close ResultSet

I have a Java ResultSet that contains a large amount of data (though it would still fit in RAM), retrieved from the database. I'm going to work with this data. From a performance point of view, should I copy the result set's content into some data structure so that I can close the result set as soon as possible and work with the data from the new container, or is it better not to waste time copying and to work with the data directly from the result set?
ResultSets fetch a specified number of rows at a time, depending on the JDBC driver and database being used. So if your query returns a million rows, not all million are resident in memory, unless of course you iterate over the ResultSet and put them in memory yourself. To answer your question directly: close the ResultSet when you are finished reading the rows you need, usually when ResultSet.next() returns false.
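If you do decide to copy, a minimal sketch (hypothetical class and method names, reading a single String column for brevity) is to drain the ResultSet inside try-with-resources so it is closed as soon as next() returns false:

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: copies one column into a list, then closes the ResultSet.
final class ResultSetCopy {

    static List<String> copyColumn(ResultSet rs, String column) throws SQLException {
        List<String> rows = new ArrayList<>();
        // try-with-resources closes the ResultSet even if reading fails midway
        try (ResultSet r = rs) {
            while (r.next()) {
                rows.add(r.getString(column));
            }
        }
        return rows;
    }
}
```

Whether the copy is worth it depends on how long you would otherwise keep the cursor open. The copy itself costs one pass over the data plus the memory for the list.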
JDBC Developers Guide

Read SQL Database in batches

I am using Java to read from a SQL RDBMS and return the results to the user. The problem is that the database table has 155 million rows, which makes the wait time really long.
I wanted to know if it is possible to retrieve results as they come from the database and present them incrementally to the user (in batches).
My query is a simple SELECT * FROM Table_Name query.
Is there a mechanism or technology that can give me callbacks of DB records, in batches until the SELECT query finishes?
The RDBMS that is used is MS SQL Server 2008.
Thanks in advance.
The methods Statement#setFetchSize and Statement#getMoreResults are supposed to allow you to manage incremental fetches from the database. Unfortunately, this is only the interface spec, and vendors may or may not implement it. Memory management during a fetch is really down to the vendor (which is why I wouldn't strictly say that "JDBC just works like this").
From the JDBC documentation on Statement :
setFetchSize(int rows)
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for ResultSet
objects generated by this Statement.
getMoreResults()
Moves to this Statement object's next result, returns true if it is a
ResultSet object, and implicitly closes any current ResultSet object(s)
obtained with the method getResultSet.
getMoreResults(int current)
Moves to this Statement object's next result, deals with any current
ResultSet object(s) according to the instructions specified by the given
flag, and returns true if the next result is a ResultSet object.
The current parameter indicates whether to keep or close the current ResultSet.
Also, this SO response covers the use of setFetchSize with SQLServer 2005 and how it doesn't seem to manage batched fetches. The recommendation is to test this using the 2008 driver or, better, to use the jTDS driver (which gets a thumbs up in the comments).
This response to the same SO post may also be useful as it contains a link to SQLServer driver settings on MSDN.
There's also some good info on the MS technet website but relating more to SQLServer 2005. Couldn't find the 2008 specific version in my cursory review. Anyway, it recommends creating the Statement with:
com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY (2004) scrollability for forward-only, read-only access, and then use the setFetchSize method to tune performance
Using pagination (LIMIT pageno, rows / TOP) might create holes and duplicates, but might be used in combination with checking the last row ID (WHERE id > ? ORDER BY id LIMIT 0, 100).
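A minimal sketch of that "last row ID" idea for SQL Server, which uses TOP instead of LIMIT. The class and method names are made up, and it assumes the table has a unique, indexed numeric key column. The table and column names are concatenated into the SQL, so they must be trusted constants, never user input.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: each page continues after the last key already seen.
final class SeekPaging {

    /** Builds the page query; table and keyColumn must be trusted identifiers. */
    static String seekQuery(String table, String keyColumn) {
        return "SELECT TOP (?) * FROM " + table
             + " WHERE " + keyColumn + " > ? ORDER BY " + keyColumn;
    }

    /**
     * Reads one page and returns the last key seen, to be passed in as
     * lastSeenKey for the next page. Returns lastSeenKey unchanged when
     * there are no more rows.
     */
    static long readPage(Connection conn, String table, String keyColumn,
                         long lastSeenKey, int pageSize) throws SQLException {
        long last = lastSeenKey;
        try (PreparedStatement ps = conn.prepareStatement(seekQuery(table, keyColumn))) {
            ps.setInt(1, pageSize);
            ps.setLong(2, lastSeenKey);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    last = rs.getLong(keyColumn);
                    // hand the row to the caller here
                }
            }
        }
        return last;
    }
}
```

Because each page is defined by a key value instead of an offset, rows inserted or deleted between calls do not shift the page boundaries.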
You may use TYPE_FORWARD_ONLY or FETCH_FORWARD.
This is exactly how a JDBC driver is supposed to work (I remember a bug in an old PostgreSQL driver that caused all fetched records to be stored in memory).
However, it lets you read records as soon as the query starts to fetch them. This is where I would start to search.
For example, Oracle optimizes SELECT * queries for fetching the whole set. That means it can take a long time before the first results appear. You can give hints to optimize for fetching the first results, so you can show the first rows to your user quite fast, but the whole query can take longer to execute.
You should test your query on the console first, to check when it starts to return results. Then try with JDBC and monitor the memory usage while you iterate through the ResultSet. If the memory usage grows fast, check that you have opened the ResultSet in forward-only and read-only mode and, if necessary, update the driver.
If such a solution is not feasible because of memory usage, you can still use cursors manually and fetch N rows (say, 100) in each query.
Cursor documentation for MSSQL, for example: http://msdn.microsoft.com/en-us/library/ms180152.aspx
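The forward-only, read-only approach can be sketched roughly like this. The names are hypothetical, and it reads a single String column to keep it short. Remember that setFetchSize is only a hint, so whether the driver really streams has to be verified by watching memory as described above.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: reads a ResultSet once, front to back, in batches.
final class BatchReader {

    /** Creates a forward-only, read-only statement with a fetch-size hint. */
    static Statement openStreamingStatement(Connection conn, int fetchSize)
            throws SQLException {
        Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                            ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(fetchSize); // a hint; the driver may ignore it
        return st;
    }

    /** Passes rows to sink in lists of at most batchSize; returns the row count. */
    static int drainInBatches(ResultSet rs, String column, int batchSize,
                              Consumer<List<String>> sink) throws SQLException {
        List<String> batch = new ArrayList<>(batchSize);
        int total = 0;
        while (rs.next()) {
            batch.add(rs.getString(column));
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // sink may keep the old list
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final, possibly shorter batch
        }
        return total;
    }
}
```

The sink is where you would push each batch to the user, so the first rows can be shown while the query is still producing the rest.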

How to update Rows of a JDBC Read Only ResultSet

I'm hitting a problem when trying to update a ResultSet.
I'm querying the database via JDBC, and getting back a resultset which is not CONCUR_UPDATABLE.
I need to replace '_' with ' ' in the specified columns. How could I do that?
String value = derivedResult.getString(column).replace("_", " ");
derivedResult.updateString(column, value);
derivedResult.updateRow();
This works fine on Updatable, but what if it's ResultSet.CONCUR_READ_ONLY?
EDIT:
This will be a JDBC driver that calls other JDBC drivers. My problem is that I need to replace the content of the ResultSets, even if they are forward-only or read-only. If I set scroll_insensitive and updatable there isn't a problem, but there are JDBC drivers that work only with forward-only result sets.
Possible solutions:
Should I try to move the results to an in-memory database and replace the contents there?
Should I implement a ResultSet that acts like all my other classes, i.e. calls the underlying driver's function and modifies the result if needed?
I don't want to use the results afterward to make updates or inserts. Basically, this will be done on select queries.
In my experience updating the result set is only possible for simple queries (select statements on a single table). However, depending on the database, this may change. I would first consult the database documentation.
Even if you create your own resultset which would be updatable, why do you think that the database data would change? It is highly probable (almost certain) that the update mechanism uses code that is not public and only exists in the resultset instance implementation type of the jdbc driver you use.
I hope the above makes sense.
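Since the goal is only to change what a SELECT returns to the caller, not what is stored in the database, the second option from the question (a ResultSet that delegates to the underlying driver and modifies values on the way out) can be sketched with a dynamic proxy. This is an illustration under assumptions, not a complete driver: the class name is made up, only getString(String) is intercepted, and column labels are matched exactly.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.util.Set;

// Hypothetical wrapper: works on read-only and forward-only ResultSets because
// it never calls updateString/updateRow on the underlying driver.
final class ReplacingResultSet {

    static ResultSet wrap(ResultSet delegate, Set<String> columns) {
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                Object result;
                try {
                    result = method.invoke(delegate, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the driver's own exception
                }
                boolean isGetStringByLabel = method.getName().equals("getString")
                        && args != null && args.length == 1
                        && args[0] instanceof String;
                if (isGetStringByLabel && result != null && columns.contains(args[0])) {
                    return ((String) result).replace('_', ' ');
                }
                return result; // every other call passes through untouched
            });
    }
}
```

A real implementation would also need to cover getString(int), getObject, and case-insensitive column labels.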

Does using Limit in query using JDBC, have any effect in performance?

If we use the Limit clause in a query which also has ORDER BY clause and execute the query in JDBC, will there be any effect in performance? (using MySQL database)
Example:
SELECT modelName from Cars ORDER BY manuDate DESC Limit 1
I read in one of the threads in this forum that, by default a set size is fetched at a time. How can I find the default fetch size?
I want only one record. Originally, I was using as follows:
SQL Query:
SELECT modelName from Cars ORDER BY manuDate DESC
In the JAVA code, I was extracting as follows:
if (resultSet.next()) {
    // do something here
}
The LIMIT 1 will definitely have a positive effect on performance. Instead of the entire data set of matches (well, depending on the default fetch size) being returned from the DB server to the Java code, only one row will be returned. This saves a lot of network bandwidth and Java memory usage.
Always delegate constraints like LIMIT, ORDER, WHERE, etc. to SQL as much as possible instead of applying them on the Java side. The DB will do it much better than your Java code ever can (if the table is properly indexed, of course). You should try to write the SQL query so that, as far as possible, it returns exactly the information you need.
The only disadvantage of writing DB-specific SQL queries is that SQL is not entirely portable among different DB servers, which would require you to change the queries every time you change DB servers. But in the real world it's very rare to switch to a completely different DB make anyway. Externalizing SQL strings to XML or properties files should help a lot.
There are two ways the LIMIT could speed things up:
by producing less data, which means less data gets sent over the wire and processed by the JDBC client
by potentially having MySQL itself look at fewer rows
The second one of those depends on how MySQL can produce the ordering. If you don't have an index on manuDate, MySQL will have to fetch all the rows from Cars, then order them, then give you the first one. But if there's an index on manuDate, MySQL can just look at the first entry in that index, fetch the appropriate row, and that's it. (If the index also contains modelName, MySQL doesn't even need to fetch the row after it looks at the index -- it's a covering index.)
With all that said, watch out! If manuDate isn't unique, the ordering is only partially deterministic (the order for all rows with the same manuDate is undefined), and your LIMIT 1 therefore doesn't have a single correct answer. For instance, if you switch storage engines, you might start getting different results.
