Is a ResultSet directly connected with the database? - Java

In Java,
Does the DB driver make a request to the DB for each row of the ResultSet? That is, if 200 rows in total are fetched, will there be 200 requests to the database? Or is the result set stored locally rather than in the DB?
If I populate a ResultSet (based on some conditions in the SQL statement) and then make some updates/changes to the data in the database, will the ResultSet return the updated data?

You should clarify exactly what you mean by "request." Where the data is cached varies by driver and configuration. However, no driver will issue 200 separate queries.
For the second question, it depends on the isolation level.
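For illustration, the visibility of other transactions' committed changes is governed by the isolation level set on the Connection; a minimal sketch (the JDBC URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class IsolationLevelSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            // With READ_COMMITTED, a new query sees changes committed by others,
            // but rows already fetched into an existing ResultSet do not change retroactively.
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            // ... run queries here ...
        }
    }
}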

Each JDBC driver may implement its own fetching strategy. The driver will fetch a number of rows before you iterate through them. Thus a large result set is not pre-fetched in its entirety; the driver scrolls through it in fetch-sized batches.
You can give the JDBC driver a hint as to how many rows you'd like it to fetch at a time.
Just as with in-progress SQL statements, you should not expect to see dirty reads from other transactions unless you really want to and set the isolation level accordingly.
Most commonly, the Connection establishes the session and the network connection to the DB, and the 200 records you speak of are streamed (and multiplexed) through that network connection, with proper flow control.
Another approach is to use a RowSet instead of a ResultSet, whereby you can be notified via events of changes to the affected rows.
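For the RowSet route, the standard javax.sql.rowset API lets you register a listener that fires as the row set's contents or cursor change; a rough sketch with placeholder connection details and a made-up query:

import javax.sql.RowSetEvent;
import javax.sql.RowSetListener;
import javax.sql.rowset.JdbcRowSet;
import javax.sql.rowset.RowSetProvider;

public class RowSetEventsSketch {
    public static void main(String[] args) throws Exception {
        try (JdbcRowSet rowSet = RowSetProvider.newFactory().createJdbcRowSet()) {
            // Placeholder connection details and query.
            rowSet.setUrl("jdbc:mysql://localhost:3306/mydb");
            rowSet.setUsername("user");
            rowSet.setPassword("password");
            rowSet.setCommand("SELECT id, name FROM my_table");

            rowSet.addRowSetListener(new RowSetListener() {
                @Override
                public void rowSetChanged(RowSetEvent event) {
                    System.out.println("Entire row set changed");
                }
                @Override
                public void rowChanged(RowSetEvent event) {
                    // Fired when a row is inserted, updated, or deleted through this RowSet.
                    System.out.println("A row changed");
                }
                @Override
                public void cursorMoved(RowSetEvent event) {
                    // Fired as the cursor moves from row to row.
                }
            });

            rowSet.execute();
            while (rowSet.next()) {
                System.out.println(rowSet.getInt("id") + " " + rowSet.getString("name"));
            }
        }
    }
}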

I know that if you close your Statement, you cannot iterate through the ResultSet.

Related

ResultSet implementation - is it fetched as next() is called, or the results are already in memory?

Assuming that I have to go through all the entries, does anyone know how the results for a ResultSet are fetched?
Can I call SELECT * FROM MyTable instead of SELECT TOP 100 * FROM MyTable ORDER BY id ASC OFFSET 0; and just call resultSet.next() as needed to fetch the results and process them at the program level, or are the results already in memory, meaning that leaving out TOP is bad?
The ResultSet class exposes a
void setFetchSize(int rows)
method, which, per JavaDoc
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for this ResultSet
object.
That means if we have a result set of 200 rows from the database, and we set the fetch size to 100, ~100 rows will be loaded from the database at a time, and two trips to the database may be required.
The default fetch size is driver dependent, but for example, Oracle sets it to 10 rows.
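For illustration, the hint is set on the Statement (or ResultSet) before iterating; a minimal sketch with a made-up query and connection string (actual batching remains driver-specific):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XEPDB1", "user", "password");
             Statement stmt = conn.createStatement()) {

            // Hint: fetch ~100 rows per round trip instead of the driver default.
            stmt.setFetchSize(100);

            try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM my_table")) {
                while (rs.next()) {
                    // Rows are transparently fetched in batches as next() advances.
                    System.out.println(rs.getLong("id") + ": " + rs.getString("name"));
                }
            }
        }
    }
}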
Depends on the DB engine and JDBC driver. Generally, the IDEA behind the JDBC API is that the DB engine creates a cursor (this is also why ResultSets are resources that must be closed), and thus, you can do a SELECT * FROM someTableWithBillionsOfRows without a LIMIT, and yet it can be fast.
Whether it actually is, well, that depends. In my experience, which is primarily with postgres, it IS fast (as in, cursor-based with limited data transfer from DB to VM even if the query would match billions of rows), and thus your plan (select without limits, keep calling next until you have what you want, and then close the ResultSet) should work fine.
NB: Some DB engines meet you halfway and transfer results in batches, for the best of both worlds: Latency overhead is limited (a single latency overhead is shared by batchsize results), and yet the total transfer between DB and VM is limited to only rowsize times batchsize, even if you only read a single row and then close the resultset.
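One caveat for the PostgreSQL case: as far as I know, the pgjdbc driver only uses a server-side cursor when autocommit is off, the ResultSet is forward-only, and a non-zero fetch size is set; a sketch with placeholder connection details and a made-up table name:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PostgresCursorSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

            // Required for the driver to use a server-side cursor.
            conn.setAutoCommit(false);

            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT * FROM some_table_with_billions_of_rows",
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                stmt.setFetchSize(500); // rows per round trip

                try (ResultSet rs = stmt.executeQuery()) {
                    int seen = 0;
                    while (rs.next() && seen < 10_000) {
                        seen++; // stop early; closing the ResultSet discards the rest
                    }
                }
            }
            conn.commit();
        }
    }
}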

Getting Scrollable Resultsets from Oracle DB

Our team works pretty closely with an Oracle DB server using JDBC. In one of our changes, I'm calling a stored procedure which returns two different ResultSets. At first my implementation assumed scrollability by default.
After that failed, I looked it up on the Internet.
Everything I could read about it said basically the same thing: use the prepareStatement or prepareCall methods with the appropriate TYPE_SCROLL_INSENSITIVE and CONCUR_READ_ONLY. None of these worked.
The stored procedure I use, again, returns two different result sets, and they are extracted through a (ResultSet) rs.getObject("name"). In examples, the ResultSets generally come back directly from an .executeQuery.
My question is: do the scrollability/updatability types in the prepareCall methods affect these sorts of ResultSets? If so, how do I get them?
I know that the JDBC driver can downgrade my request for a scrollable ResultSet. How can I tell if my ResultSet was downgraded?
On that note, why aren't ResultSets scrollable by default? What are the best practices, and what is "the cost" of their flexibility?
In Oracle, a cursor is a forward-only structure. All the database knows how to do is fetch the next row (well, technically the next n rows). In order to make a ResultSet seem scrollable, you rely on the JDBC driver.
The JDBC driver has two basic approaches to making ResultSet seem scrollable. The first is to save the entire result set in memory as you fetch data just in case you want to go backwards. Functionally, that works but it has potentially catastrophic results on performance and scalability when a query potentially returns a fair amount of data. The first time some piece of code starts chewing up GB of RAM on app servers because a query returned thousands of rows that included a bunch of long comment fields, that JDBC driver will get rightly pilloried as a resource hog.
The more common approach is for the driver to add a key to the query and to use that key to manage the data the driver caches. So, for example, the driver might keep the last 1000 rows in memory in their entirety but only cache the key for the earlier rows, so it can go back and re-fetch that data later. That's more complicated to code, and it also requires that the ResultSet have a unique key. Normally, that's done by trying to add a ROWID to the query. That's why, for example, the Oracle JDBC driver specifies that a scrollable or updatable ResultSet cannot use a SELECT * but can use SELECT alias.* -- the latter makes it possible for the driver to blindly add a ROWID column to the query.
A ResultSet coming from a stored procedure, however, is completely opaque to the driver -- it has no way of getting the query that was used to open the ResultSet, so it has no way to add an additional key column or to go back and fetch the data again. If the driver wanted to make the ResultSet scrollable, it would have to go back to caching the entire ResultSet in memory. Technically, that is entirely possible, but very few drivers will do so since it tends to lead to performance problems. It's much safer to downgrade the ResultSet. Most of the time, the application is in a better position to decide whether it is reasonable to cache the entire ResultSet, because it knows the query only ever returns a small amount of data, or whether it can go back and fetch rows again by their natural key.
You can use the getType() and getConcurrency() methods on your ResultSet to determine whether your ResultSet has been downgraded by the driver.
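For the downgrade check, a sketch along these lines (the connection string and table are placeholders; note the alias.* per the point above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DowngradeCheckSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XEPDB1", "user", "password");
             // Request a scrollable, read-only ResultSet.
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT t.* FROM my_table t",
                     ResultSet.TYPE_SCROLL_INSENSITIVE,
                     ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = stmt.executeQuery()) {

            // The driver may silently downgrade the request; check what you actually got.
            if (rs.getType() == ResultSet.TYPE_FORWARD_ONLY) {
                System.out.println("Downgraded to forward-only");
            }
            if (rs.getConcurrency() == ResultSet.CONCUR_READ_ONLY) {
                System.out.println("Concurrency is read-only");
            }
        }
    }
}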

Does MySQL use server-side pre-fetching when streaming a ResultSet

The MySQL JDBC connector defines two fetch modes:
the default one fetches the whole ResultSet at once
streaming, when the statement fetchSize is set to Integer.MIN_VALUE
According to the documentation, the streaming will fetch each row individually, one at a time.
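For reference, streaming mode is usually enabled like this (a minimal sketch; the connection details and query are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MySqlStreamingSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {

            // Integer.MIN_VALUE switches Connector/J into row-by-row streaming mode.
            stmt.setFetchSize(Integer.MIN_VALUE);

            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // Process each row as it is streamed from the server.
                }
            }
        }
    }
}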
Is it true that, when using streaming, each row is fetched in a separate database roundtrip?
Does the MySQL server prefetch the result set in advance, or does it traverse the server-side cursor one row at a time too?
I believe the short answer is yes. I don't know the nuances as they apply to mysql_use_result/mysql_store_result, but there are a few types of prefetch:
The InnoDB storage engine underneath has read-ahead, so it will start fetching pages in advance.
Some queries do need to be materialized in full before they can be streamed row at a time (think of a sort without using an index, or a group by without loose index scan). If this happens, the temporary table will show up using the show profiles feature.
Finally, in MySQL 5.6+ the retrieval from the storage engine can be batched (BKA). This is probably the case you were hinting at; the buffer that fills up is called join_buffer_size.

How does fetchLazy work in jooq?

How does fetchLazy work in jooq?
Is it equivalent to doing paginated select with limit and offset?
They're different.
fetchLazy()
... returns a Cursor type, which is jOOQ's equivalent of the JDBC ResultSet type. The query will fully materialise in the database, but jOOQ (JDBC) will fetch rows one-by-one. This is useful
when large result sets need to be fetched without waiting for the data transfer between server and client to finish - as opposed to a simple fetch(), which loads all rows from the server in one go.
when the client doesn't know in advance how many rows they really want to fetch from the server.
LIMIT .. OFFSET
... will reduce the number of returned rows in the database itself, so they never surface in the client. This can heavily improve execution speed on the server, as the server
May choose a different execution plan - e.g. using nested loops instead of hash joins for low values of LIMIT
Doesn't need to keep an open cursor during a long data transfer, as only a few rows are transferred over the wire.
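A rough sketch contrasting the two (the table and field names are hypothetical, and the DSLContext setup is assumed to exist elsewhere):

import org.jooq.Cursor;
import org.jooq.DSLContext;
import org.jooq.Record;
import org.jooq.Result;
import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

public class FetchLazyVsLimitSketch {

    // fetchLazy(): full result in the DB, rows pulled to the client on demand.
    static void lazily(DSLContext ctx) {
        try (Cursor<Record> cursor = ctx.select()
                                        .from(table("big_table"))
                                        .fetchLazy()) {
            while (cursor.hasNext()) {
                Record record = cursor.fetchNext();
                // Stop whenever you like; remaining rows never cross the wire.
            }
        }
    }

    // LIMIT .. OFFSET: the DB itself returns only the requested page.
    static Result<Record> page(DSLContext ctx, int pageSize, int offset) {
        return ctx.select()
                  .from(table("big_table"))
                  .orderBy(field("id"))
                  .limit(pageSize)
                  .offset(offset)
                  .fetch();
    }
}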

Read SQL Database in batches

I am using Java to read from a SQL RDBMS and return the results to the user. The problem is that the database table has 155 million rows, which makes the wait time really long.
I wanted to know if it is possible to retrieve results as they come from the database and present them incrementally to the user (in batches).
My query is a simple SELECT * FROM Table_Name query.
Is there a mechanism or technology that can give me callbacks of DB records, in batches until the SELECT query finishes?
The RDBMS that is used is MS SQL Server 2008.
Thanks in advance.
Methods Statement#setFetchSize and Statement#getMoreResults are supposed to allow you to manage incremental fetches from the database. Unfortunately, this is the interface spec and vendors may or may not implement these. Memory management during a fetch is really down to the vendor (which is why I wouldn't strictly say that "JDBC just works like this").
From the JDBC documentation on Statement:
setFetchSize(int rows)
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for ResultSet
objects generated by this Statement.
getMoreResults()
Moves to this Statement object's next result, returns true if it is a
ResultSet object, and implicitly closes any current ResultSet object(s)
obtained with the method getResultSet.
getMoreResults(int current)
Moves to this Statement object's next result, deals with any current
ResultSet object(s) according to the instructions specified by the given
flag, and returns true if the next result is a ResultSet object.
The current parameter indicates whether to keep or close the current ResultSet(s).
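As an illustration of how these fit together when a call returns several results, a hedged sketch (the procedure name and parameter are made up):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

public class MultipleResultsSketch {
    // Iterates every result produced by a (hypothetical) stored procedure call.
    static void readAllResults(Connection conn) throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{call my_report_proc(?)}")) {
            cs.setInt(1, 42);
            boolean isResultSet = cs.execute();
            while (true) {
                if (isResultSet) {
                    try (ResultSet rs = cs.getResultSet()) {
                        while (rs.next()) {
                            // Process the current ResultSet's rows.
                        }
                    }
                } else if (cs.getUpdateCount() == -1) {
                    break; // no more results of any kind
                }
                isResultSet = cs.getMoreResults();
            }
        }
    }
}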
Also, this SO response discusses the use of setFetchSize with regard to SQL Server 2005 and how it doesn't seem to manage batched fetches. The recommendation is to test this with the 2008 driver, or better yet, to use the jTDS driver (which gets a thumbs up in the comments).
This response to the same SO post may also be useful, as it contains a link to the SQL Server driver settings on MSDN.
There's also some good info on the MS TechNet website, though it relates more to SQL Server 2005; I couldn't find the 2008-specific version in my cursory review. Anyway, it recommends creating the Statement with:
com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY (2004) scrollability for forward-only, read-only access, and then use the setFetchSize method to tune performance
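Putting that recommendation into code might look roughly like this (the connection string, query, and fetch size are placeholders; TYPE_SS_SERVER_CURSOR_FORWARD_ONLY is the driver-specific constant mentioned above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import com.microsoft.sqlserver.jdbc.SQLServerResultSet;

public class SqlServerCursorSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost:1433;databaseName=mydb;user=sa;password=secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT * FROM Table_Name",
                     SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY,
                     ResultSet.CONCUR_READ_ONLY)) {

            // Tune how many rows the driver pulls per server round trip.
            stmt.setFetchSize(1000);

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Hand rows off to the user incrementally, batch by batch.
                }
            }
        }
    }
}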
Using pagination (LIMIT pageno, rows / TOP) might create holes and duplicates, but it can be used in combination with tracking the last row ID read (WHERE id > ? ORDER BY id LIMIT 0, 100).
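A sketch of that last-row-ID (keyset) approach, using SQL Server's TOP syntax (the table and column names are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class KeysetPaginationSketch {
    // Reads the table in batches of 100, keyed on the last id seen so far.
    static void readInBatches(Connection conn) throws SQLException {
        long lastId = 0;
        boolean more = true;

        String sql = "SELECT TOP 100 id, payload FROM Table_Name WHERE id > ? ORDER BY id";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            while (more) {
                stmt.setLong(1, lastId);
                more = false;
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id");
                        more = true;
                        // Present this row to the user as part of the current batch.
                    }
                }
            }
        }
    }
}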
You may use ResultSet.TYPE_FORWARD_ONLY (and FETCH_FORWARD as the fetch direction).
This is exactly how a JDBC driver is supposed to work (I remember a bug in an old PostgreSQL driver that caused all fetched records to be stored in memory).
However, it enables you to read records as soon as the query starts to fetch them. This is where I would start searching.
For example, Oracle optimizes SELECT * queries for fetching the whole set, which means it can take a lot of time before the first results appear. You can give hints to optimize for fetching the first results, so you can show the first rows to your user quite quickly, even though the whole query takes longer to execute.
You should test your query on the console first, to check when it starts to return results. Then try it with JDBC and monitor the memory usage while you iterate through the ResultSet. If the memory usage grows fast, check that you have opened the ResultSet in forward-only, read-only mode, and update the driver if necessary.
If such a solution is not feasible because of memory usage, you can still use cursors manually and fetch N rows (say, 100) in each query.
Cursor documentation for MSSQL is available, for example, here: http://msdn.microsoft.com/en-us/library/ms180152.aspx