Does JDBC have a maximum ResultSet size? - java

Is there a maximum number of rows that a JDBC will put into a ResultSet specifically from a Hive query? I am not talking about fetch size or paging, but the total number of rows returned in a ResultSet.
Correct me if I'm wrong, but the fetch size sets the number of rows the jdbc looks at to process with each pass in the database, inserting appropriate responses into the ResultSet. When it has gone through all the records in the table, it returns the ResultSet to the Java code. I am asking if there is a limit to the number of rows returned to the Java code.
If it doesn't have a maximum number of rows, is there anything inherent with the class that may cause some records to be trimmed off?

No, it does not work that way. JDBC is just a wrapper around native database access, and a JDBC ResultSet works the same way as a database cursor:
you send a query (via JDBC) to the database
the database analyses the query and initializes the cursor (ResultSet in JDBC)
while you fetch data from the ResultSet, the database software walks through the database to get new rows and populates the ResultSet
So there is no limit to the number of rows that a ResultSet can contain, nor to the number of rows that a Java client program can process. The only limit comes if you try to load all the rows into memory, for example to populate a list, and exhaust the client application's memory. But if you process rows and do not keep them in memory, there is no limit (exactly like what happens when you read a file).
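The "process rows and do not keep them" pattern can be sketched as follows. This is a minimal illustration, not driver code: RowStreamer, RowHandler and forEachRow are hypothetical names introduced here, and whether memory really stays flat also depends on the driver not buffering the whole result on its own.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

class RowStreamer {
    /** Hypothetical per-row callback; not part of the JDBC API. */
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /**
     * Walks the cursor forward once and hands each row to the handler.
     * No rows are collected, so this method itself uses constant memory
     * regardless of how many rows the query matches.
     */
    static long forEachRow(ResultSet rs, RowHandler handler) throws SQLException {
        long count = 0;
        while (rs.next()) {
            handler.handle(rs);
            count++;
        }
        return count;
    }
}
```

A caller would pass something like row -> System.out.println(row.getString(1)) as the handler and close the ResultSet afterwards, ideally with try-with-resources.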

Related

ResultSet implementation - is it fetched as next() is called, or the results are already in memory?

Assuming that I have to go through all the entries, does anyone know how the results for a ResultSet are fetched?
Can I call SELECT * FROM MyTable instead of SELECT TOP 100 * FROM MyTable ORDER BY id ASC OFFSET 0; and just call resultSet.next() as needed to fetch the results, and process them on a program level, or are the results already in memory and not putting in TOP is bad?
The ResultSet class exposes a
void setFetchSize(int rows)
method, which, per JavaDoc
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for this ResultSet
object.
That means if we have a result set of 200 rows from the database and we set the fetch size to 100, roughly 100 rows will be loaded from the database at a time, and two trips to the database might be required.
The default fetch size is driver dependent; for example, Oracle's driver sets it to 10 rows.
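The arithmetic behind the 200-row example can be written down as a small helper. This is only an estimate under the assumption that the driver honours the hint exactly; FetchMath and roundTrips are hypothetical names, and a real driver may ignore the hint or need an extra trip to discover the end of the data.

```java
class FetchMath {
    /**
     * Estimated number of fetch round trips: ceil(totalRows / fetchSize).
     * Assumes the driver fetches exactly fetchSize rows per trip.
     */
    static long roundTrips(long totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and fetchSize > 0");
        }
        // Integer ceiling division without floating point.
        return (totalRows + fetchSize - 1) / fetchSize;
    }
}
```

With these assumptions, 200 rows at a fetch size of 100 gives two trips, while the same 200 rows at a fetch size of 10 gives twenty.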
Depends on the DB engine and JDBC driver. Generally, the IDEA behind the JDBC API is that the DB engine creates a cursor (this is also why ResultSets are resources that must be closed), and thus, you can do a SELECT * FROM someTableWithBillionsOfRows without a LIMIT, and yet it can be fast.
Whether it actually is, well, that depends. In my experience, which is primarily interacting with postgres, it IS fast (as in, cursor based with limited data transfer from DB to VM even if the query would match billions of rows), and thus your plan (select without limits, keep calling next until you have what you want and then close the resultset) should work fine.
NB: Some DB engines meet you halfway and transfer results in batches, for the best of both worlds: Latency overhead is limited (a single latency overhead is shared by batchsize results), and yet the total transfer between DB and VM is limited to only rowsize times batchsize, even if you only read a single row and then close the resultset.

How does fetchLazy work in jooq?

How does fetchLazy work in jooq?
Is it equivalent to doing paginated select with limit and offset?
They're different.
fetchLazy()
... returns a Cursor type, which is jOOQ's equivalent of the JDBC ResultSet type. The query will fully materialise in the database, but jOOQ (JDBC) will fetch rows one-by-one. This is useful
when large result sets need to be fetched without waiting for the data transfer between server and client to finish - as opposed to a simple fetch(), which loads all rows from the server in one go.
when the client doesn't know in advance how many rows they really want to fetch from the server.
LIMIT .. OFFSET
... will reduce the number of returned rows already in the database, without them ever surfacing in the client. This can heavily improve execution speed in the server, as the server
May choose a different execution plan - e.g. using nested loops instead of hash joins for low values of LIMIT
Doesn't need to keep an open cursor for a long data transfer time, as only few rows are transferred over the wire.

Fetching bulk data into resultset

Is there any performance issue in fetching bulk data into a result set in Java?
rs = st.executeQuery("SELECT * from ABC")
Table ABC has a large amount of data (say 1 million rows).
Would there be a performance improvement from fetching the data iteratively in small chunks of 1000 rows at a time and operating on each chunk?
Please select only the columns you need and limit the number of rows, in MySQL like:
rs = st.executeQuery("SELECT columnname1,columnname2,.. from ABC LIMIT 10000, 30");
To a certain extent this depends on the type of database you are connecting to and the driver that you are using. For large datasets you should use:
Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
st.setFetchSize(<n>);
where n is a number of rows to fetch.
This will tell the driver not to pull all the data over in one lump. The fetch size is a hint that tells the driver how many rows to fetch each time its local buffer of rows runs empty.
In general it is better to let the jdbc driver handle this rather than try and implement it yourself.
This Java Tutorial tells about as much as you'd ever want to know about cursors.
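As a rough sketch of the advice above, a forward-only, read-only statement with a fetch size hint could be set up like this. ChunkedQuery and forwardOnlyStatement are hypothetical names, and how the hint is honoured still depends on the driver.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ChunkedQuery {
    /**
     * Creates a forward-only, read-only statement and passes the fetch size
     * hint to the driver. The caller owns the returned statement and must close it.
     */
    static Statement forwardOnlyStatement(Connection con, int fetchSize) throws SQLException {
        Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                           ResultSet.CONCUR_READ_ONLY);
        try {
            st.setFetchSize(fetchSize); // a hint; drivers may ignore or reinterpret it
        } catch (SQLException e) {
            st.close(); // do not leak the statement if the hint is rejected
            throw e;
        }
        return st;
    }
}
```

Any ResultSet produced by the returned statement starts with that fetch size, so the query loop itself does not have to do manual chunking.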

fetchsize in resultset set to 0 by default

I have to query a database and the result set is very big. I am using MySQL as the database. To avoid the "OutOfMemoryError", after a lot of searching I found two options: one using LIMIT (specific to the database) and the other using the JDBC fetchSize attribute.
I have tested option 1 (LIMIT) and it works, but it is not the desired solution. I do not want to do it.
Using JDBC I found out that the ResultSet fetch size is set to 0 by default. How can I change this to some other value? I tried the following:
a) First Try:
rs = preparedStatement.executeQuery();
rs.setFetchSize(1000); //Not possible as exception occurs before.
b) Second Try:
rs.setFetchSize(1000); //Null pointer exception(rs is null).
rs = preparedStatement.executeQuery();
c) Third Try:
preparedStatement = dbConnection.createStatement(query);
preparedStatement.setFetchSize(1000);
None of this is working. Any help appreciated!
Edit:
I do not want a solution using limit because:
a) I have millions of rows in my result set. Now doing multiple query is slow. My assumption is that database takes multiple queries like
SELECT * FROM a LIMIT 0, 1000
SELECT * FROM a LIMIT 1000, 2000
as two different queries.
b) The code looks messy because you need to have additional counters.
The MySQL JDBC driver always fetches all rows, unless the fetch size is set to Integer.MIN_VALUE.
See the MySQL Connector/J JDBC API Implementation Notes:
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate, and due to
the design of the MySQL network protocol is easier to implement. If
you are working with ResultSets that have a large number of rows or
large values, and cannot allocate heap space in your JVM for the
memory required, you can tell the driver to stream the results back
one row at a time.
To enable this functionality, create a Statement instance in the
following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch
size of Integer.MIN_VALUE serves as a signal to the driver to stream
result sets row-by-row. After this, any result sets created with the
statement will be retrieved row-by-row.
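Putting the quoted notes together, a streaming read against MySQL might look like the sketch below. MySqlStreaming, RowHandler and stream are hypothetical names; the Integer.MIN_VALUE signal is specific to MySQL Connector/J as described in the notes above, and other drivers may reject that value.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class MySqlStreaming {
    /** Hypothetical per-row callback; not part of JDBC or Connector/J. */
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /** Runs the query in row-by-row streaming mode and returns the number of rows seen. */
    static long stream(Connection conn, String sql, RowHandler handler) throws SQLException {
        try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            // Forward-only + read-only + Integer.MIN_VALUE is the streaming signal
            // described in the Connector/J implementation notes.
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery(sql)) {
                long rows = 0;
                while (rs.next()) {
                    handler.handle(rs); // process and discard; nothing is accumulated
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

The try-with-resources blocks matter here: with a streaming result set the connection is tied up until the rows are fully read or the result set is closed.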
Apart from that, you can either limit the rows in the query itself, like
SELECT * FROM RandomStones LIMIT 1000;
Or
PreparedStatement stmt = connection.prepareStatement(qry);
stmt.setFetchSize(1000);
stmt.executeQuery();
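To set the fetch size for a query, call setFetchSize() on the statement object prior to executing the query. If you set the fetch size to N, the driver is asked to fetch N rows with each trip to the database. This is only a hint: by default MySQL Connector/J still retrieves the whole result set, as the implementation notes quoted above explain.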
In MySQL connector-j 5.1 implementation notes, they said there are 2 ways to handle this situation.
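The first is to make the statement retrieve data row by row; the second supports retrieving rows in batches (cursor-based fetching, enabled with the useCursorFetch=true connection property together with setFetchSize(n)).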

Where Result set is stored while working with jdbc and oracle driver

When I use JDBC with the Oracle driver and run a select query, is the result of the query stored in the Oracle server's memory, on the file system, or in a temp table?
And when I call the next() method to get the next row, is it loaded from the Oracle server's memory into the JVM's memory?
And if I set the fetch size on the result set to 1000, does this mean that 1000 rows are loaded from Oracle to the JDBC driver in the JVM?
A default number of rows (not the entire result set) will be fetched into your local memory. Once you reach the last of the fetched rows (say by calling next()) and try to access the next row, and there are more rows in the result, another round trip is made to the database to fetch the next batch of rows.
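The batch-and-refill behaviour described above can be illustrated with a toy model in plain Java. This is not how any particular driver is implemented: BatchedCursor is a hypothetical class, the Iterator stands in for the server-side cursor, and the model does not count the extra trip a real driver may need to learn that the data has ended.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

class BatchedCursor<T> {
    private final Iterator<T> server;               // stands in for the database-side cursor
    private final int fetchSize;                    // rows pulled per simulated round trip
    private final Deque<T> buffer = new ArrayDeque<>(); // rows held locally (non-null only)
    private int roundTrips = 0;
    private T current;

    BatchedCursor(Iterator<T> server, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be > 0");
        }
        this.server = server;
        this.fetchSize = fetchSize;
    }

    /** Advances to the next row, refilling the local buffer when it runs empty. */
    boolean next() {
        if (buffer.isEmpty()) {
            if (!server.hasNext()) {
                return false; // nothing left on the simulated server
            }
            roundTrips++; // one simulated round trip per refill
            for (int i = 0; i < fetchSize && server.hasNext(); i++) {
                buffer.add(server.next());
            }
        }
        current = buffer.poll();
        return true;
    }

    T get() { return current; }

    int roundTrips() { return roundTrips; }
}
```

In this model, reading 25 rows with a fetch size of 10 costs three refills, and at most 10 rows are held locally at any moment.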
EDIT 1:
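You can see how many rows your result set contains in total by doing this (please verify the syntax). Note that this needs a scrollable ResultSet and reports the total row count, not the number of rows fetched per round trip; the current fetch size hint is available from rs.getFetchSize():
rs.beforeFirst(); // will put cursor before the first row
rs.last(); // will put cursor on the last row
int noOfRows = rs.getRow(); // will give you the current row number, i.e. the row count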
EDIT 2:
If you want to get more rows in the local memory than usual, you may consider CachedRowSet. Even this will make round-trips, but generally less than normal resultset. However, you should consider doing some performance checks for your applications.
Depending on the exact implementation, part of the resultset will be prefetched into the JVM and part of it will either be in the memory of the Oracle server, or will simply be loaded from the database when more rows are requested.
When executing a query, the database does not always need to read all rows from the table before returning data to the client (depending on the access path of the query, the ability of the optimizer, the functionality of the DB, etc.).
When you set the fetchSize() on the Statement, you are only giving a hint to the JDBC driver how much you think it should prefetch. The JDBC driver is free to ignore you. I do not know what the Oracle driver does with the fetchSize(). Most notorious AFAIK is (or maybe was) the MySQL JDBC driver which will always fetch all rows unless you set the fetchSize() to Integer.MIN_VALUE.
I got it from Oracle JDBC documentation:
Fetch size
By default, when Oracle JDBC runs a query, it retrieves a result set
of 10 rows at a time from the database cursor. This is the default
Oracle row fetch size value. You can change the number of rows
retrieved with each trip to the database cursor by changing the row
fetch size value. Standard JDBC also enables you to specify the number
of rows fetched with each database round-trip for a query, and this
number is referred to as the fetch size. In Oracle JDBC, the
row-prefetch value is used as the default fetch size in a statement
object. Setting the fetch size overrides the row-prefetch setting and
affects subsequent queries run through that statement object. Fetch
size is also used in a result set. When the statement object run a
query, the fetch size of the statement object is passed to the result
set object produced by the query. However, you can also set the fetch
size in the result set object to override the statement fetch size
that was passed to it.
https://docs.oracle.com/cd/E11882_01/java.112/e16548.pdf
p. 17-4
After you execute the query, the data is returned to the JVM. The JVM handles all the data I/O from that point.
