fetchsize in resultset set to 0 by default - java

I have to query a database and the result set is very big. I am using MySQL as the database. To avoid an "OutOfMemoryError", after a lot of searching I found two options: one using LIMIT (specific to the database) and the other using the JDBC fetchSize attribute.
I have tested option 1 (LIMIT) and it works, but it is not the desired solution; I do not want to do it.
Using JDBC I found out that the fetch size is set to 0 by default. How can I change this to some other value? I tried the following:
a) First Try:
rs = preparedStatement.executeQuery();
rs.setFetchSize(1000); //Not possible as exception occurs before.
b) Second Try:
rs.setFetchSize(1000); //Null pointer exception(rs is null).
rs = preparedStatement.executeQuery();
c) Third Try:
preparedStatement = dbConnection.prepareStatement(query);
preparedStatement.setFetchSize(1000); //Accepted, but the driver still loads all rows.
None of these works. Any help appreciated!
Edit:
I do not want a solution using limit because:
a) I have millions of rows in my result set, and running multiple queries is slow. My assumption is that the database treats queries like
SELECT * FROM a LIMIT 0, 1000
SELECT * FROM a LIMIT 1000, 1000
as two independent queries.
b) The code looks messy because you need to maintain additional offset counters.
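For illustration, the offset bookkeeping that makes the LIMIT approach messy could be sketched like this (the table name and helper are placeholders, not from the original code):

```java
public class OffsetPaging {
    // Hypothetical helper: builds the LIMIT clause for page n (0-based).
    // Note MySQL's "LIMIT offset, count" syntax: the second number is a row
    // count, not an end offset, so the second page of 1000 rows is
    // "LIMIT 1000, 1000", not "LIMIT 1000, 2000".
    static String pageQuery(String table, int page, int pageSize) {
        return "SELECT * FROM " + table + " LIMIT " + (page * pageSize) + ", " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageQuery("a", 0, 1000)); // SELECT * FROM a LIMIT 0, 1000
        System.out.println(pageQuery("a", 1, 1000)); // SELECT * FROM a LIMIT 1000, 1000
    }
}
```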

The MySQL JDBC driver always fetches all rows, unless the fetch size is set to Integer.MIN_VALUE.
See the MySQL Connector/J JDBC API Implementation Notes:
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate, and due to
the design of the MySQL network protocol is easier to implement. If
you are working with ResultSets that have a large number of rows or
large values, and cannot allocate heap space in your JVM for the
memory required, you can tell the driver to stream the results back
one row at a time.
To enable this functionality, create a Statement instance in the
following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch
size of Integer.MIN_VALUE serves as a signal to the driver to stream
result sets row-by-row. After this, any result sets created with the
statement will be retrieved row-by-row.
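Putting the quoted recipe together, a minimal streaming setup might look like the sketch below (the helper name and connection handling are mine, not from the driver docs):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingStatementDemo {
    // Configures a Connector/J statement for row-by-row streaming.
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE); // the streaming signal
        return stmt;
    }

    public static void main(String[] args) {
        // The three "magic" values the driver checks for:
        System.out.println(ResultSet.TYPE_FORWARD_ONLY); // 1003
        System.out.println(ResultSet.CONCUR_READ_ONLY);  // 1007
        System.out.println(Integer.MIN_VALUE);           // -2147483648
    }
}
```

With such a statement, each rs.next() pulls one row off the wire; note that no other statements can run on the same connection until the streaming result set has been fully read or closed.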

Besides that, you could change your query to limit the rows, like
SELECT * FROM RandomStones LIMIT 1000;
Or set a fetch size on a prepared statement:
PreparedStatement stmt = connection.prepareStatement(qry);
stmt.setFetchSize(1000);
stmt.executeQuery();
To set the fetch size for a query, call setFetchSize() on the statement object prior to executing the query. If you set the fetch size to N, then N rows are fetched with each trip to the database.

In the MySQL Connector/J 5.1 implementation notes, they describe two ways to handle this situation.
The first is to make the statement retrieve data row by row; the second supports batched retrieval.

Related

ResultSet implementation - is it fetched as next() is called, or the results are already in memory?

Assuming that I have to go through all the entries, does anyone know how the results for a ResultSet are fetched?
Can I call SELECT * FROM MyTable instead of SELECT TOP 100 * FROM MyTable ORDER BY id ASC OFFSET 0; and just call resultSet.next() as needed to fetch the results, processing them at the program level, or are the results already in memory, so that leaving out TOP is bad?
The ResultSet class exposes a
void setFetchSize(int rows)
method, which, per JavaDoc
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for this ResultSet
object.
That means that if we have a result set of 200 rows from the database and we set the fetch size to 100, ~100 rows will be loaded from the database at a time, and two trips to the database might be required.
The default fetch size is driver-dependent; Oracle, for example, sets it to 10 rows.
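The round-trip arithmetic in the example above amounts to a ceiling division. A rough model (assuming the driver honors the hint exactly, which it need not):

```java
public class FetchTrips {
    // Number of round trips needed to drain totalRows with a given fetch size.
    static int trips(int totalRows, int fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(trips(200, 100)); // 2
        System.out.println(trips(201, 100)); // 3 -- one extra, nearly empty trip
        System.out.println(trips(200, 10));  // 20 -- with Oracle's default of 10
    }
}
```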
Depends on the DB engine and JDBC driver. Generally, the IDEA behind the JDBC API is that the DB engine creates a cursor (this is also why ResultSets are resources that must be closed), and thus, you can do a SELECT * FROM someTableWithBillionsOfRows without a LIMIT, and yet it can be fast.
Whether it actually is, well, that depends. In my experience, which is primarily interacting with postgres, it IS fast (as in, cursor based with limited data transfer from DB to VM even if the query would match billions of rows), and thus your plan (select without limits, keep calling next until you have what you want and then close the resultset) should work fine.
NB: Some DB engines meet you halfway and transfer results in batches, for the best of both worlds: Latency overhead is limited (a single latency overhead is shared by batchsize results), and yet the total transfer between DB and VM is limited to only rowsize times batchsize, even if you only read a single row and then close the resultset.

Streaming MySql ResultSet with fixed number of results at a time

I have a MySQL table with 16 million records; because of some migration work I'm reading the whole MySQL table.
The following code is used for streaming a large ResultSet in MySQL:
statement = connection.createStatement(
java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
statement.setFetchSize(Integer.MIN_VALUE);
but this is streaming one result at a time; does that mean we are hitting the MySQL server for each row?
While using streaming, can we set something like statement.setFetchSize(1000)?
I want to reduce the number of round trips to the server while streaming a large ResultSet.
I will assume that you are using the official MySQL provided JDBC driver Connector/J.
You are explicitly telling JDBC (and MySQL) to stream the results row-by-row with statement.setFetchSize(Integer.MIN_VALUE);
From the MySQL docs:
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate, and due to
the design of the MySQL network protocol is easier to implement. If
you are working with ResultSets that have a large number of rows or
large values, and can not allocate heap space in your JVM for the
memory required, you can tell the driver to stream the results back
one row at a time.
To enable this functionality, you need to create a Statement instance
in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch
size of Integer.MIN_VALUE serves as a signal to the driver to stream
result sets row-by-row. After this any result sets created with the
statement will be retrieved row-by-row.
Any value other than Integer.MIN_VALUE for the fetch size is ignored by MySQL, and the standard behavior applies. The entire result set will be fetched by the JDBC driver.
Either don't use setFetchSize(), so the JDBC driver will use the default value (0), or set the value to 0 explicitly. Using the value of 0 will also ensure that JDBC doesn't use MySQL cursors, which may occur depending on your MySQL and Connector/J versions and configuration.
To fetch with a fixed number of records at a time (e.g. batches of 1000) instead of record by record streaming with the MySQL JDBC driver, you need to either:
Set useCursorFetch to true and defaultFetchSize to the desired batch size in your JDBC URL, e.g.
jdbc:mysql://localhost/?useCursorFetch=true&defaultFetchSize=1000
As per the MySQL driver documentation, set useCursorFetch to true in the JDBC URL and then call setFetchSize(1000) after creating the statement. See code below:
conn = DriverManager.getConnection("jdbc:mysql://localhost/?useCursorFetch=true", "user", "s3cr3t");
stmt = conn.createStatement();
stmt.setFetchSize(1000);
rs = stmt.executeQuery("SELECT * FROM your_table_here");
See official documentation for more info: https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-implementation-notes.html
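Assembling such a URL can be sketched as below (the helper is illustrative, not part of the Connector/J API; only the useCursorFetch and defaultFetchSize property names come from the docs):

```java
public class CursorFetchUrl {
    // Appends the cursor-fetch properties to a base JDBC URL,
    // using '?' or '&' depending on whether properties already exist.
    static String withCursorFetch(String baseUrl, int batchSize) {
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + "useCursorFetch=true&defaultFetchSize=" + batchSize;
    }

    public static void main(String[] args) {
        System.out.println(withCursorFetch("jdbc:mysql://localhost/test", 1000));
        // jdbc:mysql://localhost/test?useCursorFetch=true&defaultFetchSize=1000
    }
}
```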

Fetching bulk data into resultset

Is there any performance issue in fetching bulk data into a result set in Java?
rs = st.executeQuery("SELECT * FROM ABC");
Table ABC holds a large amount of data (say 1 million rows).
What would the performance improvement be if the data were fetched iteratively, in small chunks of 1000 rows at a time, with operations done on each chunk?
Please use only the columns you need, with a limit on the data in MySQL, like:
rs = st.executeQuery("SELECT columnname1, columnname2, ... FROM ABC LIMIT 10000, 30");
To a certain extent this depends on the type of database you are connecting to and the driver that you are using. For large datasets you should use:
Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                   ResultSet.CONCUR_READ_ONLY);
st.setFetchSize(n);
where n is the number of rows to fetch per round trip.
This tells the driver not to pull all the data over in one lump. The fetch size is a hint telling the driver how many rows to fetch whenever its row buffer is empty.
In general it is better to let the jdbc driver handle this rather than try and implement it yourself.
This Java Tutorial tells about as much as you'd ever want to know about cursors.

Read SQL Database in batches

I am using Java to read from a SQL RDBMS and return the results to the user. The problem is that the database table has 155 Million rows, which make the wait time really long.
I wanted to know if it is possible to retrieve results as they come from the database and present them incrementally to the user (in batches).
My query is a simple SELECT * FROM Table_Name query.
Is there a mechanism or technology that can give me callbacks of DB records, in batches until the SELECT query finishes?
The RDBMS that is used is MS SQL Server 2008.
Thanks in advance.
Methods Statement#setFetchSize and Statement#getMoreResults are supposed to allow you to manage incremental fetches from the database. Unfortunately, this is the interface spec and vendors may or may not implement these. Memory management during a fetch is really down to the vendor (which is why I wouldn't strictly say that "JDBC just works like this").
From the JDBC documentation on Statement :
setFetchSize(int rows)
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for ResultSet
objects generated by this Statement.
getMoreResults()
Moves to this Statement object's next result, returns true if it is a
ResultSet object, and implicitly closes any current ResultSet object(s)
obtained with the method getResultSet.
getMoreResults(int current)
Moves to this Statement object's next result, deals with any current
ResultSet object(s) according to the instructions specified by the given
flag, and returns true if the next result is a ResultSet object.
The current parameter indicates whether to keep or close the current ResultSet.
Also, this SO response discusses the use of setFetchSize with regard to SQL Server 2005 and how it doesn't seem to manage batched fetches. The recommendation is to test this with the 2008 driver or, better, to use the jTDS driver (which gets a thumbs up in the comments).
This response to the same SO post may also be useful, as it contains a link to the SQL Server driver settings on MSDN.
There's also some good info on the Microsoft TechNet website, though it relates more to SQL Server 2005; I couldn't find the 2008-specific version in my cursory review. Anyway, it recommends creating the Statement with
com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY (2004) scrollability for forward-only, read-only access, and then using the setFetchSize method to tune performance.
Using pagination (LIMIT pageno, rows / TOP) might create holes and duplicates, but it can be used in combination with tracking the last row ID (WHERE id > ? ORDER BY id LIMIT 0, 100).
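The last-row-ID technique mentioned here (often called keyset pagination) can be sketched as follows; the table name and helper are hypothetical, and real code should use a PreparedStatement with ? placeholders rather than string concatenation:

```java
public class KeysetPagination {
    // Builds the query for the next page strictly after the last id seen,
    // so concurrent inserts cannot create the holes/duplicates that
    // OFFSET-based paging suffers from.
    static String nextPageQuery(String table, long lastSeenId, int pageSize) {
        return "SELECT * FROM " + table
                + " WHERE id > " + lastSeenId
                + " ORDER BY id LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(nextPageQuery("MyTable", 0, 100));
        // SELECT * FROM MyTable WHERE id > 0 ORDER BY id LIMIT 100
    }
}
```

On SQL Server 2008, which has no LIMIT, the same idea would use SELECT TOP 100 ... WHERE id > ? ORDER BY id.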
You may use a forward-only, read-only result set (ResultSet.TYPE_FORWARD_ONLY) with a forward fetch direction (ResultSet.FETCH_FORWARD).
This is exactly how a JDBC driver is supposed to work (I remember a bug in an old PostgreSQL driver that caused all fetched records to be stored in memory).
It enables you to read records as soon as the query starts fetching them. This is where I would start looking.
For example, Oracle optimizes SELECT * queries for fetching the whole set, which means it can take a long time before the first results appear. You can give hints to optimize for fetching the first results instead, so you can show the first rows to your user quite fast, even though the whole query takes longer to execute.
You should test your query on console first, to check when it starts to fetch results. Then try with JDBC and monitor the memory usage while you iterate through ResultSet. If the memory usage grows fast, check if you have opened ResultSet in forward-only and read-only mode, if necessary update driver.
If such solution is not feasible because of memory usage, you can still use cursors manually and fetch N rows (say, 100) in each query.
Cursor documentation for MSSQL: for example here: http://msdn.microsoft.com/en-us/library/ms180152.aspx

Where Result set is stored while working with jdbc and oracle driver

Once I use JDBC with the Oracle driver and run a select query, is the result of the query stored in the Oracle server's memory, its file system, or a temp table?
And once I call the next method to get the next row, is it loaded from the Oracle server's memory into the JVM's memory?
And if I set the fetch size on the result set to 1000, does this mean that 1000 rows are loaded from Oracle into the JDBC driver in the JVM?
A default number of rows (not the entire result set) will be fetched into your local memory. Once you reach the last of the fetched rows (say, by calling next() and trying to access the next row) and there are more rows in the result, another round trip is made to the database to fetch the next batch of rows.
EDIT 1:
You can check how many rows your ResultSet contains in total by doing this (note that it requires a scrollable ResultSet and forces all remaining rows to be fetched; please verify the syntax):
rs.beforeFirst(); // will put the cursor before the first row
rs.last(); // will move the cursor to the last row
int noOfRows = rs.getRow(); // will give you the current (i.e. last) row number
EDIT 2:
If you want to get more rows in the local memory than usual, you may consider CachedRowSet. Even this will make round-trips, but generally less than normal resultset. However, you should consider doing some performance checks for your applications.
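A minimal CachedRowSet sketch (the URL, query, and page size are placeholders; execute() is left commented out because it needs a live database):

```java
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class CachedRowSetDemo {
    public static void main(String[] args) throws Exception {
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        crs.setUrl("jdbc:oracle:thin:@//localhost:1521/XE"); // placeholder URL
        crs.setCommand("SELECT id, name FROM your_table");   // placeholder query
        crs.setPageSize(1000);                 // fetch the rows in pages of 1000
        System.out.println(crs.getPageSize()); // 1000
        // crs.execute();                      // would run the query against the database
        // do { while (crs.next()) { /* process row */ } } while (crs.nextPage());
    }
}
```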
Depending on the exact implementation, part of the resultset will be prefetched into the JVM and part of it will either be in the memory of the Oracle server, or will simply be loaded from the database when more rows are requested.
When executing a query the database does not always need to read all rows from before returning data to the client (depending on the access path of the query, ability of the optimizer, functionality of the db etc).
When you set the fetchSize() on the Statement, you are only giving a hint to the JDBC driver how much you think it should prefetch. The JDBC driver is free to ignore you. I do not know what the Oracle driver does with the fetchSize(). Most notorious AFAIK is (or maybe was) the MySQL JDBC driver which will always fetch all rows unless you set the fetchSize() to Integer.MIN_VALUE.
I got it from Oracle JDBC documentation:
Fetch size
By default, when Oracle JDBC runs a query, it retrieves a result set
of 10 rows at a time from the database cursor. This is the default
Oracle row fetch size value. You can change the number of rows
retrieved with each trip to the database cursor by changing the row
fetch size value. Standard JDBC also enables you to specify the number
of rows fetched with each database round-trip for a query, and this
number is referred to as the fetch size. In Oracle JDBC, the
row-prefetch value is used as the default fetch size in a statement
object. Setting the fetch size overrides the row-prefetch setting and
affects subsequent queries run through that statement object. Fetch
size is also used in a result set. When the statement object run a
query, the fetch size of the statement object is passed to the result
set object produced by the query. However, you can also set the fetch
size in the result set object to override the statement fetch size
that was passed to it.
https://docs.oracle.com/cd/E11882_01/java.112/e16548.pdf
p. 17-4
After you execute the query, the data is returned to the JVM. The JVM handles all the data I/O from that point.
