Streaming a MySQL ResultSet with a fixed number of results at a time - Java

I have a MySQL table with 16 million records, and because of some migration work I'm reading the whole table.
The following code is used for streaming a large ResultSet in MySQL:
statement = connection.createStatement(
        java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
statement.setFetchSize(Integer.MIN_VALUE);
but this streams one row at a time. Does that mean we hit the MySQL server for each row? While streaming, can we set something like statement.setFetchSize(1000)?
I want to reduce the number of round trips to the server while streaming a large ResultSet.

I will assume that you are using the official MySQL-provided JDBC driver, Connector/J.
You are explicitly telling JDBC (and MySQL) to stream the results row-by-row with statement.setFetchSize(Integer.MIN_VALUE);
From the MySQL docs:
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate, and due to
the design of the MySQL network protocol is easier to implement. If
you are working with ResultSets that have a large number of rows or
large values, and cannot allocate heap space in your JVM for the
memory required, you can tell the driver to stream the results back
one row at a time.
To enable this functionality, you need to create a Statement instance
in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch
size of Integer.MIN_VALUE serves as a signal to the driver to stream
result sets row-by-row. After this any result sets created with the
statement will be retrieved row-by-row.
Any value other than Integer.MIN_VALUE for the fetch size is ignored by MySQL, and the standard behavior applies. The entire result set will be fetched by the JDBC driver.
Either don't use setFetchSize(), so the JDBC driver will use the default value (0), or set the value to 0 explicitly. Using the value of 0 will also ensure that JDBC doesn't use MySQL cursors, which may occur depending on your MySQL and Connector/J versions and configuration.
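If row-by-row streaming is in fact what you want, here is a minimal self-contained sketch of a complete streaming read; the host, database, credentials, table, and column names are placeholders:

import java.sql.*;

public class StreamingExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/mydb", "user", "password");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE); // signal row-by-row streaming
            try (ResultSet rs = stmt.executeQuery("SELECT id FROM my_table")) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    // process the row; the connection stays busy until the
                    // result set is fully read or closed
                }
            }
        }
    }
}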

To fetch a fixed number of records at a time (e.g. batches of 1000) instead of streaming record by record with the MySQL JDBC driver, you need to either:
Set useCursorFetch to true and defaultFetchSize to the desired batch size in your JDBC URL, e.g.
jdbc:mysql://localhost/?useCursorFetch=true&defaultFetchSize=1000
Or, as per the MySQL driver documentation, set useCursorFetch to true in the JDBC URL and then call setFetchSize(1000) after creating the statement. See the code below:
conn = DriverManager.getConnection("jdbc:mysql://localhost/?useCursorFetch=true", "user", "s3cr3t");
stmt = conn.createStatement();
stmt.setFetchSize(1000);
rs = stmt.executeQuery("SELECT * FROM your_table_here");
See official documentation for more info: https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-implementation-notes.html
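A more complete, rough sketch of that second option with try-with-resources added; the host, database, credentials, and table name are placeholders:

import java.sql.*;

public class CursorFetchExample {
    public static void main(String[] args) throws SQLException {
        // useCursorFetch=true makes Connector/J use a server-side cursor
        String url = "jdbc:mysql://localhost/mydb?useCursorFetch=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "s3cr3t");
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1000); // ask for ~1000 rows per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_table")) {
                while (rs.next()) {
                    // process row; batches are fetched behind the scenes
                }
            }
        }
    }
}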


Server side cursor support in MySQL JDBC

The MySQL Connector/J documentation (here) mentions two ways in which the JDBC driver retrieves results from the MySQL database. One is the default operation, in which the entire result set is loaded into memory and made accessible in the code. The second is row-by-row streaming.
I would like to know whether the latest versions of MySQL/MySQL JDBC support server-side cursors. Specifically, I would like to know whether the options useCursorFetch=true and defaultFetchSize>0 can be used to ensure that the result set is retrieved from the database in batches of a certain size (the fetch size). MySQL describes server-side cursors in its C API (here), and I would like to know whether similar support exists in MySQL JDBC.
If this support exists, what are the constraints of such an operation? I understand that a temporary table would be created in the server's memory from which results would be fetched. But what are the other things to look out for (such as table/row locks, restrictions on updates/insertions, and result set/connection closing)?
The most recent version of the documentation you linked to has this note:
By default, ResultSets are completely retrieved and stored in memory. [...] you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
This sounds like what you're looking for.
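If it is specifically the batched, cursor-based retrieval from your question that you want (rather than row-by-row streaming), here is a rough sketch using the exact URL options you mention; the host, database, credentials, and table name are placeholders:

import java.sql.*;

public class DefaultFetchSizeExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost/mydb?useCursorFetch=true&defaultFetchSize=1000";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement("SELECT * FROM big_table");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // rows arrive from the server-side cursor in batches of ~1000
            }
        }
    }
}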

How to improve select query performance in Java? Is there a way to do a select query using multithreading?

I need to do a select query on a view I have created in the database. The view has about 3,000,000 rows, and reading this data and storing it in local memory using the jTDS JDBC driver takes about 2 minutes. The order in which I read the data does not matter. Right now I'm simply using a prepared statement and reading from a result set. Is there a better way to read from the database?
I'm reading from MS SQL Server.
The way I'm reading right now is:
public ResultSet getData(String view_name) throws SQLException {
    String sql = "select * from " + view_name;
    PreparedStatement stmt = conn.prepareStatement(sql);
    ResultSet resultSet = stmt.executeQuery();
    resultSet.setFetchSize(8000);
    return resultSet;
}
As you already know, the performance of the application will degrade as soon as the amount of available memory starts decreasing, which will result in more frequent GC cycles.
Is there a better way to read from the database?
Did you try streaming the ResultSet and using adaptive buffering? What is adaptive response buffering and why should I use it?
Adaptive buffering is designed to retrieve any kind of large-value data without the overhead of server cursors. The application can execute a SELECT statement that produces more rows than the application can store in memory. Adaptive buffering provides the ability to do a forward-only read-only pass of an arbitrarily large result set without requiring a server cursor.
When large values are read once by using the get<Type>Stream methods, and the ResultSet columns and the CallableStatement OUT parameters are accessed in the order returned by the SQL Server, adaptive buffering minimizes the application memory usage when processing the results.
You can have a look at this MSDN library post for more info on Using Adaptive Buffering.
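As a rough sketch of enabling it explicitly with the Microsoft JDBC driver, which is the driver this feature belongs to (server, database, credentials, and view name are placeholders; recent driver versions already default to adaptive buffering):

import java.sql.*;

public class AdaptiveBufferingExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=mydb;responseBuffering=adaptive";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = stmt.executeQuery("SELECT * FROM my_view")) {
            while (rs.next()) {
                // read columns in the order returned to keep memory usage low
            }
        }
    }
}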
Not sure about MS SQL, but in MySQL streaming of the result set can be enabled as below.
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
Hope this helps you.

fetchSize in ResultSet set to 0 by default

I have to query a database and the result set is very big. I am using MySQL as the database. To avoid an OutOfMemoryError, after a lot of searching I found two options: one using LIMIT (specific to the database) and the other using the JDBC fetchSize attribute.
I have tested option 1 (LIMIT) and it works, but it is not the desired solution; I do not want to use it.
Using JDBC I found out that the fetch size is set to 0 by default. How can I change this to some other value? I tried the following:
a) First Try:
rs = preparedStatement.executeQuery();
rs.setFetchSize(1000); // Too late: the exception occurs in executeQuery() above.
b) Second Try:
rs.setFetchSize(1000); // NullPointerException (rs is null).
rs = preparedStatement.executeQuery();
c) Third Try:
preparedStatement = dbConnection.prepareStatement(query);
preparedStatement.setFetchSize(1000);
None of this is working. Any help appreciated!
Edit:
I do not want a solution using LIMIT because:
a) I have millions of rows in my result set, and doing multiple queries is slow. My assumption is that the database treats queries like
SELECT * FROM a LIMIT 0, 1000
SELECT * FROM a LIMIT 1000, 1000
as two different queries.
b) The code looks messy because you need additional counters.
The MySQL JDBC driver always fetches all rows, unless the fetch size is set to Integer.MIN_VALUE.
See the MySQL Connector/J JDBC API Implementation Notes:
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate, and due to
the design of the MySQL network protocol is easier to implement. If
you are working with ResultSets that have a large number of rows or
large values, and cannot allocate heap space in your JVM for the
memory required, you can tell the driver to stream the results back
one row at a time.
To enable this functionality, create a Statement instance in the
following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch
size of Integer.MIN_VALUE serves as a signal to the driver to stream
result sets row-by-row. After this, any result sets created with the
statement will be retrieved row-by-row.
Besides that, you could change your query to limit the number of rows returned, like
SELECT * FROM RandomStones LIMIT 1000;
or use a prepared statement with a fetch size:
PreparedStatement stmt = connection.prepareStatement(qry);
stmt.setFetchSize(1000);
stmt.executeQuery();
To set the fetch size for a query, call setFetchSize() on the statement object prior to executing the query. If you set the fetch size to N, then N rows are fetched with each trip to the database (with Connector/J, as noted above, this only takes effect when useCursorFetch is enabled).
In the MySQL Connector/J 5.1 implementation notes, two ways of handling this situation are described.
The first is to make the statement retrieve data row by row; the second supports batched retrieval.

Read SQL Database in batches

I am using Java to read from a SQL RDBMS and return the results to the user. The problem is that the database table has 155 million rows, which makes the wait time really long.
I wanted to know if it is possible to retrieve the results as they come from the database and present them incrementally to the user (in batches).
My query is a simple SELECT * FROM Table_Name query.
Is there a mechanism or technology that can give me callbacks of DB records, in batches until the SELECT query finishes?
The RDBMS that is used is MS SQL Server 2008.
Thanks in advance.
Methods Statement#setFetchSize and Statement#getMoreResults are supposed to allow you to manage incremental fetches from the database. Unfortunately, this is the interface spec and vendors may or may not implement these. Memory management during a fetch is really down to the vendor (which is why I wouldn't strictly say that "JDBC just works like this").
From the JDBC documentation on Statement :
setFetchSize(int rows)
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for ResultSet
objects generated by this Statement.
getMoreResults()
Moves to this Statement object's next result, returns true if it is a
ResultSet object, and implicitly closes any current ResultSet object(s)
obtained with the method getResultSet.
getMoreResults(int current)
Moves to this Statement object's next result, deals with any current
ResultSet object(s) according to the instructions specified by the given
flag, and returns true if the next result is a ResultSet object.
The current parameter indicates whether to keep or close the current ResultSet.
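A rough sketch of the usual drain loop built on these methods, e.g. for a stored procedure that returns several result sets (the procedure name is a placeholder):

import java.sql.*;

public class MultiResultExample {
    static void drainResults(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            boolean isResultSet = stmt.execute("{call my_multi_result_proc()}");
            while (true) {
                if (isResultSet) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        while (rs.next()) {
                            // process row
                        }
                    }
                } else if (stmt.getUpdateCount() == -1) {
                    break; // no more results of any kind
                }
                isResultSet = stmt.getMoreResults();
            }
        }
    }
}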
Also, this SO response discusses the use of setFetchSize with regard to SQL Server 2005 and how it doesn't seem to manage batched fetches. The recommendation is to test this using the 2008 driver or, moreover, to use the jTDS driver (which gets a thumbs up in the comments).
This response to the same SO post may also be useful as it contains a link to SQLServer driver settings on MSDN.
There's also some good info on the MS TechNet website, but relating more to SQL Server 2005; I couldn't find the 2008-specific version in my cursory review. Anyway, it recommends creating the Statement with com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY (value 2004) scrollability for forward-only, read-only access, and then using the setFetchSize method to tune performance.
Using pagination (LIMIT pageno, rows / TOP) might create holes and duplicates, but it can be combined with keying on the last row ID (WHERE id > ? ORDER BY id LIMIT 0, 100) to avoid that.
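A rough sketch of that last-row-ID (keyset) approach, written with SQL Server's TOP since the question targets SQL Server 2008; the table and column names are placeholders:

import java.sql.*;

public class KeysetPaginationExample {
    // Each query resumes strictly after the last id seen in the previous batch.
    static void readInBatches(Connection conn) throws SQLException {
        final int batchSize = 100;
        long lastId = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT TOP (?) id, payload FROM my_table WHERE id > ? ORDER BY id")) {
            while (true) {
                ps.setInt(1, batchSize);
                ps.setLong(2, lastId);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id"); // remember where this batch ended
                        rows++;
                        // process row
                    }
                }
                if (rows < batchSize) {
                    break; // final batch reached
                }
            }
        }
    }
}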
You may use ResultSet.TYPE_FORWARD_ONLY together with the ResultSet.FETCH_FORWARD fetch direction.
This is exactly how a JDBC driver is supposed to work (I remember a bug in an old PostgreSQL driver that caused all fetched records to be stored in memory).
However, it enables you to read records as soon as the query starts to fetch them. This is where I would start looking.
For example, Oracle optimizes SELECT * queries for fetching the whole set. It means it can take a lot of time before first results will appear. You can give hints to optimize for fetching first results, so you can show first rows to your user quite fast, but the whole query can take longer to execute.
You should test your query in the console first, to check when it starts to return results. Then try it with JDBC and monitor the memory usage while you iterate through the ResultSet. If the memory usage grows fast, check whether you have opened the ResultSet in forward-only, read-only mode and, if necessary, update the driver.
If such a solution is not feasible because of memory usage, you can still use cursors manually and fetch N rows (say, 100) in each query.
Cursor documentation for MS SQL Server, for example: http://msdn.microsoft.com/en-us/library/ms180152.aspx

Where is the result set stored while working with JDBC and the Oracle driver?

When I use JDBC with the Oracle driver and run a select query, is the result of the query stored in the Oracle server's memory, its file system, or a temp table?
And when I call the next method to get the next row, is it loaded from the Oracle server's memory into the JVM's memory?
And if I set the fetch size on the result set to 1000, does this mean that 1000 rows are loaded from Oracle to the JDBC driver in the JVM?
A default number of rows (not the entire result set) will be fetched into your local memory. Once you reach the last of the fetched rows (say, by calling next() to access the next row) and there are more rows in the result, another round trip to the database is made to fetch the next batch of rows.
EDIT 1:
You can see how many rows your ResultSet holds in total by doing this (requires a scrollable result set; please verify the syntax):
rs.beforeFirst(); // put the cursor before the first row
rs.last(); // move the cursor to the last row
int noOfRows = rs.getRow(); // gives the current row number, i.e. the total row count
EDIT 2:
If you want to get more rows into local memory than usual, you may consider CachedRowSet. Even this will make round trips, but generally fewer than a normal ResultSet. However, you should consider doing some performance checks for your application.
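A rough sketch of paging with CachedRowSet; the URL, credentials, and table name are placeholders:

import java.sql.SQLException;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class CachedRowSetExample {
    public static void main(String[] args) throws SQLException {
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        crs.setUrl("jdbc:oracle:thin:@//localhost:1521/mydb");
        crs.setUsername("user");
        crs.setPassword("password");
        crs.setCommand("SELECT id, payload FROM my_table");
        crs.setPageSize(1000); // rows materialized in memory per page
        crs.execute();
        do {
            while (crs.next()) {
                // process row
            }
        } while (crs.nextPage()); // round trip for the next 1000 rows
        crs.close();
    }
}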
Depending on the exact implementation, part of the resultset will be prefetched into the JVM and part of it will either be in the memory of the Oracle server, or will simply be loaded from the database when more rows are requested.
When executing a query the database does not always need to read all rows from before returning data to the client (depending on the access path of the query, ability of the optimizer, functionality of the db etc).
When you set the fetchSize() on the Statement, you are only giving a hint to the JDBC driver how much you think it should prefetch. The JDBC driver is free to ignore you. I do not know what the Oracle driver does with the fetchSize(). Most notorious AFAIK is (or maybe was) the MySQL JDBC driver which will always fetch all rows unless you set the fetchSize() to Integer.MIN_VALUE.
I got it from Oracle JDBC documentation:
Fetch size
By default, when Oracle JDBC runs a query, it retrieves a result set
of 10 rows at a time from the database cursor. This is the default
Oracle row fetch size value. You can change the number of rows
retrieved with each trip to the database cursor by changing the row
fetch size value. Standard JDBC also enables you to specify the number
of rows fetched with each database round-trip for a query, and this
number is referred to as the fetch size. In Oracle JDBC, the
row-prefetch value is used as the default fetch size in a statement
object. Setting the fetch size overrides the row-prefetch setting and
affects subsequent queries run through that statement object. Fetch
size is also used in a result set. When the statement object runs a
query, the fetch size of the statement object is passed to the result
set object produced by the query. However, you can also set the fetch
size in the result set object to override the statement fetch size
that was passed to it.
https://docs.oracle.com/cd/E11882_01/java.112/e16548.pdf
p. 17-4
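A brief sketch of the statement-level fetch size and the result-set-level override the passage describes, assuming an already-open Connection conn (the table name is a placeholder):

import java.sql.*;

public class OracleFetchSizeExample {
    static void read(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1000); // ~1000 rows per round trip instead of the default 10
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_table")) {
                rs.setFetchSize(500); // result-set level override for subsequent fetches
                while (rs.next()) {
                    // process row
                }
            }
        }
    }
}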
After you execute the query, the data is returned to the JVM. The JVM handles all the data I/O from that point.
