Is there any performance issue in fetching bulk data into a result set in Java?
rs = st.executeQuery("SELECT * FROM ABC");
Table ABC holds a large amount of data (say, 1 million rows).
Would there be a performance improvement from fetching the data iteratively in small chunks of 1000 rows at a time and operating on each chunk?
Select only the columns you need and limit the number of rows, which in MySQL looks like:
rs = st.executeQuery("SELECT columnname1, columnname2, ... FROM ABC LIMIT 10000, 30");
To a certain extent this depends on the type of database you are connecting to and the driver that you are using. For large datasets you should use:
Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
st.setFetchSize(<n>);
where n is the number of rows to fetch per round trip.
This tells the driver not to pull all the data over in one lump. The fetch size is a hint that tells the driver how many rows to fetch each time its row buffer is empty.
In general it is better to let the JDBC driver handle this than to try to implement it yourself.
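A minimal sketch of that pattern, assuming an already-open Connection. The names ForwardOnlyScan and countRows are made up for illustration, and whether the fetch size is honoured remains up to the driver:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ForwardOnlyScan {
    /**
     * Counts the rows of a query with a forward-only, read-only statement.
     * The fetch size is only a hint; a driver is free to ignore it.
     */
    static long countRows(Connection con, String sql, int fetchSize) throws SQLException {
        try (Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(fetchSize); // must be set before the query runs
            long rows = 0;
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // real code would read and process columns here
                }
            }
            return rows;
        }
    }
}
```

Both the statement and the result set are closed by try-with-resources, which also releases any server-side cursor the driver may have opened.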
This Java Tutorial tells about as much as you'd ever want to know about cursors.
Assuming that I have to go through all the entries, does anyone know how the results for a ResultSet are fetched?
Can I call SELECT * FROM MyTable instead of SELECT TOP 100 * FROM MyTable ORDER BY id ASC OFFSET 0; and just call resultSet.next() as needed to fetch the results, and process them on a program level, or are the results already in memory and not putting in TOP is bad?
The ResultSet class exposes a
void setFetchSize(int rows)
method, which, per JavaDoc
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for this ResultSet
object.
That means that if we have a result set of 200 rows from the database and we set the fetch size to 100, roughly 100 rows will be loaded from the database at a time, and two trips to the database might be required.
The default fetch size is driver dependent; Oracle's driver, for example, sets it to 10 rows.
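The round-trip arithmetic can be sketched as a ceiling division. This is an idealised count (the helper roundTrips is hypothetical, and a real driver may treat the fetch size only as a hint):

```java
class FetchSizeMath {
    /** Idealised number of round trips to read totalRows rows, fetchSize rows per trip. */
    static long roundTrips(long totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and fetchSize > 0");
        }
        return (totalRows + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(200, 100));        // prints 2
        System.out.println(roundTrips(1_000_000, 10));   // prints 100000
        System.out.println(roundTrips(1_000_000, 1000)); // prints 1000
    }
}
```

With a default of 10 rows, a million-row result needs on the order of 100,000 trips, which is why raising the fetch size can matter for large scans.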
Depends on the DB engine and JDBC driver. Generally, the IDEA behind the JDBC API is that the DB engine creates a cursor (this is also why ResultSets are resources that must be closed), and thus, you can do a SELECT * FROM someTableWithBillionsOfRows without a LIMIT, and yet it can be fast.
Whether it actually is, well, that depends. In my experience, which is primarily interacting with postgres, it IS fast (as in, cursor based with limited data transfer from DB to VM even if the query would match billions of rows), and thus your plan (select without limits, keep calling next until you have what you want and then close the resultset) should work fine.
NB: Some DB engines meet you halfway and transfer results in batches, for the best of both worlds: Latency overhead is limited (a single latency overhead is shared by batchsize results), and yet the total transfer between DB and VM is limited to only rowsize times batchsize, even if you only read a single row and then close the resultset.
I need to run a select query against a view I have created in the database. The view has about 3,000,000 rows, and it takes about 2 minutes to read this data and store it in local memory using the jTDS JDBC driver. The order in which I read the data does not matter. Right now I'm simply using a prepared statement and reading from a result set. Is there a better way to read from the database?
I'm reading from MS SQL Server.
The way I'm reading right now is:
public ResultSet getData(String view_name) throws SQLException {
    String SQL = "select * from " + view_name;
    PreparedStatement stmt = conn.prepareStatement(SQL);
    resultSet = stmt.executeQuery();
    resultSet.setFetchSize(8000);
    return resultSet;
}
As you already know, the performance of the application will degrade as soon as the available memory starts to run low, which will result in more frequent GC cycles.
Is there a better way to read from data base ?
Did you try streaming the ResultSet and using adaptive buffering? From "What is adaptive response buffering and why should I use it?":
Adaptive buffering is designed to retrieve any kind of large-value data without the overhead of server cursors. The application can execute a SELECT statement that produces more rows than the application can store in memory. Adaptive buffering provides the ability to do a forward-only read-only pass of an arbitrarily large result set without requiring a server cursor.
When large values are read once by using the get<Type>Stream methods, and the ResultSet columns and the CallableStatement OUT parameters are accessed in the order returned by the SQL Server, adaptive buffering minimizes the application memory usage when processing the results.
You can have a look at this MSDN library post for more info on Using Adaptive Buffering.
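A small sketch of requesting adaptive buffering through the connection URL. It assumes the Microsoft JDBC Driver for SQL Server rather than jTDS, uses that driver's responseBuffering connection property, and the host, port and database values are placeholders:

```java
class SqlServerUrl {
    /**
     * Builds a connection URL for the Microsoft JDBC Driver that asks for
     * adaptive response buffering. Host, port and database are caller-supplied.
     */
    static String adaptiveUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";responseBuffering=adaptive";
    }
}
```

The resulting string would be passed to DriverManager.getConnection together with credentials; the result set should then be read forward-only, in column order, as described above.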
Not sure about MS SQL, but in MySQL streaming of the result set can be enabled as below.
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
Hope this helps you.
Is there a maximum number of rows that a JDBC will put into a ResultSet specifically from a Hive query? I am not talking about fetch size or paging, but the total number of rows returned in a ResultSet.
Correct me if I'm wrong, but the fetch size sets the number of rows the JDBC driver looks at with each pass through the database, inserting the appropriate responses into the ResultSet. When it has gone through all the records in the table, it returns the ResultSet to the Java code. I am asking whether there is a limit to the number of rows returned to the Java code.
If it doesn't have a maximum number of rows, is there anything inherent with the class that may cause some records to be trimmed off?
No, it does not work that way. JDBC is just a wrapper around native database access, and JDBC result sets and database cursors work the same way:
you send a query (via JDBC) to the database
the database analyses the query and initializes the cursor (a ResultSet in JDBC)
while you fetch data from the ResultSet, the database software walks through the data to get new rows and populates the ResultSet
So there is no limit to the number of rows that a ResultSet can contain, nor to the number of rows that a Java client program can process. The only limit comes if you try to load all the rows into memory, to populate a list for example, and exhaust the client application's memory. But if you process rows and do not keep them in memory, there is no limit (exactly like what happens when you read a file).
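The file-reading analogy can be sketched as a running aggregate that visits each row once and keeps none of them. The class and method names are made up for illustration, and the column is assumed to be numeric:

```java
import java.sql.ResultSet;
import java.sql.SQLException;

class RunningTotal {
    /** Sums one numeric column; memory use does not grow with the number of rows. */
    static long sumColumn(ResultSet rs, int columnIndex) throws SQLException {
        long total = 0;
        while (rs.next()) {
            total += rs.getLong(columnIndex); // the row is used, then dropped
        }
        return total;
    }
}
```

Collecting every row into a List instead would bring back the memory limit mentioned above.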
I have to query a database and the result set is very big. I am using MySQL as the database. To avoid the "OutOfMemoryError", after a lot of searching I found two options: one using LIMIT (specific to the database) and the other using the JDBC fetchSize attribute.
I have tested option 1 (LIMIT) and it works, but it is not the desired solution. I do not want to do it.
Using JDBC I found out that the ResultSet fetch size is set to 0 by default. How can I change this to some other value? I tried the following:
a) First Try:
rs = preparedStatement.executeQuery();
rs.setFetchSize(1000); //Not possible as exception occurs before.
b) Second Try:
rs.setFetchSize(1000); //Null pointer exception(rs is null).
rs = preparedStatement.executeQuery();
c) Third Try:
preparedStatement = dbConnection.createStatement(query);
preparedStatement.setFetchSize(1000);
None of this is working. Any help appreciated!
Edit:
I do not want a solution using limit because:
a) I have millions of rows in my result set, and running multiple queries is slow. My assumption is that the database treats queries like
SELECT * FROM a LIMIT 0, 1000
SELECT * FROM a LIMIT 1000, 1000
as two different queries.
b) The code looks messy because you need additional counters.
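A note on the LIMIT form: in MySQL's LIMIT offset, count the second number is a row count, not an end position, so consecutive 1000-row pages are LIMIT 0, 1000 and then LIMIT 1000, 1000. A small sketch of the counter bookkeeping this approach needs (the helper pageQuery is hypothetical, and the table name is assumed to be trusted, since it is concatenated into the SQL):

```java
class LimitPaging {
    /** Query for zero-based page number `page`, using MySQL's LIMIT offset, count form. */
    static String pageQuery(String table, long page, int pageSize) {
        long offset = page * pageSize; // rows the server must skip before this page
        return "SELECT * FROM " + table + " LIMIT " + offset + ", " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageQuery("a", 0, 1000)); // SELECT * FROM a LIMIT 0, 1000
        System.out.println(pageQuery("a", 1, 1000)); // SELECT * FROM a LIMIT 1000, 1000
    }
}
```

Each call is indeed a separate query, and later pages make the server read past an ever larger offset.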
The MySQL JDBC driver always fetches all rows, unless the fetch size is set to Integer.MIN_VALUE.
See the MySQL Connector/J JDBC API Implementation Notes:
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate, and due to
the design of the MySQL network protocol is easier to implement. If
you are working with ResultSets that have a large number of rows or
large values, and cannot allocate heap space in your JVM for the
memory required, you can tell the driver to stream the results back
one row at a time.
To enable this functionality, create a Statement instance in the
following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch
size of Integer.MIN_VALUE serves as a signal to the driver to stream
result sets row-by-row. After this, any result sets created with the
statement will be retrieved row-by-row.
Besides that, you could limit the query itself, for example:
SELECT * FROM RandomStones LIMIT 1000;
Or
PreparedStatement stmt = connection.prepareStatement(qry);
stmt.setFetchSize(1000);
stmt.executeQuery();
To set the fetch size for a query, call setFetchSize() on the statement object prior to executing the query. If you set the fetch size to N, then N rows are fetched with each trip to the database.
The MySQL Connector/J 5.1 implementation notes describe two ways to handle this situation.
The first makes the statement retrieve data row by row; the second supports batched retrieval.
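A sketch of the two modes, assuming Connector/J: row-by-row streaming is signalled with Integer.MIN_VALUE, while batched retrieval relies on the useCursorFetch=true connection property plus a positive fetch size. The class and method names here are made up for illustration:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class MySqlFetchModes {
    /** Appends useCursorFetch=true to a Connector/J URL (needed for mode 2). */
    static String withCursorFetch(String jdbcUrl) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "useCursorFetch=true";
    }

    /** Mode 1: stream row by row. */
    static Statement streamingStatement(Connection con) throws SQLException {
        Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                           ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(Integer.MIN_VALUE);
        return st;
    }

    /** Mode 2: fetch batchSize rows at a time; assumes con was opened with useCursorFetch=true. */
    static Statement batchedStatement(Connection con, int batchSize) throws SQLException {
        Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                           ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(batchSize);
        return st;
    }
}
```

In both cases the caller is responsible for closing the returned statement once the result set has been read.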
Can I limit the number of rows in result set?
My table contains some 800,000 rows; if I fetch them all into a result set, this will definitely lead to an OOM exception. Each row has 40 columns.
I do not want to work on them all at the same time, but each row has to be filtered for some data.
Thank you in advance.
Something like the following should be a SQL solution, albeit a rather inefficient one, since each time you will have to fetch an increasing number of rows.
Assuming that your ORDER BY is based on a unique int and
that you will be fetching 1000 rows at a time:
SET @currenttop = 0;
SET @currentid = 0;
-- Note: outside prepared statements and stored programs, MySQL's LIMIT accepts only
-- integer constants, so the growing limit has to be supplied by the caller.
SELECT * FROM YourTable t
WHERE t.ID > @currentid AND (@currentid := t.ID) IS NOT NULL
LIMIT (@currenttop := @currenttop + 1000);
Of course, you can choose to handle the variables from your Java code.
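One way to handle the variable from Java is sketched below. Note that it swaps in keyset pagination (a fixed page size with a moving lower bound on the id) rather than the growing LIMIT above. It assumes a unique, positive, numeric id column and trusted table and column names; all identifiers are illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class KeysetScan {
    /** SQL for one page: rows whose id is greater than the last id seen, in id order. */
    static String pageSql(String table, String idColumn) {
        return "SELECT * FROM " + table + " WHERE " + idColumn
                + " > ? ORDER BY " + idColumn + " LIMIT ?";
    }

    /** Visits every row in pages of pageSize and returns the number of rows seen. */
    static long scan(Connection con, String table, String idColumn, int pageSize)
            throws SQLException {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be > 0");
        }
        long lastId = 0; // assumption: ids start above 0
        long rows = 0;
        try (PreparedStatement ps = con.prepareStatement(pageSql(table, idColumn))) {
            while (true) {
                ps.setLong(1, lastId);
                ps.setInt(2, pageSize);
                int inPage = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong(idColumn); // remember where this page ended
                        inPage++;                      // filter or process the row here
                    }
                }
                rows += inPage;
                if (inPage < pageSize) {
                    return rows; // a short page means the table is exhausted
                }
            }
        }
    }
}
```

Only one page of rows is in flight at a time, and each query starts from the last id instead of re-reading earlier rows.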
You could use the JDBC fetch size to limit the rows held in the result set. It is better than SQL LIMIT as it will work for other databases as well without changing the query. The JDBC driver will not read the whole result from the database; each time it will retrieve the number of records specified by the fetch size, so memory should no longer be an issue, provided the driver honours the hint.
You can use the LIMIT keyword in the SQL query, as in the following:
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
You can read more about using limit here
Cheers !!
You have a couple of options. First, add a limit to the SQL query. Second, you could use Spring's JdbcTemplate.query() with a RowCallbackHandler to process one row at a time. The template will handle memory issues with a large result set.