Stream result of JDBC query with JDBCTemplate

I'm using Spring with Java.
I need to return a Stream of objects from a database query (I use ObjectMapper to map them to JSON).
The query results can be quite large (over 500k objects), so I don't want to hold them all in memory.
I've done it with JpaRepository.
I'd like to know how it can be done with JdbcTemplate, and whether there would be an advantage in doing so.
That is, can we optimize throughput and memory usage using JdbcTemplate, or perhaps even other libraries?
My ultimate goal is to find the optimal way, in terms of memory, time and processing, to run a query and print all objects to an output stream.

Yes, there is an advantage to using streams, since a stream is a common abstraction for processing data without holding all of it in memory, e.g. when passing the stream to an HTTP response.
Spring 5.3
If you use Spring 5.3 or later, there is a convenient method, JdbcTemplate.queryForStream(), that can be used like this:
String sql = "select * from table";
Stream<Person> stream = jdbcTemplate.queryForStream(sql, (resultSet, rowNum) -> {
    return new Person(resultSet.getInt(1), resultSet.getString(2));
});
The returned stream keeps the underlying JDBC resources open until it is closed, so close it once it has been processed, e.g. with try-with-resources.
Previous Spring versions
Older versions of JdbcTemplate do not have direct support for streams. But you can build a stream yourself from the underlying database connection:
String sql = "select * from table";
Connection connection = jdbcTemplate.getDataSource().getConnection();
PreparedStatement statement = connection.prepareStatement(sql);
ResultSet resultSet = statement.executeQuery();
Spliterator<Person> spliterator =
    Spliterators.spliteratorUnknownSize(
        new Iterator<Person>() {
            @Override public boolean hasNext() {
                try {
                    // Caveat: isAfterLast() also returns false when the result set has no rows
                    return !resultSet.isAfterLast();
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
            @Override public Person next() {
                try {
                    // Move onto the first row before the first read
                    if (resultSet.isBeforeFirst()) {
                        resultSet.next();
                    }
                    Person result = new Person(resultSet.getInt(1), resultSet.getString(2));
                    // Advance so that hasNext() can detect the end of the data
                    resultSet.next();
                    return result;
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        },
        Spliterator.IMMUTABLE);
// Releases the JDBC resources when the stream is closed
Runnable closer = () -> {
    try {
        resultSet.close();
        statement.close();
        connection.close();
    } catch (SQLException e) {
        throw new RuntimeException(e);
    }
};
Stream<Person> stream = StreamSupport.stream(spliterator, false).onClose(closer);
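The iterator above depends on isAfterLast() and isBeforeFirst(), which do not distinguish an empty result set from one that still has rows, and which some drivers may not support on forward-only result sets. As a rough alternative, here is a minimal sketch that drives the stream purely from ResultSet.next() via tryAdvance. The names ResultSetStreams and RowFunction are made up for this illustration and are not part of Spring or JDBC:

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class ResultSetStreams {

    /** Hypothetical row callback; similar in spirit to Spring's RowMapper, but with no Spring dependency. */
    @FunctionalInterface
    interface RowFunction<T> {
        T apply(ResultSet rs) throws SQLException;
    }

    /** Wraps an open ResultSet in a sequential Stream; closing the stream closes the ResultSet. */
    static <T> Stream<T> stream(ResultSet rs, RowFunction<T> mapper) {
        Spliterator<T> spliterator =
                new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
                    @Override
                    public boolean tryAdvance(Consumer<? super T> action) {
                        try {
                            // next() is the only cursor call, so an empty result set simply yields no elements
                            if (!rs.next()) {
                                return false;
                            }
                            action.accept(mapper.apply(rs));
                            return true;
                        } catch (SQLException e) {
                            throw new IllegalStateException(e);
                        }
                    }
                };
        return StreamSupport.stream(spliterator, false).onClose(() -> {
            try {
                rs.close();
            } catch (SQLException e) {
                throw new IllegalStateException(e);
            }
        });
    }
}
```

The caller would still be responsible for closing the Statement and Connection, for instance by chaining further onClose() handlers, and should consume the stream inside try-with-resources.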

Look at all the query() methods of JdbcTemplate.
The ones that have a return type of void obviously don't build up the full data set in memory. They all take a RowCallbackHandler:
An interface used by JdbcTemplate for processing rows of a ResultSet on a per-row basis.
The processRow() callback method can then add the data to the streaming JSON text, ensuring minimal memory use.
That of course assumes that the JDBC driver doesn't load the entire ResultSet into memory, but that's a different issue, and the solution (if one is needed) depends entirely on the JDBC driver in use.

Related

Java PreparedStatement java.lang.OutOfMemoryError: GC overhead limit exceeded

I know similar questions to this have been asked many times before, but even after trying many of the solutions given, I'm still seeing this problem.
Our application allows tech users to create parameterised raw SQL queries to extract data from the DB, which is downloaded to an Excel spreadsheet.
For smaller datasets this works fine; however, when the file size starts approaching 10 MB or more, I start hitting this issue.
The datasets could potentially be 100k rows or 80-90 MB in size. I don't want to increase the JVM heap size if possible.
Hopefully there is a glaring error in my code that I haven't spotted. The resultSet.next() loop seems to be the source of the issue. Is there a more efficient way to write this so that it stops gobbling heap space?
Any help much appreciated. Thanks
/*
 *
 * query is a raw sql query that takes parameters (using Mybatis)
 * criteriaMap the arguments that we substitute into the query
 *
 */
public List<Map<String, Object>> queryForJsonWithoutMapping(final String query, final Map<String, Object> criteriaMap){
    SqlSession sqlSession = getSqlSessionInstance();
    String sql = "";
    Connection connection = null;
    PreparedStatement pstmt = null;
    ResultSet resultSet = null;
    try {
        final Configuration configuration = getSqlSessionInstance().getConfiguration();
        SqlSourceBuilder builder = new SqlSourceBuilder(configuration);
        SqlSource src = builder.parse(query, Map.class, null);
        BoundSql boundSql = src.getBoundSql(criteriaMap);
        sql = boundSql.getSql();
        List<ParameterMapping> parameterMappings = boundSql.getParameterMappings();
        connection = sqlSession.getConnection();
        pstmt = connection.prepareStatement(sql, java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        // this function subs the params into the preparedStatement query
        buildParams(parameterMappings, criteriaMap, pstmt);
        resultSet = pstmt.executeQuery();
        // the while loop inside this function is where things start to hang
        List<Map<String, Object>> results = getObjectFromResultSet(resultSet);
        return results;
    } catch (Exception e) {
        LOG.error(e.getMessage(), e);
        LOG.error(ExceptionUtils.getStackTrace(e));
        throw new IllegalStateException(sql + " " + e.getMessage(), e);
    } finally {
        try{
            connection.close();
            pstmt.close();
            resultSet.close();
        }catch (SQLException e){
            e.printStackTrace();
        }
        sqlSession.close();
    }
}
private List<Map<String, ?>> getEntitiesFromResultSet(ResultSet resultSet) throws SQLException {
    ArrayList<Map<String, ?>> entities = new ArrayList<>(resultSet.getFetchSize());
    int index = 0;
    Map<String, Object> jsonObject;
    while (resultSet.next()) {
        jsonObject = getEntityFromResultSet(resultSet);
        entities.add(index, jsonObject);
        index ++;
    }
    resultSet.close();
    return entities;
}
private List<Map<String, Object>> getObjectFromResultSet(ResultSet resultSet) throws SQLException {
    ArrayList<Map<String, Object>> entities = new ArrayList<>(resultSet.getFetchSize());
    int index = 0;
    Map<String, Object> jsonObject;
    while (resultSet.next()) {
        jsonObject = getEntityFromResultSet(resultSet);
        entities.add(index, jsonObject);
        index ++;
    }
    resultSet.close();
    return entities;
}
The DB is Oracle.

Getting and processing all rows from a DB table in one go is a bad idea. You need to implement the general idea of pagination, i.e. you read and process one page (n = page-size rows) at a time.
Your page size should be chosen so that you don't make too many DB hits and, at the same time, don't hold too many records in memory.
JdbcPagingItemReader of the Spring Batch API implements this concept.
Refer to this SO question for more ideas on pagination with JDBC.
In addition to that, you shouldn't keep growing your results collection of Maps. You need to flush it in cycles.
Hope this helps!

In such a design, you will inevitably run out of memory at some point if the query returns a large amount of data, because you're loading the entire ResultSet into memory. Instead, you could simply state that your getXXXFromResultSet APIs have a threshold on the amount of data. For every row, you calculate its size and decide whether you can add it to your JSON doc. If you've passed the threshold, you stop there and close the ResultSet (which will cancel the execution on the server). Another option would involve streaming the results, but that's more complex.
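To make the two options above a little more concrete, here is a minimal sketch that writes each row to the output as soon as it is read, and stops once a row limit is reached, so that only one row is held in memory at a time. RowWriter, writeRows and the tab-separated output format are assumptions made for this illustration (the question's real target is JSON/Excel), and a row-count limit stands in for the size threshold:

```java
import java.io.PrintWriter;
import java.sql.ResultSet;
import java.sql.SQLException;

final class RowWriter {

    /**
     * Writes up to maxRows rows as tab-separated lines and returns the number of rows written.
     * Nothing is accumulated in a List; each row goes straight to the writer.
     */
    static int writeRows(ResultSet rs, int columnCount, int maxRows, PrintWriter out) {
        int written = 0;
        try {
            // Check the limit first so we never fetch a row that we are not going to write
            while (written < maxRows && rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int col = 1; col <= columnCount; col++) { // JDBC columns are 1-based
                    if (col > 1) {
                        line.append('\t');
                    }
                    line.append(rs.getString(col)); // SQL NULL is written as the text "null"
                }
                out.print(line);
                out.print('\n');
                written++;
            }
        } catch (SQLException e) {
            throw new IllegalStateException("Failed while streaming rows", e);
        }
        out.flush();
        return written;
    }
}
```

Whether memory actually stays flat also depends on the driver's fetch size, so it may be worth setting one explicitly on the statement.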

RowMapper/ResultsetExtractor in spring

I am working with the Spring framework. Below is the SQL query to execute to get the result.
String sql = "SELECT DESCRIPTION FROM ATWData WHERE SNO = '980098' ";
Spring has the RowMapper and ResultSetExtractor callback interfaces. Which one is best for executing the above SQL query and getting the single record? The code below gets the result using RowMapper and ResultSetExtractor.
Code using ResultSetExtractor:
try
{
    description = jdbcTemplate.query(sql,
        new ResultSetExtractor<String>()
        {
            public String extractData(final ResultSet rs) throws SQLException
            {
                if (rs.next())
                {
                    String description = null;
                    description = rs.getString("DESCRIPTION");
                    return description;
                }
                return null;
            }
        });
}
catch (final DataAccessException e)
{
    LOG.error("Error " + e);
}
The code below uses RowMapper to get the result:
try
{
    description = jdbcTemplate2.query(sql,
        new RowMapper<String>()
        {
            public String mapRow(final ResultSet rs, final int rowNum) throws SQLException
            {
                String description = (String)rs.getString("DESCRIPTION");
                return description;
            }
        });
}
catch (final DataAccessException e)
{
    LOG.error("Error " + e);
}
I have read the documentation for these interfaces but am unsure which one to choose in my scenario. Please suggest which code (RowMapper/ResultSetExtractor) best suits executing the simple SQL query given above.

In your case, either is fine. RowMappers are designed to convert a row in the result set into a model in your application. A ResultSetExtractor is slightly more generic and allows you to operate on the ResultSet in its entirety rather than just over a single row. It simply depends on what you need to do with the results. In the case of extracting a single string, they both work. Also see this question for more information. Your question may be marked as a duplicate.

Don't use either.
But first, and more importantly:
Don't build a SQL statement using string concatenation, unless you want to leave yourself open to SQL Injection attacks, allowing hackers to steal your data and/or corrupt or delete your database.
Don't catch the DataAccessException and continue execution as if everything is fine. The caller needs to be made aware that the query failed, and will most likely want to roll back any pending transaction.
Since your query is a single row/single column query with parameter, use queryForObject(String sql, Class<T> requiredType, Object... args).
This is what you should do:
String sno = "980098";
String sql = "SELECT DESCRIPTION FROM ATWData WHERE SNO = ?";
String description = jdbcTemplate2.queryForObject(sql, String.class, sno);
It will throw EmptyResultDataAccessException if no row is found. If a missing row is not considered an error, catch the exception and return null (or Optional.empty()).

Is it Ok to Pass ResultSet?

In my situation, I am querying a database for a specific return (in this case registration information based on a username).
//Build SQL String and Query Database.
if(formValid){
    try {
        SQL = "SELECT * FROM users WHERE username=? AND email=?";
        Collections.addAll(fields, username, email);
        results = services.DataService.getData(SQL, fields);
        if (!results.next()){
            errMessages.add("User account not found.");
        } else {
            user = new User();
            user.fillUser(results); //Is it ok to pass ResultSet Around?
        }
    } catch (SQLException e) {
        e.printStackTrace();
    } finally {
        services.DataService.closeDataObjects(); //Does this close the ResultSet I passed to fillUser?
    }
}
So once I query the database, if a result is found I create a new User object and populate it with the data I received from the database. I used to do all of this directly in the method that I was pulling the resultset into, but I realized I was doing a lot of redundant coding throughout my project so I moved it all into one central method that lives in the actual User bean.
public void fillUser(ResultSet data) throws SQLException{
    setUserId(data.getInt("id"));
    setFirstName(data.getString("first_name"));
    setLastName(data.getString("last_name"));
    setUsername(data.getString("username"));
    setType(data.getString("type"));
    setEmail(data.getString("email"));
}
I have done a few tests and from what I can determine, because I close the original resultset in the finally block of the query, the resultset that I pass into the fillUser method also gets closed. Or am I wrong and am I seriously leaking data? This is actually the second time I pass a resultset (so its two instances of one) because the block I use to query my database is
public static ResultSet getData(String SQL, ArrayList fields) throws SQLException {
    try{
        connection = Database.getConnection();
        preparedStatement = connection.prepareStatement(SQL);
        for(int i=0; i<fields.size(); i++){
            Integer num = i + 1;
            Object item = fields.get(i);
            if(item instanceof String){
                preparedStatement.setString(num, (String) item); //Array item is String.
            } else if (item instanceof Integer){
                preparedStatement.setInt(num, (Integer) item); //Array item is Integer.
            }
        }
        resultSet = preparedStatement.executeQuery();
        return resultSet;
    }finally{
    }
}
All of these code snippets live in separate classes and are reused multiple times throughout my project. Is it OK to pass a ResultSet around like this, or should I be attempting another method? My goal is to reduce the code's redundancy, but I'm not sure if I'm going about it in a legitimate manner.

Technically, it's OK to pass result sets around, as long as you are not serializing them and passing them to a different JVM, and your JDBC connection and statement are still open.
However, it's probably better software engineering practice to have a DB access layer that returns the results as plain Java objects (a list of User in your example). That way, your code is cleaner and you don't have to worry about whether the ResultSet is still open, whether you have to scroll it back to the top, and so on.
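As a rough illustration of such an access layer, the sketch below opens the JDBC objects, copies the rows into plain objects, and closes everything before returning, so no ResultSet ever leaves the method. UserDao, the trimmed-down User class and findByUsernameAndEmail are hypothetical names; the table and column names are taken from the question:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

final class UserDao {

    /** Hypothetical value class holding a subset of the columns read in the question. */
    static final class User {
        final int id;
        final String username;
        final String email;

        User(int id, String username, String email) {
            this.id = id;
            this.username = username;
            this.email = email;
        }
    }

    private final DataSource dataSource;

    UserDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Returns matching users; all JDBC resources are closed before this method returns. */
    List<User> findByUsernameAndEmail(String username, String email) {
        String sql = "SELECT id, username, email FROM users WHERE username=? AND email=?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, username);
            stmt.setString(2, email);
            try (ResultSet rs = stmt.executeQuery()) {
                List<User> users = new ArrayList<>();
                while (rs.next()) {
                    users.add(new User(rs.getInt("id"), rs.getString("username"), rs.getString("email")));
                }
                return users;
            }
        } catch (SQLException e) {
            throw new IllegalStateException("Unable to load users", e);
        }
    }
}
```

Callers then work only with List<User> and never need to know whether a connection is still open.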

As everyone before me said, it's a bad idea to pass the result set around. If you are using a connection pool library like c3p0, then you can safely use CachedRowSet and its implementation CachedRowSetImpl. With it you can close the connection; it will only use a connection when required. Here is a snippet from the Javadoc:
A CachedRowSet object is a disconnected rowset, which means that it makes use of a connection to its data source only briefly. It connects to its data source while it is reading data to populate itself with rows and again while it is propagating changes back to its underlying data source. The rest of the time, a CachedRowSet object is disconnected, including while its data is being modified. Being disconnected makes a RowSet object much leaner and therefore much easier to pass to another component. For example, a disconnected RowSet object can be serialized and passed over the wire to a thin client such as a personal digital assistant (PDA).
Here is the code snippet for querying and returning ResultSet:
public ResultSet getContent(String queryStr) {
    Connection conn = null;
    Statement stmt = null;
    ResultSet resultSet = null;
    CachedRowSetImpl crs = null;
    try {
        // assign to the variable declared above so that the finally block can close it
        conn = dataSource.getConnection();
        stmt = conn.createStatement();
        resultSet = stmt.executeQuery(queryStr);
        crs = new CachedRowSetImpl();
        // copies all rows into the disconnected row set
        crs.populate(resultSet);
    } catch (SQLException e) {
        throw new IllegalStateException("Unable to execute query: " + queryStr, e);
    }finally {
        try {
            if (resultSet != null) {
                resultSet.close();
            }
            if (stmt != null) {
                stmt.close();
            }
            if (conn != null) {
                conn.close();
            }
        } catch (SQLException e) {
            LOGGER.error("Ignored", e);
        }
    }
    return crs;
}
Here is the snippet for creating data source using c3p0:
ComboPooledDataSource cpds = new ComboPooledDataSource();
try {
    cpds.setDriverClass("<driver class>"); //loads the jdbc driver
} catch (PropertyVetoException e) {
    e.printStackTrace();
    return;
}
cpds.setJdbcUrl("jdbc:<url>");
cpds.setMinPoolSize(5);
cpds.setAcquireIncrement(5);
cpds.setMaxPoolSize(20);
javax.sql.DataSource dataSource = cpds;

Is there any tool or technique to identify opened ResultSet

In a Java application that uses SQLite to persist data, I am using the Zentus JDBC driver. Thus I am using the java.sql package to access my database.
I am facing some strange issues (in an environment with several Connection objects on the same database), and I am pretty sure my problems come from unclosed ResultSets.
Is there any tool or technique that would let me spot where to look in my source code to find these unclosed objects?
Edit: Maybe using AspectJ?

It seems like an aspect may be helpful.
How about wrapping the methods which return a result set in an aspect. Something like:
execution(public java.sql.ResultSet+ java.sql.Statement+.*(..))
Another aspect can monitor the close method on ResultSets. Perhaps:
execution(public * java.sql.ResultSet.close())
The first aspect would, on the return of every ResultSet, create a new Exception object and store it in a static Map somewhere using the hash of the ResultSet as the key. The second aspect, on the closing of the result set, would remove the Exception from the Map using the same hashcode as a key. At any time, the map should have one exception instance for every open ResultSet. From the exception you can obtain a stack trace to see where the ResultSet was opened.
You could perhaps store a larger object which includes an exception and some other contextual information; time that the ResultSet was created, etc.
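The bookkeeping that the two aspects would share can be sketched without any AOP library. This is only an illustration under a few assumptions: OpenResourceTracker and its method names are invented, and it keys the map on object identity rather than on the hash code, so that two result sets with equal hash codes cannot overwrite each other:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class OpenResourceTracker {

    // Keyed by identity; the stored Throwable captures the stack trace of the opening call
    private static final Map<Object, Throwable> OPEN =
            Collections.synchronizedMap(new IdentityHashMap<>());

    private OpenResourceTracker() {
    }

    /** Call from the advice that runs when a ResultSet is returned. */
    static void opened(Object resource) {
        OPEN.put(resource, new Throwable("opened here"));
    }

    /** Call from the advice that runs when close() is invoked. */
    static void closed(Object resource) {
        OPEN.remove(resource);
    }

    /** Snapshot of the creation stack traces of everything still open. */
    static List<Throwable> openedAndNotClosed() {
        synchronized (OPEN) {
            return new ArrayList<>(OPEN.values());
        }
    }
}
```

Printing the stack traces from openedAndNotClosed() at shutdown, or on demand, would point at the code that opened each leaked ResultSet. Note that the map holds strong references, so this is something to enable for debugging rather than leave on permanently.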

A practical suggestion is to add some debug code and "log" creation and closing of resultsets to a csv file. Later on you could examine this file and check, if there's a "close" entry for each "create".
So, assuming you have a utility class with static methods that allows writing Strings to a file, you can do it like this:
ResultSet rs = stmt.executeQuery(query);
Util.writeln(rs.hashCode() + ";create"); // add this line whenever a
                                         // new ResultSet is created
and
rs.close();
Util.writeln(rs.hashCode() + ";closed"); // add this line whenever a
                                         // ResultSet is closed
Open the csv file with Excel or any other spread sheet program, sort the table and look if result sets are not closed. If this is the case, add more debug information to clearly identify the open sets.
BTW - Wrapping the interfaces (like JAMon does) is pretty easy. If you have Eclipse or something similar, it's coded in less than 15 minutes. You'd need to wrap Connection, Statement (and PreparedStatement?) and ResultSet; the ResultSet wrapper can be instrumented to track and monitor the creation and closing of result sets:
public class MonitoredConnection implements Connection {
    private final Connection wrappedConnection;
    public MonitoredConnection(Connection wrappedConnection) {
        this.wrappedConnection = wrappedConnection;
    }
    // ... implement interface methods and delegate to the wrappedConnection
    @Override
    public Statement createStatement() throws SQLException {
        // we need MonitoredStatements because later we want MonitoredResultSets
        return new MonitoredStatement(wrappedConnection.createStatement());
    }
    // ...
}
The same for MonitoredStatement and MonitoredResultSet (MonitoredStatement will return wrapped ResultSets):
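public class MonitoredStatement implements Statement {
    private final Statement wrappedStatement;
    public MonitoredStatement(Statement wrappedStatement) {
        this.wrappedStatement = wrappedStatement;
    }
    @Override
    public ResultSet executeQuery(String sql) throws SQLException {
        // wrap the driver's ResultSet so that its close() call can be tracked
        MonitoredResultSet rs = new MonitoredResultSet(wrappedStatement.executeQuery(sql));
        ResultSetMonitor.create(rs.getWrappedResultSet()); // some static utility class/method
        return rs;
    }
    // ...
}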
and
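public class MonitoredResultSet implements ResultSet {
    private final ResultSet wrappedResultSet;
    public MonitoredResultSet(ResultSet wrappedResultSet) {
        this.wrappedResultSet = wrappedResultSet;
    }
    public ResultSet getWrappedResultSet() {
        return wrappedResultSet;
    }
    @Override
    public void close() throws SQLException {
        wrappedResultSet.close();
        ResultSetMonitor.close(wrappedResultSet); // some static utility class/method
    }
    // ...
}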
At the end, you should only need to modify a single line in your code:
Connection con = DriverManager.getConnection(ur);
to
Connection con = new MonitoredConnection(DriverManager.getConnection(ur));

A Google search pointed me directly to JAMon. It also allows you to monitor JDBC connections and cursors.
Personally, I would check the code and make sure that every Statement, PreparedStatement and ResultSet is closed when no longer needed. Even when using connection pooling, only the JDBC Connection is returned to the pool, while statements and result sets are closed.
This example shows how I close the ResultSet and PreparedStatement in the finally clause (to guarantee it happens):
PreparedStatement ps = null;
ResultSet rs = null;
UserRequest request = null;
try {
    ps = getConnection().prepareStatement(SQL_RETRIEVE);
    ps.setLong(1, id);
    rs = ps.executeQuery();
    if (rs != null && rs.next()) {
        request = mapEntity(rs);
    }
} catch (SQLException e) {
    // TODO Auto-generated catch block
    throw new DAOException(e);
} finally {
    try {
        close(rs, ps);
    } catch (SQLException e) {
        // TODO Auto-generated catch block
        logger.error("Error closing statement or resultset.", e);
    }
}
That's my 2 cents worth...hope it helps you.

It should be relatively simple to instrument your code with the AOP tool of your choice. I used AspectWerkz a number of years ago to do load-time weaving of a web app and collect performance-related statistics. Also, if you're using an IoC framework such as Spring, it's very easy to wrap your DataSources and trace calls to getConnection() etc.

Why do I need a connection to create PreparedStatements?

I would like to use prepared statements, for many reasons.
But, I would like to create a method that looks like this:
/* This opens a connection, executes the query, and closes the connection */
public static void executeNonQuery(String queryString);
In other words, I want my application logic to only have to formulate the queries and feed in parameters, but not deal with connections & statements. However, PreparedStatements are created from a connection object, so I am currently forced into preparing the query string using String.format() - butt ugly and dangerous.
Is there a way to do what I want without using String.format()?

Why do I need a connection to create PreparedStatements ?
Because the statements are prepared on per-connection basis in most RDBMS's.
Prepared statements are in fact cached execution plans that don't take your permissions, encodings, collation settings etc. into account.
All this is done during query parsing.
Is there a way to do what I want without using String.format()
Don't see why you need String.format() here.
You can implement your query as a class, create a connection and prepare the query in the class constructor and then execute it in a method.
A parametrized query typically looks like this:
SELECT *
FROM table
WHERE col1 = ?
AND col2 = ?
, where the bound parameters will be substituted for ?'s during the query execution.
If you want a static method:
Create a static connection handle.
Create a static hash table of prepared queries using the parametrized query text as a key, and the handle to the prepared query as a value.
Whenever you want to execute a query, find its handle (or create it if it wasn't found) and use it to bind the parameters and execute the query.

Why not have your "application" logic use a data layer which you create which can present that kind of interface method?
Your data layer can then handle creating connections, preparing statements, etc., all within that executeNonQuery method.
I think that if you are attempting to merge the parameters in your query/statement yourself into a String, then you are shooting yourself in the foot and actually not using the parameter functionality of PreparedStatements. Not sure why you would want to do this.
You might also want to look into using an API such as Spring, which has a series of JdbcTemplate classes that can abstract all of the connection handling away from you, but still allow you to work with parameters in a Map.
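A minimal sketch of such a data layer method might look like the following. It assumes a javax.sql.DataSource is available (for example from a connection pool); SimpleDataLayer is a made-up name, and executeNonQuery is given a varargs parameter list here so that values are bound through the PreparedStatement instead of being formatted into the SQL string:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

final class SimpleDataLayer {

    private final DataSource dataSource;

    SimpleDataLayer(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Runs an INSERT/UPDATE/DELETE with bound parameters and returns the affected row count. */
    int executeNonQuery(String sql, Object... params) {
        // try-with-resources closes the statement first, then the connection
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                stmt.setObject(i + 1, params[i]); // JDBC parameter indexes are 1-based
            }
            return stmt.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException("Unable to execute: " + sql, e);
        }
    }
}
```

Application code would then call something like dataLayer.executeNonQuery("UPDATE Person SET height=? WHERE name=?", 2.05, "John Doe") and never touch a Connection or PreparedStatement directly.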

You probably want something like the DbUtils package in the Apache Commons libraries: http://commons.apache.org/dbutils/index.html
The QueryRunner class lets you execute sql statements without having to manually create PreparedStatements, or even have an open connection for that matter. From the examples page:
QueryRunner run = new QueryRunner( dataSource );
try
{
    // Create an object array to hold the values to insert
    Object[] insertParams = {"John Doe", new Double( 1.82 )};
    // Execute the SQL update statement and return the number of
    // inserts that were made
    int inserts = run.update( "INSERT INTO Person (name,height) VALUES (?,?)",
                              insertParams );
    // Now it's time to rise to the occasion...
    Object[] updateParams = {new Double( 2.05 ), "John Doe"};
    int updates = run.update( "UPDATE Person SET height=? WHERE name=?",
                              updateParams );
}
catch(SQLException sqle) {
    // Handle it
}
So it basically handles the creation of prepared statements transparently, and the only thing you really need to know is a DataSource. This also works just as well for non-update/insert statements, i.e. plain-vanilla select queries, and the ability to create ResultSetHandlers gives you the power to convert a ResultSet into something like a fully-prepared bean, or a Map with the keys being the column names, and the values being the actual row values. Very useful for when you can't implement a whole ORM solution.

I abstract out all of the JDBC stuff by having a class I call QueryRunner, which has an execute method that takes the SQL, a List of objects that represent the parameters, and an object that will process the ResultSet. If you use the setObject method from JDBC to set your parameters, it will figure out the appropriate DB types to use based on the underlying object. Here is a portion of my code. I've got another method that wraps this one and gets the connection.
public void executeNoCommit(Connection conn,
                            String sql,
                            List params,
                            ResultSetProcessor processor) throws SQLException {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    int updateCount = 0;
    Iterator it;
    int paramIndex = 1;
    boolean query;
    try {
        stmt = conn.prepareStatement(sql);
        if (params != null) {
            it = params.iterator();
            while (it.hasNext()) {
                stmt.setObject(paramIndex, it.next());
                paramIndex++;
            }
        }
        query = stmt.execute();
        if (query) {
            rs = stmt.getResultSet();
        }
        else {
            updateCount = stmt.getUpdateCount();
        }
        processor.process(rs, updateCount);
    }
    finally {
        if (rs != null) {
            try {
                rs.close();
            }
            catch (SQLException e) {
                log.error(e);
            }
        }
        if (stmt != null) {
            try {
                stmt.close();
            }
            catch (SQLException e) {
                log.error(e);
            }
        }
    }
}
