I have a table with millions of records in it. In order to make the system faster, I need to implement the pagination concept in my Java code. I need to fetch just 1000 records at a time and process them, then pick another 1000 records and do my processing and so on. I have already tried a few things and none of them is working. Some of the things I tried are listed below -
1) String query = "select * from TABLENAME" + " WHERE ROWNUM BETWEEN %d AND %d";
sql = String.format(query, firstrow, firstrow + rowcount);
In the above example, the query SELECT * from TABLENAME WHERE ROWNUM BETWEEN 0 AND 10 gives me a result, but SELECT * from TABLENAME WHERE ROWNUM BETWEEN 10 AND 20 returns an empty result set. I even tried to run it directly in the DB and it returns an empty result set (not sure why!!)
2) preparedStatement.setFetchSize(100); I have that in my Java code, but it still fetches all the records from the table. Adding this statement didn't affect my code in any way.
Please help!
It sounds like you do not actually need to paginate the results, just to process them in batches. If that is the case, all you need to do is set the fetch size to 1000 using setFetchSize and iterate over the ResultSet as usual (using resultSet.next()), processing the results as you iterate. There are many resources describing setFetchSize and what it does; do some research, starting with these links, and see the minimal sketch after them:
What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
How JDBC Statement.setFetchSize exactly works
What and when should I specify setFetchSize()?
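A minimal sketch of that batch-processing approach (assuming an open java.sql.Connection conn, your table TABLENAME, and a hypothetical process() method standing in for your per-row work):
try (PreparedStatement ps = conn.prepareStatement("select * from TABLENAME")) {
    ps.setFetchSize(1000); // hint to the driver: fetch 1000 rows per round trip
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            process(rs); // hypothetical per-row processing
        }
    }
}
The JDBC spec treats the fetch size as a hint, so the exact behavior depends on your driver.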
For Oracle pagination there are a lot of resources describing how to do this. Just do a web search. Here are a couple of resources that describe how to do it:
http://www.databasestar.com/limit-the-number-of-rows-in-oracle/
http://ocptechnology.com/how-to-use-row-limiting-clause/
Pagination is not very useful if you do not define a consistent ordering (an ORDER BY clause), since otherwise you cannot rely on the order in which rows are returned.
This answer explains why your BETWEEN statement is not working: https://stackoverflow.com/a/10318244/908961
From that answer: if you are using an Oracle version older than 12c, you need a nested sub-select to get your results. Something like:
SELECT c.*
FROM (SELECT c.*, ROWNUM as rnum
FROM (SELECT * FROM TABLENAME ORDER BY id) c) c
WHERE c.rnum BETWEEN %d AND %d
If you are using Oracle 12c or greater, I would recommend using the newer OFFSET ... FETCH syntax instead of fiddling with ROWNUM. See the first link above or
http://www.toadworld.com/platforms/oracle/b/weblog/archive/2016/01/23/oracle-12c-enhanced-syntax-for-row-limiting-a-k-a-top-n-queries
So your query would be something like
String query = "select * from TABLENAME OFFSET %d ROWS FETCH NEXT 1000 ROWS ONLY";
String sql = String.format(query, firstrow);
or using prepared statements
PreparedStatement statement = con.prepareStatement("select * from TABLENAME OFFSET ? ROWS FETCH NEXT 1000 ROWS ONLY");
statement.setInt(1, firstrow);
ResultSet rs = statement.executeQuery();
Alternately you can also use the LIMIT keyword as described here http://docs.oracle.com/javadb/10.10.1.2/ref/rrefjdbclimit.html (note that page documents Java DB/Derby rather than Oracle Database), and your query would be something like
String query = "select * from TABLENAME { LIMIT 1000 OFFSET %d }";
String sql = String.format(query, firstrow);
The normal way to implement pagination in Oracle is to use an analytic window function, e.g. row_number(), together with an ORDER BY clause that defines the row ordering. The query with the analytic function is then wrapped in an inline view, from which you can select the range of row numbers you need. Here's an example that queries the first 1000 rows from my_table (ordered by column_to_sort_by):
select rs.* from
(select t.*,
row_number() over (order by column_to_sort_by) as row_num
from my_table t
) rs
where rs.row_num >= 1 and rs.row_num < 1001
order by rs.row_num
A JDBC implementation could then look like the following:
public void queryWithPagination() throws SQLException {
String query = "select rs.* from"
+ " (select t.*,"
+ " row_number() over (order by column_to_sort_by) as row_num"
+ " from my_table t"
+ " ) rs"
+ " where rs.row_num >= ? and rs.row_num < ?"
+ " order by rs.row_num";
final int pageSize = 1000;
int rowIndex = 1;
try (PreparedStatement ps = myConnection.prepareStatement(query)) {
do {
ps.setInt(1, rowIndex);
ps.setInt(2, rowIndex + pageSize);
rowIndex += pageSize;
} while (handleResultSet(ps, pageSize));
}
}
private boolean handleResultSet(PreparedStatement ps, final int pageSize)
throws SQLException {
int rows = 0;
try (ResultSet rs = ps.executeQuery()) {
while (rs.next()) {
/*
* handle rows here
*/
rows++;
}
}
return rows == pageSize;
}
Note that the table should remain unchanged while you're reading it so that the pagination works correctly across different query executions.
If there are so many rows in the table that you're running out of memory, you probably need to purge/serialize your list after some pages have been read.
EDIT:
If the ordering of rows doesn't matter to you at all, then -- as @bdrx mentions in his answer -- you probably don't need pagination, and the fastest solution is to query the table without a WHERE condition in the SELECT. As suggested, you can adjust the fetch size of the statement to a larger value to improve throughput.
Related
I have an application in Java requiring me to find specific records given specific conditionals. For example, I have the table:
id | song                    | artist      | record_label
---+-------------------------+-------------+-------------------
1  | Never Gonna Give You Up | Rick Astley | Rickroll'd Records
2  | Blackbird               | The Beatles | Apple Records
3  | Yesterday               | The Beatles | Apple Records
4  | WonderWall              | Oasis       | Columbia Records
I'd like to bulk query a subset of them based on specific conditions. Something similar to:
SELECT id FROM songs
WHERE
(song = 'Blackbird' AND artist = 'The Beatles' AND record_label = 'Apple Records') OR
(song = 'WonderWall' AND artist = 'Oasis' AND record_label = 'Columbia Records') OR
(song = 'Yesterday' AND artist = 'The Beatles' AND record_label = 'Apple Records')
The application is going to receive these conditions from the user and could be trying to find thousands of these records. As a result, I'm hoping to find a way to do this without any risk of SQL injection and in as few queries as possible.
My first approach would be some flavor of PreparedStatement where I iterate through this SQL query to look up each individual record:
SELECT id from songs WHERE song = ? AND artist = ? AND record_label = ?
This prevents SQL injection, but I feel like this could be optimized more as we hammer the DB with thousands of these requests in seconds.
Another option is to create a temp table, import our passed conditions into the temp table, and do an INNER JOIN on the songs table to retrieve only the rows that match between the two. This solves both problems, but it requires a good amount of development work.
I'm wondering if there's any other methods I haven't taken into account. Thanks in advance for any suggestions!
One way I can think of is to pass the parameters as a JSON string; then you can have a single parameter:
SELECT id
FROM songs
WHERE (song, artist, record_label)
in (select item ->> 'song',
item ->> 'artist',
item ->> 'record_label'
from jsonb_array_elements(cast(? as jsonb)) as p(item)
);
The parameter would then be a String passed through PreparedStatement.setString().
For your sample query e.g.
[
{"song": "Blackbird", "artist": "The Beatles", "record_label": "Apple Records"},
{"song": "Wonderwall", "artist": "Oasis", "record_label": "Columbia Records"},
{"song": "Yesterday", "artist": "The Beatles", "record_label": "Apple Records"}
]
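A hypothetical JDBC sketch of binding that array (assuming an open PostgreSQL Connection conn, with the SQL above in a String named query):
String json = "["
        + "{\"song\": \"Blackbird\", \"artist\": \"The Beatles\", \"record_label\": \"Apple Records\"},"
        + "{\"song\": \"Wonderwall\", \"artist\": \"Oasis\", \"record_label\": \"Columbia Records\"},"
        + "{\"song\": \"Yesterday\", \"artist\": \"The Beatles\", \"record_label\": \"Apple Records\"}"
        + "]";
try (PreparedStatement ps = conn.prepareStatement(query)) {
    ps.setString(1, json);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getInt("id"));
        }
    }
}
In practice you would build the JSON with a library (e.g. Jackson) rather than concatenating strings by hand.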
Not sure about performance, but the OR condition is usually a performance killer to begin with, so the small overhead of parsing and unnesting the JSON array shouldn't make a big difference.
A simple test shows that Spring Boot does not batch the query:
myRepo.findAllById(IntStream
.range(0, 1000000)
.mapToObj(i -> UUID.randomUUID())
.collect(toList()));
producing a big query
... where myRepo0_.id in (? , ? , ?, ...
and failing with big ranges.
The default saveAll JPA implementation does not use batching either.
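(As an aside: if you do want inserts batched through JPA, Hibernate, the default provider in Spring Boot, can be configured to batch them. A sketch of the commonly used application.properties settings, assuming Hibernate and an ID strategy other than IDENTITY, which disables insert batching:)
spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true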
You should probably first test the speed of performing the queries one by one:
List<MyResult> myResults = myQueryParams
.stream()
.map(qp -> myRepo.findByMyParams(qp...))
.collect(toList());
If that is too slow, check that the query is optimal, and only if speed is really important (e.g. if the transaction is too long for an HTTP request, you could run it asynchronously, with paging, ...) resort to batching.
To batch the queries you can create a temporary table (used only by your specific query; it will not lock DB objects):
long t0 = System.currentTimeMillis();
try(Connection cnx = primaryDataSource.getConnection()) {
cnx.setAutoCommit(false);
cnx.createStatement().execute("create temp table resultSet(id uuid)");
PreparedStatement s = cnx.prepareStatement("insert into resultSet(id) select id from tq_event where id = ?");
for(int i = 0; i < 1000000; i++) {
s.setObject(1, UUID.fromString("39907bfb-f77a-47a3-9ab6-2b4794c7d6ec"));
s.addBatch();
}
s.executeBatch();
cnx.commit();
ResultSet rs = cnx.createStatement().executeQuery("select id from resultSet");
while(rs.next())
System.out.printf("%s%n", rs.getString(1));
cnx.createStatement().execute("drop table resultSet");
cnx.commit();
}
System.out.printf("Time: %d mS%n", System.currentTimeMillis() - t0);
Running from my PC and querying a cloud PostgreSQL database via ssh (a very slow connection), fetching 1,000,000 rows takes Time: 179640 mS (avg 0.18 mS per row).
You can also try @a_horse_with_no_name's solution:
PreparedStatement s = cnx.prepareStatement("select uid from test where (id) in (select (k ->> 'id')::integer from jsonb_array_elements(cast(? as jsonb)) as p(k))");
s.setString(1, "[" + IntStream.range(1, 1000000).mapToObj(i -> "{\"id\": " + i + "}").collect(joining(",")) + "]");
ResultSet rs = s.executeQuery();
while(rs.next())
System.out.printf("%s%n", rs.getString(1));
which runs much faster (only one query is sent to the server), but casting datatypes could be a problem and you will have a 2 GB limit for all your parameters (still fine for many cases).
I have a program in Scala that connects to an Oracle database using ojdbc, queries a table, and tries to insert records from the java.sql.ResultSet into another table over a separate JDBC connection.
//conn1 to oracle: java.sql.Connection = oracle.jdbc.driver.T4CConnection#698122b2
//conn2 to another non-oracle database: java.sql.Connection = com.snowflake.client.jdbc.SnowflakeConnectionV1#6e4566f1
My attempt at capturing results from an Oracle table:
val stmt1 = conn1.createStatement()
stmt1.setFetchSize(3000)
val sql1 = "select userid from nex.users"
val result = stmt1.executeQuery(sql1)
and code for attempting to insert records from result to a separate database and table via jdbc:
val insert_sql = "insert into test.users (userid) values (?)"
val ps = conn2.prepareStatement(insert_sql)
val batchSize = 3000
var count = 0
while (result.next) {
ps.setInt(1, result.getInt(1))
ps.addBatch()
count += 1
if (count % batchSize == 0) ps.executeBatch()
}
What's stumping me is that this is almost exactly the same syntax as in many JDBC examples, but in my second table I'm seeing 4x the original number of rows from the first table.
select userid, count(*) from test.users group by userid
userid | count(*)
-------+---------
1      | 4
2      | 4
3      | 4
4      | 4
5      | 4
6      | 4
etc.
Yes, clearBatch is missing.
executeBatch() calls clearBatch() at the end, but there is no guarantee that other implementations will behave exactly the same way.
Also, if needed, here is a minor but subtle addition to tchoedak's answer :)
ps.executeBatch();
conn2.commit();
ps.clearBatch();
The issue was that I needed to execute ps.clearBatch() after every executeBatch(); otherwise the next batch would get piled on top of the previous one. When trying this on a large table that had to call executeBatch() more often, the number of duplicate rows was correspondingly higher. The final code looks similar but with ps.clearBatch():
val ps = conn2.prepareStatement(insert_sql)
val batchSize = 3000
var count = 0
while (result.next) {
  ps.setInt(1, result.getInt(1))
  ps.addBatch()
  count += 1
  // braces are required here: without them, clearBatch() would run on
  // every iteration and discard rows that were never executed
  if (count % batchSize == 0) {
    ps.executeBatch()
    ps.clearBatch()
  }
}
// flush the final partial batch, if any
if (count % batchSize != 0) {
  ps.executeBatch()
  ps.clearBatch()
}
I have a query using various joins, and I just need the list of columns returned by this query. I did it in Java by asking for only one row (with rownum = 1) and getting the column name for each value. The problem is when no data is returned by that query.
For ex.
select * from something
and if there is any data returned by this query then it will return col1, col2, col3.
But if there is no data returned by this query, then it will throw an error.
What I need is
Is there any way that I can run
desc (select * from something)
or similar to get list of columns returned by query.
It can be in SQL or Java; both methods are acceptable.
In my application the user passes the query, and I can add a wrapper to the query, but I can't modify it entirely.
The flow of application is
query from user -> execute by java and get one row -> return list of columns in the result set.
You can use the ResultSet's ResultSetMetaData, which is available even when the query returns no rows.
For example:
ResultSet rs = stmt.executeQuery("SELECT a, b, c FROM TABLE2");
ResultSetMetaData rsmd = rs.getMetaData();
int countOfColumns = rsmd.getColumnCount();
for(int i = 1; i <= countOfColumns ; i++ )
System.out.println(rsmd.getColumnName(i));
You could also convert your query to a view; you can then see the columns of the view by querying user_tab_columns:
select * from user_tab_columns
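For example, a sketch with a hypothetical view name, run over the same Connection:
Statement s = conn.createStatement();
s.execute("create or replace view my_query_view as select * from something");
ResultSet cols = s.executeQuery(
        "select column_name from user_tab_columns where table_name = 'MY_QUERY_VIEW'");
while (cols.next()) {
    System.out.println(cols.getString(1));
}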
The Oracle equivalent for information_schema.COLUMNS is USER_TAB_COLS for tables owned by the current user, ALL_TAB_COLS or DBA_TAB_COLS for tables owned by all users.
A tablespace is not equivalent to a schema, nor do you have to provide the tablespace name.
Providing the schema/username would be of use if you want to query ALL_TAB_COLS or DBA_TAB_COLS for columns of tables owned by a specific user. In your case, I'd imagine the query would look something like:
String sqlStr =
      "SELECT column_name"
    + " FROM all_tab_cols"
    + " WHERE table_name = 'users'"
    + " AND owner = '" + _db + "'"
    + " AND column_name NOT IN ('password', 'version', 'id')";
Note that with this approach (concatenating the owner into the SQL), you risk SQL injection. A parameterized sketch follows:
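A safer sketch using bind parameters instead of concatenation (assuming an open Connection conn; _db is the variable from above, and the NOT IN list stays inline since it is constant):
PreparedStatement ps = conn.prepareStatement(
        "SELECT column_name FROM all_tab_cols"
      + " WHERE table_name = ? AND owner = ?"
      + " AND column_name NOT IN ('password', 'version', 'id')");
ps.setString(1, "users");
ps.setString(2, _db);
try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        System.out.println(rs.getString(1));
    }
}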
I am currently trying to use a DECLARE clause in a PreparedStatement with JDBC. The code that I wrote is:
statement.executeUpdate(" declare #variable int set #variable = "+timer+" INSERT INTO table1 values (ip, protocol, counter, timer) SELECT ip,protocol,counter,#variable FROM table2 ORDER BY counter DESC LIMIT 5 OFFSET 0 ;");
What I'm trying to get is a new table (that is, table1) which holds the top 5 from table2, refreshed at a predefined interval (e.g. every 5 seconds). The interval is the timer variable, which is passed in through a method.
Note: I don't know if it makes any difference to use PreparedStatement. I tried both.
Assuming you need to create a new table from a select, then you should use this query instead:
CREATE TABLE table1 SELECT ip,protocol,counter,#variable FROM table2 ORDER BY counter DESC LIMIT 5 OFFSET 0
But if you do this in Java using a PreparedStatement, then you can pass the value of #variable as a parameter, getting rid of the separate DECLARE/SET statement entirely. So, your query will look like this in Java code:
String sql =
"CREATE TABLE table1"
+ " SELECT ip,protocol,counter,?"
+ " FROM table2"
+ " ORDER BY counter DESC"
+ " LIMIT 5 OFFSET 0";
Assuming you already have the table table1 created and you're just adding the latest results into it from table2, then the query will look like this:
INSERT INTO table1 (ip, protocol, counter, timer) SELECT ip,protocol,counter,#variable FROM table2 ORDER BY counter DESC LIMIT 5 OFFSET 0
Again, you can pass the value of #variable as a parameter. The query will look like this in Java code:
String sql =
"INSERT INTO table1 (ip, protocol, counter, timer)"
+ " SELECT ip,protocol,counter,?"
+ " FROM table2"
+ " ORDER BY counter DESC"
+ " LIMIT 5 OFFSET 0";
Then, you will prepare the query like this:
PreparedStatement pstmt = con.prepareStatement(sql);
//setting your variable as the parameter in the query
pstmt.setString(1, timer);
In the end, you will use PreparedStatement#execute or PreparedStatement#executeUpdate:
//the former query is a DDL query
pstmt.execute();
//the latter query is a DML query
pstmt.executeUpdate();
SOLVED (See answer below.)
I did not understand my problem within the proper context. The real issue was that my query was returning multiple ResultSet objects, and I had never come across that before. I have posted code below that solves the problem.
PROBLEM
I have an SQL Server database table with many thousand rows. My goal is to pull the data back from the source database and write it to a second database. Because of application memory constraints, I will not be able to pull the data back all at once. Also, because of this particular table's schema (over which I have no control) there is no good way for me to tick off the rows using some sort of ID column.
A gentleman over at the Database Administrators StackExchange helped me out by putting together something called a database API cursor, and basically wrote this complicated query that I only need to drop my statement into. When I run the query in SQL Management Studio (SSMS) it works great. I get all the data back, a thousand rows at a time.
Unfortunately, when I try to translate this into JDBC code, I get back the first thousand rows only.
QUESTION
Is it possible using JDBC to retrieve a database API cursor, pull the first set of rows from it, allow the cursor to advance, and then pull the subsequent sets one at a time? (In this case, a thousand rows at a time.)
SQL CODE
This gets complicated, so I'm going to break it up.
The actual query can be simple or complicated. It doesn't matter. I've tried several different queries during my experimentation and they all work. You just basically drop it into the SQL code in the appropriate place. So, let's take this simple statement as our query:
SELECT MyColumn FROM MyTable;
The actual SQL database API cursor is far more complicated. I will print it out below. You can see the above query buried in it:
-- http://dba.stackexchange.com/a/82806
DECLARE #cur INTEGER
,
-- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE
#scrollopt INTEGER = 16 | 8192 | 16384
,
-- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE
#ccopt INTEGER = 1 | 32768 | 65536
,#rowcount INTEGER = 1000
,#rc INTEGER;
-- Open the cursor and return the first 1,000 rows
EXECUTE #rc = sys.sp_cursoropen #cur OUTPUT
,'SELECT MyColumn FROM MyTable'
,#scrollopt OUTPUT
,#ccopt OUTPUT
,#rowcount OUTPUT;
IF #rc <> 16 -- FastForward cursor automatically closed
BEGIN
-- Name the cursor so we can use CURSOR_STATUS
EXECUTE sys.sp_cursoroption #cur
,2
,'MyCursorName';
-- Until the cursor auto-closes
WHILE CURSOR_STATUS('global', 'MyCursorName') = 1
BEGIN
EXECUTE sys.sp_cursorfetch #cur
,2
,0
,1000;
END;
END;
As I've said, the above creates a cursor in the database and asks the database to execute the statement, keep track (internally) of the data it's returning, and return the data a thousand rows at a time. It works great.
JDBC CODE
Here's where I'm having the problem. I have no compilation problems or run-time problems with my Java code. The problem I am having is that it returns only the first thousand rows. I don't understand how to utilize the database cursor properly. I have tried variations on the Java basics:
// Hoping to get all of the data, but I only get the first thousand.
ResultSet rs = stmt.executeQuery(fq.getQuery());
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
I'm not surprised by the results, but all of the variations I've tried produce the same results.
From my research it seems that JDBC does something with database cursors when the database is Oracle, but you have to set the data type returned in the result set as an Oracle cursor object. I'm guessing there is something similar for SQL Server, but I have been unable to find anything yet.
Does anyone know of a way?
I'm including example Java code in full (as ugly as that gets).
// FancyQuery.java
import java.sql.*;
public class FancyQuery {
// Adapted from http://dba.stackexchange.com/a/82806
String query = "DECLARE #cur INTEGER\n"
+ " ,\n"
+ " -- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE\n"
+ " #scrollopt INTEGER = 16 | 8192 | 16384\n"
+ " ,\n"
+ " -- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE\n"
+ " #ccopt INTEGER = 1 | 32768 | 65536\n"
+ " ,#rowcount INTEGER = 1000\n"
+ " ,#rc INTEGER;\n"
+ "\n"
+ "-- Open the cursor and return the first 1,000 rows\n"
+ "EXECUTE #rc = sys.sp_cursoropen #cur OUTPUT\n"
+ " ,'SELECT MyColumn FROM MyTable;'\n"
+ " ,#scrollopt OUTPUT\n"
+ " ,#ccopt OUTPUT\n"
+ " ,#rowcount OUTPUT;\n"
+ " \n"
+ "IF #rc <> 16 -- FastForward cursor automatically closed\n"
+ "BEGIN\n"
+ " -- Name the cursor so we can use CURSOR_STATUS\n"
+ " EXECUTE sys.sp_cursoroption #cur\n"
+ " ,2\n"
+ " ,'MyCursorName';\n"
+ "\n"
+ " -- Until the cursor auto-closes\n"
+ " WHILE CURSOR_STATUS('global', 'MyCursorName') = 1\n"
+ " BEGIN\n"
+ " EXECUTE sys.sp_cursorfetch #cur\n"
+ " ,2\n"
+ " ,0\n"
+ " ,1000;\n"
+ " END;\n"
+ "END;\n";
public String getQuery() {
return this.query;
}
public static void main(String[] args) throws Exception {
String dbUrl = "jdbc:sqlserver://tc-sqlserver:1433;database=MyBigDatabase";
String user = "mario";
String password = "p#ssw0rd";
String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
FancyQuery fq = new FancyQuery();
Class.forName(driver);
Connection conn = DriverManager.getConnection(dbUrl, user, password);
Statement stmt = conn.createStatement();
// We expect to get 1,000 rows at a time.
ResultSet rs = stmt.executeQuery(fq.getQuery());
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
// Alas, we've only gotten 1,000 rows, total.
rs.close();
stmt.close();
conn.close();
}
}
I figured it out.
stmt.execute(fq.getQuery());
ResultSet rs = null;
for (;;) {
    rs = stmt.getResultSet();
    while (rs.next()) {
        System.out.println(rs.getString("MyColumn"));
    }
    // Advance to the next result; stop only when there are no more
    // results of either kind (no further ResultSet and no update count).
    if ((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1)) {
        break;
    }
}
if (rs != null) {
    rs.close();
}
After some additional googling, I found a bit of code posted back in 2004:
http://www.coderanch.com/t/300865/JDBC/databases/SQL-Server-JDBC-Registering-cursor
The gentleman who posted the snippet that I found helpful (Julian Kennedy) suggested: "Read the Javadoc for getUpdateCount() and getMoreResults() for a clear understanding." I was able to piece it together from that.
Basically, I don't think I understood my problem well enough at the outset in order to phrase it correctly. What it comes down to is that my query will be returning the data in multiple ResultSet instances. What I needed was a way to not merely iterate through each row in a ResultSet but, rather, iterate through the entire set of ResultSets. That's what the code above does.
If you want all records from the table, just do "Select * from table".
The only reason to retrieve in chunks is if there is some intermediate place for the data: e.g. if you are showing it on the screen, or storing it in memory.
If you are simply reading from one and inserting into another, just read everything from the first. You will not get any better performance by trying to retrieve in chunks; if there is a difference, it will be negative. Frame your query in a way that brings back everything. The JDBC software will handle all the other breaking-up and reconstituting that you need.
However, you should batch the update/insert side of things.
The set-up would create two statements on the two connections:
Statement stmt = null;
ResultSet rs = null;
PreparedStatement insStmt = null;
stmt = conDb1.createStatement();
insStmt = conDb2.prepareStatement("insert into tgt_db2_table values (?,?,?,?,? /* ...etc... */ ,?,?)");
rs = stmt.executeQuery("select * from src_db1_table");
Then, loop over the select as normal, but use batching on the target.
int batchedRecordCount = 0;
while (rs.next()) {
    // Read values from the cursor and set them on insStmt...
    String field1 = rs.getString(1);
    String field2 = rs.getString(2);
    int field3 = rs.getInt(3);
    //--- etc.
    insStmt.setString(1, field1);
    insStmt.setString(2, field2);
    insStmt.setInt(3, field3);
    //----- etc. for all the fields
    insStmt.addBatch();
    batchedRecordCount++;
    if (batchedRecordCount == 1000) {
        insStmt.executeBatch();
        batchedRecordCount = 0; // start a fresh batch
    }
}
if (batchedRecordCount > 0) {
    // Finish off the final (partial) batch of records
    insStmt.executeBatch();
}
//Close resources...
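As a sketch, the same setup can use try-with-resources so that the statements and result set are closed automatically even on errors (assuming the conDb1/conDb2 connections from above; the placeholder list is abbreviated):
try (Statement stmt = conDb1.createStatement();
     PreparedStatement insStmt = conDb2.prepareStatement(
             "insert into tgt_db2_table values (?,?,?)");
     ResultSet rs = stmt.executeQuery("select * from src_db1_table")) {
    // ... the read/batch loop shown above ...
}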