Program hangs after retrieving 100 rows containing a CLOB - Java

I am retrieving one text column (CLOB) from a table in a "remote" H2 database (actually on a local drive, but accessed over TCP), and after retrieving the first 100 rows the program hangs on retrieving the next row of the result set. If, on the other hand, I access the same database as an embedded database, there is no problem. If I try to display the table's rows using H2's console application, accessing the database using the Server (i.e. TCP) method, I get the following error message:
IO Exception: "java.io.IOException: org.h2.message.DbException: The object is already closed [90007-164]";
"lob: null table: 14 id: 1" [90031-164] 90031/90031
Here is the program. If I uncomment the call that sets the system property, the program works. I have also tried retrieving the column using a character stream as well as a simple call to getString, controlled by the constant USE_STREAM; there is no difference in the results:
import java.sql.*;
import java.util.*;
import java.io.*;

public class Jdbc4
{
    private static final boolean USE_STREAM = false;

    public static void main(String[] args) throws Exception
    {
        //System.setProperty("h2.serverResultSetFetchSize", "50");
        Connection conn = null;
        try {
            Class.forName("org.h2.Driver").newInstance();
            conn = DriverManager.getConnection("jdbc:h2:tcp://localhost/file:C:/h2/db/test/test;IFEXISTS=TRUE", "sa", "");
            Statement stmt = conn.createStatement();
            String sql = "select select_variables from ipm_queues";
            ResultSet rs = stmt.executeQuery(sql);
            int count = 0;
            while (rs.next()) {
                ++count;
                String s;
                if (USE_STREAM) {
                    Clob clob = rs.getClob(1);
                    Reader rdr = clob.getCharacterStream();
                    char[] cbuf = new char[1024];
                    StringBuffer sb = new StringBuffer();
                    int len;
                    while ((len = rdr.read(cbuf, 0, cbuf.length)) != -1)
                        sb.append(cbuf, 0, len);
                    rdr.close();
                    s = sb.toString();
                    clob.free();
                }
                else
                    s = rs.getString(1);
                System.out.println(count + ": " + s);
            }
        }
        finally {
            if (conn != null)
                conn.close();
        }
    }
}
Here is the DDL for creating the table (you can see it was originally a MySQL table):
CREATE TABLE `ipm_queues` (
`oid` bigint NOT NULL,
`queue_id` varchar(256) NOT NULL,
`store_id` bigint NOT NULL,
`creation_time` datetime NOT NULL,
`status` bigint NOT NULL,
`deleted` bigint NOT NULL,
`last_mod_time` datetime NOT NULL,
`queue_name` varchar(128),
`select_variables` text,
`where_clause` text,
`from_table` varchar(128),
`order_by` varchar(256),
`from_associate_table` varchar(256),
`from_view` varchar(128)
);
ALTER TABLE ipm_queues
ADD CONSTRAINT ipm_queues_pkey PRIMARY KEY (oid);
CREATE UNIQUE INDEX ipm_queues_key_idx ON ipm_queues(queue_id, store_id);
CREATE INDEX ipm_queues_str_idx ON ipm_queues(store_id);

I believe I understand the cause of the hang. I investigated the simplest case, using an h2.serverResultSetFetchSize value of 600, which is greater than the 523 rows I know I have. With that setting I can retrieve the first 3 rows (single CLOB column) okay, and then I either hang on the retrieval of the 4th row or I get a "The object is already closed" exception.
It turns out that the strings in the first three rows are rather short, so method getInputStream in class org.h2.value.ValueLobDb already has the data and simply returns a ByteArrayInputStream constructed on it. The 4th row's data is still on the server side, so an actual RemoteInputStream has to be built to fetch the data from the server-side LOB.
Here is what seems to be the problem: class org.h2.server.TcpServerThread caches these LOBs in an instance of SmallLRUCache. This cache is designed to keep only the most recently referenced LOBs! Its default size is given by the system property h2.serverCachedObjects, which defaults to 64, whereas the default fetch size is 100. So even if I had not overridden the default h2.serverResultSetFetchSize property, if all of my rows had sufficiently large columns requiring cached LOBs, any fetch size > 64 would cause the LOB representing the first row to be flushed out of the cache, and I would not even be able to retrieve the first row.
An LRU cache seems to be the wrong structure for holding LOBs that belong to an active result set. Certainly a default cache size that is smaller than the default fetch size seems less than ideal.
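For now the workaround is along these lines (a sketch only, using the property names discussed above; the value 50 is arbitrary, it just has to stay at or below the LOB cache size):

import java.sql.Connection;
import java.sql.DriverManager;

public class Jdbc4Workaround {
    public static void main(String[] args) throws Exception {
        // Keep the fetch size at or below the server-side LOB cache size so the
        // CLOB for the first row of a fetch block is not evicted before the
        // client reads it. Property name as discussed above; 50 is just an example.
        System.setProperty("h2.serverResultSetFetchSize", "50");

        // Alternatively, h2.serverCachedObjects could be raised instead, but that
        // property is read by the TCP server, so it would have to be set in the
        // JVM that runs the H2 server, not in this client.

        try (Connection conn = DriverManager.getConnection(
                "jdbc:h2:tcp://localhost/file:C:/h2/db/test/test;IFEXISTS=TRUE", "sa", "")) {
            // ... same query loop as in Jdbc4 above ...
        }
    }
}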

You should probably give more details, but did you check your network connection? Maybe your database server is blocking connections (or network connections) as soon as they try to fetch too much data. This could be a sort of protection.

Related

Java JDBC - PreparedStatement executeUpdate() always returns 1

Currently I'm working on Java code that retrieves data from XML files located in various folders and then uploads both the file itself and the retrieved data to a SQL Server database. I don't want to upload any repeated XML file, but since the files can have random names I check using the hash of each file I'm about to upload. I'm uploading the files to the following table:
XMLFiles
CREATE TABLE [dbo].[XMLFiles](
[PathID] [int] NOT NULL,
[FileID] [int] IDENTITY(1,1) NOT NULL,
[XMLFileName] [nvarchar](100) NULL,
[FileSize] [int] NULL,
[FileData] [varbinary](max) NULL,
[ModDate] [datetime2](7) NULL,
[FileHash] [nvarchar](100) NULL,
CONSTRAINT [PK_XMLFiles] PRIMARY KEY CLUSTERED
(
[FileID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
The code I'm using to upload the files is the following:
public int UploadFile(String Path, int pathID) throws SQLException, SAXException, IOException {
    int ID = -1;
    String hash;
    int len, rowCount = 0;
    String query;
    PreparedStatement pstmt;
    try {
        File file = new File(Path);
        hash = XMLRead.getFileChecksum(file);
        FileInputStream fis = new FileInputStream(file);
        len = (int) file.length();
        query = (" IF NOT EXISTS "
                + " (SELECT 1"
                + " FROM XMLFiles"
                + " WHERE FileSize=" + len + " AND FileHash='" + hash + "')"
                + " BEGIN"
                + " INSERT INTO XMLFiles (PathID,XMLFileName,FileSize,FileData,ModDate,FileHash) "
                + " VALUES(?,?,?,?,GETDATE(),?)"
                + " END;");
        pstmt = Con.prepareStatement(query);
        pstmt.setInt(1, pathID);
        pstmt.setString(2, file.getName());
        pstmt.setInt(3, len);
        pstmt.setBinaryStream(4, fis, len);
        pstmt.setString(5, hash);
        rowCount = pstmt.executeUpdate();
        System.out.println("ROWS AFFECTED:-" + rowCount);
        if (rowCount == 0) {
            System.out.println("THE FILE: " + file.getName() + " ALREADY EXISTS IN THE SERVER WITH THE NAME: ");
            System.out.println(GetFilename(hash));
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return rowCount;
}
I'm executing the program with 28 files, 4 of which are repeats with different names. I know the duplicate check itself works because at the end of each execution only the 24 unique files have been uploaded. The problem is that I use the rowCount to check whether the file was uploaded or not, and if the file wasn't uploaded because it was a repeat, I don't upload that file's extracted data to the database either, like so (the following fragment illustrates the check I'm doing):
int rowCount=UploadFile(Path,pathID);
if (rowCount==1){
//UPLOAD DATA
}
The problem is that executeUpdate() in the method UploadFile always returns 1, even when no rows in the database were affected. Is there something I'm missing here? I can't find anything wrong with my code. Is it the IF NOT EXISTS check that returns the 1?
The update count returned by a SQL statement is only well-defined for a normal DML statement (INSERT, UPDATE, or DELETE).
It is not defined for a SQL script.
The value is whatever the server chooses to return for a script. For MS SQL Server, it is likely the value of @@ROWCOUNT at the end of the statement / script:
Set @@ROWCOUNT to the number of rows affected or read.
Since you're executing a SELECT statement, it sets the @@ROWCOUNT value. If zero, you then execute the INSERT statement, which will override the @@ROWCOUNT value.
Assuming there will never be more than one row with that size/hash, you will always get a count of 1 back.
It could be that when your SELECT in the IF block finds the existing row it is counted and returned.
If there is no exception thrown you could try the INSERT without the IF NOT EXISTS check and see if this is the case. You may end up with duplicates if you do not have a key of some kind that prevents them from being inserted, or you may receive an exception if you have a key that does prevent the insert. It's worth testing to see what you get.
If it is the SELECT returning the 1, you may need to split the logic into two statements and simply skip the second if the first finds a row. You can keep them in the same transaction; essentially your DB is already executing two statements as currently written. It's more code, but if you run them in the same transaction, the effect on your database is the same.
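A sketch of that two-statement variant, reusing the variables from UploadFile (Con, file, fis, len, hash, pathID); as a side benefit it binds the size and hash as parameters instead of concatenating them into the SQL:

// Sketch: separate existence check and INSERT so executeUpdate()'s return value
// reflects only the INSERT. Variable names follow the UploadFile method above.
Con.setAutoCommit(false);
int rowCount = 0;
try (PreparedStatement check = Con.prepareStatement(
        "SELECT 1 FROM XMLFiles WHERE FileSize = ? AND FileHash = ?")) {
    check.setInt(1, len);
    check.setString(2, hash);
    try (ResultSet rs = check.executeQuery()) {
        if (!rs.next()) { // no duplicate found, safe to insert
            try (PreparedStatement insert = Con.prepareStatement(
                    "INSERT INTO XMLFiles (PathID, XMLFileName, FileSize, FileData, ModDate, FileHash) "
                            + "VALUES (?, ?, ?, ?, GETDATE(), ?)")) {
                insert.setInt(1, pathID);
                insert.setString(2, file.getName());
                insert.setInt(3, len);
                insert.setBinaryStream(4, fis, len);
                insert.setString(5, hash);
                rowCount = insert.executeUpdate(); // 1 only when a row was really inserted
            }
        }
    }
}
Con.commit();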

PostgreSQL's XMIN in Oracle & MySQL

I'm trying to get the equivalent of this code for Oracle & MySQL:
if (vardbtype.equals("POSTGRESQL")) {
    Long previousTxId = 0L;
    Long nextTxId = 0L;
    Class.forName("org.postgresql.Driver");
    System.out.println("----------------------------");
    try (Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/" + vardbserver, vardbuser, vardbpassword);
         PreparedStatement stmts = c.prepareStatement("SELECT * FROM " + vardbname + " where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
         PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from " + vardbname)
    ) {
        c.setAutoCommit(false);
        while (true) {
            stmts.clearParameters();
            try (ResultSet rss = max.executeQuery()) {
                if (rss.next()) {
                    nextTxId = rss.getLong(1);
                }
            }
            stmts.setLong(1, previousTxId);
            stmts.setLong(2, nextTxId + 1);
            try (ResultSet rss = stmts.executeQuery()) {
                while (rss.next()) {
                    String message = rss.getString("MESSAGE");
                    System.out.println("Message = " + message);
                    TextMessage mssg = session.createTextMessage(message);
                    System.out.println("Sent: " + mssg.getText());
                    producer.send(mssg);
                }
                previousTxId = nextTxId;
            }
            Thread.sleep(batchperiod2);
        }
    }
}
Basically, the code reads the contents of a database table and sends them to ActiveMQ, and when the table is updated it sends only the newly updated content (not content that was already sent). But this code only works on PostgreSQL.
So I'm planning to add an "if" branch so I can get the data from another database (Oracle or MySQL).
I guess I must change this code, right?
try (Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/" + vardbserver, vardbuser, vardbpassword);
     PreparedStatement stmts = c.prepareStatement("SELECT * FROM " + vardbname + " where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
     PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from " + vardbname)
) {
A couple of thoughts supplemental to Thorsten's answer.
First, xmin is a system column which is, IIRC, stored in the row header on disk. It is updated by writes. I have not yet run into a case where the transaction IDs don't increase, but there has to be some wraparound point. For this reason I think you are better off with a trigger that stores the transaction IDs in another table and processing from that.
For Oracle and MySQL, the underlying storage is sufficiently different that I don't see how you can do this directly.
If you want a common solution, you want a queue table where a trigger inserts waiting copies, and then you select/delete from that in your worker. This will likely work better on MySQL than on PostgreSQL, and for Oracle you want to look at index-organized tables. If autovacuum has trouble keeping up, ask more questions or hire a consultant.
After further research
InnoDB provides a DB_TRX_ID column which is similar. Note that you cannot assume you have this column just because you are running MySQL: MySQL has different table storage engines, and not all of them even support transactions. So that is an important limitation.
I was unable to locate a similar column on Oracle.
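A worker-side sketch of that queue-table idea (the table message_queue and its id/message columns are made up for illustration; a trigger on the source table would fill it):

import java.sql.*;

public class QueueWorker {
    // Sketch only: reads pending rows from a hypothetical queue table filled by a
    // trigger, processes them, and deletes them in the same transaction.
    static void drainQueue(Connection c) throws SQLException {
        c.setAutoCommit(false);
        try (Statement sel = c.createStatement();
             ResultSet rs = sel.executeQuery(
                     "SELECT id, message FROM message_queue ORDER BY id");
             PreparedStatement del = c.prepareStatement(
                     "DELETE FROM message_queue WHERE id = ?")) {
            while (rs.next()) {
                long id = rs.getLong("id");
                String message = rs.getString("message");
                // hand the message to ActiveMQ here, as in the original loop
                del.setLong(1, id);
                del.executeUpdate();
            }
            c.commit();
        } catch (SQLException e) {
            c.rollback();
            throw e;
        }
    }
}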
This script looks at a table in intervals and puts out all messages inserted since the last loop.
PostgreSQL stores the number of the transaction that inserted a record, so this can be used to find the newly inserted records (although I am not sure whether a new transaction is guaranteed to have a higher number than all previous ones, as the script assumes).
Other DBMS don't have this pseudo column. So you would have to add a timestamp column to your table and use that instead. You'd have to change the two queries as well as the code to match the data type (I suppose java.sql.Timestamp instead of Long, but I am no Java guy).
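If it helps, a rough sketch of that timestamp variant in Java (the column name last_modified is just an example; the messaging part stays as in the original loop):

import java.sql.*;

class TimestampPoller {
    // Sketch: poll by a timestamp column instead of xmin. Assumes the table has a
    // last_modified column; table/column names and the sleep interval are illustrative.
    static void poll(Connection c, String table, long sleepMillis)
            throws SQLException, InterruptedException {
        Timestamp previous = new Timestamp(0L);
        try (PreparedStatement max = c.prepareStatement(
                     "SELECT MAX(last_modified) FROM " + table);
             PreparedStatement select = c.prepareStatement(
                     "SELECT * FROM " + table
                             + " WHERE last_modified > ? AND last_modified <= ?")) {
            while (true) {
                Timestamp next = previous;
                try (ResultSet rs = max.executeQuery()) {
                    if (rs.next() && rs.getTimestamp(1) != null) {
                        next = rs.getTimestamp(1);
                    }
                }
                select.setTimestamp(1, previous);
                select.setTimestamp(2, next);
                try (ResultSet rs = select.executeQuery()) {
                    while (rs.next()) {
                        System.out.println("Message = " + rs.getString("MESSAGE"));
                        // send to ActiveMQ here, as in the original loop
                    }
                }
                previous = next;
                Thread.sleep(sleepMillis);
            }
        }
    }
}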

Do not update row in ResultSet if data has changed

We are extracting data from various database types (Oracle, MySQL, SQL Server, ...). Once it is successfully written to a file, we want to mark it as transmitted, so we update a specific column.
Our problem is that a user can change the data in the meantime but might forget to commit. The record is then blocked by a SELECT FOR UPDATE statement, so it can happen that we mark something as transmitted which actually is not.
This is an excerpt from our code:
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
    if (!extractedData.rowUpdated()) {
        extractedData.updateString("COLUMNNAME", "TRANSMITTED");
        // code will stop here if user has changed data but did not commit
        extractedData.updateRow();
        // once committed the changed data is marked as transmitted
    }
}
The method extractedData.rowUpdated() returns false, because technically the user didn't change anything yet.
Is there any way to not update the row and detect if data was changed at this late stage?
Unfortunately I cannot change the program the user is using to change the data.
So you want to
Run through all rows of the table that have not been exported
Export this data somewhere
Mark these rows exported so your next iteration will not export them again
As there might be pending changes on a row, you don't want to mess with that information
How about:
You iterate over all rows.
for every row
    generate a hash value for the contents of the row
    compare column "UPDATE_STATUS" with the calculated hash
    if no match
        export row
        store hash into "UPDATE_STATUS"
        if store fails (row locked)
            -> no worries, will be exported again next time
        if store succeeds (on data already changed by user)
            -> no worries, will be exported again as hash will not match
This might further slow your export, as you'll have to iterate over everything instead of only over the rows WHERE UPDATE_STATUS IS NULL, but you might be able to split it into two jobs: one fast job iterating over WHERE UPDATE_STATUS IS NULL, and one slow and thorough job over WHERE UPDATE_STATUS IS NOT NULL (with the hash rechecking in place).
If you want to avoid store failures/waits, you might want to store the hash/updated information in a second table that copies the primary key plus the hash field value; that way user locks on the main table would not interfere with your updates at all (as those would be on another table).
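A sketch of the hashing step (assuming the status/hash column is called UPDATE_STATUS as above and is excluded from the hash itself; SHA-256 over the stringified column values):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

class RowHasher {
    // Sketch: build a stable hash over the exported columns of the current row.
    // Compare this value with the stored UPDATE_STATUS (or side-table) hash;
    // export and store the new hash only when they differ.
    static String hashCurrentRow(ResultSet rs) throws Exception {
        ResultSetMetaData meta = rs.getMetaData();
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            if ("UPDATE_STATUS".equalsIgnoreCase(meta.getColumnName(i))) {
                continue; // don't hash the status column itself
            }
            String value = rs.getString(i);
            digest.update((value == null ? "\u0000" : value).getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0x1F); // column separator
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}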
"a user [...] might forget to commit" > A user either commits or he doesn't. "Forgetting" to commit is tantamount to a bug in his software.
To work around that you need to either:
Start a transaction with isolation level SERIALIZABLE, and within that transaction:
Read the data and export it. Data read this way is blocked from being updated.
Update the data you processed. Note: don't do that with an updatable ResultSet, do it with an UPDATE statement. That way you don't need a CONCUR_UPDATABLE + TYPE_SCROLL_SENSITIVE result set, which is much slower than CONCUR_READ_ONLY + TYPE_FORWARD_ONLY.
Commit the transaction.
That way the buggy software will be blocked from updating data you are processing.
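A sketch of that first approach (conn, sql and writeDataToFile are from the question; the table name and WHERE predicate are placeholders for whatever identifies not-yet-transmitted rows):

// Sketch: read with a plain forward-only/read-only result set and mark the rows
// with a set-based UPDATE, all inside one SERIALIZABLE transaction.
conn.setAutoCommit(false);
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                           ResultSet.CONCUR_READ_ONLY);
     ResultSet extractedData = stmt.executeQuery(sql)) {
    writeDataToFile(extractedData);
    try (Statement mark = conn.createStatement()) {
        // placeholder table/predicate: whatever identifies the rows just exported
        mark.executeUpdate("UPDATE mytable SET COLUMNNAME = 'TRANSMITTED' WHERE COLUMNNAME IS NULL");
    }
    conn.commit();
} catch (SQLException e) {
    conn.rollback();
    throw e;
}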
Another way
Start a TRANSACTION at a lower isolation level (default READ COMMITTED) and within that transaction
Select the data with proper table hints, e.g. for SQL Server: TABLOCKX + HOLDLOCK (large datasets), or ROWLOCK + XLOCK + HOLDLOCK (small datasets), or PAGLOCK + XLOCK + HOLDLOCK. Having HOLDLOCK as a table hint is practically equivalent to having a SERIALIZABLE transaction. Note that lock escalation may escalate the latter two to table locks if the number of locks becomes too high.
Update the data you processed; Note: use an UPDATE statement. Lose the updatable/scroll_sensitive resultset.
Commit the TRANSACTION.
Same deal, the buggy software will be blocked from updating data you are processing.
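And a sketch of the second approach for SQL Server, with the small-dataset hint combination in the SELECT (again, table name and predicate are placeholders):

// Sketch: default isolation, but the SELECT takes and holds exclusive row locks
// (ROWLOCK + XLOCK + HOLDLOCK) until the transaction commits.
conn.setAutoCommit(false);
try (Statement stmt = conn.createStatement();
     ResultSet extractedData = stmt.executeQuery(
             "SELECT * FROM mytable WITH (ROWLOCK, XLOCK, HOLDLOCK) WHERE COLUMNNAME IS NULL")) {
    writeDataToFile(extractedData);
    try (Statement mark = conn.createStatement()) {
        mark.executeUpdate("UPDATE mytable SET COLUMNNAME = 'TRANSMITTED' WHERE COLUMNNAME IS NULL");
    }
    conn.commit();
} catch (SQLException e) {
    conn.rollback();
    throw e;
}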
In the end we had to implement optimistic locking. In some tables we already have a column that stores the version number. Some other tables have a timestamp column that holds the time of the last change (set by a trigger).
While a timestamp might not always be a reliable source for optimistic locking, we went with it anyway. Several changes within a single second are not very realistic in our environment.
Since we have to know the primary key without describing it beforehand, we had to access the ResultSet metadata. Some of our databases do not support this (DB/2 legacy tables, for example). We are still using the old system for these.
Note: The tableMetaData is an XML-config file where our description of the table is stored. This is not directly related to the metadata of the table in the database.
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
    if (tableMetaData.getVersion() != null) {
        markDataAsExported(extractedData, tableMetaData);
    } else {
        markResultSetAsExported(extractedData, tableMetaData);
    }
}

// new way with building of an update statement including the version column in the where clause
private void markDataAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    ResultSet resultSetPrimaryKeys = null;
    PreparedStatement versionedUpdateStatement = null;
    try {
        ResultSetMetaData extractedMetaData = extractedData.getMetaData();
        resultSetPrimaryKeys = conn.getMetaData().getPrimaryKeys(null, null, tableMetaData.getTable());
        ArrayList<String> primaryKeyList = new ArrayList<String>();
        String sqlStatement = "update " + tableMetaData.getTable() + " set " + tableMetaData.getUpdateColumn()
                + " = ? where ";
        if (resultSetPrimaryKeys.isBeforeFirst()) {
            while (resultSetPrimaryKeys.next()) {
                primaryKeyList.add(resultSetPrimaryKeys.getString(4));
                sqlStatement += resultSetPrimaryKeys.getString(4) + " = ? and ";
            }
            sqlStatement += tableMetaData.getVersionColumn() + " = ?";
            versionedUpdateStatement = conn.prepareStatement(sqlStatement);
            while (extractedData.next()) {
                versionedUpdateStatement.setString(1, tableMetaData.getUpdateValue());
                for (int i = 0; i < primaryKeyList.size(); i++) {
                    versionedUpdateStatement.setObject(i + 2, extractedData.getObject(primaryKeyList.get(i)),
                            extractedMetaData.getColumnType(extractedData.findColumn(primaryKeyList.get(i))));
                }
                versionedUpdateStatement.setObject(primaryKeyList.size() + 2,
                        extractedData.getObject(tableMetaData.getVersionColumn()), tableMetaData.getVersionType());
                if (versionedUpdateStatement.executeUpdate() == 0) {
                    logger.warn(Message.COLLECTOR_DATA_CHANGED, tableMetaData.getTable());
                }
            }
        } else {
            logger.warn(Message.COLLECTOR_PK_ERROR, tableMetaData.getTable());
            markResultSetAsExported(extractedData, tableMetaData);
        }
    } finally {
        if (resultSetPrimaryKeys != null) {
            resultSetPrimaryKeys.close();
        }
        if (versionedUpdateStatement != null) {
            versionedUpdateStatement.close();
        }
    }
}

// the old way as fallback
private void markResultSetAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    while (extractedData.next()) {
        extractedData.updateString(tableMetaData.getUpdateColumn(), tableMetaData.getUpdateValue());
        extractedData.updateRow();
    }
}

Why does getGeneratedKeys() return "GENERATED_KEY" as column name?

I'm playing with JDBC and MySQL 5.1. I created an insert query to insert some data into a table and want to return the generated key of the newly created row. However, I get a SQLException when I try to reference the column by "id", which is my PK and auto-increment column.
PreparedStatement ps = St0rm.getInstance().getDatabase("main")
        .prepare("INSERT INTO quests (name,minlevel,start_npc,end_npc) VALUES(?,?,?,?)", true); // creates a prepared statement with flag RETURN_GENERATED_KEYS
// ...
int affected = ps.executeUpdate();
ResultSet keys = ps.getGeneratedKeys();
if (affected > 0 && keys.next()) {
    St0rm.getInstance().getLogger().warning(String.format("ID Column Name: %s", keys.getMetaData().getColumnName(1))); // says the column name is: GENERATED_KEY
    q = new Quest(keys.getInt(1)); // column index from the generated key, no error thrown.
    q = new Quest(keys.getInt("id")); // actual column name, line throws a SQLException
    // ...
}
So, my question: Why does ResultSet.getGeneratedKeys use GENERATED_KEY as the column name?
You shouldn't retrieve these columns by name. Only by index, since
there can only ever be one column with MySQL and auto_increments that
returns value(s) that can be exposed by Statement.getGeneratedKeys().
Currently the MySQL server doesn't return information directly that
would make the ability to retrieve these columns by name in an
efficient manner possible, which is why I'm marking this as "to be
fixed later", since we can, once the server returns the information in
a way that the driver can use.
From here (in 2006!).
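In practice that means always reading the key by position; a sketch against the quests table from the question (the inserted values are made up):

// Sketch: read MySQL generated keys by index, not by column label.
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO quests (name, minlevel, start_npc, end_npc) VALUES (?, ?, ?, ?)",
        Statement.RETURN_GENERATED_KEYS)) {
    ps.setString(1, "Example quest");
    ps.setInt(2, 1);
    ps.setInt(3, 100);
    ps.setInt(4, 200);
    int affected = ps.executeUpdate();
    try (ResultSet keys = ps.getGeneratedKeys()) {
        if (affected > 0 && keys.next()) {
            int questId = keys.getInt(1); // index 1, whatever label the driver reports
            // new Quest(questId), as in the question
        }
    }
}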

Failing to load large dataset into h2 database

Here is the problem: at my company we have a large database on which we want to perform some automated operations. To test that, we got a small sample of that data, about six 10 MB CSV files. We want to use H2 to test the results of our program on it. H2 seemed to work fine with our previous CSVs, though they were, at most, 1000 entries long. When it comes to any of our 10 MB files, the command
insert into myschema.mytable (select * from csvread('mycsvfile.csv'));
reports a failure because one of the records is supposedly duplicated and violates our primary key constraint.
Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
insert into myschema.mytable (select * from csvread('src/test/resources/h2/data/mycsvfile.csv')) [23001-148] 23001/23001
Breaking mycsvfile.csv into smaller pieces, I was able to see that the problem starts to appear after about 10000 rows inserted (though the number varies depending on which data I used). I could, however, insert more than 10000 rows if I broke the file into pieces and ran the command for each piece individually. But even if I manage to insert all that data manually, I need an automated method to fill the database.
Since running the command would not tell me which row was causing the problem, I guessed that the problem could be some cache in the csvread routine.
Then I created a small Java program that inserts the data into the H2 database manually. No matter whether I batched the commands or closed and reopened the connection every 1000 rows, H2 reported that I was trying to insert a duplicate entry into the database.
org.h2.jdbc.JdbcSQLException: Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
INSERT INTO myschema.mytable VALUES ( '1997-10-06 01:00:00.0',25485116,1.600,0,18 ) [23001-148]
Doing a normal search for that record using Emacs, I can see that it is not duplicated, as the datetime column is unique in the whole dataset.
I cannot give you that data to test with, since the company sells that information, but here is what my table definition looks like:
create table myschema.mytable (
datetime timestamp,
largenumber numeric(8,0) references myschema.largenumber(largecode),
value numeric(8,3) not null,
flag numeric(1,0) references myschema.flag(flagcode),
kind smallint references myschema.kind(kindcode),
primary key (datetime, largenumber, kind)
);
This is what our CSV looks like:
datetime,largenumber,value,flag,kind
1997-06-11 16:45:00.0,25485116,0.710,0,18
1997-06-11 17:00:00.0,25485116,0.000,0,18
1997-06-11 17:15:00.0,25485116,0.000,0,18
1997-06-11 17:30:00.0,25485116,0.000,0,18
And here is the Java code that fills our test database (forgive my ugly code, I got desperate):
private static void insertFile(MyFile file) throws SQLException {
    int updateCount = 0;
    ResultSet rs = Csv.getInstance().read(file.toString(), null, null);
    ResultSetMetaData meta = rs.getMetaData();
    Connection conn = DriverManager.getConnection(
            "jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
    rs.next();
    while (rs.next()) {
        Statement stmt = conn.createStatement();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < meta.getColumnCount(); i++) {
            if (i == 0)
                sb.append("'" + rs.getString(i + 1) + "'");
            else
                sb.append(rs.getString(i + 1));
            sb.append(',');
        }
        updateCount++;
        if (sb.length() > 0)
            sb.deleteCharAt(sb.length() - 1);
        stmt.execute(String.format(
                "INSERT INTO myschema.mydatabase VALUES ( %s ) ",
                sb.toString()));
        if (updateCount == 1000) {
            conn.close();
            conn = DriverManager.getConnection(
                    "jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
            updateCount = 0;
        }
    }
    if (!conn.isClosed()) {
        conn.close();
    }
    rs.close();
}
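A tidier sketch of the same load, using a parameterized batch insert with the same CSV reader and connection URL (it relies on H2's implicit string-to-column-type conversion; this is only an alternative sketch, not the code I ran above):

private static void insertFileBatched(MyFile file) throws SQLException {
    // Sketch: same table and CSV reader as above; column count from the CSV metadata.
    ResultSet rs = Csv.getInstance().read(file.toString(), null, null);
    int columns = rs.getMetaData().getColumnCount();
    StringBuilder placeholders = new StringBuilder();
    for (int i = 0; i < columns; i++) {
        placeholders.append(i == 0 ? "?" : ", ?");
    }
    try (Connection conn = DriverManager.getConnection(
                 "jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
         PreparedStatement stmt = conn.prepareStatement(
                 "INSERT INTO myschema.mytable VALUES (" + placeholders + ")")) {
        int batched = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                stmt.setString(i, rs.getString(i)); // relies on H2 converting to the column types
            }
            stmt.addBatch();
            if (++batched % 1000 == 0) {
                stmt.executeBatch();
            }
        }
        stmt.executeBatch(); // flush the remainder
    } finally {
        rs.close();
    }
}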
I'll be glad to provide more information if requested.
EDIT
@Randy I always check that the database is clean before running the command, and in my Java program I have a routine to delete all data from a file that fails to be inserted.
select * from myschema.mytable where largenumber = 25485116;
DATETIME LARGENUMBER VALUE FLAG KIND
(no rows, 8 ms)
The only thing that I can think of is that there is a trigger on the table that sets the timestamp to "now". Although that would not explain why you are successful with a few rows, it would explain why the primary key is being violated.
