Failing to load large dataset into h2 database - java

Here is the problem: At my company we have a large database that we want to perform some automated operations in it. To test that we got a small sample of that data about 6 10MB sized csv files. We want to use H2 to test the results of our program in it. H2 Seemed to work fine with our previous cvs though they were, at most, 1000 entries long. When it comes to any of our 10MB files the command
insert into myschema.mytable (select * from csvread('mycsvfile.csv'));
reports a failure because one of the registries is supposedly duplicated and offends our primary key constraints.
Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
insert into myschema.mytable (select * from csvread('src/test/resources/h2/data/mycsvfile.csv')) [23001-148] 23001/23001
Breaking the mycsvfile.csv into smaller pieces I was able to see that the problem starts to appear after about 10000 rows inserted(though the number varies depending on what data I used). I could however insert more than 10000 rows if I broke the file into pieces and then ran the command individually. But even if I manage to insert all that data manually I need an automated method to fill the database.
Since running the command would not give me the row that was causing the problem I guessed that the problem could be some cache in the csvread routine.
Then I created a small java program that could insert the data in the H2 database manually. No matter whether I batched the commands, closed and opened the connection for 1000 rows h2 reported that I was trying to duplicate an entry in the database.
org.h2.jdbc.JdbcSQLException: Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
INSERT INTO myschema.mytable VALUES ( '1997-10-06 01:00:00.0',25485116,1.600,0,18 ) [23001-148]
Doing a normal search for that registry using emacs I can find that the registry is not duplicated as the datetime column is unique in the whole dataset.
I cannot give that data for you to test since the company sells that information. But here is how my table definition is like.
create table myschema.mytable (
datetime timestamp,
largenumber numeric(8,0) references myschema.largenumber(largecode),
value numeric(8,3) not null,
flag numeric(1,0) references myschema.flag(flagcode),
kind smallint references myschema.kind(kindcode),
primary key (datetime, largenumber, kind)
);
This is how our csv looks like:
datetime,largenumber,value,flag,kind
1997-06-11 16:45:00.0,25485116,0.710,0,18
1997-06-11 17:00:00.0,25485116,0.000,0,18
1997-06-11 17:15:00.0,25485116,0.000,0,18
1997-06-11 17:30:00.0,25485116,0.000,0,18
And the java code that would fill our test database(forgive my ugly code, I got desperate :)
private static void insertFile(MyFile file) throws SQLException {
int updateCount = 0;
ResultSet rs = Csv.getInstance().read(file.toString(), null, null);
ResultSetMetaData meta = rs.getMetaData();
Connection conn = DriverManager.getConnection(
"jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
rs.next();
while (rs.next()) {
Statement stmt = conn.createStatement();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < meta.getColumnCount(); i++) {
if (i == 0)
sb.append("'" + rs.getString(i + 1) + "'");
else
sb.append(rs.getString(i + 1));
sb.append(',');
}
updateCount++;
if (sb.length() > 0)
sb.deleteCharAt(sb.length() - 1);
stmt.execute(String.format(
"INSERT INTO myschema.mydatabase VALUES ( %s ) ",
sb.toString()));
if (updateCount == 1000) {
conn.close();
conn = DriverManager.getConnection(
"jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
updateCount = 0;
}
}
if (!conn.isClosed()) {
conn.close();
}
rs.close();
}
I'll be glad to provide more information if requested.
EDIT
#Randy I always check if the database is clean before running the command and in my java program I have a routine to delete all data from a file that fails to be inserted.
select * from myschema.mytable where largenumber = 25485116;
DATETIME LARGENUMBER VALUE FLAG KIND
(no rows, 8 ms)

The only thing that I can think of is that there is a trigger on the table that sets the timestamp to "now". Although that would not explain why you are successful with a few rows, it would explain why the primary key is being violated.

Related

PostgreSQL's XMIN in Oracle & MySQL

I'm trying to get the equivalent for this code on Oracle & MySQL
if(vardbtype.equals("POSTGRESQL")){
Long previousTxId = 0L;
Long nextTxId = 0L;
Class.forName("org.postgresql.Driver");
System.out.println("----------------------------");
try(Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/"+ vardbserver, vardbuser, vardbpassword);
PreparedStatement stmts = c.prepareStatement("SELECT * FROM "+ vardbname +" where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from "+ vardbname)
) {
c.setAutoCommit(false);
while(true) {
stmts.clearParameters();
try(ResultSet rss = max.executeQuery()) {
if(rss.next()) {
nextTxId = rss.getLong(1);
}
}
stmts.setLong(1, previousTxId);
stmts.setLong(2, nextTxId + 1);
try(ResultSet rss = stmts.executeQuery()) {
while(rss.next()) {
String message = rss.getString("MESSAGE");
System.out.println("Message = " + message);
TextMessage mssg = session.createTextMessage(message);
System.out.println("Sent: " + mssg.getText());
producer.send(mssg);
}
previousTxId = nextTxId;
}
Thread.sleep(batchperiod2);
}
}
}
Basically, the code works to get contents inside a database's table and sent it to ActiveMQ. And when the table updated, it will sent the content that just updated (not sending the past that was sent). But this code only works on PostgreSQL
Then i'm planning to create an "if" function. So i can use another database to getting the data (Oracle and MySQL).
I guess i must change this code right?
try(Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/"+ vardbserver, vardbuser, vardbpassword);
PreparedStatement stmts = c.prepareStatement("SELECT * FROM "+ vardbname +" where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from "+ vardbname)
) {
A couple thoughts supplemental to Thorsten's answer.
First, xmin is a system column which is, iirc, stored in the row header on disk. It is updated by writes. I have not yet run into a case where the transaction id's don't increase. However, there has to be some wraparound point. I think you are better off with a trigger which stores the transaction ids in another table for processing for this reason (and using that to process things).
For Oracle and MySQL, underlying storage is sufficiently different that I don't see how you can do this directly.
If you want a common solution you want a queue table where you can use a trigger to insert waiting copies, and then select/delete from that in your worker. This will likely work better on MySQL than on PostgreSQL, and for Oracle you want to look for index-oriented tables. If autovacuum has trouble keeping up, ask more questions or hire a consultant.
After further research
InnoDB provides a DB_TRX_ID column which is similar. Note you cannot assume you have this column if you are running MySQL because MySQL has different table storage engines and not all even support transactions. So that is an important limitation.
I was unable to locate a similar column on Oracle.
This script is looking in intervals at a table and putting out all inserted messages since that last loop.
PostgreSQL stores the transaction number that inserted a record, so this can be used to find the newly inserted records (although I am not sure whether it is guaranteed for a new transaction to have a higher number than all previous ones as the script assumes).
Other DBMS don't have this pseudo column. So you would have to have a timestamp column in your table and use this instead. You'd have to change the two queries as well as the code to match the data type (I suppose java.sql.Timestamp instead of Long, but I am no Java guy).

Do not update row in ResultSet if data has changed

we are extracting data from various database types (Oracle, MySQL, SQL-Server, ...). Once it is successfully written to a file we want to mark it as transmitted, so we update a specific column.
Our problem is, that a user has the possibility to change the data in the meantime but might forget to commit. The record is blocked with a select for update statement. So it can happen, that we mark something as transmitted, which is not.
This is an excerpt from our code:
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
if (!extractedData.rowUpdated()) {
extractedData.updateString("COLUMNNAME", "TRANSMITTED");
// code will stop here if user has changed data but did not commit
extractedData.updateRow();
// once committed the changed data is marked as transmitted
}
}
The method extractedData.rowUpdated() returns false, because technically the user didn't change anything yet.
Is there any way to not update the row and detect if data was changed at this late stage?
Unfortunately I cannot change the program the user is using to change the data.
So you want to
Run through all rows of the table that have not been exported
Export this data somewhere
Mark these rows exported so your next iteration will not export them again
As there might be pending changes on a row, you don't want to mess with that information
How about:
You iterate over all rows.
for every row
generate a hash value for the contents of the row
compare column "UPDATE_STATUS" with calulated hash
if no match
export row
store hash into "UPDATE_STATUS"
if store fails (row locked)
-> no worries, will be exported again next time
if store succeeds (on data already changed by user)
-> no worries, will be exported again as hash will not match
This might further slow your export as you'll have to iterate over everything instead of over everything WHERE UPDATE_STATUS IS NULL but you might be able to do two jobs - one (fast)
iterating over WHERE UPDATE_STATUS IS NULL and one slow and thorough WHERE UPDATE_STATUS IS NOT NULL (with the hash-rechecking in place)
If you want to avoid store-failures/waits, you might want to store the hash /updated information into a second table copying the primary key plus the hash field value - that way user
locks on the main table would not interfere with your updates at all (as those would be on another table)
"a user [...] might forget to commit" > A user either commits or he doesn't. "Forgetting" to commit is tantamount to a bug in his software.
To work around that you need to either:
Start a transaction with isolation level SERIALIZABLE, and within that transaction:
Read the data and export it. Data read this way is blocked from being updated.
Update the data you processed. Note: don't do that with an updateable ResultSet, do that with an UPDATE statement. That way you don't need an CONCUR_UPDATABLE + TYPE_SCROLL_SENSITIVE which is much slower than a CONCUR_READ_ONLY + TYPE_FORWARD_ONLY.
Commit the transaction.
That way the buggy software will be blocked from updating data you are processing.
Another way
Start a TRANSACTION at a lower isolation level (default READ COMMITTED) and within that transaction
Select the data with proper Table Hints Eg for SQL Server these: TABLOCKX + HOLDLOCK (large datasets), or ROWLOCK + XLOCK + HOLDLOCK (small datasets), or PAGLOCK + XLOCK + HOLDLOCK. Having HOLDLOCK as a table hint is practically equivalent to having a SERIALIZABLE transaction. Note that lock escalation may escalate the latter two to table locks if the number of locks becomes too high.
Update the data you processed; Note: use an UPDATE statement. Lose the updatable/scroll_sensitive resultset.
Commit the TRANSACTION.
Same deal, the buggy software will be blocked from updating data you are processing.
In the end we had to implement optimistic locking. In some tables we already have a column that stores the version number. Some other tables have a timestamp column that holds the time of the last change (changed by trigger).
While a timestamp might not always be a reliable source for optimistic locking we went with it anyway. Several changes during a single second are not very realistic in our environment.
Since we have to know the primary key without describing it before hand, we had to access the resultset metadata. Some of our databases do not support this (DB/2 legacy tables for example). We are still using the old system for these.
Note: The tableMetaData is an XML-config file where our description of the table is stored. This is not directly related to the metadata of the table in the database.
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
if (tableMetaData.getVersion() != null) {
markDataAsExported(extractedData, tableMetaData);
} else {
markResultSetAsExported(extractedData, tableMetaData);
}
}
// new way with building of an update statement including the version column in the where clause
private void markDataAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
ResultSet resultSetPrimaryKeys = null;
PreparedStatement versionedUpdateStatement = null;
try {
ResultSetMetaData extractedMetaData = extractedData.getMetaData();
resultSetPrimaryKeys = conn.getMetaData().getPrimaryKeys(null, null, tableMetaData.getTable());
ArrayList<String> primaryKeyList = new ArrayList<String>();
String sqlStatement = "update " + tableMetaData.getTable() + " set " + tableMetaData.getUpdateColumn()
+ " = ? where ";
if (resultSetPrimaryKeys.isBeforeFirst()) {
while (resultSetPrimaryKeys.next()) {
primaryKeyList.add(resultSetPrimaryKeys.getString(4));
sqlStatement += resultSetPrimaryKeys.getString(4) + " = ? and ";
}
sqlStatement += tableMetaData.getVersionColumn() + " = ?";
versionedUpdateStatement = conn.prepareStatement(sqlStatement);
while (extractedData.next()) {
versionedUpdateStatement.setString(1, tableMetaData.getUpdateValue());
for (int i = 0; i < primaryKeyList.size(); i++) {
versionedUpdateStatement.setObject(i + 2, extractedData.getObject(primaryKeyList.get(i)),
extractedMetaData.getColumnType(extractedData.findColumn(primaryKeyList.get(i))));
}
versionedUpdateStatement.setObject(primaryKeyList.size() + 2,
extractedData.getObject(tableMetaData.getVersionColumn()), tableMetaData.getVersionType());
if (versionedUpdateStatement.executeUpdate() == 0) {
logger.warn(Message.COLLECTOR_DATA_CHANGED, tableMetaData.getTable());
}
}
} else {
logger.warn(Message.COLLECTOR_PK_ERROR, tableMetaData.getTable());
markResultSetAsExported(extractedData, tableMetaData);
}
} finally {
if (resultSetPrimaryKeys != null) {
resultSetPrimaryKeys.close();
}
if (versionedUpdateStatement != null) {
versionedUpdateStatement.close();
}
}
}
//the old way as fallback
private void markResultSetAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
while (extractedData.next()) {
extractedData.updateString(tableMetaData.getUpdateColumn(), tableMetaData.getUpdateValue());
extractedData.updateRow();
}
}

Using a database API cursor with JDBC and SQLServer to select batch results

SOLVED (See answer below.)
I did not understand my problem within the proper context. The real issue was that my query was returning multiple ResultSet objects, and I had never come across that before. I have posted code below that solves the problem.
PROBLEM
I have an SQL Server database table with many thousand rows. My goal is to pull the data back from the source database and write it to a second database. Because of application memory constraints, I will not be able to pull the data back all at once. Also, because of this particular table's schema (over which I have no control) there is no good way for me to tick off the rows using some sort of ID column.
A gentleman over at the Database Administrators StackExchange helped me out by putting together something called a database API cursor, and basically wrote this complicated query that I only need to drop my statement into. When I run the query in SQL Management Studio (SSMS) it works great. I get all the data back, a thousand rows at a time.
Unfortunately, when I try to translate this into JDBC code, I get back the first thousand rows only.
QUESTION
Is it possible using JDBC to retrieve a database API cursor, pull the first set of rows from it, allow the cursor to advance, and then pull the subsequent sets one at a time? (In this case, a thousand rows at a time.)
SQL CODE
This gets complicated, so I'm going to break it up.
The actual query can be simple or complicated. It doesn't matter. I've tried several different queries during my experimentation and they all work. You just basically drop it into the the SQL code in the appropriate place. So, let's take this simple statement as our query:
SELECT MyColumn FROM MyTable;
The actual SQL database API cursor is far more complicated. I will print it out below. You can see the above query buried in it:
-- http://dba.stackexchange.com/a/82806
DECLARE #cur INTEGER
,
-- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE
#scrollopt INTEGER = 16 | 8192 | 16384
,
-- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE
#ccopt INTEGER = 1 | 32768 | 65536
,#rowcount INTEGER = 1000
,#rc INTEGER;
-- Open the cursor and return the first 1,000 rows
EXECUTE #rc = sys.sp_cursoropen #cur OUTPUT
,'SELECT MyColumn FROM MyTable'
,#scrollopt OUTPUT
,#ccopt OUTPUT
,#rowcount OUTPUT;
IF #rc <> 16 -- FastForward cursor automatically closed
BEGIN
-- Name the cursor so we can use CURSOR_STATUS
EXECUTE sys.sp_cursoroption #cur
,2
,'MyCursorName';
-- Until the cursor auto-closes
WHILE CURSOR_STATUS('global', 'MyCursorName') = 1
BEGIN
EXECUTE sys.sp_cursorfetch #cur
,2
,0
,1000;
END;
END;
As I've said, the above creates a cursor in the database and asks the database to execute the statement, keep track (internally) of the data it's returning, and return the data a thousand rows at a time. It works great.
JDBC CODE
Here's where I'm having the problem. I have no compilation problems or run-time problems with my Java code. The problem I am having is that it returns only the first thousand rows. I don't understand how to utilize the database cursor properly. I have tried variations on the Java basics:
// Hoping to get all of the data, but I only get the first thousand.
ResultSet rs = stmt.executeQuery(fq.getQuery());
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
I'm not surprised by the results, but all of the variations I've tried produce the same results.
From my research it seems like the JDBC does something with database cursors when the database is Oracle, but you have to set the data type returned in the result set as an Oracle cursor object. I'm guessing there is something similar with SQL Server, but I have been unable to find anything yet.
Does anyone know of a way?
I'm including example Java code in full (as ugly as that gets).
// FancyQuery.java
import java.sql.*;
public class FancyQuery {
// Adapted from http://dba.stackexchange.com/a/82806
String query = "DECLARE #cur INTEGER\n"
+ " ,\n"
+ " -- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE\n"
+ " #scrollopt INTEGER = 16 | 8192 | 16384\n"
+ " ,\n"
+ " -- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE\n"
+ " #ccopt INTEGER = 1 | 32768 | 65536\n"
+ " ,#rowcount INTEGER = 1000\n"
+ " ,#rc INTEGER;\n"
+ "\n"
+ "-- Open the cursor and return the first 1,000 rows\n"
+ "EXECUTE #rc = sys.sp_cursoropen #cur OUTPUT\n"
+ " ,'SELECT MyColumn FROM MyTable;'\n"
+ " ,#scrollopt OUTPUT\n"
+ " ,#ccopt OUTPUT\n"
+ " ,#rowcount OUTPUT;\n"
+ " \n"
+ "IF #rc <> 16 -- FastForward cursor automatically closed\n"
+ "BEGIN\n"
+ " -- Name the cursor so we can use CURSOR_STATUS\n"
+ " EXECUTE sys.sp_cursoroption #cur\n"
+ " ,2\n"
+ " ,'MyCursorName';\n"
+ "\n"
+ " -- Until the cursor auto-closes\n"
+ " WHILE CURSOR_STATUS('global', 'MyCursorName') = 1\n"
+ " BEGIN\n"
+ " EXECUTE sys.sp_cursorfetch #cur\n"
+ " ,2\n"
+ " ,0\n"
+ " ,1000;\n"
+ " END;\n"
+ "END;\n";
public String getQuery() {
return this.query;
}
public static void main(String[ ] args) throws Exception {
String dbUrl = "jdbc:sqlserver://tc-sqlserver:1433;database=MyBigDatabase";
String user = "mario";
String password = "p#ssw0rd";
String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
FancyQuery fq = new FancyQuery();
Class.forName(driver);
Connection conn = DriverManager.getConnection(dbUrl, user, password);
Statement stmt = conn.createStatement();
// We expect to get 1,000 rows at a time.
ResultSet rs = stmt.executeQuery(fq.getQuery());
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
// Alas, we've only gotten 1,000 rows, total.
rs.close();
stmt.close();
conn.close();
}
}
I figured it out.
stmt.execute(fq.getQuery());
ResultSet rs = null;
for (;;) {
rs = stmt.getResultSet();
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
if ((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1)) {
break;
}
}
if (rs != null) {
rs.close();
}
After some additional googling, I found a bit of code posted back in 2004:
http://www.coderanch.com/t/300865/JDBC/databases/SQL-Server-JDBC-Registering-cursor
The gentleman who posted the snippet that I found helpful (Julian Kennedy) suggested: "Read the Javadoc for getUpdateCount() and getMoreResults() for a clear understanding." I was able to piece it together from that.
Basically, I don't think I understood my problem well enough at the outset in order to phrase it correctly. What it comes down to is that my query will be returning the data in multiple ResultSet instances. What I needed was a way to not merely iterate through each row in a ResultSet but, rather, iterate through the entire set of ResultSets. That's what the code above does.
If you want all records from the table, just do "Select * from table".
The only reason to retrieve in chunks is if there is some intermediate place for the data: e.g. if you are showing it on the screen, or storing it in memory.
If you are simply reading from one and inserting to another, just read everything from the first.You will not get any better performance by trying to retrieve in batches. If there is a difference, it will be negative. Frame your query in a way that brings back everything. The JDBC software will handle all the other breaking-up and reconstituting that you need.
However, you should batch the update/insert side of things.
The set-up would create two statements on the two connections:
Statement stmt = null;
ResultSet rs = null;
PreparedStatement insStmt = null;
stmt = conDb1.createStatement();
insStmt = conDb2.prepareStament("insert into tgt_db2_table (?,?,?,?,?......etc. ?,?) ");
rs = stmt.executeQuery("select * from src_db1_table");
Then, loop over the select as normal, but use batching on the target.
int batchedRecordCount = 0;
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
//Here you read values from the cursor and set them to the insStmt ...
String field1 = rs.getString(1);
String field2 = rs.getString(2);
int field3 = rs.getInt(3);
//--- etc.
insStmt.setString(1, field1);
insStmt.setString(2, field2);
insStmt.setInt(3, field3);
//----- etc. for all the fields
batchedRecordCount++;
insStmt.addBatch();
if (batchRecordCount > 1000) {
insStmt.executeBatch();
}
}
if (batchRecordCount > 0) {
//Finish of the final (partial) set of records
insStmt.executeBatch();
}
//Close resources...

Getting column metadata from jdbc/postgresql for newly created table

I'm trying to get the column list from newly created table(it is created in the java code).
The thing is that I do not get the columns.
The code works for tables that are already in the database, but if i create a new one and try to get the column info immediately it does not find any...
Update:
Here is full code that I used for testing:
#Test
public void testtest() throws Exception {
try (Connection conn = dataSource.getConnection()) {
String tableName = "Table_" + UUID.randomUUID().toString().replace("-", "");
try (Statement statement = conn.createStatement()) {
statement.executeUpdate(String.format("create table %s (id int primary key,name varchar(30));", tableName));
}
DatabaseMetaData metaData = conn.getMetaData();
try (ResultSet rs = metaData.getColumns(null, null, tableName, null)) {
int colsFound = 0;
while (rs.next()) {
colsFound++;
}
System.out.println(String.format("Found %s cols.", colsFound));
}
System.out.println(String.format("Autocommit is set to %s.", conn.getAutoCommit()));
}
}
The and the output:
Found 0 cols.
Autocommit is set to true.
The problem is with the case of your tablename:
String tableName = "Table_"
As that is an unquoted identifier (a good thing) the name is converted to lowercase when Postgres stores its name in the system catalog.
The DatabaseMetaData API calls are case sensitive ( "Table_" != "table_"), so you need to pass the lowercase tablename:
ResultSet rs = metaData.getColumns(null, null, tableName.toLowerCase(), null))
More details on how identifiers are using are in the manual: http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
I have made simple test and it seems to work. I can create new table and show its columns using PostgreSQL JDBC (I use Jython):
conn = db.createStatement()
conn.execute("CREATE TABLE new_table (id SERIAL, txt VARCHAR(200))")
db_meta_data = db.getMetaData()
for tbl_name in ('date_test', 'new_table'):
print('\n-- %s --' % (tbl_name))
rs = db_meta_data.getColumns(None, None, tbl_name, None)
while (rs.next()):
print('%s:%s' % (rs.getString(3), rs.getString(4)))
conn.close()
This code shows columns for both already existing table: date_test and for just created new_table. I also added some code to close connection after CREATE TABLE but my results are always the same and correct.
Maybe it is problem with your JDBC driver. I use driver from postgresql-9.3-1100.jdbc41.jar.
It may be also problem with user permissions. Do you use the same user for both creating table and getting metadata? Is new table visible in psql, pgAdmin or other tool?
Other reason is that PostgreSQL uses transactions also for schema changes. So if you disabled default autocommit and closed connection your schema changes will be lost. Do you use db.setAutoCommit(false)?
You can also query PostgreSQL schema directly:
SELECT DISTINCT table_name, column_name
FROM information_schema.columns
WHERE table_schema='public'
AND table_name = 'new_table'
ORDER BY 1, 2
Strangely giving passing table name in lower case to getColumns method does work...thanks for the query MichaƂ Niklas it got me on the right track.

Program hangs after retrieving 100 rows containg CLOB

I am retrieving one text column (CLOB) from a table in a "remote" H2 database (actually on a local drive, but using tcp to access it) and after retrieving the first 100 rows the program hangs on retrieving the next row of the result set. If, on the other hand, I access the same database as an embedded database, there is no problem. If I try to display the table's rows using H2's console application accessing the database using the Server (i.e. tcp) method, then I get the following error message:
IO Exception: "java.io.IOException: org.h2.message.DbException: The object is already closed [90007-164]";
"lob: null table: 14 id: 1" [90031-164] 90031/90031
Here is the program. If I uncomment out the call that sets the system property, the program works. I have also tried retrieving the column using a character stream or simply a call to getString, controlled by constant USE_STREAM. There is no difference in the results:
import java.sql.*;
import java.util.*;
import java.io.*;
public class Jdbc4
{
private static final boolean USE_STREAM = false;
public static void main(String[] args) throws Exception
{
//System.setProperty("h2.serverResultSetFetchSize", "50");
Connection conn = null;
try {
Class.forName("org.h2.Driver").newInstance();
conn = DriverManager.getConnection("jdbc:h2:tcp://localhost/file:C:/h2/db/test/test;IFEXISTS=TRUE", "sa", "");
Statement stmt = conn.createStatement();
String sql = "select select_variables from ipm_queues";
ResultSet rs = stmt.executeQuery(sql);
int count = 0;
while (rs.next()) {
++count;
String s;
if (USE_STREAM) {
Clob clob = rs.getClob(1);
Reader rdr = clob.getCharacterStream();
char[] cbuf = new char[1024];
StringBuffer sb = new StringBuffer();
int len;
while ((len = rdr.read(cbuf, 0, cbuf.length)) != -1)
sb.append(cbuf, 0, len);
rdr.close();
s = sb.toString();
clob.free();
}
else
s = rs.getString(1);
System.out.println(count + ": " + s);
}
}
finally {
if (conn != null)
conn.close();
}
}
}
Here is the DDL for creating the table (you can see it was originally a MySql table):
CREATE TABLE `ipm_queues` (
`oid` bigint NOT NULL,
`queue_id` varchar(256) NOT NULL,
`store_id` bigint NOT NULL,
`creation_time` datetime NOT NULL,
`status` bigint NOT NULL,
`deleted` bigint NOT NULL,
`last_mod_time` datetime NOT NULL,
`queue_name` varchar(128),
`select_variables` text,
`where_clause` text,
`from_table` varchar(128),
`order_by` varchar(256),
`from_associate_table` varchar(256),
`from_view` varchar(128)
);
ALTER TABLE ipm_queues
ADD CONSTRAINT ipm_queues_pkey PRIMARY KEY (oid);
CREATE UNIQUE INDEX ipm_queues_key_idx ON ipm_queues(queue_id, store_id);
CREATE INDEX ipm_queues_str_idx ON ipm_queues(store_id);
I believe I understand the the cause of the hang. I investigated the simplest case of using a h2.serverResultSetFetchSize value of 600, which is greater than the 523 rows I know that I have. As I mentioned, I can retrieve the first 3 rows (single CLOB column) okay and then I either hang on the retrieval of the 4th row or I get a "The object is already closed" exception.
It turns out that the actual string comprising the first three columns seem to be rather short in length and method getInputStream in class org.h2.value.ValueLobDb has the data already and simply returns a ByteArrayInputStream constructed on this data. The 4th row's data is still on the server side and so an actual RemoteInputStream has to be built to process fetch the data from the server-side LOB.
Here's what seems to be the problem: Class org.h2.server.TcpServerThread is caching these LOBs in in instance of a SmallLRUCache. This cache seems to be designed to maintain only the least recently referenced LOBs!!! The default size of this cache is given by system property h2.serverCachedObjects, which defaults to 64, whereas the default fetch size is 100. So even if I had not overridden the default h2.serverResultSetFetchSize property, if all of my rows had sufficiently large columns requiring cached LOBs, any fetch size > 64 would cause the LOB representing the first row to be flushed out of the cache and I would not even be able to retrieve the first row.
An LRU cache seems to be the wrong structure for holding LOBs that are in an active result set. Certainly having a default cache size that is less than the default fetch size seems less than ideal.
you should probably give more details, but did you check your network connection? Maybe your database server is blocking connections (or network connections) as soon as they try to fetch too much data. This could be a "sort of" protection.

Categories