PostgreSQL's XMIN in Oracle & MySQL - java

I'm trying to get the equivalent for this code on Oracle & MySQL
if(vardbtype.equals("POSTGRESQL")){
Long previousTxId = 0L;
Long nextTxId = 0L;
Class.forName("org.postgresql.Driver");
System.out.println("----------------------------");
try(Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/"+ vardbserver, vardbuser, vardbpassword);
PreparedStatement stmts = c.prepareStatement("SELECT * FROM "+ vardbname +" where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from "+ vardbname)
) {
c.setAutoCommit(false);
while(true) {
stmts.clearParameters();
try(ResultSet rss = max.executeQuery()) {
if(rss.next()) {
nextTxId = rss.getLong(1);
}
}
stmts.setLong(1, previousTxId);
stmts.setLong(2, nextTxId + 1);
try(ResultSet rss = stmts.executeQuery()) {
while(rss.next()) {
String message = rss.getString("MESSAGE");
System.out.println("Message = " + message);
TextMessage mssg = session.createTextMessage(message);
System.out.println("Sent: " + mssg.getText());
producer.send(mssg);
}
previousTxId = nextTxId;
}
Thread.sleep(batchperiod2);
}
}
}
Basically, the code reads the contents of a database table and sends them to ActiveMQ. When the table is updated, it sends only the content that was just updated (it does not resend what was already sent). But this code only works on PostgreSQL.
So I'm planning to add an "if" branch so that I can also get the data from another database (Oracle or MySQL).
I guess I must change this part of the code, right?
try(Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/"+ vardbserver, vardbuser, vardbpassword);
PreparedStatement stmts = c.prepareStatement("SELECT * FROM "+ vardbname +" where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from "+ vardbname)
) {

A couple thoughts supplemental to Thorsten's answer.
First, xmin is a system column which is, iirc, stored in the row header on disk. It is updated by writes. I have not yet run into a case where the transaction id's don't increase. However, there has to be some wraparound point. I think you are better off with a trigger which stores the transaction ids in another table for processing for this reason (and using that to process things).
For Oracle and MySQL, underlying storage is sufficiently different that I don't see how you can do this directly.
If you want a common solution, you want a queue table into which a trigger inserts waiting copies, and then your worker selects from and deletes from that table. This will likely work better on MySQL than on PostgreSQL, and for Oracle you want to look at index-organized tables. If autovacuum has trouble keeping up, ask more questions or hire a consultant.
After further research
InnoDB provides a DB_TRX_ID column which is similar. Note you cannot assume you have this column if you are running MySQL because MySQL has different table storage engines and not all even support transactions. So that is an important limitation.
I was unable to locate a similar column on Oracle.

This script polls a table at intervals and outputs all messages inserted since the last loop.
PostgreSQL stores the transaction number that inserted a record, so this can be used to find the newly inserted records (although I am not sure whether it is guaranteed for a new transaction to have a higher number than all previous ones as the script assumes).
Other DBMS don't have this pseudo column. So you would have to have a timestamp column in your table and use this instead. You'd have to change the two queries as well as the code to match the data type (I suppose java.sql.Timestamp instead of Long, but I am no Java guy).
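A minimal sketch of the timestamp-based variant described above, assuming the table has a last-modified column (hypothetically called MODIFIED_AT here) that the application or a trigger maintains. The class, method and column names are illustrative and not part of the original code. The in-memory poll method stands in for running the two queries, so that the windowing logic (the counterpart of previousTxId / nextTxId) can be read on its own.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

class TimestampPoller {
    /** Hypothetical row shape: the MESSAGE column plus a last-modified timestamp. */
    static final class Row {
        final String message;
        final Timestamp modifiedAt;

        Row(String message, Timestamp modifiedAt) {
            this.message = message;
            this.modifiedAt = modifiedAt;
        }
    }

    // Upper bound of the previous window; plays the role of previousTxId.
    private Timestamp lastSeen = new Timestamp(0L);

    /** Portable counterpart of the xmin range query; tsColumn is an assumed column name. */
    static String selectSql(String table, String tsColumn) {
        return "SELECT * FROM " + table
                + " WHERE " + tsColumn + " > ? AND " + tsColumn + " <= ?";
    }

    /** Portable counterpart of the max(xmin) query. */
    static String maxSql(String table, String tsColumn) {
        return "SELECT MAX(" + tsColumn + ") FROM " + table;
    }

    /**
     * Returns the rows modified in the window (lastSeen, upper] and advances lastSeen.
     * In real code 'upper' would come from maxSql, and the filtering would be done by
     * selectSql with both bounds set via PreparedStatement.setTimestamp.
     */
    List<Row> poll(List<Row> table, Timestamp upper) {
        List<Row> fresh = new ArrayList<>();
        if (upper == null) {
            return fresh; // MAX() over an empty table yields NULL
        }
        for (Row r : table) {
            if (r.modifiedAt.after(lastSeen) && !r.modifiedAt.after(upper)) {
                fresh.add(r);
            }
        }
        lastSeen = upper;
        return fresh;
    }
}
```

One caveat with this approach: a row that is committed after a poll but carries a timestamp at or below that poll's upper bound would be skipped, so the timestamp column needs enough resolution for the write rate.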

Related

Do not update row in ResultSet if data has changed

We are extracting data from various database types (Oracle, MySQL, SQL Server, ...). Once it has been successfully written to a file, we want to mark it as transmitted, so we update a specific column.
Our problem is that a user can change the data in the meantime but might forget to commit. The record is locked with a select for update statement, so it can happen that we mark something as transmitted when it is not.
This is an excerpt from our code:
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
if (!extractedData.rowUpdated()) {
extractedData.updateString("COLUMNNAME", "TRANSMITTED");
// code will stop here if user has changed data but did not commit
extractedData.updateRow();
// once committed the changed data is marked as transmitted
}
}
The method extractedData.rowUpdated() returns false, because technically the user didn't change anything yet.
Is there any way to not update the row and detect if data was changed at this late stage?
Unfortunately I cannot change the program the user is using to change the data.
So you want to
Run through all rows of the table that have not been exported
Export this data somewhere
Mark these rows exported so your next iteration will not export them again
As there might be pending changes on a row, you don't want to mess with that information
How about:
You iterate over all rows.
for every row
    generate a hash value for the contents of the row
    compare column "UPDATE_STATUS" with the calculated hash
    if no match
        export row
        store hash into "UPDATE_STATUS"
        if store fails (row locked)
            -> no worries, will be exported again next time
        if store succeeds (on data already changed by user)
            -> no worries, will be exported again as hash will not match
This might further slow your export, as you'll have to iterate over everything instead of over everything WHERE UPDATE_STATUS IS NULL, but you might be able to run two jobs: one fast job iterating over WHERE UPDATE_STATUS IS NULL, and one slow and thorough job over WHERE UPDATE_STATUS IS NOT NULL (with the hash rechecking in place).
If you want to avoid store failures/waits, you might want to store the hash/updated information in a second table, copying the primary key plus the hash field value. That way user locks on the main table would not interfere with your updates at all (as those would be on another table).
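The hash comparison in the steps above could look roughly like this in Java. This is only a sketch: the class name RowHash, the choice of SHA-256, and the way NULLs and column boundaries are encoded are my assumptions, not something the answer prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class RowHash {
    /** Hashes the column values of one row into a hex string that fits a status column. */
    static String of(String... columnValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String v : columnValues) {
                // Tag NULLs differently from values, and end each column with a
                // separator byte so that ("ab","c") and ("a","bc") hash differently.
                md.update((v == null ? "N" : "V" + v).getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0x1F);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** A row needs exporting when the stored status does not match its current hash. */
    static boolean needsExport(String storedStatus, String... columnValues) {
        return !of(columnValues).equals(storedStatus);
    }
}
```

In the export loop you would call needsExport(rs.getString("UPDATE_STATUS"), ...) with the data columns of the row, export on true, and then try to store RowHash.of(...) back into "UPDATE_STATUS".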
"a user [...] might forget to commit" > A user either commits or he doesn't. "Forgetting" to commit is tantamount to a bug in his software.
To work around that you need to either:
Start a transaction with isolation level SERIALIZABLE, and within that transaction:
Read the data and export it. Data read this way is blocked from being updated.
Update the data you processed. Note: don't do that with an updatable ResultSet; do it with an UPDATE statement. That way you don't need a CONCUR_UPDATABLE + TYPE_SCROLL_SENSITIVE result set, which is much slower than a CONCUR_READ_ONLY + TYPE_FORWARD_ONLY one.
Commit the transaction.
That way the buggy software will be blocked from updating data you are processing.
Another way
Start a TRANSACTION at a lower isolation level (default READ COMMITTED) and within that transaction
Select the data with the proper table hints, e.g. for SQL Server: TABLOCKX + HOLDLOCK (large datasets), or ROWLOCK + XLOCK + HOLDLOCK (small datasets), or PAGLOCK + XLOCK + HOLDLOCK. Having HOLDLOCK as a table hint is practically equivalent to having a SERIALIZABLE transaction. Note that lock escalation may escalate the latter two to table locks if the number of locks becomes too high.
Update the data you processed; Note: use an UPDATE statement. Lose the updatable/scroll_sensitive resultset.
Commit the TRANSACTION.
Same deal, the buggy software will be blocked from updating data you are processing.
In the end we had to implement optimistic locking. In some tables we already have a column that stores the version number. Some other tables have a timestamp column that holds the time of the last change (changed by trigger).
While a timestamp might not always be a reliable source for optimistic locking we went with it anyway. Several changes during a single second are not very realistic in our environment.
Since we have to know the primary key without describing it beforehand, we had to access the result set metadata. Some of our databases do not support this (DB2 legacy tables, for example). We are still using the old system for these.
Note: The tableMetaData is an XML-config file where our description of the table is stored. This is not directly related to the metadata of the table in the database.
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
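// Both helpers iterate over the ResultSet themselves, so each is called once here.
// (An outer next() loop would consume the first row before the helper sees it.)
if (tableMetaData.getVersion() != null) {
    markDataAsExported(extractedData, tableMetaData);
} else {
    markResultSetAsExported(extractedData, tableMetaData);
}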
// new way with building of an update statement including the version column in the where clause
private void markDataAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
ResultSet resultSetPrimaryKeys = null;
PreparedStatement versionedUpdateStatement = null;
try {
ResultSetMetaData extractedMetaData = extractedData.getMetaData();
resultSetPrimaryKeys = conn.getMetaData().getPrimaryKeys(null, null, tableMetaData.getTable());
ArrayList<String> primaryKeyList = new ArrayList<String>();
String sqlStatement = "update " + tableMetaData.getTable() + " set " + tableMetaData.getUpdateColumn()
+ " = ? where ";
if (resultSetPrimaryKeys.isBeforeFirst()) {
while (resultSetPrimaryKeys.next()) {
primaryKeyList.add(resultSetPrimaryKeys.getString(4));
sqlStatement += resultSetPrimaryKeys.getString(4) + " = ? and ";
}
sqlStatement += tableMetaData.getVersionColumn() + " = ?";
versionedUpdateStatement = conn.prepareStatement(sqlStatement);
while (extractedData.next()) {
versionedUpdateStatement.setString(1, tableMetaData.getUpdateValue());
for (int i = 0; i < primaryKeyList.size(); i++) {
versionedUpdateStatement.setObject(i + 2, extractedData.getObject(primaryKeyList.get(i)),
extractedMetaData.getColumnType(extractedData.findColumn(primaryKeyList.get(i))));
}
versionedUpdateStatement.setObject(primaryKeyList.size() + 2,
extractedData.getObject(tableMetaData.getVersionColumn()), tableMetaData.getVersionType());
if (versionedUpdateStatement.executeUpdate() == 0) {
logger.warn(Message.COLLECTOR_DATA_CHANGED, tableMetaData.getTable());
}
}
} else {
logger.warn(Message.COLLECTOR_PK_ERROR, tableMetaData.getTable());
markResultSetAsExported(extractedData, tableMetaData);
}
} finally {
if (resultSetPrimaryKeys != null) {
resultSetPrimaryKeys.close();
}
if (versionedUpdateStatement != null) {
versionedUpdateStatement.close();
}
}
}
//the old way as fallback
private void markResultSetAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
while (extractedData.next()) {
extractedData.updateString(tableMetaData.getUpdateColumn(), tableMetaData.getUpdateValue());
extractedData.updateRow();
}
}
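The heart of the versioned update above is the statement text and the check on the update count. Pulled out on its own, it might look like the following sketch. The class and method names are made up for illustration, and the table and column names in the usage note are placeholders.

```java
import java.util.List;

class VersionedUpdateSql {
    /**
     * Builds "update T set C = ? where PK1 = ? and ... and VERSION = ?".
     * Parameter order: the new value, the primary key values in list order,
     * then the version value that was read together with the row.
     */
    static String build(String table, String updateColumn,
                        List<String> pkColumns, String versionColumn) {
        StringBuilder sql = new StringBuilder("update ").append(table)
                .append(" set ").append(updateColumn).append(" = ? where ");
        for (String pk : pkColumns) {
            sql.append(pk).append(" = ? and ");
        }
        return sql.append(versionColumn).append(" = ?").toString();
    }

    /** executeUpdate() returning 0 means the row's version changed after it was read. */
    static boolean wasStale(int updatedRows) {
        return updatedRows == 0;
    }
}
```

For example, build("ORDERS", "STATUS", Arrays.asList("ID", "LINE"), "VERSION") gives "update ORDERS set STATUS = ? where ID = ? and LINE = ? and VERSION = ?". A row changed by someone else in the meantime simply matches nothing, which is what the logger.warn branch reports.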

Using a database API cursor with JDBC and SQLServer to select batch results

SOLVED (See answer below.)
I did not understand my problem within the proper context. The real issue was that my query was returning multiple ResultSet objects, and I had never come across that before. I have posted code below that solves the problem.
PROBLEM
I have an SQL Server database table with many thousand rows. My goal is to pull the data back from the source database and write it to a second database. Because of application memory constraints, I will not be able to pull the data back all at once. Also, because of this particular table's schema (over which I have no control) there is no good way for me to tick off the rows using some sort of ID column.
A gentleman over at the Database Administrators StackExchange helped me out by putting together something called a database API cursor, and basically wrote this complicated query that I only need to drop my statement into. When I run the query in SQL Management Studio (SSMS) it works great. I get all the data back, a thousand rows at a time.
Unfortunately, when I try to translate this into JDBC code, I get back the first thousand rows only.
QUESTION
Is it possible using JDBC to retrieve a database API cursor, pull the first set of rows from it, allow the cursor to advance, and then pull the subsequent sets one at a time? (In this case, a thousand rows at a time.)
SQL CODE
This gets complicated, so I'm going to break it up.
The actual query can be simple or complicated. It doesn't matter. I've tried several different queries during my experimentation and they all work. You basically just drop it into the SQL code in the appropriate place. So, let's take this simple statement as our query:
SELECT MyColumn FROM MyTable;
The actual SQL database API cursor is far more complicated. I will print it out below. You can see the above query buried in it:
-- http://dba.stackexchange.com/a/82806
DECLARE @cur INTEGER
    ,
    -- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE
    @scrollopt INTEGER = 16 | 8192 | 16384
    ,
    -- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE
    @ccopt INTEGER = 1 | 32768 | 65536
    ,@rowcount INTEGER = 1000
    ,@rc INTEGER;
-- Open the cursor and return the first 1,000 rows
EXECUTE @rc = sys.sp_cursoropen @cur OUTPUT
    ,'SELECT MyColumn FROM MyTable'
    ,@scrollopt OUTPUT
    ,@ccopt OUTPUT
    ,@rowcount OUTPUT;
IF @rc <> 16 -- FastForward cursor automatically closed
BEGIN
    -- Name the cursor so we can use CURSOR_STATUS
    EXECUTE sys.sp_cursoroption @cur
        ,2
        ,'MyCursorName';
    -- Until the cursor auto-closes
    WHILE CURSOR_STATUS('global', 'MyCursorName') = 1
    BEGIN
        EXECUTE sys.sp_cursorfetch @cur
            ,2
            ,0
            ,1000;
    END;
END;
As I've said, the above creates a cursor in the database and asks the database to execute the statement, keep track (internally) of the data it's returning, and return the data a thousand rows at a time. It works great.
JDBC CODE
Here's where I'm having the problem. I have no compilation problems or run-time problems with my Java code. The problem I am having is that it returns only the first thousand rows. I don't understand how to utilize the database cursor properly. I have tried variations on the Java basics:
// Hoping to get all of the data, but I only get the first thousand.
ResultSet rs = stmt.executeQuery(fq.getQuery());
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
I'm not surprised by the results, but all of the variations I've tried produce the same results.
From my research it seems like the JDBC does something with database cursors when the database is Oracle, but you have to set the data type returned in the result set as an Oracle cursor object. I'm guessing there is something similar with SQL Server, but I have been unable to find anything yet.
Does anyone know of a way?
I'm including example Java code in full (as ugly as that gets).
// FancyQuery.java
import java.sql.*;
public class FancyQuery {
// Adapted from http://dba.stackexchange.com/a/82806
String query = "DECLARE @cur INTEGER\n"
+ " ,\n"
+ " -- FAST_FORWARD | AUTO_FETCH | AUTO_CLOSE\n"
+ " @scrollopt INTEGER = 16 | 8192 | 16384\n"
+ " ,\n"
+ " -- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE\n"
+ " @ccopt INTEGER = 1 | 32768 | 65536\n"
+ " ,@rowcount INTEGER = 1000\n"
+ " ,@rc INTEGER;\n"
+ "\n"
+ "-- Open the cursor and return the first 1,000 rows\n"
+ "EXECUTE @rc = sys.sp_cursoropen @cur OUTPUT\n"
+ " ,'SELECT MyColumn FROM MyTable;'\n"
+ " ,@scrollopt OUTPUT\n"
+ " ,@ccopt OUTPUT\n"
+ " ,@rowcount OUTPUT;\n"
+ " \n"
+ "IF @rc <> 16 -- FastForward cursor automatically closed\n"
+ "BEGIN\n"
+ " -- Name the cursor so we can use CURSOR_STATUS\n"
+ " EXECUTE sys.sp_cursoroption @cur\n"
+ " ,2\n"
+ " ,'MyCursorName';\n"
+ "\n"
+ " -- Until the cursor auto-closes\n"
+ " WHILE CURSOR_STATUS('global', 'MyCursorName') = 1\n"
+ " BEGIN\n"
+ " EXECUTE sys.sp_cursorfetch @cur\n"
+ " ,2\n"
+ " ,0\n"
+ " ,1000;\n"
+ " END;\n"
+ "END;\n";
public String getQuery() {
return this.query;
}
public static void main(String[ ] args) throws Exception {
String dbUrl = "jdbc:sqlserver://tc-sqlserver:1433;database=MyBigDatabase";
String user = "mario";
String password = "p#ssw0rd";
String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
FancyQuery fq = new FancyQuery();
Class.forName(driver);
Connection conn = DriverManager.getConnection(dbUrl, user, password);
Statement stmt = conn.createStatement();
// We expect to get 1,000 rows at a time.
ResultSet rs = stmt.executeQuery(fq.getQuery());
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
// Alas, we've only gotten 1,000 rows, total.
rs.close();
stmt.close();
conn.close();
}
}
I figured it out.
stmt.execute(fq.getQuery());
ResultSet rs = null;
for (;;) {
rs = stmt.getResultSet();
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
}
if ((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1)) {
break;
}
}
if (rs != null) {
rs.close();
}
After some additional googling, I found a bit of code posted back in 2004:
http://www.coderanch.com/t/300865/JDBC/databases/SQL-Server-JDBC-Registering-cursor
The gentleman who posted the snippet that I found helpful (Julian Kennedy) suggested: "Read the Javadoc for getUpdateCount() and getMoreResults() for a clear understanding." I was able to piece it together from that.
Basically, I don't think I understood my problem well enough at the outset in order to phrase it correctly. What it comes down to is that my query will be returning the data in multiple ResultSet instances. What I needed was a way to not merely iterate through each row in a ResultSet but, rather, iterate through the entire set of ResultSets. That's what the code above does.
If you want all records from the table, just do "Select * from table".
The only reason to retrieve in chunks is if there is some intermediate place for the data: e.g. if you are showing it on the screen, or storing it in memory.
If you are simply reading from one and inserting into another, just read everything from the first. You will not get any better performance by trying to retrieve in batches. If there is a difference, it will be negative. Frame your query in a way that brings back everything. The JDBC software will handle all the other breaking-up and reconstituting that you need.
However, you should batch the update/insert side of things.
The set-up would create two statements on the two connections:
Statement stmt = null;
ResultSet rs = null;
PreparedStatement insStmt = null;
stmt = conDb1.createStatement();
insStmt = conDb2.prepareStatement("insert into tgt_db2_table values (?,?,?,?,?......etc. ?,?) ");
rs = stmt.executeQuery("select * from src_db1_table");
Then, loop over the select as normal, but use batching on the target.
int batchedRecordCount = 0;
while (rs.next()) {
System.out.println(rs.getString("MyColumn"));
//Here you read values from the cursor and set them to the insStmt ...
String field1 = rs.getString(1);
String field2 = rs.getString(2);
int field3 = rs.getInt(3);
//--- etc.
insStmt.setString(1, field1);
insStmt.setString(2, field2);
insStmt.setInt(3, field3);
//----- etc. for all the fields
batchedRecordCount++;
insStmt.addBatch();
if (batchedRecordCount >= 1000) {
insStmt.executeBatch();
batchedRecordCount = 0; // start counting the next batch
}
}
if (batchedRecordCount > 0) {
//Finish off the final (partial) set of records
insStmt.executeBatch();
}
//Close resources...

Execute "sp_msforeachdb" in a Java application

Hi StackOverflow community :)
I come to you to share one of my problems...
I have to extract a list of every table in each database of a SQL Server instance, I found this query :
EXEC sp_msforeachdb 'Use ?; SELECT DB_NAME() AS DB, * FROM sys.tables'
It works perfectly on Microsoft SQL Server Management Studio but when I try to execute it in my Java program (that includes JDBC drivers for SQL Server) it says that it doesn't return any result.
My Java code is the following :
this.statement = this.connect.createStatement(); // Create the statement
this.resultats = this.statement.executeQuery("EXEC sp_msforeachdb 'Use ?; SELECT DB_NAME() AS DB, * FROM sys.tables'"); // Execute the query and store results in a ResultSet
this.sortie.ecrireResultats(this.statement.getResultSet()); // Write the ResultSet to a file
Thanks to anybody who will try to help me,
Have a nice day :)
EDIT 1 :
I'm not sure that the JDBC driver for SQL Server supports my query so I'll try to get to my goal in another way.
What I'm trying to get is a list of all the tables for each database on a SQL Server instance, the output format will be the following :
+-----------+--------+
| Databases | Tables |
+-----------+--------+
So now I'm asking: can someone help me get to that result using SQL queries through the JDBC driver for SQL Server?
I also wish to thank Tim Lehner and Mark Rotteveel for their very quick answers.
If a statement can return no results or multiple results, you should not use executeQuery() but execute() instead. This method returns a boolean indicating the type of the first result:
true: result is a ResultSet
false : result is an update count
If the result is true, then you use getResultSet() to retrieve the ResultSet, otherwise getUpdateCount() to retrieve the update count. If the update count is -1 it means there are no more results. Note that the update count will also be -1 when the current result is a ResultSet. It is also good to know that getResultSet() should return null if there are no more results or if the result is an update count.
Now if you want to retrieve more results, you call getMoreResults() (or its brother accepting an int parameter). The return value of boolean has the same meaning as that of execute(), so false does not mean there are no more results!
There are only no more results if the getMoreResults() returns false and getUpdateCount() returns -1 (as also documented in the Javadoc)
Essentially this means that if you want to correctly process all results you need to do something like below:
boolean result = stmt.execute(...);
while(true) {
if (result) {
ResultSet rs = stmt.getResultSet();
// Do something with resultset ...
} else {
int updateCount = stmt.getUpdateCount();
if (updateCount == -1) {
// no more results
break;
}
// Do something with update count ...
}
result = stmt.getMoreResults();
}
NOTE: Part of this answer is based on my answer to Java SQL: Statement.hasResultSet()?
If you're not getting an error, one issue might be that sp_msforeachdb will return a separate result set for each database rather than one set with all records. That being the case, you might try a bit of dynamic SQL to union-up all of your rows:
-- Use sys.tables
declare @sql nvarchar(max)
select @sql = coalesce(@sql + ' union all ', '') + 'select ''' + quotename(name) + ''' as database_name, * from ' + quotename(name) + '.sys.tables'
from sys.databases
select @sql = @sql + ' order by database_name, name'
exec sp_executesql @sql
I still sometimes use INFORMATION_SCHEMA views as well, since it's easier to see the schema name, among other things:
-- Use INFORMATION_SCHEMA.TABLES to easily get schema name
declare @sql nvarchar(max)
select @sql = coalesce(@sql + ' union all ', '') + 'select * from ' + quotename(name) + '.INFORMATION_SCHEMA.TABLES where TABLE_TYPE = ''BASE TABLE'''
from sys.databases
select @sql = @sql + ' order by TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME'
exec sp_executesql @sql
Be aware that this method of string concatenation (select @sql = foo from bar) may not work as you intend through a linked server (it will only grab the last record). Just a small caveat.
UPDATE
I've found the solution !
After reading an article about sp_spaceused being used with Java, I figured out that I was in the same case.
My final code is the following :
this.instances = instances;
for(int i = 0 ; i < this.instances.size() ; i++)
{
try
{
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
this.connect = DriverManager.getConnection("jdbc:sqlserver://" + this.instances.get(i), "tluser", "result");
this.statement = this.connect.prepareCall("{call sp_msforeachdb(?)}");
this.statement.setString(1, "Use ?; SELECT DB_NAME() AS DB, name FROM sys.tables WHERE DB_NAME() NOT IN('master', 'model', 'msdb', 'tempdb')");
this.resultats = this.statement.execute();
while(true)
{
int rowCount = this.statement.getUpdateCount();
if(rowCount > 0)
{
this.statement.getMoreResults();
continue;
}
if(rowCount == 0)
{
this.statement.getMoreResults();
continue;
}
ResultSet rs = this.statement.getResultSet();
if(rs != null)
{
while (rs.next())
{
this.sortie.ecrireResultats(rs); // Write the results to a file
}
rs.close();
this.statement.getMoreResults();
continue;
}
break;
}
this.statement.close();
}
catch(Exception e)
{
e.printStackTrace();
}
}
I tried it out and my file has everything I want in it.
Thank you all for your help ! :)

Failing to load large dataset into h2 database

Here is the problem: at my company we have a large database on which we want to perform some automated operations. To test that, we took a small sample of the data: about six CSV files of roughly 10 MB each. We want to use H2 to test the results of our program. H2 seemed to work fine with our previous CSV files, though they were at most 1000 entries long. When it comes to any of our 10 MB files, the command
insert into myschema.mytable (select * from csvread('mycsvfile.csv'));
reports a failure because one of the records is supposedly duplicated and violates our primary key constraint.
Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
insert into myschema.mytable (select * from csvread('src/test/resources/h2/data/mycsvfile.csv')) [23001-148] 23001/23001
Breaking the mycsvfile.csv into smaller pieces I was able to see that the problem starts to appear after about 10000 rows inserted(though the number varies depending on what data I used). I could however insert more than 10000 rows if I broke the file into pieces and then ran the command individually. But even if I manage to insert all that data manually I need an automated method to fill the database.
Since running the command would not give me the row that was causing the problem I guessed that the problem could be some cache in the csvread routine.
Then I created a small java program that could insert the data in the H2 database manually. No matter whether I batched the commands, closed and opened the connection for 1000 rows h2 reported that I was trying to duplicate an entry in the database.
org.h2.jdbc.JdbcSQLException: Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
INSERT INTO myschema.mytable VALUES ( '1997-10-06 01:00:00.0',25485116,1.600,0,18 ) [23001-148]
Doing a plain search for that record in Emacs, I can see that it is not duplicated, as the datetime column is unique across the whole dataset.
I cannot give you that data to test with, since the company sells that information. But here is what my table definition looks like.
create table myschema.mytable (
datetime timestamp,
largenumber numeric(8,0) references myschema.largenumber(largecode),
value numeric(8,3) not null,
flag numeric(1,0) references myschema.flag(flagcode),
kind smallint references myschema.kind(kindcode),
primary key (datetime, largenumber, kind)
);
This is what our CSV looks like:
datetime,largenumber,value,flag,kind
1997-06-11 16:45:00.0,25485116,0.710,0,18
1997-06-11 17:00:00.0,25485116,0.000,0,18
1997-06-11 17:15:00.0,25485116,0.000,0,18
1997-06-11 17:30:00.0,25485116,0.000,0,18
And the java code that would fill our test database(forgive my ugly code, I got desperate :)
private static void insertFile(MyFile file) throws SQLException {
int updateCount = 0;
ResultSet rs = Csv.getInstance().read(file.toString(), null, null);
ResultSetMetaData meta = rs.getMetaData();
Connection conn = DriverManager.getConnection(
"jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
rs.next();
while (rs.next()) {
Statement stmt = conn.createStatement();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < meta.getColumnCount(); i++) {
if (i == 0)
sb.append("'" + rs.getString(i + 1) + "'");
else
sb.append(rs.getString(i + 1));
sb.append(',');
}
updateCount++;
if (sb.length() > 0)
sb.deleteCharAt(sb.length() - 1);
stmt.execute(String.format(
"INSERT INTO myschema.mydatabase VALUES ( %s ) ",
sb.toString()));
if (updateCount == 1000) {
conn.close();
conn = DriverManager.getConnection(
"jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
updateCount = 0;
}
}
if (!conn.isClosed()) {
conn.close();
}
rs.close();
}
I'll be glad to provide more information if requested.
EDIT
@Randy I always check if the database is clean before running the command, and in my Java program I have a routine to delete all data from a file that fails to be inserted.
select * from myschema.mytable where largenumber = 25485116;
DATETIME LARGENUMBER VALUE FLAG KIND
(no rows, 8 ms)
The only thing that I can think of is that there is a trigger on the table that sets the timestamp to "now". Although that would not explain why you are successful with a few rows, it would explain why the primary key is being violated.
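To rule out the CSV itself more rigorously than with a text search, a small check over the raw lines can help. The following is only a sketch: it assumes the column order shown in the question (datetime, largenumber, value, flag, kind), a header line, and no quoted commas. The class name CsvKeyCheck is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CsvKeyCheck {
    /**
     * Returns one message per line whose (datetime, largenumber, kind) key
     * has already appeared on an earlier line. Line numbers are 1-based.
     */
    static List<String> findDuplicateKeys(List<String> lines) {
        Map<String, Integer> firstSeen = new HashMap<>();
        List<String> duplicates = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header row
            String[] f = lines.get(i).split(",", -1);
            if (f.length < 5) {
                continue; // skip blank or malformed lines
            }
            String key = f[0].trim() + "|" + f[1].trim() + "|" + f[4].trim();
            Integer earlier = firstSeen.putIfAbsent(key, i + 1);
            if (earlier != null) {
                duplicates.add("line " + (i + 1) + " repeats key of line " + earlier + ": " + key);
            }
        }
        return duplicates;
    }
}
```

You could feed it Files.readAllLines(Paths.get("mycsvfile.csv")). If it reports nothing, the text really has no duplicate keys, and the collision has to be introduced after the values are read, for instance by a trigger like the one suggested above or by how the values are converted on insert.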

Obtain id of an insert in the same statement [duplicate]

This question already has answers here:
How to get the insert ID in JDBC?
(14 answers)
Closed 7 years ago.
Is there any way to insert a row into a table and get the newly generated ID in only one statement? I want to use JDBC, and the ID will be generated by a sequence or will be an autoincrement field.
Thanks for your help.
John Pollancre
using getGeneratedKeys():
resultSet = pstmt.getGeneratedKeys();
if (resultSet != null && resultSet.next()) {
lastId = resultSet.getInt(1);
}
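For context, a fuller sketch of the same call, including how the statement is prepared. The helper name insertAndGetId is hypothetical, and it assumes an INSERT with exactly one '?' placeholder. How generated keys are reported varies by driver, so treat this as a starting point rather than something guaranteed to work everywhere.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class GeneratedKeyInsert {
    /**
     * Inserts one row and returns the key the database generated for it.
     * Assumes insertSql has a single '?' placeholder and a numeric generated key.
     */
    static long insertAndGetId(Connection conn, String insertSql, String value) throws SQLException {
        // RETURN_GENERATED_KEYS asks the driver to make the new key available afterwards.
        try (PreparedStatement ps = conn.prepareStatement(insertSql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, value);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (keys.next()) {
                    return keys.getLong(1);
                }
                throw new SQLException("Insert succeeded but no generated key was returned");
            }
        }
    }
}
```

Usage would be something like insertAndGetId(conn, "INSERT INTO movement (doc_number) VALUES (?)", "abc"). With some drivers (Oracle's, as far as I know) you may need the prepareStatement(sql, new String[] {"ID"}) overload instead, naming the key column explicitly.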
You can use the RETURNING clause to get the value of any column you have updated or inserted into. It works with triggers (i.e. you get the values actually inserted after the triggers have executed). Consider:
SQL> CREATE TABLE a (ID NUMBER PRIMARY KEY);
Table created
SQL> CREATE SEQUENCE a_seq;
Sequence created
SQL> VARIABLE x NUMBER;
SQL> BEGIN
2 INSERT INTO a VALUES (a_seq.nextval) RETURNING ID INTO :x;
3 END;
4 /
PL/SQL procedure successfully completed
x
---------
1
SQL> /
PL/SQL procedure successfully completed
x
---------
2
Actually, I think nextval followed by currval does work. Here's a bit of code that simulates this behaviour with two threads, one that first does a nextval, then a currval, while a second thread does a nextval in between.
public void checkSequencePerSession() throws Exception {
final Object semaphore = new Object();
Runnable thread1 = new Runnable() {
public void run() {
try {
Connection con = getConnection();
Statement s = con.createStatement();
ResultSet r = s.executeQuery("SELECT SEQ_INV_BATCH_DWNLD.nextval AS val FROM DUAL ");
r.next();
System.out.println("Session1 nextval is: " + r.getLong("val"));
synchronized(semaphore){
semaphore.notify();
}
synchronized(semaphore){
semaphore.wait();
}
r = s.executeQuery("SELECT SEQ_INV_BATCH_DWNLD.currval AS val FROM DUAL ");
r.next();
System.out.println("Session1 currval is: " + r.getLong("val"));
con.commit();
} catch (Exception e) {
e.printStackTrace();
}
}
};
Runnable thread2 = new Runnable(){
public void run(){
try{
synchronized(semaphore){
semaphore.wait();
}
Connection con = getConnection();
Statement s = con.createStatement();
ResultSet r = s.executeQuery("SELECT SEQ_INV_BATCH_DWNLD.nextval AS val FROM DUAL ");
r.next();
System.out.println("Session2 nextval is: " + r.getLong("val"));
con.commit();
synchronized(semaphore){
semaphore.notify();
}
}catch(Exception e){
e.printStackTrace();
}
}
};
Thread t1 = new Thread(thread1);
Thread t2 = new Thread(thread2);
t1.start();
t2.start();
t1.join();
t2.join();
}
The result is as follows:
Session1 nextval is: 47
Session2 nextval is: 48
Session1 currval is: 47
I couldn't comment otherwise I would have added to Vinko Vrsalovic's post:
The id generated by a sequence can be obtained via
insert into table values (sequence.NextVal, otherval)
select sequence.CurrVal
run in the same transaction so as to get a consistent view.
Updating the sequence after getting a nextval from it is an autonomous transaction. Otherwise another session would get the same value from the sequence. So getting currval will not get the inserted id if another session has selected from the sequence in between the insert and the select.
Regards,
Rob
The value of the auto-generated ID is not known until after the INSERT is executed, because other statements could be executing concurrently and the RDBMS gets to decide how to schedule which one goes first.
Any function you call in an expression in the INSERT statement would have to be evaluated before the new row is inserted, and therefore it can't know what ID value is generated.
I can think of two options that are close to what you're asking:
Write a trigger that runs AFTER INSERT, so you have access to the generated ID key value.
Write a procedure to wrap the insert, so you can execute other code in the procedure and query the last generated ID.
However, I suspect what you're really asking is whether you can query for the last generated ID value by your current session even if other sessions are also inserting rows and generating their own ID values. You can be assured that every RDBMS that offers an auto-increment facility offers a way to query this value, and it tells you the last ID generated in your current session scope. This is not affected by inserts done in other sessions.
The id generated by a sequence can be obtained via
insert into table values (sequence.NextVal, otherval)
select sequence.CurrVal
run in the same transaction so as to get a consistent view.
I think you'll find this helpful:
I have a table with a auto-incrementing id. From time to time I want to insert rows to this table, but want to be able to know what the pk of the newly inserted row is.
String query = "BEGIN INSERT INTO movement (doc_number) VALUES ('abc') RETURNING id INTO ?; END;";
OracleCallableStatement cs = (OracleCallableStatement) conn.prepareCall(query);
cs.registerOutParameter(1, OracleTypes.NUMBER );
cs.execute();
System.out.println(cs.getInt(1));
Source: Thread: Oracle / JDBC Error when Returning values from an Insert
I couldn't comment, otherwise I would have just added to dfa's post, but the following is an example of this functionality with straight JDBC.
http://www.ibm.com/developerworks/java/library/j-jdbcnew/
However, if you are using something such as Spring, it will mask a lot of the gory details for you. If that can be of any assistance, just google Spring Chapter 11, which covers the JDBC details. Using it has saved me a lot of headaches.
