I have a Java application which reads files and writes to an Oracle DB row by row.
We have come across a strange error during batch insert which does not occur during sequential insert. The error is strange because it occurs only with the IBM JDK 7 on the AIX platform, and I get it on a different row every time. My code looks like this:
prpst = conn.prepareStatement(query);
while ((line = bf.readLine()) != null) {
    numLine++;
    batchInsert(prpst, line);
    //onebyoneInsert(prpst, line);
}
private static void batchInsert(PreparedStatement prpst, String line) throws IOException, SQLException {
    prpst.setString(1, "1");
    prpst.setInt(2, numLine);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
    prpst.addBatch();
    if (++batchedLines == 200) {
        prpst.executeBatch();
        batchedLines = 0;
        prpst.clearBatch();
    }
}
private static void onebyoneInsert(PreparedStatement prpst, String line) throws Exception {
    prpst.setString(1, "1");
    prpst.setInt(2, numLine);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
    prpst.executeUpdate();
}
I get this error in batch insert mode:
java.sql.BatchUpdateException: ORA-01461: can bind a LONG value only for insert into a LONG column
at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10345)
I already know why this ORA error normally occurs, but that is not my case: I am nearly sure that I am not binding data that is too large for a smaller column. Maybe I am hitting some bug in the IBM JDK 7, but I could not prove it.
My question is whether there is a way I can avoid this problem. One-by-one insert is not an option, because we have big files and it takes too much time.
Try with
prpst.setInt(5,new Integer(1))
What is the type of variable "numLine"?
Can you share the types of the columns corresponding to the fields you set in the PreparedStatement?
Try once with "onebyoneInsert" and share the output for that case; it might help identify the root cause.
Also print the value of "numLine" to the console.
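If batching has to stay for throughput, one possible workaround (a sketch, not a verified fix for the IBM JDK issue) is to keep executing in chunks of 200 and, when a chunk throws BatchUpdateException, retry only that chunk row by row. That keeps most inserts batched, flushes the trailing partial chunk, and pinpoints the offending line if there really is one. The pendingLines/pendingNums buffers and the bind() helper below are hypothetical additions, not part of the original program (they need java.sql.BatchUpdateException and java.util.List/ArrayList imports):

private static final List<String> pendingLines = new ArrayList<String>();
private static final List<Integer> pendingNums = new ArrayList<Integer>();

private static void bind(PreparedStatement prpst, int num, String line) throws SQLException {
    prpst.setString(1, "1");
    prpst.setInt(2, num);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
}

private static void batchInsertWithFallback(PreparedStatement prpst, String line) throws SQLException {
    bind(prpst, numLine, line);
    prpst.addBatch();
    pendingLines.add(line);
    pendingNums.add(numLine);
    if (pendingLines.size() == 200) {
        flushBatch(prpst);
    }
}

// Call flushBatch(prpst) once more after the read loop so the last partial chunk is not lost.
private static void flushBatch(PreparedStatement prpst) throws SQLException {
    try {
        prpst.executeBatch();
    } catch (BatchUpdateException e) {
        prpst.clearBatch();
        // Retry the same chunk row by row: either every row succeeds (pointing at a
        // driver/JDK problem) or one row fails, which identifies the offending data.
        for (int i = 0; i < pendingLines.size(); i++) {
            bind(prpst, pendingNums.get(i), pendingLines.get(i));
            prpst.executeUpdate();
        }
    } finally {
        prpst.clearBatch();
        pendingLines.clear();
        pendingNums.clear();
    }
}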
Related
I have a problem: the code below runs fine if I run it without the autoCommit property, but I would prefer to run it as a transaction. The code basically inserts an article's header information and then the list of items associated with it (so it's like a one-to-many relationship), and I would like to commit everything in one go rather than first the article information and then its items. The issue is that when I reach the cn.commit() line, I get an exception that says "Closed Statement".
database insertion method
public static void addArticle(Article article) throws SQLException {
    Connection cn = null;
    PreparedStatement ps = null;
    PreparedStatement itemStatement = null;
    StringBuffer insert = new StringBuffer();
    StringBuffer itemsSQL = new StringBuffer();
    try {
        article.setArticleSortNum(getNextArticleNum(article.getShopId()));
        article.setArticleId(DAOHelper.getNextId("article_id_sequence"));
        cn = DBHelper.makeConnection();
        cn.setAutoCommit(false);

        insert.append("insert query for article goes here");
        ps = cn.prepareStatement(insert.toString());
        int i = 1;
        ps.setLong(i, article.getArticleId()); i++;
        ps.setLong(i, article.getShopId()); i++;
        ps.setInt(i, article.getArticleNum()); i++;
        // etcetera...
        ps.executeUpdate();

        itemsSQL.append("insert query for each line goes here");
        itemStatement = cn.prepareStatement(itemsSQL.toString());
        for (Article item : article.getArticlesList()) {
            item.setArticleId(article.getArticleId());
            i = 1;
            itemStatement.setLong(i, item.getArticleId()); i++;
            itemStatement.setInt(i, item.getItemsOnStock()); i++;
            itemStatement.setInt(i, item.getQuantity()); i++;
            // etcetera...
            itemStatement.executeUpdate();
        }
        cn.commit();
    } catch (SQLException e) {
        cn.rollback();
        log.error(e.getMessage());
        throw e;
    } finally {
        DBHelper.releasePreparedStatement(ps);
        DBHelper.releasePreparedStatement(itemStatement);
        DBHelper.releaseConnection(cn);
    }
}
I also had the items insertion with the for loop using addBatch() and then executeBatch(), but I got the same Closed Statement error upon reaching cn.commit(). I don't understand why it's closing; all connections and statements are released in the finally clause, so I get the feeling I'm making some fundamental error I'm not aware of. Any ideas? Thanks in advance!
EDIT: Below is the stack trace:
java.sql.SQLException: Closed Statement
    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:189)
    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:231)
    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:294)
    at oracle.jdbc.driver.OracleStatement.ensureOpen(OracleStatement.java:6226)
    at oracle.jdbc.driver.OraclePreparedStatement.sendBatch(OraclePreparedStatement.java:592)
    at oracle.jdbc.driver.OracleConnection.commit(OracleConnection.java:1376)
    at com.evermind.sql.FilterConnection.commit(FilterConnection.java:201)
    at com.evermind.sql.OrionCMTConnection.commit(OrionCMTConnection.java:461)
    at com.evermind.sql.FilterConnection.commit(FilterConnection.java:201)
    at com.dao.ArticlesDAO.addArticle(ArticlesDAO.java:571)
    at com.action.registry.CustomBaseAction.execute(CustomBaseAction.java:57)
    at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:431)
    at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:236)
    at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1196)
    at org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:432)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:760)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at com.evermind.server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:765)
    at com.evermind.server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:317)
    at com.evermind.server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:790)
    at com.evermind.server.http.HttpRequestHandler.run(HttpRequestHandler.java:270)
    at com.evermind.server.http.HttpRequestHandler.run(HttpRequestHandler.java:112)
    at com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:192)
    at java.lang.Thread.run(Unknown Source)
EDIT 2:
These are the parameters in the driver's datasource config. I thought the debugging process might be making it time out, but even when the method finishes in less than a second it still throws the Closed Statement exception:
min-connections="20"
max-connections="200"
inactivity-timeout="20"
stmt-cache-size="40"/>
It's usually best to create a statement, use it, and close it as soon as possible, and it does no harm to do so before the transaction is committed. From reading the Oracle tutorial about the batching models, it sounds like having multiple statements open at one time could be a problem. I would try closing the ps object before working with the itemStatement, then moving the initialization
itemStatement = cn.prepareStatement(itemsSQL.toString());
to directly above the for loop, and also moving where you close the itemStatement to immediately after the for loop:
PreparedStatement itemStatement = cn.prepareStatement(itemsSQL.toString());
try {
    for (Article item : article.getArticlesList()) {
        item.setArticleId(article.getArticleId());
        i = 1;
        itemStatement.setLong(i, item.getArticleId()); i++;
        itemStatement.setInt(i, item.getItemsOnStock()); i++;
        itemStatement.setInt(i, item.getQuantity()); i++;
        // etcetera...
        itemStatement.executeUpdate();
    }
} finally {
    DBHelper.releasePreparedStatement(itemStatement);
}
It looks like what is going on is that some batching parameter set on the connection is causing the commit to go back to the statement for unfinished work to flush; it finds the statement already closed, and the connection complains about it. This is weird, because at the point where the commit blows up the code hasn't yet reached the finally block where the statement gets closed.
Reading up on Oracle batching models may be helpful. Also check the JDBC driver version and make sure it's right for the version of Oracle you're using, and see if there are any updates available for it.
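Putting it together, the reordered body of the try block would look roughly like this (a sketch using the question's own helpers; the queries and bindings are elided):

// header insert
ps = cn.prepareStatement(insert.toString());
// ... bind article fields as in the question ...
ps.executeUpdate();
DBHelper.releasePreparedStatement(ps);   // close the header statement before opening the next one
ps = null;

// item inserts
PreparedStatement itemStatement = cn.prepareStatement(itemsSQL.toString());
try {
    for (Article item : article.getArticlesList()) {
        // ... bind item fields as in the question ...
        itemStatement.executeUpdate();
    }
} finally {
    DBHelper.releasePreparedStatement(itemStatement);  // closed before the commit
}

cn.commit();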
The json2sstable tool supplied with Cassandra 1.2.15 fails with an out-of-memory error. Back in 2011 a similar issue was reported as a bug and fixed: https://issues.apache.org/jira/browse/CASSANDRA-2189
Either I am missing some steps in the tool configuration/usage or the bug has re-emerged. Please point out what I am missing.
Repro steps:
1) Cassandra 1.2.15, one table with varchar key and one varchar column filled with random uuids, 6x10^6 records.
2) JSON file generated with sstable2json tool (~1G).
3) Cassandra restarted with new configuration (new data/cache/commit dirs, new partitioner)
4) Keyspace re-created
5) json2sstable fails after several minutes of processing:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at org.codehaus.jackson.util.TextBuffer.contentsAsString(TextBuffer.java:350)
at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:278)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:204)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:104)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:18)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2695)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1294)
at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:1368)
at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:344)
at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:328)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:547)
From the json2sstable source code, the tool loads all the records from the JSON file into memory and sorts them by key:
private int importUnsorted(String jsonFile, ColumnFamily columnFamily, String ssTablePath, IPartitioner<?> partitioner) throws IOException
{
    int importedKeys = 0;
    long start = System.currentTimeMillis();
    JsonParser parser = getParser(jsonFile);

    Object[] data = parser.readValueAs(new TypeReference<Object[]>(){});

    keyCountToImport = (keyCountToImport == null) ? data.length : keyCountToImport;
    SSTableWriter writer = new SSTableWriter(ssTablePath, keyCountToImport);

    System.out.printf("Importing %s keys...%n", keyCountToImport);

    // sort by dk representation, but hold onto the hex version
    SortedMap<DecoratedKey, Map<?, ?>> decoratedKeys = new TreeMap<DecoratedKey, Map<?, ?>>();

    for (Object row : data)
    {
        Map<?, ?> rowAsMap = (Map<?, ?>) row;
        decoratedKeys.put(partitioner.decorateKey(hexToBytes((String) rowAsMap.get("key"))), rowAsMap);
        ....
According to Jonathan Ellis' comment on the CASSANDRA-2322 issue, the behavior is by design.
Thus json2sstable is not well suited for importing production-size data into Cassandra; the tool is likely to crash on large datasets.
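One way to work around this on large dumps (a sketch, not an official tool) is to split the sstable2json output into smaller top-level arrays with Jackson's streaming API and run json2sstable on each chunk, producing one SSTable per chunk. The chunk size and file naming here are arbitrary assumptions:

import java.io.File;
import java.io.IOException;

import org.codehaus.jackson.JsonEncoding;
import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonGenerator;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

public class SplitJsonDump {
    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);      // the sstable2json output, a top-level JSON array
        int rowsPerChunk = 100000;           // tune to the heap available to json2sstable

        JsonFactory factory = new JsonFactory();
        JsonParser parser = factory.createJsonParser(input);
        if (parser.nextToken() != JsonToken.START_ARRAY)
            throw new IOException("expected a top-level JSON array");

        JsonGenerator gen = null;
        int chunk = 0, rowsInChunk = 0;
        while (parser.nextToken() != JsonToken.END_ARRAY) {
            if (gen == null) {
                gen = factory.createJsonGenerator(
                        new File(input.getAbsolutePath() + ".chunk" + chunk++), JsonEncoding.UTF8);
                gen.writeStartArray();
            }
            gen.copyCurrentStructure(parser);   // streams one row object through without buffering the whole file
            if (++rowsInChunk == rowsPerChunk) {
                gen.writeEndArray();
                gen.close();
                gen = null;
                rowsInChunk = 0;
            }
        }
        if (gen != null) {
            gen.writeEndArray();
            gen.close();
        }
        parser.close();
    }
}

Since each chunk is sorted and written independently, memory use per json2sstable run is then bounded by the chunk size rather than by the whole dump.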
This is my FULL test code with the main method:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TestSetAscii {
    public static void main(String[] args) throws SQLException, FileNotFoundException {
        String dataFile = "FastLoad1.csv";
        String insertTable = "INSERT INTO " + "myTableName" + " VALUES(?,?,?)";

        Connection conStd = DriverManager.getConnection("jdbc:xxxxx", "xxxxxx", "xxxxx");
        InputStream dataStream = new FileInputStream(new File(dataFile));
        PreparedStatement pstmtFld = conStd.prepareStatement(insertTable);

        // Until this line everything is awesome
        pstmtFld.setAsciiStream(1, dataStream, -1); // This line fails
        System.out.println("works");
    }
}
I get the "cbColDef value out of range" error
Exception in thread "main" java.sql.SQLException: [Teradata][ODBC Teradata Driver] Invalid precision: cbColDef value out of range
at sun.jdbc.odbc.JdbcOdbc.createSQLException(Unknown Source)
at sun.jdbc.odbc.JdbcOdbc.standardError(Unknown Source)
at sun.jdbc.odbc.JdbcOdbc.SQLBindInParameterAtExec(Unknown Source)
at sun.jdbc.odbc.JdbcOdbcPreparedStatement.setStream(Unknown Source)
at sun.jdbc.odbc.JdbcOdbcPreparedStatement.setAsciiStream(Unknown Source)
at file.TestSetAscii.main(TestSetAscii.java:21)
Here is the link to my FastLoad1.csv file. I guess that setAsciiStream fails because of the FastLoad1.csv file, but I am not sure.
(In my previous question I was not able to narrow down the problem that I had. Now I have shortened the code.)
It would depend on the table schema, but the third parameter of setAsciiStream is the length.
So
pstmtFld.setAsciiStream(1, dataStream, 4);
would work for a field of length 4 bytes.
But I don't think it would work as you expect in your code: for each bind you should have a separate stream.
setAsciiStream() is designed for large data values such as BLOBs or long VARCHARs. It is not designed to read a CSV file line by line and split it into separate values.
Basically, it just binds one of the question marks to the InputStream.
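For illustration, here is what binding each placeholder to its own stream with an explicit length would look like (a sketch with hypothetical values, not the poster's schema):

import java.io.ByteArrayInputStream;
import java.sql.PreparedStatement;

public class SeparateStreamsExample {
    // One stream per placeholder of "INSERT INTO myTableName VALUES(?,?,?)",
    // each with an explicit length in bytes.
    static void bindRow(PreparedStatement ps, String c1, String c2, String c3) throws Exception {
        byte[] b1 = c1.getBytes("US-ASCII");
        byte[] b2 = c2.getBytes("US-ASCII");
        byte[] b3 = c3.getBytes("US-ASCII");
        ps.setAsciiStream(1, new ByteArrayInputStream(b1), b1.length);
        ps.setAsciiStream(2, new ByteArrayInputStream(b2), b2.length);
        ps.setAsciiStream(3, new ByteArrayInputStream(b3), b3.length);
        ps.executeUpdate();
    }
}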
After looking into the provided example, it looks like Teradata can handle CSV, but you have to tell it explicitly with:
String urlFld = "jdbc:teradata://whomooz/TMODE=ANSI,CHARSET=UTF8,TYPE=FASTLOADCSV";
I don't have enough reputation to comment, but I feel that this info can be valuable to those navigating fast load via JDBC for the first time.
This code will get the full stack trace and is very helpful for diagnosing problems with fast load:
catch (SQLException ex) {
    for ( ; ex != null ; ex = ex.getNextException())
        ex.printStackTrace();
}
In the case of the code above, it works if you specify TYPE=FASTLOADCSV in the connection string, but when run multiple times it will fail due to the creation of the error tables _ERR_1 and _ERR_2. Drop these tables and clear out the destination table to run again.
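Putting the pieces of this thread together, a sketch of the working setup might look like the following. The connection string and the single-stream bind come from the posts above; the commit/autocommit handling and the driver class name are my assumptions and should be checked against the Teradata JDBC FastLoad CSV documentation; host, credentials, and table name are placeholders:

import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class FastLoadCsvSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.teradata.jdbc.TeraDriver"); // assumption: Teradata JDBC driver on the classpath
        String url = "jdbc:teradata://whomooz/TMODE=ANSI,CHARSET=UTF8,TYPE=FASTLOADCSV";
        Connection con = DriverManager.getConnection(url, "xxxxxx", "xxxxx");
        InputStream csv = new FileInputStream("FastLoad1.csv");
        try {
            con.setAutoCommit(false);                      // assumption: run the load inside one transaction
            PreparedStatement ps = con.prepareStatement("INSERT INTO myTableName VALUES(?,?,?)");
            ps.setAsciiStream(1, csv, -1);                 // the whole CSV goes in as one stream
            ps.executeUpdate();
            con.commit();
            ps.close();
        } catch (SQLException ex) {
            for (; ex != null; ex = ex.getNextException()) // FastLoad chains its errors
                ex.printStackTrace();
        } finally {
            csv.close();
            con.close();
        }
    }
}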
I run the following code on my local (Mac) machine and on a remote Unix server:
public void deleteValue(final String id, final String value) {
    log.info("Removing value " + value);
    final Collection<String> valuesBeforeRemoval = getValues(id);
    final MutationBatch m = keyspace.prepareMutationBatch();
    m.withRow(VALUES_CF, id).deleteColumn(value);
    try {
        m.execute();
    } catch (final ConnectionException e) {
        log.error("Unable to delete location " + value, e);
    }
    final Collection<String> valuesAfterRemoval = getValues(id);
    if (valuesAfterRemoval.size() != (valuesBeforeRemoval.size() - 1)) {
        log.error("value " + value + " was supposed to be removed from list " + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
    }
    ...
}

protected Collection<String> getValues(final String id) {
    try {
        final OperationResult<ColumnList<String>> operationResult = keyspace
                .prepareQuery(VALUES_CF).getKey(id).execute();
        final ColumnList<String> result = operationResult.getResult();
        if (result.isEmpty()) {
            log.info("No value found for id: " + id);
            return new ArrayList<String>();
        }
        return result.getColumnNames();
    } catch (final ConnectionException e) {
        log.error("Unable to retrieve session " + id, e);
    }
    return new ArrayList<String>();
}
Locally, that line is never executed, which makes sense:
log.error("value " + value + " was supposed to be removed from list " + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
but that line is executed on my dev server:
[ERROR] [main] [n.o.w.s.d.SessionDaoCassandraImpl] [2013-03-08 13:12:24,801]
[] - value 3 was supposed to be removed from list [3, 2, 1, 0, 7, 6, 5, 4, 9, 8] but it wasn't: [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
I am using com.netflix.astyanax
Both my local machine and the remote dev server connect to the very same Cassandra instance.
Both my local machine and the remote dev server run the very same test, creating a new row family and adding 10 records before one is deleted.
When the error occurs on dev, log.error("Unable to delete location " + value, e); was not executed (i.e. running the deletion command didn't produce any exception).
I am 100% positive that no other code is affecting the content of the database while I am running the test on dev, so this isn't some strange concurrency issue.
What could possibly explain that the deleteColumn(value) request runs without producing any error but still does not remove the column from the database?
ADDITIONAL INFO
Here is how I created the keyspace:
create keyspace sessiondata
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = {replication_factor:1};
Here is how I created the column family values, referenced as VALUES_CF in the code above:
create column family values
with comparator = UTF8Type
;
Here is how the keyspace referenced in the java code above is defined:
final AstyanaxContext.Builder contextBuilder = getBuilder();
final AstyanaxContext<Keyspace> keyspaceContext = contextBuilder
        .forKeyspace(keyspaceName)
        .buildKeyspace(ThriftFamilyFactory.getInstance());
keyspaceContext.start();
keyspace = keyspaceContext.getEntity();
where getBuilder is:
private Builder getBuilder() {
    final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
            .setDiscoveryType(NodeDiscoveryType.NONE)
            .setRetryPolicy(new RunOnce());

    final ConnectionPoolConfigurationImpl poolConf = new ConnectionPoolConfigurationImpl("MyPool")
            .setPort(port)
            .setMaxConnsPerHost(1)
            .setSeeds(value);

    return new AstyanaxContext.Builder()
            .forCluster(cluster)
            .withAstyanaxConfiguration(conf)
            .withConnectionPoolConfiguration(poolConf)
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor());
}
SECOND UPDATE
First, the issues are not solely related to deletes. I observe similar problems when updating records in the database, reading them back, and not being able to read the updates I just wrote.
Second, I created a test that performs the following operations 100 times (a sketch of the loop follows the list):
write a row into Cassandra
update that row in Cassandra
read that row back from Cassandra and check whether it was indeed updated, checking again regularly after delays if it wasn't
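Here is a sketch of that loop, using the same keyspace, VALUES_CF and log fields as the code above (the column name, values, retry counts, and delays are arbitrary; the enclosing method has to declare ConnectionException and InterruptedException):

// write / update / read-back-with-retries, repeated 100 times
for (int i = 0; i < 100; i++) {
    final String id = "row-" + i;

    MutationBatch write = keyspace.prepareMutationBatch();
    write.withRow(VALUES_CF, id).putColumn("col", "v1", null);
    write.execute();

    MutationBatch update = keyspace.prepareMutationBatch();
    update.withRow(VALUES_CF, id).putColumn("col", "v2", null);
    update.execute();

    String seen = null;
    for (int attempt = 0; attempt < 10 && !"v2".equals(seen); attempt++) {
        seen = keyspace.prepareQuery(VALUES_CF).getKey(id)
                .getColumn("col").execute().getResult().getStringValue();
        if (!"v2".equals(seen)) {
            Thread.sleep(1000); // retry after a delay
        }
    }
    if (!"v2".equals(seen)) {
        log.error("row " + id + " still reads " + seen + " after retries");
    }
}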
What I observe from that test is that:
again, when I run that code locally, all 100 iterations pass right away (no retry ever needed)
when I run that code on the remote server, some of the iterations pass and some fail. When they fail, no matter how large the delay (I wait up to 10 seconds), the test always fails.
At this point, I am really not sure how any Cassandra setup could explain this behavior, since I connect to the very same server for my tests and the delays I insert are much larger than any additional latency involved when connecting from my local machine.
The only relevant difference seems to be which machine the code is running on.
THIRD UPDATE
If, in the test mentioned in the previous update, I insert a delay between the two writes, the code starts passing once the delay is >= 1,000 ms. A delay of, say, 100 ms doesn't help. I also modified the builder to set the default read and write consistency levels to the most demanding, ALL, and that had no impact on the results of the test (it still fails about half of the time unless the delay between writes is > 1 s):
final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.NONE)
        .setRetryPolicy(new RunOnce())
        .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ALL)
        .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ALL);
To debug, try printing the full row instead of just the column names; by the full row I mean the column name, the column value, and the timestamp. A long shot is that the clocks are wrong on one of your test machines and this is throwing out your tests on the other.
Another thing to double-check is that the ip is indeed what you think it is, in both your application and Cassandra. When you retrieve it, print it between delimiters, like println("-" + ip + "-"). Before and after your try block for the execute in deleteSecureLocation, do a get for only that column, not the entire row. I'm not too sure how to do that in Astyanax; on the cli it would be get[id][ip].
Something to keep in mind is that a delete won't fail even if there's nothing to delete. To Cassandra it's a write; the only thing that makes it a delete is that, on read, it is the latest timestamped entry for that row/column name.
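A minimal sketch of the "print the full row" debugging suggested above, assuming the question's keyspace and VALUES_CF objects and string-valued columns (method and column names here are hypothetical):

// Dump name, value and write timestamp of every column in a row, then read one column directly.
protected void dumpRow(final String id, final String column) throws ConnectionException {
    final ColumnList<String> row = keyspace.prepareQuery(VALUES_CF).getKey(id).execute().getResult();
    for (Column<String> c : row) {
        log.info("name=" + c.getName()
                + " value=" + c.getStringValue()
                + " timestamp=" + c.getTimestamp()); // microseconds since epoch; compare across machines
    }
    final Column<String> single = keyspace.prepareQuery(VALUES_CF)
            .getKey(id)
            .getColumn(column)
            .execute()
            .getResult();
    log.info("single get: " + single.getName() + " -> " + single.getStringValue());
}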
I've been stuck on a problem for a few days. I've found some similar posts but still don't understand what is wrong with my code.
I'm reading a file (18.4 KB) which contains SQL queries. The only thing I want to do is read the file and execute the queries.
I have no problem reading the file; the problem occurs after all the queries have been executed (if I don't execute them, it works, but that's not the point!).
So here's my code (wrapped in a try/catch for IOException):
InputStream in = ctx.getAssets().open("file.sql");
ByteArrayBuffer queryBuff = new ByteArrayBuffer(in.available());
String query = null;
int curent;

while (-1 != (curent = in.read())) {
    queryBuff.append((char) curent);
    if (((char) curent) == ';') {
        query = new String(queryBuff.toByteArray());
        db.execSQL(query);
        queryBuff.clear();
        query = null;
    }
}

in.close();
queryBuff.clear();
And my GC_CONCURRENT occurs when there is a "new String" in the loop; it shows up after the end of the loop.
Thanks!
EDIT :
I'm a little embarrassed, because my memory leak didn't occur in this part of the code but in a part executed later (I don't know which yet), so my problem wasn't actually a problem; the app worked properly in fact...
Sorry!