Sybase JConnect: ENABLE_BULK_LOAD usage - java

Can anyone out there provide an example of bulk inserts via JConnect (with ENABLE_BULK_LOAD) to Sybase ASE?
I've scoured the internet and found nothing.

I got in touch with one of the engineers at Sybase and they provided me with a code sample, so I get to answer my own question.
Here's a rundown, since the full code sample is pretty large: it assumes a lot of pre-initialized variables, and spelling those out would add a few hundred lines. Anyone interested should get the idea. According to Sybase, this can yield up to 22K inserts per second under ideal conditions.
SybDriver sybDriver = (SybDriver) Class.forName("com.sybase.jdbc3.jdbc.SybDriver").newInstance();
sybDriver.setVersion(com.sybase.jdbcx.SybDriver.VERSION_6);
DriverManager.registerDriver(sybDriver);
// DB props (after setting the normal login/password etc.)
props.put("ENABLE_BULK_LOAD", "true");
// open connection here for sybDriver
dbConn.setAutoCommit(false);
String SQLString = "insert into batch_inserts (row_id, colname1, colname2)\n values (?,?,?) \n";
PreparedStatement pstmt;
try
{
    pstmt = dbConn.prepareStatement(SQLString);
}
catch (SQLException sqle)
{
    displaySQLEx("Couldn't prepare statement", sqle);
    return;
}
for (String[] val : valuesToInsert)
{
    pstmt.setString(1, val[0]); // row_id varchar(30)
    pstmt.setString(2, val[1]); // colname1 varchar(30)
    pstmt.setString(3, val[2]); // colname2 varchar(30)
    try
    {
        pstmt.addBatch();
    }
    catch (SQLException sqle)
    {
        displaySQLEx("Failed to build batch", sqle);
        break;
    }
}
try {
    pstmt.executeBatch();
    dbConn.commit();
    pstmt.close();
} catch (SQLException sqle) {
    // handle
}
try {
    if (dbConn != null)
        dbConn.close();
} catch (Exception e) {
    // handle
}

After following most of your advice we didn't see any improvement over simply creating one massive string and sending it across in batches of ~100-1000 rows with a surrounding transaction. We got around:
* Big String Method [5000 rows in 500 batches]: 1716 ms = ~2914 rows per second (which is terrible!).
Our DB is sitting on a virtual host with one CPU (i7 underneath) and the table schema is:
CREATE TABLE archive_account_transactions
(
    account_transaction_id INT,
    entered_by             INT,
    account_id             INT,
    transaction_type_id    INT,
    DATE                   DATETIME,
    product_id             INT,
    amount                 FLOAT,
    contract_id            INT NULL,
    note                   CHAR(255) NULL
)
with four indexes, on account_transaction_id (pk), account_id, DATE, and contract_id.
Just thought I would post a few comments. First, we're connecting using:
jdbc:sybase:Tds:40.1.1.2:5000/ikp?EnableBatchWorkaround=true;ENABLE_BULK_LOAD=true
We did also try the .addBatch syntax described above, but it was marginally slower than just using a Java StringBuilder to build the batch SQL manually and pushing it across in one execute statement. Removing the column names from the insert statement gave us a surprisingly large performance boost; it seemed to be the only thing that actually affected performance. The ENABLE_BULK_LOAD param didn't seem to affect anything, nor did EnableBatchWorkaround; we also tried DYNAMIC_PREPARE=false, which sounded promising but didn't seem to do anything either.
Any help getting these parameters actually functioning would be great! In other words, are there any tests we could run to verify that they are in effect? I'm still convinced that this performance isn't close to pushing the boundaries of Sybase, as MySQL out of the box does more like 16,000 rows per second using the same "big string method" with the same schema.
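For reference, here's roughly what our "big string method" looks like (a simplified sketch; rowsToInsert holding pre-escaped value lists is an illustrative assumption, and the real code wraps the whole run in a transaction):
StringBuilder sb = new StringBuilder();
for (String valueList : rowsToInsert) {
    // e.g. valueList = "1,2,3,4,'2012-01-01 00:00:00',5,9.99,NULL,'note'"
    // note: no column list in the insert, per the observation above
    sb.append("insert into archive_account_transactions values (")
      .append(valueList)
      .append(")\n");
}
Statement stmt = dbConn.createStatement();
stmt.execute(sb.toString());  // one round trip for the whole batch
stmt.close();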
Cheers
Rod

In order to get the sample provided by Chris Kannon working, do not forget to disable auto-commit mode first:
dbConn.setAutoCommit(false);
And place the following line before dbConn.commit():
pstmt.executeBatch();
Otherwise this technique will only slow down the insertion.
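Putting it together, the essential call order is (a minimal sketch reusing the dbConn/pstmt names from Chris Kannon's sample):
dbConn.setAutoCommit(false);   // 1. disable auto-commit before batching
for (String[] val : valuesToInsert) {
    pstmt.setString(1, val[0]);
    pstmt.setString(2, val[1]);
    pstmt.setString(3, val[2]);
    pstmt.addBatch();          // 2. queue each row
}
pstmt.executeBatch();          // 3. send the whole batch...
dbConn.commit();               // 4. ...and only then commit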

Don't know how to do this in Java, but you can bulk-load text files with the LOAD TABLE SQL statement. We did it with Sybase ASA over jConnect.
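For example, LOAD TABLE can be issued through a plain JDBC Statement. A minimal sketch, assuming a hypothetical table batch_inserts and a file /tmp/batch_inserts.txt that lives on the database server (both names are illustrative):
// LOAD TABLE runs server-side: the file path is resolved on the
// database server, not on the JDBC client.
Statement stmt = dbConn.createStatement();
stmt.execute("LOAD TABLE batch_inserts FROM '/tmp/batch_inserts.txt' DELIMITED BY ','");
stmt.close();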

Support for Batch Updates
Batch updates allow a Statement object to submit multiple update commands as one unit (batch) to an underlying database for processing together.
Note: To use batch updates, you must refresh the SQL scripts in the sp directory under your jConnect installation directory.
See BatchUpdates.java in the sample (jConnect 4.x) and sample2 (jConnect 5.x) subdirectories for an example of using batch updates with Statement, PreparedStatement, and CallableStatement.
jConnect also supports dynamic PreparedStatements in batch.
References:
http://download.sybase.com/pdfdocs/jcg0420e/prjdbc.pdf
http://manuals.sybase.com/onlinebooks/group-jcarc/jcg0520e/prjdbc/#ebt-link;hf=0;pt=7694?target=%25N%14_4440_START_RESTART_N%25#X
Other Batch Update Resources
http://java.sun.com/j2se/1.3/docs/guide/jdbc/spec2/jdbc2.1.frame6.html
http://www.jguru.com/faq/view.jsp?EID=5079
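For reference, a minimal sketch of the batch API the documentation describes, using a plain Statement (standard JDBC; the batch_inserts table is borrowed from the accepted answer above):
Statement stmt = dbConn.createStatement();
// Queue several update commands, then submit them as one unit.
stmt.addBatch("insert into batch_inserts values ('1', 'a', 'b')");
stmt.addBatch("insert into batch_inserts values ('2', 'c', 'd')");
int[] counts = stmt.executeBatch();  // one update count per queued command
stmt.close();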

Related

Most efficient multithreading Database Insert in Java

We have to read a lot of data from an HDD (~50 GB) into our database, but our multithreaded procedure is pretty slow (~2 h for ~10 GB) because of a thread lock inside org.sqlite.core.NativeDB.reset[native] (see thread sampler).
We read our data relatively fast and use our insert method to execute a prepared statement, but we only commit once we have collected about 500,000 datasets. Currently we use JDBC as the interface to our SQLite database.
Everything works fine so far if you use one thread in total. But if you use multiple threads you don't see much of a performance/speed increase, because only one thread can run at a time, not in parallel.
We already reuse our PreparedStatement, and all threads use one instance of our Database class to prevent file locks (there is one connection to the database).
Unfortunately we have no clue how to improve our insert method any further. Is anyone able to give us some tips/solutions, or a way to avoid this NativeDB.reset method?
We do not have to use SQLite, but we would like to use Java.
(Threads are named 1, 2, ..., 15.)
private String INSERT = "INSERT INTO urls (url) VALUES (?);";

public void insert(String urlFromFile) {
    try {
        preparedStatement.setString(1, urlFromFile);
        preparedStatement.executeUpdate();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
Updated insert method as suggested by @Andreas, but it is still throwing some exceptions:
public void insert(String urlFromFile) {
    try {
        preparedStatement.setString(1, urlFromFile);
        preparedStatement.addBatch();
        ++callCounter;
        if (callCounter % 500000 == 0 && callCounter > 0) {
            preparedStatement.executeBatch();
            commit();
            System.out.println("Exec");
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
java.lang.ArrayIndexOutOfBoundsException: 9
at org.sqlite.core.CorePreparedStatement.batch(CorePreparedStatement.java:121)
at org.sqlite.jdbc3.JDBC3PreparedStatement.setString(JDBC3PreparedStatement.java:421)
at UrlDatabase.insert(UrlDatabase.java:85)
Most databases have some sort of bulk-insert functionality, though there's no standard for it, AFAIK.
PostgreSQL has COPY and MySQL has LOAD DATA, for instance.
I don't think SQLite has this facility, though; it might be worth switching to a database that does.
SQLite has no write concurrency.
The fastest way to load a large amount of data is to use a single thread (and a single transaction) to insert everything into the DB (and not to use WAL).
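A minimal sketch of that advice applied to the urls table from the question (the JDBC URL and method signature are illustrative assumptions):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

static void loadUrls(List<String> allUrls) throws Exception {
    // One thread, one connection, one transaction: commit once at the end.
    try (Connection conn = DriverManager.getConnection("jdbc:sqlite:urls.db")) {
        conn.setAutoCommit(false);
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO urls (url) VALUES (?)")) {
            for (String url : allUrls) {
                ps.setString(1, url);
                ps.addBatch();
            }
            ps.executeBatch();
        }
        conn.commit();
    }
}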

Best way to archive (in flat file) and then purge huge data

I have written the program below to achieve this:
try {
    PreparedStatement statement = connection.prepareStatement(
        "SELECT * FROM some_table WHERE some_timestamp < ?");
    statement.setTimestamp(1, new java.sql.Timestamp(dt.getTime()));
    ResultSet resultSet = statement.executeQuery();
    CSVWriter csvWriter = new CSVWriter(new FileWriter(activeDirectory + "/archive_data" + timeStamp + ".csv"), ',');
    csvWriter.writeAll(resultSet, true);
    csvWriter.flush();
} catch (Exception e) {
    e.printStackTrace();
}

// delete from table
try {
    PreparedStatement statement = connection.prepareStatement(
        "DELETE FROM some_table WHERE some_timestamp < ?");
    statement.setTimestamp(1, new java.sql.Timestamp(dt.getTime()));
    statement.executeUpdate();
} catch (Exception e) {
    e.printStackTrace();
}

dbUtil.close(connection);
The above program works fine for an average scenario, but I would like to know how I can improve it so that it:
Works smoothly for a million records without overloading the application server
Copes with the many records being inserted into the same table while it runs: how can I ensure the program archives and then purges exactly the same records?
Update: I'm using opencsv http://opencsv.sourceforge.net/
I would like to suggest several things:
Refrain from using time as the cut-off point. It can cause unpredictable bugs: time can differ between machines and environments, so we should be careful with it. Instead of time, use a sequence (see the sketch after this list).
Use a connection pool to get the data from the database.
Save the information from the DB into several files; you can store them on different drives. After that you have to concatenate the information from them.
Use memory-mapped files.
Use a multithreaded model for getting and storing/restoring the information. Note: JDBC connections aren't meant to be shared across threads, so a connection pool is your helper.
And these steps are only about the Java part. You also need a good design on your DB side. Not easy, right? But this is the price of working with large data.
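A minimal sketch of the sequence idea, assuming some_table has a monotonically increasing id column (an assumption; the asker's schema isn't shown) and reusing the connection/activeDirectory/timeStamp names from the question. Snapshotting MAX(id) first guarantees that the archive and the purge cover exactly the same rows, even while new rows are being inserted:
// Hypothetical sketch: archive/purge by an id snapshot instead of by time.
long maxId;
try (Statement s = connection.createStatement();
     ResultSet rs = s.executeQuery("SELECT MAX(id) FROM some_table")) {
    rs.next();
    maxId = rs.getLong(1);                // snapshot the cut-off point
}
try (PreparedStatement select = connection.prepareStatement(
         "SELECT * FROM some_table WHERE id <= ?")) {
    select.setLong(1, maxId);
    try (ResultSet rs = select.executeQuery();
         CSVWriter csvWriter = new CSVWriter(
             new FileWriter(activeDirectory + "/archive_data" + timeStamp + ".csv"), ',')) {
        csvWriter.writeAll(rs, true);     // archive exactly the snapshotted rows
    }
}
try (PreparedStatement delete = connection.prepareStatement(
         "DELETE FROM some_table WHERE id <= ?")) {
    delete.setLong(1, maxId);
    delete.executeUpdate();               // purge exactly the same rows
}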

HSQLDB over JDBC: execute SQL statements in batch

I need to initialize a database from my Java application. For reasons of code maintainability, I would like to maintain the SQL code separately from the Java code (it is currently in a separate source file).
The first few lines of the file are as follows:
-- 1 - Countries - COUNTRIES.DAT;
drop table Countries if exists;
create table Countries(
CID integer,
ECC varchar(2),
CCD varchar(1),
NAME varchar(50));
I read the SQL code from the file and store it in a string. Then I do:
PreparedStatement stmt = dbConnection.prepareStatement(sqlString);
This fails with the following exception:
java.sql.SQLSyntaxErrorException: unexpected token: CREATE : line: 2
This looks as if JDBC doesn't like multiple SQL statements in a single PreparedStatement. I have also tried CallableStatement and prepareCall(), with the same result.
Does JDBC provide a way to pass the entire SQL script in one go?
The JDBC standard (and the SQL standard for that matter) assumes a single statement per execute. Some drivers have an option to allow execution of multiple statements in one execute, but technically that option violates the JDBC standard. There is nothing in JDBC itself that supports multi-statement script execution.
You need to separate the statements yourself (on the ";") and execute them individually (see the sketch below), or find a third-party tool that does this for you (e.g. MyBatis ScriptRunner).
You might also want to look at something like flyway or liquibase.
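A minimal sketch of the do-it-yourself route: split the script on ";" and execute each statement individually. Note the naive split breaks on semicolons inside string literals or comments, which is exactly why tools like ScriptRunner exist:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.Statement;

static void runScript(Connection conn, String path) throws Exception {
    String script = new String(Files.readAllBytes(Paths.get(path)));
    try (Statement stmt = conn.createStatement()) {
        for (String sql : script.split(";")) {
            sql = sql.trim();
            if (sql.isEmpty() || sql.startsWith("--")) continue;  // skip blanks and comment-only fragments
            stmt.execute(sql);
        }
    }
}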
To run hardcoded queries, or queries loaded from a file, you can use execute like this:
Statement stmt = null;
String query = "drop table Countries if exists;" +
               "create table Countries(" +
               "  CID integer," +
               "  ECC varchar(2)," +
               "  CCD varchar(1)," +
               "  NAME varchar(50));";
try {
    stmt = con.createStatement();
    stmt.execute(query);
} catch (SQLException e) {
    JDBCTutorialUtilities.printSQLException(e);
} finally {
    if (stmt != null) { stmt.close(); }
}
If you want to run dynamic queries, for example to bind values, you have to use PreparedStatement. If you're running queries from a file, it's not recommended to put dynamic queries in it.

JDBC SELECT very slow in comparison to Firefox DB manager

Solved: of course, after posting here it hit me... I'm now using different drivers from http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC#Download that don't need extensive configuration.
Original question below the break.
I'm fooling around with a SQLite database containing OpenStreetMap data, and I'm having some trouble with JDBC.
The query below is the one I'd like to use to get a location close to the user's location quickly (the numbers are from my test data and are added by the Java code).
SELECT roads.nodeID, lat, lon
FROM roads
INNER JOIN nodes
ON roads.nodeID=nodes.nodeID
ORDER BY (ABS(lat - (12.598418)) + ABS(lon - (-70.043514))) ASC
LIMIT 1
'roads' and 'nodes' both contain approximately 130,000 rows.
This specific query is one of the most intensive, but it's only used twice, so that should be OK for my needs. It executes in about 281 ms in the Firefox SQLite manager, but in Java using sqlitejdbc-v056 it takes between 12 and 14 seconds (at full processor load).
Any clues on how to fix this?
public Node getNodeClosestToLocation(Location loc) {
    try {
        Class.forName("org.sqlite.JDBC");
        Statement stat = conn.createStatement();
        String q = "SELECT roads.nodeID, lat, lon " +
                   "FROM roads " +
                   "INNER JOIN nodes " +
                   "ON roads.nodeID=nodes.nodeID " +
                   "ORDER BY (ABS(lat - (" + loc.getLat() + ")) + " +
                   "ABS(lon - (" + loc.getLon() + "))) ASC " +
                   "LIMIT 1";
        long start = System.currentTimeMillis();
        System.out.println(q);
        rs = stat.executeQuery(q);
        if (rs.next()) {
            System.out.println("Done. " + (System.currentTimeMillis() - start));
            return new Node(rs.getInt("nodeID"), rs.getFloat("lat"), rs.getFloat("lon"));
        }
    } catch (SQLException e) {
        e.printStackTrace();
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    }
    return null;
}
Select queries in JDBC can be painfully slow if they're not utilized correctly. A few points:
Make sure you index the proper columns in your table. A simple statement such as the following does the trick:
Statement stat = connection.createStatement();
stat.executeUpdate("create index {index_name} on orders({column_name});");
stat.close();
Creating an index: http://www.w3schools.com/sql/sql_create_index.asp
Insertion takes longer with indices, since each existing index needs to be updated as new records are inserted. Creating an index is therefore best done after all the insert statements have been executed. Indexed columns take a hit on insertion performance but have significantly faster selection performance.
Changing the JDBC driver may help slightly, but overall it should not be the underlying issue. Also make sure you're running in native mode; pure-Java mode is significantly slower, at least from what I've noticed. The following code segment will tell you which mode you're running in, assuming you're using SQLite JDBC:
System.out.println(String.format("%s mode", SQLiteJDBCLoader.isNativeMode() ? "native" : "pure-java"));
I experienced the same issue with slow selects in a database of more than 500K records. The run time of my application would have been 9.9 days if I had not indexed; now it does the exact same thing in a blazing fast 2 minutes. SQLite is very fast when proper, optimized SQL is used.
Using a PreparedStatement might give you slightly better performance, but nothing of the magnitude described here.
Perhaps the Firefox SQLite manager is using some hints. You could try to get the execution plan to see where the query is doing the hard work and create an index if required.
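For example, SQLite exposes the plan via EXPLAIN QUERY PLAN, and an index on the join column may be what's missing. A minimal sketch (the index name and column choice are illustrative guesses for this schema):
Statement stat = conn.createStatement();
// A "SCAN TABLE" line in the plan output suggests a missing index.
ResultSet plan = stat.executeQuery(
    "EXPLAIN QUERY PLAN " +
    "SELECT roads.nodeID, lat, lon FROM roads " +
    "INNER JOIN nodes ON roads.nodeID=nodes.nodeID LIMIT 1");
while (plan.next()) {
    System.out.println(plan.getString("detail"));
}
// Illustrative index on the join column:
stat.executeUpdate("CREATE INDEX IF NOT EXISTS idx_nodes_nodeID ON nodes(nodeID)");
stat.close();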
Have you tried logging some timing information to make sure it is not obtaining the connection that is expensive?

Java: Making concurrent MySQL queries from multiple clients synchronised

I work at a gaming cybercafe, and we've got a system here (Smartlaunch) which keeps track of game licenses. I've written a program which interfaces with this system (actually, with its backend MySQL database). The program is meant to be run on a client PC and (1) query the database to select an unused license from the available pool, then (2) mark this license as in use by the client PC.
The problem is, I've got a concurrency bug. The program is meant to be launched simultaneously on multiple machines, and when this happens, some machines often try to acquire the same license. I think this is because steps (1) and (2) are not synchronised: one program determines that license #5 is available and selects it, but before it can mark #5 as in use, another copy of the program on another PC tries to grab that same license.
I've tried to solve this problem by using transactions and table locking, but it doesn't seem to make any difference. Am I doing this right? Here follows the code in question:
public LicenseKey Acquire() throws SmartLaunchException, SQLException {
    Connection conn = SmartLaunchDB.getConnection();
    int PCID = SmartLaunchDB.getCurrentPCID();
    conn.createStatement().execute("LOCK TABLE `licensekeys` WRITE");
    String sql = "SELECT * FROM `licensekeys` WHERE `InUseByPC` = 0 AND LicenseSetupID = ? ORDER BY `ID` DESC LIMIT 1";
    PreparedStatement statement = conn.prepareStatement(sql);
    statement.setInt(1, this.id);
    ResultSet results = statement.executeQuery();
    if (results.next()) {
        int licenseID = results.getInt("ID");
        sql = "UPDATE `licensekeys` SET `InUseByPC` = ? WHERE `ID` = ?";
        statement = conn.prepareStatement(sql);
        statement.setInt(1, PCID);
        statement.setInt(2, licenseID);
        statement.executeUpdate();
        statement.close();
        conn.commit();
        conn.createStatement().execute("UNLOCK TABLES");
        return new LicenseKey(results.getInt("ID"), this, results.getString("LicenseKey"), results.getInt("LicenseKeyType"));
    } else {
        throw new SmartLaunchException("All licenses of type " + this.name + " are in use");
    }
}
You must do two things:
Wrap your code in a transaction (to avoid autocommit releasing locks immediately)
Use SELECT ... FOR UPDATE, and MySQL will give you the lock you need (released on commit)
SELECT ... FOR UPDATE is better than LOCK TABLE, as it can often get by with row-level locking instead of automatically locking the whole table.
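A minimal sketch of that approach, reusing the names from the asker's code (SmartLaunchDB and the table layout come from the question; this is not Smartlaunch's actual API):
Connection conn = SmartLaunchDB.getConnection();
conn.setAutoCommit(false);  // 1. both steps inside one transaction
try {
    PreparedStatement select = conn.prepareStatement(
        "SELECT `ID` FROM `licensekeys` " +
        "WHERE `InUseByPC` = 0 AND LicenseSetupID = ? " +
        "ORDER BY `ID` DESC LIMIT 1 FOR UPDATE");  // 2. row lock held until commit
    select.setInt(1, this.id);
    ResultSet rs = select.executeQuery();
    if (rs.next()) {
        PreparedStatement update = conn.prepareStatement(
            "UPDATE `licensekeys` SET `InUseByPC` = ? WHERE `ID` = ?");
        update.setInt(1, SmartLaunchDB.getCurrentPCID());
        update.setInt(2, rs.getInt("ID"));
        update.executeUpdate();
    }
    conn.commit();            // releases the row lock
} catch (SQLException e) {
    conn.rollback();          // release the lock on failure, too
    throw e;
}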
According to the online manual, the correct syntax for locking is:
LOCK TABLES ...
and you have
LOCK TABLE ...
but you don't have any error checking. Hence you're probably failing to get the lock and it's silently ignoring that.
FWIW, I'd put your cleanup code (UNLOCK TABLES, conn.commit(), etc) in a finally block to ensure that you always clean up properly in the event of an exception.
As it is, you appear to be potentially leaking database connection handles, and never releasing the lock if there's no free license.
I would like to suggest just doing an UPDATE statement and checking how many rows were updated. I will write it out in pseudocode:
int uniqueId = SmartLaunchDB.getCurrentPCID();
int updatedRows = execute("UPDATE `licensekeys` SET `InUseByPC` = uniqueId WHERE `InUseByPC` = 0 LIMIT 1");
if (updatedRows == 1)
    SUCCESS
else
    FAIL
If it succeeds you can then get the licence key/ID by doing a SELECT, as sketched below.
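A minimal Java sketch of this claim-then-read pattern (it assumes, as the question's code suggests, that InUseByPC = 0 marks a free license, and additionally that each PC holds at most one license at a time; both are assumptions):
int pcId = SmartLaunchDB.getCurrentPCID();
Connection conn = SmartLaunchDB.getConnection();
// MySQL executes the UPDATE atomically, so no explicit lock is needed.
PreparedStatement claim = conn.prepareStatement(
    "UPDATE `licensekeys` SET `InUseByPC` = ? " +
    "WHERE `InUseByPC` = 0 AND LicenseSetupID = ? LIMIT 1");
claim.setInt(1, pcId);
claim.setInt(2, this.id);
if (claim.executeUpdate() == 1) {
    // Read back the row this PC just claimed.
    PreparedStatement read = conn.prepareStatement(
        "SELECT * FROM `licensekeys` WHERE `InUseByPC` = ?");
    read.setInt(1, pcId);
    ResultSet rs = read.executeQuery();
    // ... build and return the LicenseKey from rs ...
} else {
    throw new SmartLaunchException("All licenses of type " + this.name + " are in use");
}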
As is so often the case, OP is an idiot. The code I posted was actually working, but I had just discovered a duplicate row in the database; I guess someone entered the same license twice by mistake. This led me to believe that a concurrency bug I had already fixed (by introducing table locks) was still unfixed.
Thanks for the general advice; I've introduced better exception handling to this method.
