Solved. Of course, right after posting here it hit me... I'm now using different drivers from http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC#Download that don't need extensive configuration.
Original question below the break.
I'm fooling around with a SQLite database containing OpenStreetMap data, and I'm having some trouble with JDBC.
The query below is the one I'd like to use to quickly get a location close to my user's location (the numbers are from my test data and are inserted by the Java code).
SELECT roads.nodeID, lat, lon
FROM roads
INNER JOIN nodes
ON roads.nodeID=nodes.nodeID
ORDER BY (ABS(lat - (12.598418)) + ABS(lon - (-70.043514))) ASC
LIMIT 1
'roads' and 'nodes' both contain approximately 130,000 rows.
This specific query is one of the most intensive, but it's only used twice, so that should be OK for my needs. It executes in about 281 ms when using the Firefox SQLite extension, but in Java using sqlitejdbc-v056 it takes between 12 and 14 seconds (at full processor load).
Any clues on how to fix this?
public Node getNodeClosestToLocation(Location loc) {
    try {
        Class.forName("org.sqlite.JDBC");
        Statement stat = conn.createStatement();
        String q = "SELECT roads.nodeID, lat, lon " +
                   "FROM roads " +
                   "INNER JOIN nodes " +
                   "ON roads.nodeID=nodes.nodeID " +
                   "ORDER BY (ABS(lat - (" + loc.getLat() + ")) + " +
                   "ABS(lon - (" + loc.getLon() + "))) ASC " +
                   "LIMIT 1";
        long start = System.currentTimeMillis();
        System.out.println(q);
        rs = stat.executeQuery(q);
        if (rs.next()) {
            System.out.println("Done. " + (System.currentTimeMillis() - start));
            return new Node(rs.getInt("nodeID"), rs.getFloat("lat"), rs.getFloat("lon"));
        }
    } catch (SQLException e) {
        e.printStackTrace();
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    }
    return null;
}
SELECT queries over JDBC can be painfully slow if they're not used correctly. A few points:
Make sure you index the proper columns in your table. A simple line such as:
Statement stat = connection.createStatement();
stat.executeUpdate("create index {index_name} on orders({column_name});");
stat.close();
Creating an index: http://www.w3schools.com/sql/sql_create_index.asp
Insertion takes longer once indexes exist, since every index has to be updated as new records are inserted, so it's best to create your indexes after all the INSERT statements have been executed. Indexed columns take a small hit on insertion performance but give significantly faster SELECTs.
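Applied to the roads/nodes schema from the question (a sketch in the same style as the snippet above; the index names are mine, the table and column names come from the question):
Statement stat = conn.createStatement();
// The join benefits from indexes on the nodeID columns; the ORDER BY expression
// is computed per row and cannot use an index.
stat.executeUpdate("CREATE INDEX IF NOT EXISTS idx_roads_nodeID ON roads(nodeID);");
stat.executeUpdate("CREATE INDEX IF NOT EXISTS idx_nodes_nodeID ON nodes(nodeID);");
stat.close();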
Changing the JDBC driver may help slightly, but it's probably not the underlying issue. Also make sure you're running in native mode; pure-Java mode is significantly slower, at least from what I've noticed. The following code segment will tell you which mode you're running in, assuming you're using SQLite JDBC.
System.out.println(String.format("%s mode", SQLiteJDBCLoader.isNativeMode() ? "native" : "pure-java"));
I experienced the same issue with slow selections in a database with more than 500K records. The run time of my application would have been 9.9 days if I had not indexed; now it is a blazing-fast 2 minutes to do the exact same thing. SQLite is very fast when proper, optimized SQL is used.
Using a PreparedStatement might give you slightly better performance, but nothing of the magnitude described here.
Perhaps the Firefox SQLite tool is using some hints. You could look at the execution plan to see where the query is doing the hard work and create an index if required.
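For example, SQLite's EXPLAIN QUERY PLAN can be run straight from JDBC (a sketch; the output columns differ between SQLite versions, so this just prints the last one, which holds the detail text):
Statement stat = conn.createStatement();
ResultSet plan = stat.executeQuery(
    "EXPLAIN QUERY PLAN " +
    "SELECT roads.nodeID, lat, lon FROM roads " +
    "INNER JOIN nodes ON roads.nodeID=nodes.nodeID " +
    "ORDER BY (ABS(lat - (12.598418)) + ABS(lon - (-70.043514))) ASC LIMIT 1");
while (plan.next()) {
    // A line such as "SCAN TABLE roads" means no index is being used for the join.
    System.out.println(plan.getString(plan.getMetaData().getColumnCount()));
}
stat.close();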
Have you tried to log any timing information to make sure it is not getting the connection which is expensive?
I am trying to improve my Java app's performance, and I'm focusing at this point on one endpoint that has to insert a large amount of data into MySQL.
I'm using plain JDBC with the MariaDB Java client driver:
try (PreparedStatement stmt = connection.prepareStatement(
        "INSERT INTO data (" +
        "fId, valueDate, value, modifiedDate" +
        ") VALUES (?,?,?,?)")) {
    for (DataPoint dp : datapoints) {
        stmt.setLong(1, fId);
        stmt.setDate(2, new java.sql.Date(dp.getDate().getTime()));
        stmt.setDouble(3, dp.getValue());
        stmt.setDate(4, new java.sql.Date(modifiedDate.getTime()));
        stmt.addBatch();
    }
    int[] results = stmt.executeBatch();
}
From populating the new DB from dumped files, I know that max_allowed_packet is important and I've got that set to 536,870,912 bytes.
In https://dev.mysql.com/doc/refman/5.7/en/insert-optimization.html it states that:
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements. If you are
adding data to a nonempty table, you can tune the
bulk_insert_buffer_size variable to make data insertion even faster.
See Section 5.1.7, “Server System Variables”.
On my DBs, this is set to 8MB
I've also read about key_buffer_size (currently set to 16MB).
I'm concerned that these last two might not be enough. I can do some rough calculations on the JSON input to this algorithm, which looks something like this:
[{"actualizationDate":null,"data":[{"date":"1999-12-31","value":0},
{"date":"2000-01-07","value":0},{"date":"2000-01-14","value":3144},
{"date":"2000-01-21","value":358},{"date":"2000-01-28","value":1049},
{"date":"2000-02-04","value":-231},{"date":"2000-02-11","value":-2367},
{"date":"2000-02-18","value":-2651},{"date":"2000-02-25","value":-
393},{"date":"2000-03-03","value":1725},{"date":"2000-03-10","value":-
896},{"date":"2000-03-17","value":2210},{"date":"2000-03-24","value":1782},
and it looks like the 8MB configured for bulk_insert_buffer_size could easily be exceeded, if not key_buffer_size as well.
But the MySQL docs only make mention of MyISAM engine tables, and I'm currently using InnoDB tables.
I can set up some tests but it would be good to know how this will break or degrade, if at all.
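For what it's worth, one way to answer the "how will it degrade" question empirically is to time the same batch insert at a few different chunk sizes. A rough sketch, reusing the types and names from the snippet above and assuming auto-commit is switched off so that each chunk is one transaction (which, for InnoDB, tends to matter far more than the MyISAM-only buffers quoted above):
static void timeBatchInsert(Connection connection, List<DataPoint> datapoints,
                            long fId, java.util.Date modifiedDate, int chunkSize) throws SQLException {
    connection.setAutoCommit(false);
    long start = System.currentTimeMillis();
    try (PreparedStatement stmt = connection.prepareStatement(
            "INSERT INTO data (fId, valueDate, value, modifiedDate) VALUES (?,?,?,?)")) {
        int inBatch = 0;
        for (DataPoint dp : datapoints) {
            stmt.setLong(1, fId);
            stmt.setDate(2, new java.sql.Date(dp.getDate().getTime()));
            stmt.setDouble(3, dp.getValue());
            stmt.setDate(4, new java.sql.Date(modifiedDate.getTime()));
            stmt.addBatch();
            if (++inBatch == chunkSize) {   // flush and commit one chunk
                stmt.executeBatch();
                connection.commit();
                inBatch = 0;
            }
        }
        if (inBatch > 0) {                  // the remainder
            stmt.executeBatch();
            connection.commit();
        }
    }
    System.out.println(chunkSize + " rows/chunk: " + (System.currentTimeMillis() - start) + " ms");
}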
[EDIT] I have --rewriteBatchedStatements=true. In fact here's my connection string:
jdbc:p6spy:mysql://myhost.com:3306/mydb\
?verifyServerCertificate=true\
&useSSL=true\
&requireSSL=true\
&cachePrepStmts=true\
&cacheResultSetMetadata=true\
&cacheServerConfiguration=true\
&elideSetAutoCommits=true\
&maintainTimeStats=false\
&prepStmtCacheSize=250\
&prepStmtCacheSqlLimit=2048\
&rewriteBatchedStatements=true\
&useLocalSessionState=true\
&useLocalTransactionState=true\
&useServerPrepStmts=true
(from https://github.com/brettwooldridge/HikariCP/wiki/MySQL-Configuration )
An alternative is to execute the batch from time to time. This keeps each batch small and lets you focus on more important problems.
int batchSize = 0;
for (DataPoint dp : datapoints) {
    stmt.setLong(1, fId);
    stmt.setDate(2, new java.sql.Date(dp.getDate().getTime()));
    stmt.setDouble(3, dp.getValue());
    stmt.setDate(4, new java.sql.Date(modifiedDate.getTime()));
    stmt.addBatch();

    // When the limit is reached, execute and reset the counter
    if (batchSize++ >= BATCH_LIMIT) {
        stmt.executeBatch();
        batchSize = 0;
    }
}

// Execute the remaining items
if (batchSize > 0) {
    stmt.executeBatch();
}
I generally use a constant or a parameter based on the DAO implementation to be more dynamic, but a batch of 10_000 rows is a good start.
private static final int BATCH_LIMIT = 10_000;
Note that it is not necessary to clear the batch after an execution. Even though this is not spelled out in the Statement.executeBatch documentation, it is in the JDBC 4.3 specification:
14 Batch Updates
14.1 Description of Batch Updates
14.1.2 Successful Execution
Calling the method executeBatch closes the calling Statement object’s current result set if one is open.
The statement’s batch is reset to empty once executeBatch returns.
Managing the results is a bit more complicated, but you can still concatenate the update counts from each executeBatch call if you need them, and analyze them at any time since they don't depend on an open ResultSet.
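If you do need those counts, a small sketch reusing stmt, batchSize, and BATCH_LIMIT from the snippet above:
List<int[]> allCounts = new ArrayList<>();

// inside the loop, instead of calling executeBatch() directly:
if (batchSize++ >= BATCH_LIMIT) {
    allCounts.add(stmt.executeBatch());
    batchSize = 0;
}
// and after the loop:
if (batchSize > 0) {
    allCounts.add(stmt.executeBatch());
}

// later, for example to count the rows that were written:
int total = 0;
for (int[] counts : allCounts) {
    for (int c : counts) {
        // drivers may return Statement.SUCCESS_NO_INFO (-2) instead of a real count,
        // e.g. when rewriteBatchedStatements collapses the batch into one statement
        if (c > 0) total += c;
    }
}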
I have read the previous questions, but none of them seem to match my problem, although they might seem similar at first. :/
So, I am working on a local database in Java (JDBC). When I press a button I should get the result of a SELECT query. So far so good, but for some reason my beginner brain does not understand, I keep getting only one row from the query. I have even run the exact same query in "DB Browser for SQLite" and it returns the correct result (more than one row).
So this is the method I am using to get the result of the query:
public ResultSet returnBill(int no) throws SQLException{
String sql = "SELECT * FROM billList WHERE no = " + no + " ;";
ResultSet thisSet = stmt.executeQuery(sql); // stmt is a 'Statement' type variable
return thisSet;
}
The method does not crash, but it only returns the very first row of a query that should return more than two (the while (thisSet.next()) loop runs only once). I run other SELECT queries in the program that are supposed to return more than one row and they all work fine, so it's not a matter of being unable to open/close the connection etc.
Below is how the method is used:
int number = (int) table.getModel().getValueAt(rows, 0);
ResultSet thisSet = db.returnBill(number);
while (thisSet.next()){
String name = thisSet.getString("name");
int quantity = thisSet.getInt("quantity");
// do something with the returned data
}
So I get this magical number from a table (of course I made sure it's not 0, -1, etc.) and I run a query using that number. You could think of the table's structure as consisting of the columns:
number | name | quantity |
where 'number' is nonzero.
I understand that running a query on a DB this way might not be safe or might pose security threats, but that's not the concern right now. I have been working on this project for quite a long time already, I have been through many silly mistakes, and I think this is yet another one of them. Any help is APPRECIATED! :D
So yes, it was a silly mistake as I expected.
So I had previously initiated a variable
Database db = new Database();
which opened the database for two queries (the SELECT query and an UPDATE query on another table, as shown below) and which would then be closed at the end of the following code.
When I removed the UPDATE query, however, the loop executed the correct number of times. So it seems that the SQLite JDBC driver somehow doesn't cope with running a SELECT and an UPDATE query on the same Statement (as far as my super mega* brain perceives it).
So I created two connections at the very beginning and closed them at the end, using one of them for the SELECT and the other one for the UPDATE query:
Database db = new Database(); // open database
Database db2 = new Database(); // open another creepy one :/
int number = table.getModel().getValueAt(rows, 0);
ResultSet thisSet = db.returnBill(number);
while (thisSet.next()){
String name = thisSet.getString("name");
int quantity = thisSet.getInt("quantity");
// do something with the returned data
// --------> STUPIDO <----------
//** Now executing an UPDATE query on db2 :
// ex.: UPDATE anotherTable SET amount = (current+ "+ quantity+") WHERE name= '" + name+ "' ;";
}
db.closeConn();  // closes db
db2.closeConn(); // closes db2
I don't know if this is the best approach, but it solved my problem, so I'm leaving it here in case it helps someone. Any suggestions would be welcome :D
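A suggestion: you don't necessarily need a second connection. The JDBC spec only allows one open ResultSet per Statement, and every execute call on a Statement implicitly closes its current ResultSet, which is why the loop stopped after the first row once the UPDATE ran. A sketch with a single connection but two Statement objects (the table and column names follow the question; everything else is assumed):
public void processBill(Connection conn, int no) throws SQLException {
    try (Statement selectStmt = conn.createStatement();
         Statement updateStmt = conn.createStatement();
         ResultSet rs = selectStmt.executeQuery("SELECT * FROM billList WHERE no = " + no)) {
        while (rs.next()) {
            String name = rs.getString("name");
            int quantity = rs.getInt("quantity");
            // the UPDATE runs on its own Statement, so rs stays open
            // (a PreparedStatement would also avoid the string concatenation)
            updateStmt.executeUpdate("UPDATE anotherTable SET amount = amount + " + quantity +
                                     " WHERE name = '" + name + "'");
        }
    }
}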
I have written the program below to achieve this (archive the old rows to a CSV file, then delete them from the table):
try {
    PreparedStatement statement = connection.prepareStatement(
            "SELECT * FROM some_table WHERE some_timestamp < ?");
    statement.setTimestamp(1, new java.sql.Timestamp(dt.getTime()));
    ResultSet resultSet = statement.executeQuery();

    CSVWriter csvWriter = new CSVWriter(new FileWriter(activeDirectory + "/archive_data" + timeStamp + ".csv"), ',');
    csvWriter.writeAll(resultSet, true);
    csvWriter.flush();
} catch (Exception e) {
    e.printStackTrace();
}

// delete from table
try {
    PreparedStatement statement = connection.prepareStatement(
            "DELETE FROM some_table WHERE some_timestamp < ?");
    statement.setTimestamp(1, new java.sql.Timestamp(dt.getTime()));
    statement.executeUpdate();
} catch (Exception e) {
    e.printStackTrace();
}

dbUtil.close(connection);
The above program works fine for an average scenario, but I would like to know how I can improve it so that it:
Works smoothly for a million records without overloading the application server.
Archives and then purges exactly the same records, considering that many new records will be getting inserted into the same table while this program runs.
Update: I'm using opencsv http://opencsv.sourceforge.net/
I would like to suggest several things:
Refrain from using time as the cut-off point; it can cause unpredictable bugs. Time can differ between machines and environments, so be careful with it. Use a sequence instead of time (see the sketch after this list).
Use a connection pool to get data from the database.
Save the information from the DB into several files. You can store them on different drives; afterwards you concatenate the information from them.
Use memory-mapped files.
Use a multi-threaded model for fetching and storing/restoring the information. Note: JDBC connections must not be shared between threads, so a connection pool is your helper here.
And these steps are only about the Java part. You also need a good design on the DB side. Not easy, right? But this is the price of working with large data.
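As a rough illustration of the first point (key the work on an id/sequence rather than on time alone), here is a sketch that archives and purges by primary-key range, one chunk per transaction. The id column, the chunk size, and the opencsv import path are assumptions; the table and timestamp names follow the question:
import com.opencsv.CSVWriter;   // older releases use au.com.bytecode.opencsv.CSVWriter
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

static void archiveAndPurge(Connection connection, java.util.Date dt, String file) throws Exception {
    connection.setAutoCommit(false);
    java.sql.Timestamp cutoff = new java.sql.Timestamp(dt.getTime());

    long minId, maxId;
    try (PreparedStatement p = connection.prepareStatement(
            "SELECT MIN(id), MAX(id) FROM some_table WHERE some_timestamp < ?")) {
        p.setTimestamp(1, cutoff);
        try (ResultSet r = p.executeQuery()) {
            r.next();
            minId = r.getLong(1);
            maxId = r.getLong(2);
            if (r.wasNull()) return;        // nothing old enough to archive
        }
    }

    final int CHUNK = 10_000;               // arbitrary; tune against your server
    boolean first = true;
    try (CSVWriter csv = new CSVWriter(new FileWriter(file));
         PreparedStatement sel = connection.prepareStatement(
             "SELECT * FROM some_table WHERE id BETWEEN ? AND ? AND some_timestamp < ?");
         PreparedStatement del = connection.prepareStatement(
             "DELETE FROM some_table WHERE id BETWEEN ? AND ? AND some_timestamp < ?")) {
        for (long from = minId; from <= maxId; from += CHUNK) {
            long to = Math.min(from + CHUNK - 1, maxId);
            sel.setLong(1, from); sel.setLong(2, to); sel.setTimestamp(3, cutoff);
            try (ResultSet rs = sel.executeQuery()) {
                csv.writeAll(rs, first);    // column names only on the first chunk
                first = false;
            }
            csv.flush();                    // rows are written out before they are deleted
            del.setLong(1, from); del.setLong(2, to); del.setTimestamp(3, cutoff);
            del.executeUpdate();
            connection.commit();            // archive and purge the same id range together
        }
    }
}
Because the SELECT and the DELETE share the same id range and the same cutoff, rows inserted while the program runs are never purged without having been archived first.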
Can anyone out there provide an example of bulk inserts via JConnect (with ENABLE_BULK_LOAD) to Sybase ASE?
I've scoured the internet and found nothing.
I got in touch with one of the engineers at Sybase and they provided me a code sample. So, I get to answer my own question.
Basically, here is a rundown, as the code sample is pretty large... It assumes a lot of pre-initialized variables; otherwise it would be a few hundred lines. Anyone interested should get the idea. This can yield up to 22K insertions a second in a perfect world (as per Sybase, anyway).
SybDriver sybDriver = (SybDriver) Class.forName("com.sybase.jdbc3.jdbc.SybDriver").newInstance();
sybDriver.setVersion(com.sybase.jdbcx.SybDriver.VERSION_6);
DriverManager.registerDriver(sybDriver);
// DB props (after including the normal login/password etc.)
props.put("ENABLE_BULK_LOAD","true");
// open the connection here for sybDriver
dbConn.setAutoCommit(false);
String SQLString = "insert into batch_inserts (row_id, colname1, colname2)\n values (?,?,?) \n";
PreparedStatement pstmt;
try
{
pstmt = dbConn.prepareStatement(SQLString);
}
catch (SQLException sqle)
{
displaySQLEx("Couldn't prepare statement",sqle);
return;
}
for (String[] val : valuesToInsert)
{
pstmt.setString(1, val[0]); //row_id varchar(30)
pstmt.setString(2, val[1]);//logical_server varchar(30)
pstmt.setString(3, val[2]); //client_host varchar(30)
try
{
pstmt.addBatch();
}
catch (SQLException sqle)
{
displaySQLEx("Failed to build batch",sqle);
break;
}
}
try {
pstmt.executeBatch();
dbConn.commit();
pstmt.close();
} catch (SQLException sqle) {
//handle
}
try {
if (dbConn != null)
dbConn.close();
} catch (Exception e) {
//handle
}
After following most of your advice, we didn't see any improvement over simply building one massive string and sending that across in batches of ~100-1000 rows with a surrounding transaction. We got around:
Big String Method [5000 rows in 500 batches]: 1716 ms = ~2914 rows per second (this is shit!).
Our db is sitting on a virtual host with one CPU (i7 underneath) and the table schema is:
CREATE TABLE
archive_account_transactions
(
account_transaction_id INT,
entered_by INT,
account_id INT,
transaction_type_id INT,
DATE DATETIME,
product_id INT,
amount float,
contract_id INT NULL,
note CHAR(255) NULL
)
with four indexes on account_transaction_id (pk), account_id, DATE, contract_id.
Just thought I would post a few comments. First, we're connecting using:
jdbc:sybase:Tds:40.1.1.2:5000/ikp?EnableBatchWorkaround=true;ENABLE_BULK_LOAD=true
We did also try the .addBatch syntax described above, but it was marginally slower than just using a Java StringBuilder to build the batch SQL manually and pushing it across in one execute statement. Removing the column names from the insert statement gave us a surprisingly large performance boost; it seemed to be the only thing that actually affected the performance. The ENABLE_BULK_LOAD param didn't seem to affect it at all, nor did EnableBatchWorkaround. We also tried DYNAMIC_PREPARE=false, which sounded promising but didn't seem to do anything either.
Any help getting these parameters to actually do something would be great! In other words, are there any tests we could run to verify that they are in effect? I'm still convinced that this performance isn't close to pushing the boundaries of Sybase, as MySQL out of the box does more like 16,000 rows per second using the same "big string method" with the same schema.
Cheers
Rod
In order to get the sample provided by Chris Kannon working, do not forget to disable auto commit mode first:
dbConn.setAutoCommit(false);
And place the following line before dbConn.commit():
pstmt.executeBatch();
Otherwise this technique will only slow down the insertion.
I don't know how to do this in Java, but you can bulk-load text files with the LOAD TABLE SQL statement. We did it with Sybase ASA over jConnect.
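If it helps, issuing LOAD TABLE from Java is just a matter of executing the statement text through a plain Statement. Only a sketch: the file path is resolved on the database server, not the client, and the exact options (DELIMITED BY, ESCAPES, FORMAT, ...) depend on the ASA version, so check the docs for your release:
Statement stmt = dbConn.createStatement();
// table name and file path are placeholders
stmt.executeUpdate("LOAD TABLE batch_inserts FROM '/data/batch_inserts.csv' DELIMITED BY ',' ESCAPES OFF");
stmt.close();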
Support for Batch Updates
Batch updates allow a Statement object to submit multiple update commands
as one unit (batch) to an underlying database for processing together.
Note: To use batch updates, you must refresh the SQL scripts in the sp directory
under your jConnect installation directory.
See BatchUpdates.java in the sample (jConnect 4.x) and sample2 (jConnect
5.x) subdirectories for an example of using batch updates with Statement,
PreparedStatement, and CallableStatement.
jConnect also supports dynamic PreparedStatements in batch.
Reference:
http://download.sybase.com/pdfdocs/jcg0420e/prjdbc.pdf
http://manuals.sybase.com/onlinebooks/group-jcarc/jcg0520e/prjdbc/#ebt-link;hf=0;pt=7694?target=%25N%14_4440_START_RESTART_N%25#X
Other Batch Update Resources
http://java.sun.com/j2se/1.3/docs/guide/jdbc/spec2/jdbc2.1.frame6.html
http://www.jguru.com/faq/view.jsp?EID=5079
I need to store up to tens or even hundreds of millions of pieces of data on-disk. Each piece of data contains information like:
id=23425
browser=firefox
ip-address=10.1.1.1
outcome=1.0
New pieces of data may be added at a rate of up to one per millisecond.
So it's a relatively simple set of key-value pairs, where the values can be strings, integers, or floats. Occasionally I may need to update the piece of data with a particular id, changing the flag field from 0 to 1. In other words, I need to be able to do random key lookups by id and modify the data (actually only the floating-point "outcome" field, so I'll never need to modify the size of the value).
The other requirement is that I need to be able to stream this data off disk (the order isn't particularly important) efficiently. This means that the hard disk head should not need to jump around the disk to read the data, rather it should be read in consecutive disk blocks.
I'm writing this in Java.
I've thought about using an embedded database, but DB4O is not an option as it is GPL and the rest of my code is not. I also worry about the efficiency of using an embedded SQL database, given the overhead of translating to and from SQL queries.
Does anyone have any ideas? Might I have to build a custom solution to this (where I'm dealing directly with ByteBuffers, and handling the id lookup)?
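For what a hand-rolled version might look like, here is a very rough sketch of the fixed-length-record idea: sequential appends give you the streaming behaviour, an in-memory map from id to file offset gives you the random lookups, and the 4-byte outcome field is patched in place. The record layout and field widths are made up for illustration:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FixedRecordStore implements AutoCloseable {
    private static final int RECORD_SIZE = 4 + 16 + 16 + 4;   // id, browser, ip, outcome
    private final FileChannel channel;
    private final Map<Integer, Long> offsets = new HashMap<>();

    public FixedRecordStore(String path) throws IOException {
        channel = new RandomAccessFile(path, "rw").getChannel();
    }

    public void append(int id, String browser, String ip, float outcome) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        buf.putInt(id);
        buf.put(fixed(browser, 16));
        buf.put(fixed(ip, 16));
        buf.putFloat(outcome);
        buf.flip();
        long offset = channel.size();
        channel.write(buf, offset);              // append-only, so reads stay sequential
        offsets.put(id, offset);
    }

    public void updateOutcome(int id, float outcome) throws IOException {
        Long offset = offsets.get(id);
        if (offset == null) throw new IllegalArgumentException("unknown id " + id);
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putFloat(outcome);
        buf.flip();
        channel.write(buf, offset + RECORD_SIZE - 4);   // outcome is the last 4 bytes
    }

    private static byte[] fixed(String s, int len) {
        byte[] out = new byte[len];                      // zero-padded fixed-width field
        byte[] src = s.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, out, 0, Math.min(src.length, len));
        return out;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
Streaming the data back is then a single sequential read of the file in RECORD_SIZE steps, and the offset map can be rebuilt on startup with one such pass.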
How about H2? The License should work for you.
You can use H2 for free. You can integrate it into your application (including commercial applications), and you can distribute it. Files containing only your code are not covered by this license (it is 'commercial friendly'). Modifications to the H2 source code must be published. You don't need to provide the source code of H2 if you did not modify anything.
I get
1000000 insert in 22492ms (44460.252534234394 row/sec)
100000 updates in 9565ms (10454.783063251438 row/sec)
from
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;
/**
 * @author clint
 */
public class H2Test {

    static int testrounds = 1000000;

    public static void main(String[] args) {
        try {
            Class.forName("org.h2.Driver");
            Connection conn = DriverManager.getConnection("jdbc:h2:/tmp/test.h2", "sa", "");
            // add application code here
            conn.createStatement().execute("DROP TABLE IF EXISTS TEST");
            conn.createStatement().execute("CREATE TABLE IF NOT EXISTS TEST(id INT PRIMARY KEY, browser VARCHAR(64), ip VARCHAR(16), outcome REAL)");
            //conn.createStatement().execute("CREATE INDEX IDXall ON TEST(id,browser,ip,outcome)");

            PreparedStatement ps = conn.prepareStatement("insert into TEST (id, browser, ip, outcome) values (?,?,?,?)");
            long time = System.currentTimeMillis();
            for (int i = 0; i < testrounds; i++) {
                ps.setInt(1, i);
                ps.setString(2, "firefox");
                ps.setString(3, "000.000.000.000");
                ps.setFloat(4, 0);
                ps.execute();
            }
            long last = System.currentTimeMillis();
            System.out.println(testrounds + " insert in " + (last - time) + "ms (" + (testrounds / ((last - time) / 1000d)) + " row/sec)");

            ps.close();
            ps = conn.prepareStatement("update TEST set outcome = 1 where id=?");
            Random random = new Random();
            time = System.currentTimeMillis();
            // randomly update 10% of the entries
            for (int i = 0; i < testrounds / 10; i++) {
                ps.setInt(1, random.nextInt(testrounds));
                ps.execute();
            }
            last = System.currentTimeMillis();
            System.out.println((testrounds / 10) + " updates in " + (last - time) + "ms (" + ((testrounds / 10) / ((last - time) / 1000d)) + " row/sec)");

            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
JDBM is a great embedded database for Java (and not as encumbered with licensing as the Java version of Berkeley DB). It would be worth trying. If you don't need ACID guarantees (i.e. you are OK with the database getting corrupted in the event of a crash), turn off the transaction manager (this significantly increases speed).
I think you'd have a lot more success writing something that caches the most active records in memory and queues data changes as low-priority inserts into the DB.
I understand there's a slight increase in IO using this method, but if you're talking about millions of records, I think it would still be faster, because any search algorithm you create is going to be greatly outperformed by a full-fledged database engine.
You could try Berkeley DB, which is now owned by Oracle. They have open source and commercial licenses. It uses a key/value model (with an option to create indexes if other forms of queries are required). There is a pure Java version and a native version with Java bindings.
http://www.zentus.com/sqlitejdbc/
SQLite database (public domain), JDBC connector with BSD license, native for a whole bunch of platforms (OSX, Linux, Windows), emulation for the rest.
You can use Apache Derby (also known as JavaDB), which is bundled with the JDK. However, if a DBMS doesn't provide the required speed, you may implement a specific file structure yourself. If just exact key lookup is required, you can use a hash file to implement it. The hash file is the fastest file structure for such requirements (much faster than general-purpose file structures such as B-trees and grids, which are used in DBs). It also provides acceptable streaming efficiency.
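For reference, a minimal sketch of opening an embedded Derby database (the database name and table layout here are only placeholders):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DerbyKvSketch {
    public static void main(String[] args) throws SQLException {
        // ";create=true" creates the database directory on first use
        try (Connection conn = DriverManager.getConnection("jdbc:derby:kvstore;create=true")) {
            conn.createStatement().execute(
                "CREATE TABLE data (id INT PRIMARY KEY, browser VARCHAR(64), ip VARCHAR(16), outcome REAL)");
        }
    }
}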
In the end I decided to log the data to disk as it comes in, and also keep it in memory where I can update it. After a period of time I write the data out to disk and delete the log.
Have you taken a look at Oracle's TimesTen database? It's an in-memory DB that is supposed to be very high-performance. I don't know about costs/licenses etc., but take a look at Oracle's site and search for it. An eval download should be available.
I'd also take a look to see if there's anything existing based on either EHCache or JCS that might help.