Increase JDBC performance for bulk inserts - java

Currently I have a throughput of about 350 MB/hour, which isn't a lot. The bottleneck is the insertions into the Sybase database, so I am looking for ways to increase the throughput.
I can only use free JDBC drivers, none of which support driver-level bulk inserts (as far as I know).
Currently I have autoCommit set to false (so everything runs in one transaction). I prepare the statement, add to the batch, and execute the batch every 2000 records (I have played with this number, but it doesn't help), then commit the transaction once all the inserts have been executed (sketched below).
Currently using the JTDS driver.
So I am resorting to any hacks, tips and tricks anyone has to increase the throughput.
Additional details:
There are no triggers on the table.
The only constraint is a primary key consisting of 3 fields (with indices).
The statement is literally INSERT INTO table([col],[col1],[col2],[col3]) VALUES (?,?,?,?)
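For reference, the batching described above looks roughly like this; a minimal sketch only, where the jTDS URL, credentials and the rows() data source are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BulkInsert {
        public static void main(String[] args) throws Exception {
            // Placeholder jTDS URL and credentials
            String url = "jdbc:jtds:sybase://dbhost:5000/mydb";
            try (Connection con = DriverManager.getConnection(url, "user", "pass")) {
                con.setAutoCommit(false); // one transaction for the whole load
                String sql = "INSERT INTO table([col],[col1],[col2],[col3]) VALUES (?,?,?,?)";
                try (PreparedStatement ps = con.prepareStatement(sql)) {
                    int count = 0;
                    for (Object[] row : rows()) { // rows() stands in for the real data source
                        ps.setObject(1, row[0]);
                        ps.setObject(2, row[1]);
                        ps.setObject(3, row[2]);
                        ps.setObject(4, row[3]);
                        ps.addBatch();
                        if (++count % 2000 == 0) {
                            ps.executeBatch(); // flush every 2000 rows
                        }
                    }
                    ps.executeBatch(); // flush the remainder
                }
                con.commit(); // commit once, after all inserts have executed
            }
        }

        private static Iterable<Object[]> rows() { // hypothetical data source
            return java.util.Collections.emptyList();
        }
    }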

I also ran into performance issues.
I came to know that issuing many individual queries over JDBC causes a lot of network overhead on the application and database servers. It also adds delay due to the network round trips.
Please consider the following; it might help:
Multiple queries VS Stored Procedure

Write the records to a file and insert them with the bcp utility. It would be much faster than what you are currently doing.
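For example, something along these lines; only a sketch, assuming the bcp client is on the PATH and that -c (character mode, tab/newline terminators) is acceptable, with the database, table, server name and credentials as placeholders:

    import java.io.BufferedWriter;
    import java.io.FileWriter;

    public class BcpLoad {
        public static void main(String[] args) throws Exception {
            String dataFile = "rows.dat";

            // 1. Dump the rows to a tab-delimited flat file
            try (BufferedWriter out = new BufferedWriter(new FileWriter(dataFile))) {
                for (Object[] row : rows()) { // rows() stands in for the real data source
                    out.write(row[0] + "\t" + row[1] + "\t" + row[2] + "\t" + row[3]);
                    out.newLine();
                }
            }

            // 2. Hand the file to bcp; -c uses character mode with default terminators
            Process p = new ProcessBuilder(
                    "bcp", "mydb..mytable", "in", dataFile,
                    "-c", "-U", "user", "-P", "pass", "-S", "MYSERVER")
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                throw new RuntimeException("bcp load failed");
            }
        }

        private static Iterable<Object[]> rows() { // hypothetical data source
            return java.util.Collections.emptyList();
        }
    }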

Related

Concurrent calls to a custom plugin processed 1 at a time

I developed a plugin of my own in Neo4j in order to speed up the process of inserting nodes, mainly because I needed to insert nodes and relationships only if they didn't exist before, which can be too slow using the REST API.
If I call my plugin 100 times, inserting roughly 100 nodes and 100 relationships each time, it takes approximately 350ms per call. Each call inserts different nodes, in order to rule out locking as the cause.
However, if I parallelize my calls (2, 3, 4... at a time), the response time grows with the degree of parallelism: it takes 750ms to insert my 200 objects when I make 2 calls at a time, 1000ms when I make 3, etc.
I'm calling my plugin from a .NET MVC controller, using HttpWebRequest. I set maxConnection to 10000, and I can see all the TCP connections being opened.
I investigated this issue a little, and something seems very wrong. I must have done something wrong, either in my Neo4j configuration or in my plugin code. Using VisualVM I found out that the threads launched by Neo4j to handle my calls are working sequentially. See the picture linked below.
http://i.imgur.com/vPWofTh.png
My configuration:
Windows 8, 2 cores
8 GB of RAM
Neo4j 2.0M03 installed as a service with no configuration tuning
Hope someone will be able to help me. As it is, I will be unable to use Neo4j in production, where there will be tens of concurrent calls, which cannot be done sequentially.
Neo4j is transactional. Every commit triggers an IO operation on the filesystem that needs to run in a synchronized block; this explains the picture you've attached. Therefore it's best practice to run writes single-threaded. Any pre-processing beforehand can of course benefit from parallelization.
In general, for maximum performance go with the stable version (1.9.2 as of today). Early milestone builds are not yet optimized, so you might get a misleading picture.
Another thing to consider is the transaction size used in your plugin. 10k to 50k operations in a single transaction should give you the best results (see the sketch below). If your transactions are very small, the transactional overhead is significant; with huge transactions, you need lots of memory.
Write performance is heavily driven by the performance of the underlying IO subsystem. If possible use fast SSD drives, or even better, stripe them.
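To illustrate the transaction-size point, here is a rough sketch of batching node creation into transactions of roughly 20k operations against the embedded GraphDatabaseService API of the 1.9.x line. This is not the asker's plugin code, and the records parameter is a made-up stand-in for the incoming data:

    import java.util.Map;

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    public class BatchedWriter {

        private static final int BATCH_SIZE = 20000; // 10k-50k operations per transaction

        public static void insert(GraphDatabaseService db, Iterable<Map<String, Object>> records) {
            Transaction tx = db.beginTx();
            try {
                int inTx = 0;
                for (Map<String, Object> record : records) {
                    Node node = db.createNode();
                    for (Map.Entry<String, Object> e : record.entrySet()) {
                        node.setProperty(e.getKey(), e.getValue());
                    }
                    if (++inTx % BATCH_SIZE == 0) { // commit and start a fresh transaction
                        tx.success();
                        tx.finish();
                        tx = db.beginTx();
                    }
                }
                tx.success();
            } finally {
                tx.finish();
            }
        }
    }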

How can I speed up the importing process to a postgres database?

I am importing data from a 10 GB file into Postgres database tables using Java (JDBC). The import process takes more than 12 hours to complete, so I need to improve it. I have tried the COPY command for inserting. Some SELECT queries are also running against the tables being inserted into. Can anyone suggest a way to improve the speed?
The standard SQL INSERT statement typically has too much overhead when millions of rows are needed. 10 GiB of data isn't really that much, but it's certainly too much for INSERT (you either have one huge transaction or a commit/rollback for every INSERT).
There is a nice chapter, 14.4. Populating a Database, in the official documentation. Section 14.4.2. Use COPY is especially interesting for you:
Use COPY to load all the rows in one command, instead of using a series of INSERT commands. The COPY command is optimized for loading large numbers of rows; it is less flexible than INSERT, but incurs significantly less overhead for large data loads. Since COPY is a single command, there is no need to disable autocommit if you use this method to populate a table.
See also:
Whats the fastest way to do a bulk insert into Postgres?
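As a rough sketch of what COPY looks like from Java, using the CopyManager API of the PostgreSQL JDBC driver; the connection details, table, columns and data.csv file are placeholders, and the connection is assumed to be a plain (unpooled) driver connection so the PGConnection cast works:

    import java.io.FileReader;
    import java.io.Reader;
    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyLoad {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/mydb"; // placeholder connection details
            try (Connection con = DriverManager.getConnection(url, "user", "pass");
                 Reader in = new FileReader("data.csv")) {
                CopyManager copy = ((PGConnection) con).getCopyAPI();
                long rows = copy.copyIn(
                        "COPY mytable (col1, col2, col3) FROM STDIN WITH CSV", in);
                System.out.println("Loaded " + rows + " rows");
            }
        }
    }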

DB2 jdbc performance

Profiling a Java application running on WebSphere 7 and DB2, we can see that we spend most of our time in the com.ibm.ws.rsadapter.jdbc package handling connections to and from the database.
How can we tune our jdbc performance?
What other strategies exist when database performance is a bottleneck?
Thanks
You should check your WebSphere manual for how to configure a connection pool.
Update 2021
Here is an introduction including code samples
One cause of slow connect times is a deactivated database, which does not open its files and allocate its memory buffers and heaps until the first application attempts to connect to it. Ask your DBA to confirm that the database is active before running your tests. The LIST ACTIVE DATABASES command (run from the local DB2 server or over a remote attachment) should show your database in its output. If the database is not activated, have your DBA activate it explicitly with ACTIVATE DATABASE yourDBname. That will ensure that the database files and memory structures remain available even when the last user disconnects from the database.
Use GET MONITOR SWITCHES to ensure all your monitor switches are enabled for your database, otherwise you'll miss out on some potentially revealing performance details. The additional overhead of tracking the data associated with those monitor switches is minimal, while the value of the performance data is significant.
If the database is always active and things still seem slow, there are detailed DB2 traces called event monitors that log everything they encounter to a file, pipe, or DB2 table. The statement event monitor is one I turn to fairly often to analyze SQL statement efficiency and UOW hygiene. I also prefer taking the extra hit to log the event monitor records to a table rather than a file, so I can use SQL to search the data for all sorts of patterns. The db2evtbl utility makes it fairly easy to define the event monitor you want and create the tables to store its output. The SET EVENT MONITOR STATE command is how you start and stop the event monitor you've created.
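As a rough illustration (not a complete recipe), the monitor can be switched on and off from JDBC once it has been defined; this assumes a statement event monitor named STMTMON created beforehand (for example with db2evtbl) and a user with the authority to control it, with placeholder connection details:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class EventMonitorToggle {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:db2://dbhost:50000/MYDB"; // placeholder connection details
            try (Connection con = DriverManager.getConnection(url, "user", "pass");
                 Statement st = con.createStatement()) {
                st.execute("SET EVENT MONITOR STMTMON STATE 1"); // start capturing
                // ... run the workload to be analysed here ...
                st.execute("SET EVENT MONITOR STMTMON STATE 0"); // stop capturing; then query
                                                                 // the monitor's output tables
            }
        }
    }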
In my experience what you are seeing is pretty common. The question to ask is what exactly is the DB2 connection doing...
The first thing to do is to try to isolate the performance issue to a section of the website, i.e. is there one part of the application that sees poor performance? Once you find that, you can increase the trace logging to see if you can spot the query causing issues.
Additionally, if you chat to your DBAs, they may be able to run some analysis on the database to tell you which queries are taking the most time to return values; this may also help in your troubleshooting.
Good luck!
Connection pooling
Caching
DBAs

MySQL performance

I have this LAMP application with about 900k rows in MySQL and I am having some performance issues.
Background - Apart from the LAMP stack, there's also a multi-threaded Java process that runs in its own JVM; together with LAMP and Java, they form the complete solution. The Java process is responsible for inserts/updates and a few selects as well. These inserts/updates are usually done in bulk/batch, anywhere between 5 and 150 rows. The PHP front-end code only does SELECTs.
Issue - the PHP SELECT queries become very slow when the Java process is running. When the Java process is stopped, SELECTs perform fine; the performance difference is huge. When the Java process is running, any action performed on the PHP front-end results in 80% or more CPU usage for the mysqld process.
Any help would be appreciated.
MySQL is running with default parameters & settings.
Software stack -
Apache - 2.2.x
MySQL -5.1.37-1ubuntu5
PHP - 5.2.10
Java - 1.6.0_15
OS - Ubuntu 9.10 (karmic)
What engine are you using for MySQL? The thing to note here is that if you're using MyISAM, you're going to have locking issues due to the table-level locking that engine uses.
From: MySQL Table Locking
Table locking is also disadvantageous under the following scenario:
* A session issues a SELECT that takes a long time to run.
* Another session then issues an UPDATE on the same table. This session waits until the SELECT is finished.
* Another session issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, after waiting for the first SELECT to finish.
I won't repeat them here, but the page has some tips on increasing concurrency on a table within MySQL. Obviously, one option would be to change to an engine like InnoDB, which has a more sophisticated row-locking mechanism that can make a huge difference in performance for high-concurrency tables. For more info on InnoDB, go here.
Before changing the engine, though, it would probably be worth looking at the other tips, like making sure your table is indexed properly, as this will increase select and update performance regardless of the storage engine.
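As a quick sketch of checking the engine (and the eventual engine change) from JDBC; the connection details and my_table are placeholders, and ALTER TABLE rebuilds the table, so run it off-peak:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class EngineCheck {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://localhost:3306/mydb"; // placeholder connection details
            try (Connection con = DriverManager.getConnection(url, "user", "pass");
                 Statement st = con.createStatement()) {

                // Which engine is each table using?
                try (ResultSet rs = st.executeQuery("SHOW TABLE STATUS")) {
                    while (rs.next()) {
                        System.out.println(rs.getString("Name") + " -> " + rs.getString("Engine"));
                    }
                }

                // Convert a MyISAM table to InnoDB
                st.executeUpdate("ALTER TABLE my_table ENGINE = InnoDB");
            }
        }
    }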
Edit based on user comment:
I would say it's one possible solution based on the symptoms you've described, but it may not be the one that will get you where you want to be. It's impossible to say without more information.
You could be doing full table scans due to the lack of indexes. This could be causing I/O contention on your disk, which just further exacerbates the table locks used by MyISAM. If this is the case, then the root cause is the improper indexing, and rectifying that would be your best course of action before changing storage engines.
Also, make sure your tables are normalized. This can have profound implications on performance, especially on updates. Normalized tables can allow you to update a single row instead of hundreds or thousands in an un-normalized table, because values are not duplicated. It can also save huge amounts of I/O on selects, as the database can cache data blocks more efficiently. Without knowing the structure of the tables you're working with or the indexes you have, it's difficult to provide a more detailed response.
Edit after user attempted using InnoDB:
You mentioned that your Java process is multi-threaded. Have you tried running the process with a single thread? I'm wondering whether you might be sending the same rows out to multiple threads to update, and/or whether the way you're updating across threads is causing locking issues.
Outside of that, I would check the following:
Have you checked your explain plans to verify you have reasonable costs and that the query is actually using the indexes you have?
Are your tables normalized? More specifically, are you updating 100 rows when you could update a single record if the tables were normalized?
Is it possible that you're running out of physical memory when the Java process is running and the machine is busy swapping stuff in and out?
Are you flooding your disk (a single disk?) with more IOPs than it can reasonably handle?
We'd need to know a lot more about the system to say whether that's normal or how to solve the problem.
with about 900k rows in MySQL
I would say that makes it very small, so if it's performing badly then something is going seriously wrong somewhere.
Enable the query log to see exactly which queries are running, and prioritize them based on the product of frequency and duration. Have a look at the explain plans and create some indexes. Think about splitting the database across multiple disks.
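A small sketch of both steps from JDBC, on the assumption that the connecting user has the privileges to change global settings; the connection details and the query being EXPLAINed are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class QueryAnalysis {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://localhost:3306/mydb"; // placeholder connection details
            try (Connection con = DriverManager.getConnection(url, "user", "pass");
                 Statement st = con.createStatement()) {

                // Log every statement taking longer than one second
                st.execute("SET GLOBAL slow_query_log = 'ON'");
                st.execute("SET GLOBAL long_query_time = 1");

                // Inspect the plan of a suspect query to see which index (if any) it uses
                try (ResultSet rs = st.executeQuery(
                        "EXPLAIN SELECT * FROM my_table WHERE some_col = 42")) {
                    ResultSetMetaData md = rs.getMetaData();
                    while (rs.next()) {
                        StringBuilder line = new StringBuilder();
                        for (int i = 1; i <= md.getColumnCount(); i++) {
                            line.append(md.getColumnLabel(i)).append("=")
                                .append(rs.getString(i)).append("  ");
                        }
                        System.out.println(line);
                    }
                }
            }
        }
    }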
HTH
C.

--single-transaction --lock-tables options of mysqldump- what happens internally?

I did find other posts similar to this, but wanted a little extra information on these options of mysqldump. I have understood that the --single-transaction and --lock-tables are mutually exclusive operations. Following are my questions regarding these options.
a) Suppose I have chosen to use --lock-tables option. In this case the mysqldump acquires a read lock on all the tables. So any other process trying to write to the tables will go into a blocked (wait) state. But if the mysqldump takes a really long time, would the processes that are waiting continue to wait indefinitely?
For example, I tried this experiment: I have a Java (JDBC) program writing to a MySQL database table called MY_TEST. I logged in to the mysql console and issued the "LOCK TABLES MY_TEST READ;" command manually, so the Java process got blocked waiting for the lock to be released. My question is: would there be a connection timeout or any such problem if the read lock is not released for a long time? I waited for two minutes and did not notice any error, and the Java process continued normally once the lock was released using the "UNLOCK TABLES" command. Is this behavior specific to the Java MySQL driver, or can I expect the same thing from a C program using the MySQL driver?
b) My second question is about the --single-transaction option. Suppose I have 10 InnoDB tables, of which 3 are related to each other (via foreign keys) and the others are independent but still use the InnoDB engine. Does the single transaction apply only to the 3 tables that are inter-related via foreign keys, or can I expect the state of the 7 independent tables to be exactly as it was when the 3 inter-dependent tables were dumped?
a) I believe the answer is yes: at the MySQL level, the connections will wait indefinitely for mysqldump to release the table locks. You can control this a bit at the application level by using a connection pool with a validation query that runs against the tables getting locked, and by setting the checkout timeout to whatever you want. This would be pretty easy to do in c3p0, for example. However, in the absence of other information, I would not recommend this approach; it seems pretty kludgey. I've not used the MySQL C driver, so I can't say for certain, but I would assume similar behavior to Java. All of this is why mysqldump is not a good option for a live backup of systems with non-trivial amounts of data and activity.
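For what it's worth, a minimal sketch of that c3p0 configuration; the driver class, URL, credentials and the five-second timeout are assumptions, and MY_TEST is the table from the question:

    import com.mchange.v2.c3p0.ComboPooledDataSource;

    public class PoolSetup {
        public static ComboPooledDataSource newPool() throws Exception {
            ComboPooledDataSource pool = new ComboPooledDataSource();
            pool.setDriverClass("com.mysql.jdbc.Driver");        // driver of that era (assumption)
            pool.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder connection details
            pool.setUser("user");
            pool.setPassword("pass");

            // Validate against the table that mysqldump locks, so checkouts are tested
            // against the lock instead of silently handing out a connection
            pool.setPreferredTestQuery("SELECT 1 FROM MY_TEST LIMIT 1");
            pool.setTestConnectionOnCheckout(true);

            // Give up on checkout after 5 seconds rather than waiting indefinitely
            pool.setCheckoutTimeout(5000);
            return pool;
        }
    }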
b) All tables will be dumped as part of a single transaction, thereby yielding a consistent snapshot of all of the tables participating in the dump. Primary/foreign key relationships have no bearing on the transaction. Using --single-transaction is a viable option for hot backups.
