Context
I have a RESTful API for a versus fighting game, using JAX-RS, Tomcat 8 and embedded Neo4j.
Today I realized that a lot of queries will have to be handled in a limited time. I'm using embedded mode for faster queries, but I still want to go as fast as possible.
Problem
In fact, the problem is a bit different but not that much.
Currently, I'm using a Singleton with a getDatabase() method that returns the current GraphDatabaseService instance to begin a transaction; once the work is done, the transaction is closed... and that's all.
I don't know whether the best solution for optimal performance is a Singleton pattern or a pool (e.g. creating XX database connection instances and reusing them when each database operation is finished).
I can't test it myself at the moment, because I don't have enough concurrent connections to tell which one is the fastest (and the best overall).
Also, if I create a pool of GraphDatabaseService instances, will they all be able to access the same data without getting blocked by the lock?
Create only one GraphDatabaseService instance and use it everywhere. There is no need to create an instance pool. GraphDatabaseService is completely thread-safe, so you don't need to worry about concurrency (note: transactions are thread-bound, so you can't run multiple transactions in the same thread).
All operations in Neo4j should be executed inside a transaction. On commit, the transaction is written to the transaction log and then persisted into the database. The general rules are:
Always close transactions as early as possible (use try-with-resources)
Close all resources as early as possible (e.g. the ResourceIterator returned by findNodes() and execute())
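A minimal sketch of the "one instance, used everywhere" advice, using the initialization-on-demand holder idiom so the instance is created lazily and exactly once without explicit locking. `GameDatabase` is a hypothetical stand-in for the embedded GraphDatabaseService so the sketch runs without Neo4j; with the real database you would build the service in create() and have the shutdown hook call its shutdown() method.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive, thread-safe service such as
// an embedded GraphDatabaseService. It only counts how often it is built.
final class GameDatabase {
    static final AtomicInteger CREATED = new AtomicInteger();

    GameDatabase() {
        CREATED.incrementAndGet();
    }

    void shutdown() {
        // a real embedded database would flush and release its store here
    }
}

final class Database {
    private Database() {}

    // The JVM initializes Holder lazily and exactly once, on first access,
    // so no explicit synchronization is needed.
    private static final class Holder {
        static final GameDatabase INSTANCE = create();

        private static GameDatabase create() {
            GameDatabase db = new GameDatabase();
            // Close the shared instance cleanly when the JVM (e.g. Tomcat) stops.
            Runtime.getRuntime().addShutdownHook(new Thread(db::shutdown));
            return db;
        }
    }

    static GameDatabase get() {
        return Holder.INSTANCE;
    }
}
```

Every resource class (and every request thread) then calls Database.get() instead of creating its own instance.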
Here you can find information about locking strategy.
To be sure that you have best performance, you should:
Check database settings (memory mapping)
Check OS settings (file system)
Check JVM settings (GC, heap size)
Check your data model
Here you can find some articles about Neo4j configuration & optimizations. All of them have useful information.
Use a pool - definitely.
Creating a database connection is generally very expensive. Using a pool will ensure that connections are kept for a reasonable amount of time and reused whenever possible.
Related
As I understand it, all transactions are Thread-bound (i.e. with the context stored in ThreadLocal). For example if:
I start a transaction in a transactional parent method
Make database insert #1 in an asynchronous call
Make database insert #2 in another asynchronous call
Then that will yield two different transactions (one for each insert) even though they shared the same "transactional" parent.
For example, let's say I perform two inserts (using a deliberately simple example, i.e. without an executor or CompletableFuture, for brevity):
@Transactional
public void addInTransactionWithAnnotation() {
    addNewRow();
    addNewRow();
}
This performs both inserts, as desired, as part of the same transaction.
However, if I wanted to parallelize those inserts for performance:
@Transactional
public void addInTransactionWithAnnotation() {
    // each insert now runs on a new thread, outside the caller's thread-bound transaction
    new Thread(this::addNewRow).start();
    new Thread(this::addNewRow).start();
}
Then neither of the spawned threads will participate in the transaction at all, because transactions are thread-bound.
Key Question: Is there a way to safely propagate the transaction to the child threads?
The only solutions I've thought of to solve this problem:
Use JTA or some XA manager, which by definition should be able to do this. However, I'd ideally rather not use XA for my solution because of its overhead.
Pipe all of the transactional work I want performed (in the above example, the addNewRow() function) to a single thread, and do all of the prior work in the multithreaded fashion.
Figuring out some way to leverage InheritableThreadLocal on the Transaction status and propagate it to the child threads. I'm not sure how to do this.
Are there any other possible solutions, even if they feel a bit like a workaround (like my solutions above)?
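The thread-binding in the question can be reproduced with plain JDK classes. In this sketch a String is a hypothetical stand-in for the transaction context: a ThreadLocal value is invisible to a child thread, whereas an InheritableThreadLocal value is copied into threads constructed after it was set. Only the reference is copied, so this does not make a non-thread-safe transaction or Connection safe to share, and threads that already exist (for example in a pool) do not pick up the value.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ContextPropagation {
    // Hypothetical "current transaction" holders; a String stands in for a real tx object.
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITED = new InheritableThreadLocal<>();

    /** Sets both holders in the calling thread, then reads them from a freshly created child thread. */
    static String[] readFromChild(String txId) throws InterruptedException {
        PLAIN.set(txId);
        INHERITED.set(txId);
        AtomicReference<String> plainSeen = new AtomicReference<>();
        AtomicReference<String> inheritedSeen = new AtomicReference<>();
        // InheritableThreadLocal values are copied when the Thread object is constructed.
        Thread child = new Thread(() -> {
            plainSeen.set(PLAIN.get());         // null: ThreadLocal is not propagated
            inheritedSeen.set(INHERITED.get()); // copied from the parent at construction
        });
        child.start();
        child.join();
        PLAIN.remove();
        INHERITED.remove();
        return new String[] {plainSeen.get(), inheritedSeen.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = readFromChild("tx-42");
        System.out.println("ThreadLocal in child: " + seen[0]);
        System.out.println("InheritableThreadLocal in child: " + seen[1]);
    }
}
```

Running main() prints null for the ThreadLocal and tx-42 for the InheritableThreadLocal.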
The JTA API has several methods that operate implicitly on the current Thread's Transaction, but it doesn't prevent you moving or copying a Transaction between Threads, or performing certain operations on a Transaction that's not bound to the current (or any other) Thread. This causes no end of headaches, but it's not the worst part...
For raw JDBC, you don't have a JTA Transaction at all. You have a JDBC Connection, which has its own ideas about transaction context. In that case, the transaction is Connection-bound, not thread-bound: pass the Connection around and the tx goes with it. But Connections aren't necessarily thread-safe and are probably a performance bottleneck anyhow, so sharing one between multiple concurrent threads doesn't really help you. You likely need multiple Connections that think they are in the same Transaction, which means you need XA, since that's how the db identifies such cases. At that point you're back to JTA, but now with JCA in the picture to handle the Connection management properly. In short, you've reinvented the Java EE application server.
For frameworks that layer on JDBC e.g. ORMs like Hibernate, you have an additional complication: their abstractions are not necessarily threadsafe. So you can't have a Session that is bound to multiple Threads concurrently. But you can have multiple concurrent Sessions that each participate in the same XA transaction.
As usual it boils down to Amdahl's law. If the speedup you get from using multiple Connections per tx to allow for multiple concurrent Threads to share the db I/O work is large relative to what you get from batching, then the overhead of XA is worthwhile. If the speedup is in local computation and the db I/O is a minor concern, then a single Thread that handles the JDBC Connection and offloads non-IO computation work to a Thread pool is the way to go.
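Amdahl's law makes that trade-off concrete: if a fraction p of the work can be spread across n concurrent connections and the rest stays serial, the best possible overall speedup is S(n) below. The numbers in the example are purely illustrative, not measurements; in such a case even perfectly parallel I/O would gain at most about 18%, which XA overhead could easily eat.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\qquad \text{e.g. } p = 0.2,\ n = 4: \quad
S = \frac{1}{0.8 + 0.05} = \frac{1}{0.85} \approx 1.18
```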
First, a clarification: if you want to speed up several inserts of the same kind, as your example suggests, you will probably get the best performance by issuing the inserts in the same thread and using some type of batch inserting. Depending on your DBMS, several techniques are available; see:
Efficient way to do batch INSERTS with JDBC
What's the fastest way to do a bulk insert into Postgres?
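As a rough sketch of the batching idea with plain JDBC (the table `player(name)` is hypothetical), all rows go through one PreparedStatement inside one transaction, using addBatch()/executeBatch() and flushing every `batchSize` rows. Whether the driver turns this into a true multi-row insert depends on the DBMS and driver settings.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsert {
    // Hypothetical schema: CREATE TABLE player(name VARCHAR(64))
    private static final String SQL = "INSERT INTO player(name) VALUES (?)";

    /** Inserts all names in one transaction using JDBC batching (batchSize must be > 0); returns rows queued. */
    static int insertAll(Connection conn, List<String> names, int batchSize) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction for the whole batch
        int queued = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
                queued++;
                if (queued % batchSize == 0) {
                    ps.executeBatch(); // flush a full chunk to bound driver memory
                }
            }
            if (queued % batchSize != 0) {
                ps.executeBatch(); // flush the remainder
            }
            conn.commit();
            return queued;
        } catch (SQLException e) {
            conn.rollback(); // undo the partial batch
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```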
As for your actual question, I would personally try to pipe all the work to a worker thread. It is the simplest option as you don't need to mess with either ThreadLocals or transaction enlistment/delistment. Furthermore, once you have your units of work in the same thread, if you are smart you might be able to apply the batching techniques above for better performance.
Lastly, piping work to worker threads does not mean that you must have a single worker thread, you could have a pool of workers and achieve some parallelism if it is really beneficial to your application. Think in terms of producers/consumers.
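A minimal producer/consumer sketch of piping work to a worker thread: any number of producers call submit(), and one writer thread drains the queue in chunks, so all database work (and any thread-bound transaction) stays on that thread. The String rows and the chunk size of 100 are hypothetical; the `batches` list stands in for what real code would write with one transaction and one JDBC batch per chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Single consumer that performs all (hypothetical) database writes on its own thread. */
final class WriterThread implements Runnable {
    private static final Object STOP = new Object();       // end-of-work marker
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    final List<List<String>> batches = new ArrayList<>();   // stands in for rows written per transaction

    /** Safe to call from any number of producer threads. */
    void submit(String row) { queue.add(row); }

    /** Asks the writer to finish the queued work and exit. */
    void stop() { queue.add(STOP); }

    @Override public void run() {
        try {
            boolean stopping = false;
            while (!stopping) {
                List<Object> drained = new ArrayList<>();
                drained.add(queue.take());   // block until at least one item arrives
                queue.drainTo(drained, 99);  // then grab up to 99 more without blocking
                List<String> batch = new ArrayList<>();
                for (Object item : drained) {
                    if (item == STOP) stopping = true;
                    else batch.add((String) item);
                }
                if (!batch.isEmpty()) {
                    // Real code: one transaction + one JDBC batch (addBatch/executeBatch) per chunk.
                    batches.add(batch);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For more parallelism, you could run several such writers, each owning its own connection, and spread producers across them.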
I created an application that deals with multiple database tables at the same time. At present, I create a single connection for the process and try to execute queries, such as SELECT queries, on multiple tables in parallel.
Each table may have hundreds of thousands or millions of records.
I have one connection and multiple statements executing in parallel in threads.
I want to find out whether there is a better solution or approach.
I am thinking of using a connection pool of, for example, 10 connections and running multiple threads (fewer than 10) to execute the SELECT queries. Will this increase my application's performance?
Is my first approach okay?
Is it not a good approach to execute multiple statements at the same time (in parallel) on the database?
This forum link says that a single connection is better.
Databases are designed to run multiple parallel queries. Using a pool will almost certainly enhance your throughput if you are experiencing latency not caused by the database.
If the latency is caused by the database then parallelising may not help - and may even make it worse. Obviously it depends on the kind of query you are running.
I understand from your question that you are using a single Connection object and sharing it across threads. Each of those threads then executes its own statement. I will attempt to respond to your questions in reverse order.
Is it not a good approach to execute multiple statements at the same time (in parallel) on the database?
This is not really relevant to this question. Almost all databases should be able to run queries in parallel, and if one cannot, then both of your approaches would be almost identical from a concurrency perspective.
Is my first approach okay?
If you are just doing SELECTs, it may not cause issues, but you have to be very cautious about sharing a Connection object. A number of transactional attributes, such as autoCommit and isolation, are set on the Connection object, which means they would be shared by all your statements. You have to understand how that works in your case.
See the following links for more information:
Is MySQL Connector/JDBC thread safe?
https://db.apache.org/derby/docs/10.2/devguide/cdevconcepts89498.html
The bottom line: if you can use a Connection pool, do so.
Will this increase my application's performance?
The best way to check this is to try it out. Theoretical analysis of performance in a multithreaded environment with database access rarely gives accurate results. But then again, considering point 2, it seems you should just go with a Connection pool.
EDIT
I just realized that what I took to be the concern here may differ from your actual concern. I was thinking purely about sharing the Connection object to avoid creating additional Connection objects (either pooled or new).
For the performance of getting all the data from the database, both ways (assuming the first way doesn't pose a problem) should be almost identical. In fact, even if you create a new Connection object in each thread, the overhead of that should typically be insignificant compared to querying millions of records.
I want to use Spring-Hibernate and JDBC together in my application.
Hibernate should do all the updating and writing from one thread and other threads should just be able to read from the database without too much synchronization effort.
Will those JDBC-using threads deliver correct results (if they read from the database shortly after persist() or merge() is called), or could it happen that Hibernate has not flushed the updates yet, so other threads return wrong database entries?
"Wrong" depends on the isolation level you set for your connection pool.
I think it can work if Hibernate and Spring share the same connection pool and you set the isolation level to SERIALIZABLE for all connections.
Long-running transactions will be the problem. If all your write operations are fast, you won't block. If you don't commit and flush updates quickly, the read operations will either have to block and wait OR allow "dirty reads".
That depends. You're basically describing a race condition - if you want to make sure that your read-thread only reads after the write-thread has persisted, you will have to look into thread synchronization methodology.
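One simple form of that synchronization, sketched with JDK classes only: the writer signals a CountDownLatch only after its transaction has committed, and the reader waits (with a timeout) on the latch before querying. The `committedValue` field is a hypothetical stand-in for the persisted row. With Hibernate, the signal belongs after the transaction commit, not merely after persist() or merge(), since those calls alone need not flush anything.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ReadAfterCommit {
    private final CountDownLatch committed = new CountDownLatch(1);
    private volatile String committedValue; // hypothetical stand-in for the persisted row

    /** Writer thread: do the work, commit, and only then signal readers. */
    void write(String value) {
        committedValue = value; // in real code: persist/merge, then commit the transaction
        committed.countDown();  // from here on, other connections can see the data
    }

    /** Reader thread: wait (bounded) for the commit signal before reading. */
    String read(long timeoutMillis) throws InterruptedException {
        if (!committed.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("writer did not commit in time");
        }
        return committedValue;
    }
}
```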
I am trying to log the creation and destruction of database connections in our application using c3p0's ConnectionCustomizer. In it, I have some code that looks like this:
log(C3P0Registry.getPooledDataSources())
I'm running into deadlocks. I'm discovering that c3p0 has at least a couple of objects in its library that use synchronized methods and don't seem to specify their intended lock ordering. When I log the connections, I'm holding a lock on C3P0Registry and eventually on PoolBackedDataSource (simply creating a list of the data sources accesses their hash codes, which acquires a lock).
Shutting down the connection provider (calling C3P0ConnectionProvider.close()) acquires the locks in the opposite order. But while the child data sources are being shut down, my logging is triggered. The result is a deadlock.
It seems like both calls I am making into the c3p0 library are valid, expected calls:
C3P0ConnectionProvider.close()
C3P0Registry.getPooledDataSources()
It also seems like (unless explicitly stated in the documentation) it should be the library's responsibility to manage its own locking strategy. (I don't say this to blame anyone, just to confirm my understanding of best practices.)
How should I deal with this issue? Since c3p0 uses synchronized methods rather than a more modern mechanism, I can't really test the locks.
From my DataSource closing code, I could first grab the C3P0Registry lock before closing the DataSource. I would be guessing at the correct lock order, which I don't know if I feel comfortable with.
I don't think I could reverse the lock order for the logging call. I need the C3P0Registry to get the list of DataSources, so I couldn't lock the DataSources without first locking C3P0Registry to get references to them.
Another solution, of course, is to provide another, higher-level lock above everything c3p0. In the case of a connection pool, that seems to defeat the point.
For now, I'm rolling back my logging. Thanks for any help.
I don't know how to fix the locking issue, but I think you should take a step back here and think about the original problem.
"I am trying to log the creation and destruction of database connections in our application ..."
I would recommend the following.
Create a class and make it implement javax.sql.DataSource.
Create a field of the same type and delegate all methods to it.
In the getConnection() method, return your own Connection class wrapping around java.sql.Connection, and so on.
Then wrap this class around your original data source.
In your classes you can now simply create a logger and log all actions you want to see in your log.
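A rough sketch of that wrapper. Instead of hand-writing a full java.sql.Connection implementation, it swaps in a java.lang.reflect.Proxy for the Connection, so only close() needs special handling; java.util.logging is used purely as an example logger. Because the logging lives in your own code and never calls back into C3P0Registry, it should sidestep the lock-ordering problem described in the question.

```java
import java.io.PrintWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

/** Delegates everything to the wrapped DataSource and logs connection checkout/close. */
final class LoggingDataSource implements DataSource {
    private static final Logger LOG = Logger.getLogger(LoggingDataSource.class.getName());
    private final DataSource delegate;

    LoggingDataSource(DataSource delegate) { this.delegate = delegate; }

    @Override public Connection getConnection() throws SQLException {
        return wrap(delegate.getConnection());
    }

    @Override public Connection getConnection(String user, String password) throws SQLException {
        return wrap(delegate.getConnection(user, password));
    }

    private static Connection wrap(Connection real) {
        LOG.info(() -> "connection obtained: " + System.identityHashCode(real));
        return (Connection) Proxy.newProxyInstance(
                LoggingDataSource.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    if ("close".equals(method.getName())) {
                        LOG.info(() -> "connection closed: " + System.identityHashCode(real));
                    }
                    try {
                        return method.invoke(real, args); // forward every call unchanged
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the original SQLException etc.
                    }
                });
    }

    // Plain delegation for the remaining DataSource methods.
    @Override public PrintWriter getLogWriter() throws SQLException { return delegate.getLogWriter(); }
    @Override public void setLogWriter(PrintWriter out) throws SQLException { delegate.setLogWriter(out); }
    @Override public void setLoginTimeout(int seconds) throws SQLException { delegate.setLoginTimeout(seconds); }
    @Override public int getLoginTimeout() throws SQLException { return delegate.getLoginTimeout(); }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException { return delegate.getParentLogger(); }
    @Override public <T> T unwrap(Class<T> iface) throws SQLException { return delegate.unwrap(iface); }
    @Override public boolean isWrapperFor(Class<?> iface) throws SQLException { return delegate.isWrapperFor(iface); }
}
```

You would construct it around the original pool, e.g. `new LoggingDataSource(originalDataSource)`, and hand that to the rest of the application.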
I have a Java program consisting of about 15 methods, and these methods get invoked very frequently during the execution of the program. At the moment, I am creating a new connection in every method and executing statements on it (the database is set up on another machine on the network).
What I would like to know is: should I create only one connection in the main method and pass it as an argument to all the methods that require a connection object? That would significantly reduce the number of connection objects in the program, instead of creating and closing connections very frequently in every method.
I suspect I am not using the resources very efficiently with the current design, and there is a lot of scope for improvement, considering that this program might grow a lot in the future.
Yes, you should consider re-using connections rather than creating a new one each time. The usual procedure is:
make some guess as to how many simultaneous connections your database can sensibly handle (e.g. start with 2 or 3 per CPU on the database machine until you find out that this is too few or too many; it'll tend to depend on how disk-bound your queries are)
create a pool of this many connections: essentially a class that you can ask for "the next free connection" at the beginning of each method and then "pass back" to the pool at the end of each method
your getFreeConnection() method needs to return a free connection if one is available, else either (1) create a new one, up to the maximum number of connections you've decided to permit, or (2) if the maximum are already created, wait for one to become free
I'd recommend the Semaphore class to manage the connections; I actually have a short article on my web site on managing a resource pool with a Semaphore with an example I think you could adapt to your purpose
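A minimal sketch of such a pool following the steps above: a fair Semaphore caps how many resources can be checked out, idle ones wait in a queue, and new ones are created lazily up to the cap. It is generic over the pooled type; the `factory` Supplier is a hypothetical hook where you would open a real connection (e.g. via DriverManager.getConnection), and validation and retirement are left out.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class SimplePool<T> {
    private final Semaphore permits;                  // one permit per resource that may be in use
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;                // hypothetical: opens a new connection

    SimplePool(int maxSize, Supplier<T> factory) {
        this.permits = new Semaphore(maxSize, true);  // fair: waiting threads are served in order
        this.factory = factory;
    }

    /** Returns a free resource, creating one if none is idle; waits up to the timeout if all are in use. */
    T getFree(long timeout, TimeUnit unit) throws InterruptedException {
        if (!permits.tryAcquire(timeout, unit)) {
            throw new IllegalStateException("no connection available within timeout");
        }
        T item = idle.poll();
        if (item != null) return item;                // reuse an idle one
        try {
            return factory.get();                     // fewer than maxSize exist: create lazily
        } catch (RuntimeException e) {
            permits.release();                        // don't leak the permit if creation fails
            throw e;
        }
    }

    /** Hands a resource back; always call this in a finally block. */
    void release(T item) {
        idle.offer(item);                             // make it available before freeing the permit
        permits.release();
    }
}
```

Typical use: `Connection c = pool.getFree(5, TimeUnit.SECONDS); try { ... } finally { pool.release(c); }`.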
A couple of practical considerations:
For optimum performance, you need to be careful not to "hog" a connection while you're not actually using it to run a query. If you take a connection from the pool once and then pass it to various methods, you need to make sure you're not accidentally doing this.
Don't forget to return your connections to the pool! (try/finally is your friend here...)
On many systems, you can't keep connections open 'forever': the O/S will close them after some maximum time. So in your 'return a connection to the pool' method, you'll need to think about 'retiring' connections that have been around for a long time (build in some mechanism for remembering, e.g. by having a wrapper object around an actual JDBC Connection object that you can use to store metrics such as this)
You may want to consider using prepared statements.
Over time, you'll probably need to tweak the connection pool size
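For the "retiring" point, a small sketch of the wrapper idea: it records when the underlying connection was opened so the pool's return method can close connections older than a maximum age instead of re-queuing them. `TimedConnection` and the 30-minute limit in the usage below are hypothetical; pick a limit below whatever timeout your O/S or database enforces. The current time is passed in so the check is easy to test.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical wrapper that remembers when a pooled resource was created. */
final class TimedConnection<T> {
    final T raw;                    // e.g. the real java.sql.Connection
    private final Instant createdAt;

    TimedConnection(T raw, Instant createdAt) {
        this.raw = raw;
        this.createdAt = createdAt;
    }

    /** True once the connection has been alive for at least maxAge: close it rather than reuse it. */
    boolean shouldRetire(Duration maxAge, Instant now) {
        return Duration.between(createdAt, now).compareTo(maxAge) >= 0;
    }
}
```

In the return-to-pool method you would check `shouldRetire(Duration.ofMinutes(30), Instant.now())` and either close the raw connection or put the wrapper back in the idle queue.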
You can either pass in the connection or better yet use something like Jakarta Database Connection Pooling.
http://commons.apache.org/dbcp/
You should use a connection pool for that.
That way you can ask for a connection and, when you are finished with it, release it back to the pool.
If another thread wants a connection and the existing one is in use, a new one can be created. If no other thread is using a connection, the same one can be reused.
This way you can leave your app more or less as it is (without passing the connection all around) and still use the resources properly.
Unfortunately, first-class connection pools are not very easy to use in standalone applications (they are the default in application servers). A microcontainer (such as Spring) or a good framework (such as Hibernate) could probably let you use one.
They are not too hard to code from scratch, though. :)
This google search will help you to find more about how to use one.
Many JDBC drivers do connection pooling for you, so there is little advantage to doing additional pooling in that case. I suggest you check the documentation for your JDBC driver.
Alternatives to a connection pool are to:
Have one connection for all database access with synchronised access. This doesn't allow concurrency but is very simple.
Store the connections in a ThreadLocal variable (override initialValue()) This works well if there is a small fixed number of threads.
Otherwise, I would suggest using a connection pool.
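A sketch of the ThreadLocal option, overriding initialValue() as suggested: each thread lazily opens its own connection on first use and keeps reusing it. `openConnection()` is a hypothetical factory returning a placeholder String so the sketch runs without a database; in real code it would call DriverManager.getConnection(...), and each thread's connection would still need to be closed when that thread finishes.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadConnection {
    private static final AtomicInteger OPENED = new AtomicInteger();

    // Hypothetical factory: stands in for DriverManager.getConnection(url, user, password).
    private static Object openConnection() {
        return "connection-" + OPENED.incrementAndGet();
    }

    // Each thread gets its own connection on its first get() and reuses it afterwards.
    private static final ThreadLocal<Object> CONNECTION = new ThreadLocal<Object>() {
        @Override protected Object initialValue() {
            return openConnection();
        }
    };

    static Object get() { return CONNECTION.get(); }

    static int opened() { return OPENED.get(); }
}
```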
If your application is single-threaded, or does all its database operations from a single thread, it's ok to use a single connection. Assuming you don't need multiple connections for any other reason, this would be by far the simplest implementation.
Depending on your driver, it may also be feasible to share a connection between threads - this would be ok too, if you trust your driver not to lie about its thread-safety. See your driver documentation for more info.
Typically the objects below "Connection" cannot safely be used from multiple threads, so it's generally not advisable to share ResultSet, Statement objects etc between threads - by far the best policy is to use them in the same thread which created them; this is normally easy because those objects are not generally kept for too long.