Using connection pool for read only DB instance which skips COMMIT - java

My current setup:
Java backend
Ebean
HikariCP
RDS Aurora MySQL v5.7 having writer and reader nodes
We use reader RDS node for business operations which only require read access to the database. This works just fine (no db locks, better performance, yay!).
However, looking into AWS Performance Insights I can see that a lot of time is spent on the COMMIT operation. In fact, it's by far the most expensive operation on the read instance.
Not only does it take time to process, it also requires an extra client-server round trip. My naïve self suggests this could be entirely avoided, but I could not find any HikariCP settings on this matter. Surely there's nothing to commit for read-only database access, no?
That said, I do know that databases are allowed to create temporary tables even for read-only replicas, but to me it seems they should be equally smart about destroying them once the transaction is over and the connection is returned to the pool.
FWIW, we never use autocommit=true for write access due to the nature of our app. I'd prefer not using it for read only access as well.
Has anybody managed to get COMMIT-less setup working, or perhaps this is a bad idea?
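To illustrate what I mean by COMMIT-less, here is a hypothetical sketch (the class and method names are mine, not a HikariCP feature): a Connection proxy that swallows commit() for the read-only pool. I have not verified this against Aurora, and the pool or Ebean may rely on commit() being real, so it may well be the bad idea I am asking about.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;

// Hypothetical helper: wraps a JDBC Connection so that commit() becomes a
// no-op, avoiding the extra round trip on a read-only connection. All other
// calls are forwarded to the real connection unchanged.
public class ReadOnlyConnections {
    public static Connection suppressingCommit(Connection real) {
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                (proxy, method, args) -> {
                    if ("commit".equals(method.getName())) {
                        return null; // skip the COMMIT round trip entirely
                    }
                    return method.invoke(real, args);
                });
    }
}
```

Note that the pool itself may issue a rollback or commit when a connection is returned, so Performance Insights would be the judge of whether this actually removes the COMMITs.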

Related

Replicate modified data on different database

I would like to ask for a starting point: what technology or framework should I research?
What I need to accomplish is the following:
We have a Java EE 6 application using JPA for persistence. We would like to use a primary database as a sort of scratchpad, where users can insert/delete records according to the tasks they are given. Then, at the end of the day, an administrator will review their work, approving or disapproving it. If he approves the work, all changes become permanent and the primary database is synced (replicated) to another one (for security reasons). Otherwise, if the administrator does not approve the changes, they will be rolled back.
Now here I got two problems to figure out:
First: is it possible to roll back a bunch of JPA operations performed over a certain period of time?
Second: how can the replication process (which RDBMS engines can do) be triggered from code?
Now, if RDBMS replication is not possible (maybe because of a client requirement), we would need a sync framework for JPA as a backup. I was looking at some JMS solutions, but I am not clear about the exact process or how to make them work with JPA.
Any help would be greatly appreciated,
Thanks.
I think your design carries too much risk of losing data. From what I understand, you are talking about holding data in memory until the admin approves or rejects it. You must think about a disaster scenario and how to save your data in that case.
Rather, this problem statement is more inclined towards a workflow design, where:
Data is entered by one entity and persisted.
Another entity approves or rejects the data.
All the approved data is further replicated to the next database.
All three steps could be implemented as three modules, backed by persistent storage or JMS technology. Depending on how real-time each of these steps needs to be, you could think of an elegant design to accomplish this in a cost-effective manner.
Add a "workflow state" column to your table. States: Wait for approval, approved, replicated
Persist your data normally using JPA (state: wait for approval)
Approver approves: Update using JPA, change to approved state
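As a plain-Java sketch of those states (names are made up; in the real app this enum would live on a JPA entity, e.g. as an @Enumerated column):

```java
// Hypothetical workflow states for the approval design described above.
// Allowed transitions: WAIT_FOR_APPROVAL -> APPROVED -> REPLICATED.
// A rejection simply rolls the data back, so it needs no state of its own.
public enum WorkflowState {
    WAIT_FOR_APPROVAL, APPROVED, REPLICATED;

    public boolean canTransitionTo(WorkflowState next) {
        switch (this) {
            case WAIT_FOR_APPROVAL: return next == APPROVED;
            case APPROVED:          return next == REPLICATED;
            default:                return false;
        }
    }
}
```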
As for the replication
In the approve method you could replicate the data synchronously to the other database (using JPA)
You could copy as well the approved data to another table, and use some RDBMS functionality to have the RDBMS replicate the data of that table
You could as well send a JMS message. At the end of the day a job reads the queue and persists the data into the other database
Anyway, I suggest using a normal RDBMS cluster with synchronous replication. In that scenario you don't have to develop a self-made replication scheme, you always have a copy of your data, and you always have the workflow state.

Modify JDBC code to target multiple database servers for Insert/Update/Delete operations

Is there a way that I can use JDBC to target multiple databases when I execute statements (basic inserts, updates, deletes)?
For example, assume both servers [200.200.200.1] and [200.200.200.2] have a database named MyDatabase, and the databases are exactly the same. I'd like to run "INSERT INTO TestTable VALUES(1, 2)" on both databases at the same time.
Note regarding JTA/XA:
We're developing a JTA/XA architecture to target multiple databases in the same transaction, but it won't be ready for some time. I'd like to use standard JDBC batch commands and have them hit multiple servers for now, if it's possible. I realize that it won't be transaction-safe; I just want the commands to hit both servers for basic testing at the moment.
You need one connection per database. Once you have those, the standard auto commit/rollback calls will work.
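For example, a minimal helper (the name is mine) that runs the same statement across every connection in turn. Each database commits or fails independently; there is no coordination between them, exactly as you note:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: executes one statement against several databases.
// Not transaction-safe: a failure on the second server leaves the first
// server already modified. Fine for the basic testing described above.
public class MultiTargetJdbc {
    public static void executeOnAll(List<Connection> connections,
                                    String sql, Object... params) throws SQLException {
        for (Connection con : connections) {
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (int i = 0; i < params.length; i++) {
                    ps.setObject(i + 1, params[i]);
                }
                ps.executeUpdate();
            }
        }
    }
}
```

Usage with your example would be `executeOnAll(List.of(con1, con2), "INSERT INTO TestTable VALUES(?, ?)", 1, 2)`, where con1 and con2 point at the two servers.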
You could try Spring; it already has transaction managers set up.
Even if you don't use Spring, all you have to do is get XA versions of the JDBC driver JARs in your CLASSPATH. Two phase commit will not work if you don't have them.
I'd wonder if replication using the database would not be a better idea. Why should the middle tier care about database clustering?
The best quick-and-dirty way for development is to use multiple database connections. They won't be in the same transaction since they are different connections. I don't think this would be much of an issue if it is just for testing.
When your JTA/XA architecture is ready, just plug it into the already working code.

Connection pool for couchdb

I have one couchdb database and I am querying it in parallel. Now I want to create a connection pool, because I discovered a bottleneck in my design: I was using a single instance of couchdb, so parallelization was failing because of that.
I searched the web for connection pool implementations, but I was not able to find a proper Java connection pool implementation for couchdb; most of the frameworks support relational databases. I would appreciate it if someone could help me with that.
I've never used a couchdb connection pool, but you may have some luck with this:
http://commons.apache.org/pool/
It lets you pool any old object, including connections. It'll take a few lines of code to get it working for you though.
Hope this helps,
Nate
If you are searching for a simple way to load-balance multiple CouchDB instances, why not use an HTTP load balancer like Varnish? Take a look here on how you can set up a simple round-robin load balancer. You can also disable caching if it's undesirable.

What does a Terracotta server do when it is used as a backend for EHCache with Hibernate?

My DAL is implemented with Hibernate and I want to use EHCache as its second level cache with its distributed capabilities (for scalability and HA).
Seeing as EHCache provides distributed caching only with Terracotta my question is what is the role of the Terracotta server instance? Does it also hold data? Does it only coordinate the distribution between the partitioned cache parts?
My confusion derives mainly from this explanation regarding TSA which says the server holds the data but I think that maybe in my scenario the cache and the Terracotta server are sort of merged. Am I correct?
If the server does hold data then why shouldn't the bottleneck just move from the db to the Terracotta server?
Update:
Affe's answer covered the second part of my question, which was the important part, but just in case someone comes by looking for the first part: the TC server has to hold all the data that the in-memory EHCache holds, so if you want a distributed cache (not a replicated one) then the L2 (the TC server) must hold all the objects itself as well.
Thanks in advance,
Ittai
The idea is that it's still significantly faster to contact the Terracotta cluster via the Terracotta driver and do what's basically a Map lookup than to acquire a database connection and execute an SQL statement. Even if that does become the application's choke point, overall throughput would be expected to still be significantly higher than with a JDBC connection + SQL choke point. Open connections and open cursors are big resource hogs in the database; an open socket to the Terracotta cluster is not!
You can get ehcache clustered without using terracotta. They have documentation for doing it via RMI, JGroups and JMS. We are using JMS since we have a significant JMS infrastructure to handle the communication already. I don't know how well it will scale in the long term, but our current concern is just HA.
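For reference, the RMI variant looks roughly like this in ehcache.xml. This is an Ehcache 2.x style sketch from memory (cache names and addresses are placeholders), so verify the element names against the documentation for your version:

```xml
<ehcache>
  <!-- Discover replication peers over multicast; no Terracotta server involved. -->
  <cacheManagerPeerProviderFactory
      class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
      properties="peerDiscovery=automatic,multicastGroupAddress=230.0.0.1,multicastGroupPort=4446"/>

  <cache name="com.example.SomeEntity"
         maxElementsInMemory="1000"
         eternal="false"
         timeToLiveSeconds="600">
    <!-- Replicate puts/updates/removes to the discovered peers. -->
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"/>
  </cache>
</ehcache>
```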

Best way to manage database connection for a Java servlet

What is the best way to manage a database connection in a Java servlet?
Currently, I simply open a connection in the init() function, and then close it in destroy().
However, I am concerned that "permanently" holding onto a database connection could be a bad thing.
Is this the correct way to handle this? If not, what are some better options?
edit: to give a bit more clarification: I have tried simply opening/closing a new connection for each request, but with testing I've seen performance issues due to creating too many connections.
Is there any value in sharing a connection over multiple requests? The requests for this application are almost all "read-only" and come fairly rapidly (although the data requested is fairly small).
As everybody says, you need to use a connection pool. Why? What up? Etc.
What's Wrong With Your Solution
I know this since I also thought it was a good idea once upon a time. The problem is two-fold:
All threads (servlet requests are served by one thread each) will be sharing the same connection. The requests will therefore get processed one at a time. This is very slow, even if you just sit in a single browser and lean on the F5 key. Try it: this stuff sounds high-level and abstract, but it's empirical and testable.
If the connection breaks for any reason, the init method will not be called again (because the servlet will not be taken out of service). Do not try to handle this problem by putting a try-catch in the doGet or doPost, because then you will be in hell (sort of writing an app server without being asked).
Contrary to what one might think, you will not have problems with transactions, since the transaction start gets associated with the thread and not just the connection. I might be wrong, but since this is a bad solution anyway, don't sweat it.
Why Connection Pool
Connection pools give you a whole bunch of advantages, but most of all they solve these problems:
Making a real database connection is costly. The connection pool always keeps a few extra connections around and gives you one of those.
If a connection fails, the connection pool knows how to open a new one.
Very important: every thread gets its own connection. This means that concurrency is handled where it should be: at the DB level. DBs are super efficient and can handle concurrent requests with ease.
Other stuff (like centralizing the location of JDBC connect strings), but there are millions of articles, books, etc. on this.
When to Get a Connection
Somewhere in the call stack initiated in your service delegate (doPost, doGet, doDisco, whatever) you should get a connection, and then you should do the right thing and return it in a finally block. I should mention that the C# main architect dude once said that you should use finally blocks 100x more than catch blocks. Truer words were never spoken...
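In code, that pattern looks something like this (the query, table, and method name are made up for illustration; in a container the DataSource would typically come from a JNDI lookup):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class UserDao {
    // Borrow a connection for exactly one unit of work and always return it.
    public static int countUsers(DataSource ds) throws SQLException {
        Connection con = ds.getConnection(); // borrowed from the pool
        try {
            try (PreparedStatement ps =
                         con.prepareStatement("SELECT COUNT(*) FROM users");
                 ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        } finally {
            // Returns the connection to the pool; the physical socket stays open.
            con.close();
        }
    }
}
```

In a servlet you would look the DataSource up once, e.g. `(DataSource) new InitialContext().lookup("java:comp/env/jdbc/MyDS")`, with the actual JNDI name coming from your deployment descriptor.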
Which Connection Pool
You're in a servlet, so you should use the connection pool the container provides. Your JNDI code will be completely normal except for how you obtain the connection. As far as I know, all servlet containers have connection pools.
Some of the comments on the answers above suggest using a particular connection pool API instead. Your WAR should be portable and "just deploy." I think this is basically wrong. If you use the connection pool provided by your container, your app will be deployable on containers that span multiple machines and all that fancy stuff that the Java EE spec provides. Yes, the container-specific deployment descriptors will have to be written, but that's the EE way, mon.
One commenter mentions that certain container-provided connection pools do not work with JDBC drivers (he/she mentions Websphere). That sounds totally far-fetched and ridiculous, so it's probably true. When stuff like that happens, throw everything you're "supposed to do" in the garbage and do whatever you can. That's what we get paid for, sometimes :)
I actually disagree with using Commons DBCP. You should really defer to the container to manage connection pooling for you.
Since you're using Java Servlets, that implies running in a Servlet container, and all major Servlet containers that I'm familiar with provide connection pool management (the Java EE spec may even require it). If your container happens to use DBCP (as Tomcat does), great, otherwise, just use whatever your container provides.
I'd use Commons DBCP. It's an Apache project that manages the connection pool for you.
You'd just get your connection in your doGet or doPost run your query and then close the connection in a finally block. (con.close() just returns it to the pool, it doesn't actually close it).
DBCP can manage connection timeouts and recover from them. The way you are currently doing things, if your database goes down for any period of time, you'll have to restart your application.
Are you pooling your connections? If not, you probably should to reduce the overhead of opening and closing your connections.
Once that's out of the way, just keep the connection open for as long as it's need, as John suggested.
The best way, and I'm currently looking through Google for a better reference sheet, is to use pools.
On initialization, you create a pool that contains X number of SQL connection objects to your database. Store these objects in some kind of List, such as ArrayList. Each of these objects has a private boolean for 'isLeased', a long for the time it was last used and a Connection. Whenever you need a connection, you request one from the pool. The pool will either give you the first available connection, checking on the isLeased variable, or it will create a new one and add it to the pool. Make sure to set the timestamp. Once you are done with the connection, simply return it to the pool, which will set isLeased to false.
To keep from constantly having connections tie up the database, you can create a worker thread that will occasionally go through the pool and see when the last time a connection was used. If it has been long enough, it can close that connection and remove it from the pool.
The benefits of using this are that you don't have long waits for a Connection object to connect to the database, your already-established connections can be reused as much as you like, and you can set the number of connections based on how busy you think your application will be.
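A stripped-down sketch of the pool just described (the class name and numbers are arbitrary; a separate isLeased flag isn't needed if a resource is either sitting in the idle queue or held by exactly one caller):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Minimal pool following the description above: a resource is either in the
// idle queue or leased to exactly one caller; each idle entry remembers when
// it was returned so a worker thread can evict stale ones.
public class SimplePool<T> {
    private static final class Entry<T> {
        final T resource;
        final long lastUsed = System.currentTimeMillis();
        Entry(T resource) { this.resource = resource; }
    }

    private final BlockingQueue<Entry<T>> idle = new LinkedBlockingQueue<>();
    private final Supplier<T> factory;
    private final long idleTimeoutMillis;

    public SimplePool(Supplier<T> factory, long idleTimeoutMillis) {
        this.factory = factory;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Hand out an idle resource, or create a fresh one if none is available.
    public T borrow() {
        Entry<T> e = idle.poll();
        return e != null ? e.resource : factory.get();
    }

    // Return a resource to the pool, stamping it with the current time.
    public void release(T resource) {
        idle.offer(new Entry<>(resource));
    }

    // Drop entries idle longer than the timeout; returns how many were evicted.
    // Call this occasionally from a worker thread.
    public int evictStale() {
        long cutoff = System.currentTimeMillis() - idleTimeoutMillis;
        List<Entry<T>> keep = new ArrayList<>();
        int evicted = 0;
        Entry<T> e;
        while ((e = idle.poll()) != null) {
            if (e.lastUsed < cutoff) evicted++;
            else keep.add(e);
        }
        idle.addAll(keep);
        return evicted;
    }

    public int idleCount() { return idle.size(); }
}
```

That said, in a servlet environment you would normally defer to the container's pool as other answers suggest; this is just to show the mechanics.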
You should only hold a database connection open for as long as you need it, which, depending on what you're doing, is probably within the scope of your doGet/doPost methods.
Pool it.
Also, if you are doing raw JDBC, you could look into something that helps you manage the Connection, PreparedStatement, etc. Unless you have very tight "lightweightness" requirements, using Spring's JDBC support, for instance, is going to simplify your code a lot- and you are not forced to use any other part of Spring.
See some examples here:
http://static.springframework.org/spring/docs/2.5.x/reference/jdbc.html
A connection pool associated with a Data source should do the trick. You can get hold of the connection from the dataSource in the servlet request method(doget/dopost, etc).
dbcp, c3p0 and many other connection pools can do what you're looking for. While you're pooling connections, you might want to pool Statements and PreparedStatements; Also, if you're a READ HEAVY environment as you indicated, you might want to cache some of the results using something like ehcache.
BR,
~A
Usually you will find that opening a connection per request is easier to manage. That means in the doPost() or doGet() method of your servlet.
Opening it in init() makes one connection shared by all requests; what happens when you have concurrent requests?
