Handling multiple databases using Spring and Hibernate - java

We want to scale horizontally by pushing data to different databases based on user groups. This is required since the data will be huge. Right now we are looking at RDBMS only. Spring with Hibernate or EclipseLink is what we have as options. I have a few questions around this, and I see similar questions have been asked multiple times; I am asking again because I want to understand a few more specifics.
What are the best practices one should follow when using multiple databases? (Detailed questions below.)
Multiple session factories or a single session factory? What is the recommended approach? I see a lot of posts recommending multiple session factories, while the dynamic data source implementation uses a single session factory and supplies a different data source based on the user group. Are there any scalability issues with a single session factory serving many user groups?
Are sessions tied to the session factory or to the underlying data source? I am assuming that a separate connection pool would be created for each database; am I right?
Spring's dynamic data source implementation to handle multiple databases, or Hibernate multi-tenancy?
Are there any issues with transaction management when using a dynamic data source? I didn't see any posts about this except for the second-level cache.
If c3p0 is used for connection pooling, how is it handled in the dynamic data source approach?
Any dos and don'ts for the above approaches?

There is a project developed by the Hibernate team and Google called Hibernate Shards. It exists for exactly what you need: sharding data across multiple databases.
It provides interfaces named:
org.hibernate.shards.session.ShardedSession
org.hibernate.shards.ShardedSessionFactory
org.hibernate.shards.criteria.ShardedCriteria
org.hibernate.shards.query.ShardedQuery
Each one extends the classic Hibernate interface of a similar name.
By using Hibernate Shards, all the sharding logic stays behind Hibernate, and most of the time you don't have to deal with it.
Transaction management remains an issue, though: in your case you will need a distributed JTA transaction manager, and that is much easier if you use an application server such as JBoss.
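For illustration, here is roughly how a sharded session factory is assembled, adapted from the Hibernate Shards reference guide. The shard configuration file names and the round-robin strategies are placeholder choices, and the project is old, so verify the class names against the version you use:

    import java.util.ArrayList;
    import java.util.List;

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;
    import org.hibernate.shards.ShardId;
    import org.hibernate.shards.cfg.ConfigurationToShardConfigurationAdapter;
    import org.hibernate.shards.cfg.ShardConfiguration;
    import org.hibernate.shards.cfg.ShardedConfiguration;
    import org.hibernate.shards.loadbalance.RoundRobinShardLoadBalancer;
    import org.hibernate.shards.strategy.ShardStrategy;
    import org.hibernate.shards.strategy.ShardStrategyFactory;
    import org.hibernate.shards.strategy.ShardStrategyImpl;
    import org.hibernate.shards.strategy.access.SequentialShardAccessStrategy;
    import org.hibernate.shards.strategy.resolution.AllShardsShardResolutionStrategy;
    import org.hibernate.shards.strategy.selection.RoundRobinShardSelectionStrategy;

    public class ShardedSessionFactoryBuilder {

        public SessionFactory build() {
            // The prototype carries the mappings shared by every shard.
            Configuration prototype = new Configuration().configure("shard0.hibernate.cfg.xml");

            List<ShardConfiguration> shardConfigs = new ArrayList<ShardConfiguration>();
            shardConfigs.add(shardConfig("shard0.hibernate.cfg.xml"));
            shardConfigs.add(shardConfig("shard1.hibernate.cfg.xml"));

            ShardedConfiguration shardedConfig =
                    new ShardedConfiguration(prototype, shardConfigs, strategyFactory());
            return shardedConfig.buildShardedSessionFactory(); // a ShardedSessionFactory
        }

        private ShardConfiguration shardConfig(String resource) {
            return new ConfigurationToShardConfigurationAdapter(
                    new Configuration().configure(resource));
        }

        private ShardStrategyFactory strategyFactory() {
            return new ShardStrategyFactory() {
                public ShardStrategy newShardStrategy(List<ShardId> shardIds) {
                    // Round-robin writes, query all shards on reads, one shard at a time.
                    RoundRobinShardLoadBalancer balancer = new RoundRobinShardLoadBalancer(shardIds);
                    return new ShardStrategyImpl(
                            new RoundRobinShardSelectionStrategy(balancer),
                            new AllShardsShardResolutionStrategy(shardIds),
                            new SequentialShardAccessStrategy());
                }
            };
        }
    }

In your case, a custom ShardSelectionStrategy keyed on the user group would replace the round-robin selection, so each group's data lands on its designated database.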

Related

Java web - sessions design for horizontal scalability

I'm definitely not an expert Java coder. I need to implement sessions in my Java Servlet-based web application, and as far as I know this is normally done through HttpSession. However, this method stores the session data locally on the server, and I don't want such a constraint on horizontal scalability. Therefore I thought of saving sessions in an external database with which the application communicates through a REST interface.
Basically, in my application there are users performing actions such as searches. So what I'm going to persist in sessions is essentially the login data and the metadata associated with searches.
As the main data storage I'm planning to use a graph NoSQL database. The question is: given that I could also use a database of another kind for sessions, which architecture fits this kind of situation better?
I have currently thought of two possible ways. The first one uses another DB (such as an SQL DB) to store session data. This way I would have a more distributed workload, since I'm not using the main storage for sessions as well. Moreover, I'd have a more organized environment, with session state variables and persistent ones not mixed up.
The second way instead consists of storing all information relative to a session in the "user node" of the main database. The session id would then be just a "shortcut" for authentication. This way I don't have to rely on a second database, but I move all the workload to the main DB and mix the session data with the persistent data.
Is there any standard general architecture I can refer to? Am I missing some important point that should constrain my architecture?
Your idea to store sessions in a different location is good. How about using an in-memory cache like Memcached or Redis? Session data is generally not long-lived, so you have options other than a full-blown database. Memcached and Redis can both be clustered and scale horizontally.
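As a minimal sketch of the external-store idea, here is session state kept in Redis through the Jedis client; the host, key prefix, and timeout are assumptions:

    import redis.clients.jedis.Jedis;

    public class RedisSessionStore {

        private static final int TTL_SECONDS = 30 * 60; // assumed 30-minute idle timeout

        public void save(String sessionId, String serializedState) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // SETEX stores the value and its expiry in one atomic call.
                jedis.setex("session:" + sessionId, TTL_SECONDS, serializedState);
            }
        }

        public String load(String sessionId) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                String key = "session:" + sessionId;
                String state = jedis.get(key);
                if (state != null) {
                    jedis.expire(key, TTL_SECONDS); // sliding expiration on each access
                }
                return state;
            }
        }
    }

The expiry keeps abandoned sessions from piling up, which is a property a general-purpose database does not give you for free.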

Database caching with Spring and being able to query it

So, I have a Java EE application using the Spring Framework and JdbcTemplate. My application has to make several JDBC read requests (no or very few writes) against the same database (a Postgres DB which is not normalized, for a bunch of reasons) but with different SQL statements (different WHERE clauses). Given this situation, I would like to cache the data and be able to run queries against the cache, saving myself expensive JDBC calls. Please suggest appropriate tools, frameworks, or other solutions.
You can start with simple maps keyed on the query parameters you are using. A more viable solution is Ehcache.
If you use Spring 3.1 or later, you can use @Cacheable on methods. You need to include <cache:annotation-driven /> in your application context configuration. For simple cases you may use Spring's ConcurrentMapCacheManager as the cache manager. For more complex cases you can use Ehcache via Spring's Ehcache adapter. Use @CacheEvict to reset the cache.
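A short sketch of what that looks like on a read-mostly repository; the cache name, table, and wiring are assumptions:

    import java.util.List;
    import java.util.Map;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Repository;

    @Repository
    public class ReportRepository {

        @Autowired
        private JdbcTemplate jdbcTemplate;

        // Cached per 'region' argument; on a hit the JDBC call is skipped entirely.
        @Cacheable("reports")
        public List<Map<String, Object>> findByRegion(String region) {
            return jdbcTemplate.queryForList(
                    "SELECT * FROM reports WHERE region = ?", region);
        }

        // Call after one of the rare writes to drop every cached result.
        @CacheEvict(value = "reports", allEntries = true)
        public void evictAll() {
        }
    }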

Best approach for Spring+MyBatis with Multiple Databases to support failovers

I need to develop some services and expose an API to some third parties.
In those services I may need to fetch/insert/update/delete data with some complex calculations involved (not just simple CRUD). I am planning to use Spring and MyBatis.
But the real challenge is that there will be multiple DB nodes with the same data (some external setup takes care of keeping them in sync). When I get a request for some data, I need to pick one DB node at random, query it, and return the results. If the selected DB is unreachable, has network issues, or hits some unknown problem, then I need to try to connect to some other DB node.
I am aware of Spring's AbstractRoutingDataSource. But where do I inject the DB connection retry logic? Will Spring handle transactions properly if I switch the dataSource dynamically?
Or should I avoid Spring's out-of-the-box MyBatis integration and do transaction management myself using MyBatis?
What do you guys suggest?
I propose using a NoSQL database like MongoDB. It is easy to cluster: you can, for example, configure 10 servers and replicate the data 3 times.
That means that if 2 of your 10 servers fail, your data is still safe.
NoSQL databases are different from RDBMSs, but they can give high performance for clustering.
Also, there is no transaction support in NoSQL; you have to handle it manually in the case of financial operations.
You actually have to think in a different way when developing with NoSQL.
Yes, it will work. Take AbstractRoutingDataSource and code your own implementation. The only thing you cannot do is change the target database while a transaction is running.
So what you have to do is put the DB retry code in getConnection(). If that connection becomes invalid during the transaction, you should let it fail.
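To make that concrete, here is a hypothetical sketch of the retry-in-getConnection idea. It extends Spring's AbstractDataSource directly instead of AbstractRoutingDataSource, which keeps the failover loop explicit; all names are illustrative:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    import javax.sql.DataSource;

    import org.springframework.jdbc.datasource.AbstractDataSource;

    public class FailoverDataSource extends AbstractDataSource {

        private final List<DataSource> replicas; // all nodes hold the same data

        public FailoverDataSource(List<DataSource> replicas) {
            this.replicas = replicas;
        }

        @Override
        public Connection getConnection() throws SQLException {
            // Start from a random node, then fail over to the others in order.
            int start = ThreadLocalRandom.current().nextInt(replicas.size());
            SQLException last = null;
            for (int i = 0; i < replicas.size(); i++) {
                DataSource candidate = replicas.get((start + i) % replicas.size());
                try {
                    return candidate.getConnection();
                } catch (SQLException e) {
                    last = e; // node unreachable: try the next one
                }
            }
            throw last; // every node failed
        }

        @Override
        public Connection getConnection(String username, String password) throws SQLException {
            throw new UnsupportedOperationException("per-call credentials not supported");
        }
    }

Since Spring's DataSourceTransactionManager obtains one connection per transaction, the chosen node stays fixed for the whole transaction, matching the constraint above.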

Java EE / EJB vs Spring for Distributed Transaction management with multiple DB Clusters

I have a requirement to produce a prototype (running in a Java EE-compatible application server with MySQL) demonstrating the following:
Demonstrate the ability to distribute a transaction over multiple databases located at different sites globally (application-managed data replication).
Demonstrate the ability to write a transaction to a database chosen from a number of database clusters located at multiple locations, where the choice of database is based on user location (database-managed data replication).
I have the option to choose either a Spring stack or a Java EE stack (EJB etc.). It would be useful to know your opinions as to which stack better supports distributed transactions across multiple database clusters.
If possible, could you also please point me to any resources you think would be useful for learning how to implement the above using either of the two stacks. Seeing examples of both would help me understand how they work, and I would probably be in a better position to decide which stack to use.
I have seen a lot of sites by searching on Google, but most seem to be outdated (i.e. pre-EJB 3 and pre-Spring 3).
Thanks
I would use the Java EE stack in the following way:
configure an XA DataSource for each database server
according to the user's location, a stateless EJB looks up the corresponding DataSource and gets a connection from it
when broadcasting a transaction to all servers, a stateless EJB iterates over all configured DataSources and executes one or more queries on each, within a single transaction
In case of a technical failure, the transaction is rolled back on all servers concerned. In case of a business failure, the code can trigger a rollback thanks to context.setRollbackOnly().
That way you benefit from Java EE's automatic distributed transaction demarcation first, and you can then use more complex patterns if you need to manage transactions manually.
BUT the more servers you have enlisted in your transaction, the longer the two-phase commit will take, especially if you have high latency between systems. And I doubt MySQL is the best relational database implementation for such complex distributed transactions.
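As a hedged sketch of the broadcast case with container-managed transactions (the JNDI names and table are assumptions):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Arrays;

    import javax.annotation.Resource;
    import javax.ejb.EJBException;
    import javax.ejb.Stateless;
    import javax.sql.DataSource;

    @Stateless
    public class ReplicatingWriterBean {

        @Resource(lookup = "java:/jdbc/SiteEuXA")
        private DataSource siteEu;

        @Resource(lookup = "java:/jdbc/SiteUsXA")
        private DataSource siteUs;

        // Default CMT attribute is REQUIRED, so both writes join one JTA transaction.
        public void writeEverywhere(long id, String payload) {
            for (DataSource ds : Arrays.asList(siteEu, siteUs)) {
                try (Connection c = ds.getConnection();
                     PreparedStatement ps = c.prepareStatement(
                             "INSERT INTO events (id, payload) VALUES (?, ?)")) {
                    ps.setLong(1, id);
                    ps.setString(2, payload);
                    ps.executeUpdate();
                } catch (SQLException e) {
                    // A system exception makes the container roll back the whole
                    // distributed transaction on every enlisted server.
                    throw new EJBException(e);
                }
            }
        }
    }

For the location-based variant, the bean would instead pick the single DataSource matching the user's location before opening a connection.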

Are there any design patterns that could work in this scenario?

We have a system (a Java web application) that has been in active development/maintenance for a long time now (something like ten years).
What we're looking at doing is implementing a RESTful API for the web app. This API, built with Jersey, will be a separate project, with the intent that it should be able to run alongside the main application or be deployed in the cloud.
Because of the nature and age of our application, we've had to implement a (somewhat) comprehensive caching layer on top of the database (Postgres) to help keep load down. For the RESTful API, the idea is that GET requests will go to the cache first instead of the database, to keep load off the database.
The cache will be populated in a way that helps ensure most things registered API users need are already in there.
If there is a cache miss, the needed data should be retrieved from the database (and entered into the cache in the process).
Obviously, this should remain transparent to the RESTful endpoint methods in my code. We've come up with the idea of creating a 'Broker' to handle communication with the DB and the cache. The REST layer will simply pass across IDs (when retrieving) or populated Java objects (when inserting/updating), and the broker will take care of retrieving, updating, invalidating, etc.
There is also the issue of extensibility. To begin with, the API will live alongside the rest of the servers, so access to the database won't be an issue; however, if we deploy to the cloud we're going to need a different Broker implementation that communicates with the system (namely the database) in a different manner, potentially through an internal API.
I already have a rough idea of how I can implement this, but it struck me that this is probably a problem for which a suitable pattern exists. If I could follow an established pattern rather than coming up with my own solution, that would probably be the better choice. Any ideas?
Ehcache has an implementation of just such a cache, which it calls a SelfPopulatingCache.
Requests are made to the cache, not to the database. If there is a cache miss, Ehcache calls the database (or whatever external data source you have) on your behalf.
You just need to implement a CacheEntryFactory which has a single method:
Object createEntry(Object key) throws Exception;
So as the name suggests, Ehcache implements this concept with a pretty standard factory pattern...
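A small example of wiring one up; the cache name is an assumption, and the fabricated lookup stands in for your real broker/DB call:

    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
    import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

    public class BrokerCacheExample {

        public static void main(String[] args) {
            CacheManager manager = CacheManager.getInstance();

            // Factory invoked only on a cache miss; the DB lookup is a placeholder.
            CacheEntryFactory factory = new CacheEntryFactory() {
                public Object createEntry(Object key) throws Exception {
                    return "row-for-" + key; // stand-in for a broker/DB call
                }
            };

            // Decorate a cache declared in ehcache.xml (the name is assumed).
            SelfPopulatingCache cache =
                    new SelfPopulatingCache(manager.getEhcache("apiCache"), factory);

            Element element = cache.get(42L); // miss -> createEntry(42L) -> cached
            System.out.println(element.getObjectValue());

            manager.shutdown();
        }
    }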
There's no pattern. Just hide the initial DB services behind interfaces, build tests around their intended behavior, then switch in an implementation that uses the caching layer. I guess dependency injection would be the best thing to help you do that?
Sounds like decorator pattern will suit your need: http://en.wikipedia.org/wiki/Decorator_pattern.
You can create a DAO interface for data access, something like:
Value get(long id);
First create a direct DB implementation, then create a cache implementation that calls the underlying DAO instance, which in this case would be the DB implementation.
The cache implementation will try to get the value from its own managed cache, and fall back to the underlying DAO if that fails.
So both your old application and the REST layer will only see the DAO interface, without knowing any implementation details, and in the future you can change the implementation freely.
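A compact sketch of that decorator arrangement, with illustrative names throughout:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class Value {
        final long id;
        Value(long id) { this.id = id; }
    }

    // The interface both the legacy app and the REST layer program against.
    interface ValueDao {
        Value get(long id);
    }

    // Direct database implementation; the body is a placeholder for a real query.
    class DbValueDao implements ValueDao {
        public Value get(long id) {
            return new Value(id); // stand-in for a JDBC/ORM lookup
        }
    }

    // Decorator: answers from its own cache and delegates on a miss.
    class CachingValueDao implements ValueDao {
        private final ValueDao delegate;
        private final ConcurrentMap<Long, Value> cache = new ConcurrentHashMap<Long, Value>();

        CachingValueDao(ValueDao delegate) {
            this.delegate = delegate;
        }

        public Value get(long id) {
            Value v = cache.get(id);
            if (v == null) {
                v = delegate.get(id);      // cache miss: hit the database
                cache.putIfAbsent(id, v);  // populate for the next caller
            }
            return v;
        }
    }

Wiring is then just ValueDao dao = new CachingValueDao(new DbValueDao()); supporting the cloud deployment later means writing one more ValueDao implementation.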
The best design pattern for transparently caching HTTP requests is to use an HTTP cache.
