Reading the ActiveMQ documentation (we are using the 5.3 release), I find a section about the possibility of using a JDBC persistence adapter with ActiveMQ.
What are the benefits? Does it provide any gain in performance or reliability? When should I use it?
In my opinion, you would use JDBC persistence if you wanted to have a failover broker and you could not use the file system. The JDBC persistence was significantly slower (during our tests) than journaling to the file system. For a single broker, the journaled file system is best.
If you are running two brokers in an active/passive failover pair, both brokers must have access to the same journal/data store so that the passive broker can detect the primary's failure and take over. With the journaled file system, the files must be on a shared network drive of some sort (NFS, SMB/CIFS, iSCSI, etc.). This usually requires a higher-end NAS device if you want to eliminate the file share as a single point of failure.
The other option is that you can point both brokers to the database, which most applications already have access to. The tradeoff is usually simplicity at the price of performance, as the journaled JDBC persistence was slower in our tests.
We run ActiveMQ in an active/passive broker pair with journaled persistence via an NFS mount to a dedicated NAS device, and it works very well for us. We are able to process over 600 msgs/sec through our system with no issues.
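The two options above map to two `persistenceAdapter` configurations in `activemq.xml`. A sketch for ActiveMQ 5.x follows; the paths, database URL, and bean names are assumptions for illustration:

```xml
<!-- activemq.xml sketch (ActiveMQ 5.x); paths and datasource details are assumptions -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="broker1">
  <persistenceAdapter>
    <!-- Option A: journaled file store on a shared NFS mount for active/passive failover -->
    <kahaDB directory="/mnt/shared-nas/activemq/kahadb"/>
    <!-- Option B: JDBC store, both brokers pointing at the same database
    <jdbcPersistenceAdapter dataSource="#mysql-ds"/>
    -->
  </persistenceAdapter>
</broker>

<bean id="mysql-ds" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="jdbc:mysql://dbhost/activemq"/>
  <property name="username" value="activemq"/>
  <property name="password" value="activemq"/>
</bean>
```

With either option, the passive broker blocks trying to acquire the store lock (the file lock on the KahaDB directory, or the lock table in the database) and takes over only when the active broker releases it.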
Journaled JDBC is generally preferable to plain JDBC persistence, since journaling is much faster than writing each message straight to the database. It also has an advantage over plain journaled persistence: you get an additional backup of the messages in the database, and because the journal is flushed to the database later, developers can query that data when needed.
However, if you use a master/slave ActiveMQ topology with journaled JDBC, you might end up losing messages, since the journal may contain messages that have not yet been written to the database.
If you have a redelivery plugin policy in place and use a master/slave setup, the scheduler is used for the redelivery.
As of today, the scheduler can only be set up on a file-based store, not on JDBC. If you do not pay attention to that, all messages in redelivery are taken out of the HA scenario and become local to the broker.
https://issues.apache.org/jira/browse/AMQ-5238 is the issue in the Apache tracker asking for a JDBC persistence adapter for the scheduler store. You can vote for it to make it happen.
Even with the top ActiveMQ HA solution, replicated LevelDB + ZooKeeper, the scheduler is out of the picture and documented to cause issues (see the note at the end of http://activemq.apache.org/replicated-leveldb-store.html).
In a JDBC scenario, therefore, setting up the data store for the redelivery policy can be considered unsafe and unsupported, or at the very least not clearly documented.
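For reference, this is roughly what a broker-side redelivery setup looks like (adapted from the ActiveMQ redelivery plugin documentation; the delay values are assumptions). Note that `schedulerSupport="true"` enables a file-based scheduler store, which is exactly what falls outside a JDBC-only HA setup:

```xml
<!-- Broker sketch: the redelivery plugin relies on the broker scheduler,
     whose store is file-based regardless of the message persistence adapter. -->
<broker xmlns="http://activemq.apache.org/schema/core" schedulerSupport="true">
  <plugins>
    <redeliveryPlugin fallbackToDeadLetter="true" sendToDlqIfMaxRetriesExceeded="true">
      <redeliveryPolicyMap>
        <redeliveryPolicyMap>
          <defaultEntry>
            <redeliveryPolicy maximumRedeliveries="4"
                              initialRedeliveryDelay="5000"
                              redeliveryDelay="10000"/>
          </defaultEntry>
        </redeliveryPolicyMap>
      </redeliveryPolicyMap>
    </redeliveryPlugin>
  </plugins>
</broker>
```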
My application runs at more than 100 transactions per second in production. I would like to know what configuration should be used to achieve this.
In the non-production environment I am using the cluster with DCAwareRoundRobinPolicy and a consistency level of LOCAL_QUORUM.
All remaining configuration is left at the defaults.
Is the default configuration enough, or do I need to specify all the connection options explicitly (pooling options, socket options, consistency level, etc.)?
PS:
Cassandra version 3
Please suggest how to scale it.
The Java driver defaults are quite good, especially for that load. The token-aware, DC-aware load balancing policy you need is already the default. You may tune connection pooling to allow more in-flight requests per connection. Make sure you have only a single Session instance per application, to avoid opening too many connections to the cluster. The real performance gain comes from using asynchronous operations and from a lower consistency level such as LOCAL_ONE (but that is application-specific).
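A sketch of those knobs for the DataStax Java driver 3.x (which matches Cassandra 3) is below. The contact point, data center name, keyspace, and the doubled request limit are assumptions; the load balancing policy shown is what the driver already uses by default, spelled out for clarity:

```java
// Sketch only, assuming the DataStax Java driver 3.x is on the classpath.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class CassandraSessionFactory {
    // Build ONE Cluster/Session pair for the whole application and reuse it.
    public static Session connect() {
        PoolingOptions pooling = new PoolingOptions()
                // allow more in-flight requests per connection than the default
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 2048);
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // token-aware wrapping of DC-aware round-robin is also the driver default
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder().withLocalDc("DC1").build()))
                .withPoolingOptions(pooling)
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
        return cluster.connect("my_keyspace");
    }
}
```

For throughput, prefer `session.executeAsync(...)` over blocking `execute(...)` calls, and lower the consistency level per statement where the business rules allow it.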
Question: What failover strategy does Spring Batch support best? Resource usage and the failover mechanism are the main concerns. Any suggestions?
Use case: Spring Batch has to read a file (which another application puts on the server) and process it.
The environment is clustered, so multiple server instances could trigger batch jobs that try to read the same file on arrival.
My thoughts: we could poll for the arrival of the file and then call the Spring Batch job. Since the environment is clustered, we could use an active/passive strategy for the polling; other schemes such as round-robin or time slicing could also be used.
Pardon me if I am not clear. I can explain if something is unclear.
As I understand from the scaling chapter of the Spring Batch reference (http://static.springsource.org/spring-batch/reference/html/scalability.html), the better approach is to have just one poller and then distribute the job across the cluster through one of the mechanisms Spring Batch provides (I think remote chunking is the best choice here).
I don't think you should worry about the clustering strategy as this is handled either by Spring Batch or by other clustering distribution mechanisms.
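One cheap way to get the "just one poller" behavior (this is an assumption of mine, not something Spring Batch provides out of the box) is an OS-level file lock on the shared drive: whichever node holds the lock acts as the active poller, and a passive node takes over when the lock is released. A minimal sketch using only the JDK:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy active/passive guard: the instance that grabs the lock file becomes the poller.
public class PollerLeaderGuard implements AutoCloseable {
    private final FileChannel channel;
    private FileLock lock;

    public PollerLeaderGuard(Path lockFile) throws IOException {
        this.channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Returns true if this instance is now the active poller. */
    public boolean tryBecomeLeader() {
        try {
            lock = channel.tryLock();        // null => locked by another process
            return lock != null;
        } catch (OverlappingFileLockException e) {
            return false;                    // held by another guard in this JVM
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public void close() throws IOException {
        if (lock != null) lock.release();
        channel.close();
    }
}
```

Be warned that file locking over NFS is notoriously unreliable; a database-based lock or a coordination service like ZooKeeper is more robust for real deployments.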
I have a requirement to produce a prototype (running in a J2EE compatible application server with MySQL) demonstrating the following
Demonstrate the ability to distribute a transaction over multiple databases located at different sites globally (application-managed data replication).
Demonstrate the ability to write a transaction to a database chosen from a number of database clusters at multiple locations, where the selection of which database to write to is based on the user's location (database-managed data replication).
I have the option to choose either a Spring stack or a Java EE stack (EJB etc). It would be useful to know of your opinions as to which stack is better at supporting distributed transactions on multiple database clusters.
If possible, could you also please point me to any resources you think would be useful for learning how to implement the above using either of the two stacks. I think seeing examples of both would help me understand how they work and put me in a better position to decide which stack to use.
I have seen a lot of sites by searching on Google, but most seem to be outdated (i.e., pre-EJB 3 and pre-Spring 3).
Thanks
I would use the Java EE stack in the following way:
configure an XA DataSource for each database server
according to the user's location, a stateless EJB looks up the corresponding DataSource and gets a connection from it
when broadcasting a transaction to all servers, a stateless EJB iterates over all configured DataSources and executes one or more queries on each, but within a single transaction
In case of a technical failure, the transaction is rolled back on all concerned servers. In case of a business failure, the code can trigger a rollback thanks to context.setRollbackOnly().
That way, you benefit from Java EE's automatic distributed transaction demarcation first, and can move to more complex patterns later if you need to manage transactions manually.
BUT the more servers you enlist in a transaction, the longer the two-phase commit will take, especially if there is high latency between the systems. And I doubt MySQL is the best relational database for such complex distributed transactions.
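The broadcast case can be sketched as a container-managed-transaction stateless bean. This is an illustration only: the JNDI names (`jdbc/siteA`, `jdbc/siteB`), the table, and the exception handling are assumptions, and the data sources must be configured as XA in the server for the single-transaction guarantee to hold:

```java
// Sketch: one container-managed JTA transaction spanning several XA data sources.
import javax.annotation.Resource;
import javax.ejb.SessionContext;
import javax.ejb.Stateless;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

@Stateless
public class BroadcastWriterBean {
    @Resource(lookup = "jdbc/siteA") private DataSource siteA;   // assumed JNDI names
    @Resource(lookup = "jdbc/siteB") private DataSource siteB;
    @Resource private SessionContext context;

    // Default CMT attribute is REQUIRED: the commit spans every enlisted XA resource.
    public void broadcastInsert(String payload) {
        for (DataSource ds : List.of(siteA, siteB)) {
            try (Connection c = ds.getConnection();
                 PreparedStatement ps = c.prepareStatement(
                         "INSERT INTO events (payload) VALUES (?)")) {
                ps.setString(1, payload);
                ps.executeUpdate();
            } catch (SQLException e) {
                context.setRollbackOnly();   // failure on any site rolls back all sites
                throw new IllegalStateException("broadcast failed", e);
            }
        }
    }
}
```

Throwing a runtime exception from a CMT bean also marks the transaction for rollback, so the explicit `setRollbackOnly()` here mainly documents the intent for business failures.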
My DAL is implemented with Hibernate, and I want to use EHCache as its second-level cache with its distributed capabilities (for scalability and HA).
Seeing as EHCache provides distributed caching only with Terracotta, my question is: what is the role of the Terracotta server instance? Does it also hold data, or does it only coordinate the distribution between the partitioned cache parts?
My confusion comes mainly from this explanation of the TSA, which says the server holds the data; I think that in my scenario the cache and the Terracotta server are sort of merged. Am I correct?
If the server does hold data then why shouldn't the bottleneck just move from the db to the Terracotta server?
Update:
Affe's answer covers the second part of my question, which was the important part. But in case someone comes by looking for the first part: the Terracotta server does have to hold all the data that the in-memory EHCache holds, so if you want a distributed cache (not a replicated one), the L2 (the Terracotta server) must itself hold all the objects as well.
Thanks in advance,
Ittai
The idea is that it is still significantly faster to contact the Terracotta cluster via the Terracotta driver and do what is basically a map lookup than to acquire a database connection and execute an SQL statement. Even if that does become the application's choke point, overall throughput would still be expected to be significantly higher than with a JDBC-connection-plus-SQL choke point. Open connections and open cursors are big resource hogs in the database; an open socket to the Terracotta cluster is not.
You can get Ehcache clustered without using Terracotta: there is documentation for replicating via RMI, JGroups, and JMS. We use JMS, since we already have a significant JMS infrastructure to handle the communication. I don't know how well it will scale in the long term, but our current concern is just HA.
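A JMS-replicated setup looks roughly like the following `ehcache.xml` fragment (class names per the Ehcache 2.x JMS replication module; the broker URL, JNDI binding names, and cache settings are assumptions, so verify them against your Ehcache version). Note this gives you a replicated cache, where every node holds a full copy, not a partitioned distributed cache:

```xml
<!-- ehcache.xml sketch for JMS replication (Ehcache 2.x + ehcache-jmsreplication).
     Broker URL and binding names are assumptions for an ActiveMQ-backed setup. -->
<ehcache>
  <cacheManagerPeerProviderFactory
      class="net.sf.ehcache.distribution.jms.JMSCacheManagerPeerProviderFactory"
      properties="initialContextFactoryName=org.apache.activemq.jndi.ActiveMQInitialContextFactory,
                  providerURL=tcp://localhost:61616,
                  topicConnectionFactoryBindingName=ConnectionFactory,
                  topicBindingName=ehcacheReplication"
      propertySeparator=","/>

  <cache name="userCache" maxEntriesLocalHeap="10000" timeToLiveSeconds="600">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jms.JMSCacheReplicatorFactory"
        properties="replicateAsynchronously=true, replicatePuts=true,
                    replicateUpdates=true, replicateRemovals=true"/>
  </cache>
</ehcache>
```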
Is it safe to run a database connection pool (like Commons DBCP or c3p0) as part of an application deployed to an application server like Glassfish or Websphere? Are there any additional steps over a standalone application that should be taken to ensure safety or performance?
Update, to clarify the reason: the use case I have in mind may need new data sources to be defined at runtime by skilled end users; changing the data sources is part of the application's functionality, if you like. I don't think I can create and use container-managed pools on the fly.
AFAIK it works, but it will of course escape the app server's management features.
I'm also not entirely sure how undeployment or redeployment behaves and whether the connections are disposed of correctly. That can be considered a minor safety detail: if disposed of improperly, the connections will presumably just time out. I'm likewise unsure whether it works with an XA data source that integrates with the distributed transaction manager.
That said, using the app server's pool is usually just a matter of configuring the JNDI name in a configuration file. You then get monitoring, configuration from the admin console, load management, and so on for free.
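From application code, using the container-managed pool is then a plain JNDI lookup. The JNDI name below is an assumption; it must match whatever the pool is registered under in the server:

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

public class ContainerPoolExample {
    public static Connection borrow() throws NamingException, SQLException {
        InitialContext ctx = new InitialContext();
        // "jdbc/appDS" is a placeholder; use the name configured in the admin console
        DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/appDS");
        return ds.getConnection();   // close() returns the connection to the pool
    }
}
```

Outside a container (or without a configured JNDI provider), this lookup fails with a `NamingException`, which is exactly the portability trade-off: the code stays pool-agnostic, but it only runs where the server supplies the binding.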
In fact, you can create container-managed data sources at runtime, depending on which application server you use.
For example, WebLogic has an extensive management API that is used, among other things, by its own WLST (WebLogic Scripting Tool) to configure servers from scripts. This is, of course, a Java API, and it has methods to create and configure data sources too.
Another route is JMX-based configuration. All modern application servers expose themselves as JMX containers, so you can create data sources via JMX as well.
All you need is to grant your application admin privileges (i.e., provide it with a username/password).
The benefit of a container-managed data source is that it can be clustered, and it can be managed by a human through the standard admin UI.
If that doesn't work for you, you can of course create application-managed data sources at any time and in any number. Just keep in mind that each will be bound to a specific managed server (unless you implement manual clustering of its definition).
I don't see why you'd want to. Why not use the connection pool that the app server provides for you?
Update: I don't believe it's possible to create new container-managed pools on the fly without bouncing the app server, but I could be wrong. If that's correct, I don't believe Commons DBCP or C3P0 will help either.