One of our applications is expected to see a significant load increase soon, and I am in the process of evaluating Hazelcast distributed collections to help us eliminate some existing database bottlenecks.
Multiple instances of our application run on a number of different hosts for horizontal scaling. Different modules of the application are deployed to multiple WebSphere Application Servers to spread the load across multiple JVMs. A typical workflow consists of:
A message gets pushed to an MDB from a WebSphere MQ queue
The MDB parses the message and saves it to the database
The MDB extracts from the message a special key identifying related messages, and inserts that key into a special locking table, so that once such a key is picked up by a node, all related messages are processed on that node. Processing all related messages in sequence is crucial to our application.
This table is one of the things we want to replace with a Hazelcast blocking queue.
The same MDB publishes a notification to a WebSphere MQ topic, informing the other JVMs that work has arrived for further processing. We are considering replacing this topic with a Hazelcast topic, but that is optional.
All of the above happens in the same XA transaction, so once the other JVMs receive the notification it is certain that the locking table entry is there, available for pick up.
Once the receiving JVMs get the notification, they jump on the locking table, try to lock a key, and then process all the messages belonging to that key. There is a constant flow of messages, so there are always keys ready for pick up by all running JVMs.
We noticed during our stress tests that, because of the multiple threads trying to lock keys at the same time, the database comes under increasing pressure, affecting the overall performance of our application.
There are a few such semaphore tables controlling the in-sequence processing, and these are what we consider moving to an in-memory data grid.
The above is pretty much our story. In theory it seems like a good idea, and I hope to achieve a performance increase, not necessarily by reducing network traffic (that will happen anyway) but at least by spreading the pressure over more than one resource.
I tried to google how to set up an XA transactional context in which JMS, the database and Hazelcast collections all take part. Unfortunately the Hazelcast documentation on XA is just a few lines of code and nothing more. I am sure I am not the only one facing this problem, and I hope for some input here. No need for a working solution; a link to a good example or some more how-to documentation to get me moving would be enough.
Thanks in advance
If you use JTA and the Hazelcast Resource Adapter (github.com/hazelcast/hazelcast-ra), Hazelcast will be part of the overall JTA transaction, which can include any other transactional resource.
I suggest you take a look at the XA test classes:
https://github.com/hazelcast/hazelcast/tree/master/hazelcast/src/test/java/com/hazelcast/xa
Also, there are a few code samples here:
https://github.com/hazelcast/hazelcast-code-samples/tree/master/transactions
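To give a feel for how the pieces fit together, here is a minimal sketch of enlisting Hazelcast in a JTA transaction, assuming the Hazelcast 3.x HazelcastXAResource API; the TransactionManager is whatever your container (or a standalone manager such as Atomikos) provides, and the queue name "work-keys" is made up to stand in for the locking-table replacement.

    import javax.transaction.Transaction;
    import javax.transaction.TransactionManager;
    import javax.transaction.xa.XAResource;

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.TransactionalQueue;
    import com.hazelcast.transaction.HazelcastXAResource;
    import com.hazelcast.transaction.TransactionContext;

    public class HazelcastXaSketch {

        public static void publishKey(TransactionManager tm, HazelcastInstance hz, String key) throws Exception {
            HazelcastXAResource xaResource = hz.getXAResource();

            tm.begin();
            try {
                Transaction transaction = tm.getTransaction();
                transaction.enlistResource(xaResource);

                // Hazelcast operations done through this context join the XA transaction
                TransactionContext context = xaResource.getTransactionContext();
                TransactionalQueue<String> queue = context.getQueue("work-keys");
                queue.offer(key);

                // ... the JDBC insert and the WebSphere MQ publish would happen here,
                // enlisted in the same transaction by the container ...

                transaction.delistResource(xaResource, XAResource.TMSUCCESS);
                tm.commit();
            } catch (Exception e) {
                tm.rollback();
                throw e;
            }
        }
    }

In a container-managed setup the JDBC DataSource and the MQ connection factory would be enlisted in the same transaction, so the queue offer, the database insert and the notification all commit or roll back together.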
I have a server application A that produces records as requests arrive. I want these records to be persisted in a database. However, I don't want application A's threads to spend time persisting the records by communicating directly with the database. Therefore, I thought about using a simple producers-consumers architecture where application A's threads produce records and another application B's threads are the consumers that persist the records to the database.
I'm looking for the "best" way to share these records between applications A and B. An important requirement is that application A's threads must always be able to send records to the IPC system (e.g. a queue, but it could be some other solution). Therefore, I think the records must always be stored locally, so that application A's threads can still send records even if the network is down.
The initial idea that came to my mind was to use a local message queue (e.g. ActiveMQ). Do you think a local message queue is appropriate? If yes, do you recommend a specific message queue implementation? Note that both applications are written in Java.
Thanks, Mickael
For this type of need a queueing solution seems to be the best fit, as the producer and consumer of the events can work in isolation. There are many solutions out there; I have personally worked with RabbitMQ and ActiveMQ, and both are equally good. I don't wish to compare their performance characteristics here, but RabbitMQ is written in Erlang, a language tailor-made for building real-time applications.
Since you're already on the Java platform, ActiveMQ might be the better option, and it is capable of producing high throughput. With a solution like this, the consumer does not have to be online all the time. Depending on how critical your event data is, you may also want persistent queues and messages, so that in the event of a message broker failure you can still recover the important "event" messages your application A produced.
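As an illustration of the persistent-queue idea, here is a minimal sketch of the producer side using an embedded ActiveMQ broker started through the vm:// transport, so records are accepted and stored locally even if the consumer side is unreachable; the queue name "records" and the payload are made up.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class RecordProducer {

        public static void main(String[] args) throws JMSException {
            // vm:// starts a broker inside application A's JVM; broker.persistent=true
            // keeps messages on local disk so they survive restarts and network outages.
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("vm://localhost?broker.persistent=true");
            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("records");
                MessageProducer producer = session.createProducer(queue);
                producer.setDeliveryMode(DeliveryMode.PERSISTENT);  // store the message, not just relay it
                producer.send(session.createTextMessage("record payload"));
            } finally {
                connection.close();
            }
        }
    }

Application B's consumer can then drain the same queue at its own pace, which is what keeps application A's request threads from ever waiting on the database.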
If there are many applications producing events and you later wish to scale out (horizontally scale) the broker service because it is becoming a bottleneck, both of the above solutions provide clustering.
Last but not least, if you want to share these events between different platforms you may wish to exchange messages in AMQP, a platform-independent wire-level protocol for sharing messages between heterogeneous systems; I'm not sure whether that is a requirement for you. RabbitMQ and ActiveMQ both support AMQP. Both also support MQTT, a lightweight messaging protocol, but it seems you don't wish to use MQTT.
There are other products such as HornetQ and Apache Qpid which are also production ready solutions but I have not used them personally.
I think a queueing solution is the best approach in terms of maintainability, the loose coupling of the participating applications, and performance.
I have been tasked to develop the architecture for a data transformation pipeline.
Essentially, data comes in at one end and is routed through various internal systems acquiring different forms before ending up in its destination.
The main objectives are:
Fault tolerance - the message should be recoverable if one of the intermediate systems goes down.
Replay/resequence - the message can be replayed from any stage, and it should be possible to recreate the events in an idempotent manner.
I have a few custom solutions in mind to address these:
Implement a checkpoint system where a message is logged at both the entry and exit point of each checkpoint, so we know where a failure happened.
Implement a recovery mechanism that can go to the logged storage (database, log file, etc.) and reconstruct the events programmatically.
However, I have a feeling this is a fairly standard problem with well defined solutions.
So, I would welcome any thoughts on a suitable architecture to go with, any tools/packages/patterns to refer to etc..
Thanks
Akka is an obvious choice. Of course the Scala version is more powerful, but even with the Java bindings you can achieve a lot.
I think you can follow a CQRS approach and use the Akka Persistence module. In that case it is easy to replay any sequence of events, because you always have a persistent journal.
In general, the Actor Model gives you fault tolerance through supervision.
Akka Clustering will give you the scalability you need.
A really nice example of using Akka Clustering with Akka Persistence and Cassandra: https://github.com/boldradius/akka-dddd-template (Scala only, unfortunately).
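For what it's worth, here is a minimal sketch of what one pipeline stage could look like with the Akka Persistence classic Java API (written against Akka 2.5.x); the command/event classes and the persistenceId are made up for illustration.

    import akka.persistence.AbstractPersistentActor;

    public class TransformationStage extends AbstractPersistentActor {

        // Command arriving from the previous stage (made-up type)
        public static final class Transform {
            public final String data;
            public Transform(String data) { this.data = data; }
        }

        // Event written to the journal (made-up type)
        public static final class Transformed {
            public final String data;
            public Transformed(String data) { this.data = data; }
        }

        private int processedCount = 0;

        @Override
        public String persistenceId() {
            return "transformation-stage-1";
        }

        @Override
        public Receive createReceiveRecover() {
            // On restart the journal is replayed here, so state is rebuilt
            // without re-running side effects.
            return receiveBuilder()
                    .match(Transformed.class, evt -> processedCount++)
                    .build();
        }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(Transform.class, cmd ->
                            persist(new Transformed(cmd.data), evt -> {
                                processedCount++;
                                // hand the transformed data to the next stage here
                            }))
                    .build();
        }
    }

Because every Transformed event lands in the journal before state changes, the stage can be replayed from any point, which is what covers the replay/resequence requirement.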
One common solution is JMS, where a central component (the JMS broker) keeps a transactional store of pending messages. Because it does nothing other than that, it can have a high uptime (uptime can be increased further with a failover cluster, in which case you'll likely want its persistence store to be a failover cluster, too).
Sending a JMS message can be made transactional, as can consuming a message. These transactions can be synchronized with database transactions through XA transactions, which do their utmost to get as close to exactly-once delivery as possible, but are rather heavy machinery.
In many cases (idempotent receiver), at-least-once delivery is sufficient. This can be accomplished by sending the message in a synchronous transaction (that is, the sender only succeeds once the broker has acknowledged receipt of the message), and acknowledging a message only after it has been processed.
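As a rough sketch of that at-least-once pattern with plain JMS and locally transacted sessions (the connection factory and the queue name "pipeline.stage1" are placeholders):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class TransactionalStage {

        public static void send(ConnectionFactory connectionFactory, String payload) throws JMSException {
            Connection connection = connectionFactory.createConnection();
            try {
                Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
                MessageProducer producer = session.createProducer(session.createQueue("pipeline.stage1"));
                producer.send(session.createTextMessage(payload));
                session.commit();  // the sender only succeeds once the broker has the message
            } finally {
                connection.close();
            }
        }

        public static void consumeOne(ConnectionFactory connectionFactory) throws JMSException {
            Connection connection = connectionFactory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
                MessageConsumer consumer = session.createConsumer(session.createQueue("pipeline.stage1"));
                TextMessage message = (TextMessage) consumer.receive(5000);
                if (message != null) {
                    process(message.getText());  // must be idempotent: redelivery happens on rollback/crash
                    session.commit();            // acknowledge only after processing succeeded
                }
            } finally {
                connection.close();
            }
        }

        private static void process(String text) { /* hand off to the next stage here */ }
    }

If the consumer crashes after processing but before the commit, the message is redelivered, which is exactly why the processing step has to be idempotent.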
Requirement: I have 4 servers: A, B, C, D. They all connect to a data provider, get the data and persist it into MongoDB for N minutes, so that if the same request arrives at another server in the meantime, it takes the data from MongoDB instead of making another call to the data provider.
|A|
|B| |data provider|
|C|
|D|
But if the data provider responds slowly, there is a possibility that two different requests for the same resource arrive at A and B. I want the second request to wait until the response to the first request has been received. I am using a queue for this, which is fine for a single server, but now I need a distributed cache because there are multiple servers.
Implementation: After reading a few articles on the net, I learned that a distributed cache in Java can be implemented using ehcache RMI replication. But I have a few doubts before going ahead with ehcache. (Although there are more solutions like JCS etc., I decided to pick ehcache on the basis of other answers on StackOverflow.)
Doubts
What if one of the servers goes down? Does ehcache handle this automatically?
Interesting situation, but I fail to see how an extra cache (of any kind) would help solve the problem. Ultimately your problem boils down to one of coordination between servers, and a cache has little to add there.
Instead I would either use a queue shared between the four servers, where only one request for a given resource is allowed in at a time, or a shared map where each server locks a resource name while retrieving it. Other servers can then wait on this lock and, once it is released, try to retrieve the resource from MongoDB.
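One concrete way to implement that shared-map idea is with a distributed map that offers per-key locks, for example Hazelcast's IMap (3.x API assumed); this is only a sketch, and the map name plus the MongoDB/data-provider helpers are placeholders.

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class ResourceLoader {

        // The map doubles as the cluster-wide lock registry for resource names.
        private final IMap<String, Boolean> locks;

        public ResourceLoader(HazelcastInstance hz) {
            this.locks = hz.getMap("resource-locks");
        }

        public byte[] load(String resourceId) {
            locks.lock(resourceId);        // only one of servers A-D holds the lock for this resource
            try {
                byte[] value = readFromMongo(resourceId);
                if (value == null) {
                    value = fetchFromDataProvider(resourceId);  // the slow call now happens only once
                    saveToMongo(resourceId, value);             // N-minute expiry handled in MongoDB
                }
                return value;
            } finally {
                locks.unlock(resourceId);  // waiting servers wake up and find the document in MongoDB
            }
        }

        private byte[] readFromMongo(String resourceId) { return null; }                 // placeholder
        private byte[] fetchFromDataProvider(String resourceId) { return new byte[0]; }  // placeholder
        private void saveToMongo(String resourceId, byte[] value) { }                    // placeholder
    }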
I haven't tried using it, but a combination of redis and redisson looks like a good fit for such a task.
Ehcache with RMI replication is NOT a distributed cache and will not help in your situation because there will not be any shared state on which to queue/isolate your accesses.
Distributed Ehcache - that is, backed by Terracotta - can help, as you could combine strong consistency with a CacheLoader to ensure that only one thread across all servers loads a given resource. But unless you are ready to stick with Terracotta 3.7.x, this is no longer an open-source option.
But as Martin said, this may not be the best answer to your use case, as I feel you already use MongoDB as your fast-access storage, which makes the cache redundant.
I am rewriting a web application that currently performs a large number of SQL writes for audit data. Every step of user interaction results in a method being executed that writes some information to a database.
This has the potential to impact users by causing the interaction to stop due to database problems.
Ideally I want to move to a message-based approach where, if data needs to be written, it is fired off to a queue and a consumer picks it up and writes it to the database. It is not essential data, and loss is acceptable if the server goes down.
I'm just a little confused whether I should try to use an embedded JMS queue and broker, or a plain Java queue. Or something I'm not familiar with (suggestions?)
What is the best approach?
More info:
The app uses Spring and is running on WebSphere 6. All message communication is local; it will not talk to another server.
I think logging with JMS is overkill, especially if logging is the only reason for using JMS.
Have a look at DBAppender; you can log directly to the database. If performance is your concern, you can log asynchronously using Logback.
If you still want to go the JMS way, Logback also has JMS queue and topic appenders.
A plain queue will suffice, based on your description of the problem. You can have a fixed-size queue and discard messages if it fills too quickly, since you say they are not critical.
Things to consider:
Is this functionality required by other apps too, now or in the future?
Is the rate of message production so high that it could start consuming a lot of heap memory when a large number of users are logged in? This matters if messages should not be lost.
I'm not sure if that is best practice inside a Java EE container however.
Since you already run on WebSphere, you do have a JMS broker going (SIBus). The easiest way to kick off asynchronous work is to send JMS messages and have an MDB read them off and do the database insertions. You might have issues spawning your own threads in WebSphere, whereas an MDB can still utilise the initial context for JNDI resources.
In a non-Java-EE case, I would have used something like a plain LinkedBlockingQueue, or any blocking queue, and just had a thread polling that queue for new messages to insert into the database.
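For that non-Java-EE variant, a minimal sketch (the queue size, record type and the JDBC insert are placeholders) could look like this:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AuditWriter {

        // Bounded queue: audit records are dropped if the writer cannot keep up,
        // which is acceptable here because loss of audit data is tolerated.
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

        public AuditWriter() {
            Thread writer = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        String record = queue.take();   // blocks until a record is available
                        insertIntoDatabase(record);     // placeholder for the JDBC insert
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "audit-writer");
            writer.setDaemon(true);
            writer.start();
        }

        public void log(String record) {
            queue.offer(record);  // non-blocking: returns false (drops the record) when the queue is full
        }

        private void insertIntoDatabase(String record) { /* JDBC insert goes here */ }
    }

Because offer() simply drops records when the bounded queue is full, a slow database never blocks the user-facing threads, which matches the "loss is acceptable" requirement.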
I would use a JMS queue only if different servers were involved. So in your case I would do it in plain, simple Java with an ordinary Java queue.
I'm looking for (simple) examples of problems for which JMS is a good solution, and also reasons why JMS is a good solution in these cases. In the past I've simply used the database as a means of passing messages from A to B when the message cannot necessarily be processed by B immediately.
A hypothetical example of such a system is where all newly registered users should be sent a welcome e-mail within 24 hours of registration. For the sake of argument, assume the DB does not record the time when each user registered, but instead a reference (foreign key) to each new user is stored in the pending_email table. The e-mail sender job runs once every 24 hours, sends an e-mail to all the users in this table, then deletes all the pending_email records.
This seems like the kind of problem for which JMS should be used, but it's not clear to me what benefit JMS would have over the approach I've described. One advantage of the DB approach is that the messages are persistent. I understand that JMS message queues can also be persisted, but in that case there seems to be little difference between JMS and the "database as message queue" approach I've described?
What am I missing?
- Don
JMS and messaging are really about two totally different things:
publish and subscribe (sending a message to as many consumers as are interested - a bit like sending an email to a mailing list; the sender does not need to know who is subscribed)
high performance reliable load balancing (message queues)
See more info on how a queue compares to a topic
The case you are talking about is the second one, where, yes, you can use a database table to simulate a message queue of sorts.
The main difference is that a JMS message queue is a high-performance, highly concurrent load balancer designed for huge throughput; you can usually send tens of thousands of messages per second to many concurrent consumers across many processes and threads. The reason is that a message queue is fundamentally asynchronous - a good JMS provider will stream messages ahead of time to each consumer, so that thousands of messages are available in RAM to be processed as soon as a consumer is ready. This leads to massive throughput and very low latency.
e.g. imagine writing a web load balancer using a database table :)
When using a database table, typically one thread ends up locking the whole table, so you tend to get very low throughput when trying to implement a high-performance load balancer.
But like most middleware, it all depends on what you need; if you have a low-throughput system with only a few messages per second, feel free to use a database table as a queue. But if you need low latency and high throughput, then JMS queues are highly recommended.
In my opinion JMS and other message-based systems are intended to solve problems that need:
Asynchronous communication: an application needs to notify another that an event has occurred, without waiting for a response.
Reliability: ensuring once-and-only-once message delivery. With your DB approach you would have to "reinvent the wheel", especially if several clients read the messages.
Loose coupling: not all systems can communicate through a database. JMS is therefore a good fit for heterogeneous environments with decoupled systems that communicate across system boundaries.
The JMS implementation is "push", in the sense that you don't have to poll the queue to discover new messages, but you register a callback that gets called as soon as a new message arrives.
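To illustrate that push model, here is a minimal sketch of registering such a callback with plain JMS; the connection factory and the queue name "pending.email" are placeholders.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class PendingEmailListener {

        public static void register(ConnectionFactory factory) throws JMSException {
            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("pending.email"));

            // The broker pushes each message into this callback; there is no polling loop.
            consumer.setMessageListener(message -> {
                try {
                    String userId = ((TextMessage) message).getText();
                    System.out.println("sending welcome e-mail to user " + userId);  // e-mail sending goes here
                } catch (JMSException e) {
                    throw new RuntimeException(e);
                }
            });

            connection.start();  // delivery begins once the connection is started
        }
    }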
To address the original comment: what was originally described is the gist of (point-to-point) JMS. The benefits of JMS, however, are:
you don't need to write the code yourself (and possibly screw up the logic so that it's not quite as persistent as you think it is); also, a third-party implementation might be more scalable than a simple database approach.
JMS handles publish/subscribe, which is a bit more complicated than the point-to-point example you gave.
you are not tied to a specific implementation, and can swap it out if your needs change in the future, without messing with your Java code.
One advantage of JMS is enabling asynchronous processing, which can be done with a database solution as well. However, the following are some other benefits of JMS over the database solution:
a) The consumer of the message can be in a remote location. Exposing the database for remote access is dangerous. You can work around this by providing an additional service for reading messages from the database, but that requires more effort.
b) In the case of a database, the message consumer has to poll the database for messages, whereas JMS provides a callback when a message arrives (as sk mentioned).
c) Load balancing - if a lot of messages are coming in, it is easy to have a pool of message processors with JMS.
d) In general, an implementation via JMS will be simpler and take less effort than the database route.
JMS is an API used to transfer messages between two or more clients. Its spec is defined under JSR 914.
The major advantage of JMS is the decoupled nature of communicating entities - Sender need not have information about the receivers. Other advantages include the ability to integrate heterogeneous platforms, reduce system bottlenecks, increase scalability, and respond more quickly to change.
JMS is really just a set of interfaces/APIs, and the concrete classes must be implemented. They are implemented by various organizations/vendors, called JMS providers; examples are WebSphere MQ by IBM, FioranoMQ by Fiorano Software, ActiveMQ by Apache, HornetQ, OpenMQ, etc. Other terminology used includes Admin Objects (Topics, Queues, ConnectionFactories), JMS producer/publisher, JMS client, and the message itself.
So coming to your question - what is JMS good for?
I would like to give a practical example to illustrate its importance.
Day Trading
There is a feature called LVC (last value cache).
In trading, share prices are published by a publisher at regular intervals. Each share has an associated topic to which it is published. Now, if you know what a topic is, you know that messages are not stored the way they are on queues: messages go only to the subscribers alive at the time the message was published (the exception being durable subscribers, which get all messages published since the subscription was created - but then again we don't want old stock prices, which rules that out). So if a client wants to know a stock price, he creates a subscriber and then has to wait until the next price is published, which again is not what we want. This is where LVC comes into the picture. Each LVC message has an associated key. If a message is sent with an LVC key (for a particular stock) and another update message is later sent with the same key, the later one overrides the previous one. Whenever a subscriber subscribes to a topic with LVC enabled, it gets the latest message for each distinct LVC key. If we keep a distinct key per listed company, then when a client subscribes it will immediately get the latest stock price and, from then on, all the updates.
Of course this is just one of the factors, alongside reliability, security and so on, that make JMS so powerful.
Guido has the full definition. From my experience all of these are important for a good fit.
One of the uses I've seen is for order distribution in warehouses. Imagine an office supply company that has a fair number of warehouses supplying large offices with office supplies. Those orders come into a central location and are then batched up for the correct warehouse to fulfil. The warehouses don't have, or want, high-speed connections in most cases, so the orders are pushed down to them over dial-up modems, and this is where asynchronous messaging comes in. The phone lines are not all that reliable either, so only some of the orders may get through on a given attempt, and this is where reliability is important.
The key advantage is decoupling unrelated systems, rather than having them share common databases or building custom services to pass data around.
Banks are a prime example, with intraday messaging being used to pass around live data changes as they happen. It's very easy for the source system to throw a message "over the wall"; the downside is that there's very little in the way of a contract between these systems, and you normally see message "hospitalisation" (parking messages that cannot be processed, for later repair) implemented on the consumer's side. It's almost too loosely coupled.
Other advantages come from the out-of-the-box support for JMS in many application servers and the tooling around it: durability, monitoring, reporting and throttling.
There's a nice write-up with some examples here: http://www.winslam.com/laramee/jms/index.html
The 'database as message queue' solution may be heavy for the task. The JMS solution is less tightly coupled, in that the message sender does not need to know anything about the recipient. This could be accomplished with some additional abstraction in the 'database as message queue' approach as well, so it is not a huge win... Also, you can use JMS in a publish-and-subscribe way, which can be handy depending on what you are trying to accomplish, and it is a nice way to further decouple your components. If all of your communication is within one system and/or having a log that is immediately available to the application is very important, your method seems fine. If you are communicating between separate systems, JMS is a good choice.
JMS in combination with JTA (Java Transaction API) and JPA (Java Persistence API) can be very useful. With a simple annotation you can put several database actions plus message sending/receiving in the same transaction, so if one of them fails, everything gets rolled back by the same transaction mechanism.
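A hedged sketch of what that can look like in a Java EE 7 container, where the MDB's message receipt and the JPA write join one container-managed (XA) transaction; the queue lookup, entity and field names are all made up for illustration.

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.EJBException;
    import javax.ejb.MessageDriven;
    import javax.ejb.TransactionAttribute;
    import javax.ejb.TransactionAttributeType;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;
    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.PersistenceContext;

    @MessageDriven(activationConfig = @ActivationConfigProperty(
            propertyName = "destinationLookup", propertyValue = "jms/ordersQueue"))
    public class OrderMdb implements MessageListener {

        @PersistenceContext
        private EntityManager em;

        @Override
        @TransactionAttribute(TransactionAttributeType.REQUIRED)  // the default for MDBs, shown for clarity
        public void onMessage(Message message) {
            try {
                String payload = ((TextMessage) message).getText();
                em.persist(new OrderRecord(payload));  // rolled back together with the message receipt on failure
            } catch (JMSException e) {
                throw new EJBException(e);             // marks the transaction for rollback; the message is redelivered
            }
        }
    }

    @Entity
    class OrderRecord {
        @Id
        @GeneratedValue
        Long id;
        String payload;
        protected OrderRecord() { }
        OrderRecord(String payload) { this.payload = payload; }
    }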