Within a Java EE 5 environment, I need to ensure that some data written by another part of the system exists before I continue processing my own data.
Historically (in J2EE days), this was done by putting the data object to be processed onto an internal JMS queue after waiting for e.g. 500 ms via Thread.sleep.
But this does not feel like the best way to handle the problem, so I have two questions:
Is there any problem with using the sleep method within a Java EE context?
What is a reasonable solution for delaying some processing within a Java EE 5 application?
Edit:
I should have mentioned that my processing takes place while handling objects from a JMS queue via an MDB.
It may also be the case that the data I'm waiting for never shows up, so there must be some sort of timeout after which I can do some special processing with my data.
You can use the EJB TimerService feature. Creating your own threads in a managed environment should be avoided.
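A minimal sketch of that approach, assuming an EJB 3.0 container and a hypothetical DependencyChecker helper: instead of sleeping, the MDB asks a session bean to schedule a single-action timer, and the @Timeout callback either proceeds, reschedules, or gives up after a bounded number of attempts (covering the "data never shows up" case).

```java
import java.io.Serializable;
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.Timeout;
import javax.ejb.Timer;
import javax.ejb.TimerService;

// DependencyChecker and the special-handling hooks are placeholders for this sketch.
@Stateless
public class DelayedProcessingBean {

    @Resource
    private TimerService timerService;

    private static final long RETRY_INTERVAL_MS = 500;
    private static final int MAX_ATTEMPTS = 10;

    /** Called from the MDB instead of Thread.sleep(): schedule a re-check. */
    public void scheduleCheck(Serializable workId) {
        timerService.createTimer(RETRY_INTERVAL_MS, new Attempt(workId, 1));
    }

    @Timeout
    public void onTimeout(Timer timer) {
        Attempt attempt = (Attempt) timer.getInfo();
        if (DependencyChecker.dataAvailable(attempt.workId)) {
            // proceed with normal processing
        } else if (attempt.count < MAX_ATTEMPTS) {
            // data not there yet: try again in another 500 ms
            timerService.createTimer(RETRY_INTERVAL_MS,
                    new Attempt(attempt.workId, attempt.count + 1));
        } else {
            // give up: run the special "data never showed up" handling
        }
    }

    public static class Attempt implements Serializable {
        final Serializable workId;
        final int count;
        Attempt(Serializable workId, int count) { this.workId = workId; this.count = count; }
    }
}
```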
I agree with @dkaustubh about timers and avoiding manual thread manipulation in Java EE.
Another possibility is to use a JMS queue with delayed delivery. Although it is not part of the Java EE API, most messaging vendors support it. Check here.
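For illustration, a hedged sketch of delayed delivery using plain JMS plus ActiveMQ's broker-specific AMQ_SCHEDULED_DELAY property (available in ActiveMQ 5.4+ when the scheduler is enabled; other brokers use different property names or administrative settings):

```java
import java.io.Serializable;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;

public class DelayedSender {

    public void sendDelayed(ConnectionFactory factory, Queue queue,
                            Serializable payload, long delayMs) throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            ObjectMessage message = session.createObjectMessage(payload);
            // Broker-specific: ActiveMQ will hold the message for delayMs before delivery.
            message.setLongProperty("AMQ_SCHEDULED_DELAY", delayMs);
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```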
I think it's possible with a more advanced threading approach. Rather than relying on manual synchronization and thread management, you can always use the java.util.concurrent package.
A Future can be one way to do this; please refer to the java.util.concurrent package.
Use notifications and Object#wait() / Object#notifyAll()
i.e. multithreaded: the producer notifies the consumer.
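A plain-Java sketch of that guarded wait, with a timeout so the consumer can fall back to its special handling if the data never arrives (note this is ordinary thread coordination, not something you would normally do inside a managed EJB):

```java
// The consumer blocks until the producer publishes the data or the timeout expires.
public class DataLatch {

    private Object data; // guarded by "this"

    public synchronized void publish(Object value) {
        this.data = value;
        notifyAll(); // wake any waiting consumers
    }

    public synchronized Object awaitData(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (data == null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // caller handles the "data never showed up" case
            }
            wait(remaining);
        }
        return data;
    }
}
```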
Related
I work on a data processing application in which concurrency is achieved by putting several units of work on a message queue that multiple instances of a message driven bean (MDB) listen to. Other than achieving concurrency in this manner, we do not have any specific reason to use the messaging infrastructure and MDBs.
This led me to think why the same could not have been achieved using multiple threads.
So my question is: in what situations can asynchronous messaging (e.g. JMS) be used as an alternative to multithreading as a means to achieve concurrency? What are some advantages/disadvantages of using one approach over the other?
It can't be used as an alternative to multithreading; it is a way of implementing multithreading. There are three basic kinds of solutions here:
You are responsible for both ends of the queue;
You are responsible for sending data; or
You are responsible for receiving data.
Receiving data is the kicker here because there's really no way of doing that without some form of multithreading/multiprocessing; otherwise you'll only be processing one request at a time. Sending data without multithreading is much more viable, but there you're only really pushing the responsibility for dealing with those messages onto an external system. So it's not an alternative to multithreading.
In your case with message driven beans, the container is creating and managing threads for you so it's not an alternative to multithreading, you're simply using someone else's implementation.
There are two additional bonuses that I don't think have been mentioned: transactions and durability.
While it isn't required and quite often isn't the default configuration, JMS providers can be configured to persist messages and also to participate in an XA transaction with little or no code change.
In an EJB container, actually, there is no alternative, since you're not allowed to create your own threads in an EJB container. JMS is doing all of that work for you, at a cost of running it through the queue processor. You could also create a Java Connector, which has a more intimate relationship with the container (and thus, can have threads), but it's a lot more work.
If the overhead of using the JMS queue isn't having a performance impact, then it's the easiest solution.
Performance-wise multi-threading should be faster than any messaging, because you add an additional network layer with messaging.
Application-wise messaging helps you to avoid locking and data sharing issues as there is no common object.
From a scaling perspective messaging is a lot better, as you can simply add more nodes on several servers by configuring the message service instead of changing the application.
Messaging can greatly reduce the number of errors in multithreaded applications, since it reduces the risk of data races. It also simplifies adding new threads without changing the rest of the app.
Although I think JMS is slightly misused here; java.util.concurrent's thread-safe queues and libraries like Jetlang may give you better performance.
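As a rough sketch of that in-JVM alternative, a thread-safe queue drained by a fixed set of consumer threads; WorkItem is a placeholder for the application's own unit-of-work type:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Units of work go onto a thread-safe queue and a fixed set of consumer threads drains it.
public class InMemoryWorkQueue {

    private final BlockingQueue<WorkItem> queue = new LinkedBlockingQueue<WorkItem>();

    public void start(int consumerCount) {
        for (int i = 0; i < consumerCount; i++) {
            Thread consumer = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            queue.take().process(); // blocks until work is available
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // allow clean shutdown
                    }
                }
            });
            consumer.setDaemon(true);
            consumer.start();
        }
    }

    public void submit(WorkItem item) {
        queue.add(item);
    }

    public interface WorkItem {
        void process();
    }
}
```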
Using multi-threading you achieve concurrency by sharing the cores of one CPU. If you use JMS instead, you can balance the load and delegate tasks to other systems.
E.g. suppose your application needs to send an email on completion of a certain task, and you want to send it concurrently. You can either spawn a thread and process it asynchronously, or delegate the mail sending to another system using JMS. The number of receiver threads is configurable in JMS, and multiple nodes can listen to the same JMS queue, which balances the load. You can also use features like persistent queues or transaction-managed queues as the application requires.
In simple words, JMS can be a better alternative to multi-threading, depending on the application architecture.
Is it a good idea to use ThreadLocal as a context for data in web application?
That's what it was made for. But take care to remove the ThreadLocal value at the end of the request, or else you might run into memory leaks, or at least hold unused data for too long.
ThreadLocals are also very fast; you can think of one as a HashMap<Thread,Object> that is always queried with Thread.currentThread().
That depends on the scope of the data. The ThreadLocal will be specific to the request thread, not specific to the user's session (each request, a different request processing thread may be used). Hence it will be important to remove the data as the request processing is completing (so that it doesn't bleed over into some other user's session when that same thread services their request).
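A sketch of the cleanup both answers above describe, written as a servlet filter; RequestContext is a hypothetical per-request holder, and the important part is the remove() in the finally block so a pooled thread cannot leak one user's data into the next request:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class RequestContextFilter implements Filter {

    private static final ThreadLocal<RequestContext> CONTEXT = new ThreadLocal<RequestContext>();

    /** Accessor used by code deeper in the call stack during the same request. */
    public static RequestContext current() {
        return CONTEXT.get();
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        CONTEXT.set(new RequestContext(request));
        try {
            chain.doFilter(request, response);
        } finally {
            CONTEXT.remove(); // critical: request threads are pooled and reused
        }
    }

    public void init(FilterConfig filterConfig) { }

    public void destroy() { }

    public static class RequestContext {
        private final ServletRequest request;
        RequestContext(ServletRequest request) { this.request = request; }
        public ServletRequest getRequest() { return request; }
    }
}
```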
If you are completing a request/response pair with a single thread, then it works fine in my experience. However, "event driven" webapps are coming into vogue with the rise of Ajax and high-performance containers. These event-driven models allow a request thread to be returned to its thread pool, e.g. during I/O events, so that the thread is not occupied waiting for an external service call to return. As a result, a single logical request may be serviced by multiple different threads. Event-driven architecture, coupled with NIO on the server side, can yield highly improved throughput.
With that said, if your application doesn't have this architecture, it seems reasonable to me.
If you're not familiar with this model, take a look at Tomcat 6's "comet" and Jetty 6's Continuations. These are vendor-specific implementations of asynchronous I/O pending official Servlet 3.0 support. Note that Tomcat 7 claims to be fully 3.0 compliant now.
ThreadLocal in a multithreaded program is much the same as a static/global in a non-threaded program. That is to say, use of ThreadLocal is an abomination.
In general I would say no. Use frameworks to do that for you.
In the web-tier of a web application use the session context (or other on top framework specific contexts) to store data and state over request scope.
If you introduce a business layer it should not be dependent on a specific web context, of course. Spring and Java EE provide solutions for security, transactions, and persistence as a context.
If you touch this manually you should be really careful; it can lead to cleanup problems, memory leaks, and strange bugs...
The application has a long, CPU-intensive process that currently runs serially on one server (an EJB method) when the client requests it.
It’s theoretically possible (from a conceptual point of view) to split that process in N chunks and execute them in parallel, as long as the output of all parallel jobs can be collected and joined together before sending it back to the client that initiated the process. I’d like to use this parallelization to optimize performance.
How can I implement this parallelization with EJBs? I know that we should not create threads in an EJB method. Instead, we should publish messages (one per job) to be consumed by message-driven beans (MDBs). But then it would not be a synchronous call anymore, and being synchronous seems to be a requirement in this case, since I need to collect the output of all jobs before sending it back to the client.
Is there a solution for this?
There are all sorts of ways to do this.
One, you can use an EJB Timer to create a run-once process that will start immediately. This is a good technique to spawn processes in the background. An EJB Timer is associated with a specific Session Bean implementation. You can either add an EJB Timer to every Session Bean that you want to be able to do this, or you can have a single Session Bean that then calls your application logic through some dispatch mechanism.
For me, I pass a serializable blob of parameters along with a class name that meets a specific interface to a generic Session Bean that then executes the class. This way I can easily background most anything.
One caveat about the EJB Timer is that EJB Timers are persistent. Once you create an EJB Timer, it stays in the container until its job is finished or canceled. The gotcha is that if you have a long-running process and the server goes down, when it restarts the process will continue and pick back up. Mind you, this can be a good thing, but only if your process is prepared to be restarted. If you have a simple process iterating through "10,000 items" and the server goes down on item 9,999, when it comes back up you can easily see it simply starting over at item 1. It's all workable, just a caveat to be aware of.
Another way to background something is to use a JMS queue. Put a message on the queue, and the handler runs asynchronously from the rest of your application.
The clever part here, and something I have also done leveraging the work with the Timer Bean, is that you can control how many "jobs" will run based on how many MDB instances you configure the system to have.
So, for the specific task of running a process in multiple, parallel chunks, I take the task, break it up in to "pieces", and then send each piece on the Message Queue, where the MDBs execute them. If I allow 10 instances of the MDB, I can have 10 "parts" of any task running simultaneously.
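A hedged sketch of that scatter step: Chunk and the JNDI resource names are made up for illustration, and the MDBs listening on the queue do the actual work, correlating results by the jobId property.

```java
import java.io.Serializable;
import java.util.List;
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;

// Each chunk of the job becomes one message; however many MDB instances the
// container allows then process chunks in parallel.
@Stateless
public class JobScatterBean {

    @Resource(mappedName = "jms/ConnectionFactory")
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "jms/WorkQueue")
    private Queue workQueue;

    public void scatter(String jobId, List<Chunk> chunks) throws Exception {
        Connection connection = connectionFactory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(workQueue);
            for (Chunk chunk : chunks) {
                ObjectMessage message = session.createObjectMessage(chunk);
                message.setStringProperty("jobId", jobId); // lets a gatherer correlate results
                producer.send(message);
            }
        } finally {
            connection.close();
        }
    }

    public interface Chunk extends Serializable { }
}
```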
This approach actually works surprisingly well. There's a little overhead in splitting the process up and routing it through the JMS queue, but that's basically all "start-up time". Once it gets going, you get a real benefit.
Another benefit of using the Message Queue is you can have your actual long running processes executing on a separate machine, or you can readily create a cluster of machines to handle these processes. Yet, the interface is the same, and the code doesn't know the difference.
I've found that once you've relegated a long-running process to the background, you can pay the price of having less-than-instant access to it. That is, there's no reason to monitor the executing classes themselves directly; just have them publish interesting information and statistics to the database, or JMX, or whatever, rather than having something that monitors the object directly because it shares the same memory space.
I was easily able to set up a framework that lets tasks run either on the EJB Timer or on the MDB scatter queue; the tasks are the same, and I could monitor their progress, stop them, etc.
You could combine the scatter technique to create several EJB Timer jobs. One of the free advantages of the MDB is it acts as a thread pool which can throttle your jobs (so you don't suddenly saturate your system with too many background processes). You get this "for free" just by leveraging the EJB management features in the container.
Finally, Java EE 6 has a new @Asynchronous qualifier for Session Bean methods. I do not know the details of how this works, as I've yet to play with a new Java EE 6 container. But I imagine you're probably not going to want to change containers just for this facility.
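For reference, a hedged sketch of how that EJB 3.1 @Asynchronous facility can express the scatter/gather while the caller stays synchronous; Chunk and the trivial sum are placeholders for the real computation and merge step (each class would live in its own source file):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.EJB;
import javax.ejb.Stateless;

// @Asynchronous methods run on container-managed threads and return a Future.
@Stateless
public class ParallelWorker {

    @Asynchronous
    public Future<Long> process(Chunk chunk) {
        long partial = chunk.compute(); // the CPU-intensive part
        return new AsyncResult<Long>(partial);
    }
}

@Stateless
public class ScatterGatherBean {

    @EJB
    private ParallelWorker worker;

    public long runInParallel(List<Chunk> chunks) throws Exception {
        List<Future<Long>> futures = new ArrayList<Future<Long>>();
        for (Chunk chunk : chunks) {
            futures.add(worker.process(chunk)); // returns immediately
        }
        long total = 0;
        for (Future<Long> future : futures) {
            total += future.get(); // blocks until that chunk finishes
        }
        return total; // the merged result goes back to the synchronous client
    }
}

interface Chunk extends Serializable {
    long compute();
}
```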
This particular question has come up on multiple occasions, and I will summarize that there are several possible solutions, only one of which I would recommend.
Use a WorkManager from the commonj API. It allows for managed threads in a Java EE container and is specifically designed to fit your use case. If you are using WebSphere or WebLogic, these APIs are already available in your server. For others you will have to bring in a third-party implementation yourself.
WorkManager info
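A hedged sketch of what using the commonj WorkManager can look like; the JNDI name wm/default and the fan-out/wait pattern are assumptions that depend on how the server is configured:

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.InitialContext;
import commonj.work.Work;
import commonj.work.WorkItem;
import commonj.work.WorkManager;

public class WorkManagerExample {

    public void runInParallel(List<Runnable> jobs) throws Exception {
        // Server-specific JNDI name; configured as a resource reference on WebSphere/WebLogic.
        WorkManager workManager =
                (WorkManager) new InitialContext().lookup("java:comp/env/wm/default");

        List<WorkItem> items = new ArrayList<WorkItem>();
        for (final Runnable job : jobs) {
            items.add(workManager.schedule(new Work() {
                public void run() { job.run(); }          // executes on a managed thread
                public boolean isDaemon() { return false; }
                public void release() { /* asked to stop early */ }
            }));
        }
        // Block until every piece of work has completed.
        workManager.waitForAll(items, WorkManager.INDEFINITE);
    }
}
```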
Related questions
Why Spawning threads is discouraged
An EJB is ultimately a transactional component for a client-server system providing request/reply semantics. If you find yourself in the position of needing to pigeonhole a long-running transaction within the bounds of a request/reply cycle, then somewhere your system architect(ure) has taken a wrong turn.
The situation you describe is cleanly and correctly handled by an event-based architecture with a messaging back end. The initial event initiates the process (which can then be trivially parallelized by having the workers subscribe to the event topic), and the aggregating process itself raises an event on its completion. You can still squeeze this sequence within the bounds of a request/reply cycle, but you will by necessity violate the letter and spirit of the Java EE system architecture specs.
Back to the future: Java EE 7 has a lot more concurrency support via ManagedThreadFactory, ManagedExecutorService, etc. (JSR 236: Concurrency Utilities for Java EE), with which you can create your own "managed" threads. It is no longer taboo in EE application servers that support it (WildFly?), using the ManagedThread* APIs.
More details
https://jcp.org/aboutJava/communityprocess/ec-public/materials/2013-01-1516/JSR236-EC-F2F-Jan2013.pdf
http://docs.oracle.com/javaee/7/tutorial/doc/concurrency-utilities002.htm
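As a brief illustration of those Java EE 7 utilities, a sketch using an injected ManagedExecutorService (the container's default instance); the Callable tasks are placeholders for the application's own work:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.enterprise.concurrent.ManagedExecutorService;

// Tasks run on container-managed threads that still see the component's context.
@Stateless
public class ManagedConcurrencyBean {

    @Resource // injects java:comp/DefaultManagedExecutorService
    private ManagedExecutorService executor;

    public long sumInParallel(List<Callable<Long>> tasks) throws Exception {
        List<Future<Long>> futures = executor.invokeAll(tasks); // fan out on managed threads
        long total = 0;
        for (Future<Long> future : futures) {
            total += future.get(); // gather the partial results
        }
        return total;
    }
}
```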
I once participated in a project where EJB transactions ran for up to 5 hours at a time. Aargh!
This same application also had a BEA specialist consultant who approved of them starting additional threads from the transactions. While it's discouraged in the specs and elsewhere, it doesn't automatically result in failure. You need to be aware that your extra threads are outside the container's control, and thus if something goes wrong it's your fault. But if you can assure that the number of threads started in the worst case doesn't exceed reasonable limits, and that they all terminate cleanly within reasonable time, then it is quite possible to work like this. In fact, in your case it sounds like almost the only solution.
There are some slightly esoteric solutions possible where your EJB app reaches out to another app for a service, which then does the multithreading in itself before returning to the EJB caller. But this is essentially just shifting the problem around.
You may, however, consider a thread pooling solution to keep an upper limit on the number of threads spawned. If you have too many threads your application will behave horribly.
You've analyzed the situation quite well, and no, there is no pattern for this that fits the EJB model.
Creating threads is mainly forbidden because it bypasses the application server's thread-management strategy, and also because of transactions.
I worked on a project with similar requirements and decided to spawn additional threads (going against the spec). The operation to be parallelized was read-only, so it worked with regard to transactions (the threads basically had no transaction associated with them). I also knew that I wouldn't spawn too many threads per EJB call, so the number of threads was not an issue. But if your threads are supposed to modify data, you seriously break the transactional model of the EJB. If your operation is pure computation, it might be OK.
Hope it helps...
I am looking for a lightweight messaging framework in Java. My task is to process events in a SEDA manner: I know that some stages of the processing complete quickly and others do not, and I would like to decouple these stages of processing.
Let's say I have components A and B, and a processing engine (be it a container or whatever else) invokes component A, which in turn invokes component B. I do not care if the execution time of component B is 2 s, but I do care that the execution time of component A stays below, for example, 50 ms. Therefore, it seems most reasonable for component A to submit a message to B, which B will process at the desired time.
I am aware of the various JMS implementations and Apache ActiveMQ: they are too heavyweight for this. I searched for some lightweight messaging (with really basic features like message serialization and the simplest routing) to no avail.
Do you have anything to recommend in this issue?
Do you need any kind of persistence (e.g. if your JVM dies in between processing thousands of messages) and do you need messages to traverse to any other JVMs?
If it's all in a single JVM and you don't need to worry about transactions, recovery, or message loss if a JVM dies, then, as Chris says above, Executors are fine.
ActiveMQ is pretty lightweight; you can use it in a single JVM only, with no persistence if you want, and then enable transactions / persistence / recovery / remoting (working with multiple JVMs) as and when you need it. But if you need none of these things then it's overkill; just use Executors.
Incidentally, another option, if you are not sure which steps might need persistence/reliability or load balancing across multiple JVMs, would be to hide the use of middleware completely so you can switch between in-memory SEDA queues with executors and JMS/ActiveMQ as and when you need to.
E.g. it might be that some steps need to be reliable and recoverable (so needing some kind of persistence) and others don't.
Really lightweight? Executors. :-) So you set up an executor (B, in your description), and A simply submits tasks to the executor.
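For example, a minimal hand-off under those assumptions, where ComponentB.handle() stands in for the slow stage:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A (fast) hands events to B (slow) via an executor, so A returns immediately
// while B runs whenever a pool thread is free.
public class StageBoundary {

    public interface ComponentB {
        void handle(Object event);
    }

    private final ComponentB componentB;
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    public StageBoundary(ComponentB componentB) {
        this.componentB = componentB;
    }

    public void submit(final Object event) {
        executor.execute(new Runnable() {
            public void run() {
                componentB.handle(event); // the 2-second stage, off A's call path
            }
        });
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```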
I think Apache Camel covers all your needs. It works within the JVM and supports the SEDA style (http://camel.apache.org/seda.html) and simple routing. It can be used on its own, or with Spring, a JMS provider, or other adapters.
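A small sketch of what such a SEDA route can look like with Camel's Java DSL; the endpoint name and ComponentB are assumptions of this example:

```java
import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

// Component A sends to an in-memory "seda:" endpoint; component B consumes it
// on the endpoint's own threads, so A stays fast regardless of how slow B is.
public class SedaExample {

    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                from("seda:stageB?concurrentConsumers=4").bean(new ComponentB());
            }
        });
        context.start();

        ProducerTemplate producer = context.createProducerTemplate();
        producer.sendBody("seda:stageB", "event payload"); // A returns immediately
    }

    public static class ComponentB {
        public void process(String body) {
            // slow work happens here, decoupled from A
        }
    }
}
```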
Sorry for resurrecting an old thread, but maybe it helps somebody else reading it... I think FFMQ is a good candidate for a lightweight messaging framework.
UPDATE: however, I'm not sure if it supports redelivery delays (the dead-letter-queue problem). I would find this useful even for lightweight providers. But I guess it could be done with a combination of a MessageSelector query and message properties.
To help somebody else reading this thread:
One of the lightest messaging frameworks is MBassador.
MBassador is a very lightweight message (event) bus implementation following the publish/subscribe pattern. It is designed for ease of use and aims to be feature-rich and extensible while preserving resource efficiency and performance.
The core of MBassador's high performance is a specialized data structure that minimizes lock contention such that performance degradation of concurrent access is minimal.
Features: Declarative listener definition via annotations, sync and/or async event delivery, weak-references, message filtering