How can an EJB parallelize a long, CPU intensive process?

How can an EJB parallelize a long, CPU intensive process? - java

The application has a CPU intensive long process that currently runs on one server (an EJB method) serially when the client requests it.
It’s theoretically possible (from a conceptual point of view) to split that process in N chunks and execute them in parallel, as long as the output of all parallel jobs can be collected and joined together before sending it back to the client that initiated the process. I’d like to use this parallelization to optimize performance.
How can I implement this parallelization with EJBs? I know that we should not create threads in a EJB method. Instead, we should publish messages (one per job) to be consumed by message driven beans (MDBs). But then it would not be a synchronous call anymore. And being synchronous seems to be a requirement in this case since I need to collect the output of all jobs before sending it back to the client.
Is there a solution for this?

There are all sorts of ways to do this.
One, you can use an EJB Timer to create a run-once process that will start immediately. This is a good technique to spawn processes in the background. A EJB Timer is associated with a specific Session Bean implementation. You can either add an EJB Timer to every Session Bean that you want to be able to do this, or you can have a single Session Bean that can then call your application logic through some dispatch mechanism.
For me, I pass a serializable blob of parameters along with a class name that meets a specific interface to a generic Session Bean that then executes the class. This way I can easily background most anything.
One caveat about the EJB Timer is that EJB Timers are persistent. Once you create an EJB Timer is stays in the container until its job is finished or canceled. The gotcha on this is that if you have a long running process, and the server goes down, when it restarts the process will continue and pick back up. Mind this can be a good thing, but only if your process is prepared to be restarted. But if your have a simple process iterating through "10,000 items", if the server goes down on item 9,999, when it comes back up you can easily see it simply starting over at item 1. It's all workable, just a caveat to be aware of.
Another way to background something is you can use a JMS queue. Put a message on the queue, and the handler runs aysnchronously from the rest of your application.
The clever part here, and something I has also done leveraging the work with the Timer Bean, is you can control how many "jobs" will run based on how many MDB instances you configure the system to have.
So, for the specific task of running a process in multiple, parallel chunks, I take the task, break it up in to "pieces", and then send each piece on the Message Queue, where the MDBs execute them. If I allow 10 instances of the MDB, I can have 10 "parts" of any task running simultaneously.
This actually works surprisingly well. There's a little overhead it splitting the process up and routing it through the JMS queue, but that's all basically "start up time". Once it gets going, you get a real benefit.
Another benefit of using the Message Queue is you can have your actual long running processes executing on a separate machine, or you can readily create a cluster of machines to handle these processes. Yet, the interface is the same, and the code doesn't know the difference.
I've found once you've relegated a long running process to the background, you can pay the price of having less-that-instant access to that process. That is, there's no reason to monitor the executing classes themselves directly, just have them publish interesting information and statistic to the database, or JMX, or whatever rather than having something that can monitor the object directly because it shares the same memory space.
I was easily able to set up a framework that lets task run either on the EJB Timer or on the MDB scatter queue, the tasks are the same, and I could monitor their progress, stop them, etc.
You could combine the scatter technique to create several EJB Timer jobs. One of the free advantages of the MDB is it acts as a thread pool which can throttle your jobs (so you don't suddenly saturate your system with too many background processes). You get this "for free" just by leveraging the EJB management features in the container.
Finally, Java EE 6 has a new "asynchronous" (or something) qualifier for Session Bean methods. I do not know the details on how this works, as I've yet to play with a new Java EE 6 container. But I imagine you're probably not going to want to change containers just for this facility.

This particular question has come up on multiple occasions and I will summarize that there are several possible solutions, only 1 of which I would recommend.
Use a WorkManager from the commonj API. It allows for managed threads in a Java EE container and is specifically designed to fit your use case. If you are using WebSphere or WebLogic, these API's are already available in your server. For others your will have to put a third party solution in yourself.
WorkManager info
Related questions
Why Spawning threads is discouraged

An EJB is a ultimately a transactional component for a client-server system providing request/reply semantics. If you find yourself in the position that you need to pigeonhole a long-running transaction within the bounds of a request/reply cycle, then somewhere your system architect(ure) has taken the wrong turn.
The situation you describe is cleanly and correctly handled by an event based architecture with a messaging back end. Initial event initiates the process (which can then be trivially parallelized by having the workers subscribe to the event topic) and the aggregating process itself raises an event on its completion. You can still squeeze these sequence within the bounds of a request/reply cycle, but you will by necessity violate the letter and spirit of the Java EE system architecture specs.

Back to the Future - Java EE 7 has lot more Concurrency support via ManagedThreadFactory, ManagedExecutor service etc (JSR 236: Concurrency Utilities for Java EE) with which you can create your own 'managed'Threads .It is no longer a taboo in EE AS supporting it (Wildfly ?) via usining the ManagedThread* API's
More details
https://jcp.org/aboutJava/communityprocess/ec-public/materials/2013-01-1516/JSR236-EC-F2F-Jan2013.pdf
http://docs.oracle.com/javaee/7/tutorial/doc/concurrency-utilities002.htm

I once participated in a project where EJB transactions ran for up to 5 hours at a time. Aargh!
This same application also had a BEA specialist consultant who approved that they started additional threads from the transactions. While it's disrecommended in the specs and elsewhere, it doesn't automatically result in failure. You need to be aware that your extra threads are outside the container's control and thus if something goes wrong it's your fault. But if you can assure that the number of threads started in the worst case doesn't exceed reasonable limits, and that they all terminate cleanly within reasonable time, then it is quite possible to work like this. In fact, in your case it sounds like the almost-only solution.
There are some slightly esoteric solutions possible where your EJB app reaches out to another app for a service, which then does the multithreading in itself before returning to the EJB caller. But this is essentially just shifting the problem around.
You may, however, consider a thread pooling solution to keep an upper limit on the number of threads spawned. If you have too many threads your application will behave horribly.

You've analyzed the situation quite well, and no, there is not patern for this that match the EJB model.
Creating threads is mainly forbidden because it bypass the app. server thread management strategy and also because of the transactions.
I worked on a project with similar requireements and I decided to spawn additional threads (going against the sepc then). The operation to parallelized was read-only, so it worked regarding the transaction (the thread would basically have not transaction associated to them). I also knew that I wouldn't spawn too many threads per EJB calls, so the number of threads was not an issue. But if your threads are supposed to modify data, then you break the transactional model of the EJB seriously. But if your operation in pure computing, that might be ok.
Hope it helps...

Related

How to delay processing reasonably in Java EE context?

Within a Java EE 5 environment I have the problem to ensure the existence of some data written by another part before continue processing my own data.
Historically (J2EE time), it was done by putting the data object to be processed into an internal JMS queue after waiting for e.g. 500ms via Thread.sleep.
But this does not feel like the best way to handle that problem, so I have 2 questions:
Is there any problem with using the sleep method within an Java EE context?
What is a reasonable solution to delaying some processing within an Java EE 5 application?
Edit:
I should have mentioned, that my processing takes place while handling objects from a JMS queue via an MDB.
And it may be the case, that the data for which I'm waiting never shows up, so there must be some sort of timeout, after which I can do some special processing with my data.

You can use EJB TimerService feature. Using threads in a managed environment should be avoided.

I agree with #dkaustubh about timers and avoiding threads manipulation in JavaEE.
Another possibility is to use JMS queue with delayed delivery. Although it is not a part of JavaEE API, most of messaging systems vendors supports it. check here.

I think, its possible with some advanced Threading approach. More than thinking on manual synchronizations and thread management, you can always use the Java Concurrent package.
Future can be one of the ways to do this. Please refer to Java Concurrent package.

Use notifications and Object#wait() / Object#notifyAll()
i.e. Multithreaded, the producer notifies the consumer.

Good or bad idea: Multiple threads in a multi-user servlet-based web application with Java

I am currently building a java-servlet-based web application that should offer its service to quite a lot of users (don't ask me how much "a lot" is :-) - I don't know yet).
However, while the application is being used, there might occur some long-taking processing on the serverside.
In order to avoid bad UI responsiveness, I decided to move these processing operations into their own threads.
This means that once a user is logged in, it can happen that 1-10 threads run in the background (per user!).
I once heard that using multiple threads in a web application is a "bad idea".
Is this true and if yes: Why?
Update: I forgot to mention that my application heavily relies on ajax calls. Every user action causes a new ajax call. So, when the main servlet thread is busy, the ajax call takes very long to process. That's why I want to use multiple threads.

It is a bad idea to manually create the threads yourself. This has been discussed a lot here in SO. See this question for example.
Another question discusses alternative solutions.

The "bad idea" isn't multiple threads. Java EE was originally written so multi-threading was in the hands of the app server, so users were discouraged from starting their own threads.
I think what you really want is asynchronous processing for long-running tasks so users won't have to wait for them to finish before proceeding.
You could do that with JMS and stay within the lines in the Java EE coloring book. I think that it's safer to do on your own, now that there are new classes and constructs in the java.util.concurrent package.
It's still not an easy thing to do. Multi-threaded code isn't trivial. But I think it's easier than it used to be in Java.
Part of the problem might be that you're asking that servlet to do too much. Servlets should listen for HTTP request and orchestrate getting a response from other classes, not do all the processing themselves. Perhaps your servlet is telling you that it's time to refactor a bit. This will help your testing, since you'll be able to unit test those asynch classes without having a servlet/JSP engine running.
AJAX calls to services via HTTP need not block. If the service can return a token, a la FedEx, that tells the app when and how to get the response, there's no reason why the service can't process asynchronously. It's an implementation detail for the services that you should hide from clients.

1.
Brilliant Idea.
It's not common, but it's nothing wrong.
If you think asynchronous tasks are needed for better user experiences. Just use it.
2.
You need to be careful with it.
2.1.
Creating and destroying threads add a lot of overhead to your server.
You'd better use a executor, like java.util.concurrent.ThreadPoolExecutor.
2.2.
Don't just use Executors.newFixedThreadPool(). It is for beginners and hides dangerous details.
You need to know the edge behavior of ThreadPoolExecutor, and configure it properly.
How much threads are enough for your task? You need to calculate it out.
What would happen if there is no free theads in your pool? Different configurations can make it wait, cache, or abandon new tasks. What should you expect?
What would happen if a task runs for too long(such as an infinite loop)? There is no real timeout-and-exit mechanism in java. How do you prevent these.

If the application requires it, then I say go ahead and do the background threads, but, since you don't know how many users you will have, you are taking a great risk that you will overwhelm your server. You might consider some alternatives, if they will work in your situation. Can you run the background tasks completely offline, e.g. in a batch job? Can you limit the number of threads that each logged in user will need? How will you get the results of the background threads back to the user?

This is a bad idea for three main reasons:
Excessive number of running threads can kill system resources and cause some strange things such as starvation and priority inversion. Often this can be solved with a thread pool.
User session duration is unpredictable. The user can fire an action and go for a coffee, or he/she might complain for the delay an redo the action. This can cause creation of multiple background jobs, so requires complex control, and when we talk about threads, we never know for sure if we didn't left race conditions or unantecipated scenarios.
Most likely servlets will have some interaction with the threads. Now suppose your application needs to be scaled, so you use a clustered container (after all, you have "a lot" of users). The container can passivate a session and restore it in another node. But your threads will remain in the initial node, so the link between session and threads will be broken. This ends in unexpected exceptions and error 500 - server failure.
I think the best solution is to design your application so that it won't create so many background threads.
But if you insist or really need it, try using Java EE message driven beans (MDBs) and make your servlet invoke it using JMS, like #duffymo said.
The challenge is how to make communication between MDBs and user sessions. Perhaps your servlet can create a JMS queue or topic and send it to MDBs for them to reply, but I don't know if the servlet side of JMS connection can be passivated and restored.
Another forms of communication would be JNDI or an external database or file, but this requires polling, which might be unresponsive or CPU-excessive.

Java : Creating a Thread Pool in the Server application

I want to create a ThreadPool for a series of database calls(serial). We want to save those milliseconds. So we don't want to waste time in executing the database queries in serial.
I'm working on a server application which already have many parallel nodes. In one of those nodes there are a series of database calls. I want to introduce parallelism inside a node that already running in parallel with other nodes.
Is thread pool executor a good choice? I don't know how many queries I'll be running. It depends on the state of the request object. So I can't fix the queue size of the thread pool.
This is the example that I have found.
Is this efficient? Is there any other alternative? Any suggestions will be appreciated.

Spawning your own threads in a Java EE environment is usually a bad idea. Sometimes it has to be done, but you shouldn't do it if there's an alternative. I'm not sure exactly what you're trying to do, and what version of Java EE you're on, but if it's 6, then maybe you could use an asynchronous EJB.

The standard solution for your problem is using JMS. Each query should be wrapped into command. Command should be sent as JMS message to queue. MDB (message driven bean) should receive them message and perform query asynchronously.
This approach has yet another advantage: if your are working with several physical servers the work will be distributed among them, so the system will be more robust.

Is it a good idea to use ThreadLocal as a context for data?

Is it a good idea to use ThreadLocal as a context for data in web application?

That's what it was made for. But take care to remove the ThreadLocal on the end of the context, else you might run in memory leaks, or at least hold unused data for too long.
ThreadLocals are also very fast, you can think of it as a HashMap<Thread,Object>, which is always queried with Thread.getCurrentThread().

That depends on the scope of the data. The ThreadLocal will be specific to the request thread, not specific to the user's session (each request, a different request processing thread may be used). Hence it will be important to remove the data as the request processing is completing (so that it doesn't bleed over into some other user's session when that same thread services their request).

If you are completing a request/response pair with a single thread, then it works fine in my experience. However, "event driven" webapps are coming into vogue with the rise of ajax and high performance containers. These event driven models allow for a request thread to be returned to their thread pool, e.g. during I/O events, so that the thread is not occupied waiting for an external service call to return. As a result, a single logical request may be serviced by multiple different threads. Event driven architecture, coupled with NIO on the server side can yield highly improved throughput.
With that said, if your application doesn't have this architecture, it seems reasonable to me.
If you're not familiar with this model, take a look at Tomcat 6's "comet" and Jetty 6's Continuations. These are vendor-specific implementations of asynchronous I/O pending official Servlet 3.0 support. Note that Tomcat 7 claims to be fully 3.0 compliant now.

ThreadLocal in a multithreaded program is much the same as a static/global in a non-threaded program. That is to say, use of ThreadLocal is an abomination.

In general I would say no. Use frameworks to do that for you.
In the web-tier of a web application use the session context (or other on top framework specific contexts) to store data and state over request scope.
If you introduce a business layer it should not be dependent on a specific web-context of course. spring and Java EE provide solutions for security, transactions and persistence as a context.
If you touch this manually you should be really careful; it can lead to cleanup problems, memory leaks, and strange bugs...

JMS alternative? something for decoupling sending emails from http reqs

we have a web application that does various things and sometimes emails users depending on a given action. I want to decouple the http request threads from actually sending the email in case there is some trouble with the SMTP server or a backlog. In the past I've used JMS for this and had no problem with it. However at the moment for the web app we're doing JMS just feels a bit of an over kill right now (in terms of setup etc) and I was wondering what other alternative there are out there.
Ideally I just like something that I can run in-process (JVM/Tomcat), but when the servlet context is unloaded any pending items in the queue would be swapped to disk/db. I could of course just code something together involving an in memory Q, but I'm looking to gain the benfit of opensource projects, so wondering whats out there if anything.
If JMS really is the answer anyone know of somethign that could fit our simple requirements.
thanks

I'm using JMS for something similar. Our reasons for using JMS:
We already had a JMS server for something else (so it was just adding a new queue)
We wanted our application be decoupled from the processing process, so errors on either side would stay on their side
The app could drop the message in a queue, commit, and go on. No need to worry about how to persist the messages, how to start over after a crash, etc. JMS does all that for you.

I would think spring integration would work in this case as well.
http://www.springsource.org/spring-integration

Wow, this issue comes up a lot. CommonJ WorkManagager is what you are looking for. A Tomcat implementation can be found here. It allows you to safely create threads in a Java EE environment but is much lighter weight than using JMS (which will obviously work as well).

Beyond JMS, for short messages you could also use Amazon Simple Queue Service (SQS).
While you might think it an overkill too, consider the fact there's minimal maintenance required, scales nicely, has ultra-high availability, and doesn't cost all that much.
No cost for creating new queues etc; or having account. As far as I recall, it's purely based on number of operations you do (sending messages, polling/retrieving).
Main limitation really is the message size (there are others, like not guaranteeing ordering due to distributed nature etc); but that might work as is. Or for larger messages, using related AWS service, s3, for storing actual body, and just passing headers through SQS.

You could use a scheduler. Have a look at Quartz.
The idea is that you schedule a job to start at regular intervals. All requests need to be persisted somewhere. The scheduled job will read them and process them. You need to define the interval between two subsequent jobs to fit your needs.
This is the recommended way of doing things. Full-fledged application servers offer Java EE Timers for this, but these aren't available in Tomcat. Quartz is fine though and you could avoid starting your own threads, which will cause mess in some situations (e.g. in application updates).

I agree that JMS is overkill for this.
You can just send the e-mail in a separate thread (i.e. separate from the request handling thread). The only thing to be careful about is that if your app gets any kind of traffic at all, you may want to use a thread pool to avoid resource depletion issues. The java.util.concurrent package has some nice stuff for thread pools.

Since you say the app "sometimes" emails users it doesn't sound like you're talking about a high volume of mail. A quick and dirty solution would be to just Runtime.getRuntime().exec():
sendmail recipient#domain.com
and dump the message into the resulting Process's getOutputStream(). After that it's sendmail's problem.
Figure a minute to see if you have sendmail available on the server, about fifteen minutes to throw together a test if you do, and nothing to install assuming you found sendmail. A few more minutes to construct the email headers properly (easy - here are some examples) and you're done.
Hope this helps...

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.