Java API or framework for queue processing

I need an open-source Java API or framework for processing items in a queue. I could develop something myself, but I don't want to reinvent the wheel (and I don't have much experience with multi-threading). Is there such a thing?
The closest solution I can think of is a business process management (BPM) solution.
Right now I am using multiple Quartz jobs to process the items in my queue. It is not really working out because of scalability and concurrency issues.

Sounds like you'd want to use an Executor.
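For instance, a minimal sketch of that idea (the item type and the process() body are placeholders, not from the question):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class QueueProcessor {
        // A fixed pool of worker threads; the executor's internal work queue
        // holds submitted items until a thread is free.
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        public void submit(String item) {       // "item" stands in for whatever you enqueue
            pool.submit(() -> process(item));   // each item is processed on a pool thread
        }

        private void process(String item) {
            // application-specific work goes here
            System.out.println("processed " + item);
        }

        public void shutdown() {
            pool.shutdown();   // stop accepting new work, finish what's already queued
        }
    }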

A queue of what sort? How many items? Is Quartz not working out because it's too big or too small?
I'd give some serious thought to using message queues in something like OpenMQ.

You can use JMS with ActiveMQ to build an optimized queue system as well as an ESB. If you want a workflow-based system, then tpdi is right: use JBoss jBPM.
You can also process JMS messages with a thread pool; in that case, you can use Executors.
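A rough sketch of that combination, assuming ActiveMQ with a made-up broker URL and queue name; note that with AUTO_ACKNOWLEDGE the message is acknowledged as soon as onMessage returns, i.e. before the pool has finished processing it:

    import javax.jms.*;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class JmsWorker {
        public static void main(String[] args) throws JMSException {
            ExecutorService pool = Executors.newFixedThreadPool(8);

            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://localhost:61616"); // broker URL is an assumption
            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("items"));

            // The JMS delivery thread only hands work off; the pool does the processing.
            consumer.setMessageListener(message -> pool.submit(() -> handle(message)));
            connection.start();
        }

        private static void handle(Message message) {
            // application-specific processing of the message goes here
        }
    }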

Would the actor model fit your process? It's based around the idea of actors asynchronously passing messages to one another. So you can set up a simple state machine to model your process and have all the transitions handled concurrently.
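As a purely illustrative sketch using Akka's classic Java API (the actor and message names are invented):

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;

    public class ItemProcessor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    // each message is handled one at a time per actor, so no explicit locking
                    .match(String.class, item -> getSender().tell("done: " + item, getSelf()))
                    .build();
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("queue");
            ActorRef processor = system.actorOf(Props.create(ItemProcessor.class), "processor");
            processor.tell("item-1", ActorRef.noSender());   // fire-and-forget; delivery is asynchronous
        }
    }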

You need to determine whether the problem is in the framework you are using or in your code. I suggest you measure how fast your application is running and how fast your framework will go if it's not doing anything at all (just passing trivial tasks around). You should be able to perform between 100K and 1 million tasks per second using an in-process framework. Even using JMS you should be able to achieve 10K messages per second. If you need to get closer to 10 million tasks per second, I suggest grouping your tasks together so each task does more work.
I would be very surprised if your framework were the bottleneck, in which case I would suggest using an Executor.
If the framework isn't the cause of your scalability and concurrency issues (which is more likely), you need to restructure your code so it can run for longer periods of time without interdependencies. In other words, you have to fix your code; a framework won't do that for you.

I know it is 5 years late, but this might help someone else who ends up at this question.
Nowadays there is http://queues.io, and it lists a whole lot of queueing (and messaging) frameworks...


Parallel job execution with split-and-aggregate in Java

We are working on a rewrite of an existing application and need support for a high number of reads/writes to the database. For this, we are proceeding with sharding on MySQL. Since we allow bulk APIs for reads/writes, this means parallel execution of queries on different shards.
Can you suggest frameworks that would support this in Java, focusing mainly on split-and-aggregate jobs? Basically, I will define two interfaces, ReadTask and WriteTask; implementations of these tasks will be jobs, and they would be submitted as a list for parallel execution.
I might not have phrased this question in the best way, but I hope you get the context from the description. Let me know if any more information is needed to answer.
BLUF: This sounds like a common processing pattern in Akka.
This sounds like a Scatter-Gather patterned API.
If you have one job, you should first determine whether that job will touch only one shard or several. If it will touch many shards, you may choose to reject it (allowing only single-shard actions) or you may choose to break it up (scatter it) across other workers.
Akka gives you APIs, especially the Streams API, that address this style of work. Akka is best expressed in Scala, but it has a Java API that gives you all the functionality of the Scala one. Since you are talking about "mapping" and "reducing" (or "folding") data, these are functional operations, and Scala gives you the functional idioms.
If you scatter it across other workers, you'll need to communicate the manifest of jobs to the gather side of the system.
Hope that's helpful.
You can use ThreadPoolExecutor and the Executors factory in Java to create thread pools to which you can submit your read and write tasks. They accept both Runnable and Callable, depending on your situation.
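For instance, a rough sketch of the scatter-gather step with a plain ExecutorService (the String result type and pool size are placeholders; your ReadTask implementations would be the Callables):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ScatterGatherRead {
        private final ExecutorService pool = Executors.newFixedThreadPool(16);

        // Scatter: one Callable per shard. Gather: wait for all Futures and merge the results.
        public List<String> readAll(List<Callable<List<String>>> perShardReads)
                throws InterruptedException, ExecutionException {
            List<Future<List<String>>> futures = pool.invokeAll(perShardReads); // blocks until all finish
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get());   // aggregate the per-shard results
            }
            return merged;
        }
    }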

How to delay processing reasonably in Java EE context?

Within a Java EE 5 environment I have to ensure that some data written by another component exists before I continue processing my own data.
Historically (in J2EE times), this was done by putting the data object to be processed into an internal JMS queue after waiting for e.g. 500 ms via Thread.sleep.
But this does not feel like the best way to handle the problem, so I have two questions:
Is there any problem with using the sleep method within a Java EE context?
What is a reasonable way to delay some processing within a Java EE 5 application?
Edit:
I should have mentioned that my processing takes place while handling objects from a JMS queue via an MDB.
Also, it may be the case that the data I'm waiting for never shows up, so there must be some sort of timeout, after which I can do some special processing with my data.
You can use the EJB TimerService feature. Creating your own threads in a managed environment should be avoided.
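For illustration, a minimal sketch using the EJB 3.0 TimerService (the bean and method names are made up; the delay and re-check logic are up to you):

    import java.io.Serializable;
    import javax.annotation.Resource;
    import javax.ejb.Stateless;
    import javax.ejb.Timeout;
    import javax.ejb.Timer;
    import javax.ejb.TimerService;

    @Stateless
    public class DelayedProcessingBean {

        @Resource
        private TimerService timerService;

        // Instead of Thread.sleep: schedule a re-check after the given delay.
        public void checkLater(Serializable workId, long delayMillis) {
            timerService.createTimer(delayMillis, workId);
        }

        @Timeout
        public void onTimeout(Timer timer) {
            Serializable workId = timer.getInfo();
            // re-check whether the other component's data exists;
            // if not, either reschedule or give up once a deadline has passed
        }
    }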
I agree with @dkaustubh about timers and about avoiding thread manipulation in Java EE.
Another possibility is to use a JMS queue with delayed delivery. Although it is not part of the Java EE API, most messaging system vendors support it; check here.
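For example, with ActiveMQ the delay can be requested per message via a broker-specific property (this is an ActiveMQ-only header and requires the broker to be started with schedulerSupport enabled; the names below are illustrative):

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class DelayedSend {
        // Re-enqueue the work item with a broker-side delay instead of sleeping in the MDB.
        public void sendDelayed(Session session, MessageProducer producer, String payload)
                throws JMSException {
            TextMessage message = session.createTextMessage(payload);
            // ActiveMQ-specific header; ignored (or rejected) by other brokers
            message.setLongProperty("AMQ_SCHEDULED_DELAY", 500); // delay in milliseconds
            producer.send(message);
        }
    }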
I think it's possible with a more advanced threading approach. Rather than thinking about manual synchronization and thread management, you can always use the java.util.concurrent package.
Future can be one way to do this. Please refer to the java.util.concurrent package.
Use notifications and Object#wait() / Object#notifyAll(),
i.e. multithreaded: the producer notifies the consumer.
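A bare-bones sketch of that mechanism in plain Java (class and method names are invented; note that blocking like this on a container-managed thread is exactly what the other answers advise against):

    public class DataLatch {
        private final Object lock = new Object();
        private Object data;   // the value the producer will eventually publish

        // Producer side: publish the data and wake up any waiting consumer.
        public void publish(Object value) {
            synchronized (lock) {
                data = value;
                lock.notifyAll();
            }
        }

        // Consumer side: wait until the data shows up or the timeout expires.
        public Object awaitData(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            synchronized (lock) {
                while (data == null) {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        return null;   // timed out: caller does its special processing
                    }
                    lock.wait(remaining);
                }
                return data;
            }
        }
    }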

Divide process workflow between remote workers

I need to develop a Java platform to download and process information from Twitter. The basic idea is to have a centralized controller that generates tasks (basically an id and keywords) and sends these tasks to remote workers (one per computer). I need to receive a status report periodically to know the status of both the task and the worker. I'll have at least 60 workers (ten times more in the near future).
My initial idea was to use RMI, but I need to communicate in both directions and I don't feel comfortable with RMI. The other approach was to use SSLSockets to send serialized objects, but I would have to handle a lot of errors and add a lot of code to monitor tasks and workers. Some people told me to use a framework like Spring Batch, GigaSpaces or Quartz.
What do you think would be the best option for this project? For the time being I've read a lot of good things about GigaSpaces, but I can't find a good tutorial on how to implement it, and Quartz seems promising. What do you think? Is it worth using any of them?
It's not easy to recommend a technology based on your question alone. GigaSpaces is certainly up to the job, but so is Spring Batch. Quartz covers only the scheduling part of your question, not so much the remoting and the distribution of workload.
GigaSpaces is a fully fledged application platform for scenarios where parallelism, high throughput and scalability are factors. Spring Batch can definitely also do the job, but unlike GigaSpaces it is not an application platform, so you would still need to deploy your application somewhere.
However, GigaSpaces is a commercial product (a free version is available). Other frameworks that can help you, such as Storm (http://storm-project.net/) and Hazelcast (www.hazelcast.com), also come to mind.
So without clarifying your use case it's hard to give a single answer. It all depends on what exactly you want and how you want to use it, now and in the future.

Writing Java server to handle multiple simultaneous clients

I need some advice in building a Java server that handles multiple clients at the same time. The clients need to remain connected for fairly long periods of time. I'm currently using blocking IO and spawning a thread to read from each client that connects to the server, but this is obviously not scalable.
I've found a few options, including using Selector or Executor with fixed size thread pools. I am not too familiar with either one, so which would be the best solution here? Thanks!
It depends on your definition of scalable. The system you have described, with a single thread per connection, is scalable up to hundreds, maybe even a couple of thousand, concurrent connections, but it will hit a wall at some point.
Since your clients connect and stay connected for an extended period of time, it would be possible to have a single IO thread handle the reading and writing, while dispatching the processing of each request to another thread using an Executor.
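A stripped-down sketch of that design with a Selector plus a worker pool (the port, pool size, and handle() method are placeholders):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class NioServer {
        public static void main(String[] args) throws IOException {
            ExecutorService workers = Executors.newFixedThreadPool(8); // request-processing pool
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();   // single IO thread blocks here
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(1024);
                        if (client.read(buf) == -1) {
                            client.close();
                            continue;
                        }
                        buf.flip();
                        // hand the actual request processing off to the pool
                        workers.submit(() -> handle(buf));
                    }
                }
            }
        }

        private static void handle(ByteBuffer request) {
            // application-specific processing goes here
        }
    }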
There are frameworks/servers that are already written to handle this sort of event driven design. Have a look at:
Netty - recently used by Twitter in their query server
Jetty (not to be confused with Netty) - capable of NIO and very scalable, but might be too HTTP-focused
MINA
Grizzly
It's worth noting that the world is full of failed startups and software products that had really scalable architectures. Scaling is a nice problem to have; better to have the problem than to have no problem and no customers.
Using multiple threads is scalable. Apache, for example, does this, and some sites using it get many visitors. However, another approach would indeed be to use a Selector, though I have no experience with it.
In the end, this seems like a question of which religion is best.
There are plenty of frameworks for this kind of job, for example:
Netty
Apache MINA
Independently of scalability, every server application has its limits. With blocking IO, one of your limits will be the number of threads that the VM can spawn, because the approach you take is "one thread per client". With NIO (of which Selector is one of the classes), the approach is "one thread per request", which will run out of threads much later.
Horizontal scalability ( http://en.wikipedia.org/wiki/Scalability#Scale_horizontally_vs._vertically ) of your app will not depend on either of these choices.

JMS alternative? something for decoupling sending emails from http reqs

We have a web application that does various things and sometimes emails users depending on a given action. I want to decouple the HTTP request threads from actually sending the email, in case there is some trouble with the SMTP server or a backlog. In the past I've used JMS for this and had no problem with it. However, for this web app JMS just feels like a bit of overkill right now (in terms of setup, etc.), and I was wondering what other alternatives are out there.
Ideally I'd just like something that I can run in-process (JVM/Tomcat), but when the servlet context is unloaded, any pending items in the queue would be swapped to disk/DB. I could of course just code something together involving an in-memory queue, but I'm looking to gain the benefit of open-source projects, so I'm wondering what's out there, if anything.
If JMS really is the answer, does anyone know of something that would fit our simple requirements?
Thanks.
I'm using JMS for something similar. Our reasons for using JMS:
We already had a JMS server for something else (so it was just adding a new queue)
We wanted our application to be decoupled from the processing, so errors on either side would stay on their side
The app could drop the message in a queue, commit, and go on. No need to worry about how to persist the messages, how to start over after a crash, etc. JMS does all that for you.
I would think Spring Integration would work in this case as well.
http://www.springsource.org/spring-integration
Wow, this issue comes up a lot. CommonJ WorkManager is what you are looking for. A Tomcat implementation can be found here. It allows you to safely create threads in a Java EE environment but is much lighter weight than using JMS (which will obviously work as well).
Beyond JMS, for short messages you could also use Amazon Simple Queue Service (SQS).
While you might think it overkill too, consider that there's minimal maintenance required, it scales nicely, has ultra-high availability, and doesn't cost all that much.
There's no cost for creating new queues or for having an account; as far as I recall, it's purely based on the number of operations you do (sending messages, polling/retrieving).
The main limitation really is the message size (there are others, like ordering not being guaranteed due to its distributed nature), but that might work as is. Or, for larger messages, use the related AWS service, S3, to store the actual body and just pass headers through SQS.
You could use a scheduler. Have a look at Quartz.
The idea is that you schedule a job to start at regular intervals. All requests need to be persisted somewhere. The scheduled job will read them and process them. You need to define the interval between two subsequent jobs to fit your needs.
This is the recommended way of doing things. Full-fledged application servers offer Java EE timers for this, but these aren't available in Tomcat. Quartz is fine though, and it lets you avoid starting your own threads, which can cause a mess in some situations (e.g. during application updates).
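As an illustration only, scheduling such a job with the Quartz 2.x API might look roughly like this (the job class, the 30-second interval, and how pending requests are persisted are all assumptions):

    import org.quartz.*;
    import org.quartz.impl.StdSchedulerFactory;

    public class MailSchedulerSetup {

        // The job drains whatever email requests were persisted by the web tier.
        public static class SendPendingMailJob implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                // read pending email requests from the DB/disk store and send them
            }
        }

        public static void main(String[] args) throws SchedulerException {
            JobDetail job = JobBuilder.newJob(SendPendingMailJob.class)
                    .withIdentity("sendPendingMail")
                    .build();
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                            .withIntervalInSeconds(30)   // interval chosen for illustration
                            .repeatForever())
                    .build();

            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            scheduler.start();
            scheduler.scheduleJob(job, trigger);
        }
    }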
I agree that JMS is overkill for this.
You can just send the e-mail in a separate thread (i.e. separate from the request handling thread). The only thing to be careful about is that if your app gets any kind of traffic at all, you may want to use a thread pool to avoid resource depletion issues. The java.util.concurrent package has some nice stuff for thread pools.
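A minimal sketch of that approach (the pool size and the sendEmail() method are placeholders; the actual SMTP call, e.g. via JavaMail, is omitted):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncMailer {
        // Small fixed-size pool so a slow SMTP server can't spawn unlimited threads.
        private final ExecutorService mailPool = Executors.newFixedThreadPool(2);

        // Called from the request-handling thread; returns immediately.
        public void queueEmail(String to, String subject, String body) {
            mailPool.submit(() -> sendEmail(to, subject, body));
        }

        private void sendEmail(String to, String subject, String body) {
            // actual SMTP send goes here; placeholder only
        }
    }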
Since you say the app "sometimes" emails users it doesn't sound like you're talking about a high volume of mail. A quick and dirty solution would be to just Runtime.getRuntime().exec():
sendmail recipient@domain.com
and dump the message into the resulting Process's getOutputStream(). After that it's sendmail's problem.
Figure a minute to see if you have sendmail available on the server, about fifteen minutes to throw together a test if you do, and nothing to install assuming you found sendmail. A few more minutes to construct the email headers properly (easy - here are some examples) and you're done.
Hope this helps...
