I'm trying to learn ActiveMQ + Camel in order to apply them to a real-world project. I need to consume a queue, process each message and move it to another queue.
My concern is performance. I'll need to handle at least 100,000 messages daily. Right now I don't want to deal with vertical or horizontal scaling (we can't spend more money until people are convinced that the technology is good).
So I thought about starting a few threads that will poll the queue, consume, process and move the messages to the other queue. The number of threads will depend on how the hardware responds to increasing levels of load.
My first question is: is this a good approach (starting parallel threads to consume the queue)?
My second question is: I started learning by reading Camel in Action. I don't know if I'm missing something, but I'm a little bit confused about how to build the consumer. By adapting the book's FtpToJMSExample, I came up with the code below. In the real world I won't create connections for each thread; I'll use a connection pool provided by the application server (GlassFish).
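import javax.jms.ConnectionFactory;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.jms.JmsComponent;
import org.apache.camel.impl.DefaultCamelContext;
public class JMSToJMSExample {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost");
        context.addComponent("jms", JmsComponent.jmsComponentAutoAcknowledge(connectionFactory));
        context.addRoutes(new RouteBuilder() {
            public void configure() {
                from("jms:in")
                    .process(new CustomProcessor()) // CustomProcessor is my own Processor implementation
                    .to("jms:out");
            }
        });
        context.start();
        Thread.sleep(10000); // the route only consumes while the context is running
        context.stop();
    }
}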
It works fine. But the book calls it a "polling" solution. I was expecting something like a while loop, so that as long as the queue has messages, it keeps consuming. OK, the example is polling a queue, but my point is that if I reduce the sleep period, the program will exit without processing all the messages that it could.
Anyway, I think it is better to establish a long-running thread instead of asking the connection pool for a connection every time the threads wake up.
Since I'm learning, could you give an example of how to create a thread that polls a JMS queue until it is empty, rather than polling by time/period?
TIA,
Bob
1) Use SEDA (http://camel.apache.org/seda.html) for concurrent processing:
from("jms:in")
    .to("seda:seda1");
from("seda:seda1?concurrentConsumers=10")
    .process(new CustomProcessor())
    .to("jms:out");
2) Read http://camel.apache.org/running-camel-standalone-and-have-it-keep-running.html to see how to keep Camel running.
A couple of things...
To your question, "Is this a good approach (starting parallel threads to consume the queue)?":
Absolutely, this is a very common design pattern.
A route will continue to consume as long as your CamelContext is running (you don't need a loop or a timer); see http://camel.apache.org/running-camel-standalone-and-have-it-keep-running.html
Your route should look something like this. Notice that the maxConcurrentConsumers property indicates the desire for multiple consumer threads to pull messages from the IN queue, process them and push them to the OUT queue:
from("jms:in?maxConcurrentConsumers=10")
    .process(new CustomProcessor())
    .to("jms:out");
See http://camel.apache.org/activemq.html for more details.
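To illustrate what several concurrent consumers on one queue amount to, here is a plain-Java, in-memory analogue. It is not Camel or ActiveMQ code: two java.util.concurrent BlockingQueues stand in for the IN and OUT queues, a fixed thread pool stands in for the consumer threads, and the uppercase "processing", the class name and the poison-pill shutdown are illustrative assumptions. Each consumer blocks on take() while the queue is empty instead of sleeping for a fixed period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ConcurrentConsumersSketch {
    // Sentinel message used only to tell one consumer thread to exit (hypothetical).
    static final String POISON = "__STOP__";

    public static List<String> run(List<String> messages, int consumers) throws InterruptedException {
        BlockingQueue<String> in = new LinkedBlockingQueue<>();  // stands in for "jms:in"
        BlockingQueue<String> out = new LinkedBlockingQueue<>(); // stands in for "jms:out"
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String msg = in.take();      // blocks until a message arrives
                        if (msg.equals(POISON)) {
                            return;                  // this consumer is done
                        }
                        out.put(msg.toUpperCase());  // placeholder for CustomProcessor
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        for (String m : messages) {
            in.put(m);
        }
        for (int i = 0; i < consumers; i++) {
            in.put(POISON);                          // one pill per consumer, after all real messages
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        List<String> result = new ArrayList<>();
        out.drainTo(result);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("a", "b", "c"), 3).size());
    }
}
```

The output order is not deterministic because several threads compete for messages; that is also true of concurrent JMS consumers.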
I have a Java/Akka-based application where one Akka actor tells another Akka actor to do certain jobs, and it starts each job in a command prompt. But if I give it 10 jobs, it starts all of them at once in 10 command prompts.
If I have 100+ jobs, my system will hang.
So how can I make my application do one job at a time, with all the other jobs getting the CPU in FIFO (first in, first out) order?
The question is not quite clear, but I'll try to answer based on my understanding.
So it looks like you use an actor as a job dispatcher that translates job messages into calls to some "job executor system". Each incoming message is translated into some call.
If this call is synchronous (which is a smell when working with actors, of course, but just for the sake of understanding), then there is no problem in your case: your actor waits until the call is complete, then proceeds with the next message in its mailbox.
If the call is asynchronous, which I guess is what you have, then all the messages will be handled one after another without waiting for the previous job to finish.
So you need to throttle message handling so that at most one message is being processed at a time. This can be achieved with the "pull" pattern, which is described here.
You basically allocate one master actor that holds a queue of incoming messages (jobs) and one worker actor that asks for a job when it has none. Be careful with the queue in the master actor: you probably don't want it to grow too much, so think about monitoring and applying back-pressure, which is another big topic covered by akka-stream.
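The shape of the pull pattern can be sketched without Akka. This is a plain-Java analogy, not Akka's actor API: Master and Worker are hypothetical classes, a method call replaces the "give me work" message, and runToCompletion() is a placeholder that must block until the job (for example, a started process) has really finished. As a simplification, the worker stops when the queue is empty, whereas an Akka worker would simply wait for the next job.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PullPatternSketch {
    // Hypothetical "master": owns the FIFO job queue and hands out work only on request.
    public static class Master {
        private final Queue<String> pending = new ArrayDeque<>();

        public synchronized void submit(String job) {
            pending.add(job);
        }

        // Called by the worker when it is idle; returns null when there is nothing to do.
        public synchronized String requestWork() {
            return pending.poll();
        }
    }

    // Hypothetical "worker": pulls one job, finishes it completely, then pulls the next.
    public static class Worker implements Runnable {
        private final Master master;
        public final List<String> completed = new ArrayList<>();

        public Worker(Master master) {
            this.master = master;
        }

        @Override
        public void run() {
            String job;
            while ((job = master.requestWork()) != null) {
                runToCompletion(job); // only one job is ever in progress
                completed.add(job);
            }
        }

        private void runToCompletion(String job) {
            // Placeholder for the real work, e.g. start a process and call waitFor() on it.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Master master = new Master();
        for (int i = 1; i <= 5; i++) {
            master.submit("job-" + i);
        }
        Worker worker = new Worker(master);
        Thread t = new Thread(worker);
        t.start();
        t.join();
        System.out.println(worker.completed); // [job-1, job-2, job-3, job-4, job-5]
    }
}
```

Because the master never pushes work, the number of jobs in flight is bounded by the number of workers, which is what throttles the command prompts to one at a time.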
Background
At a high level, I have a Java application in which certain events should trigger a certain action to be taken for the current user. However, the events may be very frequent, and the action is always the same. So when the first event happens, I would like to schedule the action for some point in the near future (e.g. 5 minutes). During that window of time, subsequent events should take no action, because the application sees that there's already an action scheduled. Once the scheduled action executes, we're back at the start, and the next event begins the cycle again.
My thought is to implement this filtering and throttling mechanism by embedding an in-memory ActiveMQ instance within the application itself (I don't care about queue persistence).
I believe that JMS 2.0 supports this concept of delayed delivery, with delayed messages sitting in a "staging queue" until it's time for delivery to the real destination. However, I also believe that ActiveMQ does not yet support the JMS 2.0 spec... so I'm thinking about mimicking the same behavior with time-to-live (TTL) values and Dead Letter Queue (DLQ) handling.
Basically, my message producer code would put messages on a dummy staging queue from which no consumers ever pull anything. Messages would be placed with a 5-minute TTL value, and upon expiration ActiveMQ would dump them into a DLQ. That's the queue from which my message consumers would actually consume the messages.
Question
I don't think I want to actually consume from the "default" DLQ, because I have no idea what other internal things ActiveMQ might dump there that are completely unrelated to my application code. So I think it would be best for my dummy staging queue to have its own custom DLQ. I've only seen one page of ActiveMQ documentation which discusses DLQ config, and it only addresses XML config files for a standalone ActiveMQ installation (not an in-memory broker embedded within an app).
Is it possible to programmatically configure a custom DLQ at runtime for a queue in an embedded ActiveMQ instance?
I'd also be interested to hear alternative suggestions if you think I'm on the wrong track. I'm much more familiar with JMS than AMQP, so I don't know if this is much easier with Qpid or some other Java-embeddable AMQP broker. Whatever Apache Camel actually is (!), I believe it's supposed to excel at this sort of thing, but that learning curve might be gross overkill for this use case.
Although you're worried that Camel might be gross overkill for this usecase, I think that ActiveMQ is already gross overkill for the usecase you've described.
You're looking to schedule something to happen 5 minutes after an event happens, and for it to consume only the first event and ignore all the ones between the first one and when the 5 minutes are up, right? Why not just schedule your processing method for 5 minutes from now via ScheduledExecutorService or your favorite scheduling mechanism, and save the event in a HashMap<User, Event> member variable. If any more events come in for this user before the processing method fires, you'll just see that you already have an event stored and not store the new one, so you'll ignore all but the first. At the end of your processing method, delete the event for this user from your HashMap, and the next event to come in will be stored and scheduled.
Running ActiveMQ just to get this behavior seems like way more than you need. Or if not, can you explain why?
EDIT:
If you do go down this path, don't use the message TTL to expire your messages; just have the (one and only) consumer read them into memory and use the in-memory solution described above to only process (at most) one batch every 5 minutes. Either have a single queue with message selectors, or use dynamic queues, one per user. You don't need the DLQ to implement the delay, and even if you could get it to do that, it won't give you the functionality of batching everything so you only run once per 5 minutes. This isn't a path you want to go down, even if you figure out how.
A simple solution is to keep track of the pending actions in a concurrent structure and use a ScheduledExecutorService to execute them:
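import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
// Members of your service class; UserId and doWork() are your own type and logic.
private static final Object RUNNING = new Object();
private final ConcurrentMap<UserId, Object> pendingActions =
    new ConcurrentHashMap<>();
private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(10);
public void takeAction(final UserId id) {
    Object running = pendingActions.putIfAbsent(id, RUNNING); // atomic
    if (running == null) { // no pending action for this user
        ses.schedule(new Runnable() {
            @Override
            public void run() {
                try {
                    doWork();
                } finally {
                    pendingActions.remove(id); // let the next event schedule again, even if doWork() failed
                }
            }
        }, 5, TimeUnit.MINUTES);
    }
}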
With Camel this could be easily achieved with an Aggregator using the completionInterval option: every five minutes you can check whether the list of aggregated messages is empty; if it's not, fire a message to the route responsible for your user action and empty the list. You don't need to maintain the whole list of exchanges, just the state (user action planned or not).
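Outside Camel, the same "collect for an interval, then fire once per user" idea can be sketched in plain Java. This is not the Camel Aggregator API: IntervalBatcher, record() and flush() are hypothetical names, a concurrent set keeps only the state (action planned or not) rather than every event, and a ScheduledExecutorService at a fixed rate plays the role of completionInterval.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class IntervalBatcher<K> {
    // Keys (e.g. user IDs) that have had at least one event since the last flush.
    private final Set<K> pending = ConcurrentHashMap.newKeySet();
    private final Consumer<K> action;

    public IntervalBatcher(Consumer<K> action) {
        this.action = action;
    }

    // Called on every event; repeated events for the same key collapse into one entry.
    public void record(K key) {
        pending.add(key);
    }

    // Runs the action once per key collected since the last flush; returns how many fired.
    public int flush() {
        int fired = 0;
        for (K key : pending) {
            if (pending.remove(key)) { // remove first, so a new event starts the next batch
                action.accept(key);
                fired++;
            }
        }
        return fired;
    }

    // Calls flush() on a fixed interval; shut down the returned executor when done.
    public ScheduledExecutorService start(long interval, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::flush, interval, interval, unit);
        return ses;
    }
}
```

For example, new IntervalBatcher<String>(user -> doUserAction(user)).start(5, TimeUnit.MINUTES), with doUserAction being your own (hypothetical) method, runs at most one action per user per five-minute window.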
I come from a Perl background and am writing my first Java MVC web application using Spring.
My webapp allows users to submit orders which the app processes synchronously by calling a third-party SOAP service. The next phase of the project is to allow users to submit bulk orders (e.g. a CSV containing 500 rows) and process them asynchronously. Here is a snippet of my existing controller:
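import java.util.Map;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.ModelAttribute;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
@Controller
@Service
@RequestMapping(value = "/orders")
public class OrderController {
    @Autowired
    OrderService orderService;
    @RequestMapping(value = "/new", method = RequestMethod.POST)
    public String processNewOrder(@ModelAttribute("order") Order order, Map<String, Object> map) {
        OrderStatus orderStatus = orderService.processNewOrder(order);
        map.put("orderStatus", orderStatus);
        return "new";
    }
}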
I plan to create a new @RequestMapping to deal with the incoming CSV and modify the OrderService to be able to break the CSV apart and persist the individual orders to the database.
My question is: what is the best approach to creating background workers in a Spring MVC app? Ideally I would have 5 threads processing these orders, most likely from a queue. I have read about @Async or submitting a Runnable to a SimpleAsyncTaskExecutor bean and am not sure which way to go. Some examples would really help me.
I think Spring Batch is overkill and not really what you are looking for. It's more for batch processing like writing all the orders to a file then processing all at once, whereas this seems to be more like asynchronous processing where you just want to have a 'queue' of work and process it that way.
If this is indeed the case, I would look into using a point-to-point (queue) model with JMS. There are several message brokers, for instance Apache ActiveMQ or Pivotal RabbitMQ. In essence, your OrderService would break the CSV into units of work and push them onto a JMS Queue, and you would have multiple Consumers set up to read from the Queue and perform the work. There are lots of ways to configure this, but I would simply make a class to hold your worker threads and make the number of threads configurable. The other added benefits here are:
You can externalize the Consumer code, and even make it run on totally different hardware if you like.
MQ is a pretty well-known pattern, and there are a LOT of commercial offerings. This means you could easily write your order processing system in C# using MQ to move the messages over, or even use Spring Batch if you like. Heck, there is even MQSeries for the mainframe, so you could have your order processing run on the mainframe in COBOL if it suited your fancy.
It's stupidly simple to add more consumers or producers. Simply subscribe to the Queue and away they go!
Depending on the product used, the Queue maintains state so messages are not "lost". If all the consumers go offline, the Queue will simply backup and store the messages until the consumers come back.
The queues are also usually more robust. The Producer can go down and the consumers won't even flinch. The consumers can go down and the producer doesn't even need to know.
There are some downsides, though. You now have an additional point of failure. You will probably want to monitor queue depths, and you will need to provision enough space to store the messages while they back up. Also, if processing time could be an issue, you may need to monitor how quickly things are getting processed to make sure the queue isn't backing up too much or breaking any SLA that might be in place.
Edit: Adding example...
If I had a threaded class, for example this:
public class MyWorkerThread implements Runnable {
    // volatile so that setRunning(false) from another thread is seen by the loop
    private volatile boolean run = true;
    public void run() {
        while (run) {
            // Do work here...
        }
        // Do any thread cooldown procedures here, like stop listening to the Queue.
    }
    public void setRunning(boolean runState) {
        run = runState;
    }
}
Then I would start the threads using a class like this:
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import org.springframework.stereotype.Service;
@Service("MyThreadManagerService")
public class MyThreadManagerServiceImpl implements MyThreadManagerService {
    private MyWorkerThread[] tasks; // keep the Runnables so we can call setRunning() on them
    private Thread[] workers;       // the threads running those Runnables
    private int workerPoolSize = 5;
    /**
     * This gets run after any constructors and setters, but before anything else
     */
    @PostConstruct
    private void init() {
        tasks = new MyWorkerThread[workerPoolSize];
        workers = new Thread[workerPoolSize];
        for (int i = 0; i < workerPoolSize; i++) {
            tasks[i] = new MyWorkerThread(); // however you build your worker threads
            workers[i] = new Thread(tasks[i]);
            workers[i].start();
        }
    }
    /**
     * This gets run just before the bean is destroyed. It shuts down the threads.
     */
    @PreDestroy
    public void dismantle() {
        // Tell each worker to stop (setRunning() is on MyWorkerThread, not on Thread)
        for (MyWorkerThread task : tasks) {
            task.setRunning(false);
        }
        // Now join with each thread to make sure we give them time to stop gracefully
        try {
            for (Thread worker : workers) {
                worker.join(); // May want to use join(millis) for a timeout
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
    /**
     * Sets the size of the worker pool (only takes effect before init() runs).
     */
    public void setWorkerPoolSize(int newSize) {
        workerPoolSize = newSize;
    }
}
Now you have a nice service class you can add methods to in order to monitor, restart, stop, etc. all your worker threads. I made it a @Service because it felt more right than a simple @Component, but technically it can be anything as long as Spring knows to pick it up when you are autowiring. The init() method on the service class starts up the threads, and dismantle() gracefully stops them and waits for them to finish. They use the @PostConstruct and @PreDestroy annotations, so you can name them whatever you want. You would probably have a constructor on your MyWorkerThread to set up the Queues and such. Also, as a disclaimer, this was all written from memory, so there may be some mild compilation issues or method names may be slightly off.
There may be classes already available to do this sort of thing, but I have never seen one myself. If someone knows of a better way using off-the-shelf parts, I would love to be educated.
If your order volume can grow in the future and you want a scalable solution, I would suggest the Spring Batch framework. I find it very easy to integrate with Spring MVC, and with minimal configuration you can achieve a very robust parallel processing architecture. Use Spring Batch partitioning. Hope this helps! Feel free to ask if you need help integrating it with Spring MVC.
I have to write a high-load system with a fairly simple task to do, so I decided to split the tasks across multiple workers in different locations (or clouds). To communicate, I want to use a RabbitMQ queue.
In my system there will be two kinds of software nodes: schedulers and workers. Schedulers take user input from queue_input, split it into smaller tasks and put these smaller tasks into workers_queue. Workers read this queue and 'do the thing'. I used round-robin load balancing here, and it all works pretty well until some worker crashes. Then I lose information about task completion (no single operation may be done twice; each task contains a pack of 50 iterations of the worker code with different data).
I'm considering something like a technical_queue, another channel for scheduler-worker communication, and I wonder how to design it well. I used the tutorials from the RabbitMQ site, so my worker thread looks like:
while(true) {
message = consume(QUEUE,...);
handle(message); //do 50 simple tasks in loop for data in message
}
How can I handle the second queue? Another thread with some while(true) {} loop, or is there a better solution? Maybe I should reuse the existing queue with a topic exchange? (But I wanted an independent channel of communication while handling a task, which may take some time.)
You should probably take a look at spring-amqp (doc). I hate to tell you to add a layer, but that Spring library takes care of threading and thread management with its SimpleMessageListenerContainer. Each container listens to a queue, and you can specify the number of threads (i.e. workers) per queue.
Alternatively, you can build your own using an ExecutorService, but you will probably end up rewriting what SimpleMessageListenerContainer does. You could also just start more processes (via the OS or batch scripts), which will add more consumers to each queue.
As far as queue topology is concerned, it depends entirely on business logic/concerns and generally less on performance needs. More often you have more queues for business reasons and more workers for performance reasons, but if a queue gets backed up with the same type of message, consider giving that type of message its own queue. What you're describing sounds like two queues with multiple consumers on your worker queue.
Other than the threading issue and queue topology I'm not entirely sure what else you are asking.
I would recommend creating a second consumer on the same queue:
consumer1 -> queue_process
consumer2 -> queue_process
Both consumers should listen to the same queue.
Greetings, I hope this helps.
I have a very complex system (100+ threads) that needs to send email without blocking. My solution was to implement a class called EmailQueueSender, which is started at the beginning of execution and has a ScheduledExecutorService that looks at an internal queue every 500 ms and, if size() > 0, empties it.
Meanwhile, there's a synchronized static method called addEmailToQueue(String[]) which accepts an email (body, subject, etc.) as an array. The system does work, and my other threads can move on after adding their email to the queue without blocking or even worrying whether the email was successfully sent... it just seems a little messy... or hackish. Every programmer gets that feeling in their stomach when they know they're doing something wrong or there's a better way. That said, can someone slap me on the wrist and suggest a more efficient way to accomplish this?
Thanks!
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
This class alone will probably handle most of what you need.
Just put the sending code in a Runnable and submit it with the execute method.
The getQueue method lets you retrieve the current list of waiting items, so you can save it when restarting the sender service without losing emails.
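A minimal sketch of that advice, assuming a hypothetical sendEmail placeholder in place of the real mail call: a single-threaded ThreadPoolExecutor whose work queue holds the unsent emails. getQueue() reports what is still waiting, and shutdownNow() drains the queue and returns the tasks that never started, which is one way to capture them before a restart. In practice you would submit a small Runnable class that carries the email fields rather than a lambda, so the drained tasks can actually be inspected and saved.

```java
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class EmailExecutorSketch {
    // One sender thread; unsent emails wait in an unbounded FIFO work queue.
    private final ThreadPoolExecutor executor =
            new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

    // Callers return immediately; the send happens on the executor's thread.
    public void enqueue(String to, String subject, String body) {
        executor.execute(() -> sendEmail(to, subject, body));
    }

    // Number of emails accepted but not yet picked up by the sender thread.
    public int pendingCount() {
        return executor.getQueue().size();
    }

    // Stops the sender and returns the tasks that were never started.
    public List<Runnable> stopAndDrain() {
        return executor.shutdownNow();
    }

    // Hypothetical placeholder for the real JavaMail/SMTP call.
    private void sendEmail(String to, String subject, String body) {
    }
}
```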
If you are using Java 6, then you can make heavy use of the primitives in the java.util.concurrent package.
Having a separate thread that handles the real sending is completely normal. Instead of polling a queue, I would rather use a BlockingQueue as you can use a blocking take() instead of busy-waiting.
If you are interested in whether the e-mail was successfully sent, your append method could return a Future so that you can pass the return value on once you have sent the message.
Instead of having an array of Strings, I would recommend creating an (almost trivial) Java class to hold the values. Object creation is cheap these days.
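A sketch combining those three suggestions, under stated assumptions: the class and method names are reused from the question but the implementation is new, Email is a hypothetical value class replacing the String[], send() is a placeholder that pretends every email succeeds, and CompletableFuture (Java 8+) is used as the Future implementation; on Java 6 a FutureTask could play the same role. A dedicated thread blocks on take() instead of checking the queue every 500 ms.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class EmailQueueSender {
    // Small immutable holder instead of a String[] (hypothetical fields).
    public static final class Email {
        final String to, subject, body;

        public Email(String to, String subject, String body) {
            this.to = to;
            this.subject = subject;
            this.body = body;
        }
    }

    // Pairs an email with the future that reports its outcome.
    private static final class Pending {
        final Email email;
        final CompletableFuture<Boolean> result = new CompletableFuture<>();

        Pending(Email email) {
            this.email = email;
        }
    }

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final Thread sender = new Thread(this::loop, "email-sender");

    public EmailQueueSender() {
        sender.setDaemon(true);
        sender.start();
    }

    // Never blocks the caller; no synchronized needed because the queue is thread-safe.
    public CompletableFuture<Boolean> addEmailToQueue(Email email) {
        Pending p = new Pending(email);
        queue.add(p);
        return p.result;
    }

    private void loop() {
        try {
            while (true) {
                Pending p = queue.take(); // sleeps until an email arrives, no busy-waiting
                try {
                    p.result.complete(send(p.email));
                } catch (RuntimeException e) {
                    p.result.completeExceptionally(e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit the loop when shutdown() interrupts us
        }
    }

    // Placeholder for the real JavaMail call; assumed to return true on success.
    private boolean send(Email email) {
        return true;
    }

    public void shutdown() {
        sender.interrupt();
    }
}
```

Callers that don't care about the outcome simply ignore the returned future; callers that do can call get() or attach a callback.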
I'm not sure if this would work for your application, but it sounds like it would. A ThreadPoolExecutor (an ExecutorService implementation) can take a BlockingQueue as an argument, and you can simply submit new tasks (Runnables) with execute(); they wait in that queue until a thread is free. When you are done, you simply shut down the ThreadPoolExecutor.
private BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
private ThreadPoolExecutor pool;
...
this.pool = new ThreadPoolExecutor(10, 10, 1000L,
    TimeUnit.MILLISECONDS, this.queue);
You can keep a count of all the tasks submitted to the pool. When you think you are done (the queue is empty, perhaps?), simply compare that count with the pool's completed-task count:
if (issuedThreads == pool.getCompletedTaskCount()) {
pool.shutdown();
}
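If the two match, you are done. Another way to terminate the pool is to shut it down and then wait in one-second steps until it has finished:
this.pool.shutdown(); // stop accepting new tasks; already queued tasks still run
try {
    while (!this.pool.awaitTermination(1000, TimeUnit.MILLISECONDS)) {
        // still running; wait another second
    }
} catch (InterruptedException e) {
    // log exception...
    Thread.currentThread().interrupt();
}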
There might be a full-blown mail package out there already, but I would probably start with Spring's support for email and job scheduling. Fire a new job for each email to be sent, and let the executor's scheduling send the jobs and worry about how many need to be done. No queuing involved.
Under the hood, Spring uses JavaMail for the email part and lets you choose between ThreadPoolExecutor (as mentioned by @Lorenzo) or Quartz. Quartz is better in my opinion, because you can even set it up so that it fires your jobs at fixed points in time like cron jobs (e.g. at midnight). The advantage of using Spring is that it greatly simplifies working with these packages, so your job is even easier.
There are many packages and tools that will help with this, but the generic name for cases like this, extensively studied in computer science, is the producer-consumer problem. There are various well-known solutions for it, which could be considered 'design patterns'.