I come from a Perl background and am writing my first Java MVC web application using Spring.
My webapp allows users to submit orders which the app processes synchronously by calling a third-party SOAP service. The next phase of the project is to allow users to submit bulk orders (e.g. a CSV containing 500 rows) and process them asynchronously. Here is a snippet of my existing controller:
@Controller
@Service
@RequestMapping(value = "/orders")
public class OrderController {
    @Autowired
    OrderService orderService;

    @RequestMapping(value = "/new", method = RequestMethod.POST)
    public String processNewOrder(@ModelAttribute("order") Order order, Map<String, Object> map) {
        OrderStatus orderStatus = orderService.processNewOrder(order);
        map.put("orderStatus", orderStatus);
        return "new";
    }
}
I plan to create a new @RequestMapping to deal with the incoming CSV and modify the OrderService to be able to break the CSV apart and persist the individual orders to the database.
My question is: what is the best approach to creating background workers in an MVC Spring app? Ideally I would have 5 threads processing these orders, most likely from a queue. I have read about @Async and about submitting a Runnable to a SimpleAsyncTaskExecutor bean, and am not sure which way to go. Some examples would really help me.
I think Spring Batch is overkill and not really what you are looking for. It's more for batch processing like writing all the orders to a file then processing all at once, whereas this seems to be more like asynchronous processing where you just want to have a 'queue' of work and process it that way.
If this is indeed the case, I would look into using a pub/sub model using JMS. There are several messaging providers, for instance Apache ActiveMQ or Pivotal RabbitMQ. In essence your OrderService would break the CSV into units of work, push them into a JMS Queue, and you would have multiple Consumers set up to read from the Queue and perform the work task. There are lots of ways to configure this, but I would simply make a class to hold your worker threads and make the number of threads configurable. The other added benefits here are:
You can externalize the Consumer code, and even make it run on totally different hardware if you like.
MQ is a pretty well-known process, and there are a LOT of commercial offerings. This means you could easily write your order processing system in C# using MQ to move the messages over, or even use Spring Batch if you like. Heck, there is even MQ Series for host, so you could have your order processing occur on mainframe in COBOL if it suited your fancy.
It's stupidly simple to add more consumers or producers. Simply subscribe to the Queue and away they go!
Depending on the product used, the Queue maintains state so messages are not "lost". If all the consumers go offline, the Queue will simply back up and store the messages until the consumers come back.
The queues are also usually more robust. The Producer can go down and the consumers will not even flinch. The consumers can go down and the producer doesn't even need to know.
There are some downsides, though. You now have an additional point of failure. You will probably want to monitor the queue depths, and will need to provision enough space to store the messages when you are caching messages. Also, if timing of the processing could be an issue, you may need to monitor how quick things are getting processed in the queue to make sure it's not backing up too much or breaking any SLA that might be in place.
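To make the producer/consumer shape concrete without standing up a broker, here is a minimal in-process sketch. It assumes a plain java.util.concurrent BlockingQueue as a stand-in for the JMS Queue, so it has none of the durability or cross-machine benefits listed above. The names OrderQueueSketch, STOP and processAll are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// In-process stand-in for a message queue: one producer, several consumers.
// OrderQueueSketch, STOP and processAll are hypothetical names.
class OrderQueueSketch {
    // Sentinel ("poison pill") that tells a consumer to stop.
    private static final String STOP = "__STOP__";

    // Pushes every order onto the queue, lets `consumers` threads drain it,
    // and returns how many orders were processed.
    static int processAll(List<String> orders, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String order = queue.take(); // blocks until work arrives
                        if (order.equals(STOP)) {
                            return;
                        }
                        // Real code would call the third-party service here.
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "order-consumer-" + i);
            t.start();
            threads.add(t);
        }

        queue.addAll(orders);            // producer side
        for (int i = 0; i < consumers; i++) {
            queue.add(STOP);             // one pill per consumer, after all orders
        }

        for (Thread t : threads) {
            try {
                t.join();                // wait for each consumer to see its pill
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> orders = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            orders.add("order-" + i);
        }
        System.out.println(processAll(orders, 5) + " orders processed");
    }
}
```

With a real broker the queue.take() call becomes a message listener or a blocking receive, and the poison pill is usually replaced by closing the consumer.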
Edit: Adding example...
If I had a threaded class, for example this:
public class MyWorkerThread implements Runnable {
    // volatile so a change made by the managing thread is seen by the worker
    private volatile boolean run = true;

    public void run() {
        while (run) {
            // Do work here...
        }
        // Do any thread cooldown procedures here, like stop listening to the Queue.
    }

    public void setRunning(boolean runState) {
        run = runState;
    }
}
Then I would start the threads using a class like this:
@Service("MyThreadManagerService")
public class MyThreadManagerServiceImpl implements MyThreadManagerService {
    private MyWorkerThread[] workers; // the Runnables, kept so we can signal them
    private Thread[] threads;         // the threads that run them
    private int workerPoolSize = 5;

    /**
     * This gets run after any constructors and setters, but before anything else
     */
    @PostConstruct
    private void init() {
        workers = new MyWorkerThread[workerPoolSize];
        threads = new Thread[workerPoolSize];
        for (int i = 0; i < workerPoolSize; i++) {
            workers[i] = new MyWorkerThread(); // however you build your worker threads
            threads[i] = new Thread(workers[i]);
            threads[i].start();
        }
    }

    /**
     * This gets run just before the class is destroyed. You could use this to
     * shut down the threads
     */
    @PreDestroy
    public void dismantle() {
        // Tell each worker to stop (setRunning lives on the Runnable, not on Thread)
        for (MyWorkerThread worker : workers) {
            worker.setRunning(false);
        }
        // Now join with each thread to make sure we give them time to stop gracefully
        try {
            for (Thread thread : threads) {
                thread.join(); // May want to use the one that allows a millis for a timeout
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting, keep the interrupt flag
        }
    }

    /**
     * Sets the size of the worker pool. Only takes effect if called before init().
     */
    public void setWorkerPoolSize(int newSize) {
        workerPoolSize = newSize;
    }
}
Now you have a nice service class you can add methods to in order to monitor, restart, stop, etc., all your worker threads. I made it an @Service because it felt more right than a simple @Component, but technically it can be anything as long as Spring knows to pick it up when you are autowiring. The init() method on the service class is what starts up the threads and dismantle() is used to gracefully stop them and wait for them to finish. They use the @PostConstruct and @PreDestroy annotations, so you can name them whatever you want. You would probably have a constructor on your MyWorkerThread to set up the Queues and such. Also, as a disclaimer, this was all written from memory so there may be some mild compiling issues or method names may be slightly off.
There may be classes already available to do this sort of thing, but I have never seen one myself. If someone knows of a better way using off-the-shelf parts I would love to get better educated.
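On the off-the-shelf question: the JDK's java.util.concurrent package has ExecutorService, which manages a fixed set of worker threads and an internal work queue for you, and Spring's ThreadPoolTaskExecutor is built on the same machinery. Below is a rough sketch of the thread manager rewritten on top of it. WorkerPoolSketch and its method names are invented for the example, and in a Spring bean you would presumably create the pool in the @PostConstruct method and call shutdown from the @PreDestroy one.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Worker pool built on ExecutorService instead of hand-managed threads.
// WorkerPoolSketch and its method names are hypothetical.
class WorkerPoolSketch {
    private final ExecutorService pool;
    private final AtomicInteger completed = new AtomicInteger();

    WorkerPoolSketch(int workerPoolSize) {
        // Fixed number of threads; extra tasks wait in the pool's internal queue.
        pool = Executors.newFixedThreadPool(workerPoolSize);
    }

    // Queues a unit of work; returns immediately.
    void submit(Runnable task) {
        pool.execute(() -> {
            task.run();
            completed.incrementAndGet();
        });
    }

    // Stops accepting new work and waits for queued tasks to finish.
    // Returns true if everything finished within the timeout.
    boolean shutdown(long timeoutSeconds) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow(); // give up waiting and interrupt running tasks
        return false;
    }

    int completedCount() {
        return completed.get();
    }

    public static void main(String[] args) {
        WorkerPoolSketch workers = new WorkerPoolSketch(5);
        for (int i = 0; i < 500; i++) {
            workers.submit(() -> { /* process one order */ });
        }
        boolean clean = workers.shutdown(30);
        System.out.println("clean shutdown: " + clean
                + ", completed: " + workers.completedCount());
    }
}
```

The trade-off against the hand-rolled version is that you lose direct access to the individual threads, but you no longer have to write the start, signal and join logic yourself.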
If your order size can grow in the future and you want a scalable solution, I would suggest going with the Spring Batch framework. I find it very easy to integrate with Spring MVC, and with minimal configuration you can achieve a very robust parallel processing architecture. Use Spring Batch partitioning. Hope this helps! Feel free to ask if you need help with the Spring MVC integration.
Can I make concurrent calls using Spring JMSTemplate?
I want to make 4 external service calls in parallel and am exploring using Spring's JMSTemplate to perform these calls in parallel and wait for the execution to complete.
The other option that I am looking at is to use ExecutorService.
Is there any advantage using one over the other?
JmsTemplate is thread-safe, so making parallel calls to it is not a problem.
Messaging services are usually fast enough for most tasks and can receive your messages with minimal latency, so an ExecutorService is usually not the first thing you need. What you really need is to correctly configure your JMS connection pool and give it enough open connections (four in your case) so it can handle your parallel requests without blocking.
You only need ExecutorService in case you don't care about guaranteed delivery and your program needs extremely high speed that your messaging service cannot deliver, which is highly unlikely.
As for receiving replies from your external service, you need to use JMS Request/Reply pattern (you can find examples in this article). Happily, as you're using Spring, you could make Spring Integration do lots of work for you. You need to configure outbound-gateway to send messages and inbound-gateway to receive responses. Since version 2.2 you can also use reply-listener to simplify things on your client side. All these components are covered in the official documentation (with examples as well).
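For comparison, here is roughly what the ExecutorService option from the question looks like: start the four calls at once and wait for all of them. This is only a sketch with placeholder suppliers standing in for the real service calls. ParallelCallsSketch and callAll are made-up names, and the messaging setup described above is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Runs several independent calls in parallel and waits for all of them.
// ParallelCallsSketch and callAll are hypothetical names.
class ParallelCallsSketch {
    // Returns the results in the same order as the calls were given.
    static List<String> callAll(List<Supplier<String>> calls) {
        // One thread per call so none of them waits for a free worker.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, calls.size()));
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (Supplier<String> call : calls) {
                futures.add(CompletableFuture.supplyAsync(call, pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                // join() waits for this call; a failure surfaces as CompletionException
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholders for the four external service calls.
        List<Supplier<String>> calls = Arrays.<Supplier<String>>asList(
                () -> "reply-1",
                () -> "reply-2",
                () -> "reply-3",
                () -> "reply-4");
        System.out.println(callAll(calls));
    }
}
```

The total wait is about as long as the slowest call rather than the sum of all four, which is the whole point of running them in parallel.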
So you need to talk to two or more JMS queues (send and/or receive) in parallel using asynchronous methods. The best option is using @Async at the method level.
The linked example uses RestTemplate, but in your case you would create JmsTemplate beans instead.
Prerequisites: create the proper JMS beans to connect to the queues. Used properly, this lets you invoke two queues in parallel. It works for sure because I have already implemented it. I am only giving the skeleton due to copyright issues.
More details: Spring Boot + Spring Async
https://spring.io/guides/gs/async-method/
Step 1: Create the class that calls the asynchronous JMS service methods
@EnableAsync
public class JMSApplication {
    @Autowired
    JmsService jmsService;

    public void invokeMe() throws InterruptedException, ExecutionException {
        // Start the clock
        long start = System.currentTimeMillis();
        // Kick off multiple, asynchronous lookups
        Future<Object> queue1 = jmsService.findqueue1();
        Future<Object> queue2 = jmsService.findqueue2();
        // Wait until they are all done
        while (!(queue1.isDone() && queue2.isDone())) {
            Thread.sleep(10); // 10-millisecond pause between each check
        }
        // Print results, including elapsed time
        System.out.println("Elapsed time: " + (System.currentTimeMillis() - start));
        System.out.println(queue1.get());
        System.out.println(queue2.get());
    }
}
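Step 2: Write the service methods that contain the business logic for JMS
@Service
public class JmsService {
    // An @Async method must return void or a Future so the caller can wait on it
    @Async
    public Future<Object> findqueue1() {
        Object reply = null; // Logic to invoke the JMS queue
        return CompletableFuture.completedFuture(reply);
    }

    @Async
    public Future<Object> findqueue2() {
        Object reply = null; // Logic to invoke the JMS queue
        return CompletableFuture.completedFuture(reply);
    }
}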
I'm writing a JEE7/Glassfish 4 application that reads data from an external queue (RabbitMQ) and processes it. It needs a method (I suppose an EJB method) that contains a loop that never exits that reads the queue. I suppose since this loop never exits, it needs to be on a separate thread. My question is, what is the correct way to do this in a JEE7 application?
This may be obvious, but the ReadQueue() method needs to start automatically when the app starts and must keep running permanently.
Is the ManagedExecutorService appropriate for this?
ManagedExecutorService is exactly what you want to use for this.
The availability of this service in JEE is a great benefit. In the past, we basically just ignored the guidelines and managed all of this stuff ourselves.
The MES allows you to capture the context information of the invoking component, and tie your task in to the life cycle of the container. These are both very important in the JEE environment.
As to where to start the task, you basically have two options.
One, you can use a ServletContextListener, and have that kick off the task during container startup.
Two, you can use an @Singleton EJB, and tie in to its lifecycle methods to start your task.
If you start the task up from the ServletContextListener, then the task will run as if it's in the WAR environment. If you start it up from the @Singleton, it will run within the Session Beans environment (this mostly relates to how the JNDI appears).
Either way, you only need to worry about starting the task via these mechanisms. You should rely on the ManagedTaskListener.taskAborted interface method to shut your task down.
In theory you can work with the Thread.interrupt that is sent to your task during shutdown. I've never had good luck with that myself, so I rely on an external mechanism to tell the long-running tasks to shut off.
I wish I could give first hand experience with this new facility, but I haven't had an opportunity to try it out yet. But from the spec, this is what you want to do.
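As an illustration of the "external mechanism" idea, here is a small sketch of a long-running reader that checks a stop flag between short, timed waits. It uses a plain BlockingQueue as a stand-in for the RabbitMQ consumer and a plain Thread instead of the container's executor, so treat it as an outline only. QueueReaderSketch and its methods are hypothetical names. In a container you would presumably hand the same Runnable to the injected ManagedExecutorService rather than creating a Thread yourself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Long-running reader that can be told to stop from outside.
// QueueReaderSketch and its method names are hypothetical.
class QueueReaderSketch implements Runnable {
    private final BlockingQueue<String> source; // stand-in for the external queue
    private final AtomicInteger handled = new AtomicInteger();
    private volatile boolean running = true;

    QueueReaderSketch(BlockingQueue<String> source) {
        this.source = source;
    }

    @Override
    public void run() {
        while (running) {
            try {
                // Timed wait, so the stop flag is re-checked even when the queue is idle.
                String message = source.poll(100, TimeUnit.MILLISECONDS);
                if (message != null) {
                    handled.incrementAndGet(); // real code would process the message here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // treat an interrupt as a stop request too
            }
        }
    }

    // The external stop signal, e.g. called from a lifecycle callback.
    void stop() {
        running = false;
    }

    int handledCount() {
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        QueueReaderSketch reader = new QueueReaderSketch(queue);
        Thread thread = new Thread(reader, "queue-reader");
        thread.start();

        queue.add("first");
        queue.add("second");
        while (reader.handledCount() < 2) {
            Thread.sleep(10); // wait until both messages are handled
        }

        reader.stop();
        thread.join();
        System.out.println(reader.handledCount() + " messages handled");
    }
}
```

The timed poll is the design choice that matters here: a reader blocked forever in take() never gets a chance to look at the flag.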
To start a thread with an infinite loop that polls the queue periodically is usually not a good idea. The nature of queues suggests an async, event-driven processing. For such problems in the JEE world you have MDBs. The only issue here is that MDB requires a JMS queue provider but RabbitMQ is using a different protocol (AMQP). You need a JMS-AMQP bridge to make this work. Could be Qpid JMS but no guarantee that it will work.
Here is one way to create a thread that never exits:
public class HelloRunnable implements Runnable {
public void run() {
while (true) {
// do ReadQueue() here
}
}
public static void main(String args[]) {
(new Thread(new HelloRunnable())).start();
}
}
I'm trying to learn activemq + camel in order to apply it on a real world. I need to consume a queue, process the message and move it to another queue.
My concern is performance. I'll need to handle at least 100,000 messages daily. Right now I don't want to deal with vertical or horizontal scaling (we can't spend more money until people are convinced that the technology is good).
So, I thought about starting a few threads that will poll the queue, consume, process and move the messages to another queue. The number of threads will depend on how the hardware responds to increasing levels of load.
My first question is: Is this a good approach (starting parallel threads to consume the queue)?
My second question is: I started my learning by reading Camel in Action. I don't know if I'm missing something, but I'm a little bit confused about how to build the consumer. By adapting the book's FtpToJMSExample, I came up with the code below. In the real world, I won't create connections for each thread. I'll use a connection pool provided by the application server (GlassFish).
public class JMSToJMSExample {
    public static void main(String args[]) throws Exception {
        CamelContext context = new DefaultCamelContext();
        ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost");
        context.addComponent("jms", JmsComponent.jmsComponentAutoAcknowledge(connectionFactory));
        context.addRoutes(new RouteBuilder() {
            public void configure() {
                from("jms:in")
                    .process(new CustomProcessor())
                    .to("jms:out");
            }
        });
        context.start();
        Thread.sleep(10000);
        context.stop();
    }
}
It works fine. But the book calls it a "polling" solution. I was expecting something like a while loop, so that while the queue has messages, it keeps consuming. OK, the example is polling a queue, but my point is that if I reduce the sleep period, it will exit without processing all the messages that it could.
Anyway, I think it is better to establish a long-running thread, instead of asking the connection pool to give me a connection every time the threads wake up.
Since I'm learning, could you please give an example of how to create a thread that polls a JMS queue until it is empty, rather than polling by time/period?
TIA,
Bob
1) use seda http://camel.apache.org/seda.html for concurrent processing
from("jms:in")
    .to("seda:seda1");
from("seda:seda1?concurrentConsumers=10")
    .process(new CustomProcessor())
    .to("jms:out");
2) read http://camel.apache.org/running-camel-standalone-and-have-it-keep-running.html about how to keep Camel running
a couple of things....
to your question, "Is this a good approach (starting parallel threads to consume the queue)?"
absolutely, this is a very common design pattern
a route will continue to consume as long as your CamelContext is running (you don't need a loop or a timer), see http://camel.apache.org/running-camel-standalone-and-have-it-keep-running.html
your route should look something like this, notice the maxConcurrentConsumers property indicates the desire for multiple consumer threads to pull message from the IN queue, process them and push them to the OUT queue,
from("jms:in?maxConcurrentConsumers=10")
    .process(new CustomProcessor())
    .to("jms:out");
see http://camel.apache.org/activemq.html for more details
I have a Spring MVC, Hibernate (Postgres 9 db) web app. An admin user can send in a request to process nearly 200,000 records (each record collected from various tables via joins). Such an operation is requested on a weekly or monthly basis (or whenever the data reaches a limit of around 100,000 to 200,000 records). On the database end, I am correctly implementing batching.
PROBLEM: Such a long-running request holds up the server thread, and that causes the normal users to suffer.
REQUIREMENT: The high response time of this request is not an issue. What's required is that other users do not suffer because of this time-consuming process.
MY SOLUTION:
Implementing a thread pool using Spring's TaskExecutor abstraction. I can initialize my thread pool with, say, 5 or 6 threads and break the 200,000 records into smaller chunks, say of size 1000 each. I can queue these chunks. To further allow the normal users faster db access, maybe I can make every worker thread sleep for 2 or 3 seconds.
The advantage I see in this approach is: instead of executing a huge db-interacting request in one go, we have an asynchronous design spanning a longer time, thus behaving like multiple normal user requests.
Can some experienced people please give their opinion on this?
I have also read about implementing the same behaviour with message-oriented middleware like JMS/AMQP, or with Quartz scheduling. But frankly speaking, I think internally they are going to do the same thing, i.e. make a thread pool and queue the jobs. So why not go with the Spring task executors instead of adding a completely new infrastructure to my web app just for this feature?
Please share your views on this and let me know if there are other, better ways to do this.
Once again: the time to completely process all the records is not a concern. What's required is that normal users accessing the web app during that time should not suffer in any way.
You can parallelize the tasks and wait for all of them to finish before returning the call. For this, you want to use ExecutorCompletionService, which has been part of the standard Java library since 5.0.
In short, you use your container's service locator to obtain an executor and wrap it in an ExecutorCompletionService:
ExecutorCompletionService<List<MyResult>> queue = new ExecutorCompletionService<List<MyResult>>(executor);
// do this in a loop
queue.submit(aCallable);
// after looping, call this once per submitted task
queue.take().get(); // take() blocks until the next task finishes, not until all of them do
If you do not want to wait then, you can process the jobs in the background without blocking the current thread but then you will need some mechanism to inform the client when the job has finished. That can be through JMS or if you have an ajax client then, it can poll for updates.
Quartz also has a job scheduling mechanism but, Java provides a standard way.
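Putting the chunking idea from the question together with ExecutorCompletionService, a minimal sketch might look like the following. The record ids are plain integers and the per-chunk "work" just counts them, both placeholders for the real database calls. ChunkedProcessingSketch, partition and processInChunks are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Splits a large job into chunks and runs them on a small thread pool.
// ChunkedProcessingSketch, partition and processInChunks are hypothetical names.
class ChunkedProcessingSketch {
    // Splits ids into consecutive chunks of at most chunkSize elements.
    static List<List<Integer>> partition(List<Integer> ids, int chunkSize) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(from, to)));
        }
        return chunks;
    }

    // Processes every chunk on the pool and returns the total number of records handled.
    static int processInChunks(List<Integer> ids, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ExecutorCompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        try {
            List<List<Integer>> chunks = partition(ids, chunkSize);
            for (List<Integer> chunk : chunks) {
                // Placeholder for the real per-chunk database work.
                completion.submit(() -> chunk.size());
            }
            int total = 0;
            // One take() per submitted chunk; results arrive in completion order.
            for (int i = 0; i < chunks.size(); i++) {
                total += completion.take().get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            ids.add(i);
        }
        System.out.println(processInChunks(ids, 1000, 5) + " records processed");
    }
}
```

Keeping the pool small (5 threads here) is what limits how much database capacity the bulk job can take away from normal users at any moment.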
EDIT:
I might have misunderstood the question. If you do not want a faster response but rather you want to throttle the CPU, use this approach
You can make an inner class like this PollingThread where batches containing java.util.UUID for each job and the number of PollingThreads are defined in the outer class. This will keep going forever and can be tuned to keep your CPUs free to handle other requests
class PollingThread implements Runnable {
    @SuppressWarnings("unchecked")
    public void run() {
        Thread.currentThread().setName("MyPollingThread");
        while (!Thread.interrupted()) {
            try {
                // Declared per iteration so an old batch is never processed twice
                LinkedHashSet<UUID> list = null;
                synchronized (incomingList) {
                    if (incomingList.size() == 0) {
                        // incoming is empty, wait for some time
                    } else {
                        // copy the pending jobs, then clear the original
                        list = (LinkedHashSet<UUID>) incomingList.clone();
                        incomingList.clear();
                    }
                }
                if (list != null && list.size() > 0) {
                    processJobs(list);
                }
                // Sleep for some time
                try {
                    Thread.sleep(seconds * 1000);
                } catch (InterruptedException e) {
                    // restore the flag so the while condition ends the loop
                    Thread.currentThread().interrupt();
                }
            } catch (Throwable e) {
                // ignore
            }
        }
    }
}
Huge db operations are usually triggered in the wee hours, when user traffic is low (say, between 1 AM and 2 AM). Once you find that window, you can simply schedule a job to run at that time. Quartz can come in handy here, with time-based triggers. (Note: manually triggering a job is also possible.)
The processed result could then be stored in different table(s), which I'll refer to as result tables. Later, when a user wants this result, the db operations would run against these result tables, which have minimal records and involve hardly any joins.
"instead of adding a completely new infrastructure in my web app just for this feature?"
Quartz.jar is ~350 KB and adding this dependency shouldn't be a problem. Also note that there's no reason this needs to be part of the web app. The few classes that do the ETL could be placed in a standalone module. The request from the web app then only needs to fetch from the result tables.
All these apart, if you already had a master-slave db model(discuss on that with your dba) then you could do the huge-db operations with the slave-db rather than the master, which normal users would be pointed to.
I'm thinking of using Spring's TaskExecutor to fire off asynchronous database writes. Understandably threads don't come for free, but assuming I'm using a fixed thread pool size of, say, 5-10, how is this a bad idea?
Our application reads from a very large file using a buffer and flushes this information to a database after performing some data manipulation. Using asynchronous writes seems ideal here so that we can continue working on the file. What am I missing? Why doesn't every application use asynchronous writes?
Why doesn't every application use asynchronous writes?
It's often necessary/useful/easier to deal with a write failure in a synchronous manner.
I'm not sure a threadpool is even necessary. I would consider using a dedicated databaseWriter thread which does all writing and error handling for you. Something like:
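public class AsyncDatabaseWriter implements Runnable {
    private LinkedBlockingQueue<Data> queue = ....
    private volatile boolean terminate = false;

    public void run() {
        while (!terminate) {
            try {
                Data data = queue.take(); // blocks until there is something to write
                // write to database
            } catch (InterruptedException e) {
                // take() was interrupted: restore the flag and stop the writer
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public void scheduleWrite(Data data) {
        queue.add(data);
    }
}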
I personally fancy the style of using a Proxy for threading out operations which might take a long time. I'm not saying this approach is better than using executors in any way, just adding it as an alternative.
The idea is not bad at all. Actually I just tried it yesterday, because I needed to create a copy of an online database which has 5 different categories with about 60,000 items each.
By moving the parse/save operation of each category into parallel tasks, and partitioning each category import into smaller batches run in parallel, I reduced the total import time from several hours (estimated) to 26 minutes. Along the way I found a good piece of code for splitting the collection: http://www.vogella.de/articles/JavaAlgorithmsPartitionCollection/article.html
I used ThreadPoolTaskExecutor to run tasks. Your tasks are just simple implementation of Callable interface.
why doesn't every application use asynchronous writes? - erm, because every application does a different thing.
can you believe some applications don't even use a database OMG!!!!!!!!!
seriously though, given that you don't say what your failure strategies are, it sounds like it could be reasonable. What happens if the write fails? Or the db goes away somehow?
some databases - like Sybase - have (or at least had) a thing where they really don't like multiple writers to a single table - all the writers ended up blocking each other - so maybe it won't actually make much difference...
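To tie the failure-handling concerns above to something concrete, here is one possible shape for an asynchronous writer that (a) remembers which rows failed and (b) slows the file reader down when the database falls behind. It is only a sketch: the Consumer stands in for the real database write, and AsyncWriterSketch with its methods are made-up names. The bounded queue plus CallerRunsPolicy means that once the queue is full, the submitting thread performs the write itself, which acts as back-pressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Asynchronous writer with a bounded queue and a record of failed rows.
// AsyncWriterSketch and its method names are hypothetical.
class AsyncWriterSketch {
    private final ThreadPoolExecutor pool;
    private final Consumer<String> writer; // stand-in for the real database write
    private final Queue<String> failed = new ConcurrentLinkedQueue<>();

    AsyncWriterSketch(int threads, int queueCapacity, Consumer<String> writer) {
        this.writer = writer;
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the caller runs the task itself (back-pressure).
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Queues one row for writing; failures are recorded instead of being lost.
    void write(String row) {
        pool.execute(() -> {
            try {
                writer.accept(row);
            } catch (RuntimeException e) {
                failed.add(row);
            }
        });
    }

    // Waits for pending writes and returns the rows whose write failed.
    List<String> close(long timeoutSeconds) {
        pool.shutdown();
        try {
            pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(failed);
    }

    public static void main(String[] args) {
        AsyncWriterSketch writer = new AsyncWriterSketch(5, 100, row -> {
            if (row.isEmpty()) {
                throw new IllegalArgumentException("empty row"); // simulated write failure
            }
        });
        writer.write("row-1");
        writer.write("");
        writer.write("row-2");
        System.out.println("failed rows: " + writer.close(30).size());
    }
}
```

What to do with the failed rows (retry, log, abort the import) is exactly the failure strategy question raised above, and this sketch deliberately leaves it open.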