I'm developing an application that processes asynchronous requests which take, on average, 10 minutes to finish. The server is written with Spring Boot, runs as 4 replicas, and sits behind a load balancer. If one of these servers crashes while processing some number of requests, I want those failed requests to be restarted on the remaining servers in a load-balanced way.
Note: There's a common database in which we create a unique entry for every incoming request, and delete that entry when that request is processed successfully.
Constraints:
We can't wait for the server to restart.
There's no extra server to keep watch over these servers.
There's no leader/slave architecture among the servers.
Can someone please help me with this problem?
One solution would be to use a message queue to handle the requests. I would recommend using Apache Kafka (Spring for Apache Kafka) and propose the following solution:
Create 4 Kafka topics.
Whenever one of the 4 replicas receives a request, publish it to one of the 4 topics (chosen randomly) instead of handling it directly.
Each replica connects to Kafka and consumes from one topic. If you let Kafka manage the assignments, whenever one replica crashes, one of the other 3 will pick up its topic and start consuming requests in its place.
When the crashed replica restarts and connects to Kafka, it can start consuming again from its topic (this rebalancing is already implemented in Kafka).
Another advantage of this solution is that you can, if you want to, stop using the database to store requests, as Kafka can act as your database in this case.
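A minimal sketch of the consumer side with Spring for Apache Kafka, assuming the 4 topics are named request-topic-1 to request-topic-4, all replicas share one consumer group, and a manual-ack container factory is configured elsewhere (the topic names, group id, and RequestProcessor service are placeholders, not part of the original setup):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

// Placeholder for the service that does the actual ~10 minute work
interface RequestProcessor {
    void process(String payload);
}

@Component
public class RequestConsumer {

    private final RequestProcessor processor;

    public RequestConsumer(RequestProcessor processor) {
        this.processor = processor;
    }

    // All replicas share the same group id, so Kafka reassigns a crashed
    // replica's topic(s) to one of the surviving consumers automatically.
    // Note: max.poll.interval.ms must be raised well above the ~10 minute
    // processing time, otherwise Kafka will consider the consumer dead.
    @KafkaListener(
            topics = {"request-topic-1", "request-topic-2", "request-topic-3", "request-topic-4"},
            groupId = "request-workers")
    public void onRequest(String payload, Acknowledgment ack) {
        processor.process(payload); // long-running work
        ack.acknowledge();          // commit the offset only after successful processing
    }
}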
Related
Scenario: Currently we have a primary cluster, and we have a producer and consumer which are working as expected. We have to implement a secondary Kafka DR cluster in another data center. I have a couple of ideas, but I'm not sure how to proceed.
Question: How can the producer switch-over from the primary cluster to the secondary cluster be automated if the primary cluster/brokers go down?
Any sample code will be helpful.
You can use a load balancer in front of the Kafka clusters that your producer connects to. The load balancer can switch to the secondary cluster if the brokers in the primary aren't available.
You can also implement the failover without a load balancer.
As a next step you have to implement exception handling in your code that triggers the reconnect yourself, so the producer is still able to ingest.
Consumers can subscribe to super topics (several topics). This can be done with a regular expression.
For the HA scenario you need 2 Kafka clusters and MirrorMaker 2.0.
The Failover happens on client side Producer / Consumer.
To your question:
If you have 3 brokers and 2 in-sync replicas are required (min.insync.replicas=2), at most 1 broker can fail. If 2 brokers fail, high availability is no longer guaranteed. This means you can build exception handling that intercepts the "not enough replicas" error and carries out a reconnect to the secondary cluster on that basis.
If you use the scenario with the load balancer, make sure that the load balancer is configured in passthrough mode. That way, the amount of code can be reduced.
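A rough sketch of the client-side failover idea described above, assuming the primary and secondary bootstrap addresses are known up front (the addresses, topic, and single-retry policy are placeholders, not a definitive implementation):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.errors.TimeoutException;

public class FailoverProducer {

    private static final String PRIMARY = "primary-broker1:9092,primary-broker2:9092"; // placeholder
    private static final String SECONDARY = "dr-broker1:9092,dr-broker2:9092";         // placeholder

    private KafkaProducer<String, String> producer = create(PRIMARY);

    private static KafkaProducer<String, String> create(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", "all"); // fails with NotEnoughReplicasException when ISR < min.insync.replicas
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }

    public void send(String topic, String value) {
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, value);
        try {
            producer.send(record).get(); // block so the failure can be handled right here
        } catch (Exception e) {
            if (e.getCause() instanceof NotEnoughReplicasException
                    || e.getCause() instanceof TimeoutException) {
                // primary looks unhealthy: switch to the DR cluster and retry once
                producer.close();
                producer = create(SECONDARY);
                try {
                    producer.send(record).get();
                } catch (Exception retryFailure) {
                    throw new RuntimeException("Send failed on both clusters", retryFailure);
                }
            } else {
                throw new RuntimeException(e);
            }
        }
    }
}

Blocking on get() keeps the sketch short; in practice the same check can be done in the asynchronous send callback.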
Application1: source system sending 10 requests per sec to apache camel(ActiveMQ).
Application2: Apache camel which receives request from Application1 and sends it to downstream system Application3(10 requests/sec).
Application3: The downstream system(post API) gets 10 requests per sec from apache camel and processes the request.
Problem statement: Application3 has DB updates and processing tasks to handle; because the 10 requests hit Application3 at the same time, duplicates are being generated during processing and the DB updates.
Please suggest a way to add a delay of 1 second between each request, either in Apache Camel or in the downstream system.
Thanks in advance.
There is a Delay EIP in Camel you can use to delay every message (probably in Application 2) for a fixed or even calculated amount of time. Make sure that you only have 1 Consumer in Application 2.
There is also a Throttle EIP in Camel, but it keeps all messages that are held back in memory. In your case it is better to slow down the consumption of Application 2 to avoid overloading Application 3.
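A minimal sketch of the Delay EIP in Application 2's route, assuming an ActiveMQ input queue named incoming and an HTTP endpoint for Application 3 (both endpoint URIs are placeholders):

import org.apache.camel.builder.RouteBuilder;

public class DelayedForwardRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("activemq:queue:incoming?concurrentConsumers=1")    // a single consumer, as noted above
            .delay(1000)                                         // wait 1 second before forwarding each message
            .to("http://application3.example.com/api/process");  // placeholder endpoint for Application 3
    }
}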
I guess that Application 3 gets duplicates because it can't keep up with the requests, so Application 2 gets timeouts and its error handling sends the requests again.
However, when you use messaging, you have to prepare for duplicates. In error cases you can easily receive duplicates and therefore you have to make Application 3 idempotent.
Of course, Camel can also help you with this thanks to the Idempotent Consumer.
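A sketch of the Idempotent Consumer EIP, assuming each request carries a unique header such as requestId (the header name, endpoints, and the in-memory repository are placeholders; the import path shown is for Camel 2.x, and a shared repository such as a JDBC-based one would be needed across multiple instances):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.idempotent.MemoryIdempotentRepository;

public class IdempotentForwardRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("activemq:queue:incoming")
            // skip any message whose requestId has already been seen
            .idempotentConsumer(header("requestId"),
                    MemoryIdempotentRepository.memoryIdempotentRepository(10000))
            .to("http://application3.example.com/api/process");
    }
}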
I am working on an application that writes to a Kafka queue which is read by another application. When I am unable to send messages to Kafka due to network or other issues, I need to write the messages generated during the Kafka downtime somewhere else, e.g. Oracle or the local file system, so that I don't lose them. The problem with Oracle or another DB is that it can go down too. Are there any recommendations on how I could achieve fail-safety during Kafka downtime?
The number of messages generated is approximately 20-25 million per day. For the messages stored during downtime, I am planning to have a batch job re-send them to the destination application once the target application is up again.
Thank you
You can push those messages into a cloud-based messaging service like Amazon SQS. It supports around 3K messages per second.
There is also a connector that allows you to push the messages back into Kafka directly, with no other headaches.
If you can't export the data out of your local network, then maybe a cluster of RabbitMQ instances may help, although it wouldn't be a plug & play solution.
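If you keep the fallback local, a minimal sketch of the idea could look like the following: try Kafka first and append to a local file when the send fails, so the batch job can replay the file later (the file path, topic name, and timeout are placeholders):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FallbackWriter {

    private final KafkaProducer<String, String> producer;
    private final Path fallbackFile = Paths.get("/var/spool/app/kafka-fallback.log"); // placeholder path

    public FallbackWriter(Properties kafkaProps) {
        this.producer = new KafkaProducer<>(kafkaProps);
    }

    public void write(String message) {
        try {
            // bounded wait so a broker outage is detected quickly
            producer.send(new ProducerRecord<>("events", message)).get(5, TimeUnit.SECONDS);
        } catch (Exception kafkaDown) {
            try {
                // append one message per line; the batch job replays this file later
                Files.write(fallbackFile,
                        (message + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (Exception diskAlsoFailed) {
                throw new RuntimeException("Message could not be stored anywhere", diskAlsoFailed);
            }
        }
    }
}

At 20-25 million messages per day this averages roughly 230-290 messages per second, so a plain append-only file can keep up; the harder part is deduplication when replaying, since some messages may have reached Kafka before the failure was detected.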
RabbitMQ RPC
I decided to use RabbitMQ RPC as described here.
My Setup
Incoming web requests (on Tomcat) will dispatch RPC requests over RabbitMQ to different services and assemble the results. I use one reply queue with one custom consumer that listens to all RPC responses and collects them with their correlation id in a simple hash map. Nothing fancy there.
This works great in a simple integration test on controller level.
Problem
When I try to do this in a web project deployed on Tomcat, Tomcat refuses to shut down. jstack and some debugging showed me that a thread is spawned to listen for the RPC response and is blocking Tomcat from shutting down gracefully. I guess this is because the thread is created at the application level instead of the request level and is not managed by Tomcat. When I set breakpoints in Servlet.destroy() or ServletContextListener.contextDestroyed(ServletContextEvent sce), they are not reached, so I see no way to clean things up manually.
Alternative
As an alternative, I could use a new reply queue (and a simple QueueingConsumer) for each web request. I've tested this; it works and Tomcat shuts down as it should. But I'm wondering if this is the way to go. Can a RabbitMQ cluster deal with thousands (or even millions) of short-lived queues/consumers? I can imagine the queues aren't that big, but still: constantly broadcasting to all cluster nodes, the total memory footprint...
Question
So in short, is it wise to create a queue for each incoming web request, or how should I set up RabbitMQ with one queue and consumer so that Tomcat can shut down gracefully?
I found a solution for my problem:
The Java client creates its own threads, but there is the possibility to pass your own ExecutorService when creating a new connection. By doing so in ServletContextListener.contextInitialized(), one can keep a reference to the ExecutorService and shut it down manually in ServletContextListener.contextDestroyed():
executorService.shutdown();
executorService.awaitTermination(20, TimeUnit.SECONDS);
I used Executors.newCachedThreadPool(); as the threads have many short executions, and they get cleaned up after being idle for more than 60s.
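A minimal sketch of the whole listener, assuming the connection factory settings are configured elsewhere (class and field names are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class RabbitLifecycleListener implements ServletContextListener {

    private ExecutorService executorService;
    private Connection connection;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        executorService = Executors.newCachedThreadPool();
        try {
            ConnectionFactory factory = new ConnectionFactory(); // host/credentials configured elsewhere
            // hand the client our executor so its consumer threads are under our control
            connection = factory.newConnection(executorService);
        } catch (Exception e) {
            throw new IllegalStateException("Could not connect to RabbitMQ", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        try {
            connection.close();
            executorService.shutdown();
            executorService.awaitTermination(20, TimeUnit.SECONDS);
        } catch (Exception e) {
            // best effort on shutdown
        }
    }
}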
This is the link to the RabbitMQ Google group thread (thanks to Michael Klishin for pointing me in the right direction).
I have 4 queues in ActiveMQ, and messages from each queue should be sent out to an external service. For picking up the messages from the queues I am using Apache Camel, and I am throttling the messages.
But my problem here is that the queues have different sending hours. For example:
Queue 1 messages should be sent only between 6 AM to 5 PM,
Queue 2 messages should be sent only between 10 AM and 10 PM, and so on.
So I want to know how we can handle this using Apache Camel throttling, or please suggest some other solution.
Let me know if anything about my problem is unclear. Thanks in advance.
Camel allows you to associate route(s) with route policies, and there is an out-of-the-box policy that is based on camel-quartz and is schedule-based. This allows you to set up policies for the opening hours of your routes.
The doc starts here: http://camel.apache.org/routepolicy. There are links from that page to the scheduler-based policies.
Mind that there is a ticket - http://issues.apache.org/jira/browse/CAMEL-5929 - about the case where you restart the app server: the route is not started even if you restart within the opening hours. E.g. your window is 12pm-6pm and you restart the app at 3pm (in between); the route is then only started on the next day. The ticket is there to allow you to configure a forced start when the app is started within the opening window.
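A sketch of such a schedule-based policy using CronScheduledRoutePolicy from camel-quartz, here for Queue 1's 6 AM to 5 PM window (the queue name, cron expressions, throttle values, and external endpoint are placeholders):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.routepolicy.quartz.CronScheduledRoutePolicy;

public class Queue1Route extends RouteBuilder {
    @Override
    public void configure() {
        CronScheduledRoutePolicy policy = new CronScheduledRoutePolicy();
        policy.setRouteStartTime("0 0 6 * * ?");  // start consuming at 6 AM
        policy.setRouteStopTime("0 0 17 * * ?");  // stop consuming at 5 PM

        from("activemq:queue:queue1")
            .routePolicy(policy)
            .throttle(10).timePeriodMillis(1000)             // existing throttling, placeholder values
            .to("http://external-service.example.com/api");  // placeholder external service
    }
}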
Set up one route per queue/interval.
Use Quartz timers triggered at the hours when the routes should start/stop.
You can let the Quartz routes use the control bus pattern to start/stop the queue routes.
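A sketch of that control bus idea, assuming the consumer route has the id queue1Route (cron expressions, route ids, and endpoint URIs are placeholders):

import org.apache.camel.builder.RouteBuilder;

public class Queue1ControlRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // the actual consumer route, given an id so the control bus can address it
        from("activemq:queue:queue1").routeId("queue1Route")
            .to("http://external-service.example.com/api");

        // Quartz-triggered routes that start/stop the consumer route via the control bus
        from("quartz2://control/startQueue1?cron=0+0+6+*+*+?")   // fires at 6 AM
            .to("controlbus:route?routeId=queue1Route&action=start");

        from("quartz2://control/stopQueue1?cron=0+0+17+*+*+?")   // fires at 5 PM
            .to("controlbus:route?routeId=queue1Route&action=stop");
    }
}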