I'm trying to run a stress test against a Camel project that receives a key to decrypt some query parameters. However, when I add multiple vusers, the threads seem to lose their ordering.
Screenshot:
Thread 7 enters in the middle of thread 4's sequence. Is there any way to control this? In the cases where the sequence is broken I can't decrypt the data, because that thread carries a different key.
I'm using direct: in my route. I've tried SEDA without concurrentConsumers and the process becomes too slow; using the concurrentConsumers parameter, I get the same error.
I solved it using the SEDA component with multiple consumers; apparently this component coordinates the consumers and only starts consuming when the previous consumer has finished.
My route:
from("seda:route?multipleConsumers=true")
    .to("direct:toRoute");
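For what it's worth, the ordering guarantee a single consumer gives you can be sketched outside Camel with the JDK alone: one worker thread draining a queue processes messages strictly in arrival order. This is only an illustration of the idea, not Camel API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OrderedConsumer {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        List<Integer> processed = new ArrayList<>();

        // A single consumer thread preserves arrival order, much like a
        // SEDA endpoint with exactly one consumer.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    processed.add(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 0; i < 10; i++) {
            queue.put(i); // producer enqueues in order
        }
        consumer.join();
        System.out.println(processed); // arrival order is preserved
    }
}
```

With more than one consumer thread, nothing in the queue itself would prevent interleaving, which is exactly the behaviour described in the question.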
I am using Vert.x to read a file, transform it, and then push it to Kafka.
I am using 2 verticles, without any worker threads (I don't want to change the order of the log lines in the file).
Verticle 1: reads the file and filters
Verticle 2: publishes to Kafka
Each file contains approximately 120,000 lines.
However, I observed that after some time I stop seeing logs from verticle 1.
I suspect the event bus is getting full, so the consumer is still consuming, but the producer thread is waiting for the event bus to drain.
So my questions are:
1. What is the default size of the event bus? The docs say:
DEFAULT_ACCEPT_BACKLOG
The default accept backlog = 1024
2. How do I confirm my suspicion that the publisher thread is blocked?
Vert.x uses Netty's SingleThreadEventLoop internally for its event bus; the maximum number of pending tasks allowed is Integer.MAX_VALUE, i.e. roughly 2 billion messages.
You may have to try VertxOptions.setWarningExceptionTime(long warningExceptionTime), setting a value lower than the default (5 s), to see whether there is any warning about a blocked thread.
To complement #iwat's answer: in the version I am using, it looks like the max size is read from a system property:
protected static final int DEFAULT_MAX_PENDING_TASKS = Math.max(16, SystemPropertyUtil.getInt("io.netty.eventLoop.maxPendingTasks", 2147483647));
So you can control the size of the queues in front of the Verticles by setting that system property.
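Since it is read from an ordinary system property, the same lookup can be reproduced with the JDK's Integer.getInteger. The property name is Netty's real one; the demo class and the chosen value are illustrative:

```java
public class MaxPendingTasksDemo {
    public static void main(String[] args) {
        // Same lookup Netty performs: take the property value, floor it
        // at 16, and default to Integer.MAX_VALUE when the property is unset.
        System.setProperty("io.netty.eventLoop.maxPendingTasks", "65536");
        int maxPending = Math.max(16,
                Integer.getInteger("io.netty.eventLoop.maxPendingTasks",
                        Integer.MAX_VALUE));
        System.out.println(maxPending);
    }
}
```

In a real deployment you would pass it on the command line instead, e.g. -Dio.netty.eventLoop.maxPendingTasks=65536, before any event loop is created.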
If the event bus is full (the queue in NioEventLoop reaches its max size), tasks will be rejected. So if you hit that limit, you should start seeing error responses to your messages; you should not see blocked producers.
I'm not sure the accept-backlog setting has any effect on the event bus; judging by the documentation it relates to the NetServer, and from a short scan of the code I haven't found any use of it in the event bus.
The event bus does, however, deliver messages immediately; they don't get queued up anywhere (at least that's my understanding of the code). So regarding your first question: it doesn't have a size, at least not when running locally (I don't know about the clustered version, but I assume that doesn't apply in your case anyway).
Confirming that an (event loop) thread is actually blocked is easy: there should be tons of exceptions in your log stating that the event loop is blocked.
I guess your problem is somewhere else, but that's hard to tell without any code or meaningful logs.
I would like to put a question to the community and get as much feedback as possible about a strategy I have been considering to resolve some performance issues in my project.
The context:
We have an important process that performs 4 steps.
1. An entity status change and its persistence.
2. If 1 ends OK, the entity is exported to a CSV file.
3. If 2 ends OK, the entity is exported to another CSV, this one with much more info.
4. If 3 ends OK, the last CSV is sent by mail.
Steps 1 and 2 are linked and they are critical.
Steps 3 and 4 are not critical; it doesn't even matter whether they end successfully.
The performance of 1-2 is fine, but 3-4 is insanely slow in some scenarios, mostly because of step 3.
If we execute all the steps as a sequence, step 3 sometimes causes a timeout. The client gets no response about steps 1 and 2 (the important ones) and the user doesn't know what's going on.
This made me think of JMS queues: delegate the last 2 steps to another app/process and decouple the notification from the business logic. The second export and the mailing would be processed when possible, probably in parallel. I could also split them into 2 queues: exports and mail notification.
Our webapp runs in a WebLogic 11 cluster, so I could use its JMS implementation.
What do you think about the strategy? Is the WebLogic JMS implementation any good? Should I check another implementation: ActiveMQ, RabbitMQ, ...?
I have also been thinking about a ticketing-system implementation with spring-tasks.
At this point I have to mention spring-batch. Its usefulness here is limited: we already have many jobs focused on important data-consolidation processes, the window of time for scheduling more jobs is limited, and there is the impact of trying to process all items massively at once.
Maybe we could, if we found a way to use spring-batch's multithreading, but we haven't yet found a way to fit our requirements into that strategy.
Thank you in advance, and excuse my English. I promise to keep working hard on it :-).
One problem to consider is data integrity. If step n fails, does step n-1 need to be reversed? Are there any ordering dependencies you need to be aware of? And are you writing to the same CSV or different ones? If the same, you might have contention issues.
Now, back to the original problem. I would consider Java executors, using 4 fixed-size pools and moving the task through the pools as each step succeeds:
Submit step 1 to pool 1, getting a Future back, which will be used to check for completion.
When step 1 completes, you submit step 2 to pool 2.
When step 2 completes, you can now return a result to the caller. The call has been waiting up to this point (likely with a timeout so it doesn't hang around forever), but now the critical tasks are done.
After returning to the client, submit step 3 to pool 3.
When step 3 completes, submit step 4 to pool 4.
The pools themselves, while fixed size, could be larger for pools 1/2 to get maximum throughput (and to get back to your client as quickly as possible), and pools 3/4 could be smaller but still large enough to get the work done.
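The steps above can be sketched with CompletableFuture chaining across the four pools. The step bodies, pool sizes, and timeout below are placeholders, not the real work:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StepPipeline {
    public static void main(String[] args) throws Exception {
        ExecutorService pool1 = Executors.newFixedThreadPool(4);
        ExecutorService pool2 = Executors.newFixedThreadPool(4);
        ExecutorService pool3 = Executors.newFixedThreadPool(2);
        ExecutorService pool4 = Executors.newFixedThreadPool(2);

        // Critical path: steps 1 and 2; the caller blocks on this part only.
        CompletableFuture<String> critical =
                CompletableFuture.supplyAsync(() -> "persisted", pool1)   // step 1
                        .thenApplyAsync(s -> s + "+exported", pool2);     // step 2

        // The caller waits with a timeout so it never hangs forever.
        String result = critical.get(5, TimeUnit.SECONDS);
        System.out.println("returned to caller: " + result);

        // Non-critical tail: steps 3 and 4 run after the response is sent.
        critical.thenApplyAsync(s -> s + "+bigExport", pool3)             // step 3
                .thenAcceptAsync(s -> System.out.println("mailed: " + s), // step 4
                        pool4)
                .join();

        pool1.shutdown(); pool2.shutdown(); pool3.shutdown(); pool4.shutdown();
    }
}
```

The key property is that the get() on the critical future returns as soon as steps 1/2 finish, while steps 3/4 keep running on their own pools.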
You could do something similar with JMS, but the issues are the same: you need multiple listeners, or multiple threads per listener, so you can process at an appropriate speed. You could do steps 1/2 synchronously without a pool, but then you lose some of the thread management that executors give you. You would still need to "schedule" steps 3/4 by putting them on the JMS queue and have listeners to process them.
The ability to recover from a server going down is key here, but Executors/ExecutorService has no persistence, so then I'd definitely be looking at JMS (and then I'd queue absolutely everything up, even the first 2 steps), though depending on your use case that might be overkill.
Yes, an event-driven approach where a message bus does the integration sounds good. It is asynchronous, so you will not have timeouts. Of course you will need to use a Topic. WLS has some memory issues when there are too many messages in the server; maybe a separate server would work better, for separation of concerns and resources.
In my topology, I read trigger messages from a Kafka queue. On receiving a trigger message, I need to emit around 4096 messages to a bolt. In the bolt, after some processing, the messages are published to another Kafka queue (another topology will consume them later).
I'm trying to set the TOPOLOGY_MAX_SPOUT_PENDING parameter to throttle the number of messages going to the bolt, but I see it has no effect. Is it because I'm emitting all the tuples in one nextTuple() call? If so, what is the workaround?
If you are reading from Kafka, you should use the KafkaSpout that ships with Storm. Don't try to implement your own spout; trust me, I use the KafkaSpout in production and it works very smoothly. Each Kafka message generates exactly one tuple.
And as you can see on this nice page of the manual, you can set topology.max.spout.pending like this:
Config conf = new Config();
conf.setMaxSpoutPending(5000);
StormSubmitter.submitTopology("mytopology", conf, topology);
topology.max.spout.pending is set per spout; if you have four spouts, the maximum number of incomplete tuples inside your topology is the number of spouts * topology.max.spout.pending.
Another tip: use the Storm UI to check that topology.max.spout.pending was set properly.
Remember that topology.max.spout.pending only caps the number of unprocessed tuples inside the topology; the topology never stops consuming messages from Kafka, at least not on a production system... If you want to consume batches of 4096, you need to implement caching logic in your bolts, or use something other than Storm (something micro-batch oriented).
To make TOPOLOGY_MAX_SPOUT_PENDING work, you need to enable the fault-tolerance mechanism (i.e., assign message IDs in spouts, and anchor and ack in bolts). Furthermore, if you emit more than one tuple per call to Spout.nextTuple(), TOPOLOGY_MAX_SPOUT_PENDING will not work as expected.
It is actually bad practice for several more reasons to emit more than a single tuple per Spout.nextTuple() call (see "Why should I not loop or block in Spout.nextTuple()" for more details).
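The throttle itself can be modeled outside Storm with a semaphore: each emit takes a permit and each ack returns one, so at most a fixed number of tuples is ever in flight. Everything below is an illustration of the mechanism, not Storm API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

public class SpoutPendingModel {
    public static void main(String[] args) throws Exception {
        final int MAX_PENDING = 8; // stands in for topology.max.spout.pending
        final int TUPLES = 100;
        Semaphore pending = new Semaphore(MAX_PENDING);
        BlockingQueue<Integer> inFlight = new LinkedBlockingQueue<>();

        // "Bolt": processes a tuple, then acks it, freeing a permit.
        Thread bolt = new Thread(() -> {
            try {
                for (int i = 0; i < TUPLES; i++) {
                    inFlight.take();   // process
                    pending.release(); // ack
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        bolt.start();

        // "Spout": each emit needs a permit, so at most MAX_PENDING
        // tuples are un-acked at any moment.
        for (int i = 0; i < TUPLES; i++) {
            pending.acquire(); // blocks once MAX_PENDING tuples are pending
            inFlight.add(i);   // emit
        }
        bolt.join();
        System.out.println("emitted " + TUPLES + " tuples, max pending " + MAX_PENDING);
    }
}
```

This also shows why un-acked tuples defeat the throttle: if the bolt never released permits, the spout would stall forever rather than being paced.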
Let me try explaining the situation:
There is a messaging system that we are going to incorporate which could either be a Queue or Topic (JMS terms).
1) Producer/Publisher: there is a service A. A produces messages and writes them to a Queue/Topic.
2) Consumer/Subscriber: there is a service B. B asynchronously reads messages from the Queue/Topic, then calls a web service and passes the message to it. The web service takes a significant amount of time to process the message. (This action need not happen in real time.)
The message broker is TIBCO.
My intention is: not to miss processing any message from A, and to re-process a message at a later point in time if processing failed the first time (perhaps as a batch).
Question:
I was thinking of writing the message to a DB before making the web-service call. If the call succeeds, I would mark the message processed, otherwise failed. Later, a cron job would process all the requests that had initially failed.
Is writing to a DB a typical way of doing this?
Since you have a failure callback, you can simply requeue the message and have your consumer/subscriber pick it up and try again. If it failed because of some problem in the web service and you want to wait X time before trying again, you can either schedule the web service to be called at a later date for that specific message (look into ScheduledExecutorService), or do as you described and use a cron job with some database entries.
If you only want to retry once per message, keep an internal counter, either with the message itself or in a Map<Message, Integer>.
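A sketch of that bookkeeping with ScheduledExecutorService and an attempt counter. The web-service call is a stand-in that fails on the first attempt, and all names are invented:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RetryOnce {
    static final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(1);
    static final Map<String, Integer> attempts = new ConcurrentHashMap<>();
    static final int MAX_ATTEMPTS = 2; // original try + one retry

    // Placeholder for the real web-service call: fails the first time.
    static boolean callWebService(String message, int attempt) {
        return attempt > 1;
    }

    static void process(String message) {
        int attempt = attempts.merge(message, 1, Integer::sum);
        if (callWebService(message, attempt)) {
            System.out.println(message + " processed on attempt " + attempt);
            scheduler.shutdown();
        } else if (attempt < MAX_ATTEMPTS) {
            // Wait before retrying, like a redelivery delay.
            scheduler.schedule(() -> process(message), 100, TimeUnit.MILLISECONDS);
        } else {
            System.out.println(message + " gave up after " + attempt + " attempts");
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        process("msg-1");
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The counter map here plays the role of the Map<Message, Integer> mentioned above; in a real system it would have to survive restarts, which is where the DB comes back in.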
Crudely put, that is the technique, although there may be out-of-the-box solutions you can use. Typical ESB solutions support reliable messaging; have a look at Mule ESB or Apache ActiveMQ as well.
It might be interesting to take advantage of the EMS platform you already have (example 1) instead of building a custom solution (example 2).
But it all depends on the implementation language:
Example 1 - EMS is the "keeper": if I were to solve this problem with TIBCO BusinessWorks, I would use BW's "JMS transaction" feature. By encompassing the EMS read and the WS call within the same "group", you ask for both to be applied, or neither. If the call fails for some reason, the message is returned to EMS.
Two problems with this solution: you might not have BW, and the first failed operation would block the rest of the batch process (though that may be the desired behavior).
FYI, I understand it is possible to use this feature in "pure Java", but I have never tried it: http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html
Example 2 - A DB is the "keeper": if you go with your "DB" method, your queue/topic consumer continuously inserts data into a DB, and each record represents a task to be executed. This feels an awful lot like the simple "mapping engine" problem that every integration middleware aims to make easier. You could solve it with anything from custom Java code and multiple threads (DB inserter, WS job handlers, etc.) to an EAI middleware (like BW) or even a BPM engine (TIBCO has many solutions for that).
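As a toy illustration of the "DB is the keeper" idea, here is an in-memory map standing in for the task table; the statuses, message ids, and the stand-in web-service call (which fails once, transiently) are all invented:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TaskTable {
    enum Status { PENDING, PROCESSED, FAILED }

    // Simulates a transient outage: msg-2 fails the first time only.
    static final Set<String> failedOnce = new HashSet<>();
    static boolean callWebService(String id) {
        if (id.equals("msg-2") && failedOnce.add(id)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for the DB table: message id -> status.
        Map<String, Status> table = new ConcurrentHashMap<>();
        table.put("msg-1", Status.PENDING);
        table.put("msg-2", Status.PENDING);

        // Consumer pass: attempt the WS call, record the outcome
        // instead of losing the message.
        table.replaceAll((id, st) ->
                callWebService(id) ? Status.PROCESSED : Status.FAILED);

        // Cron-style sweep: retry only the recorded failures.
        table.forEach((id, st) -> {
            if (st == Status.FAILED && callWebService(id)) {
                table.put(id, Status.PROCESSED);
            }
        });
        System.out.println("msg-1=" + table.get("msg-1")
                + " msg-2=" + table.get("msg-2"));
    }
}
```

A real implementation would replace the map with DB rows and the sweep with the cron job from the question; the point is only that every task's state is recorded so nothing is silently dropped.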
Of course, there are also other vendors... EMS is a JMS-standard implementation, as you know.
I would recommend using the built-in EMS (& JMS) features, as "guaranteed delivery" is what it's built for ;) - no DB needed at all...
You need to be aware that the first decisions will be:
Do you need to deliver in order? (If so, only 1 JMS session with client-acknowledge mode should be used.)
How often, and at what recurring intervals, do you want to retry? (So you don't create an infinite loop for a message that the web service can never process.)
This is independent of the kind of client you use (TIBCO BW or, e.g., Java onMessage() in an MDB).
For "in order" delivery: make sure only 1 JMS session processes the messages and that it uses client-acknowledge mode. After you process a message successfully, acknowledge it by either calling the JMS acknowledge() method or, in TIBCO BW, executing the "commit" activity.
In case of an error you don't acknowledge the message, so it is put back on the queue for redelivery (the JMS headers show how many times it was redelivered).
EMS's explicit-client-acknowledge mode lets you do the same when order is not important and you need a few client threads to process messages.
To control how often the message gets processed, use:
- the max-redelivery property of the EMS queue (e.g., you could move the message to the dead-letter queue after x redeliveries so it doesn't hold up other messages);
- the redelivery delay, to put a pause between redeliveries. This is useful when the web service needs to recover after a crash and shouldn't be stormed by the same message again and again at short intervals.
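Put together, the two settings amount to a simple policy. Here is a plain-Java model of it; nothing below is EMS API, and the stand-in processor always fails so the message ends up dead-lettered:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class RedeliveryPolicy {
    // Placeholder web service that always fails, to show dead-lettering.
    static boolean process(int id) {
        return false;
    }

    public static void main(String[] args) throws Exception {
        final int MAX_REDELIVERY = 3;     // like maxRedelivery on the queue
        final long REDELIVERY_DELAY = 50; // ms between attempts

        Queue<int[]> queue = new ArrayDeque<>(); // {messageId, deliveryCount}
        List<Integer> deadLetter = new ArrayList<>();
        queue.add(new int[] {1, 0});

        while (!queue.isEmpty()) {
            int[] msg = queue.poll();
            msg[1]++; // one more delivery attempt
            if (process(msg[0])) {
                System.out.println("message " + msg[0] + " acked after "
                        + msg[1] + " deliveries");
            } else if (msg[1] < MAX_REDELIVERY) {
                Thread.sleep(REDELIVERY_DELAY); // redelivery delay
                queue.add(msg);                 // back on the queue, un-acked
            } else {
                deadLetter.add(msg[0]);         // give up: dead-letter queue
                System.out.println("message " + msg[0] + " dead-lettered");
            }
        }
    }
}
```

With EMS itself, the broker does this bookkeeping for you; the sketch only makes the interaction of the two knobs visible.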
Hope that helps
Cheers
Seb
We encountered a problem under WebLogic 8.1 that we lived with but could never fix. We often queue up a hundred or more JMS messages, each of which represents a unit of work. Despite the fact that each message is of the same size and looks the same, one may take only seconds to complete while the next one represents 20 minutes of solid crunching.
Our problem is that each of the message-driven beans doing the work of these messages ends up on a thread that seems to grab ten messages at a time (we think this is a WebLogic optimization to avoid hitting the queue over and over for small messages). Then, as one thread after another finishes all of its small jobs and no new ones come in, we end up with a single thread logjammed on a long-running piece of work, with up to nine other items waiting on it to finish, even though other threads are free and could start on those units of work.
Now we are converting to WebLogic 10, so it is a natural point to return to this problem and find out whether there is any solution we could implement so that either: a) each thread grabs only one JMS message at a time and leaves the others waiting in the incoming queue, or b) waiting messages (even ones already assigned to a particular thread) are automatically redistributed to free threads. Any ideas?
Enable the Forward Delay and provide an appropriate value. This will cause the JMS queue to redistribute messages to its peers if they have not been processed within the configured time.
Taking a single message off the queue each time might be overkill; it's all a balance between the number of messages you process and what you gauge as an issue.
There are also multiple issues with JMS on WebLogic 10, depending on your setup. You can save yourself a lot of time and trouble by using the latest maintenance pack right from the start.
A thread is in "starvation" when it cannot get regular access to the shared resources it needs and so is unable to make progress. The threads that monopolize those resources and cause the starvation are called "greedy" threads.