Suggestions on patterns for handling the following scenario:
A single thread that dispatches events to consumers. There is a 1:1 relationship between each event and a consumer (each event is dispatched to a single consumer based on an event/consumer id match).
Consumers process events at varying speeds and can consume events in configurable batch sizes (e.g. a consumer could consume 20 events at a time).
The producer thread should always be able to dispatch events to consumers that are capable of consuming. Each consumer maintains a queue of events it has consumed (possibly in batch) and processes these on its own thread, so the hand-off from producer to consumer is asynchronous.
If no consumers can consume at any point in time, what should happen to the dispatch thread?
yield() it
wait() & force consumers to call notify() on it
sleep() for a fixed time period
spin
Any reason to prefer one over the other?
Some pros & cons:
yield is simple
forcing consumers to call notify adds complexity
sleep for a fixed time would suit non-time-sensitive requirements
spinning eats up a CPU; it's unnecessary unless we need event delivery to be as fast as possible
Any other considerations?
Another approach you should consider is writing to a BlockingQueue. Let the queue hold events for which no consumer is currently ready.
Even better: write a Broker that owns a BlockingQueue and maintains a List of Consumers. Have the Broker notify the List of Consumers when a Producer sends a new Event.
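A rough sketch of that Broker, assuming hypothetical Event and Consumer types with an id to match on (none of these names come from the question); the dispatch thread simply blocks in take() when there is nothing to hand off:

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

interface Event { String consumerId(); }
interface Consumer { String id(); void enqueue(Event e); }

class Broker implements Runnable {
    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final List<Consumer> consumers = new CopyOnWriteArrayList<>();

    void register(Consumer c) { consumers.add(c); }

    // producers call this; put() only blocks if the queue is bounded and full
    void publish(Event e) throws InterruptedException { queue.put(e); }

    @Override
    public void run() {
        try {
            while (true) {
                Event e = queue.take(); // the dispatch thread parks here when idle
                for (Consumer c : consumers) {
                    if (c.id().equals(e.consumerId())) {
                        c.enqueue(e); // asynchronous hand-off to the consumer's own thread
                        break;
                    }
                }
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // exit cleanly on shutdown
        }
    }
}

This sidesteps the original question: the dispatch thread neither spins nor sleeps, it just blocks until work arrives.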
I'd use the PropertyChangeListener and EventObject support built into Java Beans since JDK 1.1 to do this in memory.
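A minimal sketch of that, with an illustrative EventSource class and property name; the listener callback runs on whichever thread calls fire, so it stays in-memory and synchronous:

import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

class EventSource {
    private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);

    void addListener(PropertyChangeListener l) {
        pcs.addPropertyChangeListener(l);
    }

    void fire(Object payload) {
        // listeners receive a PropertyChangeEvent carrying the payload as the new value
        pcs.firePropertyChange("event", null, payload);
    }
}

// usage:
// EventSource source = new EventSource();
// source.addListener(evt -> System.out.println(evt.getNewValue()));
// source.fire("hello");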
a) You could choose yield, but depending on the environment it can be a no-op, in which case it has essentially the same effect as spinning.
b) Sleep is an easy choice, but then you have to decide how long to sleep for. sleep(0) will not help either, as it amounts to the same thing as (a).
Forcing consumers to notify is more complicated, but it gives you complete control of your flow.
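A hedged sketch of that notify approach; Dispatcher and every helper method here are illustrative names, not from the question:

interface Event { }
interface Consumer { void enqueue(Event e); }

class Dispatcher {
    private final Object capacityFreed = new Object();

    void dispatchLoop() throws InterruptedException {
        while (true) {
            Consumer target;
            synchronized (capacityFreed) {
                // loop guards against spurious wakeups and missed capacity checks
                while ((target = findConsumerWithCapacity()) == null) {
                    capacityFreed.wait();
                }
            }
            target.enqueue(nextEvent()); // asynchronous hand-off
        }
    }

    // consumers call this whenever they free space in their queues
    void onCapacityFreed() {
        synchronized (capacityFreed) {
            capacityFreed.notifyAll();
        }
    }

    private Consumer findConsumerWithCapacity() { return null; /* placeholder */ }
    private Event nextEvent() { return null; /* placeholder */ }
}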
Take a look at JMS. JMS is designed to handle exactly this kind of use case.
A full scale JMS installation might be overkill in your scenario – you don't provide enough information.
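If you do go the JMS route, the consumer side looks roughly like this; the ConnectionFactory comes from your provider and the queue name is just an example:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.Session;

class JmsConsumerSketch {
    void consume(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("events"));
        consumer.setMessageListener(message -> {
            // runs on a JMS-managed thread; blocking and dispatch are the broker's problem
        });
        connection.start(); // deliveries begin only after start()
    }
}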
I have to write into a file based on incoming requests. As multiple requests may come in simultaneously, I don't want multiple threads trying to overwrite the file content together, which could lead to losing some data.
Hence, I tried collecting all the requests' data using an instance variable of type PublishSubject. I subscribe to the publishSubject during init, and this subscription remains for the whole life-cycle of the application. I'm also observing the same instance on a separate thread (provided by the Vert.x event loop) which invokes the method responsible for writing the file.
private PublishSubject<FileData> publishSubject = PublishSubject.create();

private void init() {
    publishSubject
        .observeOn(RxHelper.blockingScheduler(vertx))
        .subscribe(fileData -> writeData(fileData));
}
Later during request handling, I call onNext as below:
private void handleRequest() {
    // do some task
    publishSubject.onNext(fileData);
}
I understand that when I call onNext, the data is queued up to be written to the file by the specific thread assigned by the observeOn operator. However, what I'm trying to understand is:
whether this thread stays blocked in the WAITING state for this task alone, or
whether it will also be used for other activities when no file writing is happening?
I don't want one thread from the Vert.x event loop to end up wasted in a waiting state by going with this approach. Also, please suggest a better approach if one is available.
Thanks in advance.
Actually, RxJava will do this for you: by definition, onNext() emissions act in a serial fashion:
Observables must issue notifications to observers serially (not in parallel). They may issue these notifications from different threads, but there must be a formal happens-before relationship between the notifications. (Observable Contract)
So as long as you run the blocking calls inside onNext() at the subscriber (and do not manually fork work to a different thread), you will be fine, and no parallel writes will happen.
Actually, your worries should come from the opposite direction: backpressure.
You should choose your backpressure strategy here, because if requests come in faster than you can process them (writing to the file), you might overflow the buffer and get into trouble. Consider using Flowable and choosing your backpressure strategy according to your needs, as sketched below.
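For instance, a sketch of the Flowable variant of your init(); Schedulers.single() stands in for RxHelper.blockingScheduler(vertx) only to keep the example self-contained (RxJava 2 assumed):

import io.reactivex.BackpressureStrategy;
import io.reactivex.schedulers.Schedulers;
import io.reactivex.subjects.PublishSubject;

class FileData { } // your own type from the question

class FileWriterSketch {
    private final PublishSubject<FileData> publishSubject = PublishSubject.create();

    private void init() {
        publishSubject
            .toFlowable(BackpressureStrategy.BUFFER) // buffer bursts instead of signalling an error
            .observeOn(Schedulers.single())          // one dedicated thread performs all writes
            .subscribe(this::writeData);
    }

    private void writeData(FileData fileData) { /* write to the file */ }
}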
Regarding your questions: that depends on the Scheduler. You're using RxHelper.blockingScheduler(vertx), which seems to be your own code, so I can't tell; if the scheduler uses a shared thread in a work-queue fashion, it will not sit idle.
Anyhow, Rx will not determine this for you; it is the scheduler's responsibility to assign the work to a thread according to its own logic.
I'd like a quick confirmation of what I suspect this part of the RabbitMQ documentation says:
Callbacks to Consumers are dispatched on a thread separate from the thread managed by the Connection. This means that Consumers can safely call blocking methods on the Connection or Channel, such as queueDeclare, txCommit, basicCancel or basicPublish.
Each Channel has its own dispatch thread. For the most common use case of one Consumer per Channel, this means Consumers do not hold up other Consumers. If you have multiple Consumers per Channel be aware that a long-running Consumer may hold up dispatch of callbacks to other Consumers on that Channel.
I have various commands (messages) coming in through a single inbound queue and channel, which has a DefaultConsumer attached to it. Is it correct to assume that there is a thread pool in DefaultConsumer that lets me run application logic straight off the consumer callback method, without blocking the processing of later commands? And that if there seems to be a bottleneck, I can just give RMQ a bigger thread pool?
In addition, there is occasionally a basicPublish to the same channel from other threads. I take it that this does hold up the consumers? I guess I should use a new channel when doing this?
The thread pool you mentioned is not part of DefaultConsumer but rather part of the Connection, shared between its Channels and DefaultConsumers. It allows different consumers to be invoked in parallel. See this part of the guide.
So you would expect that by increasing the size of the thread pool you can reach a higher level of parallelism. However, that's not the only factor that influences it.
There's a big caveat: incoming messages flowing through a single channel are processed serially, no matter how many threads you have in the thread pool. That's just the way ConsumerWorkService is implemented.
So to consume incoming messages concurrently, you either have to manage multiple channels or hand those messages off to a separate thread pool, as in the sketch below.
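For example, a sketch with the standard Java client: give the Connection a bigger pool and use one Channel per consumer so deliveries run concurrently (the queue name and counts are illustrative):

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelConsumersSketch {
    void start() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Connection connection = new ConnectionFactory().newConnection(pool);
        for (int i = 0; i < 4; i++) {
            Channel channel = connection.createChannel();
            channel.basicConsume("inbound", true, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties properties, byte[] body)
                        throws IOException {
                    // deliveries on one channel are serial; the four channels
                    // are dispatched in parallel by the shared pool
                }
            });
        }
    }
}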
Publishes do not use threads from the Connection's thread pool, so they do not hold up consumers.
For more details you may check this post.
I have to write a heavy-load system with a pretty easy task to do, so I decided to split the task across multiple workers in different locations (or clouds). To communicate, I want to use a RabbitMQ queue.
In my system there will be two kinds of software nodes: schedulers and workers. Schedulers take user input from queue_input, split it into smaller tasks, and put those smaller tasks into workers_queue. Workers read this queue and 'do the thing'. I used round-robin load balancing here, and it all works pretty well until some worker crashes; then I lose information about task completion (it's not allowed to do a single operation twice, and each task contains a pack of 50 iterations of the worker code run with different data).
I'm considering something like a technical_queue, another channel for scheduler-worker communication, and I wonder how to design it well. I used the tutorials from the RabbitMQ page, so my worker thread looks like:
while (true) {
    message = consume(QUEUE, ...);
    handle(message); // do 50 simple tasks in a loop for the data in the message
}
How can I handle the second queue? Another thread with some while(true) {} loop? Or is there a better solution? Maybe I should reuse the existing queue with a topic exchange? (But I wanted an independent channel of communication that stays usable while a task, which may take some time, is being handled.)
You should probably take a look at spring-amqp (doc). I hate to tell you to add a layer, but that Spring library takes care of the threading issues and thread management with its SimpleMessageListenerContainer. Each container attaches to a queue, and you can specify the number of threads (i.e. workers) per queue, roughly as sketched below.
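A rough sketch of that container setup; the queue name, thread count, and handle() are illustrative, and connectionFactory is whatever spring-amqp ConnectionFactory you configure:

import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

class WorkerContainerSketch {
    SimpleMessageListenerContainer workers(ConnectionFactory connectionFactory) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setQueueNames("workers_queue");
        container.setConcurrentConsumers(5); // five worker threads on this one queue
        container.setMessageListener((MessageListener) this::handle);
        container.start();
        return container;
    }

    private void handle(Message message) { /* do 50 simple tasks for this message */ }
}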
Alternatively, you can roll your own using an ExecutorService, but you will probably end up rewriting what SimpleMessageListenerContainer already does. You could also just launch (via the OS or batch scripts) more processes, which adds more consumers to each queue.
As far as queue topology is concerned, it depends almost entirely on business logic/concerns and generally less on performance needs. More often you add queues for business reasons and workers for performance reasons, but if a queue gets backed up with one type of message, consider giving that message type its own queue. What you're describing sounds like two queues, with multiple consumers on your worker queue.
Other than the threading issue and queue topology I'm not entirely sure what else you are asking.
I would recommend you create a second consumer on the queue:
consumer1 -> queue_process
consumer2 -> queue_process
Both consumers listen to the same queue.
Greetings, I hope this helps.
I have an application which applies the Producer-Consumer design pattern. It is written in Java. In short, the producers put items on a blocking queue and the consumers take them from there. The consumers should run until signaled by a producer to stop.
What is the neatest way to deliver this signal from the producers to the consumers? The chief designer said he wants to keep producers and consumers separate, but I don't see any way other than invoking a method on the consumer thread pool.
The chief designer is right. Keeping them separate leads to highly decoupled code, which is excellent.
There are several ways to do this. One of them is called the Poison Pill. Here's how it works: place a known item on the queue; when a Consumer sees that item, it kills itself or takes another action.
This can be tricky if there are multiple Consumers (you mentioned a ThreadPool) or bounded queues. Please look this up in Java Concurrency in Practice by Brian Goetz. He explains it best.
Send a cancel message through the queue. Your consumers' run methods would look like
while (true) {
    Message message = queue.take();
    if (message == Message.Cancel) {
        queue.offer(message); // re-queue so that the other consumers also see the Cancel
        break;
    }
    process(message); // normal handling of non-cancel messages
}
Create a ConsumerHalter class. Register every consumer that wants to get data from the queue with the ConsumerHalter, and have the producer trigger a halt event in the ConsumerHalter. The ConsumerHalter then calls onStopConsuming() on each consumer, roughly as sketched below.
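A minimal sketch of that idea; every name here is illustrative:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

interface HaltableConsumer { void onStopConsuming(); }

class ConsumerHalter {
    private final List<HaltableConsumer> consumers = new CopyOnWriteArrayList<>();

    void register(HaltableConsumer c) { consumers.add(c); }

    // a producer calls this once; every registered consumer is told to stop
    void haltAll() {
        for (HaltableConsumer c : consumers) {
            c.onStopConsuming();
        }
    }
}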
I'm new to Akka, and I apologize in advance if this is a basic question. I'm not sure how to use actors to implement the following scenario, or if it's even possible (or desirable).
I have a number of actors (i.e. producers) responsible for maintaining certain pieces of state concurrently, all of which notify another actor (i.e. consumer) when changes occur.
The consumer is to run a certain task repeatedly, a task that requires state from all producers just to start. It must also respond to changes in the state when it receives messages from the producers.
Before considering Akka, I'd kind of rolled my own simple actor model, with each actor running in its own thread. The run() methods would monitor an event queue, so I could have the consumer continually do something similar to this:
while not done
    poll the event queue
    if something was polled
        process the event
    if all state is available
        do one step of the long running task
The continual polling of the event queue didn't sit well with me, but it at least made progress on the long running task between events.
Is there a best way to use Akka actors to implement this? I could implement a "heartbeat" which sends a message to the consumer (or which the consumer sends internally to itself) to do another step of the long-running task, but I don't like the idea of scheduling it, since the steps of the long-running task aren't uniform in duration. I don't want to queue up so many iterations that the consumer is too busy to respond quickly to messages from the producers. But I also don't want to schedule it too infrequently, so that it sits idle when it could be making progress...
Or would it be more appropriate to use a Dataflow model of concurrency for this (something I've only read about)? The consumer can't start until all the state is bound, so it seems natural to define the process in terms of Dataflow variables. But if Dataflow variables can only be bound once, that doesn't seem appropriate for receiving repeated updates of state from the producers.
You can have the producers publish the changes to an Akka EventBus and have the consumer register to listen for those events; then, when it has everything it needs, it can process the full chunk, or spawn a new actor that processes the full chunks. A sketch follows.
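A minimal sketch using the ActorSystem's event stream, which is a built-in EventBus (classic Akka Java API; StateChanged and Step are hypothetical message types). Sending itself one Step at a time lets the consumer interleave producer updates with progress on the long-running task, which addresses the scheduling worry above:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class ConsumerActor extends AbstractActor {
    static class StateChanged { /* which producer changed, the new state, ... */ }
    static class Step { }

    private boolean stepping = false;

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(StateChanged.class, sc -> {
                recordState(sc);
                if (allStateAvailable() && !stepping) {
                    stepping = true;
                    self().tell(new Step(), self()); // kick off the task
                }
            })
            .match(Step.class, s -> {
                doOneStep(); // one bounded step of the long-running task
                if (done()) {
                    stepping = false;
                } else {
                    // exactly one pending Step at a time, so producer
                    // messages interleave naturally with task progress
                    self().tell(new Step(), self());
                }
            })
            .build();
    }

    // placeholders for the real state handling and task logic
    private void recordState(StateChanged sc) { }
    private boolean allStateAvailable() { return true; }
    private void doOneStep() { }
    private boolean done() { return false; }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("app");
        ActorRef consumer = system.actorOf(Props.create(ConsumerActor.class));
        system.eventStream().subscribe(consumer, StateChanged.class);
        system.eventStream().publish(new StateChanged()); // producers do this on each change
    }
}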