I have object instances of a custom class, and each instance processes messages (via methods) coming through independently for each instance. No instances "talk" to other instances.
My question is: is it necessary to put each object in its own thread, given that each object independently processes the real-time messages (logs etc.) coming through anyhow?
Thanks for any responses.
You should process each message received by an object in a separate thread. This speeds up processing of the incoming messages for your object. And since there is no interaction between the objects, no thread synchronization is needed, which is good for your application. Better still, use a pool of threads. Have a look at ThreadPoolExecutor.
It is not necessary for each object to have its own thread. However, you may gain performance by having more than one message-processing thread. The ideal number of threads is not necessarily (or even likely to be) the same as the number of processing objects.
Typically, in a situation like you describe the approach would be to use a task / message processing queue where each object you have adds tasks to the queue, and then multiple threads process items from the queue in order. The number of threads used here is configurable so that the application can be optimized for the platform it is running on.
An easy way to achieve this design is to simply use an ExecutorService as your task queue (in which case your messages themselves must implement Runnable):
// For 2 threads, adjust as appropriate.
// (newCachedThreadPool() takes no thread count; a fixed pool does.)
ExecutorService executor = Executors.newFixedThreadPool(2);
And then to add a Runnable message:
// Add a message to the queue for concurrent / asynchronous processing
executor.submit(message);
Note that the executor itself should be shared across all of your message handling objects, so that each object is adding messages to the same queue (assuming you have many message handling objects). It is also possible to have a queue per message handling object, but that decision would depend on the number of handling objects and any requirements surrounding how messages are processed.
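To make the shared-executor idea concrete, here is a minimal sketch, assuming a hypothetical MessageHandler class standing in for your message handling objects (the class names, method names and message type are illustrative, not taken from the question). Every handler is given the same ExecutorService, so all messages land on one queue served by two threads:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical message handling object; it owns no thread of its own.
class MessageHandler {
    private final ExecutorService executor;               // shared with all other handlers
    private final AtomicInteger processed = new AtomicInteger();

    MessageHandler(ExecutorService executor) {
        this.executor = executor;
    }

    // Called whenever a message arrives for this object; returns immediately.
    void onMessage(String message) {
        executor.submit(() -> process(message));
    }

    // Runs on a pool thread. Two messages for the same handler may run at the
    // same time, so any state touched here has to be thread-safe.
    private void process(String message) {
        processed.incrementAndGet();
    }

    int processedCount() {
        return processed.get();
    }
}

class SharedExecutorDemo {
    // Sends messagesPerHandler messages to each handler and returns the total processed.
    static int run(int handlerCount, int messagesPerHandler) {
        ExecutorService executor = Executors.newFixedThreadPool(2); // one pool for everything
        List<MessageHandler> handlers = new ArrayList<>();
        for (int i = 0; i < handlerCount; i++) {
            handlers.add(new MessageHandler(executor));
        }
        for (MessageHandler handler : handlers) {
            for (int m = 0; m < messagesPerHandler; m++) {
                handler.onMessage("message " + m);
            }
        }
        executor.shutdown();                                 // finish queued work, accept no more
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for the pool");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = 0;
        for (MessageHandler handler : handlers) {
            total += handler.processedCount();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(5, 100));
    }
}
```

One caveat with a single shared pool: nothing stops two messages for the same object from being processed at the same time, so this only fits if the per-object processing is thread-safe or ordering within an object does not matter.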
There are two important fields for controlling the concurrency level in Java GCP PubSub consumer:
Parallel pull count
Number of executor threads
From the official example:
setParallelPullCount determines how many StreamingPull streams the subscriber will open to receive message. It defaults to 1. setExecutorProvider configures an executor for the subscriber to process messages. Here, the subscriber is configured to open 2 streams for receiving messages, each stream creates a new executor with 4 threads to help process the message callbacks. In total 2x4=8 threads are used for message processing.
So parallel pull count, if I'm not mistaken, directly refers to the number of Java executors (= thread pools), and the number of executor threads sets the number of threads in each pool.
Normally I reason about separate thread pools as having different use cases or responsibilities, so we might for example have one unbounded cached thread pool for IO, a fixed thread pool for CPU-bound ops, a single (or low number) threaded pool for async IO notifications, and so on.
But what would be the benefit of having two or more thread pools with identical properties for consuming and processing pubsub messages, compared to simply having a single thread pool with maximum desired number of threads? For example, if I can spare a total of 8 threads on the subscriber, what would be the concrete reason for using 1x8 vs 2x4 combination? (a single pool of 8 threads, versus pull count=2 using 4 threads each)?
The setParallelPullCount option doesn't just refer to the number of Java Executors, it refers to the number of streams created that request messages from the server. The different streams could potentially return a different number of messages due to a variety of factors. One may want to increase parallel pull count in order to process more messages in a single client than can be transmitted on a single stream (10MB/s). This is independent of the choice of whether or not to share executors/thread pools.
Whether or not to share a thread pool across the streams would be handled by calling setExecutorProvider. If you set an ExecutorProvider that returns the same Executor on each call to getExecutor, then the streams share it. If you have it return a new Executor for each call, then they each have their own dedicated Executor. The default ExecutorProvider does the latter.
If one calls setParallelPullCount(X), then getExecutor gets called X times on the ExecutorProvider to get an Executor for each stream. The choice between one shared Executor and an individual Executor per stream probably makes little difference the vast majority of the time. If you are trying to keep the overall number of threads relatively low, then sharing a single Executor may help.
The choice between X Executors with Y threads and one Executor with X*Y threads really comes down to the capability to share such resources if the amount of data coming from each stream is vastly different, which probably isn't going to be the case most of the time. If it is, then a shared Executor means that a particularly saturated stream could "borrow" threads from an unsaturated one. On the other hand, using individual Executors could mean that in such a scenario, messages on the stream with fewer messages are as able to get through as messages on the saturated stream.
I have to write into a file based on the incoming requests. As multiple requests may come simultaneously, I don't want multiple threads trying to overwrite the file content together, which may lead into losing some data.
Hence, I tried collecting all the requests' data using an instance variable of type PublishSubject. I subscribe to publishSubject during init, and this subscription remains for the whole life-cycle of the application. I'm also observing the same instance on a separate thread (provided by the Vertx event loop), which invokes the method responsible for writing the file.
private PublishSubject<FileData> publishSubject = PublishSubject.create();
private void init() {
publishSubject.observeOn(RxHelper.blockingScheduler(vertx)).subscribe(fileData -> writeData(fileData));
}
Later during request handling, I call onNext as below:
private void handleRequest() {
    // do some task
    publishSubject.onNext(fileData);
}
I understand that, when I call onNext, the data will be queued up, to be written into the file by the specific thread which was assigned by observeOn operator. However, what I'm trying to understand is
whether this thread gets blocked in the WAITING state for only this task, or
whether it will also be used for other activities when no file writing happens.
I don't want to end up with one thread from the vertx event loop wasted in waiting state for going with this approach. Also, please suggest any better approach, if available.
Thanks in advance.
Actually, RxJava will do this for you. By contract, onNext() emissions are delivered serially:
Observables must issue notifications to observers serially (not in parallel). They may issue these notifications from different threads, but there must be a formal happens-before relationship between the notifications. (Observable Contract)
So as long as you run blocking calls inside onNext() at the subscriber (and do not manually fork work to a different thread), you will be fine, and no parallel writes will happen.
Actually, your worries should come from the opposite direction: backpressure.
You should choose a backpressure strategy here, because if requests come in faster than you can process them (writing to the file), you might overflow the buffer and get into trouble. Consider using Flowable and choose your backpressure strategy according to your needs.
Regarding your questions: that depends on the Scheduler. You're using RxHelper.blockingScheduler(vertx), which seems like your custom code, so I can't tell. If the scheduler uses shared threads in a work-queue fashion, then the thread will not stay idle.
Anyhow, Rx will not determine this for you; it is the scheduler's responsibility to assign the work to some thread according to its own logic.
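For comparison only, the same "queue the data, write from one thread" idea can be sketched with the plain JDK. Note that this swaps RxJava and the Vert.x scheduler for a single-threaded ExecutorService, so it is a different technique from the one in the question, and the class and method names (SerializedFileWriter, write, close, demo) are made up for illustration. With this variant the writer thread is a dedicated one: when there is nothing to write it sits parked in the executor's queue and is not used for anything else.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: all writes are funnelled through one thread.
class SerializedFileWriter {
    private final Path file;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    SerializedFileWriter(Path file) {
        this.file = file;
    }

    // Safe to call from any request thread; the write itself happens later,
    // one at a time, on the executor's single thread.
    Future<?> write(String line) {
        return writer.submit(() -> append(line));
    }

    // Reopens the file on every call to keep the sketch short.
    private void append(String line) {
        try {
            byte[] bytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
            Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Waits for queued writes to finish.
    void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
    }

    // Demo: several threads write concurrently; returns the number of lines in the file.
    static int demo(int threads, int linesPerThread) {
        try {
            Path file = Files.createTempFile("requests", ".log");
            SerializedFileWriter out = new SerializedFileWriter(file);
            Thread[] callers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                final int id = t;
                callers[t] = new Thread(() -> {
                    for (int i = 0; i < linesPerThread; i++) {
                        out.write("request " + id + "-" + i);
                    }
                });
                callers[t].start();
            }
            for (Thread caller : callers) {
                caller.join();
            }
            out.close();
            int count = Files.readAllLines(file, StandardCharsets.UTF_8).size();
            Files.delete(file);
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The executor's internal queue is unbounded here, so the backpressure concern above applies to this variant as well: if requests arrive faster than the file can be written, the queue simply grows.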
This is more of a Java concurrency design question. I'm working on an application that needs to process many messages for many different clients. If two messages have different client names, then they can be processed in parallel. However, if they have the same client name, then they need to be processed serially, in order.
What’s the best way to implement this?
My current implementation is pretty simple: I wrote a wrapper class called OrderedExecutorPool. It has a list of single-threaded executors. In its submit method, it does the following to figure out which executor to submit the task to:
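// floorMod keeps the index non-negative even when hashCode() is Integer.MIN_VALUE,
// for which Math.abs() still returns a negative number.
int executorNum = Math.floorMod(clientName.hashCode(), numExecutors);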
executorList.get(executorNum).submit(task);
This ensures that all messages with same clients go to the same executor while still supporting processing messages for different clients in parallel.
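The wrapper described above could look roughly like the sketch below. Only the class name OrderedExecutorPool, the executor list and the submit idea come from the description; the other method names (indexFor, shutdownAndWait, demo) are assumptions added for illustration. The index is computed with Math.floorMod so that a hash code of Integer.MIN_VALUE cannot produce a negative index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class OrderedExecutorPool {
    private final List<ExecutorService> executorList = new ArrayList<>();

    OrderedExecutorPool(int numExecutors) {
        for (int i = 0; i < numExecutors; i++) {
            executorList.add(Executors.newSingleThreadExecutor());
        }
    }

    // The same client name always maps to the same executor.
    int indexFor(String clientName) {
        return Math.floorMod(clientName.hashCode(), executorList.size());
    }

    // Tasks for one client run in submission order on that client's executor.
    Future<?> submit(String clientName, Runnable task) {
        return executorList.get(indexFor(clientName)).submit(task);
    }

    // Lets queued tasks finish, then waits for every executor to stop.
    void shutdownAndWait() {
        for (ExecutorService executor : executorList) {
            executor.shutdown();
        }
        try {
            for (ExecutorService executor : executorList) {
                executor.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Demo: returns the order in which one client's tasks actually ran.
    static List<Integer> demo(String clientName, int taskCount) {
        OrderedExecutorPool pool = new OrderedExecutorPool(4);
        List<Integer> seen = new ArrayList<>();   // only ever touched by one executor thread
        for (int i = 0; i < taskCount; i++) {
            final int n = i;
            pool.submit(clientName, () -> seen.add(n));
        }
        pool.shutdownAndWait();
        return seen;
    }
}
```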
There are a couple of problems with this design:
1.) If most client names have same hash code, then only a few executors are doing work
2.) If one client has MANY messages, a single executor may not keep up
Is there an elegant solution to this problem that can fix the shortcomings above?
Edit
clientName is just a String. I'm just invoking the String.hashCode() method on it.
There is no JDK built-in solution that I know of. I've implemented a custom executor solution to this at my current job using this basic logic:
Keep an internal map of client name to work queue (each client has their own queue).
When work comes in for a client, add it to their queue.
If this is the first job on the queue, create a Runnable for this client name/queue and push it into the "real" executor (a standard JDK thread pool).
The Runnable impl just consumes tasks from a single client queue until it is empty and then exits.
This simple implementation is the "greedy" approach (a client will keep working until its queue is empty). If you have more clients than underlying threads, you may want a more "fair" approach, where a client executes some number of tasks and then re-queues itself in the underlying executor (thus allowing other clients to get some work done).
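The steps above can be sketched as follows. This is a minimal version of the "greedy" variant under my own assumptions, not the implementation mentioned in the answer: the names PerClientSerialExecutor, submit, drain and demo are hypothetical, and a single lock on the map is used for simplicity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerClientSerialExecutor {
    private final Executor pool;                                   // the "real" executor
    private final Map<String, Queue<Runnable>> queues = new HashMap<>();

    PerClientSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    void submit(String clientName, Runnable task) {
        boolean first;
        synchronized (queues) {
            Queue<Runnable> queue = queues.get(clientName);
            first = (queue == null);                               // no drainer active for this client
            if (first) {
                queue = new ArrayDeque<>();
                queues.put(clientName, queue);
            }
            queue.add(task);
        }
        if (first) {
            pool.execute(() -> drain(clientName));                 // one drainer per active client
        }
    }

    // Greedy: keeps running this client's tasks until its queue is empty, then exits.
    private void drain(String clientName) {
        while (true) {
            Runnable next;
            synchronized (queues) {
                next = queues.get(clientName).poll();
                if (next == null) {
                    queues.remove(clientName);                     // next submit starts a new drainer
                    return;
                }
            }
            try {
                next.run();
            } catch (RuntimeException e) {
                // A failing task must not stop the rest of this client's queue.
            }
        }
    }

    // Demo: true if every client's tasks ran in submission order.
    static boolean demo(int clients, int tasksPerClient) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        PerClientSerialExecutor serial = new PerClientSerialExecutor(pool);
        Map<String, List<Integer>> seen = new HashMap<>();
        for (int c = 0; c < clients; c++) {
            seen.put("client-" + c, new ArrayList<>());
        }
        for (int t = 0; t < tasksPerClient; t++) {
            for (int c = 0; c < clients; c++) {
                final int n = t;
                final List<Integer> list = seen.get("client-" + c);
                serial.submit("client-" + c, () -> list.add(n));
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (List<Integer> list : seen.values()) {
            if (list.size() != tasksPerClient) return false;
            for (int i = 0; i < tasksPerClient; i++) {
                if (list.get(i) != i) return false;
            }
        }
        return true;
    }
}
```

Because a client's queue is handled by at most one drainer at a time, its tasks never overlap, while different clients are free to run on different pool threads. For the "fair" variant, the drainer would stop after some number of tasks and resubmit itself to the pool instead of looping until the queue is empty.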
I have a class that's a listener to a log server. The listener gets notified whenever a log/text is spewed out. I store this text in an arraylist.
I need to process this text (remove duplicate words, store it in a trie, compare it against some patterns etc).
My question is: should I be doing this as and when the listener is notified? Or should I create a separate thread that handles the processing?
What is the best way to handle this situation?
Sounds like you're trying to solve the Producer Consumer Problem, in which case - Yes, you should be looking at threads.
If, however, you only need to do very basic operations that take less than a millisecond per entry, don't overly complicate things. If you use a TreeSet in conjunction with an ArrayList, it will automatically take care of keeping duplicates out. Simple atomic operations such as validating the log entry aren't such a big deal that they need a separate thread, unless new text is coming in at such a rapid rate that you need a thread to busy itself full time with processing new notifications.
I always run processing that is not related to the UI in a separate thread, so that it does not hang the app screen. So from my point of view, you should go with a separate thread.
Such a situation can be solved using Queues. The simplest solution would be to have an unbounded blocking queue (a LinkedTransferQueue is tailored for such a case) and a limited size pool of worker threads.
You would add()/offer() the log entry from the listener's thread and take() for processing with worker threads. take() will block a thread if no log entries are available for processing.
P. S. A LinkedTransferQueue is designed for concurrent usage, no external synchronization is necessary: it's based on weak iterators, just like the Concurrent DS family.
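A minimal sketch of this queue-plus-workers layout is shown below. The class and method names (LogProcessor, onLog, close) and the "collect unique words" processing step are assumptions made for illustration; only the LinkedTransferQueue, offer() and take() usage comes from the description above. A sentinel entry per worker is used to stop the workers once the queue has been drained.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TimeUnit;

class LogProcessor {
    // Sentinel compared by identity, so a real log entry can never be mistaken for it.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedTransferQueue<>();
    private final Set<String> uniqueWords = ConcurrentHashMap.newKeySet();
    private final ExecutorService workers;
    private final int workerCount;

    LogProcessor(int workerCount) {
        this.workerCount = workerCount;
        this.workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(this::workLoop);
        }
    }

    // Called on the listener's thread; the queue is unbounded, so this does not block.
    void onLog(String entry) {
        queue.offer(entry);
    }

    private void workLoop() {
        try {
            while (true) {
                String entry = queue.take();        // blocks while no entries are available
                if (entry == STOP) {
                    return;
                }
                for (String word : entry.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        uniqueWords.add(word);      // stand-in for the real processing
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues one STOP per worker behind the pending entries, waits, and returns the result.
    Set<String> close() {
        for (int i = 0; i < workerCount; i++) {
            queue.offer(STOP);
        }
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return uniqueWords;
    }
}
```

The listener only pays the cost of an offer(), so it returns quickly no matter how expensive the processing is. The shared result set has to be a concurrent collection because several workers update it at once.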
I am also thinking of integrating the disruptor pattern in our application. I am a bit unsure about a few things before I start using the disruptor
I have 3 producers: a FIX thread which de-serialises the requests, another thread which continuously modifies the order price as the market moves, and one more thread which is responsible for de-serialising the requests sent from a GUI application. All three threads currently write to a blocking queue (hence we see a lot of contention on the queue).
The disruptor talks about a Single writer principle and from what I have read that approach scales the best. Is there any way we could make the above three threads obey the single writer principle?
Also, in a typical request/response application, and especially in our case, we have contention on an in-memory cache, as we need to lock the cache when we update it with the response while a request might be in progress for the same order. How do we handle this through the disruptor, i.e. how do I tie a response to a particular request? Can I eliminate the lock on the cache, and if yes, how?
Any suggestions/pointers would be highly appreciated. We are currently using Java 1.6
I'm new to the Disruptor and am trying to understand as many use cases as possible. I have tried to answer your questions.
Yes, the Disruptor can be used to sequence calls from multiple producers. I understand that all 3 threads try to update the state of a shared object, and that a single consumer takes the necessary action on that shared object. Internally, you can have the single consumer delegate calls to the appropriate single-threaded handler based on responsibility.
The Disruptor does exactly this. It sequences the calls so that the state is accessed by only one thread at a time. If there's a specific order in which the event handlers are to be invoked, set up the memory barrier. The latest version of the Disruptor has a DSL that lets you set up the order easily.
The cache can be abstracted and accessed through the Disruptor. At any one time, only a reader or a writer would get access to the cache, since all calls to the cache are sequential.