Real world example of drainTo method behavior - java

I was going through the Javadoc and source code for the drainTo method declared in the BlockingQueue interface and implemented in LinkedBlockingQueue. My understanding after reading the source (JDK 7) is that the calling thread passes in a Collection and then acquires takeLock, which blocks other consumers. Then, up to the maximum element count, items are removed from the queue's nodes and added to that collection.
What I can appreciate is that this saves the thread from acquiring the lock again and again but, pardon my limited knowledge, I could not see the need for it in real-world code. Could someone please share some real-world examples where drainTo's behavior matters?

Well, I used it in real-life code and it looked quite natural to me: a background database thread creates items and puts them into a queue in a loop until either the end of the data is reached or a stop signal is detected. On the first item, a UI updater is launched using EventQueue.invokeLater. Because invokeLater is asynchronous and carries some overhead, it takes some time until the UI updater actually queries the queue, and by then more than one item will most likely be available.
So it uses drainTo to get all items that are available at that specific point and updates a ListDataModel, which produces a single event for the added interval. The next update can be triggered using another invokeLater or using a Timer. So drainTo has the semantics of “gimme all items that arrived since the last call” here.
On the other hand, polling the queue for single items could lead to a situation where producer and consumer block each other for a short time, and every time the consumer asks for a new item, another one is available because the consumer was blocked just long enough for the producer to create and put it. In that case you have to implement your own time limit to avoid blocking the UI thread for too long. Using drainTo once and releasing the event handling thread afterwards is much easier.
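A minimal sketch of that "gimme everything that arrived since the last call" pattern, leaving Swing out of it. The class name DrainToDemo and the helper takeBatch are made up for illustration; only BlockingQueue.drainTo itself is the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; the names are illustrative, not taken from the answer.
class DrainToDemo {

    // Removes everything that is queued right now and returns it as one batch.
    // drainTo does not wait for items: an empty queue just yields an empty batch.
    static <T> List<T> takeBatch(BlockingQueue<T> queue) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Stand-in for the background database thread.
        queue.add("row-1");
        queue.add("row-2");
        queue.add("row-3");

        // Stand-in for the UI updater: one call, one batch, one model update.
        List<String> batch = takeBatch(queue);
        System.out.println("Adding " + batch.size() + " items in one update: " + batch);
    }
}
```

In a real UI the body of main would run inside the Runnable passed to EventQueue.invokeLater, and the batch would be appended to the list model in one go.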

Related

Vertx 3.4: Difference between running code immediately and context.runOnContext

I am trying to figure out the basics of Vertx. I was going through standard doc on it here, where I stumbled upon a section on context object. It says that it lets you run your code later by providing a method called runOnContext. The thing I don't understand is, in which case would I choose to invoke a (non-blocking) block of code later? If the code is non-blocking, it will take same amount of time, whether you execute it now or later.
Can anyone please tell me, in which case, context.runOnContext will be helpful?
Most often it will be helpful if you call it from another thread. It will schedule a task for execution by the event loop bound to this context.
If you're already on the event loop, you may also use it when you read items from a queue: instead of processing all items as a single event, you would schedule one event per item in the queue. That gives other kinds of events (network, filesystem) a chance to be processed earlier.

Query on RxJava observeOn scheduler thread

I have to write into a file based on incoming requests. As multiple requests may arrive simultaneously, I don't want multiple threads trying to overwrite the file content at the same time, which may lead to losing some data.
Hence, I tried collecting all the requests' data using an instance variable of type PublishSubject. I subscribe to publishSubject during init, and this subscription remains for the whole life-cycle of the application. I'm also observing the same instance on a separate thread (provided by the Vertx event loop), which invokes the method responsible for writing the file.
private PublishSubject<FileData> publishSubject = PublishSubject.create();
private void init() {
    publishSubject.observeOn(RxHelper.blockingScheduler(vertx)).subscribe(fileData -> writeData(fileData));
}
Later during request handling, I call onNext as below:
private void handleRequest() {
    // do some task
    publishSubject.onNext(fileData);
}
I understand that, when I call onNext, the data will be queued up to be written into the file by the specific thread assigned by the observeOn operator. However, what I'm trying to understand is:
whether this thread gets blocked in WAITING state for only this task? Or,
will it be used for other activities as well when no file writing happens?
I don't want to end up with one thread from the vertx event loop wasted in waiting state for going with this approach. Also, please suggest any better approach, if available.
Thanks in advance.
Actually, RxJava will do it for you. By definition, onNext() emissions are delivered serially:
Observables must issue notifications to observers serially (not in parallel). They may issue these notifications from different threads, but there must be a formal happens-before relationship between the notifications. (Observable Contract)
So as long as you run blocking calls inside onNext() at the subscriber (and do not fork work to a different thread manually), you will be fine and no parallel writes will happen.
Actually, your worries should come from the opposite direction: backpressure.
You should choose a backpressure strategy here, because if requests arrive faster than you can process them (writing to the file), you might overflow the buffer and get into trouble. Consider using Flowable and choosing a backpressure strategy according to your needs.
Regarding your questions, that depends on the Scheduler. You're using RxHelper.blockingScheduler(vertx), which seems like your custom code, so I can't tell; if the scheduler uses shared threads in a work-queue fashion, the thread will not sit idle.
Anyhow, Rx will not determine this for you; it is the scheduler's responsibility to assign the work to some thread according to its own logic.

Continuous parsing and processing of text

I have a class that's a listener to a log server. The listener gets notified whenever a log/text is spewed out. I store this text in an arraylist.
I need to process this text (remove duplicate words, store it in a trie, compare it against some patterns etc).
My question is: should I be doing this as and when the listener is notified? Or should I be creating a separate thread that handles the processing?
What is the best way to handle this situation?
Sounds like you're trying to solve the Producer Consumer Problem, in which case - Yes, you should be looking at threads.
If, however, you only need to do very basic operations that take less than a few milliseconds per entry, don't overcomplicate things. If you use a TreeSet in conjunction with an ArrayList, it will automatically take care of keeping duplicates out. Simple atomic operations such as validating the log entry aren't such a big deal that they need a separate thread, unless new text is coming in at such a rapid rate that you need a thread busy full time processing new notifications.
I always run processes that are not related to the UI in a separate thread so that they do not hang the app screen. So from my point of view, you should go with a separate thread.
Such a situation can be solved using Queues. The simplest solution would be to have an unbounded blocking queue (a LinkedTransferQueue is tailored for such a case) and a limited size pool of worker threads.
You would add()/offer() the log entry from the listener's thread and take() for processing with worker threads. take() will block a thread if no log entries are available for processing.
P. S. A LinkedTransferQueue is designed for concurrent usage, so no external synchronization is necessary: its iterators are weakly consistent, just like those of the other concurrent collections.
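A rough sketch of the queue-plus-worker-pool setup, assuming the "processing" is just collecting unique words. LogWorkerDemo, processAll and the CountDownLatch used to know when the demo is finished are all illustrative additions; a real listener would keep the workers running instead of shutting them down.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedTransferQueue;

// Hypothetical demo class; names are illustrative only.
class LogWorkerDemo {

    // Feeds the entries through an unbounded queue to a fixed pool of workers
    // and returns the distinct words they found.
    static Set<String> processAll(List<String> logEntries, int workerCount) {
        BlockingQueue<String> queue = new LinkedTransferQueue<>();
        Set<String> uniqueWords = ConcurrentHashMap.newKeySet();
        CountDownLatch remaining = new CountDownLatch(logEntries.size());
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);

        for (int i = 0; i < workerCount; i++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        String entry = queue.take(); // blocks while nothing is queued
                        for (String word : entry.trim().split("\\s+")) {
                            if (!word.isEmpty()) {
                                uniqueWords.add(word);
                            }
                        }
                        remaining.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shutdownNow() ends the loop
                }
            });
        }

        // Listener side: offer() on an unbounded queue returns immediately.
        for (String entry : logEntries) {
            queue.offer(entry);
        }

        try {
            remaining.await(); // demo only: wait until every entry was processed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdownNow(); // interrupts workers blocked in take()
        }
        return new TreeSet<>(uniqueWords);
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("error disk full", "error disk slow"), 2));
    }
}
```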

producer-consumer: how to signal that production has completed

I have the following situation:
1. Read data from database
2. Do work ("calculation")
3. Write result to database
I have a thread that reads from the database and puts the generated objects into a BlockingQueue. These objects are extremely heavyweight, hence the queue, which limits the number of objects in memory.
Multiple threads take objects from the queue, perform the work and put the results into a second queue.
The final thread takes results from the second queue and saves them to the database.
The problem is how to prevent deadlocks, e.g. the "calculation threads" need to know when no more objects will be put into the queue.
Currently I achieve this by passing references to the threads (Callables) to each other and checking thread.isDone() before a poll or offer, and then checking whether the element is null. I also check the size of the queue: as long as there are elements in it, they must be consumed. Using take or put leads to deadlocks.
Is there a simpler way to achieve this?
One way to accomplish this is to put a "dummy" or "poison" message on the queue as the last message once you are sure that no more tasks are going to arrive, for example right after putting the message for the last row of the DB query. The producer puts the dummy message on the queue, and a consumer that receives it knows that no more meaningful work is expected in this batch.
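A small sketch of the poison-message idea, assuming one producer and several consumers. Note that with more than one consumer the producer has to enqueue one poison message per consumer, otherwise only one of them would stop. The Job class, the POISON sentinel and sumWithConsumers are made-up names; the "work" is just summing numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; names are illustrative only.
class PoisonPillDemo {

    static final class Job {
        final int value;
        Job(int value) { this.value = value; }
    }

    // Sentinel instance, recognised by identity (==), never by its value.
    private static final Job POISON = new Job(0);

    static long sumWithConsumers(int jobCount, int consumerCount) {
        // Small bounded queue, so put() blocks and limits the jobs held in memory.
        BlockingQueue<Job> queue = new ArrayBlockingQueue<>(4);
        AtomicLong total = new AtomicLong();
        List<Thread> consumers = new ArrayList<>();

        for (int i = 0; i < consumerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Job job = queue.take();
                        if (job == POISON) {
                            return; // no more work will arrive for this consumer
                        }
                        total.addAndGet(job.value);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            consumers.add(t);
        }

        try {
            for (int i = 1; i <= jobCount; i++) {
                queue.put(new Job(i)); // producer: real work
            }
            for (int i = 0; i < consumerCount; i++) {
                queue.put(POISON);     // one poison message per consumer
            }
            for (Thread t : consumers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumWithConsumers(100, 3)); // prints 5050
    }
}
```

Because the consumers exit on the sentinel, plain take() and put() can be used without the isDone()/size() checks.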
Maybe you should take a look at CompletionService
It is designed to combine executor and a queue functionality in one.
Tasks that have completed execution are available from the completion service via
completionServiceInstance.take()
You can then use another executor for step 3, i.e. filling the DB with the results, and feed it with the results taken from the completionServiceInstance.
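A sketch of how that could look, assuming the "calculation" is squaring a number and the "write to DB" step is just adding up the results. CompletionServiceDemo and sumOfSquares are invented names. Keep in mind that the pool created by Executors.newFixedThreadPool has an unbounded work queue, so this by itself does not limit how many inputs sit in memory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative only.
class CompletionServiceDemo {

    static long sumOfSquares(List<Integer> inputs, int workerCount) {
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        CompletionService<Long> completed = new ExecutorCompletionService<>(pool);
        try {
            // Step 2: submit one calculation per input.
            for (Integer input : inputs) {
                final long n = input;
                completed.submit(() -> n * n);
            }
            // Step 3: results arrive in completion order. The number of
            // submitted tasks is known, so there is no guessing about
            // when production has finished.
            long total = 0;
            for (int i = 0; i < inputs.size(); i++) {
                total += completed.take().get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4), 2)); // prints 30
    }
}
```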

Multiple SingleThreadExecutors for a given application...a good idea?

This question is about the fallouts of using SingleThreadExecutor (JDK 1.6). Related questions have been asked and answered in this forum before, but I believe the situation I am facing, is a bit different.
Various components of the application (let's call them C1, C2, C3 etc.) generate outbound messages, mostly in response to inbound messages that they receive from other components. These outbound messages are kept in queues, which are usually ArrayBlockingQueue instances - fairly standard practice perhaps. However, the outbound messages must be processed in the order they are added. I guess use of a SingleThreadExecutor is the obvious answer here. We end up with a 1:1 situation: one SingleThreadExecutor per queue (each queue being dedicated to messages emanating from one component).
Now, the number of components (C1,C2,C3...) is unknown at a given moment. They will come into existence depending on the need of the users (and will be eventually disposed of too). We are talking about 200-300 such components at the peak load. Following the 1:1 design principle stated above, we are going to arrange for 200 SingleThreadExecutors. This is the source of my query here.
I am uncomfortable with the thought of having to create so many SingleThreadExecutors. I would rather try and use a pool of SingleThreadExecutors, if that makes sense and is plausible (any ready-made, seen-before classes/patterns?). I have read many posts on recommended use of SingleThreadExecutor here, but what about a pool of the same?
What do learned women and men here think? I would like to be directed, corrected or simply, admonished :-).
If your requirement is that the messages be processed in the order that they're posted, then you want one and only one SingleThreadExecutor. If you have multiple executors, then messages will be processed out-of-order across the set of executors.
If messages need only be processed in the order that they're received for a single producer, then it makes sense to have one executor per producer. If you try pooling executors, then you're going to have to put a lot of work into ensuring affinity between producer and executor.
Since you indicate that your producers will have defined lifetimes, one thing that you have to ensure is that you properly shut down your executors when they're done.
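A minimal sketch of the one-executor-per-producer arrangement with explicit shutdown. PerProducerExecutors, post and dispose are made-up names, and the sketch deliberately ignores the race where a message is posted for a component that is being disposed at the same moment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; names are illustrative only.
class PerProducerExecutors {

    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    // Messages from the same producer run on the same single thread,
    // so they are processed in the order they were posted.
    void post(String producerId, Runnable message) {
        executors
            .computeIfAbsent(producerId, id -> Executors.newSingleThreadExecutor())
            .execute(message);
    }

    // Call when a component goes away. Already queued messages still run;
    // returns true if they all finished within the timeout.
    boolean dispose(String producerId, long timeoutMillis) {
        ExecutorService executor = executors.remove(producerId);
        if (executor == null) {
            return true;
        }
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        PerProducerExecutors executors = new PerProducerExecutors();
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 3; i++) {
            final int n = i;
            executors.post("C1", () -> log.add("C1-" + n));
        }
        executors.dispose("C1", 5000);
        System.out.println(log);
    }
}
```

Without the dispose call the executor's worker thread would stay alive after its component is gone, which is exactly the leak the answer warns about.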
Messaging and batch jobs are problems that have been solved time and time again. I suggest not attempting to solve them again. Instead, look into Quartz, which maintains thread pools, persists tasks in a database etc. Or, maybe even better, look into JMS/ActiveMQ. But at the very least look into Quartz, if you have not already. Oh, and Spring makes working with Quartz so much easier...
I don't see any problem there. Essentially you have independent queues and each has to be drained sequentially, so one thread for each is a natural design. Anything else you can come up with is essentially the same. As an example, when Java NIO first came out, frameworks were written trying to take advantage of it and get away from the thread-per-request model. In the end some authors admitted that, to provide a good programming model, they were just reimplementing threading all over again.
It's impossible to say whether 300 or even 3000 threads will cause any issues without knowing more about your application. I strongly recommend that you profile your application before adding more complexity.
The first thing you should check is that the number of concurrently running threads is not much higher than the number of cores available to run them. The more active threads you have, the more time is wasted managing those threads (context switches are expensive) and the less work gets done.
The easiest way to limit the number of running threads is to use a semaphore. Acquire the semaphore before starting work and release it after the work is done.
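A sketch of the semaphore idea, with a short sleep standing in for the real work. ThrottleDemo and peakConcurrency are invented for the demo; the method reports the highest number of tasks that were inside the guarded section at the same time, which should never exceed the permit count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class ThrottleDemo {

    static int peakConcurrency(int threadCount, int maxConcurrent) {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    permits.acquire(); // wait for a free slot
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // stand-in for the real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    permits.release(); // always hand the slot back
                }
            });
            t.start();
            threads.add(t);
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + peakConcurrency(20, 4));
    }
}
```

The threads still exist and still get scheduled while they wait for a permit; the semaphore only caps how many of them do work at once.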
Unfortunately, limiting the number of running threads may not be enough. While it may help, the overhead may still be too great if the time spent per context switch is a major part of the total cost of one unit of work. In this scenario, the most efficient way is often to have a fixed number of queues. When a component initializes, it gets a queue from a global pool of queues, selected with an algorithm such as round-robin.
If you are in one of those unfortunate cases where the most obvious solutions do not work, I would start with something relatively simple: one thread pool, one concurrent queue, a lock, a list of queues and a temporary queue for each thread in the pool.
Posting work to the queue is simple: add the payload and the identity of the producer.
Processing is relatively straightforward as well. First you get the next item from the queue. Then you acquire the lock. While holding the lock, you check whether any other thread is running a task for the same producer. If not, you register the current thread by adding a temporary queue to the list of queues. Otherwise you add the task to the existing temporary queue. Finally you release the lock. Now you either run the task or poll for the next one and start over, depending on whether the current thread was registered to run tasks. After running the task, you acquire the lock again and check whether there is more work to be done in the temporary queue. If not, remove the queue from the list. Otherwise get the next task. Finally you release the lock. Again, you choose whether to run the task or to start over.
