Multithreaded processing from Java queue(s)

Consider an application which uses an in-memory FIFO Java queue to deposit objects that will be subsequently processed by a thread, executing in parallel with many other threads (e.g., as part of a ThreadPool).
Each object must be processed by a "compatible" thread, and that compatibility is determined by checking a label associated with the object (i.e., each thread can process certain types of objects, not all of them). If a thread reads an object from the queue and the label it reads is not among the ones it supports, it has to ignore the object.
Some additional characteristics the application has to fulfill:
In-order processing (i.e., all objects with the same label should be processed in the order they are deposited in the queue).
High performance (i.e., capable of processing thousands of rather "heavy" objects per second).
One could use the ConcurrentLinkedQueue, as suggested in an earlier question, but a single queue makes it tricky to segregate the input per label. Alternatively, each thread could be assigned a single label to handle, so it could have its own non-concurrent queue. Or maybe another approach should be followed.
What would be the best way to implement the above specification?

If a thread reads an object from the queue and the label it reads is not among the ones it supports, it has to ignore the object.
Does that mean that some of the tasks that go into the queue will never be performed? And does it mean that there's no way to predict which ones will be ignored and which ones will be processed?
That doesn't sound like a very good design.
If that's not what you meant, then maybe "ignored" was not the right word.
Each object must be processed by a "compatible" thread ... all objects with the same label should be processed in the order they are deposited in the queue
I'm not going to try to guess what could be different about the different kinds of worker threads, but if I were given that requirement, I would have a different thread pool for each different kind of worker. Then it becomes the responsibility of whoever generates the tasks to put them in the right queue.
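For illustration, a minimal sketch of that idea, assuming tasks arrive as plain Runnables tagged with a label (the LabelDispatcher class here is made up): one single-threaded executor per label preserves per-label FIFO ordering while different labels proceed in parallel.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LabelDispatcher {
    // one single-threaded executor per label; each preserves FIFO order for its label
    private final Map<String, ExecutorService> executorsByLabel = new ConcurrentHashMap<>();

    void submit(String label, Runnable task) {
        executorsByLabel
                .computeIfAbsent(label, l -> Executors.newSingleThreadExecutor())
                .submit(task);
    }

    void shutdown() {
        executorsByLabel.values().forEach(ExecutorService::shutdown);
    }
}
Whoever generates the tasks calls submit(label, task); tasks with the same label land on the same queue and therefore execute in the order they were deposited.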

Why would a thread reject a task based upon the object's label?
The thread that picks up the task from the queue must act like a worker thread. You must design/write it in such a manner that the thread picks up the object (or task) from the queue, reads its label, and processes it based on that label.
You should not have any correspondence between an object type and a thread type. Each thread should be written to handle any object.
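As a rough sketch of that approach (the LabeledTask interface is hypothetical), every worker takes whatever comes off the shared queue and lets the task's label decide how it is processed, rather than rejecting it:
import java.util.concurrent.BlockingQueue;

// hypothetical task type: exposes its label and knows how to process itself
interface LabeledTask {
    String getLabel();
    void process();
}

class GenericWorker implements Runnable {
    private final BlockingQueue<LabeledTask> queue;

    GenericWorker(BlockingQueue<LabeledTask> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                LabeledTask task = queue.take();   // blocks until a task is available
                // every worker handles every label; the label only determines the processing path
                task.process();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // restore the interrupt flag and exit
        }
    }
}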

Related

Is there a way for a thread to know it has been "interleaved"?

In Java, is there a way for a thread to know it has been "interleaved"?
I would like to send a certain update to my clients (who are handled by individual threads) after their thread has been interleaved by another thread.
In case my use of the term "interleaved" is incorrect, I'm referring to the process where the processor stops running one thread and moves to another one.
So when the processor eventually returns to my thread, I would like a certain update to be sent to my client via the thread.
Apparently there is no simple way to detect that a thread has been interleaved (i.e., context-switched).
Instead, I decided to use an atomic integer to track the number of updates that were executed by all threads.
I then changed the code within my threads to monitor the number of changes that had been made (since last notifying the client) and, once a certain threshold had been exceeded, I updated the client.
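A minimal sketch of that counter-based workaround, assuming a shared AtomicInteger that every worker increments and each client handler compares against a threshold (the class names, the threshold value and sendUpdateToClient() are all made up):
import java.util.concurrent.atomic.AtomicInteger;

class UpdateTracker {
    private final AtomicInteger totalUpdates = new AtomicInteger();

    // any thread that performs an update calls this
    void recordUpdate() {
        totalUpdates.incrementAndGet();
    }

    int total() {
        return totalUpdates.get();
    }
}

class ClientHandler implements Runnable {
    private static final int THRESHOLD = 10;   // assumed value
    private final UpdateTracker tracker;
    private int lastNotifiedAt;

    ClientHandler(UpdateTracker tracker) {
        this.tracker = tracker;
        this.lastNotifiedAt = tracker.total();
    }

    @Override
    public void run() {
        // ... handle the client, checking the counter periodically ...
        int now = tracker.total();
        if (now - lastNotifiedAt >= THRESHOLD) {
            sendUpdateToClient();              // hypothetical notification call
            lastNotifiedAt = now;
        }
    }

    private void sendUpdateToClient() {
        // push the update to this client
    }
}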

Trying to understand the mechanics of a synchronous queue

I was trying to read the implementation of SynchronousQueue.
It is not so straightforward for me. It seems to use a linked list where each node is associated with a thread.
The core part uses a spin loop waiting for tasks to be placed in the queue.
I was wondering why a spin loop is used instead of something like wait/notify?
Doesn't this mean one of the cores is lost to the constant spin loop?
I am trying to understand this point and get a rough understanding of the design of SynchronousQueue.
UPDATE
What is also troubling me is how the waiter threads start/stop.
The point of the SynchronousQueue is to synchronize something which is usually quite asynchronous - one thread placing an item into the queue while another tries to take from it.
The SynchronousQueue is actually not a queue at all. It has no capacity, no internal storage. It only allows taking from the queue when another process is currently trying to put in the queue.
Example:
Process A tries to put in the queue. This blocks for now.
Process B tries to take from the queue. Since someone is trying to put, the item is transferred from A to B, and both are unblocked.
Process B tries to take from the queue, but no one tries to put. So B is now blocked.
Process A now wants to put an item. Now the item is transferred over to B, and A and B are no longer blocked.
About the blocking:
The Sun/Oracle JRE implementation does use polling instead of a wait/notify pattern if you do a timed operation (like "try to take for 1 second"). This makes sense: it periodically retries until the time is up. When you do a non-timed operation (like "take, no matter how long it takes") it uses park, which wakes again if the situation has changed. In neither situation would one of your cores be constantly busy spinning in a loop. The for (;;) means "retry indefinitely" in this case; it does not mean "constant spinning".
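To make the handoff described above concrete, here is a small sketch: the taker blocks until a putter arrives, and the element is transferred directly between the two threads without ever being stored in the queue.
import java.util.concurrent.SynchronousQueue;

public class HandoffDemo {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> queue = new SynchronousQueue<>();

        Thread taker = new Thread(() -> {
            try {
                String item = queue.take();   // blocks until some other thread calls put()
                System.out.println("received: " + item);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        taker.start();

        Thread.sleep(500);    // give the taker time to block (for demonstration only)
        queue.put("hello");   // unblocks the taker; the item is handed over directly
        taker.join();
    }
}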

Multiple threads submitting actions to be done in order

A question on using threads in java (disclaimer - I am not very experienced with threads so please allow some leeway).
Overview:
I was wondering whether there was a way for multiple threads to add actions to a queue which another thread would take care of. The order does not really matter - what matters is that the actions in the queue are handled one at a time.
Explanation:
I plan to host a small server (using servlets). I want each connection to a client to be handled by a separate thread (so far ok). However, each of these threads/clients will be making changes to a single XML file, and those changes cannot be made at the same time.
Question:
Could I have each thread submit the changes to be made to a queue which another thread will continuously manage? As I said it does not matter on the order of the changes, just that they do not happen at the same time.
Also, please advise if this is not the best way to do this.
Thank you very much.
This is a reasonable approach. Use an unbounded BlockingQueue (e.g. a LinkedBlockingQueue): the thread performing IO on the XML file calls take on the queue to remove the next message (blocking if the queue is empty) and then processes the message to modify the XML file, while the threads submitting changes to the XML file call offer on the queue to add their messages to it. The BlockingQueue is thread-safe, so there's no need for your threads to perform additional synchronization on it.
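A rough sketch of that setup, assuming the edits are represented by some hypothetical Change objects and applyToXml() stands in for whatever actually modifies the file:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class XmlChangeProcessor {
    // hypothetical type describing one change to the XML file
    interface Change {
        void applyToXml();
    }

    private final BlockingQueue<Change> queue = new LinkedBlockingQueue<>();

    // called from any request-handling thread
    public void submit(Change change) {
        queue.offer(change);              // never blocks on an unbounded queue
    }

    // run by the single writer thread
    public void processForever() throws InterruptedException {
        while (true) {
            Change change = queue.take(); // blocks while the queue is empty
            change.applyToXml();          // only this thread ever touches the file
        }
    }
}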
You could have the threads submit tasks to an ExecutorService that has only one thread. Or you could have a lock that allows only one thread to alter the file at once. The latter seems more natural, as the file is a shared resource. The queue is then the implied queue of threads awaiting the lock.
The Executor interface provides the abstraction you need:
An object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads.
A single-threaded executor service seems like exactly the right tool for the job. See Executors.newSingleThreadExecutor(), whose javadoc says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
Note that in a JavaEE context, you need to take into consideration how to terminate the worker thread when your webapp is unloaded. There are other questions here on SO that deal with this.
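For illustration, a minimal sketch along those lines, with a shutdown hook for when the webapp is unloaded (the XmlWriteService class and writeChangeToXml() are assumptions for the example):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class XmlWriteService {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // any request-handling thread may call this; tasks run one at a time, in submission order
    public void submitChange(String change) {
        writer.submit(() -> writeChangeToXml(change));
    }

    // call this when the webapp is unloaded, e.g. from a ServletContextListener
    public void shutdown() {
        writer.shutdown();
    }

    private void writeChangeToXml(String change) {
        // hypothetical: apply the change to the XML file
    }
}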

How do I pull data from another thread or process (Android/Java)

I know of concepts that allow inter-process communication. My program needs to launch a second thread. I know how to pass or "push" data from one thread to another in Java/Android, but I have not seen much information about "pulling" data. The child thread needs to grab data from the parent thread every so often. How is this done?
Since threads share memory you can just use a thread-safe data structure. Refer to java.util.concurrent for some; everything in that package is designed for multithreaded situations.
In your case you might want to use a LinkedBlockingQueue. This way the parent thread can put things into the queue, and the child thread can take them off whenever it likes. It also allows the child thread to block if the queue is empty.
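A small sketch of that parent/child arrangement: the parent offers items onto a LinkedBlockingQueue and the child pulls them off, either blocking on take() or, as here, polling with a timeout so it can also do other work.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ParentChildQueue {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread child = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // wait up to one second for the parent to publish something
                    String data = queue.poll(1, TimeUnit.SECONDS);
                    if (data != null) {
                        System.out.println("child got: " + data);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        child.start();

        // the parent "pushes" data; from the child's point of view this is a pull
        queue.offer("some data");

        Thread.sleep(2000);
        child.interrupt();   // stop the child (only needed to let this demo terminate)
    }
}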
You may be confusing threads and data. Threads are lines of code execution which may operate on some data, but they are not data themselves and they do not contain data. Data is contained in memory, and threads are executed by the CPU (or the VM, or whatever level you choose).
You access data in the same way whether or not threads are involved; that is, you use variables, object fields, etc. But with threads you need to make sure there are no race conditions, which happen when threads concurrently access the same data.
To summarize, if you have an object that has some method executed by a thread, you can still get data from this object in the regular way, as long as you make sure that only one thread accesses it at a time.
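For example, a field shared between the background thread and whoever pulls the data can simply be guarded with synchronized (the BackgroundTask class is just an illustration):
class BackgroundTask implements Runnable {
    private int progress;                 // shared state, guarded by "this"

    @Override
    public void run() {
        for (int i = 0; i <= 100; i++) {
            synchronized (this) {
                progress = i;             // the background thread updates the data
            }
            // ... do some real work here ...
        }
    }

    // any other thread reads the data the regular way, under the same lock
    public synchronized int getProgress() {
        return progress;
    }
}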

Parallel processing in Java; advice needed, e.g. on Runnable/Callable interfaces

Assume that I have a set of objects that need to be analyzed in two different ways, both of which take a relatively long time and involve IO calls. I am trying to figure out how/if I could go about optimizing this part of my software, especially by utilizing the multiple processors (the machine I am sitting on, for example, is an 8-core i7 which almost never goes above 10% load during execution).
I am quite new to parallel-programming or multi-threading (not sure what the right term is), so I have read some of the prior questions, particularly paying attention to highly voted and informative answers. I am also in the process of going through the Oracle/Sun tutorial on concurrency.
Here's what I thought out so far;
A thread-safe collection holds the objects to be analyzed
As soon as there are objects in the collection (they come a couple at a time from a series of queries), a thread per object is started
Each specific thread takes care of the initial pre-analysis preparations; and then calls on the analyses.
The two analyses are implemented as Runnables/Callables, and thus called on by the thread when necessary.
And my questions are:
Is this a reasonable scheme, if not, how would you go about doing this?
In order to make sure things don't get out of hand, should I implement a ThreadManager or something of that sort, which starts and stops threads, and re-distributes them when they are complete? For example, if I have 256 objects to be analyzed, and 16 threads in total, the ThreadManager assigns the first finished thread to the 17th object to be analyzed etc.
Is there a dramatic difference between Runnable/Callable other than the fact that Callable can return a result? Otherwise should I try to implement my own interface, in that case why?
Thanks,
You could use a BlockingQueue implementation to hold your objects and spawn your threads from there. This interface is based on the producer-consumer principle. The put() method will block if your queue is full until there is some more space and the take() method will block if the queue is empty until there are some objects again in the queue.
An ExecutorService can help you manage your pool of threads.
If you are awaiting a result from your spawned threads then the Callable interface is a good choice, since you can start the computation earlier and work in your code with the results as Futures. As for the differences with the Runnable interface, from the Callable javadoc:
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
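A short sketch of that pattern, assuming each analysis is wrapped in a Callable and the results are collected through Futures (the pool size and result strings are arbitrary):
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AnalysisExample {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // hypothetical analyses returning a result; a Callable may also throw checked exceptions
        Callable<String> firstAnalysis = () -> "result of first analysis";
        Callable<String> secondAnalysis = () -> "result of second analysis";

        Future<String> first = pool.submit(firstAnalysis);
        Future<String> second = pool.submit(secondAnalysis);

        // get() blocks until the corresponding computation has finished
        System.out.println(first.get());
        System.out.println(second.get());

        pool.shutdown();
    }
}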
Some general things you need to consider in your quest for Java concurrency:
Visibility does not come by default. volatile, AtomicReference and the other classes in the java.util.concurrent.atomic package are your friends.
You need to carefully ensure atomicity of compound actions using synchronization and locks.
Your idea is basically sound. However, rather than creating threads directly, or indirectly through some kind of ThreadManager of your own design, use an Executor from Java's concurrency package. It does everything you need, and other people have already taken the time to write and debug it. An executor manages a queue of tasks, so you don't need to worry about providing the threadsafe queue yourself either.
There's no difference between Callable and Runnable except that the former returns a value. Executors will handle both, and treat them the same.
It's not clear to me whether you're planning to make the preparation step a separate task to the analyses, or fold it into one of them, with that task spawning the other analysis task halfway through. I can't think of any reason to strongly prefer one to the other, but it's a choice you should think about.
The Executors class provides factory methods for creating thread pools. Specifically, Executors#newFixedThreadPool(int nThreads) creates a thread pool with a fixed size that utilizes an unbounded queue. Also, if a thread terminates due to a failure then a new thread will take its place. So in your specific example of 256 tasks and 16 threads you would call:
// create the pool
ExecutorService threadPool = Executors.newFixedThreadPool(16);
// submit a task
Runnable task = new Runnable() {
    @Override
    public void run() {
        // analyze one object here
    }
};
threadPool.submit(task);
The important question is determining the proper number of threads for your thread pool. See if this helps: Efficient Number of Threads
Sounds reasonable, but it's not as trivial to implement as it may seem.
Maybe you should check the jsr166y project.
That's probably the easiest solution to your problem.
