Best way to reuse a Runnable - java

I have a class that implements Runnable and am currently using an Executor as my thread pool to run tasks (indexing documents into Lucene).
executor.execute(new LuceneDocIndexer(doc, writer));
My issue is that my Runnable class creates many Lucene Field objects, and I would rather reuse them than create new ones on every call. What's the best way to reuse these objects (Field objects are not thread-safe, so I cannot simply make them static)? Should I create my own ThreadFactory? I notice that after a while the program starts to degrade drastically, and the only thing I can think of is that it's GC overhead. I am currently trying to profile the project to be sure this is even an issue, but for now let's just assume it is.

Your question asks how to reuse a Runnable, so I am going to ignore the other details and simply answer that question.
If you are using a ThreadPoolExecutor, you can use the [ThreadPoolExecutor#afterExecute][1] method to return the Runnable object to a pool/queue of 'cached' Runnables.
[1]: http://java.sun.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html#afterExecute(java.lang.Runnable, java.lang.Throwable)
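A minimal sketch of that idea, under a few assumptions: ReusableTask, RecyclingExecutor, index and idleCount are hypothetical names, and a StringBuilder stands in for whatever expensive state the real Runnable holds. Tasks are handed over with execute() rather than submit(), because submit() wraps the Runnable in a FutureTask and afterExecute would then receive the wrapper instead of the original object.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reusable task; stands in for something like LuceneDocIndexer.
class ReusableTask implements Runnable {
    static final AtomicInteger CREATED = new AtomicInteger(); // counts allocations, for illustration only
    private final StringBuilder scratch = new StringBuilder(); // state kept across runs
    private String doc;                                        // per-run input

    ReusableTask() { CREATED.incrementAndGet(); }

    void reset(String doc) { this.doc = doc; }

    @Override public void run() {
        scratch.setLength(0);      // reuse the buffer instead of allocating a new one
        scratch.append(doc);       // placeholder for the real work
    }
}

// Executor that returns finished tasks to a queue of idle tasks.
class RecyclingExecutor extends ThreadPoolExecutor {
    private final BlockingQueue<ReusableTask> idle = new LinkedBlockingQueue<>();

    RecyclingExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    // Take an idle task if one is available, otherwise create a new one.
    void index(String doc) {
        ReusableTask task = idle.poll();
        if (task == null) task = new ReusableTask();
        task.reset(doc);
        execute(task);
    }

    int idleCount() { return idle.size(); }

    @Override protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        if (r instanceof ReusableTask) idle.offer((ReusableTask) r);
    }
}
```

Note that a task only gets reused if it has finished before the next document arrives; when the caller submits faster than the pool drains, new tasks are still created.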

A Runnable object is reusable. It is the Thread object that is not.
Best way? It is your way :-)
I think it is more a Lucene question than a Runnable question.

You might want to do some more benchmarking to nail down what's causing your slowdowns.
I'm willing to bet that your problem is not related to the creation of Field instances. Field objects don't have finalizers, and they're not designed to be pooled.

For now I have decided to just use a simple Producer->Consumer model. I pass a BlockingQueue to each indexer, rather than a document to index, and then have the main driver of the program add new documents to that queue. The Indexers then feed off that [bounded] queue and reuse the Field objects and share the thread-safe IndexWriter.
I did find a place where I was possibly not calling HttpMethod.releaseConnection() so that could have caused my memory issues (uncertain).
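Roughly, that setup could look like the sketch below. It is not the actual indexer: QueueIndexing, Indexer, indexAll and the POISON marker are hypothetical names, documents are plain strings, and a StringBuilder stands in for the reusable Field objects. Each long-lived worker owns its scratch object, so nothing mutable is shared between threads except the queue and a counter.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class QueueIndexing {
    // Unique stop marker, compared by identity so it cannot collide with a real document.
    private static final String POISON = new String("<stop>");

    // Long-lived worker: owns one scratch buffer and reuses it for every document.
    static class Indexer implements Runnable {
        private final BlockingQueue<String> queue;
        private final AtomicInteger indexed;
        private final StringBuilder scratch = new StringBuilder(); // stand-in for reusable Field objects

        Indexer(BlockingQueue<String> queue, AtomicInteger indexed) {
            this.queue = queue;
            this.indexed = indexed;
        }

        @Override public void run() {
            try {
                for (String doc = queue.take(); doc != POISON; doc = queue.take()) {
                    scratch.setLength(0);
                    scratch.append(doc);   // a real indexer would update its fields and call the writer here
                    indexed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Feeds `docs` documents to `workers` indexers and returns how many were processed.
    static int indexAll(int docs, int workers) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16); // bounded: put() blocks when full
        AtomicInteger indexed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) pool.execute(new Indexer(queue, indexed));
        try {
            for (int i = 0; i < docs; i++) queue.put("doc-" + i);
            for (int i = 0; i < workers; i++) queue.put(POISON);    // one stop marker per worker
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return indexed.get();
    }
}
```

The bounded queue also gives back-pressure: if the indexers fall behind, the driver blocks in put() instead of piling up documents in memory.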

Related

Java program for multithread environment - one thread to add data to collection and another thread to access it

I am new to Java multithreading. I would like to write a program with 2 threads: one thread adds data to a collection and the other tries to access it. How can I implement this? Do we have to use only ConcurrentHashMap as the collection? Could you please help with this?
This question covers a very broad topic, and there are many possibilities depending on the use case.
Still, for a simple put-and-get scenario, I would recommend using https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html
BlockingQueue is a thread-safe collection intended for this purpose.
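For the two-thread case in the question, a small sketch might look like this. TwoThreadHandoff and transfer are made-up names, and the example assumes the reader knows in advance how many items to expect, which keeps it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TwoThreadHandoff {
    // One thread adds `count` numbers, the other removes them; returns what the reader received.
    static List<Integer> transfer(int count) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        List<Integer> received = new ArrayList<>(); // touched only by the reader until join()

        Thread writer = new Thread(() -> {
            for (int i = 0; i < count; i++) queue.add(i); // unbounded queue, so add() does not block
        });
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) received.add(queue.take()); // blocks until data arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join(); // after join(), the reader's writes to `received` are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```

With a single writer the queue preserves insertion order, so the reader sees the numbers in the order they were added.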

How to synchronize action of multiple threads in EhCache?

How to synchronize the action of multiple threads and make sure that only one thread can access the resource at a given point in time in EhCache?
I'm not sure I understand your question, but let's guess. EhCache (3, I don't know about 2) is fully thread-safe.
However, if you really mean that one entry can be used by only one thread at a time, that's not something EhCache will do, as this is a specific need. You need to put your own synchronisation on top of it, using a lock or a semaphore. For example, you could lock on the retrieved value.
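Locking on the retrieved value could be sketched as follows. This deliberately does not use the EhCache API: a ConcurrentHashMap stands in for the cache, and PerEntryLocking, Counter, hit and hits are hypothetical names. The approach assumes the cache hands every caller the same object instance; if values are copied or serialized on the way out, each thread would lock a different object and the lock would protect nothing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PerEntryLocking {
    // Hypothetical cached value with mutable state.
    static class Counter { int hits; }

    // Stand-in for the cache; NOT the EhCache API.
    private final Map<String, Counter> cache = new ConcurrentHashMap<>();

    void hit(String key) {
        Counter entry = cache.computeIfAbsent(key, k -> new Counter());
        synchronized (entry) {      // only one thread at a time may use this entry
            entry.hits++;
        }
    }

    int hits(String key) {
        Counter entry = cache.get(key);
        if (entry == null) return 0;
        synchronized (entry) {      // same lock as the writers, so the read is consistent
            return entry.hits;
        }
    }

    // Several threads update the same entry; returns the final count.
    static int run(int threads, int perThread) {
        PerEntryLocking demo = new PerEntryLocking();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) demo.hit("page");
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return demo.hits("page");
    }
}
```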
I believe this question is close to another one of yours and cache-through could help you.
See https://stackoverflow.com/a/45801562/18591

Efficient multithreaded array building in Java

I have many threads adding result-like objects to an array, and would like to improve the performance of this area by removing synchronization.
To do this, I would like for each thread to instead post their results to a ThreadLocal array - then once processing is complete, I can combine the arrays for the following phase. Unfortunately, for this purpose ThreadLocal has a glaring issue: I cannot combine the collections at the end, as no thread has access to the collection of another.
I can work around this by additionally adding each ThreadLocal array to a list next to the ThreadLocal as they are created, so I have all the lists available later on (this will require synchronization but only needs to happen once for each thread), however in order to avoid a memory leak I will have to somehow get all the threads to return at the end to clean up their ThreadLocal cache... I would much rather the simple process of adding a result be transparent, and not require any follow up work beyond simply adding the result.
Is there a programming pattern or existing ThreadLocal-like object which can solve this issue?
You're right, ThreadLocal objects are designed to be only accessible to the current thread. If you want to communicate across threads you cannot use ThreadLocal and should use a thread-safe data structure instead, such as ConcurrentHashMap or ConcurrentLinkedQueue.
For the use case you're describing it would be easy enough to share a ConcurrentLinkedQueue between your threads and have them all write to the queue as needed. Once they're all done (Thread.join() will wait for them to finish) you can read the queue into whatever other data structure you need.
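As a rough illustration of the shared-queue approach, here is a sketch in which SharedResults and collect are hypothetical names and the "results" are just integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class SharedResults {
    // Each worker adds `perThread` results to one shared queue; returns them all once every worker is done.
    static List<Integer> collect(int threads, int perThread) {
        Queue<Integer> results = new ConcurrentLinkedQueue<>(); // lock-free and safe for concurrent add()
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) results.add(id * perThread + i);
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) worker.join(); // wait for every worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(results); // copy into whatever structure the next phase needs
    }
}
```

Adding a result stays a single call with no follow-up work for the worker, and there is no per-thread state to clean up afterwards.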

Java Concurrency: should I synchronize all List and Maps?

So I have a SomeTask class which extends Thread, and it has Map and List fields. What would be the behavior when you don't use Collections.synchronizedXXX and you have multiple SomeTask threads running?
Once a Map is retrieved from the database (I am using an object database to directly store POJOs), would I need to synchronize the Map object returned from this database as well?
Map SomeTasksOwnMap = Collections.synchronizedMap(MapReturnedFromDatabase);
Collections.synchronizedXXX is required when 2 or more threads are accessing the same Map/List and at least one of them modifies it.
If your task doesn't access other tasks' Map/List, then there is no need to synchronize them.
Example.
Task 1 builds a list of numbers divisible exactly by 2.
Task 2 builds a list of numbers divisible exactly by 3.
These two tasks have individual lists that do not require synchronization.
Example that requires synchronization.
Task 1 and 2 both calculate numbers and store them in a shared list.
To answer the question "What would be the behavior when you don't": you could lose one of the writes if both threads happened to write to index 'x' at the same time.
You may also see a null value in the list if the size of the array was increased before the write to that location was done.
Basically you would have an inconsistent view.
No. There is nothing in your question that suggests synchronization is required, because as far as I can tell each thread reads only data within itself: You only need synchronization when threads access data in other threads.
As an aside, having SomeTask extend Thread is a poor design - it should implement Runnable, then use new Thread(new SomeTask()).start().
... should I synchronize all List and Maps?
No you shouldn't. Synchronizing things that don't need it is a waste of resources. And for things that do need synchronization, you need to do it the right way. (And the synchronizedXxx wrappers are not always the right way.)
First, you need to identify the data structures that are going to be visible to multiple threads. Data structures that are provably thread confined don't need synchronizing at all.
Second, you need to examine the way that the data structures are used to see if a synchronizedXxx wrapper is sufficient. For instance, these wrappers don't synchronize iteration, and you can get into trouble if one thread changes a collection while another one is iterating it.
Finally, you need to think about whether the synchronized data structures are heavily used by different threads. The synchronizedXxx wrappers can result in a performance bottleneck if the data structure is heavily used. If this is the case, you should consider using one of the ConcurrentYyyy classes instead.
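To make the second and third points concrete, here is a small sketch; WrapperLimits, sum and run are hypothetical names. It shows iteration over a synchronizedList guarded by a manual lock on the wrapper (as the Collections.synchronizedList documentation requires), next to a ConcurrentHashMap whose merge performs an atomic read-modify-write without any external lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class WrapperLimits {
    // Single calls on a synchronizedList are atomic, but iteration is not:
    // the caller must hold the wrapper's own monitor for the whole loop.
    static int sum(List<Integer> syncList) {
        int total = 0;
        synchronized (syncList) {
            for (int n : syncList) total += n;
        }
        return total;
    }

    // Threads add to a shared list while iterating it, and count their adds in a concurrent map.
    // Returns { sum of the list, number of adds recorded in the map }.
    static int[] run(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(1);                           // atomic on its own
                    sum(list);                             // safe only because sum() locks the list
                    counts.merge("adds", 1, Integer::sum); // atomic increment, no external lock
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { sum(list), counts.getOrDefault("adds", 0) };
    }
}
```

Without the synchronized block in sum(), a concurrent add() during the loop could throw ConcurrentModificationException.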

Inter-threads communication

The simplest setup is a single class whose main method starts other classes implementing Runnable:
public static void main(String [] args){
// declarations ...
receiver.start();
player.start();
}
Say inside receiver I have a while loop which receives a packet value and I want to send that value to the second thread. How do I do that?
Just to clarify, I don't yet care about one thread controlling another; I just want the first thread to share values with the second.
And a tiny question aside - does JDK 7 Fork/Join really dramatically increase performance for the Java concurrent API?
Thank You For Your Time,
A simple option is to use a java.util.concurrent.atomic.AtomicReference (or one of the other Atomic... classes). Create a single instance of AtomicReference, and pass it to the code that the various threads run, and set the value from the receiver thread. The other thread(s) can then read the value at their leisure, in a thread-safe manner.
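A sketch of the AtomicReference approach, with LatestPacket and lastSeenByPlayer as made-up names and strings standing in for packets. It assumes at least one packet is sent, and it polls in a loop purely to keep the example short.

```java
import java.util.concurrent.atomic.AtomicReference;

class LatestPacket {
    // The receiver publishes `packets` values (packets >= 1); the player polls until it sees the last one.
    static String lastSeenByPlayer(int packets) {
        AtomicReference<String> latest = new AtomicReference<>(""); // shared between both threads
        AtomicReference<String> seen = new AtomicReference<>("");   // carries the player's result back
        String last = "packet-" + packets;

        Thread receiver = new Thread(() -> {
            for (int i = 1; i <= packets; i++) latest.set("packet-" + i);
        });
        Thread player = new Thread(() -> {
            String value = latest.get();
            while (!value.equals(last)) {   // poll until the final packet shows up
                Thread.yield();
                value = latest.get();
            }
            seen.set(value);
        });

        receiver.start();
        player.start();
        try {
            receiver.join();
            player.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }
}
```

An AtomicReference only ever holds the most recent value, so the player may skip intermediate packets. If every packet has to be processed, a queue is the better fit.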
does JDK 7 Fork/Join really dramatically increase performance for the Java concurrent API?
No, it's just a new API to make some things easier. It's not there to make things faster.
The java.util.concurrent package contains many helpful interfaces and classes for safely communicating between threads. I'm not sure I understand your use-case here, but if your player (producer) is supposed to pass tasks to the receiver (consumer), you could for example use a BlockingQueue implementation.
