Using ThreadLocal vs. a HashMap with Thread as a key


I would like to reuse instances of non-thread-safe classes in a Servlet for performance reasons. I have two options:
use ThreadLocal, where Java takes care of managing the instance per thread
use a static HashMap which uses the Thread as the key, and manage the instances at this level
With the ThreadLocal approach there is a potential for memory leaks, especially in a Servlet environment. Because of this, I am thinking of using the second option. Does anyone have experience with this approach, and are there any pitfalls?

Prefer the ThreadLocal approach because it is likely synchronized (or better yet, requires no synchronization) at the correct granularity and no larger.
If you roll your own solution using HashMap, you'll have to acquire a lock on the HashMap every time you want to access any thread-local data. Why? Because new threads can be created and threads can die, and each of these events implicitly adds or removes an entry in the HashMap, which requires synchronizing on the whole HashMap. You'll also have quite a time keeping object lifetimes straight, because the HashMap keeps every entry it contains alive (including the Thread objects used as keys, even after those threads have died) for as long as the map itself is reachable. That is not how ThreadLocal storage behaves.
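For comparison, here is a minimal sketch of the ThreadLocal approach for the use case in the question. It assumes Java 8+ (for `ThreadLocal.withInitial`) and uses `SimpleDateFormat` as a stand-in for the costly, non-thread-safe class; `FormatterHolder` is a hypothetical name. Each thread lazily creates its own instance on first use, and no lock is taken on access:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical holder class; SimpleDateFormat stands in for any object that is
// expensive to create and not thread-safe.
final class FormatterHolder {
    // One formatter per thread, created the first time that thread calls get().
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                return f;
            });

    static String format(Date date) {
        // No synchronization: the instance is never shared between threads.
        return FORMAT.get().format(date);
    }

    // Exposes the calling thread's instance (useful for checking reuse).
    static SimpleDateFormat current() {
        return FORMAT.get();
    }
}
```

Compared with a `HashMap<Thread, SimpleDateFormat>`, there is no shared structure to lock, and each value is stored with its thread rather than in a static map that outlives it.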

The problem is not ThreadLocal itself, but the way it's being used. See here for a detailed explanation. So, your own implementation won't make a difference.
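In a servlet container the leak usually comes from pooled request threads that outlive the request (or the whole web application), so the value stays reachable from the thread. A common mitigation, sketched below with hypothetical `RequestBuffer` and `RequestHandler` classes and assuming Java 8+, is to call `remove()` in a `finally` block once the work is done, so nothing carries over to the next task the pooled thread runs:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical per-request scratch object that is expensive to allocate.
final class RequestBuffer {
    final StringBuilder sb = new StringBuilder();
}

final class RequestHandler {
    static final ThreadLocal<RequestBuffer> BUFFER =
            ThreadLocal.withInitial(RequestBuffer::new);

    // Simulates one request handled on a (possibly pooled) thread.
    static String handle(String input) {
        try {
            RequestBuffer buf = BUFFER.get();
            buf.sb.append(input.toUpperCase());
            return buf.sb.toString();
        } finally {
            // Drop this thread's entry so the value is not kept alive or
            // leaked into the next task that runs on the same thread.
            BUFFER.remove();
        }
    }

    // Runs two "requests" on a single pooled thread and returns both results.
    static String[] runTwoOnSameThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> a = pool.submit(() -> handle("a"));
            Future<String> b = pool.submit(() -> handle("b"));
            return new String[] { a.get(), b.get() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the `remove()` call, the second request on the same pooled thread would see the first request's leftover state ("AB" instead of "B").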

Related

MultiThreading : Map Performance

In a multithreaded program I want to use a map that will be updated. Which map will perform better: 1. HashMap or 2. ConcurrentHashMap? Also, will it be slower if I make it volatile?
It is going to be used in a Java batch job for approx. 20 million records.
Currently I am not sharing this map among threads.
Will sharing the map among threads reduce performance?

HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, it only guarantees that changes to that particular field (here, the reference to the map) become visible to other threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap but only work on the instance, the performance overhead will be negligible.
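As an illustration of working on the instance rather than the reference, here is a minimal sketch (the `SharedCounter` class and the counting task are hypothetical) in which several threads update a shared `ConcurrentHashMap` through `merge`, which performs each per-key read-modify-write atomically. With a plain `HashMap` the same code would need external locking. Because the field is `final` and never reassigned, `volatile` is not needed at all:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedCounter {
    // final reference: safely published and never reassigned, so no volatile.
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    void record(String key) {
        // merge() is atomic per key on ConcurrentHashMap.
        counts.merge(key, 1L, Long::sum);
    }

    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    // Each of `threads` workers records every key in `keys` `repeats` times.
    static SharedCounter run(List<String> keys, int threads, int repeats) {
        SharedCounter counter = new SharedCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < repeats; i++) {
                    for (String k : keys) {
                        counter.record(k);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter;
    }
}
```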

Should I use ThreadLocal in this high traffic multi-threaded scenario?

Scenario
We are developing an API that will handle around 2-3 million hits per hour in a multi-threaded environment. The server is Apache Tomcat 7.0.64.
We have a custom object with a lot of data; let's call it XYZDataContext. When a new request comes in, we associate an XYZDataContext object with the request context, one XYZDataContext object per request. We will spawn various threads in parallel to serve that request, collecting/processing data from/into the XYZDataContext object. The threads that process things in parallel need access to this XYZDataContext object, and to avoid passing this object around everywhere in the application (to various objects/methods/threads), we are thinking of making it a ThreadLocal. Threads will read data from the XYZDataContext object and will also update data in it.
When a thread finishes, we plan to merge the data from the updated XYZDataContext object in the spawned child thread into the main thread's XYZDataContext object.
My questions:
Is this a good approach?
Thread pool risks: Tomcat maintains a thread pool, and I read that using ThreadLocal with thread pools is a disaster. Because a thread is not GCed per se but is reused, the references to the ThreadLocal objects will not get GCed either, so huge objects we no longer need stay in memory, eventually resulting in OutOfMemoryError...
UNLESS they are referenced as weak references so that they get GCed immediately.
We're using Java 1.7 (OpenJDK). I looked at the source code of ThreadLocal, and although ThreadLocalMap.Entry is a WeakReference, it's not associated with a ReferenceQueue, and the comment for the Entry constructor says "since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space."
I guess this works great for caches but is not the best fit in our case. I would like the ThreadLocal XYZDataContext object to be GCed immediately. Will the ThreadLocal.remove() method be effective here?
Is there any way to force this space to be freed in the next GC run?
Is this the right scenario for ThreadLocal objects? Or are we abusing the ThreadLocal concept and using it where it shouldn't be used?

My gut feeling tells me you're on the wrong path. Since you already have a central context object (one for all threads) and you want to access it from multiple threads at the same time, I would go with a Singleton that hosts the context object and provides thread-safe methods to access it.
Instead of manipulating multiple properties of your context object separately, I would strongly suggest making all changes at once. Ideally, you pass a single object containing all the properties you want to change in your context object,
e.g.
Singleton.getInstance().adjustContext(ContextAdjuster contextAdjuster)
You might also want to consider using a thread-safe queue, filling it with ContextAdjuster objects from your threads and finally processing it in the Context's thread.
Google for things like concurrent, blocking and non-blocking queues in Java. I am sure you'll find tons of example code.
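A minimal sketch of the queue variant, assuming a per-request context rather than a Singleton. `ContextAdjuster` is the name used above, but its `applyTo` method, the `RequestContext` class and the `"sum"` key are hypothetical. Worker threads only enqueue adjusters into a `ConcurrentLinkedQueue`; the owning thread applies them after the workers finish, so the context data itself never needs a lock:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical: bundles all the changes one worker wants to make to the context.
interface ContextAdjuster {
    void applyTo(Map<String, Integer> context);
}

final class RequestContext {
    // Touched only by the owning thread, so a plain HashMap is enough.
    private final Map<String, Integer> data = new HashMap<>();
    // Workers only enqueue; the queue itself is thread-safe.
    private final ConcurrentLinkedQueue<ContextAdjuster> pending = new ConcurrentLinkedQueue<>();

    void submit(ContextAdjuster adjuster) {
        pending.add(adjuster);
    }

    // Called by the owning thread once the workers are done.
    Map<String, Integer> applyPending() {
        ContextAdjuster a;
        while ((a = pending.poll()) != null) {
            a.applyTo(data);
        }
        return data;
    }

    // Demo: worker i submits one adjuster that adds i to "sum".
    static int demo(int workers) {
        RequestContext ctx = new RequestContext();
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= workers; i++) {
            final int value = i;
            Thread t = new Thread(() ->
                    ctx.submit(c -> c.merge("sum", value, Integer::sum)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ctx.applyPending().getOrDefault("sum", 0);
    }
}
```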

Efficient multithreaded array building in Java

I have many threads adding result-like objects to an array, and would like to improve the performance of this area by removing synchronization.
To do this, I would like each thread to post its results to a ThreadLocal array instead; then, once processing is complete, I can combine the arrays for the following phase. Unfortunately, for this purpose ThreadLocal has a glaring issue: I cannot combine the collections at the end, as no thread has access to the collection of another.
I can work around this by also adding each ThreadLocal array to a list next to the ThreadLocal as the arrays are created, so I have all the lists available later on (this requires synchronization, but only once per thread). However, to avoid a memory leak I would then have to somehow get all the threads to come back at the end and clean up their ThreadLocal cache... I would much rather keep adding a result simple and transparent, without any follow-up work beyond adding the result.
Is there a programming pattern or existing ThreadLocal-like object which can solve this issue?

You're right, ThreadLocal objects are designed to be only accessible to the current thread. If you want to communicate across threads you cannot use ThreadLocal and should use a thread-safe data structure instead, such as ConcurrentHashMap or ConcurrentLinkedQueue.
For the use case you're describing it would be easy enough to share a ConcurrentLinkedQueue between your threads and have them all write to the queue as needed. Once they're all done (Thread.join() will wait for them to finish) you can read the queue into whatever other data structure you need.
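A minimal sketch of that suggestion; the `ResultCollector` class and the integer "results" are hypothetical placeholders for the real result objects. Each worker appends to one shared `ConcurrentLinkedQueue` without explicit locking, and after `join()` the main thread copies everything into a list for the next phase:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

final class ResultCollector {
    // Starts `threads` workers that each add `perThread` results, then combines them.
    static List<Integer> collect(int threads, int perThread) {
        ConcurrentLinkedQueue<Integer> results = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    results.add(id * perThread + i); // non-blocking append
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // also makes the worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(results); // combined input for the next phase
    }
}
```

No per-thread cleanup is needed: once the queue is no longer referenced, it is garbage-collected like any other object.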

Thread safe implementation for Hash Map

First, I'll describe what I want and then I'll elaborate on the possibilities I am considering. I don't know which is the best so I want some help.
I have a hash map on which I do read and write operations from a Servlet. Now, since this Servlet is on Tomcat, I need the hash map to be thread safe. Basically, when it is being written to, nothing else should write to it and nothing should be able to read it as well.
I have seen ConcurrentHashMap but noticed its get method is not thread-safe. Then, I have seen locks and something called synchronized.
I want to know which is the most reliable way to do it.

ConcurrentHashMap.get() is thread safe.
You can make HashMap thread safe by wrapping it with Collections.synchronizedMap().

In any case, the synchronized keyword is a safe bet. While one thread is inside a synchronized block, no other thread can enter a block synchronized on the same object, so this protects the map only if every read and write synchronizes on that same object.
// Another thread can modify map at this point, so this access is not thread safe
map.get(0);
as opposed to
// No thread that synchronizes on map can modify it until this block completes
synchronized(map) {
    map.get(0);
}
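Even with a thread-safe map, a check-then-act sequence such as "get, then put if absent" is two separate calls, so another thread can slip in between them. A minimal sketch contrasting the two usual fixes, assuming Java 8+ (the `CheckThenAct` class and the generated values are hypothetical): hold the wrapper's own lock across both calls for a `Collections.synchronizedMap`, or use the single atomic `computeIfAbsent` of a `ConcurrentHashMap`:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CheckThenAct {
    static final Map<Integer, String> SYNC_MAP = Collections.synchronizedMap(new HashMap<>());
    static final ConcurrentHashMap<Integer, String> CHM = new ConcurrentHashMap<>();

    // Synchronized wrapper: lock the map itself so get + put act as one unit.
    static String getOrCreateSync(int key) {
        synchronized (SYNC_MAP) {
            String v = SYNC_MAP.get(key);
            if (v == null) {
                v = "value-" + key;
                SYNC_MAP.put(key, v);
            }
            return v;
        }
    }

    // ConcurrentHashMap: computeIfAbsent is atomic per key, no external lock.
    static String getOrCreateConcurrent(int key) {
        return CHM.computeIfAbsent(key, k -> "value-" + k);
    }
}
```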

I would suggest going with ConcurrentHashMap for the requirement you mentioned above. Earlier I had the same kind of requirement for our application, but we were a little more focused on performance.
I ran both ConcurrentHashMap and the map returned by Collections.synchronizedMap() under various types of load, launching multiple threads at a time using JMeter, and monitored them using JProfiler. After all these tests we concluded that the map returned by Collections.synchronizedMap() was not as efficient in terms of performance as ConcurrentHashMap.
I have also written a post about my experience with both.
Thanks

Collections.synchronizedMap(new HashMap<K, V>());
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
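A minimal sketch of what that manual synchronization looks like, assuming a map created with `Collections.synchronizedMap` (the `SyncMapIteration` class and `sumValues` method are hypothetical). Obtaining the key set needs no lock, but the whole iteration must hold the lock of the map itself, not of the view:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class SyncMapIteration {
    // syncMap is expected to come from Collections.synchronizedMap(...).
    static int sumValues(Map<String, Integer> syncMap) {
        Set<String> keys = syncMap.keySet(); // obtaining the view needs no lock
        int sum = 0;
        // Lock the map (not the key set) for the entire iteration; otherwise a
        // concurrent put/remove may cause a ConcurrentModificationException.
        synchronized (syncMap) {
            Iterator<String> it = keys.iterator();
            while (it.hasNext()) {
                sum += syncMap.get(it.next()); // re-entrant: same lock already held
            }
        }
        return sum;
    }
}
```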

This is the point of the ConcurrentHashMap class: it protects your collection when more than one thread accesses it.

Java avoid race condition WITHOUT synchronized/lock

In order to avoid race conditions, we can synchronize the methods that write and read the shared variables, locking these variables against other threads.
My question is whether there are other (better) ways to avoid race conditions. Locks make the program slow.
What I found:
using Atomic classes, if there is only one shared variable.
using an immutable container for multiple shared variables and declaring this container object volatile. (I found this method in the book "Java Concurrency in Practice")
I'm not sure whether they perform faster than synchronization. Are there any other, better methods?
thanks

Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except when inserting events into the queue). If you use a ready-made queue instead of coding it yourself, it should be efficient enough.
Also, take a look at the Actors model.
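A minimal sketch of the event-driven/actor idea using only the JDK: one consumer thread owns the state and receives messages through a `LinkedBlockingQueue`, so the only synchronization is inside the queue. `Deposit` and `AccountActor` are hypothetical names; a real system would more likely use a messaging system or an actor framework such as Akka:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message type: an amount to add, or the STOP sentinel.
final class Deposit {
    static final Deposit STOP = new Deposit(0);
    final long amount;

    Deposit(long amount) {
        this.amount = amount;
    }
}

final class AccountActor {
    private final BlockingQueue<Deposit> inbox = new LinkedBlockingQueue<>();
    private final Thread consumer = new Thread(this::loop);
    // Only the consumer thread reads or writes this field while it runs.
    private long balance;

    private AccountActor() {
    }

    static AccountActor start() {
        AccountActor actor = new AccountActor();
        actor.consumer.start();
        return actor;
    }

    // Safe to call from any thread: the queue is the only shared structure.
    void send(Deposit d) {
        inbox.add(d);
    }

    private void loop() {
        try {
            while (true) {
                Deposit d = inbox.take(); // blocks until a message arrives
                if (d == Deposit.STOP) {
                    return;
                }
                balance += d.amount;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops the consumer; join() makes its final write to balance visible here.
    long stopAndGet() {
        send(Deposit.STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return balance;
    }

    // Several producer threads send deposits of 1 concurrently.
    static long demo(int producers, int perProducer) {
        AccountActor actor = start();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    actor.send(new Deposit(1));
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return actor.stopAndGet();
    }
}
```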

Atomics are indeed usually more efficient than classic locks due to their non-blocking behavior, i.e. a thread competing for the memory location is not suspended and context-switched, which saves a lot of time.
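As an illustration of the non-blocking style, a minimal sketch (the `AtomicMax` class is hypothetical) of a lock-free "keep the maximum" update built on `AtomicInteger.compareAndSet`: a thread that loses the race re-reads and retries instead of blocking:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class: tracks the largest value offered by any thread, without locks.
final class AtomicMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    void offer(int candidate) {
        int current = max.get();
        // Retry until our CAS succeeds or the stored value is already >= candidate.
        while (candidate > current) {
            if (max.compareAndSet(current, candidate)) {
                return;
            }
            current = max.get(); // another thread changed it first; re-read
        }
    }

    int get() {
        return max.get();
    }

    // `threads` threads each offer `perThread` distinct values concurrently.
    static int demo(int threads, int perThread) {
        AtomicMax m = new AtomicMax();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    m.offer(base + i);
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return m.get();
    }
}
```

On Java 8+ the same update can also be written as `max.accumulateAndGet(candidate, Math::max)`, which runs an equivalent retry loop internally.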
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
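The read-write lock suggestion can be sketched with `java.util.concurrent.locks.ReentrantReadWriteLock`; the `ReadMostlyRegistry` class is hypothetical. Concurrent `lookup` calls share the read lock, while `register` takes the exclusive write lock only for the single `put`, keeping the critical section small:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadMostlyRegistry {
    private final Map<String, String> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    String lookup(String key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers get exclusive access, but only for the duration of the put.
    void register(String key, String value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This pays off mainly when reads vastly outnumber writes; for a plain key-value map, `ConcurrentHashMap` is often the simpler choice.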

Well, first off, Atomic classes are not free of synchronization either: instead of locks they combine volatile fields with hardware compare-and-swap (CAS) instructions, so they still pay for the memory-visibility guarantees you would need if you did it by hand.
Second, immutability works great for multi-threading: you no longer need monitor locks and such, but that's because you can only read your immutable objects, you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). If you want better performance, your best bet is to avoid at least some of the built-in thread-safe classes, which do a more generic kind of locking, and write your own implementation that is more tied to your context and thus might allow more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache by the Ehcache developers:
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html

One of the alternatives is to make shared objects immutable. Check out this post for more details.
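A minimal sketch of the immutable-container idea from the question (the `Range` and `RangeHolder` names are hypothetical). When a writer simply replaces the object, a `volatile` field holding an immutable object is enough, as the question describes; when the new value depends on the old one, as here, an `AtomicReference` with `compareAndSet` keeps two writers from overwriting each other's update:

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable pair of related values. The invariant lower <= upper is checked once,
// in the constructor, and cannot be broken later because nothing can mutate it.
final class Range {
    final int lower;
    final int upper;

    Range(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException(lower + " > " + upper);
        }
        this.lower = lower;
        this.upper = upper;
    }
}

// Hypothetical holder: readers always see a complete Range, never a half-updated pair.
final class RangeHolder {
    private final AtomicReference<Range> current = new AtomicReference<>(new Range(0, 0));

    Range get() {
        return current.get();
    }

    void setLower(int lower) {
        while (true) {
            Range old = current.get();
            Range next = new Range(lower, old.upper); // throws if lower > old.upper
            // Publish the new object only if nobody replaced `old` in the meantime.
            if (current.compareAndSet(old, next)) {
                return;
            }
        }
    }

    void setUpper(int upper) {
        while (true) {
            Range old = current.get();
            Range next = new Range(old.lower, upper); // throws if upper < old.lower
            if (current.compareAndSet(old, next)) {
                return;
            }
        }
    }
}
```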

You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient, I suggest using more coarse-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, contention is already unlikely, so adding more locks may just add overhead.

