How to resolve memory leak in multi threading environment? - java

I have an interesting case of a memory leak in a multi-threaded environment.
I have following logic:
public void update(String key, CustomObj newResource)
{
    // fetch the old resource for the given key from the ConcurrentHashMap
    // update the map with newResource for the given key
    // close the old resource to prevent a memory leak
}
public CustomObj read(String key)
{
    // return the resource for the given key
}
Now if I have two threads:
Thread#1: calling update method to update the resource for key K
Thread#2: calling read method to read the resource for the same key K.
Note: CustomObj belongs to a third-party library, so I can't add a finalize method to it for closing.
Even using synchronization on the read and update methods won't help, because the update thread can close the resource while the read thread is still using it.
Could you please tell me how to maintain thread safety without a memory leak in this scenario?

You should never use finalize() for reasons too broad to discuss here.
If several threads can work with one object at the same time, then you can use "reference counting" to track when the resource should be closed.
Every thread/function/etc that currently works with the object increments its "users count" with one when it acquires access to the object. When it stops working with it, then it decrements its "users count" by one. The thread that decremented the count to zero closes the object. You can take advantage of the various "atomic" primitives provided by the java standard library in order to create a lock-free solution.
As this is an object from a third-party library, you'll need to create some kind of wrapper to track the references.
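One way to sketch such a wrapper, assuming the third-party type exposes a close() method (CustomObj below is a stand-in for it, and RefCounted/ResourceMap are made-up names):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the third-party resource; only close() matters here.
class CustomObj implements AutoCloseable {
    volatile boolean closed = false;
    public void close() { closed = true; }
}

class RefCounted {
    private final AtomicInteger count = new AtomicInteger(1); // the map's own reference
    final CustomObj resource;

    RefCounted(CustomObj r) { resource = r; }

    // Returns false if the resource was already released; caller must re-read the map.
    boolean tryAcquire() {
        for (;;) {
            int c = count.get();
            if (c == 0) return false;                  // already closed, cannot resurrect
            if (count.compareAndSet(c, c + 1)) return true;
        }
    }

    void release() {
        if (count.decrementAndGet() == 0) resource.close(); // last user closes
    }
}

class ResourceMap {
    private final ConcurrentHashMap<String, RefCounted> map =
        new ConcurrentHashMap<String, RefCounted>();

    public void update(String key, CustomObj newResource) {
        RefCounted old = map.put(key, new RefCounted(newResource));
        if (old != null) old.release();                // drop the map's reference to the old one
    }

    // Caller must call release() on the returned wrapper when done reading.
    public RefCounted read(String key) {
        for (;;) {
            RefCounted r = map.get(key);
            if (r == null) return null;
            if (r.tryAcquire()) return r;              // lost the race with a close: retry
        }
    }
}
```

The key point is that read() retries when it loses the race with a concurrent close: a wrapper whose count already reached zero can never be acquired again, so the resource is closed exactly once, by whichever thread drops the last reference.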
PS: Usually it's not a good idea to use objects with shared state across threads - it begs for trouble: synchronization issues, data races, performance lost to synchronization, etc.

Related

Taking lock to serialize

I am new to Java.
I am practicing by writing small programs.
In one of the programs I have an object that holds some configuration.
This configuration can be changed at runtime.
I am saving the configuration to file by serializing the object.
It seems to me that I must take a lock on the object I am serializing, to make sure it doesn't change during serialization.
synchronized (myObject)
{
    output.writeObject(myObject);
}
However I've read that one should try to avoid IO operation (such as writing to file) in synchronized block or under any other form of lock. It does make sense, since IO operation might take relatively long time, keeping other threads waiting/blocked.
I wonder, whether there is a way to avoid Serialization under lock...
Any suggestions will be welcome.
A couple of solutions to this problem are:
Only serialize immutable objects
Create a copy of the object and serialize the copy
But what happens in the interval between setting the object and starting to flush it? You could still end up serializing a different object than the one you intended.
A possible solution could be to lock the object for writing only after having modified it, and unlock it only once it has been flushed. Locking and unlocking could be done by acquiring and releasing a binary semaphore.
So, acquire() a permit before writing to the object variable and release() one after having serialized. This blocks only active modifier threads and allows further concurrent execution, avoiding holding the lock during the I/O in your example.
One problem is that there could be a second context switch away from the writer thread, after it has written the object and just before it releases the lock. But if you are OK with letting the modifier thread(s) wait a bit longer, this is no worry.
Hope this helps!
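The copy-then-serialize idea from the second bullet above can be sketched as follows; Config, its fields, and ConfigSaver are illustrative names, and the copy constructor is an assumption about your object:

```java
import java.io.*;

// Illustrative config type; the fields are made up.
class Config implements Serializable {
    String host = "localhost";
    int port = 8080;

    Config() {}

    Config(Config other) {               // copy constructor, used under the lock
        this.host = other.host;
        this.port = other.port;
    }
}

class ConfigSaver {
    private final Object lock = new Object();
    private final Config config = new Config();

    void update(String host, int port) {
        synchronized (lock) {
            config.host = host;
            config.port = port;
        }
    }

    // Only the (fast) copy happens under the lock; the slow I/O happens outside it.
    void save(OutputStream out) throws IOException {
        Config snapshot;
        synchronized (lock) {
            snapshot = new Config(config);
        }
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(snapshot);   // no lock held while writing
        }
    }
}
```

The snapshot is consistent because the copy constructor runs entirely inside the same lock that update() takes, so readers of the file never see a half-modified configuration.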
You need to execute the serialization inside the lock, since your use case requires that nobody can modify the object while it is being written.
1. A basic solution is to reduce the execution time: use the transient keyword to cut down the amount of data serialized. Additionally, custom readObject() and writeObject() methods may be beneficial in some cases.
2. If possible, modify your logic so that you can split the lock, so that multiple threads can read but not modify.
3. You can use the pattern used in some collections where iteration may take a long time: clone the original object before iterating.

Should I use ThreadLocal in this high traffic multi-threaded scenario?

Scenario
We are developing an API that will handle around 2-3 million hits per hour in a multi-threaded environment. The server is Apache Tomcat 7.0.64.
We have a custom object with a lot of data; let's call it XYZDataContext. When a new request comes in we associate an XYZDataContext object with the request context, one XYZDataContext object per request. We will be spawning various threads in parallel to serve that request, collecting/processing data from/into the XYZDataContext object. The threads that process things in parallel need access to this XYZDataContext object, and
to avoid passing this object around everywhere in the application, to various objects/methods/threads,
we are thinking of making it a ThreadLocal. Threads will use data from the XYZDataContext object and will also update data in it.
When the thread finishes we are planning to merge the data from the updated XYZDataContext object in the spawned child thread into the main thread's XYZDataContext object.
My questions:
Is this a good approach?
Threadpool risks - the Tomcat server maintains a thread pool, and I read that using ThreadLocal with thread pools is a disaster, because a thread is not GCed per se but is reused, so the references to the ThreadLocal objects will not get GCed. That leads to keeping huge objects in memory that we don't need anymore, eventually resulting in OutOfMemory issues...
UNLESS they are held as weak references so that they get GCed immediately.
We're using Java 1.7 OpenJDK. I saw the source code for ThreadLocal, and although ThreadLocalMap.Entry is a WeakReference, it's not associated with a ReferenceQueue, and the comment on the Entry constructor says "since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space."
I guess this works great for caches but is not the best thing in our case. I would like the ThreadLocal XYZDataContext object to be GCed immediately. Will the ThreadLocal.remove() method be effective here?
Is there any way to enforce emptying the space in the next GC run?
Is this the right scenario to use ThreadLocal objects? Or are we abusing the ThreadLocal concept and using it where it shouldn't be used?
My gut feeling tells me you're on the wrong path. Since you already have a central context object (one for all threads) and you want to access it from multiple threads at the same time I would go with a Singleton hosting the context object and providing threadsafe methods to access it.
Instead of manipulating multiple properties of your context object separately, I would strongly suggest doing all manipulations at the same time. Best would be to pass a single object containing all the properties you want to change in your context object,
e.g.
Singleton.getInstance().adjustContext(contextAdjuster)
You might also want to consider using a threadsafe queue, filling it up with ContextAdjuster objects from your threads and finally processing it in the Context's thread.
Google for things like Concurrent, Blocking and Nonblocking Queue in Java. I am sure you'll find tons of example code.
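A minimal sketch of the queue idea above, with made-up names (XYZDataContext from the question, ContextAdjuster from the suggestion); worker threads enqueue adjustments, and a single owning thread applies them, so the context itself never needs locking:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical context type standing in for the question's XYZDataContext.
class XYZDataContext {
    final ConcurrentHashMap<String, Object> data = new ConcurrentHashMap<String, Object>();
}

interface ContextAdjuster {
    void adjust(XYZDataContext context);
}

class ContextOwner {
    private final XYZDataContext context = new XYZDataContext();
    // Worker threads enqueue adjustments instead of mutating the context directly.
    private final BlockingQueue<ContextAdjuster> adjusters =
        new LinkedBlockingQueue<ContextAdjuster>();

    void submit(ContextAdjuster a) {
        adjusters.add(a);
    }

    // Runs on the single owning thread: applies queued adjustments in order.
    void drainAdjustments() {
        ContextAdjuster a;
        while ((a = adjusters.poll()) != null) {
            a.adjust(context);
        }
    }

    XYZDataContext context() {
        return context;
    }
}
```

With this shape the only shared mutable structure is the queue, which is already thread-safe, and there is no per-thread state left behind in a pooled thread.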

Non blocking strategy for executing a pair of operations atomically in Java

Let's say I have a Set and another Queue. I want to check whether the set contains(element) and, if not, add(element) to the queue. I want to do the two steps atomically.
One obvious way is to use synchronized blocks or Lock.lock()/unlock() methods. Under thread contention, these will cause context switches. Is there any simple design strategy for achieving this in a non-blocking manner, maybe using some atomic constructs?
I don't think you can rely on any mechanism, except the ones you pointed out yourself, simply because you're operating on two structures.
There's decent support for concurrent/atomic operations on one data structure (like "put if not exists" in a ConcurrentHashMap), but for a sequence of operations, you're stuck with either a lock or a synchronized block.
For some operations you can employ what is called a "safe sequence", where concurrent operations may overlap without conflicting. For instance, you might be able to add a member to a set (in theory) without the need to synchronize, since two threads simultaneously adding the same member do not conceptually conflict with each other.
But to query one object and then conditionally operate on a different object is a much more complicated scenario. If your sequence was to query the set, then conditionally insert the member into the set and into the queue, the query and first insert could be replaced with a "compare and swap" operation that syncs without stalling (except perhaps at the memory access level), and then one could insert the member into the queue based on the success of the first operation, only needing to synchronize the queue insert itself. However, this sequence leaves the scenario where another thread could fail the insert and still not find the member in the queue.
Since the contention case is the relevant one, you should look at "spin locks". They do not give up the CPU but spin on a flag, expecting it to become free very soon.
Note however that real spin locks are seldom useful in Java because the normal Lock is quite good. See this blog where someone had first implemented a spinlock in Java only to find that after some corrections (i.e. after making the test correct) spin locks are on par with the standard stuff.
You can use java.util.concurrent.ConcurrentHashMap to get the semantics you want. It has a putIfAbsent that does an atomic insert. You essentially try to add an element to the map, and if it succeeds, you know the thread that performed the insert is the only one that has, and you can then put the item in the queue safely. The other significant point here is that operations on a ConcurrentMap ensure "happens-before" semantics.
ConcurrentMap<Element, Boolean> set = new ConcurrentHashMap<Element, Boolean>();
Queue<Element> queue = ...;

void maybeAddToQueue(Element e) {
    if (set.putIfAbsent(e, true) == null) {
        queue.offer(e);
    }
}
Note, the actual value type (Boolean) of the map is unimportant here.
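A self-contained variant of that snippet; DedupQueue is a made-up name, and using a ConcurrentLinkedQueue keeps the enqueue itself lock-free as well:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

class DedupQueue<E> {
    // The map plays the role of the "set"; the Boolean value is irrelevant.
    private final ConcurrentMap<E, Boolean> seen = new ConcurrentHashMap<E, Boolean>();
    private final Queue<E> queue = new ConcurrentLinkedQueue<E>();

    // Atomically: enqueue e only if it has never been seen before.
    boolean maybeAddToQueue(E e) {
        if (seen.putIfAbsent(e, Boolean.TRUE) == null) {
            return queue.offer(e);   // only the thread that won the putIfAbsent enqueues
        }
        return false;
    }

    E poll() {
        return queue.poll();
    }
}
```

Note the caveat from the answer above still applies: between the winning putIfAbsent and the offer, another thread can observe the element in the set but not yet in the queue.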

How can I run a background thread that cleans up some elements in list regularly?

I am currently implementing a cache. I have completed a basic implementation, like below. What I want to do is run a thread that removes entries that satisfy certain conditions.
class Cache {
    int timeLimit = 10;  // how long each entry needs to be kept after being accessed (marked)
    int maxEntries = 10; // maximum number of entries
    HashSet<String> set = new HashSet<String>();

    public void add(Entry t) {
        ....
    }

    public Entry access(String key) {
        // mark the Entry as having been used
        // since it has been marked, the background thread should remove this entry after timeLimit seconds
        return set.get(key);
    }
    ....
}
My question is: how should I implement the background thread so that it goes over the entries in the set and removes the ones that have been marked && (now - last access time) > timeLimit?
edit
The above is just a simplified version of the code; I did not include the synchronized statements.
Why are you reinventing the wheel? EhCache (and any decent cache implementation) will do this for you. Also, the much more lightweight MapMaker cache from Guava can automatically remove old entries.
If you really want to implement this yourself, it is not really that simple.
Remember synchronization: you should use a ConcurrentHashMap or the synchronized keyword to store entries. This might be really tricky.
You must somehow store the last access time of each entry. Every time you access an entry, you must update that timestamp.
Think about eviction policy. If there are more than maxEntries in your cache, which ones to remove first?
Do you really need a background thread?
This is surprising, but EhCache (enterprise-ready and proven) does not use a background thread to invalidate old entries. Instead it waits until the map is full and removes entries lazily. This looks like a good trade-off, as threads are expensive.
If you have a background thread, should there be one per cache or one global? Do you start a new thread while creating a new cache or have a global list of all caches? This is harder than you think...
Once you answer all these questions, the implementation is fairly simple: go through all the entries every second or so and if the condition you've already written is met, remove the entry.
I'd use Guava's Cache type for this, personally. It's already thread-safe and has methods built in for eviction from the cache based on some time limit. If you want a thread to periodically sweep it, you can just do something like this:
new Thread(new Runnable() {
    public void run() {
        while (true) {
            cache.cleanUp();
            try {
                Thread.sleep(MY_SLEEP_DURATION);
            } catch (InterruptedException e) {
                return;  // allow the sweeper to be stopped via interrupt
            }
        }
    }
}).start();
I don't imagine you really need a background thread. Instead you can just remove expired entries before or after you perform a lookup. This simplifies the entire implementation, and it's very hard to tell the difference.
BTW: If you use a LinkedHashMap, you can use it as a LRU cache by overriding removeEldestEntry (see its javadocs for an example)
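The removeEldestEntry pattern mentioned above looks roughly like this; note that LinkedHashMap is not thread-safe, so for concurrent use you would wrap it with Collections.synchronizedMap:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A simple LRU cache: an access-ordered LinkedHashMap that evicts the
// least-recently-used entry once maxEntries is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);   // true = access order, required for LRU behaviour
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;   // called after every put/putAll
    }
}
```

For example, with maxEntries = 2, putting a and b, then reading a, then putting c evicts b, because b is the least recently accessed entry at that point.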
First of all, your presented code is incomplete because there is no get(key) on HashSet (so I assume you mean some kind of Map instead) and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick out the best solution without knowing what you are trying to cache and why.
When implementing a cache, it is usually assumed that the data-structure will be accessed concurrently by multiple threads. So the first thing you will need to do, is to make use of a backing data-structure that is thread-safe. HashMap is not thread-safe, but ConcurrentHashMap is. There are also a number of other concurrent Map implementations out there, namely in Guava, Javolution and high-scale lib. There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data-structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects upon attempting to retrieve them from the cache. Or letting the GC remove the entries by using SoftReferences.
Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps/iterates the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again) and then sleep for some amount of time after each sweep.
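Such a daemonized sweeper can be sketched with a ScheduledExecutorService instead of a hand-rolled loop; SweptCache and its methods are made-up names, and the "cache" is reduced to a map of expiry timestamps for brevity:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

class SweptCache {
    private final ConcurrentHashMap<String, Long> expiry =
        new ConcurrentHashMap<String, Long>();

    SweptCache(long periodMillis) {
        ScheduledExecutorService sweeper =
            Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
                public Thread newThread(Runnable r) {
                    Thread t = new Thread(r, "cache-sweeper");
                    t.setDaemon(true);   // daemon: won't keep the JVM alive
                    return t;
                }
            });
        sweeper.scheduleAtFixedRate(new Runnable() {
            public void run() { sweep(); }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void put(String key, long ttlMillis) {
        expiry.put(key, System.currentTimeMillis() + ttlMillis);
    }

    boolean contains(String key) {
        return expiry.containsKey(key);
    }

    // ConcurrentHashMap iterators are weakly consistent: it is safe to iterate
    // and remove here while other threads read and write the map.
    void sweep() {
        long now = System.currentTimeMillis();
        for (Iterator<Map.Entry<String, Long>> it = expiry.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue() < now) {
                it.remove();
            }
        }
    }
}
```

The daemon flag answers the "should there be one per cache" question pragmatically: one scheduler thread per cache is fine if caches are few, and forgotten caches cannot block JVM shutdown.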
However, you should consider whether it is worth it for you, to build your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study it before endeavouring to write your own cache implementation. I can recommend the book Java Concurrency in Practice.
The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.
EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its ConcurrentMap based API.
As others have mentioned, Google's Guava library has a Cache API, which is quite useful for smallish in-memory caches.
Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.
Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
Stormpot, on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?
First, make access to your collection synchronized, or use a concurrent Set backed by a ConcurrentHashMap (e.g. via Collections.newSetFromMap).
Second, write your new thread and implement it as an endless loop that periodically iterates over the collection and removes elements. Write this class so that it is initialized with the correct collection in the constructor, so that you do not have to worry about "how do I access the proper collection".

String and concurrency in Java

This may be a related question: Java assignment issues - Is this atomic?
I have the same class as the OP, acting on a mutable String reference, but sets rarely happen (basically this string is part of a server configuration that only reloads when forced to).
public class Test {
    private String s;

    public void setS(String str) {
        s = str;
    }

    public String getS() {
        return s;
    }
}
Multiple threads will be pounding this variable to read its value. What is the best method to make it "safe" while not incurring the performance degradation of declaring it volatile?
I am currently heading in the direction of a ReadWriteLock, but as far as I understand, a read/write lock does not by itself make it safe from per-thread caching unless some synchronisation happens? Which means I've gone full circle back to: I may as well just use the volatile keyword?
Is my understanding correct? Is there nothing that can manually "notify" other threads about an update to a variable in main memory, so that they can refresh their local cache just that once?
volatile on this seems overkill given that the server application is designed to run for months without restart. By that time, it would've served a few million reads. I'm thinking I might as well just make the String static final and not allow it to mutate without a complete application and JVM restart.
Reads and writes to references are atomic. The problems you can incur are attempting to perform a read and then a write (an update), or guaranteeing that after a write all threads see the change on their next read. However, only you can say what your requirements are.
When you use volatile, it requires a cache coherent copy be read or written. This doesn't require a copy be made to/from main memory as the caches communicate amongst themselves, even between sockets. There is a performance impact but it doesn't mean the caches are not used.
Even if the access did go all the way to main memory, you could still do millions of accesses per second.
Why a mutable String? Why not a Config class with a simple static String. When config is updated, you change this static reference, which is an atomic operation and won't be a problem for reading threads. You then have no synchronization, no locking penalties.
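A sketch of that Config idea (ServerConfig and its fields are made-up names). One caveat: making the single reference volatile is what guarantees readers actually see a reload; a volatile read of one reference is far cheaper than locking, and writes are rare here:

```java
// Immutable snapshot plus one volatile static reference: readers get a
// consistent view with a single volatile read and no locking.
class ServerConfig {
    private static volatile ServerConfig current = new ServerConfig("localhost", 8080);

    final String host;   // snapshot fields are immutable
    final int port;

    ServerConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static ServerConfig get() {
        return current;          // one volatile read; the snapshot needs no locking
    }

    // Called rarely, e.g. on a forced configuration reload.
    static void reload(String host, int port) {
        current = new ServerConfig(host, port);
    }
}
```

A reader that needs several related values should call get() once and use the returned snapshot, so it never observes a mix of old and new configuration.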
To notify the clients of this server you can use the observer pattern: whoever is interested in server updates registers for the event, and the server delivers the notification. This shouldn't become a bottleneck since, as you mentioned, reloads are not frequent.
To make this thread safe, have a separate thread handle the update of the server state. In your get, check the state: if it is 'Updating', wait for it to complete (say, by sleeping). Once the update thread is done, it changes the state from 'Updating' to 'Updated'; when you wake up, check the state again: if it is still 'Updating', go back to sleep, otherwise start servicing the request.
This approach adds an extra if to your code, but it enables you to reload the cache without forcing an application restart.
Also, this shouldn't be a bottleneck, as server updates are not frequent.
Hope this makes some sense.
In order to avoid the volatile keyword, you could add a "memory barrier" method to your Test class that is only called very rarely, for example
public synchronized void sync() {
}
This will force the thread to re-read the field value from main memory.
Also, you would have to change the setter to
public synchronized void setS(String str) {
    s = str;
}
The synchronized keyword will force the setting thread to write directly to main memory.
See here for a detailed explanation of synchronization and memory barriers.
