Sometimes in Java I have objects that are not thread-safe and are expensive to create. I would like to create a cache of those objects so I don't need to re-create them, but the cache must also prevent concurrent access to the same object.
For example, I might have a DateFormat that is too expensive to create each time, but I can't share a single DateFormat between threads. For argument's sake, assume that I can't use a thread-safe DateFormat.
What would be fantastic is to be able to create some cache like this:
Cache<DateFormat> cache = new Cache<>(() -> dateFormatCreator());
// and now make use of a dateFormat that might be created for this call
// or it might be an existing one from the cache.
cache.withExclusiveAccessToObject(dateFormat -> {
// use the dateFormat here, it is not in use by any other thread at the same time.
// new dateFormats can be created on the fly as needed.
});
I should have also mentioned that ThreadLocal is not ideal as I can not ensure threads are going to be re-used.
I believe there are two paths you can go:
Option 1
Maintain an object-per-thread
This can work if you access the expensive object from a limited well defined set of threads (read, using a thread-pool and not creating threads every time, which is what happens anyway in many applications).
In this case you can use a ThreadLocal. Since within one thread everything is expected to be sequential, you can keep thread-unsafe objects in a thread local.
You can think of ThreadLocal as a map that per thread maintains a dedicated instance of an expensive object.
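A minimal sketch of Option 1, assuming Java 8 or later for ThreadLocal.withInitial. The class name DateFormats, the date pattern, and the UTC time zone are illustrative choices, not part of the question:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: each thread lazily gets its own SimpleDateFormat.
class DateFormats {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    static String format(Date date) {
        // FORMAT.get() returns the calling thread's private instance,
        // so no two threads ever touch the same formatter.
        return FORMAT.get().format(date);
    }
}
```

The instance is created once per thread, so this only pays off when threads are reused, for example in a fixed-size pool.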
Option 2
Share one (or N in general) objects between M threads so that N < M. In this case there might be a situation where two threads will try to work with the same object.
I'm not aware of a ready-made solution for this. After all, these are your objects that you want to maintain. In general, though, it's pretty easy to write your own wrapper that provides some sort of locking/synchronized access to objects of your types.
The range of ideas for implementations can vary. As an idea: You can wrap an actual object with a runtime/build-time generated proxy making it effectively thread safe:
public interface IMyObject {
void inc();
void dec();
}
// this is an object that you would like to make thread safe
public class MyActualObject implements IMyObject {
private int counter = 0;
public void inc() {counter++;}
public void dec() {counter--;}
}
public class MyThreadSafeProxy implements IMyObject {
private IMyObject realObject;
public MyThreadSafeProxy(IMyObject realObject) {
this.realObject = realObject;
}
@Override
public synchronized void inc() {
realObject.inc();
}
@Override
public synchronized void dec() {
realObject.dec();
}
}
Instead of storing MyActualObject instances directly, you can wrap them in MyThreadSafeProxy.
It's also possible to generate such a proxy automatically: see the cglib framework or dynamic proxies (the java.lang.reflect.Proxy class).
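As a rough sketch of the dynamic-proxy idea, the helper below wraps any object behind an interface so that every call goes through one lock. The class name SynchronizedProxies and the method wrap are made up for illustration. Only java.lang.reflect.Proxy and InvocationTargetException come from the JDK:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical helper, not a library API.
final class SynchronizedProxies {
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        Object lock = new Object(); // one private lock per wrapped object
        return (T) Proxy.newProxyInstance(
            target.getClass().getClassLoader(),
            new Class<?>[] { iface },
            (proxy, method, args) -> {
                synchronized (lock) {
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow what the real method threw, not the reflection wrapper.
                        throw e.getCause();
                    }
                }
            });
    }
}
```

With the types above this would be used as IMyObject safe = SynchronizedProxies.wrap(new MyActualObject(), IMyObject.class). Note that it serializes individual calls only; a sequence of calls is still not atomic.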
From my experience usually Option 1 is preferable unless the objects you work with are too expensive so that if there are N threads in the pool, you can't really support N objects in memory.
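For completeness, the cache sketched in the question can be approximated with a small object pool. This is only one possible shape, under the assumption that the pool may grow without bound. The name ExclusiveCache is hypothetical, and withExclusiveAccessToObject mirrors the method name from the question:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical pool: an object is either idle in the queue or held by exactly one caller.
class ExclusiveCache<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    ExclusiveCache(Supplier<T> factory) {
        this.factory = factory;
    }

    void withExclusiveAccessToObject(Consumer<T> action) {
        T obj = idle.poll();          // take an idle object if one exists
        if (obj == null) {
            obj = factory.get();      // otherwise create one on the fly
        }
        try {
            action.accept(obj);
        } finally {
            idle.offer(obj);          // hand it back for the next caller
        }
    }
}
```

Because poll() removes the object from the queue, no other thread can obtain it until the finally block returns it. Unlike ThreadLocal, this does not depend on threads being reused.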
Related
So I have a non-thread safe API (some vendor software) I'm currently using, and the way we're currently using it is one object per thread. i.e. every thread has:
Foo instance = new Foo();
However, this doesn't appear to work for this particular library. Some non-thread safe bits of it still appear to butt heads, so I'm assuming this library has some static values within it. At a few points where we know that it has issues, we currently are using ReentrantLock to lock the class when need be. I.E.
public class Bar {
protected static final ReentrantLock lock = new ReentrantLock();
public void process() {
Foo instance = new Foo();
boolean locked = false;
try{
if(SomeCondition) {
locked = true;
Bar.lock.lock();
}
// rest of the processing goes here
} finally {
if(locked){
Bar.lock.unlock();
}
}
}
}
My question is: in a case where the class in question is NOT thread-safe, even when creating new instances of said class, is it better to use locking, or should I be looking into using ThreadLocals instead? Will ThreadLocals even alleviate my actual issue? Does a ThreadLocal version of a class actually force static areas of a class to essentially be non-static?
All a ThreadLocal does is create a lookup where each thread can find its own instance of an object, so that no threads have to share. Conceptually you can think of it as a map keyed by thread id.
Letting each thread use its own objects is a good strategy for some cases, it's called "thread confinement" in the JCIP book. A common example of this is that SimpleDateFormat objects were not designed to be thread-safe and concurrent threads using them generated bad results. Using a ThreadLocal lets each thread use its own DateFormat, see this question for an example.
But if your problem is that the object references static fields, then those static fields exist on the class, not on the instance, so using ThreadLocal doesn't do anything to reduce sharing.
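A small demonstration of that point, using a made-up class VendorFoo as a stand-in for the vendor library (the real library's internals are unknown). Each thread gets its own instance through the ThreadLocal, yet both end up incrementing the same static field:

```java
// Hypothetical stand-in for a vendor class that keeps hidden static state.
class VendorFoo {
    static int sharedCounter = 0;   // lives on the class, not on any instance

    void touch() {
        sharedCounter++;
    }
}

class StaticSharingDemo {
    private static final ThreadLocal<VendorFoo> LOCAL =
        ThreadLocal.withInitial(VendorFoo::new);

    // Runs two threads one after the other; each uses its own VendorFoo.
    static int incrementsSeenAcrossTwoThreads() {
        int before = VendorFoo.sharedCounter;
        runAndWait(() -> LOCAL.get().touch());
        runAndWait(() -> LOCAL.get().touch());
        return VendorFoo.sharedCounter - before;
    }

    private static void runAndWait(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The method returns 2: two separate instances, one shared static. The threads are run sequentially here only to keep the result deterministic.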
If somehow each of your threads used its own classloader then each would have its own class and the static fields on it would not be shared. Otherwise your locking on the class seems reasonable (though probably not speedy considering all your threads would be contending for the same lock).
The best approach would be working with the vendor to get them to fix their broken code.
ThreadLocal will not solve your problem. ThreadLocal simply stores a different instance for each thread independently, so if the shared resource lives inside your third-party library, it won't help.
A simple synchronized monitor will solve the problem, since you want to avoid concurrent access to that library. Be aware of the performance penalty of the monitor, though: only one thread can access the library at a time.
Just do:
public class Bar {
private static final Object LOCK = new Object();
public void process() {
synchronized(LOCK) {
Foo instance = new Foo();
instance.myMethod();
}
}
}
Let's consider this situation:
public class A {
private Vector<B> v = new Vector<B>();
}
public class B {
private HashSet<C> hs = new HashSet<C>();
}
public class C {
private String sameString;
public void setSameString(String s){
this.sameString = s;
}
}
My questions are:
Vector is thread-safe, so when a thread calls a method on it, for instance get(int index), is this thread the only owner of the HashSet hs?
If a thread calls get(int index) on v and obtains a B object, and then obtains a C object and invokes its setSameString(String s) method, is this write thread-safe? Or is a mechanism such as a Lock needed?
First of all, take a look at this SO on reasons not to use Vector. That being said:
1) Vector locks on every operation. That means it only allows one thread at a time to call any of its operations (get,set,add,etc.). There is nothing preventing multiple threads from modifying Bs or their members because they can obtain a reference to them at different times. The only guarantee with Vector (or classes that have similar synchronization policies) is that no two threads can concurrently modify the vector and thus get into a race condition (which could throw ConcurrentModificationException and/or lead to undefined behavior);
2) As above, there is nothing preventing multiple threads from accessing Cs at the same time because they can obtain a reference to them at different times.
If you need to protect the state of an object, you need to do it as close to the state as possible. Java has no concept of a thread owning an object. So in your case, if you want to prevent many threads from calling setSameString concurrently, you need to declare the method synchronized.
I recommend the excellent book by Brian Goetz on concurrency for more on the topic.
In case 2, it's not thread-safe because multiple threads could access the data at the same time. Consider using a read-write lock if you want to achieve better performance. http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html#readLock()
In the Collections framework, why is external synchronization faster than internal synchronization (Vector, Hashtable, etc.), even though they both use the same mechanism?
What exactly do internal and external synchronization mean, and how do they differ from each other?
It would be really helpful if someone could explain with examples.
What exactly do internal and external synchronization mean, and how do they differ from each other?
External synchronization is when the caller (you) uses the synchronized keyword or other locks to protect another class from being accessed by multiple threads at once. It is usually used if the class in question is not synchronized itself -- SimpleDateFormat is a prime example. It can also be used if you need signaling between threads -- even when dealing with a concurrent collection.
Why is external synchronization faster than internal synchronization (Vector, Hashtable, etc.), even though they both use the same mechanism?
External synchronization is not necessarily faster. Typically a class can determine precisely when it needs to synchronize around a critical section of code instead of the caller wrapping all method calls in a synchronized block.
If you are talking about the general recommendation to not use Vector and Hashtable and instead use the Collections.synchronizedList(...) or synchronizedMap(...) methods, then this is because Vector and Hashtable are seen as old, out-of-date classes. A wrapped ArrayList or HashMap is seen as a better solution.
Sometimes, as @Chris pointed out, external synchronization can be faster when you need to make a number of changes to a class one after another. Locking externally once and then performing multiple changes works better than having each change lock internally: acquiring a single lock is faster than acquiring a lock many times in a row.
It would be really helpful if someone could explain with examples.
Instead of Vector, people typically recommend a wrapped ArrayList as having better performance. This wraps the non-synchronized ArrayList class in a wrapper class which externally synchronizes it.
List<Foo> list = Collections.synchronizedList(new ArrayList<Foo>());
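To illustrate the "lock once, change many times" point with such a wrapped list, here is a short sketch. The class BatchUpdate and its methods are invented for the example. It relies on the documented rule that callers must synchronize on the list returned by Collections.synchronizedList while iterating over it, and on the assumption that the wrapper's other methods use that same monitor:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example of external synchronization around a synchronized wrapper.
class BatchUpdate {
    // Adds 0..count-1 as one unit: no other thread can interleave between the adds.
    static void addAllAtomically(List<Integer> synced, int count) {
        synchronized (synced) {
            for (int i = 0; i < count; i++) {
                synced.add(i);
            }
        }
    }

    // Iteration spans many calls, so the caller holds the lock for the whole loop.
    static int sum(List<Integer> synced) {
        int total = 0;
        synchronized (synced) {
            for (int value : synced) {
                total += value;
            }
        }
        return total;
    }
}
```

Each add() still acquires the wrapper's lock, but because the thread already holds it the acquisition is reentrant and uncontended.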
In terms of internal versus external in general, consider the following class that you want to allow multiple threads to use it concurrently:
public class Foo {
private int count;
public void addToCount() {
count++;
log.info("count increased to " + count);
}
}
You could use external synchronization and wrap every call to addToCount() in a synchronized block:
synchronized (foo) {
foo.addToCount();
}
Or the class itself can use internal synchronization and do the locking for you. This performs better because the logger class does not have to be a part of the lock:
public void addToCount() {
int val;
synchronized (this) {
val = ++count;
}
// this log call should not be synchronized since it does IO
log.info("count increased to " + val);
}
Of course, the Foo class really should use an AtomicInteger in this case and take care of its own reentrance internally:
private final AtomicInteger count = new AtomicInteger(0);
public void addToCount() {
int val = count.incrementAndGet();
log.info("count increased to " + val);
}
Let's say you work in a bank. Every time you need to use the safe, it needs to be unlocked, and then re-locked when you're done using it.
Now let's say that you need to carry 50 boxes into the safe. You have two options:
Carry each box over individually, opening and closing the (extremely heavy) door each time
Lock the front door to the bank and leave the vault open, make 50 trips without touching the internal vault door
Which one is faster? (The first option is internal synchronization, the second option is external synchronization.)
In trying to improve my understanding of concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying to do):
public class Example {
private final Object lock = new Object();
private final Runtime runtime = Runtime.getRuntime();
public void add(Object o) {
synchronized (lock) { runtime.exec(program + " -add "+o); }
}
public Object[] getAll() {
synchronized (lock) { return runtime.exec(program + " -list "); }
}
public void remove(Object o) {
synchronized (lock) { runtime.exec(program + " -remove "+o); }
}
}
As it stands, each method is thread-safe when used standalone. Now, what I'm trying to figure out is how to handle the case where the calling class wishes to call:
for (Object o : example.getAll()) {
// problems if multiple threads perform this operation concurrently
example.remove(o);
}
But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:
A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:
synchronized (example) {
for (Object o : example.getAll()) {
example.remove(o);
}
}
B: Place the combined code into the Example class - and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing unrelated logic:
public class Example {
...
public void removeAll() {
synchronized (lock) { runtime.exec(program + " -clear"); }
}
}
C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:
public interface ExampleClosure {
public void execute(Example example);
}
public class Example {
...
public void execute(ExampleClosure closure) {
synchronized (this) { closure.execute(this); }
}
}
example.execute(new ExampleClosure() {
public void execute(Example example) {
for (Object o : example.getAll()) {
example.remove(o);
}
}
}
);
Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?
Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.
In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.
I would recommend that locks not be done so close to the data structure. But it's a popular option.
A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.
// pointer to function:
// - takes Object by reference and can be safely altered
// - if returns true, Object will be removed from list
typedef bool (*callback_function)(Object *o);
public void editAll(callback_function func) {
synchronized (lock) {
for each element o { if (func(o)) {remove o} } }
}
So then your loop becomes:
bool my_function(Object *o) {
...
if (some condition) return true;
}
...
editAll(my_function);
...
The company I work for (corensic) has test cases extracted from real bugs to verify that Jinx is finding the concurrency errors properly. This type of low level data structure locking without higher level synchronization is pretty common pattern. The tree editing callback seems to be a popular fix for this race condition.
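The callback idea above is written as C-style pseudocode. In Java it might look roughly like the sketch below, where GuardedList, editAll, and snapshot are hypothetical names and the callback is a java.util.function.Predicate:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical container: callers never see the list without the lock being held.
class GuardedList<T> {
    private final Object lock = new Object();
    private final List<T> items = new ArrayList<>();

    void add(T item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    // Visits every element under one lock; elements for which the callback
    // returns true are removed, so inspect-then-remove is a single atomic step.
    void editAll(Predicate<? super T> shouldRemove) {
        synchronized (lock) {
            items.removeIf(shouldRemove);
        }
    }

    // Returns a copy so callers cannot touch the guarded list directly.
    List<T> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(items);
        }
    }
}
```

The callback runs while the lock is held, so it should be short and must not call back into code that takes other locks.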
I think everyone is missing his real problem. Iterating over the new array of Objects and trying to remove them one at a time is still technically unsafe (though the ArrayList implementation would not explode, it just wouldn't produce the expected results).
Even with CopyOnWriteArrayList, the snapshot you read may be out of date by the time you try to remove from the current list.
The two suggestions you offered are fine (A and B). My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.
Now you can at the same time say, 'well, I want to keep the API the way it is and let the client worry about additional thread-safety'. If that's the case, document the thread-safety. Document the fact that a 'lookup and modify' action is both non-atomic and non-thread-safe.
Today's concurrent list implementations are all thread-safe for the single functions that are offered (get, remove, and add are all thread-safe). Compound functions are not, though, and the best that can be done is to document how to make them thread-safe.
I think j.u.c.CopyOnWriteArrayList is a good example of similar problem you're trying to solve.
JDK had a similar problem with Lists - there were various ways to synchronize on arbitrary methods, but no synchronization on multiple invocations (and that's understandable).
So CopyOnWriteArrayList actually implements the same interface but has a very special contract, and whoever calls it, is aware of it.
Similar with your solution - you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing/new methods. For example, getAll's consistency is not guaranteed, and calls to .remove do not fail if o is null, or isn't inside the list, etc. If users want both combined and safe/consistent options - this class of yours would provide a special method that does exactly that (e.g. safeDeleteAll), leaving other methods close to original contract as possible.
So to answer your question: I would pick option B, but would also implement the interface that your original object implements.
From the Javadoc for List.toArray():
The returned array will be "safe" in
that no references to it are
maintained by this list. (In other
words, this method must allocate a new
array even if this list is backed by
an array). The caller is thus free to
modify the returned array.
Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?
You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.
Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.
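A brief sketch of that reentrancy, using an invented Inventory class: the high-level removeAll() holds the monitor and calls the low-level synchronized methods, which re-acquire the same monitor without blocking:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class showing a compound operation built from synchronized methods.
class Inventory {
    private final List<String> items = new ArrayList<>();

    synchronized void add(String item) {
        items.add(item);
    }

    synchronized boolean remove(String item) {
        return items.remove(item);
    }

    synchronized List<String> getAll() {
        return new ArrayList<>(items);   // copy, so iteration is safe
    }

    // Holds the lock across the whole lookup-and-remove sequence.
    // The nested getAll() and remove() calls re-enter the same monitor.
    synchronized int removeAll() {
        int removed = 0;
        for (String item : getAll()) {
            if (remove(item)) {
                removed++;
            }
        }
        return removed;
    }
}
```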
In Java, the idiomatic way to declare critical sections in the code is the following:
private void doSomething() {
// thread-safe code
synchronized(this) {
// thread-unsafe code
}
// thread-safe code
}
Almost all blocks synchronize on this, but is there a particular reason for this? Are there other possibilities? Are there any best practices on what object to synchronize on? (such as private instances of Object?)
As earlier answerers have noted, it is best practice to synchronize on an object of limited scope (in other words, pick the most restrictive scope you can get away with, and use that.) In particular, synchronizing on this is a bad idea, unless you intend to allow the users of your class to gain the lock.
A particularly ugly case arises, though, if you choose to synchronize on a java.lang.String. Strings can be interned, and string literals and compile-time constants always are. That means that every literal with equal content, anywhere in the JVM, is the same String object behind the scenes. So if you synchronize on such a String, another (completely disparate) code section that also locks on a String with the same content will actually contend for the same lock as your code.
I was once troubleshooting a deadlock in a production system and (very painfully) tracked the deadlock to two completely disparate open source packages that each synchronized on an instance of String whose contents were both "LOCK".
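The effect is easy to reproduce. In the sketch below, LibraryA and LibraryB are made-up stand-ins for the two unrelated packages; both lock fields are initialized from the literal "LOCK", so they refer to the same interned object:

```java
// Two hypothetical, unrelated classes that each "own" a String lock.
class LibraryA {
    static final String LOCK = "LOCK";
}

class LibraryB {
    static final String LOCK = "LOCK";
}

class InternDemo {
    // True: both literals are interned, so both classes share one monitor.
    static boolean sameMonitor() {
        return LibraryA.LOCK == LibraryB.LOCK;
    }

    // True: an explicitly constructed String is a distinct object.
    static boolean freshStringIsDistinct() {
        return new String("LOCK") != LibraryA.LOCK;
    }
}
```

A private final Object created with new Object() avoids the problem entirely, since no other code can obtain a reference to it.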
First, note that the following code snippets are identical.
public void foo() {
synchronized (this) {
// do something thread-safe
}
}
and:
public synchronized void foo() {
// do something thread-safe
}
do exactly the same thing. No preference for either one of them except for code readability and style.
When you do synchronize methods or blocks of code, it's important to know why you are doing such a thing, and what object exactly you are locking, and for what purpose.
Also note that there are situations in which you will want to client-side synchronize blocks of code in which the monitor you are asking for (i.e. the synchronized object) is not necessarily this, like in this example :
Vector v = getSomeGlobalVector();
synchronized (v) {
// some thread-safe operation on the vector
}
I suggest you learn more about concurrent programming; it will serve you a great deal once you know exactly what's happening behind the scenes. You should check out Concurrent Programming in Java, a great book on the subject. If you want a quick dive into the subject, check out Java Concurrency @ Sun.
I try to avoid synchronizing on this because that would allow everybody from the outside who had a reference to that object to block my synchronization. Instead, I create a local synchronization object:
public class Foo {
private final Object syncObject = new Object();
…
}
Now I can use that object for synchronization without fear of anybody “stealing” the lock.
Just to highlight that there are also ReadWriteLocks available in Java, found as java.util.concurrent.locks.ReadWriteLock.
In most of my usage, I separate my locking into 'for reading' and 'for updates'. If you simply use the synchronized keyword, all reads of the same method/code block will be 'queued': only one thread can access the block at a time.
In most cases, you never have to worry about concurrency issues if you are only reading. It is when you are writing that you have to worry about concurrent updates (resulting in loss of data) or reads during a write (seeing partial updates).
Therefore a read/write lock makes more sense to me during multi-threaded programming.
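A minimal sketch of that split using ReentrantReadWriteLock from java.util.concurrent.locks. The Settings class and its methods are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class: many readers may proceed together, writers are exclusive.
class Settings {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<String, String> values = new HashMap<>();

    String get(String key) {
        rw.readLock().lock();        // shared: does not block other readers
        try {
            return values.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    void put(String key, String value) {
        rw.writeLock().lock();       // exclusive: blocks readers and writers
        try {
            values.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

Whether this beats plain synchronized depends on the read/write ratio and on how long the lock is held; for very short critical sections the extra bookkeeping can outweigh the gain.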
You'll want to synchronize on an object that can serve as a Mutex. If the current instance (the this reference) is suitable (not a Singleton, for instance), you may use it, as in Java any Object may serve as the Mutex.
In other occasions, you may want to share a Mutex between several classes, if instances of these classes may all need access to the same resources.
It depends a lot on the environment you're working in and the type of system you're building. In most Java EE applications I've seen, there's actually no real need for synchronization...
Personally, I think the answers which insist that it is never or only rarely correct to sync on this are misguided. I think it depends on your API. If your class is a thread-safe implementation and you document it as such, then you should use this. If the synchronization is not meant to make each instance of the class as a whole thread-safe in the invocation of its public methods, then you should use a private internal object. Reusable library components often fall into the former category - you must think carefully before you prevent the user from wrapping your API in external synchronization.
In the former case, using this allows multiple methods to be invoked in an atomic manner. One example is PrintWriter, where you may want to output multiple lines (say a stack trace to the console/logger) and guarantee they appear together - in this case the fact that it hides the sync object internally is a real pain. Another such example are the synchronized collection wrappers - there you must synchronize on the collection object itself in order to iterate; since iteration consists of multiple method invocations you cannot protect it totally internally.
In the latter case, I use a plain object:
private Object mutex=new Object();
However, having seen many JVM dumps and stack traces that say a lock is "an instance of java.lang.Object()" I have to say that using an inner class might often be more helpful, as others have suggested.
Anyway, that's my two bits worth.
Edit: One other thing, when synchronizing on this I prefer to sync the methods, and keep the methods very granular. I think it's clearer and more concise.
Synchronization in Java often involves synchronizing operations on the same instance. Synchronizing on this then is very idiomatic since this is a shared reference that is automatically available between different instance methods (or sections of) in a class.
Using another reference specifically for locking, by declaring and initializing a private field Object lock = new Object() for example, is something I never needed or used. I think it is only useful when you need external synchronization on two or more unsynchronized resources inside an object, although I would always try to refactor such a situation into a simpler form.
Anyway, implicit (synchronized method) or explicit synchronized(this) is used a lot, also in the Java libraries. It is a good idiom and, if applicable, should always be your first choice.
What you synchronize on depends on what the other threads that might conflict with this method call can synchronize on.
If this is an object that is used by only one thread and we are accessing a mutable object which is shared between threads, a good candidate is to synchronize over that object - synchronizing on this has no point since another thread that modifies that shared object might not even know this, but does know that object.
On the other hand synchronizing over this makes sense if many threads call methods of this object at the same time, for instance if we are in a singleton.
Note that a synchronized method is often not the best option, since we hold a lock the whole time the method runs. If it contains time-consuming but thread-safe parts and a short thread-unsafe part, synchronizing the whole method is the wrong choice.
Almost all blocks synchronize on this, but is there a particular reason for this? Are there other possibilities?
This declaration synchronizes the entire method.
private synchronized void doSomething() {
This declaration synchronizes only part of the method body instead of the entire method.
private void doSomething() {
// thread-safe code
synchronized(this) {
// thread-unsafe code
}
// thread-safe code
}
From the Oracle documentation page:
making these methods synchronized has two effects:
First, it is not possible for two invocations of synchronized methods on the same object to interleave. When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object block (suspend execution) until the first thread is done with the object.
Are there other possibilities? Are there any best practices on what object to synchronize on? (such as private instances of Object?)
There are many possibilities and alternatives to synchronization. You can make your code thread-safe by using the high-level concurrency APIs (available since the JDK 1.5 release):
Lock objects
Executors
Concurrent collections
Atomic variables
ThreadLocalRandom
Refer to below SE questions for more details:
Synchronization vs Lock
Avoid synchronized(this) in Java?
The best practice is to create an object solely to provide the lock:
private final Object lock = new Object();
private void doSomething() {
// thread-safe code
synchronized(lock) {
// thread-unsafe code
}
// thread-safe code
}
By doing this you are safe, that no calling code can ever deadlock your method by an unintentional synchronized(yourObject) line.
(Credits to @jared and @yuval-adam who explained this in more detail above.)
My guess is that the popularity of using this in tutorials came from early Sun javadoc: https://docs.oracle.com/javase/tutorial/essential/concurrency/locksync.html
Synchronization includes 3 parts: Atomicity, Visibility and Ordering
A synchronized block is a very coarse level of synchronization. It enforces visibility and ordering just as you would expect. But for atomicity, it does not provide much protection. Atomicity requires global knowledge of the program rather than local knowledge. (And that is what makes multi-threaded programming very hard.)
Let's say we have a class Account with the methods deposit and withdraw. They are both synchronized on a private lock like this:
class Account {
private final Object lock = new Object();
void withdraw(int amount) {
synchronized(lock) {
// ...
}
}
void deposit(int amount) {
synchronized(lock) {
// ...
}
}
}
Now suppose we need to implement a higher-level class which handles transfers, like this:
class AccountManager {
void transfer(Account fromAcc, Account toAcc, int amount) {
if (fromAcc.getBalance() > amount) {
fromAcc.setBalance(fromAcc.getBalance() - amount);
toAcc.setBalance(toAcc.getBalance() + amount);
}
}
}
Assuming we have 2 accounts now,
Account john;
Account marry;
If Account.deposit() and Account.withdraw() are protected by the internal lock only, that will cause a problem when we have 2 threads working:
// Some thread
void threadA() {
john.withdraw(500);
}
// Another thread
void threadB() {
accountManager.transfer(john, marry, 100);
}
Both threadA and threadB can run at the same time. Thread B may finish the conditional check, then thread A withdraws, and then thread B deducts the amount as well. This means we can withdraw $100 from John even if his account does not have enough money, which breaks atomicity.
You might propose adding withdraw() and deposit() to AccountManager instead. But under this proposal, we need to create a thread-safe Map from accounts to their locks. We need to delete each lock after use (otherwise it leaks memory), and we also need to ensure that nobody else calls Account.withdraw() directly. This introduces a lot of subtle bugs.
The correct and most idiomatic way is to expose the lock in the Account and let the AccountManager use it. But in that case, why not just use the object itself?
class Account {
synchronized void withdraw(int amount) {
// ...
}
synchronized void deposit(int amount) {
// ...
}
}
class AccountManager {
void transfer(Account fromAcc, Account toAcc, int amount) {
// Ensure a consistent locking order to prevent deadlock.
// Note: hash codes can collide, so a production version needs a tie-breaker.
Account firstLock = fromAcc.hashCode() < toAcc.hashCode() ? fromAcc : toAcc;
Account secondLock = fromAcc.hashCode() < toAcc.hashCode() ? toAcc : fromAcc;
synchronized(firstLock) {
synchronized(secondLock) {
if (fromAcc.getBalance() > amount) {
fromAcc.setBalance(fromAcc.getBalance() - amount);
toAcc.setBalance(toAcc.getBalance() + amount);
}
}
}
}
}
To conclude in plain English: a private lock does not work for even slightly more complicated multi-threaded programs.