Using synchronized/locks in future code - java

We are building a web app with Scala, Play framework, and MongoDB (with ReactiveMongo as our driver). The application architecture is non-blocking end to end.
In some parts of our code, we need to access some non-thread-safe libraries such as Scala's parser combinators, Scala's reflection etc. We are currently enclosing such calls in synchronized blocks. I have two questions:
Are there any gotchas to look out for when using synchronized with future-y code?
Is it better to use locks (such as ReentrantLock) rather than synchronized, from both performance and usability standpoint?

This is an old question; see, for example, using-actors-instead-of-synchronized. In short, it is usually more advisable to use actors instead of locks:
import akka.actor.{Actor, ActorLogging}

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging {
  def receive = {
    case Greeting(who) => log.info("Hello " + who)
  }
}
Only one message will be processed at any given time, so you can put any non-thread-safe code you want in place of the log.info call and everything will work fine. BTW, using the ask pattern you can seamlessly integrate your actors into existing code that requires futures.

For me the main problem you will face is that any call to a synchronized or a locked section of code may block and thus paralyze the threads of the execution context. To avoid this issue, you can wrap any call to a potentially blocking method using scala.concurrent.blocking:
import scala.concurrent._
import ExecutionContext.Implicits.global

def synchronizedMethod(s: String) = synchronized { s.size }

val f = Future {
  println("Hello")
  val i = blocking { // adjust the execution context behavior
    synchronizedMethod("Hello")
  }
  println(i)
  i
}
Of course, it may be better to consider alternatives like thread-local variables, or wrapping invocations of the serial code inside an actor.
Finally, I suggest using synchronized instead of locks. For most applications (especially if the critical sections are large), the performance difference is not noticeable.

The examples you mention, i.e. reflection and parsing, should be reasonably immutable and you shouldn't need to lock, but if you're going to use locks then a synchronized block will do. I don't think there's much of a performance difference between synchronized and Lock.

Well, I think the easiest and safest way (if you can manage it at all) would be thread confinement,
i.e. each thread creates its own instance of the parser combinators etc. and then uses it.
And in case you need any synchronization (which should be avoided, as under traffic it will be the killer), synchronized or ReentrantLock will give almost the same performance. It again depends on what objects need to be guarded by what locks, etc. In a web application, it is discouraged unless absolutely necessary.
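As a sketch of that thread-confinement idea in Java: each thread gets its own private instance via ThreadLocal, so the non-thread-safe object is never shared. SimpleDateFormat stands in here for any non-thread-safe object such as a parser, and the class name is made up for illustration.
import java.text.SimpleDateFormat;
import java.util.Date;

public class ConfinedFormatter {
    // Each thread lazily gets its own SimpleDateFormat instance,
    // so the non-thread-safe object is confined to a single thread.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FORMAT.get().format(date); // no locking needed
    }
}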

Related

What does "inherently thread-safe" mean?

I came across this line "some functions are inherently thread-safe, for example memcpy()"
Wikipedia defines "thread-safe" as:
A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time.
OK. But what does inherently mean? Is it related to inheritance?
It is not related to inheritance. It is an informal expression and means more like "some functions are thread-safe by their nature". For example, a function which does not touch any shared values/state is thread-safe anyway, i.e. it "is inherently thread-safe".
In this context I interpret it as "without having been designed to achieve it, it still is thread-safe".
There is no direct link to the concept of inheritance, although of course the words are related. This is not an example of inheritance in the object-oriented programming sense, of course. This is just a function, that from its core nature gets the property of being thread-safe.
Of course there's nothing magic about memcpy() being inherently thread-safe, either. Any function without internal state or "shared" side-effects will be so, which is why functional programming, where all functions are supposed to be "pure" and lack side-effects, lends itself so well to parallel programming.
In practice it's hard on typical computers to get "real work" done without side-effects, particularly I/O is very much defined by its side-effects. So even pure functional languages often have some non-functional corners.
Update: Of course, memcpy() is not free from side-effects; its core purpose is to manipulate memory which, if shared between threads, certainly isn't safe. The assumption has to be that as long as the destination areas are distinct, it doesn't matter if one or more threads run memcpy() in parallel.
Contrast this with e.g. printf(), which generates characters on a single (for the process) output stream. It has to be explicitly implemented (as required by POSIX) to be thread-safe, whereas memcpy() does not.
An inherently thread-safe function is safe without any specific design decisions regarding threading; it is thread-safe simply by virtue of the task it performs, as opposed to being redesigned to force thread safety. Say I write the very simple function:
int cube(int x)
{
    return x * x * x;
}
It is inherently thread-safe, as it has no way of reading from or writing to shared memory. However, I could also make a function which is not thread-safe and then make it thread-safe through specific design and synchronization. Say I have this function, similar to the one before:
void cubeshare()
{
    static int x;   /* shared state: every caller reads and writes the same x */
    x = x * x * x;
    printf("%d\n", x);
}
This is not thread-safe: it is entirely possible that the value of x changes between each use (well, this is actually unlikely in reality as x would get cached, but let's say we are not doing any optimization).
We, however, could make this thread-safe, for example with a POSIX mutex:
#include <pthread.h>
#include <stdio.h>

void cubesharesafe(pthread_mutex_t *foo)
{
    static int x;
    pthread_mutex_lock(foo);   /* only one thread at a time past this point */
    x = x * x * x;
    printf("%d\n", x);
    pthread_mutex_unlock(foo);
}
This, however, is not inherently thread-safe; we are forcing it to be through redesign. Real examples will often be far more complicated than this, but I hope this gives the idea at the simplest possible level. If you have any questions, please comment below.
In the case of memcpy, only a single thread writes from a specific source to a specific destination, so it is thread-safe by its initial design.
Inherently means: without needing to "tune" the base function to achieve the goal, in this case thread safety.
If multiple threads could interfere with the same "channel" at the same time, you would end up with thread-safety problems related to shared chunks of data.
Inherent means existing in something as a permanent attribute.
It has nothing to do with inheritance.
Some methods are thread-safe by default, in order to protect against or avoid problems with concurrent access.
Vector and Hashtable are some examples of classes that are inherently thread-safe.
There is nothing confusing here: some functions are simply thread-safe by default.

Why do Java/.NET allow every object to act as a lock? [duplicate]

Making every object lockable looks like a design mistake:
You add extra cost for every object created, even though you'll actually use it only in a tiny fraction of the objects.
Lock usage becomes implicit; having lockMap.get(key).lock() is more readable than synchronization on arbitrary objects, e.g., synchronized (key) {...}.
Synchronized methods can cause subtle errors when users lock the object that has the synchronized methods.
You can be sure that when passing an object to a third-party API, its lock is not being used.
e.g.
class Syncer {
    synchronized void foo() {}
}
...
Syncer s = new Syncer();
synchronized (s) {
    ...
}
// in another thread
s.foo(); // oops, waiting for the previous section, potential deadlocks
Not to mention the namespace pollution for each and every object (in C# at least the methods are static; in Java, the newer synchronization primitives have to use await so as not to overload wait in Object...)
However I'm sure there is some reason for this design. What is the great benefit of intrinsic locks?
You add extra cost for every object created, even though you'll actually use it only in a tiny fraction of the objects.
That's determined by the JVM implementation. The JVM specification says, "The association of a monitor with an object may be managed in various ways that are beyond the scope of this specification. For instance, the monitor may be allocated and deallocated at the same time as the object. Alternatively, it may be dynamically allocated at the time when a thread attempts to gain exclusive access to the object and freed at some later time when no thread remains in the monitor for the object."
I haven't looked at much JVM source code yet, but I'd be really surprised if any of the common JVMs handled this inefficiently.
Lock usage becomes implicit; having lockMap.get(key).lock() is more readable than synchronization on arbitrary objects, e.g., synchronized (key) {...}.
I completely disagree. Once you know the meaning of synchronize, it's much more readable than a chain of method calls.
Synchronized methods can cause subtle errors when users lock the object that has the synchronized methods.
That's why you need to know what synchronized means. If you read about what it does, then avoiding these errors becomes fairly trivial. Rule of thumb: don't use the same lock in multiple places unless those places need to share the same lock. The same thing could be said of any language's lock/mutex strategy.
You can be sure that when passing an object to a third-party API, its lock is not being used.
Right. That's usually a good thing. If it's locked, there should be a good reason why it's locked. Other threads (third party or not) need to wait their turns.
If you synchronize on myObject with the intent of allowing other threads to use myObject at the same time, you're doing it wrong. You could just as easily synchronize the same code block using myOtherObject if that would help.
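To sketch that point, a class can also synchronize on a private lock object, so that outside code holding a reference to the instance can never contend on, or interfere with, its lock. The class and field names below are made up for illustration.
public class Counter {
    // Private lock: no external code can synchronize on it.
    private final Object lock = new Object();
    private int count;

    public void increment() {
        synchronized (lock) {
            count++;
        }
    }

    public int get() {
        synchronized (lock) {
            return count;
        }
    }
}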
Not to mention the namespace pollution for each and every object (in C# at least the methods are static; in Java, the newer synchronization primitives have to use await so as not to overload wait in Object...)
The Object class does include some convenience methods related to synchronization, namely notify(), notifyAll(), and wait(). The fact that you haven't needed to use them doesn't mean they aren't useful. You could just as easily complain about clone(), equals(), toString(), etc.
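For illustration, here is a minimal sketch of how wait() and notifyAll() cooperate with an object's intrinsic lock; both must be called while holding the monitor. The class is hypothetical.
public class Mailbox {
    private String message;   // shared state, guarded by the object's monitor

    public synchronized void put(String m) {
        message = m;
        notifyAll();           // wake up threads waiting in take()
    }

    public synchronized String take() throws InterruptedException {
        while (message == null) {
            wait();            // releases the monitor while waiting
        }
        String m = message;
        message = null;
        return m;
    }
}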
Actually, each object only holds a reference to its monitor; the real monitor object is created only when you use synchronization, so not much memory is lost.
The alternative would be to add manually monitor to those classes that you need; this would complicate the code very much and would be more error-prone. Java has traded performance for productivity.
One benefit is automatic unlock on exit from synchronized block, even by exception.
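To illustrate that benefit, the two methods below are roughly equivalent, but the explicit lock is only released on an exception because of the try/finally. A small sketch; the names are made up.
import java.util.concurrent.locks.ReentrantLock;

public class Unlocking {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    public void withSynchronized() {
        synchronized (this) {
            value++;       // the monitor is released automatically on exit,
        }                  // even if an exception is thrown
    }

    public void withLock() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock(); // must be released manually in a finally block
        }
    }
}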
I assume that, like toString(), the designers thought that the benefits outweighed the costs.
Lots of decisions had to be made and a lot of the concepts were untested (checked exceptions, ack!), but overall I'm sure it's pretty much free and more useful than an explicit "Lock" object.
Also, do you add a "Lock" object to the language or the library? It seems like a language construct, but objects in the library very rarely (if ever?) get special treatment, and treating threading more as a library construct might have slowed things down.

In a class that has many instances, is it better to use synchronization, or an atomic variable for fields?

I am writing a class of which quite a few instances will be created. Multiple threads will be using these instances, so the getters and setters of the fields of the class have to be concurrent. The fields are mainly floats. Thing is, I don't know what is more resource-hungry: using a synchronized section, or making the variable something like an AtomicInteger?
You should favor atomic primitives when it is possible to do so. On many architectures, atomic primitives can perform a bit better because the instructions to update them can be executed entirely in user space; I think that synchronized blocks and Locks generally need some support from the operating system kernel to work.
Note my caveat: "when it is possible to do so". You can't use atomic primitives if your classes have operations that need to atomically update more than one field at a time. For example, if a class has to modify a collection and update a counter (for example), that can't be accomplished using atomic primitives alone, so you'd have to use synchronized or some Lock.
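A rough sketch of that distinction: a single counter can be an AtomicInteger, but an invariant spanning two fields still needs a common lock. The class and field names are made up for illustration.
import java.util.concurrent.atomic.AtomicInteger;

public class Stats {
    // Single value: an atomic is enough.
    private final AtomicInteger hits = new AtomicInteger();

    // Two values that must change together: use one lock for both.
    private long total;
    private long count;

    public void recordHit() {
        hits.incrementAndGet();            // lock-free update
    }

    public synchronized void recordSample(long value) {
        total += value;                    // both updates happen
        count++;                           // atomically together
    }

    public synchronized double average() {
        return count == 0 ? 0.0 : (double) total / count;
    }
}
Note that the JDK has no AtomicFloat; for float fields a common workaround is an AtomicInteger holding Float.floatToIntBits(value), or simply a synchronized (or volatile) field.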
The question already has an accepted answer, but as I'm not allowed to write comments yet, here we go. My answer is that it depends. If this is critical, measure. The JVM is quite good at optimizing synchronized accesses when there is no (or little) contention, making it much cheaper than if a real kernel mutex had to be used every time. Atomics basically use spin-locks, meaning that they will try to make an atomic change and if they fail they will try again and again until they succeed. This can eat quite a bit of CPU if the resource is heavily contended by many threads.
With low contention atomics may well be the way to go, but in order to be sure try both and measure for your intended application.
I would probably start out with synchronized methods in order to keep the code simple; then measure and make the change to atomics if it makes a difference.
It is very important to construct the instances properly before they are used by multiple threads. Otherwise those threads will get incomplete or wrong data from partially constructed instances. My personal preference would be to use a synchronized block.
Or you can also follow the "lazy initialization holder class idiom" outlined by Brian Goetz in his book "Java Concurrency in Practice":
@ThreadSafe
public class ResourceFactory {
    private static class ResourceHolder {
        public static Resource resource = new Resource();
    }

    public static Resource getResource() {
        return ResourceHolder.resource;
    }
}
Here the JVM defers initializing the ResourceHolder class until it is actually used. Moreover, because Resource is initialized in a static initializer, no additional synchronization is needed.
Note: Statically initialized objects require no explicit synchronization either during construction or when being referenced. But if the object is mutable, synchronization is still required by both readers and writers to make subsequent modifications visible and also to avoid data corruption.
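To make that note concrete, here is a minimal sketch in which both the reader and the writer of a mutable field synchronize; without this, a reader may see a stale or partially updated value. The class name is made up.
public class Temperature {
    private float celsius;     // guarded by the object's monitor

    public synchronized float getCelsius() {
        return celsius;        // readers must synchronize too, or they
    }                          // may not see the most recent write

    public synchronized void setCelsius(float value) {
        this.celsius = value;
    }
}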

Lazy-loaded singleton: Double-checked locking vs Initialization on demand holder idiom

I have a requirement to lazy-load resources in a concurrent environment. The code to load the resources should be executed only once.
Both double-checked locking (using JRE 5+ and the volatile keyword) and the initialization-on-demand holder idiom seem to fit the job well.
Just by looking at the code, the initialization-on-demand holder idiom seems cleaner and more efficient (but hey, I'm guessing here). Still, I will have to take care and document the pattern at every one of my singletons. At least to me, it would be hard to understand on the spot why the code was written like this...
My question here is: which approach is better? And why?
If your answer is none, how would you tackle this requirement in a Java SE environment?
Alternatives
Could I use CDI for this without imposing its use over my entire project? Any articles out there?
To add another, perhaps cleaner, option: I suggest the enum variation:
What is the best approach for using an Enum as a singleton in Java?
As far as readability goes, I would go with the initialization-on-demand holder. Double-checked locking, I feel, is a dated and ugly implementation.
Technically speaking, by choosing double-checked locking you would always incur a volatile read on the field, whereas you can do normal reads with the initialization-on-demand holder idiom.
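For reference, a minimal sketch of the double-checked locking variant being compared here; it is only correct on Java 5+ because the field must be volatile. The class name is illustrative.
public class LazySingleton {
    private static volatile LazySingleton instance; // volatile is essential

    private LazySingleton() { }

    public static LazySingleton getInstance() {
        LazySingleton result = instance;             // first (volatile) read
        if (result == null) {
            synchronized (LazySingleton.class) {
                result = instance;                   // re-check under the lock
                if (result == null) {
                    instance = result = new LazySingleton();
                }
            }
        }
        return result;
    }
}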
Initialisation-on-demand holder only works for a singleton; you can't have per-instance lazily loaded elements. Double-checked locking imposes a cognitive burden on everyone who has to look at the class, as it is easy to get wrong in subtle ways. We used to have all sorts of trouble with this until we encapsulated the pattern into a utility class in our concurrency library.
We have the following options:
Supplier<ExpensiveThing> t1 = new LazyReference<ExpensiveThing>() {
    protected ExpensiveThing create() {
        … // expensive initialisation
    }
};

Supplier<ExpensiveThing> t2 = Lazy.supplier(new Supplier<ExpensiveThing>() {
    public ExpensiveThing get() {
        … // expensive initialisation
    }
});
Both have identical semantics as far as the usage is concerned. The second form makes any references used by the inner supplier available to GC after initialisation. The second form also has support for timeouts with TTL/TTI strategies.
Initialization-on-demand holder is always best practice for implementing singleton pattern. It exploits the following features of the JVM very well.
Static nested classes are loaded only when called by name.
The class loading mechanism is by default concurrency protected. So when a thread initializes a class, the other threads wait for its completion.
Also, you don't have to use the synchronized keyword, which would slow down every call.
I suspect that the initialization-on-demand holder is marginally faster than double-checked locking (using a volatile). The reason is that the former has no synchronization overhead once the instance has been created, whereas the latter involves reading a volatile, which (I think) entails a full memory read.
If performance is not a significant concern, then the synchronized getInstance() approach is the simplest.

Why have classes been added to Java which are not thread-safe?

I am seeing a lot of classes being added to Java which are not thread safe.
For example, StringBuilder is not thread-safe while StringBuffer is, and yet StringBuilder is recommended over StringBuffer.
Also various collection classes are not thread safe.
Isn't being thread-safe a good thing?
Or am I just being stupid and not yet understanding the meaning of being thread-safe?
Because thread safety makes things slower, and not everything has to be multi-threaded.
Consider reading this article to learn the basics of thread safety:
http://en.wikipedia.org/wiki/Thread_safety
When you are comfortable enough with threads (or not), consider reading this book; it has great reviews:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601
Some classes are not suitable for use across multiple threads. StringBuffer is one of them, IMHO.
It is very hard to find even a contrived example where you would use StringBuffer in a multi-threaded way that could not be more simply achieved in other ways.
Thread safety is not an all-or-nothing property. Ten years ago some books recommended marking all methods of a class as synchronized in order to make it thread-safe. This costs some performance, but it is far from a guarantee that your overall program is thread-safe. Therefore, you have costs with a questionable gain. That is why classes are still being added to the Java library which are not thread-safe.
The "make every method synchronized" strategy is only able to provide guarantees about the consistency of one object, and it has the potential to introduce deadlocks, or to be weaker than thought (think about wait()).
There is a performance overhead to thread-safe code. If you do not need the class in a concurrent context but need the performance to be high, then the original, synchronized classes are not ideal.
A typical usage of StringBuilder is something like:
return new StringBuilder().append("this").append("that").toString();
all in one thread, with no need to synchronize anything.
