Why Java/.NET allows every object to act as a lock? [duplicate] - java

Making every object lockable looks like a design mistake:
You add extra cost for every object created, even though you'll actually use it only in a tiny fraction of the objects.
Lock usage become implicit, having lockMap.get(key).lock() is more readable than synchronization on arbitrary objects, eg, synchronize (key) {...}.
Synchronized methods can cause subtle error of users locking the object with the synchronized methods
You can be sure that when passing an object to a 3rd parting API, it's lock is not being used.
eg
class Syncer {
synchronized void foo(){}
}
...
Syncer s = new Syncer();
synchronize(s) {
...
}
// in another thread
s.foo() // oops, waiting for previous section, deadlocks potential
Not to mention the namespace polution for each and every object (in C# at least the methods are static, in Java synchronization primitives have to use await, not to overload wait in Object...)
However I'm sure there is some reason for this design. What is the great benefit of intrinsic locks?

You add extra cost for every object created, even though you'll
actually use it only in a tiny fraction of the objects.
That's determined by the JVM implementation. The JVM specification says, "The association of a monitor with an object may be managed in various ways that are beyond the scope of this specification. For instance, the monitor may be allocated and deallocated at the same time as the object. Alternatively, it may be dynamically allocated at the time when a thread attempts to gain exclusive access to the object and freed at some later time when no thread remains in the monitor for the object."
I haven't looked at much JVM source code yet, but I'd be really surprised if any of the common JVMs handled this inefficiently.
Lock usage become implicit, having lockMap.get(key).lock() is more
readable than synchronization on arbitrary objects, eg, synchronize
(key) {...}.
I completely disagree. Once you know the meaning of synchronize, it's much more readable than a chain of method calls.
Synchronized methods can cause subtle error of users locking the
object with the synchronized methods
That's why you need to know the meaning of synchronize. If you read about what it does, then avoiding these errors becomes fairly trivial. Rule of thumb: Don't use the same lock in multiple places unless those places need to share the same lock. The same thing could be said of any language's lock/mutex strategy.
You can be sure that when passing an object to a 3rd parting API, it's
lock is not being used.
Right. That's usually a good thing. If it's locked, there should be a good reason why it's locked. Other threads (third party or not) need to wait their turns.
If you synchronize on myObject with the intent of allowing other threads to use myObject at the same time, you're doing it wrong. You could just as easily synchronize the same code block using myOtherObject if that would help.
Not to mention the namespace polution for each and every object (in C#
at least the methods are static, in Java synchronization primitives
have to use await, not to overload wait in Object...)
The Object class does include some convenience methods related to synchronization, namely notify(), notifyAll(), and wait(). The fact that you haven't needed to use them doesn't mean they aren't useful. You could just as easily complain about clone(), equals(), toString(), etc.

Actually you only have reference to that monitor in each object; the real monitor object is created only when you use synchronization => not so much memory is lost.
The alternative would be to add manually monitor to those classes that you need; this would complicate the code very much and would be more error-prone. Java has traded performance for productivity.

One benefit is automatic unlock on exit from synchronized block, even by exception.

I assume that like toString(), the designers thought that the benifits outweighed the costs.
Lots of decisions had to be made and a lot of the concepts were untested (Checked exceptions-ack!) but overall I'm sure it's pretty much free and more useful than an explicit "Lock" object.
Also do you add a "Lock" object to the language or the library? Seems like a language construct, but objects in the library very rarely (if ever?) have special treatment, but treating threading more as a library construct might have slowed things down..

Related

How are the final multi-threading guarantees and the memory model related in Java?

The memory model is defined in 17.4. Memory Model.
The final field multi-threading guarantees are given in 17.5. final Field Semantics.
I don't understand why these are separate sections.
AFAIK both final and the memory model provide some guarantees.
And any real program execution must respect both guarantees.
But it's now clear whether the final guarantees work for the intermediate executions used to validate causality requirements in 17.4.8. Executions and Causality Requirements.
Another unclear moment is that 17.5.1. Semantics of final Fields defines a new "special" happens-before, which differs from the happens-before in the memory model:
This happens-before ordering does not transitively close with other happens-before orderings.
If these are the same happens-before, then the happens-before isn't a partial order anymore (because it isn't transitive).
I don't understand how this doesn't break things.
If these are different happens-before, then it's not clear what the one in 17.5. final Field Semantics does.
The happens-before in 17.4. Memory Model is used to restrict what a read can return:
Informally, a read r is allowed to see the result of a write w if there is no happens-before ordering to prevent that read.
But 17.5. final Field Semantics is a different section.
The special 'final field guarantees' part was a later add-on. Documentation sometimes follows the quirks of history - possibly, had the 'final field guarantee' issue been discovered prior to the first release of the JMM, the documentation would have been structured differently.
In other words, you're asking for 'why is this stuff in a separate chapter' and perhaps the answer is: "Because it was added in a later version of java, and therefore it was written at a completely different time; a new chapter is presumably the simplest way to add some more documentation". We're talking about decades ago at this point, of course.
§17.5 explains its purpose. Quote:
The usage model for final fields is a simple one: Set the final fields for an object in that object's constructor; and do not write a reference to the object being constructed in a place where another thread can see it before the object's constructor is finished. If this is followed, then when the object is seen by another thread, that thread will always see the correctly constructed version of that object's final fields. It will also see versions of any object or array referenced by those final fields that are at least as up-to-date as the final fields are.
In other words, in the distant past, you could do this:
Thread A:
Make a new object. The constructor is 'well behaved' 1
Communicate the ref to this new object to another thread. Possibly in an unsafe way.
Thread B:
The receiving thread gets the correct ref (either because you did it safely with synchronization, i.e. happens-before relationship set up properly, or because you did it unsafely, but the JMM does not guarantee that unsafe code fails to work: It may still work).
It calls a method of this object.
Said object witnesses a final-marked field that isn't initialized, because the initialization did occur in thread A, but no happens-before relationship exists, and re-ordering and other shenanigans means that this thread doesn't see it yet.
This is extremely annoying. Part of the point of immutable classes is that you can more or less print out the JMM and set it on fire. You just don't need to care about virtually every tricky rule in it if your system is an amalgamation of immutable types. Except, it didn't actually work out that way prior to the existence of §17.5
The JMM as a general principle is designed to give any JVM implementations as few 'handcuffs' as possible whilst making developing for the JVM as uncomplicated as possible. It's a fine line - for example, had the JMM simply stated: "The JVM is free to re-order whatever it wants at any time, and cache whatever it wants, at any time, for whatever duration it wants", then writing JVMs that run code quickly and according to spec would be 'easier' (JVM impls would be faster), but, writing multithreaded code that actually does what you intended it to becomes borderline impossible. On the flipside, the JMM could also have guaranteed that re-ordering in the JVM is impossible to observe regardless of circumstance or architecture. But then JVMs would be slow as molasses, see Python and its much maligned global interpreter lock.
The JMM tries to be the happy compromise. And §17.5 is written with the same spirit.
It basically says:
You CAN rely on the notion that any well-behaved construction means that final fields will just work out without having to worry about happens-before relationships whatsoever.
However, you CANNOT, at all, rely on HOW the JVM implements the guarantee. In particular, we have defined what you can exactly rely on in terms of Happens-Before, but it's not the same H-B that the rest of the JMM talks about. We guarantee you that well-behaved construction means final fields won't be an issue but that's as far as our guarantee goes: You cannot use this guarantee to then force other guarantees out of the JMM; you can't use this mechanism as a wonky way to establish H-B for other stuff, for example.
The JMM buys room to maneuver for JVM impls. Whether a JVM impl actually uses it, is up to the JVM implementor. In other words, a JVM implementor may well decide to implement §17.5 by using the same locking mechanisms it uses to guarantee the H-B stuff in §17.4, and thus effectively you can apply properties like 'H-B relationships are transitive'. The point of the JMM is partly to allow JVM impls to take some pretty drastically different approaches to how the guarantees it dictates are in fact guaranteed. That's because JVMs have to be written so that they can run code about as fast as native code could on a wide variety of hardware, whilst still being a target platform that isn't impossible to develop for.
Quite the tightrope walk. This is the primary underlying explanation for the JMM can be obtuse and bizarre at times.
[1] A 'well behaved' constructor:
Does not pass its own reference (this) to any code outside of its own class during construction.
Does not invoke any of its own instance methods that then read its own fields (or, especially problematic, which can be overridden by subclasses, whose implementation uses its own fields). Basically: Calling any non-final method is an instant "You are not well behaved" violation.
Does not send any object refs of things I wishes to store in fields to code in other classes during construction. Even if it has already assigned it to the final field before doing so.

Risks of volatile-mutable fields in single-threaded contexts?

Is it safe to use the :volatile-mutable qualifier with deftype in a single-threaded program? This is a follow up to this question, this one, and this one. (It's a Clojure question, but I added the "Java" tag because Java programmers are likely to have insights about it, too.)
I've found that I can get a significant performance boost in a program I'm working on by using :volatile-mutable fields in a deftype rather than atoms, but I'm worried because the docstring for deftype says:
Note well that mutable fields are extremely difficult to use
correctly, and are present only to facilitate the building of higher
level constructs, such as Clojure's reference types, in Clojure
itself. They are for experts only - if the semantics and implications
of :volatile-mutable or :unsynchronized-mutable are not immediately
apparent to you, you should not be using them.
In fact, the semantics and implications of :volatile-mutable are not immediately apparent to me.
However, chapter 6 of Clojure Programming, by Emerick, Carper, and Grand says:
"Volatile" here has the same meaning as the volatile field modifier in
Java: reads and writes are atomic and must be executed in
program order; i.e., they cannot be reordered by the JIT compiler or
by the CPU. Volatiles are thus unsurprising and thread-safe — but
uncoordinated and still entirely open to race conditions.
This seems to imply that as long as accesses to a single volatile-mutable deftype field all take place within a single thread, there is nothing to special to worry about. (Nothing special, in that I still have to be careful about how I handle state if I might be using lazy sequences.) So if nothing introduces parallelism into my Clojure program, there should be no special danger to using deftype with :volatile-mutable.
Is that correct? What dangers am I not understanding?
That's correct, it's safe. You just have to be sure that your context is really single-threaded. Sometimes it's not that easy to guarantee that.
There's no risk in terms of thread-safety or atomicity when using a volatile mutable (or just mutable) field in a single-threaded context, because there's only one thread so there's no chance of two threads writing a new value to the field at the same time, or one thread writing a new value based on outdated values.
As others have pointed out in the comments you might want to simply use an :unsynchronized-mutable field to avoid the cost introduced by volatile. That cost comes from the fact that every write must be committed to main memory instead of thread local memory. See this answer for more info about this.
At the same time, you gain nothing by using volatile in a single-threaded context because there's no chance of having one thread writing a new value that will not be "seen" by other thread reading the same field.
That's what a volatile is intended for, but it's irrelevant in a single-thread context.
Also note that clojure 1.7 introduced volatile! intended to provide a "volatile box for managing state" as a faster alternative to
atom, with a similar interface but without it's compare and swap semantics. The only difference when using it is that you call vswap! and vreset! instead of swap! and reset!. I would use that instead of
deftype with ^:volatile-mutable if I need a volatile.

safe publication and the advantage of being immutable vs. effectively immutable

I'm re-reading Java Concurrency In Practice, and I'm not sure I fully understand the chapter about immutability and safe publication.
What the book says is:
Immutable objects can be used safely by any thread without additional
synchronization, even when synchronization is not used to publish
them.
What I don't understand is, why would anyone (interested in making his code correct) publish some reference unsafely?
If the object is immutable, and it's published unsafely, I understand that any other thread obtaining a reference to the object would see its correct state, because of the guarantees offered by proper immutability (with final fields, etc.).
But if the publication is unsafe, another thread might still see null or the previous reference after the publication, instead of the reference to the immutable object, which seems to me like something no-one would like.
And if safe publication is used to make sure the new reference is seen by all the threads, then even if the object is just effectively immutable (no final fields, but no way to mute them), then everything is safe again. As the book says :
Safely published effectively immutable objects can be used safely by
any thread without additional synchronization.
So, why is immutability (vs. effective immutability) so important? In what case would an unsafe publication be wanted?
It is desirable to design objects that don't need synchronization for two reasons:
The users of your objects can forget to synchronize.
Even though the overhead is very little, synchronization is not free, especially if your objects are not used often and by many different threads.
Because the above reasons are very important, it is better to learn the sometimes difficult rules and as a writer, make safe objects that don't require synchronization rather than hoping all the users of your code will remember to use it correctly.
Also remember that the author is not saying the object is unsafely published, it is safely published without synchronization.
As for your second question, I just checked, and the book does not promise you that another thread will always see the reference to the updated object, just that if it does, it will see a complete object. But I can imagine that if it is published through the constructor of another (Runnable?) object, it will be sweet. That does help with explaining all cases though.
EDIT:
effectively immutable and immutable
The difference between effectively immutable and immutable is that in the first case you still need to publish the objects in a safe way. For the truly immutable objects this isn't needed. So truly immutable objects are preferred because they are easier to publish for the reasons I stated above.
So, why is immutability (vs. effective immutability) so important?
I think the main point is that truly immutable objects are harder to break later on. If you've declared a field final, then it's final, period. You would have to remove the final in order to change that field, and that should ring an alarm. But if you've initially left the final out, someone could carelessly just add some code that changes the field, and boom - you're screwed - with only some added code (possibly in a subclass), no modification to existing code.
I would also assume that explicit immutability enables the (JIT) compiler to do some optimizations that would otherwise be hard or impossible to justify. For example, when using volatile fields, the runtime must guarantee a happens-before relation with writing and reading threads. In practice this may require memory barriers, disabling out-of-order execution optimizations, etc. - that is, a performance hit. But if the object is (deeply) immutable (contains only final references to other immutable objects), the requirement can be relaxed without breaking anything: the happens-before relation needs to be guaranteed only with writing and reading the one single reference, not the whole object graph.
So, explicit immutability makes the program simpler so that it's both easier for humans to reason and maintain and easier for the computer to execute optimally. These benefits grow exponentially as the object graph grows, i.e. objects contain objects that contain objects - it's all simple if everything is immutable. When mutability is needed, localizing it to strictly defined places and keeping everything else immutable still gives lots of these benefits.
I had the exact same question as the original poster when finishing reading chapters 1-3 . I think the authors could have done a better job elaborating on this a bit more.
I think the difference lies therein that the internal state of effectively immutable objects can be observed to be in an inconsistent state when they are not safely published whereas the internal state of immutable objects can never be observed to be in an inconsistent state.
However I do think the reference to an immutable object can be observed to be out of date / stale if the reference is not safely published.
"Unsafe publication" is often appropriate in cases where having other threads see the latest value written to a field would be desirable, but having threads see an earlier value would be relatively harmless. A prime example is the cached hash value for String. The first time hashCode() is called on a String, it will compute a value and cache it. If another thread which calls hashCode() on the same string can see the value computed by the first thread, it won't have to recompute the hash value (thus saving time), but nothing bad will happen if the second thread doesn't see the hash value. It will simply end up performing a redundant-but-harmless computation which could have been avoided. Having hashCode() publish the hash value safely would have been possible, but the occasional redundant hash computations are much cheaper than the synchronization required for safe publication. Indeed, except on rather long strings, synchronization costs would probably negate any benefit from caching.
Unfortunately, I don't think the creators of Java imagined situations where code would write to a field and prefer that it should be visible to other threads, but not mind too much if it isn't, and where the reference stored to the field would in turn identify another object with a similar field. This leads to situations writing semantically-correct code is much more cumbersome and likely slower than code which would be likely to work but whose semantics would not be guaranteed. I don't know any really good remedy for that in some cases other than using some gratuitous final fields to ensure that things get properly "published".

In a class that has many instances, is it better to use synchronization, or an atomic variable for fields?

I am writing a class of which will be created quite a few instances. Multiple threads will be using these instances, so the getters and setters of the fields of the class have to be concurrent. The fields are mainly floats. Thing is, I don't know what is more resource-hungry; using a synchronized section, or make the variable something like an AtomicInteger?
You should favor atomic primitives when it is possible to do so. On many architectures, atomic primitives can perform a bit better because the instructions to update them can be executed entirely in user space; I think that synchronized blocks and Locks generally need some support from the operating system kernel to work.
Note my caveat: "when it is possible to do so". You can't use atomic primitives if your classes have operations that need to atomically update more than one field at a time. For example, if a class has to modify a collection and update a counter (for example), that can't be accomplished using atomic primitives alone, so you'd have to use synchronized or some Lock.
The question already has an accepted answer, but as I'm not allowed to write comments yet here we go. My answer is that it depends. If this is critical, measure. The JVM is quite good at optimizing synchronized accesses when there is no (or little) contention, making it much cheaper than if a real kernel mutex had to be used every time. Atomics basically use spin-locks, meaning that they will try to make an atomic change and if they fail they will try again and again until they succeed. This can eat quite a bit of CPU is the resource is heavily contended from many threads.
With low contention atomics may well be the way to go, but in order to be sure try both and measure for your intended application.
I would probably start out with synchronized methods in order to keep the code simple; then measure and make the change to atomics if it makes a difference.
It is very important to construct the instances properly before they have been used by multiple threads. Otherwise those threads will get incomplete or wrong data from those partially constructed instances. My personal preference would be to use synchronized block.
Or you can also follow the "Lazy initialization holder class idiom" outlined by Brain Goetz in his book "Java concurrency in Practice":
#ThreadSafe
public class ResourceFactory {
private static class ResourceHolder {
public static Resource resource = new Resource();
}
public static Resource getResource() {
return ResourceHolder.resource;
}
}
Here the JVM defers initializing the ResourceHolder class until it is actually used. Moreover Resource is initialized with a static initializer, no additional synchronization is needed.
Note: Statically initialized objects require no explicit synchronization either during construction or when being referenced. But if the object is mutable, synchronization is still required by both readers and writers to make subsequent modifications visible and also to avoid data corruption.

questions around synchronization in java; when/how/to what extent

I am working on my first mutlithreaded program and got stuck about a couple of aspects of synchronization. I have gone over the multi-threading tutorial on oracle/sun homepage, as well as a number of questions here on SO, so I believe I have an idea of what synchronization is. However, as I mentioned there are a couple of aspects I am not quite sure how to figure out. I formulated them below in form of clear-cut question:
Question 1: I have a singleton class that holds methods for checking valid identifiers. It turns out this class needs to hold to collections to keep track of associations between 2 different identifier types. (If the word identifier sounds complicated; these are just strings). I chose to implement two MultiValueMap instances to implement this many-to-many relationship. I am not sure if these collections have to be thread-safe as the collection will be updated only at the creation of the instance of the singleton class but nevertheless I noticed that in the documentation it says:
Note that MultiValueMap is not synchronized and is not thread-safe. If you wish to use this map from multiple threads concurrently, you must use appropriate synchronization. This class may throw exceptions when accessed by concurrent threads without synchronization.
Could anyone elaborate on this "appropriate synchronization"? What exactly does it mean? I can't really use MultiValueMap.decorate() on a synchronized HashMap, or have I misunderstood something?
Question 2: I have another class that extends a HashMap to hold my experimental values, that are parsed in when the software starts. This class is meant to provide appropriate methods for my analysis, such as permutation(), randomization(), filtering(criteria) etc. Since I want to protect my data as much as possible, the class is created and updated once, and all the above mentioned methods return new collections. Again, I am not sure if this class needs to be thread-safe, as it's not supposed to be updated from multiple threads, but the methods will most certainly be called from a number of threads, and to be "safe" I have added synchronized modifier to all my methods. Can you foresee any problems with that? What kind of potential problems should I be aware of?
Thanks,
Answer 1: Your singleton class should not expose the collections it uses internally to other objects. Instead it should provide appropriate methods to expose the behaviours you want. For example, if your object has a Map in it, don't have a public or protected method to return that Map. Instead have a method that takes a key and returns the corresponding value in the Map (and optionally one that sets the value for the key). These methods can then be made thread safe if required.
NB even for collections that you do not intend to write to, I don't think you should assume that reads are necessarily thread safe unless they are documented to be so. The collection object might maintain some internal state that you don't see, but might get modified on reads.
Answer 2: Firstly, I don't think that inheritance is necessarily the correct thing to use here. I would have a class that provides your methods and has a HashMap as a private member. As long as your methods don't change the internal state of the object or the HashMap, they won't have to be synchronised.
It's hard to give general rules about synchronization, but your general understanding is right. A data-structure which is used in a read-only way, does not have to be synchronized. But, (1) you have to ensure that nobody (i.e. no other thread) can use this structure before it is properly initialized and (2) that the structure is indeed read-only. Remember, even iterators have a remove method.
To your second question: In order to ensure the immutability, i.e. that it is read-only, I would not inherit the HashMap but use it inside your class.
Synchronization commonly is needed when you either could have concurrent modifications of the underlying data or one thread modifies the data while another reads and needs to see that modification.
In your case, if I understand it correctly, the MultiValueMap is filled once upon creation and the just read. So unless reading the map would modify some internals it should be safe to read it from multiple threads without synchronization. The creation process should be synchronized or you should at least prevent read access during initialization (a simple flag might be sufficient).
The class you descibe in question 2 might not need to be synchronized if you always return new collections and no internals of the base collection are modified during creation of those "copies".
One additional note: be aware of the fact that the values in the collections might need to be synchronized as well, since if you safely get an object from the collection in multiple thread but then concurrently modify that object you'll still get problems.
So as a general rule of thumb: read-only access does not necessarily need synchronization (if the objects are not modified during those reads or if that doesn't matter), write access should generally be synchronized.
If your maps are populated once, at the time the class is loaded (i.e. in a static initializer block), and are never modified afterwards (i.e. no elements or associations are added / removed), you are fine. Static initialization is guaranteed to be performed in a thread safe manner by the JVM, and its results are visible to all threads. So in this case you most probably don't need any further synchronization.
If the maps are instance members (this is not clear to me from your description), but not modified after creation, I would say again you are most probably safe if you declare your members final (unless you publish the this object reference prematurely, i.e. pass it to the outside world from the cunstructor somehow before the constructor is finished).

Categories