java.util.concurrent vs. Boost Threads library

How do the Boost Thread libraries compare against the java.util.concurrent libraries?
Performance is critical, so I would prefer to stay with C++ (although Java is a lot faster these days). Given that I have to code in C++, what libraries exist to make threading easy and less error-prone?
I have heard recently that as of JDK 1.5, the Java memory model was changed to fix some concurrency issues. How about C++? The last time I did multithreaded programming in C++ was 3-4 years ago, when I used pthreads. I don't wish to use that anymore for a large project, though. The only other alternative I know of is Boost Threads, but I am not sure if it is good. I've heard good things about java.util.concurrent, but nothing yet about Boost threads.

java.util.concurrent and the Boost threads library have overlapping functionality, but java.util.concurrent also provides (a) higher-level abstractions and (b) lower-level functions.
Boost threads provide:
Thread (Java: java.lang.Thread)
Locking (Java: java.lang.Object and java.util.concurrent.locks)
Condition Variables (Java: java.lang.Object and java.util.concurrent)
Barrier (Java: java.util.concurrent.CyclicBarrier)
java.util.concurrent also has:
Semaphores
Reader-writer locks
Concurrent data structures, e.g. a BlockingQueue or a concurrent lock-free hash map.
The Executor services, a highly flexible producer-consumer system (see the sketch below).
Atomic operations.
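For a sense of what those higher-level abstractions buy you, here is a minimal Java sketch using an Executor service (class and task names are illustrative; Java 8 lambda syntax used for brevity):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// The pool owns the worker threads and the work queue, so there is no
// explicit Thread, lock, or condition variable in user code.
public class ExecutorSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            final int task = i;
            pool.submit(() -> System.out.println(
                    "task " + task + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                            // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the queue to drain
    }
}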
A side note: C++ currently has no memory model, so the same C++ application may have to deal with a different memory model on every machine it runs on. This makes portable, concurrent programming in C++ even trickier.

Boost threads are a lot easier to use than pthreads, and, in my opinion, slightly easier to use than Java threads. When a boost thread object is instantiated, it launches a new thread. The user supplies a function or function object which will run in that new thread.
It's really as simple as:
boost::thread thr((MyFunc())); // extra parentheses avoid the "most vexing parse"
thr.join();                    // wait for the thread to finish
You can easily pass data to the thread by storing values inside the function object. And in the latest version of boost, you can pass a variable number of arguments to the thread constructor itself, which will then be passed to your function object's () operator.
You can also use RAII-style locks with boost::mutex for synchronization.
Note that C++0x will use the same syntax for std::thread.
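For comparison, Java has no destructors, so the closest it gets to the RAII locking just mentioned is try/finally (a minimal Java sketch using ReentrantLock; the class is illustrative):
import java.util.concurrent.locks.ReentrantLock;

// The finally block plays the role of the scoped lock's destructor,
// guaranteeing the lock is released on every exit path.
class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;

    int increment() {
        lock.lock();
        try {
            return ++count;
        } finally {
            lock.unlock(); // always runs, even if the body throws
        }
    }
}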

Performance-wise I wouldn't really worry. It is my gut feeling that a Boost/C++ expert could write faster code than a Java expert, but any advantage would have to be fought for.
I prefer Boost's design paradigms to Java's. Java is OO all the way, whereas Boost/C++ allows for OO if you like but lets you use the most useful paradigm for the problem at hand. In particular I love RAII when dealing with locks. Java handles memory management beautifully, but sometimes it feels like the rest of a programmer's resources get shafted: file handles, mutexes, DB connections, sockets, etc.
Java's concurrent library is more extensive than Boost's: thread pools, concurrent containers, atomics, etc. But the core primitives are on par with each other: threads, mutexes, condition variables.
So for performance I'd say it's a wash. If you need lots of high-level concurrent library support, Java wins. If you prefer paradigm freedom, C++ wins.

If performance is an issue in your multithreaded program, then you should consider a lock-free design.
Lock-free means threads do not compete for a shared resource and that minimizes switching costs. In that department, Java has a better story IMHO, with its concurrent collections. You can rapidly come up with a lock-free solution.
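As a minimal illustration, two lock-free building blocks from java.util.concurrent (a sketch; the class is illustrative):
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Threads update the counter with a hardware compare-and-set and exchange
// work through a non-blocking queue, so no thread ever parks on a mutex.
class LockFreeSketch {
    private final AtomicLong produced = new AtomicLong();
    private final Queue<String> queue = new ConcurrentLinkedQueue<String>();

    void produce(String item) {
        queue.offer(item);          // lock-free enqueue
        produced.incrementAndGet(); // atomic CAS-based increment
    }

    String consume() {
        return queue.poll();        // returns null if empty, never blocks
    }
}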
Having used the Boost Thread lib a bit (but not extensively), I can say that your thinking will be influenced by what's available, and that means essentially a locking solution.
Writing a lock-free C++ solution is very difficult, because of the lack of library support, and also conceptually, because C++ is missing a memory model that guarantees you can write truly immutable objects.
This book is a must-read: Java Concurrency in Practice

If you're targeting a specific platform then the direct OS call will probably be a little faster than using boost for C++. I would tend to use ACE, since you can generally make the right calls for your main platform and it will still be platform-independent. Java should be about the same speed so long as you can guarantee that it will be running on a recent version.

In C++ one can use pthreads directly (pthread_create() etc.) if one wants to. Internally, Java uses pthreads via its run-time environment; do "ldd " on the JVM's libraries to see.


Why define the Java memory model?

Java's multithreaded code is ultimately mapped to operating system threads for execution.
Is the operating system thread not thread safe?
Why use the Java memory model to ensure thread safety?
I hope someone can answer this question; I have looked up a lot of information on the Internet and still do not understand!
The material on the web is all about atomicity, visibility, and ordering, using the cache-consistency model as an example, but I don't think it really answers the question.
Thank you very much!
The operating system thread is not thread safe (that statement does not make a lot of sense, but basically, the operating system does not ensure that the intended atomicity of your code is respected).
The problem is that whether two data items are related and therefore need to be synchronized is only really understood by your application.
For example, imagine you are defining a ListOfIntegers class which contains an int array and count of the number of items used in the array. These two data items are related and the way they are updated needs to be co-ordinated in order to ensure that if the object is accessed by two different threads they are always updated in a consistent manner, even if the threads update them simultaneously. Only your application knows how these data items are related. The operating system doesn't know. They are just two pieces of memory as far as it is concerned. That is why you have to implement the thread safety (by using synchronized or carefully arranging how the fields are updated).
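A Java sketch of that ListOfIntegers class (the name comes from the example above; capacity handling is simplified):
// Both related updates happen under one lock, so no thread can ever observe
// the element stored but the count not yet incremented, or vice versa.
class ListOfIntegers {
    private final int[] items = new int[100]; // fixed capacity for brevity
    private int count;                        // number of slots in use

    synchronized void add(int value) {
        items[count] = value; // update 1: store the element
        count++;              // update 2: publish the new size
    }

    synchronized int get(int index) {
        if (index >= count) throw new IndexOutOfBoundsException();
        return items[index];
    }
}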
The Java "memory model" is pretty close to the hardware model. There is a stack for primitives and objects are allocated on the heap. Synchronization is provided to allow the programmer to lock access to shared data on the heap. In addition, there are rules that the optimizer must follow so that the opimisations don't defeat the synchronisations put in place.
Every programming language that takes concurrency seriously needs a memory model - and here is why.
The memory model is the crux of the concurrency semantics of shared-memory systems. It defines the possible values that a read operation is allowed to return for any given set of write operations performed by a concurrent program, thereby defining the basic semantics of shared variables. In other words, the memory model specifies the set of allowed outputs of a program's read and write operations, and constrains an implementation to produce only (but at least one of) such allowed executions. The memory model may, and often does, allow executions where the outcome cannot be inferred from the order in which read and write operations occur in the program. It is impossible to meaningfully reason about a program or any part of the programming language implementation without an unambiguous memory model. Conversely, the memory model also defines which instruction reorderings may be permitted, either by the processor, the memory system, or the compiler.
This is an excerpt from the paper Memory Models for C/C++ Programmers which I have co-authored. Even though a large part of it is dedicated to the C++ memory model, it also covers more general areas -
starting with the reason why we need a memory model in the first place, explaining the (intuitive) sequentially consistent model, and finally the weaker memory models provided by current hardware in x86 and ARM/POWER CPUs.
The Java Memory Model answers the following question: what shall happen when multiple threads modify the same memory location?
And the answer the memory model gives is:
If a program has no data races, then all executions of the program will appear to be sequentially consistent.
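To make "data race" concrete, the classic sketch below is data-race-free only because ready is volatile; remove the keyword and the reader may loop forever or print 0:
// The volatile write to 'ready' happens-before the volatile read that sees
// it, so a reader observing ready == true must also observe number == 42.
// Without volatile this is a data race.
class DataRaceFree {
    private volatile boolean ready = false;
    private int number = 0;

    void writer() {                 // thread A
        number = 42;
        ready = true;
    }

    void reader() {                 // thread B
        while (!ready) { /* spin */ }
        System.out.println(number); // guaranteed to print 42
    }
}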
There is a great paper by Sarita V. Adve and Hans-J. Boehm about why the Java and C++ memory models are designed the way they are: Memory Models: A Case For Rethinking Parallel Languages and Hardware
From the paper:
We have been repeatedly surprised at how difficult it is to formalize the seemingly simple and fundamental property of “what value a read should return in a multithreaded program.”
Memory models, which describe the semantics of shared variables, are crucial to both correct multithreaded applications and the entire underlying implementation stack. It is difficult to teach multithreaded programming without clarity on memory models.
After much prior confusion, major programming languages are converging on a model that guarantees simple interleaving-based semantics for "data-race-free" programs and most hardware vendors have committed to support this model.

Biased locking in Java

I keep reading about how biased locking, enabled with the flag -XX:+UseBiasedLocking, can improve the performance of uncontended synchronization. I couldn't find a reference explaining what it does and how it improves performance.
Can anyone explain what exactly it is, or point me to some links/resources that explain it?
Essentially, if your objects are locked by only one thread, the JVM can make an optimization and "bias" the object to that thread in such a way that subsequent atomic operations on the object incur no synchronization cost. I suppose this is typically geared towards overly conservative code that takes locks on objects without ever exposing them to another thread. The actual synchronization overhead will only kick in once another thread tries to obtain a lock on the object.
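The classic beneficiary looks like the following sketch (StringBuffer's append really is a synchronized method, so every call acquires the monitor):
class BiasedLockingBeneficiary {
    // 'sb' is only ever locked by this thread, so after the first acquisition
    // the JVM can bias the monitor to it; the remaining acquisitions skip the
    // expensive atomic operation.
    static String buildString() {
        StringBuffer sb = new StringBuffer(); // every append() is synchronized
        for (int i = 0; i < 10000; i++) {
            sb.append(i);                     // monitor enter/exit on each call
        }
        return sb.toString();
    }
}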
It is on by default in Java 6.
-XX:+UseBiasedLocking
Enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.
Does this not answer your questions?
http://www.oracle.com/technetwork/java/tuning-139912.html#section4.2.5
Enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.
Though I think you'll find it's on by default in 1.6. Use the -XX:+PrintFlagsFinal diagnostic option to see what the effective flags are. Make sure you specify -server if you're investigating a server application, because the flags can differ:
http://www.jroller.com/ethdsy/entry/print_all_jvm_flags
I've been wondering about biased locks myself.
However, it seems that Java's biased locks are slower than normal locks on Intel's Nehalem processors, and presumably on the two generations of processors since Nehalem. See http://mechanical-sympathy.blogspot.com/2011/11/java-lock-implementations.html
and here http://www.azulsystems.com/blog/cliff/2011-11-16-a-short-conversation-on-biased-locking
Also more information here https://blogs.oracle.com/dave/entry/biased_locking_in_hotspot
I've been hoping that there is some relatively cheap way to revoke a biased lock on Intel, but I'm beginning to believe that isn't possible. The articles I've seen on how it's done rely on one of:
using the OS to stop the thread
sending a signal, i.e. running code in the other thread
having safe points that are guaranteed to run fairly often in the other thread, and waiting for one to be executed (which is what Java does)
having similar safe points that are a call to a return - and the other thread MODIFIES THE CODE to a breakpoint...
Worth mentioning that biased locking is disabled by default from JDK 15 onwards:
JEP 374: Disable and Deprecate Biased Locking
The performance gains seen in the past are far less evident today. Many applications that benefited from biased locking are older, legacy applications that use the early Java collection APIs, which synchronize on every access (e.g., Hashtable and Vector). Newer applications generally use the non-synchronized collections (e.g., HashMap and ArrayList), introduced in Java 1.2 for single-threaded scenarios, or the even more-performant concurrent data structures, introduced in Java 5, for multi-threaded scenarios.
Further
Biased locking introduced a lot of complex code into the synchronization subsystem and is invasive to other HotSpot components as well. This complexity is a barrier to understanding various parts of the code and an impediment to making significant design changes within the synchronization subsystem. To that end we would like to disable, deprecate, and eventually remove support for biased locking.
And ya, no more System.identityHashCode(o) magic ;)
Two papers here:
https://cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/f4a5b21d-66fa-4885-92bf-c4e81c06d916/File/ccd39237cd4dc109d91786762fba41f0/qrl_oplocks_biasedlocking.pdf
https://www.oracle.com/technetwork/java/biasedlocking-oopsla2006-wp-149958.pdf
A web page too:
https://blogs.oracle.com/dave/biased-locking-in-hotspot
JVM HotSpot source code:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/oops/markOop.hpp

Multi-Core/Concurrent Programming and .NET/Java

I often hear about other languages promoted as being more suitable for multi-core/concurrent programming, e.g. Clojure, Scala, Erlang, etc., but I'm a little confused about why I need to worry about multiple cores. Shouldn't the Java/.NET VM handle that automatically, and if not, what are the reasons behind it?
Is it because those languages mentioned are functional and have some intrinsic advantage over non-functional languages?
The reason you need to care is that processors are, generally speaking, not getting any faster. Instead, more of them are being added to computers in the form of additional cores. For a program to take advantage of the extra processors, it generally must use multiple threads. Java, and most other languages, will not automatically use more threads in your program; this is something you need to do manually.
Some people prefer functional style languages like Scala, Erlang and F# to non-functional for multi-threaded programming. Functional languages tend to be at least shallowly immutable by default and hence in theory are easier to work with in multi-threaded situations.
Here is a very good article on this subject
The Free Lunch is Over
Functional languages have the intrinsic advantage that most function/method calls are idempotent and that they typically use "immutability" a lot.
Doing concurrency when using immutable "stuff" is wwaayy easier than when using mutable "stuff".
But anyway: you need to worry about concurrency because we're getting CPUs with more and more cores, not faster and faster clocks. Java / C# do not automagically make your program concurrent: you still have to do the hard work yourself.
In the case of normal imperative languages, by default no, you do not get much help from the platform. No compiler is clever enough to parallelize normal imperative code.
There are, however, various helper libraries on different platforms. For example, the Task Parallel Library in .NET allows you to do this:
Parallel.ForEach(files, file =>
{
    Process(file);
});
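For comparison, a rough Java analogue (a sketch using parallel streams, which arrived later in Java 8; the class and process() method are illustrative):
import java.io.File;
import java.util.List;

class ParallelSketch {
    // parallelStream() splits the list across the common fork/join pool
    // and runs process() on the elements concurrently.
    static void processAll(List<File> files) {
        files.parallelStream().forEach(ParallelSketch::process);
    }

    static void process(File file) {
        System.out.println("processing " + file.getName());
    }
}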
Pure functional languages always have the benefit of minimal shared state which means that such code is more easily parallelized.
I have never programmed in functional languages, but they are supposed to be easier to code concurrently in, since state does not change (or does not change much), so you can have the same object on two threads at once. If you have the same object on two threads at once in Java, you have to be very careful and use special synchronization constructs so that the two threads "see" the object in the same state when necessary.
The current problem in programming is that everybody is used to doing things sequentially, but the hardware is moving multi-core. There has to be a complete paradigm shift in the way we code and think about code. Right now, coding for multi-core in Java or C# is basically just sequential coding with hacks to make it parallelizable. Will functional programming turn out to be the required paradigm shift? I don't know.

Does the CLR perform "lock elision" optimization? If not why not?

The JVM performs a neat trick called lock elision to avoid the cost of locking on objects that are only visible to one thread.
There's a good description of the trick here:
http://www.ibm.com/developerworks/java/library/j-jtp10185/
Does the .Net CLR do something similar? If not then why not?
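For context, the canonical candidate described in the article looks like this Java sketch (in the style of the article's example; Vector's methods really are synchronized):
import java.util.Vector;

class ElisionCandidate {
    // 'names' is method-local and never escapes, so escape analysis can
    // prove its lock is thread-local and the JIT may elide the monitor
    // operations that Vector's synchronized methods would perform.
    static String getNames() {
        Vector<String> names = new Vector<String>();
        names.add("Moe");
        names.add("Larry");
        names.add("Curly");
        return names.toString(); // the Vector reference never leaks
    }
}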
It's neat, but is it useful? I have a hard time coming up with an example where the compiler can prove that a lock is thread-local. Almost all classes don't use locking by default, and when you choose one that locks, in most cases it will be referenced from some kind of static variable, foiling the compiler optimization anyway.
Another thing is that the Java VM uses escape analysis in its proof, and AFAIK .NET hasn't implemented escape analysis. Other uses of escape analysis, such as replacing heap allocations with stack allocations, sound much more useful and should be implemented first.
IMO it's currently not worth the coding effort. There are many areas in the .net VM which are not optimized very well and have much bigger impact.
SSE vector instructions and delegate inlining are two examples from which my code would profit much more than from this optimization.
EDIT: As chibacity points out below, this is talking about making locks really cheap rather than completely eliminating them. I don't believe the JIT has the concept of "thread-local objects" although I could be mistaken... and even if it doesn't now, it might in the future of course.
EDIT: Okay, the below explanation is over-simplified, but has at least some basis in reality :) See Joe Duffy's blog post for some rather more detailed information.
I can't remember where I read this - probably "CLR via C#" or "Concurrent Programming on Windows" - but I believe that the CLR allocates sync blocks to objects lazily, only when required. When an object whose monitor has never been contested is locked, the object header is atomically updated with a compare-exchange operation to say "I'm locked". If a different thread then tries to acquire the lock, the CLR will be able to determine that it's already locked, and basically upgrade that lock to a "full" one, allocating it a sync block.
When an object has a "full" lock, locking operations are more expensive than locking and unlocking an otherwise-uncontested object.
If I'm right about this (and it's a pretty hazy memory) it should be feasible to lock and unlock a monitor on different threads cheaply, so long as the locks never overlap (i.e. there's no contention).
I'll see if I can dig up some evidence for this...
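In the meantime, here is a toy sketch of that cheap uncontended path (assuming Java 9+ for Thread.onSpinWait(); this is a plain spinlock for illustration, NOT the CLR's real sync-block scheme):
import java.util.concurrent.atomic.AtomicLong;

// An uncontended acquire is a single compare-and-set on a header word;
// only contention makes things expensive. Not reentrant.
class ToyHeaderLock {
    private final AtomicLong owner = new AtomicLong(0); // 0 == unlocked

    void lock() {
        long me = Thread.currentThread().getId();
        while (!owner.compareAndSet(0, me)) { // cheap path: succeeds first try
            Thread.onSpinWait();              // contended path: spin (a real
        }                                     // runtime would inflate instead)
    }

    void unlock() {
        owner.set(0); // volatile write publishes the release
    }
}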
In answer to your question: no, the CLR/JIT does not perform "lock elision" optimization, i.e. the CLR/JIT does not remove locks from code which is visible to only a single thread. This can easily be confirmed with simple single-threaded benchmarks on code where lock elision should apply, as you would expect in Java.
There are likely a number of reasons why it does not do this, but chief among them is the fact that in the .NET framework this is likely to be an uncommonly applicable optimization, so it is not worth the effort of implementing.
Also in .Net uncontended locks are extremely fast due to the fact they are non-blocking and executed in user space (JVMs appear to have similar optimizations for uncontended locks e.g. IBM). To quote from C# 3.0 In A Nutshell's threading chapter
Locking is fast: you can expect to acquire and release a lock in less than 100 nanoseconds on a 3 GHz computer if the lock is uncontended.
A couple of example scenarios where lock elision could be applied, and why it's not worth it:
Using locks within a method in your own code that acts purely on locals
There is not really a good reason to use locking in this scenario in the first place, so unlike optimizations such as hoisting loop invariants or method inlining, this is a pretty uncommon case and the result of unnecessary use of locks. The runtime shouldn't be concerned with optimizing out uncommon, extremely bad usage.
Using someone else's type that is declared as a local which uses locks internally
Although this sounds more useful, the .NET framework's general design philosophy is to leave the responsibility for locking to clients, so it's rare that types have any internal lock usage. Indeed, the .NET framework is pathologically unsynchronized when it comes to instance methods on types that are not specifically designed and advertised to be concurrent. On the other hand, Java has common types that do include synchronization, e.g. StringBuffer and Vector. As the .NET BCL is largely unsynchronized, lock elision is likely to have little effect.
Summary
I think overall, there are fewer cases in .Net where lock elision would kick in, because there are simply not as many places where there would be thread-local locks. Locks are far more likely to be used in places which are visible to multiple threads and therefore should not be elided. Also, uncontended locking is extremely quick.
I had difficulty finding any real-world evidence that lock elision actually provides much of a performance benefit in Java (for example...), and the latest docs for at least the Oracle JVM state that elision is not always applied for thread-local locks, hinting that it is not a guaranteed optimization anyway.
I suspect that lock elision is something that is made possible through the introduction of escape analysis in the JVM, but is not as important for performance as EA's ability to analyze whether reference types can be allocated on the stack.

Java - use of volatile only makes sense in multiprocessor systems?

"Use of volatile only makes sense in multiprocessor systems." Is this wrong?
I'm trying to learn about thread programming, so if you know any good articles/PDFs... I like material that mentions a bit about how the operating system works as well, not just the language's syntax.
No. Volatile can be used in multi-threaded applications, which may or may not run on more than one processor.
volatile is used to ensure all threads see the same copy of the data. If there is only one thread reading/writing a field, it doesn't need to be volatile; it will still work just fine, just be a bit slower.
In Java you don't have much visibility as to the processor architecture, generally you talk in terms of threads and multi-threading.
I suggest Java Concurrency in Practice; it's good whatever your level of knowledge: http://www.javaconcurrencyinpractice.com/
The whole point of using Java is you don't need to know most of the details of how threads work etc. If you learn lots of stuff you don't use you are likely to forget it. ;)
Volatile makes sense in a multithreaded program; running those threads on a single processor or multiple processors makes no difference. The volatile keyword is used to tell the JVM (and the Java Memory Model it uses) that it must not re-order or cache the value of a variable marked with this keyword. This guarantees that threads using a volatile variable will never see a stale version of it.
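The canonical illustration of that caching/re-ordering point is the stop-flag idiom (a minimal sketch):
// Without volatile, the worker may cache 'running' (or the JIT may hoist
// the read out of the loop) and spin forever; volatile guarantees the
// loop sees the other thread's write.
class Worker implements Runnable {
    private volatile boolean running = true;

    public void run() {
        while (running) {
            // do work
        }
    }

    void stop() {        // called from a different thread
        running = false;
    }
}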
See this link on the Java Memory Model in general for more details. Or this one for information about Volatile.
No. Volatile is used to support concurrency. In certain circumstances, it can be used instead of synchronization.
This article by Brian Goetz really helped me understand volatile. It has several examples of the use of volatile, and it explains under what conditions it can be used.
