I often hear other languages promoted as more suitable for multi-core/concurrent programming, e.g. Clojure, Scala, Erlang, etc., but I'm a little confused about why I need to worry about multiple cores at all. Shouldn't the Java/.NET VM handle that automatically, and if not, what are the reasons behind it?
Is it because those languages mentioned are functional and have some intrinsic advantage over non-functional languages?
The reason you need to care is that processors are, generally speaking, not getting any faster. Instead, more of them are being added to computers in the form of additional cores. For a program to take advantage of the extra processors, it generally must use multiple threads. Java, like most other languages, will not automatically use more threads in your program; that is something you have to do manually.
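To make "do it manually" concrete, here is a minimal sketch (class and method names are mine, purely illustrative) that spreads work over an ExecutorService sized to the machine's cores:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ManualParallelism {
    // Squares each input on a thread pool sized to the available cores.
    static List<Integer> squares(List<Integer> inputs) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                futures.add(pool.submit(() -> n * n)); // each task may run on a different core
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // wait for each task and collect its result
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(squares(List.of(1, 2, 3, 4))); // [1, 4, 9, 16]
    }
}
```

The point is that nothing here happens for free: you choose the pool size, split the work, and join the results yourself.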
Some people prefer functional style languages like Scala, Erlang and F# to non-functional for multi-threaded programming. Functional languages tend to be at least shallowly immutable by default and hence in theory are easier to work with in multi-threaded situations.
Here is a very good article on this subject
The Free Lunch is Over
Functional languages have the intrinsic advantage that most function/method calls are referentially transparent (free of side effects) and that they typically use immutability a lot.
Doing concurrency when using immutable "stuff" is way easier than when using mutable "stuff".
But anyway: you need to worry about concurrency because CPUs are getting more and more cores, not faster and faster clocks. Java / C# do not automagically make your program concurrent: you still have to do the hard work yourself.
In the case of normal imperative languages, by default no, you do not get much help from the platform. No compiler is clever enough to parallelize normal imperative code.
There are, however, various helper libraries on different platforms. For example, the Task Parallel Library in .NET allows you to do this:
Parallel.ForEach(files, file =>
{
Process(file);
});
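For comparison, Java's closest built-in analogue (since Java 8) is a parallel stream. This is only a sketch, not a claim that the two are semantically identical; the counter and file names are mine:

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelFiles {
    static final AtomicInteger processed = new AtomicInteger();

    static void process(Path file) {
        // Stand-in for real per-file work; runs on common-pool worker threads.
        processed.incrementAndGet();
    }

    public static void main(String[] args) {
        List<Path> files = List.of(Path.of("a.txt"), Path.of("b.txt"), Path.of("c.txt"));
        // forEach tasks run on the common ForkJoinPool, much like
        // Parallel.ForEach uses the .NET thread pool.
        files.parallelStream().forEach(ParallelFiles::process);
        System.out.println(processed.get() + " files processed");
    }
}
```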
Pure functional languages always have the benefit of minimal shared state which means that such code is more easily parallelized.
I have never programmed in functional languages, but they are supposed to be easier for concurrent coding since state does not change (or changes very little), so you can safely have the same object on two threads at once. If you have the same object on two threads at once in Java, you have to be very careful and use special synchronization constructs so that the two threads can "see" the object in a consistent state when necessary.
The current problem in programming is that everybody is used to doing things sequentially, but the hardware is moving to multi-core. There has to be a complete paradigm shift in the way we code and think about code. Right now, coding for multi-core in Java or C# is basically just sequential coding with hacks to make it parallelizable. Will functional programming turn out to be the required paradigm shift? I don't know.
Related
Java's multithreaded code is ultimately mapped to operating system threads for execution.
Is the operating system thread not thread safe?
Why use the Java memory model to ensure thread safety? Why define the Java memory model at all?
I hope someone can answer this question. I have looked up a lot of information on the Internet but still do not understand!
The material on the web is all about atomicity, visibility, and ordering, using the cache-coherence model as an example, but I don't think it really answers the question.
Thank you very much!
Operating system threads are not "thread safe" (that statement does not make a lot of sense, but basically, the operating system does not ensure that the intended atomicity of your code is respected).
The problem is that whether two data items are related and therefore need to be synchronized is only really understood by your application.
For example, imagine you are defining a ListOfIntegers class which contains an int array and count of the number of items used in the array. These two data items are related and the way they are updated needs to be co-ordinated in order to ensure that if the object is accessed by two different threads they are always updated in a consistent manner, even if the threads update them simultaneously. Only your application knows how these data items are related. The operating system doesn't know. They are just two pieces of memory as far as it is concerned. That is why you have to implement the thread safety (by using synchronized or carefully arranging how the fields are updated).
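A sketch of the ListOfIntegers idea described above (a hypothetical class, not a real JDK type), where synchronized keeps the two related fields consistent:

```java
public class ListOfIntegers {
    private int[] items = new int[10];
    private int count = 0; // number of used slots; related to items

    // Without synchronized, one thread could observe the incremented count
    // before the new array element is visible (or vice versa).
    public synchronized void add(int value) {
        if (count == items.length) {
            items = java.util.Arrays.copyOf(items, count * 2);
        }
        items[count] = value;
        count++; // both updates appear atomic to other synchronized methods
    }

    public synchronized int size() {
        return count;
    }
}
```

The OS sees only two pieces of memory being written; only this class knows they must change together, which is why the locking lives here.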
The Java "memory model" is pretty close to the hardware model. There is a stack for primitives and objects are allocated on the heap. Synchronization is provided to allow the programmer to lock access to shared data on the heap. In addition, there are rules that the optimizer must follow so that the optimisations don't defeat the synchronisation put in place.
Every programing language that takes concurrency seriously needs a memory model - and here is why.
The memory model is the crux of the concurrency semantics of shared-memory systems. It defines the possible values that a read operation is allowed to return for any given set of write operations performed by a concurrent program, thereby defining the basic semantics of shared variables. In other words, the memory model specifies the set of allowed outputs of a program's read and write operations, and constrains an implementation to produce only (but at least one) such allowed executions. The memory model may and often does allow executions where the outcome cannot be inferred from the order in which read and write operations occur in the program. It is impossible to meaningfully reason about a program or any part of the programming language implementation without an unambiguous memory model. The memory model defines the possible outcomes of a concurrent program's read and write operations. Conversely, the memory model also defines which instruction reorderings may be permitted, either by the processor, the memory system, or the compiler.
This is an excerpt from the paper Memory Models for C/C++ Programmers which I have co-authored. Even though a large part of it is dedicated to the C++ memory model, it also covers more general areas -
starting with the reason why we need a memory model in the first place, explaining the (intuitive) sequential consistent model, and finally the weaker memory models provided by current hardware in x86 and ARM/POWER CPUs.
The Java Memory Model answers the following question: What shall happen when multiple threads modify the same memory location.
And the answer the memory model gives is:
If a program has no data races, then all executions of the program will appear to be sequentially consistent.
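A minimal Java illustration of that guarantee (field names are mine): the program below is data-race-free because the only concurrently accessed field is volatile, so every execution must appear sequentially consistent, and the reader is guaranteed to observe payload = 42.

```java
public class DataRaceFree {
    static volatile boolean ready = false; // volatile access never races
    static int payload; // plain field, but safely published via the volatile write

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            payload = 42;  // ordered before the volatile write below...
            ready = true;
        });
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.onSpinWait(); // busy-wait until the flag flips
            }
            System.out.println(payload); // ...so this must print 42
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
    }
}
```

Remove the volatile keyword and the program has a data race; the model then permits the reader to spin forever or to print 0.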
There is a great paper by Sarita V. Adve and Hans-J. Boehm about why the Java and C++ memory models are designed the way they are: Memory Models: A Case For Rethinking Parallel Languages and Hardware
From the paper:
We have been repeatedly surprised at how difficult it is to formalize the seemingly simple and fundamental property of “what value a read should return in a multithreaded program.”
Memory models, which describe the semantics of shared variables, are crucial to both correct multithreaded applications and the entire underlying implementation stack. It is difficult to teach multithreaded programming without clarity on memory models.
After much prior confusion, major programming languages are converging on a model that guarantees simple interleaving-based semantics for "data-race-free" programs and most hardware vendors have committed to support this model.
Whilst trying to understand how SubmissionPublisher (source code in OpenJDK 10, Javadoc), a new class added to the Java SE in version 9, has been implemented, I stumbled across a few API calls to VarHandle I wasn't previously aware of:
fullFence, acquireFence, releaseFence, loadLoadFence and storeStoreFence.
After doing some research, especially regarding the concept of memory barriers/fences (I have heard of them previously, yes; but never used them, thus was quite unfamiliar with their semantics), I think I have a basic understanding of what they are for. Nonetheless, as my questions might arise from a misconception, I want to ensure that I got it right in the first place:
Memory barriers are reordering constraints regarding reading and writing operations.
Memory barriers can be categorized into two main categories: unidirectional and bidirectional memory barriers, depending on whether they set constraints on either reads or writes or both.
C++ supports a variety of memory barriers; however, these do not match up one-to-one with those provided by VarHandle. That said, some of the memory barriers available in VarHandle provide ordering effects that are compatible with their corresponding C++ memory barriers.
#fullFence is compatible with atomic_thread_fence(memory_order_seq_cst)
#acquireFence is compatible with atomic_thread_fence(memory_order_acquire)
#releaseFence is compatible with atomic_thread_fence(memory_order_release)
#loadLoadFence and #storeStoreFence have no compatible C++ counterpart
The word compatible seems to be really important here, since the semantics clearly differ when it comes to the details. For instance, all C++ barriers are bidirectional, whereas Java's barriers aren't (necessarily).
Most memory barriers also have synchronization effects. Those depend especially on the barrier type used and on previously executed barrier instructions in other threads. As the full implications of a barrier instruction are hardware-specific, I'll stick with the higher-level (C++) barriers. In C++, for instance, changes made prior to a release barrier instruction are visible to a thread executing an acquire barrier instruction.
Are my assumptions correct? If so, my resulting questions are:
Do the memory barriers available in VarHandle cause any kind of memory synchronization?
Regardless of whether they cause memory synchronization or not, what might reordering constraints be useful for in Java? The Java Memory Model already gives some very strong guarantees regarding ordering when volatile fields, locks or VarHandle operations like #compareAndSet are involved.
In case you're looking for an example: the aforementioned BufferedSubscription, an inner class of SubmissionPublisher (source linked above), establishes a full fence in line 1079, in the function growAndAdd; however, it is unclear to me what it is there for.
This is mainly a non-answer, really (I initially wanted to make it a comment, but as you can see, it's far too long). It's just that I questioned this myself a lot, did a lot of reading and research, and at this point in time I can safely say: this is complicated. I even wrote multiple tests with jcstress to figure out how they really work (while looking at the generated assembly code), and while some of them somehow made sense, the subject in general is by no means easy.
The very first thing you need to understand:
The Java Language Specification (JLS) does not mention barriers anywhere. This, for Java, would be an implementation detail: the language really acts in terms of happens-before semantics. To be able to specify these properly according to the JMM (Java Memory Model), the JMM would have to change quite a lot.
This is work in progress.
Second, if you really want to scratch the surface here, this is the very first thing to watch. The talk is incredible. My favorite part is when Herb Sutter raises his 5 fingers and says, "This is how many people can really and correctly work with these." That should give you a hint of the complexity involved. Nevertheless, there are some trivial examples that are easy to grasp (like a counter updated by multiple threads that does not care about other memory guarantees, but only cares that it is itself incremented correctly).
Another example is when (in Java) you want a volatile flag to control threads stopping/starting. You know, the classical:
volatile boolean stop = false; // one thread writes, another thread reads this
If you work with Java, you know that without volatile this code is broken (you can read about why double-checked locking is broken without it, for example). But do you also know that for some people who write high-performance code this is too much? A volatile read/write also guarantees sequential consistency - that is a strong guarantee, and some people want a weaker version of it.
A thread-safe flag, but not volatile? Yes, exactly: VarHandle::set/getOpaque.
And you would question why someone might need that. Not everyone is interested in all the changes that are piggy-backed by a volatile.
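Here is a sketch of such a "thread-safe but not volatile" flag using opaque VarHandle access (class and field names are mine, and whether this is actually faster on a given JVM is an open question):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class OpaqueFlag {
    private boolean stop; // deliberately NOT volatile
    private static final VarHandle STOP;

    static {
        try {
            STOP = MethodHandles.lookup()
                    .findVarHandle(OpaqueFlag.class, "stop", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Opaque mode: the write becomes visible eventually, but without the
    // ordering effects on surrounding code that a volatile write imposes.
    void requestStop() {
        STOP.setOpaque(this, true);
    }

    boolean stopRequested() {
        return (boolean) STOP.getOpaque(this);
    }
}
```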
Let's see how we can achieve this in Java. First of all, such exotic things already existed in the API: AtomicInteger::lazySet. This is unspecified in the Java Memory Model and has no clear definition; still, people used it (LMAX, afaik, or see this for more reading). IMHO, AtomicInteger::lazySet is VarHandle::releaseFence (or VarHandle::storeStoreFence).
Let's try to answer why someone needs these.
The JMM has basically two ways to access a field: plain and volatile (which guarantees sequential consistency). All these methods you mention are there to bring something in between the two - release/acquire semantics; there are cases, I guess, where people actually need this.
An even weaker relaxation than release/acquire would be opaque, which I am still trying to fully understand.
Thus, bottom line (your understanding is fairly correct, btw): if you plan to use these in Java - they have no specification at the moment, so do it at your own risk. If you do want to understand them, their C++ equivalent modes are the place to start.
Is using parallel streams instead of executor services in Java considered bad practice? Why?
As you know, myList.parallelStream().map(e -> ...) will use ForkJoinPool.commonPool() under the hood. So if you use at least two parallel streams at the same time you can face issues when:
the map function is blocking. But there is ForkJoinPool.ManagedBlocker as a rescue.
the map function is very CPU-intensive, which will cause other parallel streams to starve. Is there any way to set priorities among RecursiveTasks or between ForkJoinPools?
On the other hand, you can create as many ForkJoinPools as you want: new ForkJoinPool(4).submit(() -> myList.parallelStream()...). Is it considered acceptable, performance-wise, to use multiple ForkJoinPools in one JVM?
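For completeness, here is a sketch of that dedicated-pool trick. Note this relies on behavior that is not guaranteed by the specification (a parallel stream's tasks happen to run in the ForkJoinPool that submit() was called from), which is exactly why it deserves caution:

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class DedicatedPool {
    // Runs the parallel stream inside a dedicated pool instead of the common pool.
    static int parallelSum(List<Integer> myList) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(4); // isolated from ForkJoinPool.commonPool()
        try {
            // Undocumented but widely observed: the stream's tasks execute
            // in the pool that the enclosing task was submitted to.
            return pool.submit(() ->
                    myList.parallelStream().mapToInt(Integer::intValue).sum()
            ).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(List.of(1, 2, 3, 4, 5))); // 15
    }
}
```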
Update
To use a parallel stream or not = to use a ForkJoinPool or not, right? I found this and this pretty useful for answering the last question.
There isn't a one-size-fits-all solution for .parallel(). Joshua Bloch says:
[...] do not even attempt to parallelize a stream pipeline unless you have good reason to believe that it will preserve the correctness of the computation and increase its speed. The cost of inappropriately parallelizing a stream can be a program failure or performance disaster. If you believe that parallelism may be justified, ensure that your code remains correct when run in parallel, and do careful performance measurements under realistic conditions. If your code remains correct and these experiments bear out your suspicion of increased performance, then and only then parallelize the stream in production code.
-Effective Java 3rd Edition, page 225, Item 48: Use caution when making streams parallel
He recommends you do a thorough benchmark under realistic conditions and decide on a case by case basis. Also, not only can using .parallel() lead to bad performance, it can also lead to safety failures:
Safety failures may result from parallelizing a pipeline that uses mappers, filters, and other programmer-supplied function objects that fail to adhere to their specifications.
-Effective Java 3rd Edition, page 224, Item 48: Use caution when making streams parallel
To answer your question: it isn't considered bad practice, but you should use the utmost caution when working with .parallel() and not blindly slap it on every stream in your code base.
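As an illustration of the kind of safety failure Bloch warns about, here is a sketch (my own example, not from the book) of a programmer-supplied function object with shared mutable state that silently breaks under .parallel(), alongside the correct alternative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class UnsafeParallel {
    // CORRECT: the stream accumulates the result itself; no shared mutable state.
    static int safeCount(int n) {
        return IntStream.range(0, n).parallel().boxed().toList().size();
    }

    public static void main(String[] args) {
        // BROKEN: ArrayList is not thread-safe, and a parallel forEach calls
        // the lambda from many threads at once.
        List<Integer> broken = new ArrayList<>();
        try {
            IntStream.range(0, 10_000).parallel().forEach(broken::add);
        } catch (RuntimeException e) {
            // One possible failure mode: ArrayList's internal resize raced
            // and threw ArrayIndexOutOfBoundsException.
        }
        System.out.println("broken size: " + broken.size()); // often < 10000

        System.out.println("safe size: " + safeCount(10_000)); // 10000
    }
}
```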
Here is a very interesting talk which shares some wisdom on the question "when should you consider using parallel streams?", shared by Brian Goetz himself.
He takes over in the second part of this seminar:
https://www.youtube.com/watch?v=2nup6Oizpcw&t=25m43s
Brian Goetz is the Java Language Architect at Oracle and was the specification lead for JSR-335 (Lambda Expressions for the Java Programming Language). He is the author of the best-selling Java Concurrency in Practice, as well as over 75 articles on Java development, and has been fascinated by programming since Jimmy Carter was President.
Use of volatile only makes sense in multiprocessor systems. Is this wrong?
I'm trying to learn about thread programming, so if you know any good articles/PDFs, let me know. I like material that mentions a bit about how the operating system works as well, not just the language's syntax.
No. Volatile can be used in multi-threaded applications. These may or may not run on more than one processor.
volatile is used to ensure all threads see the same copy of the data. If there is only one thread reading/writing to a field, it doesn't need to be volatile. It will still work just fine, just be a bit slower.
In Java you don't have much visibility as to the processor architecture, generally you talk in terms of threads and multi-threading.
I suggest Java Concurrency in Practice; it's good whatever your level of knowledge: http://www.javaconcurrencyinpractice.com/
The whole point of using Java is you don't need to know most of the details of how threads work etc. If you learn lots of stuff you don't use you are likely to forget it. ;)
Volatile makes sense in a multithreaded program; running those threads on a single processor or multiple processors does not make a difference. The volatile keyword tells the JVM (and the Java Memory Model it implements) that it should not reorder or cache the value of a variable marked with this keyword. This guarantees that threads using a volatile variable will never see a stale version of it.
See this link on the Java Memory Model in general for more details. Or this one for information about Volatile.
No. Volatile is used to support concurrency. In certain circumstances, it can be used instead of synchronization.
This article by Brian Goetz really helped me understand volatile. It has several examples of the use of volatile, and it explains under what conditions it can be used.
How do the Boost Thread libraries compare against the java.util.concurrent libraries?
Performance is critical and so I would prefer to stay with C++ (although Java is a lot faster these days). Given that I have to code in C++, what libraries exist to make threading easy and less error prone.
I have heard recently that as of JDK 1.5, the Java memory model was changed to fix some concurrency issues. How about C++? The last time I did multithreaded programming in C++ was 3-4 years ago when I used pthreads. Although, I don't wish to use that anymore for a large project. The only other alternative that I know of is Boost Threads. However, I am not sure if it is good. I've heard good things about java.util.concurrent, but nothing yet about Boost threads.
java.util.concurrent and the Boost threads library have overlapping functionality, but java.util.concurrent also provides (a) higher-level abstractions and (b) lower-level functions.
Boost threads provide:
Thread (Java: java.util.Thread)
Locking (Java: java.lang.Object and java.util.concurrent.locks)
Condition Variables (Java: java.lang.Object and java.util.concurrent)
Barrier (Java: CyclicBarrier)
java.util.concurrent also has:
Semaphores
Reader-writer locks
Concurrent data structures, e.g. a BlockingQueue or a concurrent lock-free hash map.
Executor services, as a highly flexible producer-consumer system.
Atomic operations.
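A small sketch of those Java-side extras in action, combining a bounded BlockingQueue with an executor as a minimal producer-consumer (class and method names are mine):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ProducerConsumer {
    // A producer hands integers to a consumer through a bounded BlockingQueue.
    static int sumViaQueue() throws Exception {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(() -> {                       // producer
                for (int i = 1; i <= 5; i++) {
                    try {
                        queue.put(i);                 // blocks if the queue is full
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            Future<Integer> sum = pool.submit(() -> { // consumer
                int total = 0;
                for (int i = 0; i < 5; i++) {
                    total += queue.take();            // blocks until an item arrives
                }
                return total;
            });
            return sum.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumViaQueue()); // 15
    }
}
```

All the locking and signalling is inside the queue; neither the producer nor the consumer touches a mutex directly.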
A side note: C++ currently has no memory model. The same C++ application may have to deal with a different memory model on a different machine. This makes portable, concurrent programming in C++ even trickier.
Boost threads are a lot easier to use than pthreads, and, in my opinion, slightly easier to use than Java threads. When a boost thread object is instantiated, it launches a new thread. The user supplies a function or function object which will run in that new thread.
It's really as simple as:
boost::thread* thr = new boost::thread(MyFunc());
thr->join();
delete thr; // don't leak the thread object once it has finished
You can easily pass data to the thread by storing values inside the function object. And in the latest version of boost, you can pass a variable number of arguments to the thread constructor itself, which will then be passed to your function object's () operator.
You can also use RAII-style locks with boost::mutex for synchronization.
Note that C++0x will use the same syntax for std::thread.
Performance-wise I wouldn't really worry. My gut feeling is that a Boost/C++ expert could write faster code than a Java expert, but any advantage would have to be fought for.
I prefer Boost's design paradigms to Java's. Java is OO all the way, where Boost/C++ allows for OO if you like but uses the most useful paradigm for the problem at hand. In particular I love RAII when dealing with locks. Java handles memory management beautifully, but sometimes it feels like the rest of the programmers' resources get shafted: file handles, mutexes, DB, sockets, etc.
Java's concurrent library is more extensive than Boost's. Thread pools, concurrent containers, atomics, etc. But the core primitives are on par with each other, threads, mutexes, condition variables.
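On the Java side, those core primitives look like this: a sketch (my own minimal latch, not a JDK class) using ReentrantLock and Condition, roughly mirroring boost::mutex and boost::condition_variable:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class Latch {
    private final Lock lock = new ReentrantLock();        // ~ boost::mutex
    private final Condition opened = lock.newCondition(); // ~ boost::condition_variable
    private boolean open = false;

    public void open() {
        lock.lock(); // no RAII in Java: pair lock() with unlock() in finally
        try {
            open = true;
            opened.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public void await() throws InterruptedException {
        lock.lock();
        try {
            while (!open) {
                opened.await(); // loop guards against spurious wakeups
            }
        } finally {
            lock.unlock();
        }
    }
}
```

The try/finally pairs are exactly the boilerplate that RAII-style scoped locks eliminate in C++, which is the paradigm difference the answers above are pointing at.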
So for performance I'd say it's a wash. If you need lots of high level concurrent library support Java wins. If you prefer paradigm freedom C++.
If performance is an issue in your multithreaded program, then you should consider a lock-free design.
Lock-free means threads do not compete for a shared resource, which minimizes switching costs. In that department, Java has a better story IMHO, with its concurrent collections. You can rapidly come up with a lock-free solution.
Having used the Boost thread lib a bit (but not extensively), I can say that your thinking will be influenced by what's available, and that means essentially a locking solution.
Writing a lock-free C++ solution is very difficult, because of the lack of library support, and also conceptually because C++ is missing a memory model that guarantees you can write truly immutable objects.
This book is a must-read: Java Concurrency in Practice
If you're targeting a specific platform then the direct OS call will probably be a little faster than using boost for C++. I would tend to use ACE, since you can generally make the right calls for your main platform and it will still be platform-independent. Java should be about the same speed so long as you can guarantee that it will be running on a recent version.
In C++ one can directly use pthreads (pthread_create() etc.) if one wants to. Internally, Java uses pthreads via its runtime environment. Do "ldd " to see.