In JDK 8, how many threads are spawned when I'm using parallelStream? For instance, in the code:
list.parallelStream().forEach(/** Do Something */);
If this list has 100000 items, how many threads will be spawned?
Also, do each of the threads get the same number of items to work on or is it randomly allotted?
Oracle's implementation of parallel streams uses the current thread and, if needed, the threads that make up the default fork-join pool, ForkJoinPool.commonPool(), which has a default size equal to one less than the number of cores of your CPU.
That default size of the common pool can be changed with this property:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=8
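A quick way to verify the effective size at runtime, using only public JDK APIs:
// the common pool's parallelism, by default availableProcessors() - 1
System.out.println(java.util.concurrent.ForkJoinPool.commonPool().getParallelism());
System.out.println(Runtime.getRuntime().availableProcessors());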
Alternatively, you can use your own pool:
ForkJoinPool myPool = new ForkJoinPool(8);
myPool.submit(() ->
    list.parallelStream().forEach(/* Do Something */)
).get();
Regarding the order, jobs will be executed as soon as a thread is available, in no specific order.
As correctly pointed out by @Holger, this is an implementation-specific detail (with just one vague reference at the bottom of a document). Both approaches work on Oracle's JVM, but they are definitely not guaranteed to work on JVMs from other vendors: the property might not exist in a non-Oracle implementation, and streams might not even use a ForkJoinPool internally, which would render the alternative (based on the behavior of ForkJoinTask.fork) completely useless (see here for details on this).
While @uraimo is correct, the answer depends on exactly what "Do Something" does. The parallel streams API uses the CountedCompleter class, which has some interesting problems. Since the F/J framework does not use a separate object to hold results, long chains may result in an OOME. Those long chains can also sometimes cause a stack overflow. The answer to those problems is the use of the Paraquential technique, as I pointed out in this article.
The other problem is excessive thread creation when using nested parallel forEach.
Related
I saw this code somewhere using stream().map().reduce().
Does this map() function really work in parallel? If yes, what is the maximum number of threads it can use for the map() function?
What if I use parallelStream() instead of just stream() for the particular use case below?
Can anyone give me a good example of where NOT to use parallelStream()?
The code below just extracts tName from tCode and returns a comma-separated String.
String ts = atList.stream().map(tCode -> {
    return CacheUtil.getTCache().getTInfo(tCode).getTName();
}).reduce((tName1, tName2) -> {
    return tName1 + ", " + tName2;
}).get();
This stream().map().reduce() pipeline is not parallel; a single thread acts on the stream.
You have to add parallel(), or in other cases use parallelStream() (which one depends on the API, but it's the same thing). With parallel, by default the common pool gives you the number of available processors minus one worker thread; but the main thread is used too by ForkJoinPool#commonPool, so there will usually be as many active threads as cores: 2, 4, 8, etc. To check how many processors you have, use:
Runtime.getRuntime().availableProcessors()
You can use a custom pool and get as many threads as you want, as shown here.
Also notice that the entire pipeline is run in parallel, not just the map operation.
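A small self-contained sketch that makes this visible (the numbers are made up): printing the executing thread names shows both the map and the reduce lambdas running on common-pool workers as well as the calling thread.
import java.util.Arrays;
import java.util.List;

List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
int sumOfSquares = nums.parallelStream()
        .map(n -> {
            System.out.println("map    on " + Thread.currentThread().getName());
            return n * n;
        })
        .reduce(0, (a, b) -> {
            System.out.println("reduce on " + Thread.currentThread().getName());
            return a + b;
        });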
There isn't a golden law about when to use parallel streams and when not to; the best way is to measure. But there are obvious choices, like a stream of 10 elements - that is far too little to get any real benefit from parallelization.
All parallel streams use the common fork-join thread pool, and if you submit a long-running task, you effectively block all threads in the pool. Consequently you block all other tasks that are using parallel streams.
There are only two ways to make sure that never happens. The first is to ensure that all tasks submitted to the common fork-join pool will not get stuck and will finish in a reasonable time. That is easier said than done, especially in complex applications. The other option is to not use parallel streams and wait until Oracle allows us to specify the thread pool to be used for parallel streams.
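To make the hazard concrete, here is a minimal sketch (the list contents and sleep duration are made up): while this stream runs, the common-pool workers are tied up, and every other parallel stream in the JVM is stalled or degraded.
import java.util.Arrays;
import java.util.List;

List<Integer> tasks = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
tasks.parallelStream().forEach(t -> {
    try {
        Thread.sleep(60_000); // simulated long-running or blocked work
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});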
Use case
Let's say you have a collection (List) which gets loaded with values at the start of the application and no new value is ever added to it later. In that scenario you can use a parallel stream without any concerns.
Don't worry, streams are efficient and safe.
Can anyone please explain to me the consequences of mutating a collection in java that is not thread-safe and is being used by multiple threads?
The results are undefined and somewhat random.
With JDK collections that are designed to fail fast, you might receive a ConcurrentModificationException. This is really the only consequence that is specific to thread safety with collections, as opposed to any other class.
Problems that generally occur with thread-unsafe classes may also arise:
The internal state of the collection might be corrupted.
The mutation may appear to be successful, but the changes may not, in fact, be visible to other threads at any given time. They might be invisible at first and become visible later.
The changes might actually be successful under light load, but fail randomly under heavy load with lots of threads in contention.
Race conditions might occur, as was mentioned in a comment above.
There are lots of other possibilities, none of them pleasant. Worst of all, these things tend to most commonly reveal themselves in production, when the system is stressed.
In short, you probably don't want to do that.
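To see this concretely, here is a minimal demonstration sketch (the class name is made up): two threads mutate a plain ArrayList without synchronization.
import java.util.ArrayList;
import java.util.List;

public class UnsafeListDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = new ArrayList<>(); // not thread-safe
        Runnable writer = () -> {
            for (int i = 0; i < 100_000; i++) {
                list.add(i); // unsynchronized concurrent mutation
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // rarely prints 200000: updates are lost, and some runs even throw
        // ArrayIndexOutOfBoundsException from corrupted internal state
        System.out.println(list.size());
    }
}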
The most common outcome is that it looks like it works, but doesn't work all the time.
This can mean you have a problem which
works on one machine but doesn't on another.
works for a while, but then something apparently unrelated changes and your program breaks.
Worse, if you are not using thread-safe data structures, then whenever you have a bug you don't know whether it's a multi-threading issue or not.
What can happen is:
you rarely/randomly get an error and strange behaviour
your code goes into an infinite loop and stops working (HashMap used to do this)
The only options are to:
limit the amount of state which is shared between threads, ideally none at all.
be very careful about how data is updated.
not rely on unit tests; you have to understand what the code is doing and be confident it will behave correctly in all possible situations.
The invariants of the data structure will not be guaranteed.
For example:
If thread 2 does a read while thread 1 is adding to the data structure, thread 1 may consider the element added while thread 2 doesn't yet see it.
There are plenty of data structures that aren't thread-safe but will still appear to function (i.e. not throw) in a multi-threaded environment, and they might even perform correctly under certain circumstances (like if you aren't doing any writes to the data structure).
To fully understand this topic, it is recommended to explore the different classes of bugs that occur in concurrent systems; this short document seems like a good start:
http://pages.cs.wisc.edu/~remzi/OSTEP/threads-bugs.pdf
Conceptually,
Mutex
Readers/writer lock (a better form of mutex)
Semaphore
Condition Variable
are the four major synchronization mechanisms, which are purely lock-based. Different programming languages have different terms/jargon for these 4 mechanisms. The POSIX pthread package is one example of such an implementation.
The first two are implemented using spin locks (busy-waiting).
The last two are implemented using sleep locks.
Lock-based synchronisation is expensive in terms of CPU cycles.
But I learnt that the java.util.concurrent packages do not use lock-based (sleep/spin) mechanisms to implement synchronisation.
My question:
What is the mechanism used by the java.util.concurrent package to implement synchronization? Spin locks are CPU-intensive, and sleep locks are costlier than spin locks due to frequent context switches.
That very much depends on what parts of the java.util.concurrent package you use (and to a lesser degree on the implementation). E.g. the LinkedBlockingQueue as of Java 1.7 uses both ReentrantLocks and Conditions, while e.g. the java.util.concurrent.atomic classes or the CopyOnWrite* classes rely on volatiles + native methods (that insert the appropriate memory barriers).
The actual native implementation of Locks, Semaphores, etc. also varies between architectures and implementations.
Edit: If you really care about performance, you should measure performance of your specific workload. There are folks far more clever than me like A. Shipilev (whose site is a trove of information on this topic) on the JVM team, who do this and care deeply about JVM performance.
This question is best answered by looking at the source code for java.util.concurrent. The precise implementation depends on the class you are referring to.
For example, many of the implementations make use of volatile data and sun.misc.Unsafe, which defers e.g. compare-and-swap to native operations. Semaphore (via AbstractQueuedSynchronizer) makes heavy use of this.
You can browse through the other objects there (use the navigation pane on the left of that site) to take a look at the other synchronization objects and how they are implemented.
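To illustrate, here is a minimal sketch of a compare-and-swap retry loop written against the public AtomicInteger API; internally the atomic classes apply the same idea through Unsafe and native CAS instructions.
import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger counter = new AtomicInteger(0);
int prev, next;
do {
    prev = counter.get(); // read the current value
    next = prev + 1;      // compute the new value
} while (!counter.compareAndSet(prev, next)); // retry if another thread raced us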
The short answer is no: unlike synchronized collections, concurrent collections are not implemented with locks.
I myself had exactly the same question and always wanted to understand the details. What ultimately helped me fully understand what's going on under the hood was reading the following chapters in Java Concurrency in Practice:
5.1 Synchronized collections
5.2 Concurrent collections
The idea is based on atomic operations, which basically require no lock, since they are atomic.
The OP's question and the comment exchanges appear to contain quite a bit of confusion. I will avoid answering the literal questions and instead try to give an overview.
Why does java.util.concurrent become today's recommended practice?
Because it encourages good application coding patterns. The potential performance gain (which may or may not materialize) is a bonus, but even if there is no performance gain, java.util.concurrent is still recommended because it helps people write correct code. Code that is fast but is flawed has no value.
How does java.util.concurrent encourage good coding patterns?
In many ways. I will just list a few.
(Disclaimer: I come from a C# background and do not have comprehensive knowledge of Java's concurrent package; though a lot of similarities exist between the Java and C# counterparts.)
Concurrent data collections simplify code.
Often, we use locking when we need to access and modify a data structure from different threads.
A typical operation involves:
Lock (blocked until succeed),
Read and write values,
Unlock.
Concurrent data collections simplify this by rolling all these operations into a single function call. The result is:
Simpler code on the caller's side,
Possibly better performance, because the library implementation can use a different (and more efficient) locking or lock-free mechanism than the JVM object monitor,
Avoidance of a common race-condition pitfall, time-of-check to time-of-use (see the sketch below).
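A small sketch of that pitfall and the single-call fix, using ConcurrentHashMap (the key and value are made up):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

Map<String, Integer> map = new ConcurrentHashMap<>();

// broken check-then-act: another thread may insert between the
// containsKey check and the put, and its value is silently overwritten
if (!map.containsKey("key")) {
    map.put("key", 1);
}

// atomic alternative: the check and the insert happen as one operation
map.putIfAbsent("key", 1);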
Two broad categories of concurrent data collection classes
There are two flavors of concurrent data collection classes. They are designed for very different application needs. To benefit from the "good coding patterns", you must know which one to use given each situation.
Non-blocking concurrent data collections
These classes can guarantee a response (returning from a method call) in a deterministic amount of time, whether the operation succeeds or fails. They never deadlock or wait forever.
Blocking concurrent data collections
These classes make use of JVM and OS synchronization features to link together data operations with thread control.
As you have mentioned, they use sleep locks. If a blocking operation on a blocking concurrent data collection cannot be satisfied immediately, the thread requesting the operation goes to sleep and will be woken up when the operation is satisfied.
There is also a hybrid: blocking concurrent data collections that allow one to do a quick (non-blocking) check of whether the operation might succeed. This quick check can suffer from the "time of check to time of use" race condition, but if used correctly it can be useful in some algorithms.
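A minimal sketch of the two flavors on an ArrayBlockingQueue (the blocking put must run in a context that handles InterruptedException):
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

BlockingQueue<String> q = new ArrayBlockingQueue<>(1);
q.put("a");                      // blocking: sleeps until space is available
boolean accepted = q.offer("b"); // quick non-blocking check: false here, the queue is full
String head = q.poll();          // non-blocking take: returns "a", or null if empty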
Before the java.util.concurrent package became available, programmers often had to code their own poor man's alternatives. Very often, those alternatives had hidden bugs.
Besides data collections?
Callable, Future, and Executor are very useful for concurrent processing. One could say that these patterns offer something remarkably different from the imperative programming paradigm.
Instead of specifying the exact order of execution of a number of tasks, the application can now:
Callable allows packaging "units of work" with the data that will be worked on,
Future provides a way for different units of work to express their order dependencies - which work unit must be completed ahead of another work unit, etc.
In other words, if two different Callable instances don't indicate any order dependencies, then they can potentially be executed simultaneously, if the machine is capable of parallel execution.
Executor specifies the policies (constraints) and strategies on how these units of work will be executed.
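A minimal sketch of these three pieces working together (the class and task names are made up):
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UnitsOfWorkDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);

        Callable<Integer> unitA = () -> 40; // independent unit of work
        Callable<Integer> unitB = () -> 2;  // independent unit of work

        // no order dependency between A and B, so they may run simultaneously
        Future<Integer> futureA = executor.submit(unitA);
        Future<Integer> futureB = executor.submit(unitB);

        // this step depends on both: get() waits for each to complete
        System.out.println(futureA.get() + futureB.get());
        executor.shutdown();
    }
}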
One big thing which was reportedly missing from the original java.util.concurrent is the ability to schedule a new Callable upon the successful completion of a Future when it is submitted to an Executor. There are proposals calling for a ListenableFuture.
(In C#, the similar unit-of-work composability is known as Task.WhenAll and Task.WhenAny. Together they make it possible to express many well-known multi-threading execution patterns without having to explicitly create and destroy threads with own code.)
Is there a queue implementation that blocks on take() but is bounded by a maximum size? When the size of the queue reaches the max size, instead of blocking put, it should remove the head element and insert the new one. So put() is not blocked, but take() is.
One usage is that if I have a very slow consumer, the system will not crash (run out of memory); instead old messages will be removed, and I don't want to block the producer.
An example of this would be a stock trading system. When you get a spike in trade/quote data, and you haven't consumed it yet, you want to automatically throw away the old trades/quotes.
There currently isn't a thread-safe queue in Java that does exactly what you are looking for. However, there is a BlockingDeque (double-ended queue) around which you can write a wrapper that takes from the head and drops from the tail as you see fit.
This class, like a BlockingQueue, is thread-safe.
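Building on that idea, here is a minimal sketch of such a wrapper (the class name is made up; with several concurrent producers the eviction is racy, but for a drop-oldest buffer that is usually acceptable). The producer never blocks, and a slow consumer only ever sees the newest elements.
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

public class EvictingBlockingQueue<E> {
    private final BlockingDeque<E> deque;

    public EvictingBlockingQueue(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // never blocks: if the deque is full, drop the head (oldest) and retry
    public void put(E e) {
        while (!deque.offerLast(e)) {
            deque.pollFirst(); // discard the oldest element to make room
        }
    }

    // blocks until an element is available
    public E take() throws InterruptedException {
        return deque.takeFirst();
    }
}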
Several strategies are provided in ThreadPoolExecutor. Search for "AbortPolicy" in its javadoc. You can also implement your own policy if you want. Perhaps DiscardOldestPolicy (which drops the oldest queued task and retries) is closest to what you want. Personally I think CallerRunsPolicy is what you want in most cases.
I think using these is a better solution, but if you absolutely want to implement it at the queue level, I'd probably do it by composition. Perhaps use a LinkedList or something and wrap it with the synchronized keyword.
EDIT (some clarifications):
"Executor" is basically a thread pool combined with a blocking queue. It is the recommended way to implement a producer/consumer pattern in java. The authors of these libraries provides several strategies to cope with issues like you mentioned. If you are interested, here is another approach to specifically address the OOME issue (the source is framework specific and can't be used as is).
I have multiple threads, each with its own private concurrent queue, and all they do is run an infinite loop retrieving messages from it. It can happen that one of the queues doesn't receive messages for a period of time (maybe a couple of seconds), but messages can also come in big bursts, and fast processing is necessary.
I would like to know what would be the most appropriate thing to do in the first case: use a blocking queue and block the thread until I have more input, or do a Thread.yield()?
I want to have as much CPU as possible available at any given time, since the number of concurrent threads may increase over time, but I also don't want the message processing to fall behind, as there is no guarantee of when the thread will be rescheduled for execution when doing a yield(). I know that hardware, operating system and other factors play an important role here, but setting that aside and looking at it from a Java (JVM?) point of view, what would be optimal?
Always just block on the queues. Java already yields internally (parks the waiting thread) inside the queue implementations.
In other words: You cannot get any performance benefit in the other threads if you yield in one of them rather than just block.
You certainly want to use a blocking queue - they are designed for exactly this purpose (you want your threads to not use CPU time when there is no work to do).
Thread.yield() is an extremely temperamental beast - the scheduler plays a large role in exactly what it does; and one simple but valid implementation is to simply do nothing.
Alternatively, consider converting your implementation to use one of the managed ExecutorService implementations - probably ThreadPoolExecutor.
This may not be appropriate for your use case, but if it is, it removes the whole burden of worrying about thread management from your own code - and these questions about yielding or not simply vanish.
In addition, if better thread management algorithms emerge in the future - for example, something akin to Apple's Grand Central Dispatch - you may be able to convert your application to use them with almost no effort.
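For completeness, here is a minimal sketch of the blocking consumer loop both answers recommend (the println stands in for real message processing):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

BlockingQueue<String> queue = new LinkedBlockingQueue<>();

Runnable consumer = () -> {
    try {
        while (!Thread.currentThread().isInterrupted()) {
            String message = queue.take(); // parks the thread until a message arrives; no CPU burned
            System.out.println(message);   // stand-in for real message processing
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore interrupt status and exit the loop
    }
};
new Thread(consumer).start();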
Another thing you could do is use a ConcurrentHashMap for your queue. When you do a read, it gives you a reference to the object you were looking for, so it is possible you may miss a message that was just put into the queue. But if all this is doing is listening for messages, you will catch it on the next iteration. It would be different if the messages could be updated by other threads. But there doesn't really seem to be a reason to block that I can see.