Do parallel streams treat upstream iterators in a thread safe way? - java

Today I was using a stream that called parallel() after a map; however, the underlying source is an iterator that is not thread safe, similar to the BufferedReader.lines implementation.
I originally thought that trySplit would be called on the creating thread; however, I observed that accesses to the iterator came from multiple threads.
For example, the following silly iterator implementation is set up with just enough elements to cause splitting and also keeps track of the unique threads that accessed the hasNext method.
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;
import static java.util.stream.Collectors.toCollection;

class SillyIterator implements Iterator<String> {
    private final ArrayDeque<String> src =
        IntStream.range(1, 10000)
            .mapToObj(Integer::toString)
            .collect(toCollection(ArrayDeque::new));
    // names of all threads that ever called hasNext()
    private Map<String, String> ts = new ConcurrentHashMap<>();
    public Set<String> threads() { return ts.keySet(); }
    private String nextRecord = null;
    @Override
    public boolean hasNext() {
        var n = Thread.currentThread().getName();
        ts.put(n, n);
        if (nextRecord != null) {
            return true;
        } else {
            nextRecord = src.poll();
            return nextRecord != null;
        }
    }
    @Override
    public String next() {
        if (nextRecord != null || hasNext()) {
            var rec = nextRecord;
            nextRecord = null;
            return rec;
        }
        throw new NoSuchElementException();
    }
}
Using this to create a stream as follows:
var iter = new SillyIterator();
StreamSupport
    .stream(Spliterators.spliteratorUnknownSize(
        iter, Spliterator.ORDERED | Spliterator.NONNULL
    ), false)
    .map(n -> "value = " + n)
    .parallel()
    .collect(toList());
System.out.println(iter.threads());
On my system, this printed the two fork-join threads as well as the main thread, which kind of scared me.
[ForkJoinPool.commonPool-worker-1, ForkJoinPool.commonPool-worker-2, main]

Thread safety does not necessarily imply being accessed by only one thread. The important aspect is that there is no concurrent access, i.e. no access by more than one thread at the same time. If the access by different threads is temporally ordered and this ordering also ensures the necessary memory visibility, which is the responsibility of the caller, it still is a thread safe usage.
The Spliterator documentation says:
Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition.
The spliterator doesn’t need to be confined to the same thread throughout its lifetime, but there should be a clear handover at the caller’s side ensuring that the old thread stops using it before the new thread starts using it.
But the important takeaway is, the spliterator doesn’t need to be thread safe, hence, the iterator wrapped by a spliterator also doesn’t need to be thread safe.
Note that a typical behavior is splitting and handing over before starting traversal, but since an ordinary Iterator doesn’t support splitting, the wrapping spliterator has to iterate and buffer elements to implement splitting. Therefore, the Iterator experiences traversal by different threads (but one at a time) when the traversal has not been started from the Stream implementation’s perspective.
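The "different threads, but one at a time" claim can be checked with a small experiment. The following is only a sketch: OverlapDetectingIterator and collectInParallel are hypothetical names introduced here, not part of the question. The iterator counts how many threads are inside hasNext()/next() at the same moment and raises a flag if that number ever exceeds one.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical helper: yields 1..limit and records whether two threads
// were ever inside hasNext()/next() at the same time.
class OverlapDetectingIterator implements Iterator<Integer> {
    private final AtomicInteger active = new AtomicInteger();
    final AtomicBoolean overlapSeen = new AtomicBoolean();
    private final int limit;
    private int next = 1; // deliberately a plain field, as in a typical non-thread-safe iterator

    OverlapDetectingIterator(int limit) { this.limit = limit; }

    @Override
    public boolean hasNext() {
        enter();
        try { return next <= limit; } finally { active.decrementAndGet(); }
    }

    @Override
    public Integer next() {
        enter();
        try {
            if (next > limit) throw new NoSuchElementException();
            return next++;
        } finally { active.decrementAndGet(); }
    }

    // Flags an overlap if another thread is already inside a method.
    private void enter() {
        if (active.incrementAndGet() > 1) overlapSeen.set(true);
    }

    // Same pipeline shape as in the question: sequential source, map, then parallel().
    static List<Integer> collectInParallel(OverlapDetectingIterator it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL), false)
            .map(n -> n * 2)
            .parallel()
            .collect(Collectors.toList());
    }
}
```

After collectInParallel(it) returns, it.overlapSeen.get() should still be false even though several threads have touched the iterator, which is the serial thread-confinement the Spliterator documentation describes.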
That said, the lines() implementation of BufferedReader is a bad example which you should not follow. Since it’s centered around a single readLine() call, it would be natural to implement Spliterator directly instead of implementing a more complicated Iterator and have it wrapped via spliteratorUnknownSize(…).
Since your example is likewise centered around a single poll() call, it’s also straight-forward to implement Spliterator directly:
class SillySpliterator extends Spliterators.AbstractSpliterator<String> {
    private final ArrayDeque<String> src = IntStream.range(1, 10000)
        .mapToObj(Integer::toString).collect(toCollection(ArrayDeque::new));
    SillySpliterator() {
        // unknown size; splitting is handled by AbstractSpliterator
        super(Long.MAX_VALUE, ORDERED | NONNULL);
    }
    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        String nextRecord = src.poll();
        if (nextRecord == null) return false;
        action.accept(nextRecord);
        return true;
    }
}
Depending on your real life case, you may also pass the actual deque size to the constructor and provide the SIZED characteristic.
Then, you may use it like
var result = StreamSupport.stream(new SillySpliterator(), true)
    .map(n -> "value = " + n)
    .collect(toList());
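The SIZED variant mentioned above might look roughly like the following. This is a sketch under assumptions: SizedDequeSpliterator and its demo method are names made up for illustration, and the deque is passed in from outside so that its size is known when the super constructor runs.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

// Hypothetical variant of SillySpliterator that reports an exact size.
class SizedDequeSpliterator extends Spliterators.AbstractSpliterator<String> {
    private final ArrayDeque<String> src;

    SizedDequeSpliterator(ArrayDeque<String> src) {
        // The deque must not be modified elsewhere once it is handed over.
        super(src.size(), ORDERED | NONNULL | SIZED);
        this.src = src;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        String nextRecord = src.poll();
        if (nextRecord == null) return false;
        action.accept(nextRecord);
        return true;
    }

    // Builds a deque of "1".."count" and streams it in parallel.
    static List<String> demo(int count) {
        ArrayDeque<String> deque = IntStream.rangeClosed(1, count)
            .mapToObj(Integer::toString)
            .collect(Collectors.toCollection(ArrayDeque::new));
        return StreamSupport.stream(new SizedDequeSpliterator(deque), true)
            .map(n -> "value = " + n)
            .collect(Collectors.toList());
    }
}
```

Whether reporting the size pays off depends on the pipeline; it mainly helps operations that can use an exact element count up front.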

Related

Why does list.size change when executing java parallel stream?

Consider the following code:
static void statefullParallelLambdaSet() {
Set<Integer> s = new HashSet<>(
Arrays.asList(1, 2, 3, 4, 5, 6)
);
List<Integer> list = new ArrayList<>();
int sum = s.parallelStream().mapToInt(e -> { // pipeline start
if (list.size() <= 3) { // list.size() changes while the pipeline operation is executing.
list.add(e); // mapToInt's lambda expression depends on this value, so it's stateful.
return e;
}
else return 0;
}).sum(); // terminal operation
System.out.println(sum);
}
In the code above, it says that list.size() changes while the pipe operation is running, but I don't understand.
Since list.add(e) is executed at once in multiple threads because it is executed in parallel, is it correct to assume that the value changes each time it is executed?
The reason why the value changes even if it is executed as a serial stream is that there is no order because it is a set, so the number drawn is different each time it is executed...
Am I right?
This happens because of what is called a race condition. Even a CPU with many hardware threads runs more than just your application's threads, so a thread can read a value and evaluate an instruction, get descheduled while the OS does something else, and resume only after another of your application's parallel threads has already moved past the same point because its core or hyper-thread was not taken away.
You can read about race conditions in references such as: https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4_36
To prevent this, you are supposed to lock the memory you are altering; in Java, look at the java.util.concurrent.locks package: https://www.baeldung.com/java-concurrent-locks
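As a rough sketch of the locking suggestion, the check and the add from the question can be wrapped in one ReentrantLock so that no other thread can slip in between them. LockedSideEffect is a hypothetical class name; the lambda is still stateful, so which elements end up in the list remains nondeterministic, only the count and the consistency of the sum are fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper around the question's pipeline.
class LockedSideEffect {
    final List<Integer> list = new ArrayList<>();
    private final Lock lock = new ReentrantLock();

    int sum(Set<Integer> source) {
        return source.parallelStream().mapToInt(e -> {
            lock.lock(); // size check and add now happen as one unit
            try {
                if (list.size() <= 3) {
                    list.add(e);
                    return e;
                }
                return 0;
            } finally {
                lock.unlock();
            }
        }).sum();
    }
}
```

With six input elements, the list always ends up with exactly four entries and the returned sum equals the sum of those entries, but every element now passes through the same lock.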
Note that the problem itself is slightly artificial, because it's not very likely to get a significant performance gain by parallelizing this task.
Issues Explained
Your code accumulates the result by operating via side-effects which is discouraged by the Stream API documentation.
And you've stumbled on the very first bullet point from the link above:
... there are no guarantees as to:
the visibility of those side-effects to other threads;
ArrayList is not a thread-safe Collection, and as a consequence each thread is not guaranteed to observe the same state of the list.
Also, note that the map() operation (and all its flavors) is not intended to perform side-effects, and according to the documentation its function should be stateless:
mapper - a non-interfering, stateless function to apply to each element
In this case, the correct way to incorporate state from processing the previous stream elements would be to define a Collector.
For that we would need to define a mutable container which would hold a list.
In a nutshell, a Collector can be implemented as concurrent (i.e. optimized for a multithreaded environment, so that all the threads update the same mutable container) or non-concurrent (each thread creates its own instance of the mutable container and populates it, then the results produced by each thread are merged).
In order to implement a concurrent Collector, we need to provide a thread-safe mutable container and specify the characteristic CONCURRENT. If you take a look at the implementations of the List interface, you'll find that the only options the JDK offers are CopyOnWriteArrayList and the outdated Vector.
CopyOnWriteArrayList would be a terrible choice since under the hood it would create a new list with every added element, that's a recipe on how to get an OutOfMemoryError. This Collection is not suitable for frequent updates.
And if we used a synchronized List, it would not buy anything in terms of performance, because threads would not be able to operate on this list simultaneously. While one thread is adding an element, the others are blocked. In fact, it would be slower than processing the data sequentially, because synchronization has a cost.
For that reason, the locking suggested in another answer would only produce a correct result; you would not be able to benefit from the parallel execution.
What we can do is create a non-concurrent Collector (i.e. a collector that uses a non-thread-safe container) based on a plain ArrayList (it still would be able to be used with a parallel stream, each thread would act independently on a separate container without locking and running into concurrency-related issues).
Non-concurrent Collector
Firstly, we need to define a custom accumulation type that encapsulates the ArrayList and the sum of consumed elements.
And in order to create a Collector, we need to use static method Collector.of().
Collector:
public static Collector<Integer, ?, IntSumContainer> toIntSumContainer(int limit) {
return Collector.of(
() -> new IntSumContainer(limit),
IntSumContainer::accept,
IntSumContainer::merge
);
}
Custom accumulation type:
public class IntSumContainer implements IntConsumer {
private int sum;
private List<Integer> list = new ArrayList<>();
private final int limit;
public IntSumContainer(int limit) {
this.limit = limit;
}
@Override
public void accept(int value) {
if (list.size() < limit) {
list.add(value);
sum += value;
}
}
public IntSumContainer merge(IntSumContainer other) {
other.list.stream().limit(limit - list.size()).forEach(this::accept); // there couldn't be issues related to concurrent access in the case, hence performing side-effects via forEach is safe
return this;
}
// getters
}
Usage example:
List<Integer> source = List.of(1, 2, 3, 4, 5, 6);
IntSumContainer result = source.parallelStream()
.collect(toIntSumContainer(3));
List<Integer> list = result.getList();
int sum = result.getSum();
System.out.println(list);
System.out.println(sum);
Output:
[1, 2, 3]
6
Concurrent Collector
Since your stream source is a HashSet, which produces an unordered stream, it is probably not important which elements end up in the resulting collection and contribute to the resulting sum. And since you were using a Set, you might be fine with getting a Set as a result as well.
In this case we can make use of the concurrent hash set which the JDK provides in the form of a view over the keys of a ConcurrentHashMap, obtained via the static method ConcurrentHashMap.newKeySet(). Reads on a ConcurrentHashMap do not block, and updates lock only the affected bin rather than the whole map.
To accumulate the sum concurrently, we can use a LongAdder, which is more performant than AtomicLong under frequent updates (which is the case here) because it spreads the updates over several internal cells and thereby reduces contention.
Like in the previous example, a custom accumulation type would encapsulate the Set and the sum of consumed elements.
While defining the collector, in order to make it concurrent we need to specify the characteristic CONCURRENT, and UNORDERED would also be handy since we stated that the ordering is not important.
Collector:
public static Collector<Integer, ?, ConcurrentIntSumContainer> toConcurrentIntSumContainer(int limit) {
return Collector.of(
() -> new ConcurrentIntSumContainer(limit),
ConcurrentIntSumContainer::accept,
(left, right) -> { throw new AssertionError("merge function is not expected to be called by the concurrent collector"); },
Collector.Characteristics.UNORDERED, Collector.Characteristics.CONCURRENT
);
}
Custom accumulation type:
public class ConcurrentIntSumContainer implements IntConsumer {
private LongAdder sum = new LongAdder();
private Set<Integer> set = ConcurrentHashMap.newKeySet();
private final int limit;
public ConcurrentIntSumContainer(int limit) {
this.limit = limit;
}
@Override
public void accept(int value) {
if (set.size() < limit && set.add(value)) {
sum.add(value);
}
}
public Set<Integer> getSet() {
return new HashSet<>(set); // because a general purpose set is faster than concurrent set
}
public long getSum() {
return sum.sum();
}
}
Usage example:
List<Integer> source = List.of(1, 2, 3, 4, 5, 6);
ConcurrentIntSumContainer result1 = source.parallelStream()
.collect(toConcurrentIntSumContainer(3));
Set<Integer> set = result1.getSet();
long sum = result1.getSum();
System.out.println(set);
System.out.println(sum);
Output:
[1, 4, 5]
10

Check size and then perform operation - is it safe for ConcurrentLinkedDeque?

I need to replace the first value in a Deque with a new value, but only if the size would exceed the limit. I wrote this code to solve it:
final class Some {
final int buffer;
final Deque<Operation> operations = new ConcurrentLinkedDeque<>();
// constructors omitted
@Override
public void register(final Operation operation) {
if (this.operations.size() == this.buffer) {
// remove the oldest operation
this.operations.removeFirst();
}
// add new operation to the tail
this.operations.addLast(operation);
}
@Override
public void apply() {
// take the fresh operation from tail and perform it
this.operations.removeLast().perform();
}
}
As you can see, I have two methods that modify the Deque. I doubt that this code will work correctly in a multithreaded environment. The question is: is it safe to check size() and then perform operations that modify the ConcurrentLinkedDeque afterward? I want to have as few locks as possible. If this code won't work, I will have to introduce locking, and then there is no point in using ConcurrentLinkedDeque.
final class Some {
final int buffer;
final Deque<Operation> operations = new LinkedList<>();
final Lock lock = new ReentrantLock();
// constructors omitted
@Override
public void register(final Operation operation) {
this.lock.lock();
try {
if (this.operations.size() == this.buffer) {
// remove the oldest operation
this.operations.removeFirst();
}
// add new operation to the tail
this.operations.addLast(operation);
} finally {
lock.unlock();
}
}
@Override
public void apply() {
this.lock.lock();
try {
// take the fresh operation from tail and perform it
this.operations.removeLast().perform();
} finally {
this.lock.unlock();
}
}
}
This is the alternative with the Lock. Is that the only way to achieve what I want? I am especially interested in trying to use the concurrent collections.
Concurrent collections are thread-safe when it comes to internal state. In other words, they
Allow multiple threads to read/write concurrently without having to worry that the internal state will become corrupted
Allow iteration and removal while other threads are modifying the collection
Not all, however. I believe CopyOnWriteArrayList's Iterator does not support the remove() operation.
Guarantee things such as happens-before
Meaning a write by one thread will happen-before a read by a subsequent thread.
However, they are not thread-safe across external method calls. When you call one method it will acquire whatever locks are necessary but those locks are released by the time the method returns. If you're not careful this can lead to a check-then-act race condition. Looking at your code:
if (this.operations.size() == this.buffer) {
this.operations.removeFirst();
}
this.operations.addLast(operation);
the following can happen:
Thread-A checks size condition, result is false
Thread-A moves to add new Operation
Before Thread-A can add the Operation, Thread-B checks size condition which results in false as well
Thread-B goes to add new Operation
Thread-A does add new Operation
Oh, no! The Operation added by Thread-A causes the size threshold to be reached
Thread-B, already past the if statement, adds its Operation making the deque have one too many Operations
This is why a check-then-act requires external synchronization, which you do in your second example using a Lock. Note you could also use a synchronized block on the Deque.
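A minimal sketch of the synchronized-block alternative could look like this. SynchronizedBuffer and takeNewest are hypothetical names, and a plain ArrayDeque stands in for the question's deque since every access is guarded anyway.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded buffer: the size check and the modification share one monitor.
class SynchronizedBuffer<T> {
    private final int capacity;
    private final Deque<T> items = new ArrayDeque<>();

    SynchronizedBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void register(T item) {
        synchronized (items) {
            if (items.size() == capacity) {
                items.removeFirst(); // drop the oldest entry
            }
            items.addLast(item);
        }
    }

    // Returns the most recently registered item, or null if the buffer is empty.
    T takeNewest() {
        synchronized (items) {
            return items.pollLast();
        }
    }

    int size() {
        synchronized (items) {
            return items.size();
        }
    }
}
```

Because the check and the act happen while holding the same monitor, the interleaving listed above cannot occur; the price is that all callers serialize on that monitor.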
Unrelated to your question: You call Operation.perform() in your second example while still holding the Lock. This means no other thread can attempt to add another Operation to the Deque while perform() executes. If this isn't desired you can change the code like so:
Operation op;
lock.lock();
try {
op = deque.pollLast(); // poll won't throw exception if there is no element
} finally {
lock.unlock();
}
if (op != null) {
op.perform();
}
From the doc of size()
Beware that, unlike in most collections, this method is NOT a constant-time operation. Because of the asynchronous nature of these deques, determining the current number of elements requires traversing them all to count them. Additionally, it is possible for the size to change during execution of this method, in which case the returned result will be inaccurate. Thus, this method is typically not very useful in concurrent applications.
While @Slaw is correct, I would add that an addition or removal can also occur during the traversal.
I don't use size() in my software. I keep my own count of what is in the collection with an AtomicInteger. If count.get() < max, I can add. Being a little over max is ok for my usage. You can use a lock on count to force compliance.
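The counting idea can be sketched as follows. CountedDeque is a hypothetical class, and the compare-and-set loop is one possible way to "force compliance" without a lock, so the count never goes over max; it is not necessarily what the answer's author uses.

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper that tracks the element count separately from the deque.
class CountedDeque<T> {
    private final Deque<T> items = new ConcurrentLinkedDeque<>();
    private final AtomicInteger count = new AtomicInteger();
    private final int max;

    CountedDeque(int max) { this.max = max; }

    // Reserves a slot with compare-and-set before adding; returns false when full.
    boolean offerLast(T item) {
        while (true) {
            int current = count.get();
            if (current >= max) return false;
            if (count.compareAndSet(current, current + 1)) {
                items.addLast(item);
                return true;
            }
        }
    }

    // Removes the newest element, or returns null if none is available right now.
    T pollLast() {
        T item = items.pollLast();
        if (item != null) count.decrementAndGet();
        return item;
    }

    // Constant-time, unlike ConcurrentLinkedDeque.size().
    int size() { return count.get(); }
}
```

Note that this sketch rejects new elements when full instead of evicting the oldest one, so it illustrates the counting technique rather than the exact behavior the question asks for.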

Improve the performance of non-duplicate concurrent ArrayList

I came across the performance issue when implementing a data structure of non-duplicate concurrent ArrayList(or ConcurrentLinkedQueue).
public class NonDuplicateList implements Outputable {
private Map<Term, Integer> map;
private List<Term> terms;
public NonDuplicateList() {
this.map = new HashMap<>();
this.terms = new ArrayList<>();
}
public synchronized int addTerm(Term term) { //bad performance :(
Integer index = map.get(term);
if (index == null) {
index = terms.size();
terms.add(term);
map.put(term, index);
}
return index;
}
@Override
public void output(DataOutputStream out) throws IOException {
out.writeInt(terms.size());
for (Term term : terms) {
term.output(out);
}
}
}
Note that Term and NonDuplicateList both implement Outputable interface to output.
In order to keep NonDuplicateList thread-safe, I use synchronized to guard the method addTerm(Term), and the performance is as bad as expected when addTerm is invoked concurrently.
It seems that ConcurrentHashMap isn't suitable for this case, since it doesn't keep strong data consistency. Any idea how to improve the performance of addTerm without losing its thread-safety?
EDIT:
The output method, i.e. iteration through NonDuplicateList, need not be thread-safe, since only one thread will access it after the concurrent addTerm invocations have finished; but addTerm must return the index value as soon as a term is added to the NonDuplicateList.
There is a possibility to reuse ConcurrentHashMap in your implementation if you can sacrifice addTerm return type. Instead of returning actual index you can return boolean which indicates whether addition was successful or produced duplicate. This will also allow you to remove method synchronization and improve performance:
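private final ConcurrentMap<Term, Boolean> map = new ConcurrentHashMap<>();
// terms must itself be thread-safe, since add() is no longer guarded by synchronized
private final List<Term> terms = Collections.synchronizedList(new ArrayList<>());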
public boolean addTerm(Term term) {
Boolean previousValue = map.putIfAbsent(term, Boolean.TRUE);
if (previousValue == null) {
terms.add(term);
return true;
}
return false;
}
I am afraid you will not get much faster solution here. The point is to avoid synchronization when you don't need it. If you don't mind weak consistency, using ConcurrentHashMap iterator can be significantly cheaper than either preventing other threads from adding items while you're iterating or taking a consistent snapshot when the iterator is created.
On the other hand, when you need synchronization and a consistent iterator, you'll need an alternative for ConcurrentHashMap. One that comes to my mind is java.util.Collections#synchronizedMap, but it's using synchronization at Object level, so every read/write operation needs to acquire lock, which is a performance overhead.
Take a look at ConcurrentSkipListMap, which guarantees average O(log(n)) performance on a wide variety of operations. It also has a number of operations that ConcurrentHashMap doesn't: ceilingEntry/Key, floorEntry/Key, etc. It also maintains a sort order, which would otherwise have to be calculated (at notable expense) if you were using a ConcurrentHashMap. Maybe it would be possible to get rid of the list and the map and use a ConcurrentSkipListMap instead. The index of an element might be computed using the ConcurrentSkipListMap API.
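If returning the index is a hard requirement, another option, which none of the answers above propose, is to let ConcurrentHashMap.computeIfAbsent assign the index from an AtomicInteger. The following is only a sketch with hypothetical names (IndexedTerms, termsInIndexOrder); it relies on computeIfAbsent applying its function at most once per key and assumes the ordered list is only needed after all additions have finished, as the question's edit states.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replacement for the map + list pair in NonDuplicateList.
class IndexedTerms<T> {
    private final ConcurrentHashMap<T, Integer> indexByTerm = new ConcurrentHashMap<>();
    private final AtomicInteger nextIndex = new AtomicInteger();

    // Returns the existing index of the term, or assigns the next free one.
    int addTerm(T term) {
        return indexByTerm.computeIfAbsent(term, t -> nextIndex.getAndIncrement());
    }

    // Intended to be called by a single thread after all addTerm calls are done.
    List<T> termsInIndexOrder() {
        List<T> ordered = new ArrayList<>(Collections.<T>nCopies(indexByTerm.size(), null));
        indexByTerm.forEach((term, index) -> ordered.set(index, term));
        return ordered;
    }
}
```

The list of terms is rebuilt from the map at output time instead of being maintained on every add, which trades some work in output for lock-free reads of existing terms.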

Is this operation on ConcurrentHashMap thread-safe?

private final ConcurrentHashMap<Float, VoteItem> datum = new ConcurrentHashMap<>();
public void vote(float graduation) {
datum.putIfAbsent(graduation, new VoteItem(graduation, new AtomicInteger(0)));
datum.get(graduation).getNum().incrementAndGet();
}
Is the vote method totally thread-safe? VoteItem.getNum() returns an AtomicInteger. Or is there a better way to achieve this?
If VoteItem#getNum() is thread-safe, e.g. returns a final field, and no deletions are performed by a parallel thread, your code is also thread-safe, as there is no chance for putIfAbsent() to overwrite an existing entry, and thus no chance for get() to return an entry that has been overwritten.
But there is a more common way to achieve it, using the result of putIfAbsent(), which returns the existing value if one is present for the given key:
public void vote(float graduation) {
VoteItem i = datum.putIfAbsent(graduation, new VoteItem(graduation, new AtomicInteger(1)));
if (i != null)
i.getNum().incrementAndGet();
}
This handles the possibility of concurrent removals as well. In contrast to your code, where a concurrent removal can happen between putIfAbsent() and get() and thus cause an NPE, no such situation can occur here.
And consider using computeIfAbsent() instead of putIfAbsent() in order to avoid unnecessary VoteItem creations:
public void vote(float graduation) {
datum.computeIfAbsent(graduation, g -> new VoteItem(g, new AtomicInteger(0)))
.getNum()
.incrementAndGet();
}
Calling getNum() on the result is possible because, in contrast to putIfAbsent(), which returns null if the value didn't exist prior to insertion, computeIfAbsent() returns the just-computed value.
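The difference in return values can be seen in a small sketch. VoteCounter is a hypothetical simplification that stores an AtomicInteger directly instead of the question's VoteItem, whose definition is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simplified tally: one AtomicInteger per graduation value.
class VoteCounter {
    private final ConcurrentHashMap<Float, AtomicInteger> votes = new ConcurrentHashMap<>();

    // Creates the counter on first use and returns the updated vote count.
    int vote(float graduation) {
        return votes.computeIfAbsent(graduation, g -> new AtomicInteger()).incrementAndGet();
    }

    int count(float graduation) {
        AtomicInteger n = votes.get(graduation);
        return n == null ? 0 : n.get();
    }

    // Contrasts what the two methods return when the key was absent.
    static String firstInsertResults() {
        ConcurrentHashMap<String, String> m = new ConcurrentHashMap<>();
        String fromPut = m.putIfAbsent("a", "x");              // null: there was no previous value
        String fromCompute = m.computeIfAbsent("b", k -> "y"); // "y": the value just computed
        return fromPut + "," + fromCompute;
    }
}
```

Here firstInsertResults() returns "null,y", which is why chaining a call onto the result is safe for computeIfAbsent() but not for putIfAbsent().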

Thread-safe Map in Java

I understand the overall concepts of multi-threading and synchronization but am new to writing thread-safe code. I currently have the following code snippet:
synchronized(compiledStylesheets) {
if(compiledStylesheets.containsKey(xslt)) {
exec = compiledStylesheets.get(xslt);
} else {
exec = compile(s, imports);
compiledStylesheets.put(xslt, exec);
}
}
where compiledStylesheets is a HashMap (private, final). I have a few questions.
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative. Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct? This is the only code that hits this object other than initialization/instantiation.
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill. The putIfAbsent() method will not be usable in this instance because it doesn't allow me to skip the compile() method call. I also don't know if it will solve the "modified after containsKey() but before put()" problem, or if that's even really a concern in this case.
Edit: Spelling
For tasks of this nature, I highly recommend Guava caching support.
If you can't use that library, here is a compact implementation of a Multiton. Use of the FutureTask was a tip from assylias, here, via OldCurmudgeon.
public abstract class Cache<K, V>
{
private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
public final V get(K key)
throws InterruptedException, ExecutionException
{
Future<V> ref = cache.get(key);
if (ref == null) {
FutureTask<V> task = new FutureTask<>(new Factory(key));
ref = cache.putIfAbsent(key, task);
if (ref == null) {
task.run();
ref = task;
}
}
return ref.get();
}
protected abstract V create(K key)
throws Exception;
private final class Factory
implements Callable<V>
{
private final K key;
Factory(K key)
{
this.key = key;
}
@Override
public V call()
throws Exception
{
return create(key);
}
}
}
I think you are looking for a Multiton.
There's a very good Java one here that @assylias posted some time ago.
You can loosen the lock at the risk of an occasional doubly compiled stylesheet in race condition.
Object y;
// lock here if needed
y = map.get(x);
if(y == null) {
y = compileNewY();
// lock here if needed
map.put(x, y); // this may happen twice, if put is t.s. one will be ignored
y = map.get(x); // essential because other thread's y may have been put
}
This requires get and put to be atomic, which is true in the case of ConcurrentHashMap and you can achieve by wrapping individual calls to get and put with a lock in your class. (As I tried to explain with "lock here if needed" comments - the point being you only need to wrap individual calls, not have one big lock).
This is a standard thread safe pattern to use even with ConcurrentHashMap (and putIfAbsent) to minimize the cost of compiling twice. It still needs to be acceptable to compile twice sometimes, but it should be okay even if expensive.
By the way, you can solve that problem. Usually the above pattern isn't used with a heavy function like compileNewY but a lightweight constructor new Y(). e.g. do this:
class PrecompiledY {
    public volatile Y y;
    private final AtomicBoolean compiled = new AtomicBoolean(false);
    public void compile() {
        // only the first caller compiles; later callers return immediately
        if (!compiled.getAndSet(true)) {
            y = compileNewY();
        }
    }
}
// ...
ConcurrentMap<X, PrecompiledY> map; // alternatively use proper locking
PrecompiledY py = map.get(x);
if (py == null) {
    py = new PrecompiledY(); // much cheaper than compiling
    map.putIfAbsent(x, py); // this may happen twice; only the first insert is kept
    py = map.get(x); // essential because another thread's py may have been put
    py.compile(); // an object that didn't get inserted never gets compiled
}
// note: a thread that finds an existing py may read py.y before compilation has finished
Also:
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill.
Given that your code is heavily locking, ConcurrentHashMap is almost certainly far faster, so not overkill. (And much more likely to be bug-free. Concurrency bugs are not fun to fix.)
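For comparison, a ConcurrentHashMap-based version can be sketched with computeIfAbsent, which, unlike putIfAbsent, only runs the expensive function when the key is missing. CompileCache and its compilations counter are hypothetical names for illustration, and the compiler function stands in for the question's compile(s, imports).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache: computes each value at most once per key.
class CompileCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compiler;
    final AtomicInteger compilations = new AtomicInteger(); // only here to observe the behavior

    CompileCache(Function<K, V> compiler) { this.compiler = compiler; }

    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            compilations.incrementAndGet();
            return compiler.apply(k); // runs only when the key is absent
        });
    }
}
```

One caveat: while the function runs, other updates that hit the same part of the map may be blocked, so for compilations that take hundreds of milliseconds the Future-based cache shown earlier may be the better fit.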
Please see Erickson's comment below. Using double-checked locking with a HashMap is not very smart.
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative.
You can use double-checked locking, and note that you don't need any lock before get since you never remove anything from the map.
if(compiledStylesheets.containsKey(xslt)) {
exec = compiledStylesheets.get(xslt);
} else {
synchronized(compiledStylesheets) {
if(compiledStylesheets.containsKey(xslt)) {
// another thread might have created it while
// this thread was waiting for lock
exec = compiledStylesheets.get(xslt);
} else {
exec = compile(s, imports);
compiledStylesheets.put(xslt, exec);
}
}
}
Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct?
Correct
This is the only code that hits this object other than initialization/instantiation.
First of all, the code as you posted it is race-condition-free, because the containsKey() result can never change while the compile() method is running.
Collections.synchronizedMap() is useless for your case, as stated above, because it wraps all map methods in a synchronized block using either this as a mutex or another object you provide (in the two-argument version).
IMO using ConcurrentHashMap is also not an option because it stripes locks based on the key's hashCode() result; its concurrent iterators are also useless here.
If you really want compile() out of the synchronized block, you may pre-calculate it before checking containsKey(). This may hurt overall performance, but may be better than calling it inside the synchronized block. To make a decision, I would personally consider how often a key "miss" happens and therefore which option is preferable: holding the lock for longer, or always calculating your stuff.
