I have a Stream<Item> which I'm mapping to a CompletableFuture<ItemResult>.
What I'd like is to know when all the futures have completed.
One may suggest to:
Collect all the futures into an array and use CompletableFuture.allOf(). This is somewhat problematic since there could be hundreds of thousands of items.
Just continue with forEach(CompletableFuture::join). This is problematic too, as calling forEach with join will just block the stream and it will essentially be serial processing, not concurrent.
Inject a poison item at the end of the stream. This could work, but it's not that elegant in my view.
Check whether the executor queue is empty. This is quite limiting, because I might use more than one executor in the future; also, the queue can be momentarily empty.
Monitor the database instead and check the number of new items.
I feel like all the suggested solutions aren't good enough.
What is the appropriate way to monitor the futures?
Thanks
EDIT:
Another (vague) idea I had in mind is to use a counter and wait for it to go down to zero. But again, I'd need to make sure it isn't just momentarily at zero.
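For what it's worth, the counter idea can be made safe against the "momentarily zero" problem by holding one extra count for the stream itself, so the total cannot reach zero before every future has been registered. A minimal sketch, where items stands for the Stream<Item> and processAsync is a hypothetical stand-in for the mapping to CompletableFuture<ItemResult>:
AtomicLong pending = new AtomicLong(1); // the extra count held by the stream itself
CompletableFuture<Void> allDone = new CompletableFuture<>();
Runnable release = () -> {
    if (pending.decrementAndGet() == 0) {
        allDone.complete(null); // can only happen after the stream is fully consumed
    }
};
items.map(item -> processAsync(item)) // hypothetical mapping step
    .forEach(future -> {
        pending.incrementAndGet();
        future.whenComplete((result, error) -> release.run());
    });
release.run(); // release the stream's own count
allDone.join(); // returns once every future has completed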
Disclaimer: I'm not sure whether Phaser is the right tool here, and if yes, whether it's better to have one root with multiple children or to chain them like I'm proposing below, so feel free to correct me.
Here's one approach that uses Phaser.
A Phaser supports at most 65535 registered parties, so we need to create a new child Phaser if that limit is about to be reached:
private Phaser register(Phaser phaser) {
if (phaser.getRegisteredParties() < 65534) {
// warning: side-effect,
// conflicts with AtomicReference#updateAndGet recommendation,
// might not fit well if the Stream is parallel:
phaser.register();
return phaser;
} else {
return new Phaser(phaser, 1);
}
}
Register each CompletableFuture against that Phaser chain, and deregister once done:
private void register(CompletableFuture<?> future, AtomicReference<Phaser> phaser) {
Phaser registeredPhaser = phaser.updateAndGet(this::register);
future
.thenRun(registeredPhaser::arriveAndDeregister)
.exceptionally(e -> {
// log e?
registeredPhaser.arriveAndDeregister();
return null;
});
}
Wait for all futures to be finished:
private <T> void await(Stream<CompletableFuture<T>> futures) {
Phaser rootPhaser = new Phaser(1);
AtomicReference<Phaser> phaser = new AtomicReference<>(rootPhaser);
futures.forEach(future -> register(future, phaser));
rootPhaser.arriveAndAwaitAdvance();
rootPhaser.arriveAndDeregister();
}
Example:
ExecutorService executor = Executors.newFixedThreadPool(500);
// creating a fake stream with 500,000 futures:
Stream<CompletableFuture<Integer>> stream = IntStream
.rangeClosed(1, 500_000)
.mapToObj(i -> CompletableFuture.supplyAsync(() -> {
try {
TimeUnit.MILLISECONDS.sleep(10);
if (i % 50_000 == 0) {
System.out.println(Thread.currentThread().getName() + ": " + i);
}
return i;
} catch (InterruptedException e) {
throw new IllegalStateException(e);
}
}, executor));
// usage:
await(stream);
System.out.println("Done");
Outputs:
pool-1-thread-348: 50000
pool-1-thread-395: 100000
pool-1-thread-333: 150000
pool-1-thread-30: 200000
pool-1-thread-120: 250000
pool-1-thread-10: 300000
pool-1-thread-241: 350000
pool-1-thread-340: 400000
pool-1-thread-283: 450000
pool-1-thread-176: 500000
Done
There is the following pipeline:
item is produced (the producer is external to the pipeline);
item is deserialized (JSON to Java object);
item is processed;
At the moment it all happens synchronously in a single thread:
while(producer.next()) {
var item = gson.deserialize(producer.item());
processItem(item);
}
Or schematically:
PRODUCER -> DESERIALIZATION -> CONSUMER
(sync) (sync) (sync)
The point is that the deserialization step has no side effects and could be parallelized, saving some wall-clock time.
The overall code should look like the following:
var pipeline = new Pipeline<Item>();
pipeline.setProducer(producer);
pipeline.setDeserialization(gson::deserialize);
pipeline.setConsumer(item -> {
...
});
pipeline.run();
Or schematically:
-> DESERIALIZATION
-> DESERIALIZATION
-> DESERIALIZATION
PRODUCER -> ... -> CONSUMER
-> DESERIALIZATION
-> DESERIALIZATION
-> DESERIALIZATION
(sync) (parallel) (sync)
Important note. Deserialized items should be produced:
synchronously;
in the same order the original producer produces encoded items.
Q. Is there a standardized way to code such a pipeline?
Try
while (producer.next()) {
var encoded = producer.item(); // capture on the producing thread, so it doesn't race the next producer.next() call
CompletableFuture.supplyAsync(() -> gson.deserialize(encoded))
.thenAcceptAsync(item -> processItem(item)); // thenRunAsync takes a Runnable; thenAcceptAsync receives the deserialized item
}
One way you can achieve your pattern is to:
Construct a multi-threaded executor to process the decoding requests
Have a consumer queue; each time you submit an item to be decoded, also add the corresponding Future object to the consumer queue
Have a consumer thread sit waiting to take items off the queue [which therefore consumes them in the order they were posted] and call the corresponding get() method [which waits for the item to be decoded]
So the 'consumer' would look like this:
BlockingQueue<Future<Item>> consumerQueue = new LinkedBlockingDeque<>();
Thread consumerThread = new Thread(() -> {
try {
while (true) {
Future<Item> item = consumerQueue.take();
try {
// Get the next decoded item that's ready
Item decodedItem = item.get();
// 'Consume' the item
...
} catch (ExecutionException ex) {
// decoding failed; log it and continue with the next item
}
}
} catch (InterruptedException irr) {
// interrupted; exit the consumer loop
}
});
consumerThread.start();
Meanwhile, the 'producer' end, with its multi-threaded 'decoder', would look like this:
ExecutorService decoder = Executors.newFixedThreadPool(4);
while (producer.hasNext()) {
Item item = producer.next();
// Submit the decode job for asynchronous processing
Future<Item> p = decoder.submit(() -> {
item.decode();
}, item);
// Also queue this decode job for future consumption once complete
consumerQueue.add(p);
}
As a separate matter, I wonder if you will actually see much benefit in practice, since by insisting on consumption in the same order, you are inherently introducing a serial condition on the process. But technically, this is one way that you could achieve what you are after.
P.S. If you didn't want a separate consumer thread, then the same 'producer' thread could poll the queue for completed items and execute in line.
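For illustration, a minimal sketch of that in-line variant, under the same assumed producer/decoder API as above (consumeItem is a hypothetical stand-in for the consuming logic; the checked exceptions from get() are left to the caller):
private void produceAndConsumeInline() throws InterruptedException, ExecutionException {
    while (producer.hasNext()) {
        Item item = producer.next();
        // Submit the decode job, remembering its Future in submission order
        consumerQueue.add(decoder.submit(item::decode, item));
        // Opportunistically consume items that are already decoded, still in order
        while (consumerQueue.peek() != null && consumerQueue.peek().isDone()) {
            consumeItem(consumerQueue.poll().get());
        }
    }
    // Drain the remainder, blocking for each item in order
    while (!consumerQueue.isEmpty()) {
        consumeItem(consumerQueue.poll().get());
    }
}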
I'm trying to understand how to apply backpressure in Spring WebFlux. I understand the theory of backpressure, but I can't reproduce it, so I don't fully understand it.
Let's take the following example:
public void test() throws InterruptedException {
EmitterProcessor<String> processor = EmitterProcessor.create();
new Thread(() -> {
int i = 0;
while (runThread) { // runThread: a volatile boolean stop flag (field not shown)
try {
Thread.sleep(100);
} catch (InterruptedException ignored) {
}
processor.onNext("Value: " + i);
i++;
}
processor.onComplete();
}).start();
processor
.subscribe(makeSubscriber("FIRST - "), Throwable::printStackTrace);
}
private Consumer<String> makeSubscriber(String label) {
return v -> {
System.out.println(label + v);
try {
Thread.sleep(1000);
} catch (InterruptedException ignored) {
}
};
}
I have created a Hot Flux in the form of an EmitterProcessor and in a separate thread I start producing data for it.
A bit lower, I subscribe to it. The subscriber is slower than the rate at which elements are being produced, so the issues should start to occur, right?
But the subscriber logic is run on the producer thread. When I call processor.onNext(), it synchronously calls all the subscribers, so if the subscribers are slow, the publisher is slowed down as well. So, then backpressure doesn't even seem useful.
I have also tried making two Spring Boot WebFlux applications, one with a Flux endpoint and one that consumes the endpoint, so I can be certain the consumer runs on a separate thread. But then, any attempt I make at backpressure in the consumer does nothing. There is no buffer being filled, there is nothing being dropped or anything!
Can anyone give me a concrete example of backpressure? Preferably in Spring WebFlux but I'll take any reactive Java library.
The documentation for the subscribe variant you have chosen reads:
The subscription will request an unbounded demand (Long.MAX_VALUE).
That is, you switched off backpressure yourself.
To use backpressure, subscribe with Flux.subscribe(Subscriber).
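For a concrete example with Reactor, you can subscribe with a BaseSubscriber and control the demand yourself; a minimal sketch against the question's processor, requesting one element at a time:
processor.subscribe(new BaseSubscriber<String>() {
    @Override
    protected void hookOnSubscribe(Subscription subscription) {
        request(1); // initial demand: a single element, instead of Long.MAX_VALUE
    }

    @Override
    protected void hookOnNext(String value) {
        System.out.println("FIRST - " + value);
        try {
            Thread.sleep(1000); // slow consumer
        } catch (InterruptedException ignored) {
        }
        request(1); // request the next element only once this one is processed
    }
});
With an EmitterProcessor, elements the subscriber has not yet requested accumulate in the processor's internal buffer; once that buffer fills up, onNext blocks the producing thread, which is the backpressure made visible.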
Is it possible to specify a custom thread pool for Java 8 parallel stream? I can not find it anywhere.
Imagine that I have a server application and I would like to use parallel streams. But the application is large and multi-threaded, so I want to compartmentalize it. I do not want a slow-running task in one module of the application to block tasks from another module.
If I can not use different thread pools for different modules, it means I can not safely use parallel streams in most of real world situations.
Try the following example. There are some CPU intensive tasks executed in separate threads.
The tasks leverage parallel streams. The first task is broken, so each step takes 1 second (simulated by thread sleep). The issue is that the other threads get stuck waiting for the broken task to finish. This is a contrived example, but imagine a servlet app and someone submitting a long-running task to the shared fork-join pool.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import static java.lang.Math.sqrt;
import static java.util.stream.IntStream.range;
import static java.util.stream.LongStream.rangeClosed;

public class ParallelTest {
public static void main(String[] args) throws InterruptedException {
ExecutorService es = Executors.newCachedThreadPool();
es.execute(() -> runTask(1000)); //incorrect task
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.shutdown();
es.awaitTermination(60, TimeUnit.SECONDS);
}
private static void runTask(int delay) {
range(1, 1_000_000).parallel().filter(ParallelTest::isPrime)
.peek(i -> Utils.sleep(delay)) // Utils.sleep: helper that sleeps 'delay' ms and swallows InterruptedException
.max()
.ifPresent(max -> System.out.println(Thread.currentThread() + " " + max));
}
public static boolean isPrime(long n) {
return n > 1 && rangeClosed(2, (long) sqrt(n)).noneMatch(divisor -> n % divisor == 0);
}
}
There actually is a trick for executing a parallel operation in a specific fork-join pool. If you execute it as a task in a fork-join pool, it stays there and does not use the common one.
final int parallelism = 4;
ForkJoinPool forkJoinPool = null;
try {
forkJoinPool = new ForkJoinPool(parallelism);
final List<Integer> primes = forkJoinPool.submit(() ->
// Parallel task here, for example
IntStream.range(1, 1_000_000).parallel()
.filter(PrimesPrint::isPrime)
.boxed().collect(Collectors.toList())
).get();
System.out.println(primes);
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown();
}
}
The trick is based on ForkJoinTask.fork which specifies: "Arranges to asynchronously execute this task in the pool the current task is running in, if applicable, or using the ForkJoinPool.commonPool() if not inForkJoinPool()"
The parallel streams use the default ForkJoinPool.commonPool, which by default has one thread fewer than the number of processors, as returned by Runtime.getRuntime().availableProcessors() (this means that parallel streams leave one processor for the calling thread).
For applications that require separate or custom pools, a ForkJoinPool may be constructed with a given target parallelism level; by default, equal to the number of available processors.
This also means if you have nested parallel streams or multiple parallel streams started concurrently, they will all share the same pool. Advantage: you will never use more than the default (number of available processors). Disadvantage: you may not get "all the processors" assigned to each parallel stream you initiate (if you happen to have more than one). (Apparently you can use a ManagedBlocker to circumvent that.)
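You can check both numbers yourself; a two-line snippet (output depends on the machine):
System.out.println(Runtime.getRuntime().availableProcessors()); // e.g. 8
System.out.println(ForkJoinPool.commonPool().getParallelism()); // e.g. 7, one less than above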
To change the way parallel streams are executed, you can either
submit the parallel stream execution to your own ForkJoinPool: yourFJP.submit(() -> stream.parallel().forEach(doSomething)).get(); or
you can change the size of the common pool using system properties: System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "20") for a target parallelism of 20 threads.
Example of the latter on my machine which has 8 processors. If I run the following program:
long start = System.currentTimeMillis();
IntStream s = IntStream.range(0, 20);
//System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "20");
s.parallel().forEach(i -> {
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.print((System.currentTimeMillis() - start) + " ");
});
The output is:
215 216 216 216 216 216 216 216 315 316 316 316 316 316 316 316 415 416 416 416
So you can see that the parallel stream processes 8 items at a time, i.e. it uses 8 threads. However, if I uncomment the commented line, the output is:
215 215 215 215 215 216 216 216 216 216 216 216 216 216 216 216 216 216 216 216
This time, the parallel stream has used 20 threads and all 20 elements in the stream have been processed concurrently.
As an alternative to the trick of triggering the parallel computation inside your own forkJoinPool, you can also pass that pool to the CompletableFuture.supplyAsync method, as in:
ForkJoinPool forkJoinPool = new ForkJoinPool(2);
CompletableFuture<List<Integer>> primes = CompletableFuture.supplyAsync(() ->
//parallel task here, for example
range(1, 1_000_000).parallel().filter(PrimesPrint::isPrime).boxed().collect(toList()),
forkJoinPool
);
The original solution (setting the ForkJoinPool common parallelism property) no longer works. Looking at the links in the original answer, an update which breaks this has been backported to Java 8. As mentioned in the linked threads, this solution was never guaranteed to work forever. Based on that, use the forkJoinPool.submit(...).get() approach discussed in the accepted answer. I think the backport fixes the unreliability of that approach as well.
ForkJoinPool fjpool = new ForkJoinPool(10);
System.out.println("stream.parallel");
IntStream range = IntStream.range(0, 20);
fjpool.submit(() -> range.parallel()
.forEach((int theInt) ->
{
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.println(Thread.currentThread().getName() + " -- " + theInt);
})).get();
System.out.println("list.parallelStream");
List<Integer> list = IntStream.range(0, 20).boxed().collect(Collectors.toList());
fjpool.submit(() -> list.parallelStream()
.forEach((theInt) ->
{
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.println(Thread.currentThread().getName() + " -- " + theInt);
})).get();
We can change the default parallelism using the following property:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=16
which can be set to use more parallelism.
To measure the actual number of used threads, you can check Thread.activeCount():
Runnable r = () -> IntStream
.range(-42, +42)
.parallel()
.map(i -> Thread.activeCount())
.max()
.ifPresent(System.out::println);
ForkJoinPool.commonPool().submit(r).join();
new ForkJoinPool(42).submit(r).join();
This can produce on a 4-core CPU an output like:
5 // common pool
23 // custom pool
Without .parallel() it gives:
3 // common pool
4 // custom pool
Until now, I used the solutions described in the answers to this question. Now, I came up with a little library called Parallel Stream Support for that:
ForkJoinPool pool = new ForkJoinPool(NR_OF_THREADS);
ParallelIntStreamSupport.range(1, 1_000_000, pool)
.filter(PrimesPrint::isPrime)
.boxed() // an IntStream has no collect(Collector); box first
.collect(toList());
But as #PabloMatiasGomez pointed out in the comments, there are drawbacks regarding the splitting mechanism of parallel streams, which depends heavily on the size of the common pool. See Parallel stream from a HashSet doesn't run in parallel.
I am using this solution only to have separate pools for different types of work, but I can not set the size of the common pool to 1 even if I don't use it.
Note:
There appears to be a fix implemented in JDK 10 that ensures a custom thread pool uses the expected number of threads:
Parallel stream execution within a custom ForkJoinPool should obey the parallelism
https://bugs.openjdk.java.net/browse/JDK-8190974
If you don't want to rely on implementation hacks, there's always a way to achieve the same by implementing custom collectors that will combine map and collect semantics... and you wouldn't be limited to ForkJoinPool:
// assumes: import static com.pivovarit.collectors.ParallelCollectors.parallel
list.stream()
.collect(parallel(i -> process(i), executor, 4))
.join();
Luckily, it's done already here and available on Maven Central:
http://github.com/pivovarit/parallel-collectors
Disclaimer: I wrote it and take responsibility for it.
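For completeness, a self-contained sketch of the snippet above, assuming the library's ParallelCollectors.parallel entry point (which collects into a CompletableFuture<Stream<R>>) and a hypothetical process function:
import static com.pivovarit.collectors.ParallelCollectors.parallel;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelCollectorsExample {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List.of(1, 2, 3, 4, 5).stream()
            .collect(parallel(i -> process(i), executor, 4)) // at most 4 parallel process() calls
            .join() // CompletableFuture<Stream<Integer>> -> Stream<Integer>
            .forEach(System.out::println);
        executor.shutdown();
    }

    private static int process(int i) { // hypothetical unit of work
        return i * i;
    }
}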
Go get abacus-common. The thread number can be specified for a parallel stream. Here is the sample code:
LongStream.range(4, 1_000_000).parallel(threadNum)...
Disclosure: I'm the developer of abacus-common.
If you don't need a custom ThreadPool but you rather want to limit the number of concurrent tasks, you can use:
List<Path> paths = List.of("/path/file1.csv", "/path/file2.csv", "/path/file3.csv").stream().map(e -> Paths.get(e)).collect(toList());
List<List<Path>> partitions = Lists.partition(paths, 4); // Guava method
partitions.forEach(group -> group.parallelStream().forEach(csvFilePath -> {
// do your processing
}));
(The duplicate question asking for this is locked, so please bear with me here.)
Here is how I set the max thread count flag mentioned above programmatically, along with a code snippet to verify that the parameter is honored:
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "2");
Set<String> threadNames = Stream.iterate(0, n -> n + 1)
.parallel()
.limit(100000)
.map(i -> Thread.currentThread().getName())
.collect(Collectors.toSet());
System.out.println(threadNames);
// Output -> [ForkJoinPool.commonPool-worker-1, Test worker, ForkJoinPool.commonPool-worker-3]
If you don't mind using a third-party library, with cyclops-react you can mix sequential and parallel Streams within the same pipeline and provide custom ForkJoinPools. For example
ReactiveSeq.range(1, 1_000_000)
.foldParallel(new ForkJoinPool(10),
s->s.filter(i->true)
.peek(i->System.out.println("Thread " + Thread.currentThread().getId()))
.max(Comparator.naturalOrder()));
Or if we wished to continue processing within a sequential Stream
ReactiveSeq.range(1, 1_000_000)
.parallel(new ForkJoinPool(10),
s->s.filter(i->true)
.peek(i->System.out.println("Thread " + Thread.currentThread().getId())))
.map(this::processSequentially)
.forEach(System.out::println);
[Disclosure I am the lead developer of cyclops-react]
I tried the custom ForkJoinPool as follows to adjust the pool size:
private static final Set<String> threadNameSet = ConcurrentHashMap.newKeySet(); // must be thread-safe: it is mutated from a parallel stream
private static Callable<Long> getSum() {
List<Long> aList = LongStream.rangeClosed(0, 10_000_000).boxed().collect(Collectors.toList());
return () -> aList.parallelStream()
.peek((i) -> {
String threadName = Thread.currentThread().getName();
threadNameSet.add(threadName);
})
.reduce(0L, Long::sum);
}
private static void testForkJoinPool() {
final int parallelism = 10;
ForkJoinPool forkJoinPool = null;
Long result = 0L;
try {
forkJoinPool = new ForkJoinPool(parallelism);
result = forkJoinPool.submit(getSum()).get(); //this makes it an overall blocking call
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown(); //always remember to shutdown the pool
}
}
out.println(result);
out.println(threadNameSet);
}
Here is the output, showing that the pool is using more threads than the default 4:
50000005000000
[ForkJoinPool-1-worker-8, ForkJoinPool-1-worker-9, ForkJoinPool-1-worker-6, ForkJoinPool-1-worker-11, ForkJoinPool-1-worker-10, ForkJoinPool-1-worker-1, ForkJoinPool-1-worker-15, ForkJoinPool-1-worker-13, ForkJoinPool-1-worker-4, ForkJoinPool-1-worker-2]
But oddly, when I tried to achieve the same result using a ThreadPoolExecutor, as follows:
BlockingDeque<Runnable> blockingDeque = new LinkedBlockingDeque<>(1000);
ThreadPoolExecutor fixedSizePool = new ThreadPoolExecutor(10, 20, 60, TimeUnit.SECONDS, blockingDeque, new MyThreadFactory("my-thread"));
it failed.
It only started the parallelStream in a new thread; everything else was just the same, which again shows that the parallelStream uses the ForkJoinPool to start its child threads.
I made a utility method to run a task in parallel, with an argument that defines the max number of threads.
public static void runParallel(final int maxThreads, Runnable task) throws RuntimeException {
ForkJoinPool forkJoinPool = null;
try {
forkJoinPool = new ForkJoinPool(maxThreads);
forkJoinPool.submit(task).get();
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown();
}
}
}
It creates a ForkJoinPool with the max number of allowed threads, and it shuts the pool down after the task completes (or fails).
Usage is as follows:
final int maxThreads = 4;
runParallel(maxThreads, () ->
IntStream.range(1, 1_000_000).parallel()
.filter(PrimesPrint::isPrime)
.boxed().collect(Collectors.toList()));
The (currently) accepted answer is partly wrong. It is not sufficient to just submit() the parallel stream to the dedicated fork-join pool. In that case, the stream will use that pool's threads and, in addition, the common fork-join pool and even the calling thread to handle the stream's workload, seemingly up to the size of the common fork-join pool. The behaviour is a bit weird, but definitely not what is required.
To actually restrict the work completely to the dedicated pool, you must encapsulate it into a CompletableFuture:
final int parallelism = 4;
ForkJoinPool forkJoinPool = null;
try {
forkJoinPool = new ForkJoinPool(parallelism);
final List<Integer> primes = CompletableFuture.supplyAsync(() ->
// Parallel task here, for example
IntStream.range(1, 1_000_000).parallel()
.filter(PrimesPrint::isPrime)
.boxed().collect(Collectors.toList()),
forkJoinPool) // <- passes dedicated fork-join pool as executor
.join(); // <- Wait for result from forkJoinPool
System.out.println(primes);
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown();
}
}
This code keeps all operations inside forkJoinPool, on both Java 8u352 and Java 17.0.1.