I have a program that parses reports coming from multiple devices (about 1000 devices), saves them to a DB and then does additional processing on them.
Parsing the reports can be done concurrently, but saving to the DB and the additional processing require some synchronization based on the device ID the reports come from (since they might need to update the same data in the DB).
So, I can run the processing in parallel as long as the threads are handling reports from different device IDs.
What could be the most efficient way to process this?
Example
I initially thought about using a thread pool and locking on the device ID, but that won't be efficient if I get a burst of reports coming from a single device.
For example, considering a thread pool with 4 threads and 10 incoming reports:
Report # | DeviceID
1 | A
2 | A
3 | A
4 | A
5 | A
6 | B
7 | C
8 | D
9 | E
10 | F
Thread 1 would start processing A's first report, threads 2-4 would wait until thread 1 finishes, and the rest of the reports would get queued.
It would be more efficient if the rest of A's reports could be queued instead, allowing B/C/D reports to be processed concurrently. Is there an efficient way to do this?
Try using a priority queue. The highest priority items in the queue would be chosen for processing by the thread pool. For example:
NOTE: I know priority queues are not typically implemented using an array and that some priority queues use smaller index values for higher priority. I just use this notation for simplicity's sake.
Let each queue entry be written as (DeviceID, Priority).
Let current thread pool be empty -> []
Say we get 10 incoming reports ->
[(A, 1), (A, 1), (A, 1), (B, 1), (B, 1), (C, 1), (D, 1), (E, 1), (F, 1), (G, 1)]
(represents the filled priority queue upon receiving the reports).
So, you dequeue the first item and give it to the thread pool. Then decrement the priority of all items in the priority queue with DeviceID A. This would look like the following:
(A, 1) is dequeued so you just get A. The priority queue would then shift after decrementing the priorities of A's still in the queue.
[(B, 1), (B, 1), (C, 1), (D, 1), (E, 1), (F, 1), (G, 1), (A, 0), (A, 0)]
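The dequeue-and-demote step described above can be sketched in Java. This is only an illustration under assumptions: the class name `DevicePriorityQueueSketch` and its methods are hypothetical, a plain list with a linear scan stands in for a real priority queue (`java.util.PriorityQueue` does not re-order entries whose priority changes after insertion), the class is not thread-safe, and nothing here restores a device's priority once its report has finished processing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "dequeue, then demote the same device" idea.
class DevicePriorityQueueSketch {
    private static final class Entry {
        final String deviceId;
        int priority;

        Entry(String deviceId, int priority) {
            this.deviceId = deviceId;
            this.priority = priority;
        }
    }

    // Kept in arrival order, so the first entry with the highest priority is also the oldest.
    private final List<Entry> entries = new ArrayList<>();

    // Every incoming report starts with priority 1, as in the example above.
    void add(String deviceId) {
        entries.add(new Entry(deviceId, 1));
    }

    // Returns the device ID of the next report to process, or null if the queue is empty.
    String poll() {
        if (entries.isEmpty()) {
            return null;
        }
        // Linear scan for the oldest entry with the highest priority.
        Entry best = entries.get(0);
        for (Entry candidate : entries) {
            if (candidate.priority > best.priority) {
                best = candidate;
            }
        }
        entries.remove(best);
        // Demote the remaining reports from the same device.
        for (Entry remaining : entries) {
            if (remaining.deviceId.equals(best.deviceId)) {
                remaining.priority--;
            }
        }
        return best.deviceId;
    }
}
```

Feeding this sketch the ten reports from the example (A, A, A, B, B, C, D, E, F, G) and polling until empty yields the devices in the order A, B, C, D, E, F, G, A, B, A.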
Project Loom
After watching some late-2020 videos on YouTube.com with Ron Pressler, head of Project Loom at Oracle, I believe the solution is quite simple with the new virtual threads (fibers) feature coming to a future release of Java:
Call a new Executors method to create an executor service that uses virtual threads (fibers) rather than platform/kernel threads.
Submit all incoming report processing tasks to that executor service.
Inside each task, attempt to grab a semaphore, one semaphore for each of your 1,000 devices.
That semaphore will be the way to process only one input per device at a time, to parallelize per source-device. If the semaphore representing a particular device is not available, simply block — let your report processing thread wait until the semaphore is available.
Project Loom maintains many lightweight virtual threads (fibers), even millions, that are run on a few heavyweight platform/kernel threads. This makes blocking a thread cheap.
Early builds of a JDK binary with Project Loom built-in for macOS/Linux/Windows are available now.
Caveat: I’m no expert on concurrency nor on Project Loom. But your particular use-case seems to match some specific recommendations made by Ron Pressler in his videos.
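The per-device semaphore idea in the steps above does not itself depend on Project Loom, so here is a minimal sketch of it on ordinary platform threads. The names `PerDeviceSemaphoreSketch`, `process`, and `peakConcurrencyForOneDevice` are hypothetical, and the lazily filled `ConcurrentHashMap` is my own assumption about how the semaphores might be stored. With a small fixed pool, the blocked threads are exactly the inefficiency the Question describes; the point of virtual threads is to make that blocking cheap. Virtual threads were later finalized in Java 21, where the pool below could be swapped for `Executors.newVirtualThreadPerTaskExecutor()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one fair, single-permit semaphore per device.
class PerDeviceSemaphoreSketch {
    // Semaphores are created lazily, the first time a device is seen.
    private final Map<String, Semaphore> semaphoreByDevice = new ConcurrentHashMap<>();

    // Runs `work` while holding the semaphore of the given device.
    void process(String deviceId, Runnable work) {
        Semaphore semaphore = semaphoreByDevice.computeIfAbsent(deviceId, id -> new Semaphore(1, true));
        try {
            semaphore.acquire(); // Blocks while another report of this device is in progress.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return; // Give up on this report if interrupted while waiting.
        }
        try {
            work.run();
        } finally {
            semaphore.release(); // Always release, even if the work throws.
        }
    }

    // Demo helper: submits several reports for device "A" and returns the peak number processed at once.
    static int peakConcurrencyForOneDevice(int reportCount) {
        PerDeviceSemaphoreSketch sketch = new PerDeviceSemaphoreSketch();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < reportCount; i++) {
            pool.execute(() -> sketch.process("A", () -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                running.decrementAndGet();
            }));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

Because every report for device "A" must hold the same single permit, the helper should never observe more than one report in progress at a time.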
Example code
Here is some example code that I noodled around with. I am not at all sure whether this is a good example.
I used an early-access build of Java 16, specially built with Project Loom technology: Build 16-loom+9-316 (2020/11/30) for macOS Intel.
package work.basil.example;
import java.time.*;
import java.util.*;
import java.util.concurrent.*;
/**
* An example of using Project Loom virtual threads to more simply process incoming data on background threads.
* <p>
* This code was built as a possible solution to this Question at StackOverflow.com: https://stackoverflow.com/q/65327325/642706
* <p>
* Posted in my Answer at StackOverflow.com: https://stackoverflow.com/a/65328799/642706
* <p>
* ©2020 Basil Bourque. 2020-12.
* <p>
* This work by Basil Bourque is licensed under CC BY 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0
* <p>
* Caveats:
* - Project Loom is still in early-release, available only as a special build of OpenJDK for Java 16.
* - I am *not* an expert on concurrency in general, nor Project Loom in particular. This code is merely me guessing and experimenting.
*/
public class App
{
// FYI, Project Loom links:
// https://wiki.openjdk.java.net/display/loom/Main
// http://jdk.java.net/loom/ (special early-access builds of Java 16 with Project Loom built-in)
// https://download.java.net/java/early_access/loom/docs/api/ (Javadoc)
// https://www.youtube.com/watch?v=23HjZBOIshY (Ron Pressler talk, 2020-07)
public static void main ( String[] args )
{
System.out.println( "java.version: " + System.getProperty( "java.version" ) );
App app = new App();
app.checkForProjectLoom();
app.demo();
}
public static boolean projectLoomIsPresent ( )
{
try
{
Thread.class.getDeclaredMethod( "startVirtualThread" , Runnable.class );
return true;
}
catch ( NoSuchMethodException e )
{
return false;
}
}
private void checkForProjectLoom ( )
{
if ( App.projectLoomIsPresent() )
{
System.out.println( "INFO - Running on a JVM with Project Loom technology. " + Instant.now() );
} else
{
throw new IllegalStateException( "Project Loom technology not present in this Java implementation. " + Instant.now() );
}
}
record ReportProcessorRunnable(Semaphore semaphore , Integer deviceIdentifier , boolean printToConsole , Queue < String > fauxDatabase) implements Runnable
{
@Override
public void run ( )
{
// Our goal is to serialize the report-processing per device.
// Each device can have only one report being processed at a time.
// In Project Loom this can be accomplished simply by spawning virtual threads for all such
// reports but process them serially by synchronizing on a binary (single-permit) semaphore.
// Each thread working on a report submitted for that device waits on semaphore assigned to that device.
// Blocking to wait for the semaphore is cheap in Project Loom using virtual threads. The underlying
// platform/kernel thread carrying this virtual thread will be assigned other work while this
// virtual thread is parked.
try
{
semaphore.acquire(); // Blocks until the semaphore for this particular device becomes available. Blocking is cheap on a virtual thread.
try
{
// Simulate more lengthy work being done by sleeping the virtual thread handling this task via the executor service.
try {Thread.sleep( Duration.ofMillis( 100 ) );} catch ( InterruptedException e ) {e.printStackTrace();}
String fauxData = "Insert into database table for device ID # " + this.deviceIdentifier + " at " + Instant.now();
fauxDatabase.add( fauxData );
if ( this.printToConsole ) { System.out.println( fauxData ); }
}
finally
{
// Release in `finally` so the permit is returned even if the work above throws.
semaphore.release(); // For fun, comment-out this line to see the effect of the per-device semaphore at runtime.
}
}
catch ( InterruptedException e )
{
// Interrupted while waiting for the semaphore; no permit was acquired, so there is nothing to release.
e.printStackTrace();
}
}
}
record IncomingReportsSimulatorRunnable(Map < Integer, Semaphore > deviceToSemaphoreMap ,
ExecutorService reportProcessingExecutorService ,
int countOfReportsToGeneratePerBatch ,
boolean printToConsole ,
Queue < String > fauxDatabase)
implements Runnable
{
@Override
public void run ( )
{
if ( printToConsole ) System.out.println( "INFO - Generating " + countOfReportsToGeneratePerBatch + " reports at " + Instant.now() );
for ( int i = 0 ; i < countOfReportsToGeneratePerBatch ; i++ )
{
// Make a new Runnable task containing report data to be processed, and submit this task to the executor service using virtual threads.
// To simulate a device sending in a report, we randomly pick one of the devices to pretend it is our source of report data.
final List < Integer > deviceIdentifiers = List.copyOf( deviceToSemaphoreMap.keySet() );
int randomIndexNumber = ThreadLocalRandom.current().nextInt( 0 , deviceIdentifiers.size() );
Integer deviceIdentifier = deviceIdentifiers.get( randomIndexNumber );
Semaphore semaphore = deviceToSemaphoreMap.get( deviceIdentifier );
Runnable processReport = new ReportProcessorRunnable( semaphore , deviceIdentifier , printToConsole , fauxDatabase );
reportProcessingExecutorService.submit( processReport );
}
}
}
private void demo ( )
{
// Configure experiment.
Duration durationOfExperiment = Duration.ofSeconds( 20 );
int countOfReportsToGeneratePerBatch = 7; // Would be 40 per the Stack Overflow Question.
boolean printToConsole = true;
// To use as a concurrent list, I found this suggestion to use `ConcurrentLinkedQueue`: https://stackoverflow.com/a/25630263/642706
Queue < String > fauxDatabase = new ConcurrentLinkedQueue < String >();
// Represent each of the thousand devices that are sending us report data to be processed.
// We map each device to a Java `Semaphore` object, to serialize the processing of multiple reports per device.
final int firstDeviceNumber = 1_000;
final int countDevices = 10; // Would be 1_000 per the Stack Overflow question.
final Map < Integer, Semaphore > deviceToSemaphoreMap = new TreeMap <>();
for ( int i = 0 ; i < countDevices ; i++ )
{
Integer deviceIdentifier = i + firstDeviceNumber; // Devices are numbered from 1,000 upward. With 1,000 devices, that would be 1,000 to 1,999.
Semaphore semaphore = new Semaphore( 1 , true ); // A single permit to make a binary semaphore, and make it fair.
deviceToSemaphoreMap.put( deviceIdentifier , semaphore );
}
// Run experiment.
// Notice that in Project Loom the `ExecutorService` interface is now `AutoCloseable`, for use in try-with-resources syntax.
try (
ScheduledExecutorService reportGeneratingExecutorService = Executors.newSingleThreadScheduledExecutor() ;
ExecutorService reportProcessingExecutorService = Executors.newVirtualThreadExecutor() ;
)
{
Runnable simulateIncomingReports = new IncomingReportsSimulatorRunnable( deviceToSemaphoreMap , reportProcessingExecutorService , countOfReportsToGeneratePerBatch , printToConsole , fauxDatabase );
ScheduledFuture < ? > scheduledFuture = reportGeneratingExecutorService.scheduleAtFixedRate( simulateIncomingReports , 0 , 1 , TimeUnit.SECONDS );
try {Thread.sleep( durationOfExperiment );} catch ( InterruptedException e ) {e.printStackTrace();}
}
// Notice that when reaching this point we block until all submitted tasks still running are finished,
// because that is the new behavior of `ExecutorService` being `AutoCloseable`.
System.out.println( "INFO - executor services shut down at this point. " + Instant.now() );
// Results of experiment
System.out.println( "fauxDatabase.size(): " + fauxDatabase.size() );
System.out.println( "fauxDatabase = " + fauxDatabase );
}
}
Related
class ThreadDemo extends Thread
{
public void run()
{
for(int i =0; i<5;i++)
{
System.out.println(i);
}
}
}
class ThreadApp
{
public static void main(String args[])
{
ThreadDemo thread1 = new ThreadDemo();
thread1.start();
ThreadDemo thread2 = new ThreadDemo();
thread2.start();
ThreadDemo thread3 = new ThreadDemo();
thread3.start();
}
}
Output:
0
2
3
1
4
1
2
4
3
0
0
1
2
3
4
By default, a Java application is single-threaded. We are going for the concept called multithreading to share the work. That is, instead of doing the work with one thread (the main thread), creating additional threads should simplify the work. I understand this theoretically. My doubt arises when I start coding.
In the above program I have created 3 threads. If 3 threads are working on the same logic (iterating and printing the values with a for loop), why it is giving 3 separate output instead of giving one set of values from 0 to 4?
Duplicating the work, not sharing the work
You said:
We are going for the concept called multithreading to share the work.
But you did not share the work. You repeated the work. Instead of running a for loop once, you ran a for loop three times, once in each of three threads.
You asked:
why it is giving 3 separate output instead of giving one set of values from 0 to 4?
If a school teacher asks each of three pupils to write the alphabet on the board, we would end up not with 26 letters but with 78 letters (3 * 26). Each pupil would be looping through the letters of the alphabet. Likewise, each of your three threads looped through the count of zero to four.
Your for loop is local to (within) your task code. So each thread runs all of that code starting at the top. So the for loop executes three times, one per thread.
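For contrast, here is a minimal sketch of what actually sharing the count across threads might look like. The technique is my own choice for illustration, not something from the Question: the threads claim numbers from a shared `AtomicInteger`, so each number from 0 to 4 is handled exactly once, by whichever thread gets to it first. The class name `SharedCounterSketch` and the method `countOnce` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: several threads share one count instead of each repeating it.
class SharedCounterSketch {
    // Returns the numbers claimed by all threads together, sorted for easy reading.
    static List<Integer> countOnce(int threadCount, int limit) {
        AtomicInteger next = new AtomicInteger(0);              // Shared by all threads.
        Queue<Integer> claimed = new ConcurrentLinkedQueue<>(); // Thread-safe collector.
        Runnable task = () -> {
            int i;
            // Each call to getAndIncrement hands out a distinct number.
            while ((i = next.getAndIncrement()) < limit) {
                claimed.add(i);
            }
        };
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread thread = new Thread(task);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // Wait for every thread to finish.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        List<Integer> sorted = new ArrayList<>(claimed);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(countOnce(3, 5)); // [0, 1, 2, 3, 4]
    }
}
```

Which thread claims which number varies from run to run, but the combined result is always one set of values from 0 to 4.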
Beware: System.out prints out-of-order
When multiple threads send text to System.out via calls to println or print, the text does not necessarily appear in chronological order. Each thread builds its line before it gets its turn to write, so the lines of text you send may appear out of order.
When examining a sequence of statements, always include a timestamp such as java.time.Instant.now(). Then study the output. You may need to manually re-order the outputs using a text editor to see the true sequence.
You can see the out-of-chronological-order lines in my own example output below.
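Instead of re-ordering by hand in a text editor, the timestamped entries can be collected and sorted in code. The following is a rough sketch of that idea, not part of the original example: the names `TimestampedLogSketch`, `Entry`, and `runAndSort` are hypothetical, and entries with identical timestamps simply keep the order in which they were collected.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: collect timestamped entries from several threads, then sort them.
class TimestampedLogSketch {
    private static final class Entry {
        final Instant when;
        final String text;

        Entry(Instant when, String text) {
            this.when = when;
            this.text = text;
        }
    }

    // Runs the counting loop on several threads and returns the lines in timestamp order.
    static List<String> runAndSort(int threadCount, int linesPerThread) {
        Queue<Entry> log = new ConcurrentLinkedQueue<>(); // Thread-safe collector.
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            Thread thread = new Thread(() -> {
                for (int i = 0; i < linesPerThread; i++) {
                    log.add(new Entry(Instant.now(), "i = " + i + " on thread " + id));
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        List<Entry> sorted = new ArrayList<>(log);
        sorted.sort(Comparator.comparing((Entry e) -> e.when)); // Stable sort by timestamp.
        List<String> lines = new ArrayList<>();
        for (Entry entry : sorted) {
            lines.add(entry.when + " " + entry.text);
        }
        return lines;
    }

    public static void main(String[] args) {
        runAndSort(3, 5).forEach(System.out::println);
    }
}
```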
Executor service
In modern Java we rarely need to address the Thread class directly. It is generally best to use the Executors framework. See the Oracle Tutorial.
package work.basil.example;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Counter
{
public static void main ( String[] args )
{
ExecutorService executorService = Executors.newCachedThreadPool();
Runnable task = new Runnable()
{
@Override
public void run ( )
{
for ( int i = 0 ; i < 5 ; i++ )
{
System.out.println( "i = " + i + " at " + Instant.now() + " on thread " + Thread.currentThread().getName() );
}
}
};
executorService.submit( task );
executorService.submit( task );
executorService.submit( task );
// Let our program run a while, then gracefully shutdown the executor service.
// Otherwise the backing thread pool may run indefinitely, like a zombie 🧟.
try { Thread.sleep( Duration.ofSeconds( 5 ).toMillis() ); }catch ( InterruptedException e ) { e.printStackTrace(); }
executorService.shutdown();
try { executorService.awaitTermination( 1 , TimeUnit.MINUTES ); } catch ( InterruptedException e ) { e.printStackTrace(); }
}
}
When run.
i = 0 at 2021-01-06T05:28:34.349290Z on thread pool-1-thread-1
i = 1 at 2021-01-06T05:28:34.391997Z on thread pool-1-thread-1
i = 0 at 2021-01-06T05:28:34.349246Z on thread pool-1-thread-2
i = 0 at 2021-01-06T05:28:34.349464Z on thread pool-1-thread-3
i = 1 at 2021-01-06T05:28:34.392467Z on thread pool-1-thread-3
i = 2 at 2021-01-06T05:28:34.392162Z on thread pool-1-thread-1
i = 2 at 2021-01-06T05:28:34.392578Z on thread pool-1-thread-3
i = 3 at 2021-01-06T05:28:34.392670Z on thread pool-1-thread-1
i = 3 at 2021-01-06T05:28:34.392773Z on thread pool-1-thread-3
i = 4 at 2021-01-06T05:28:34.393165Z on thread pool-1-thread-3
i = 1 at 2021-01-06T05:28:34.392734Z on thread pool-1-thread-2
i = 4 at 2021-01-06T05:28:34.392971Z on thread pool-1-thread-1
i = 2 at 2021-01-06T05:28:34.395138Z on thread pool-1-thread-2
i = 3 at 2021-01-06T05:28:34.396407Z on thread pool-1-thread-2
i = 4 at 2021-01-06T05:28:34.397002Z on thread pool-1-thread-2
Project Loom
Project Loom promises to bring to Java new features such as virtual threads (fibers), and to make ExecutorService be AutoCloseable for use with try-with-resources to automatically shut down.
Let's rewrite the above code to use Project Loom technologies. Preliminary builds are available now based on early-access Java 16.
Also, we can rewrite the anonymous class seen above with simpler lambda syntax.
Another difference from above: Virtual threads do not have a name. So we switch to using the id number of the thread to differentiate between threads running.
try
(
ExecutorService executorService = Executors.newVirtualThreadExecutor() ;
)
{
Runnable task = ( ) -> {
for ( int i = 0 ; i < 5 ; i++ )
{
System.out.println( "i = " + i + " at " + Instant.now() + " on thread " + Thread.currentThread().getId() );
}
};
executorService.submit( task );
executorService.submit( task );
executorService.submit( task );
}
// At this point, the flow-of-control blocks until all submitted tasks are done.
// And the executor service is also shutdown by this point.
When run.
i = 0 at 2021-01-06T05:41:36.628800Z on thread 17
i = 1 at 2021-01-06T05:41:36.647428Z on thread 17
i = 2 at 2021-01-06T05:41:36.647626Z on thread 17
i = 3 at 2021-01-06T05:41:36.647828Z on thread 17
i = 4 at 2021-01-06T05:41:36.647902Z on thread 17
i = 0 at 2021-01-06T05:41:36.628842Z on thread 14
i = 1 at 2021-01-06T05:41:36.648148Z on thread 14
i = 2 at 2021-01-06T05:41:36.648227Z on thread 14
i = 3 at 2021-01-06T05:41:36.648294Z on thread 14
i = 4 at 2021-01-06T05:41:36.648365Z on thread 14
i = 0 at 2021-01-06T05:41:36.628837Z on thread 16
i = 1 at 2021-01-06T05:41:36.648839Z on thread 16
i = 2 at 2021-01-06T05:41:36.648919Z on thread 16
i = 3 at 2021-01-06T05:41:36.648991Z on thread 16
i = 4 at 2021-01-06T05:41:36.649054Z on thread 16
Sharing state across threads
If you really wanted to share values across the threads, you define them outside the immediate task code.
In this next example, we define a class Counter that implements Runnable. As a Runnable we can pass an instance of this class to an executor service. We defined a member field, a ConcurrentMap (a thread-safe Map) that tracks each of our desired numbers 0-4. For each of those five numbers, we map to the id number of the virtual thread that was able to beat the other virtual threads to submitting that entry into our originally-empty map.
Be aware that we are submitting a single Counter object to all three threads. So all three threads have access to the very same ConcurrentMap object. That is why we must use a ConcurrentMap rather than a plain Map. Any resource shared across threads must be built to be thread-safe.
We are calling Thread.sleep to try to mix things up. Otherwise, the first thread might finish all the work while the main thread is still submitting to the second and third threads.
package work.basil.example;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.*;
public class Counter implements Runnable
{
public ConcurrentMap < Integer, Long > results = new ConcurrentHashMap <>();
@Override
public void run ( )
{
try { Thread.sleep( Duration.ofMillis( 100 ) ); } catch ( InterruptedException e ) { e.printStackTrace(); }
Long threadId = Thread.currentThread().getId(); // ID of this thread.
for ( int i = 0 ; i < 5 ; i++ )
{
// Shake things up by waiting some random time.
try { Thread.sleep( Duration.ofMillis( ThreadLocalRandom.current().nextInt(1, 100) ) ); } catch ( InterruptedException e ) { e.printStackTrace(); }
results.putIfAbsent( i , threadId ); // Auto-boxing converts the `int` value of `i` to be wrapped as a `Integer` object.
}
}
}
Here is a main method to make our exercise happen.
public static void main ( String[] args )
{
Counter counter = new Counter();
try
(
ExecutorService executorService = Executors.newVirtualThreadExecutor() ;
)
{
executorService.submit( counter );
executorService.submit( counter );
executorService.submit( counter );
}
// At this point, the flow-of-control blocks until all submitted tasks are done.
// And the executor service is also shutdown by this point.
System.out.println( "counter.results = " + counter.results );
}
In the results of this particular run, we can see that the two threads number 16 and 17 had all the success in putting entries into our map. The third thread was not able to be the first to put in any of the five entries.
counter.results = {0=16, 1=17, 2=17, 3=16, 4=16}
Try running various tests and see for yourself what is printed and by which thread.
public class ThreadApp {
public static void main(String args[]) throws InterruptedException {
ThreadDemo thread1 = new ThreadApp().new ThreadDemo("t1",4);
ThreadDemo thread2 = new ThreadApp().new ThreadDemo("t2",7);
thread2.start();
thread1.start();
ThreadDemo thread3 = new ThreadApp().new ThreadDemo("t3",2);
// wait till t1 &t2 finish run then launch t3
thread1.join();
thread2.join();
thread3.start();
}
class ThreadDemo extends Thread {
int stop;
public ThreadDemo(String name, int stop) {
super(name);
this.stop = stop;
}
public void run() {
for (int i = 0; i < stop; i++) {
System.out.println(this.getName() + ":" + i);
}
}
}
}
Possible Output:
t2:0
t2:1
t1:0
t2:2
t1:1
t2:3
t1:2
t2:4
t1:3
t2:5
t2:6
//due to join t3 start only after t1 & t2 finish their run
t3:0
t3:1
As for the benefits of multithreading, just one hint: look at the Producer-Consumer problem ...
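To make the Producer-Consumer hint concrete, here is a minimal sketch using a `BlockingQueue` from `java.util.concurrent`. The class name `ProducerConsumerSketch`, the method `produceAndConsume`, and the use of -1 as an end-of-stream marker are all assumptions made for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one producer thread hands numbers to one consumer thread.
class ProducerConsumerSketch {
    // Produces 1..count on one thread, sums them on another, and returns the sum.
    static int produceAndConsume(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2); // Small buffer forces hand-offs.
        final int endMarker = -1; // Assumed sentinel telling the consumer to stop.
        int[] sum = { 0 };

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put(i); // Blocks while the buffer is full.
                }
                queue.put(endMarker);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the buffer is empty.
                for (int value = queue.take(); value != endMarker; value = queue.take()) {
                    sum[0] += value;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // After join, the consumer's writes to `sum` are visible here.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(10)); // 55
    }
}
```

Here the two threads genuinely divide the work: one only produces and the other only consumes, with the queue coordinating the hand-off between them.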
I have a function that iterates over a list using parallelStream and calls an API with each item as the parameter. I then store the results in a HashMap.
try {
return answerList.parallelStream()
.map(answer -> getReplyForAnswerCombination(answer))
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
} catch (final NullPointerException e) {
log.error("Error in generating final results.", e);
return null;
}
When I run it on laptop 1, it takes 1 hour.
But on laptop 2, it takes 5 hours.
Doing some basic research, I found that parallel streams use the default ForkJoinPool.commonPool, which by default has one fewer thread than you have processors.
Laptop1 and laptop2 have different processors.
Is there a way to find out how many streams can run in parallel on Laptop1 and Laptop2?
Can I use the suggestion given here to safely increase the number of parallel streams in laptop2?
long start = System.currentTimeMillis();
IntStream s = IntStream.range(0, 20);
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "20");
s.parallel().forEach(i -> {
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.print((System.currentTimeMillis() - start) + " ");
});
Project Loom
If you want maximum performance on threaded code that blocks (as opposed to CPU-bound code), then use virtual threads (fibers) provided in Project Loom. Preliminary builds are available now, based on early-access Java 16.
Virtual threads
Virtual threads can be dramatically faster because a virtual thread is “parked” while blocked, set aside, so another virtual thread can make progress. This is so efficient for blocking tasks that threads can number in the millions.
Drop the streams approach. Merely send off each input to a virtual thread.
Full example code
Let's define classes for Answer and Reply, our inputs & outputs. We will use record, a new feature coming to Java 16, as an abbreviated way to define an immutable data-driven class. The compiler implicitly creates default implementations of constructor, getters, equals & hashCode, and toString.
public record Answer (String text)
{
}
…and:
public record Reply (String text)
{
}
Define our task to be submitted to an executor service. We write a class named ReplierTask that implements Runnable (has a run method).
Within the run method, we sleep the current thread to simulate waiting for a call to a database, file system, and/or remote service.
package work.basil.example;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;
import java.util.concurrent.ConcurrentMap;
public class ReplierTask implements Runnable
{
private Answer answer;
ConcurrentMap < Answer, Reply > map;
public ReplierTask ( Answer answer , ConcurrentMap < Answer, Reply > map )
{
this.answer = answer;
this.map = map;
}
private Reply getReplyForAnswerCombination ( Answer answer )
{
// Simulating a call to some service to produce a `Reply` object.
try { Thread.sleep( Duration.ofSeconds( 1 ) ); } catch ( InterruptedException e ) { e.printStackTrace(); } // Simulate blocking to wait for call to service or db or such.
return new Reply( UUID.randomUUID().toString() );
}
// `Runnable` interface
@Override
public void run ( )
{
System.out.println( "`run` method at " + Instant.now() + " for answer: " + this.answer );
Reply reply = this.getReplyForAnswerCombination( this.answer );
this.map.put( this.answer , reply );
}
}
Lastly, some code to do the work. We make a class named Mapper that contains a main method.
We simulate some input by populating a list of Answer objects. We create an empty ConcurrentMap in which to collect the results. And we assign each Answer object to a new thread where we call for a new Reply object and store the Answer/Reply pair as an entry in the map.
package work.basil.example;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
public class Mapper
{
public static void main ( String[] args )
{
System.out.println("Runtime.version(): " + Runtime.version() );
System.out.println("availableProcessors: " + Runtime.getRuntime().availableProcessors());
System.out.println("maxMemory: " + Runtime.getRuntime().maxMemory() + " | maxMemory/(1024*1024) -> megs: " +Runtime.getRuntime().maxMemory()/(1024*1024) );
Mapper app = new Mapper();
app.demo();
}
private void demo ( )
{
// Simulate our inputs, a list of `Answer` objects.
int limit = 10_000;
List < Answer > answers = new ArrayList <>( limit );
for ( int i = 0 ; i < limit ; i++ )
{
answers.add( new Answer( String.valueOf( i ) ) );
}
// Do the work.
Instant start = Instant.now();
System.out.println( "Starting work at: " + start + " on count of tasks: " + limit );
ConcurrentMap < Answer, Reply > results = new ConcurrentHashMap <>();
try
(
ExecutorService executorService = Executors.newVirtualThreadExecutor() ;
// Executors.newFixedThreadPool( 5 )
// Executors.newFixedThreadPool( 10 )
// Executors.newFixedThreadPool( 1_000 )
// Executors.newVirtualThreadExecutor()
)
{
for ( Answer answer : answers )
{
ReplierTask task = new ReplierTask( answer , results );
executorService.submit( task );
}
}
// At this point the flow-of-control blocks until all submitted tasks are done.
// The executor service is automatically closed by this point as well.
Duration elapsed = Duration.between( start , Instant.now() );
System.out.println( "results.size() = " + results.size() + ". Elapsed: " + elapsed );
}
}
We can change out the Executors.newVirtualThreadExecutor() with a pool of platform threads, to compare against virtual threads. Let's try a pool of 5, 10, and 1,000 platform threads on a Mac mini Intel with macOS Mojave sporting 6 real cores, no hyper-threading, 32 gigs of memory, and OpenJDK special build version 16-loom+9-316 assigned maxMemory of 8 gigs.
10,000 tasks at 1 second each | Total elapsed time
5 platform threads | half-hour (PT33M29.755792S)
10 platform threads | quarter-hour (PT16M43.318973S)
1,000 platform threads | 10 seconds (PT10.487689S)
10,000 platform threads | Error…unable to create native thread: possibly out of memory or process/resource limits reached
virtual threads | Under 3 seconds (PT2.645964S)
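The platform-thread timings follow from simple arithmetic: with a fixed pool, the tasks run in "waves" of at most one task per thread, and each wave takes about as long as one task. A small sketch of that calculation (the class and method names are hypothetical, and it ignores all scheduling overhead):

```java
// Hypothetical sketch: lower bound on elapsed time for blocking tasks on a fixed-size pool.
class ElapsedTimeEstimate {
    // Tasks run in waves of at most `threads` tasks; each wave lasts `secondsPerTask`.
    static long lowerBoundSeconds(long tasks, long threads, long secondsPerTask) {
        long waves = (tasks + threads - 1) / threads; // Ceiling division.
        return waves * secondsPerTask;
    }

    public static void main(String[] args) {
        System.out.println(lowerBoundSeconds(10_000, 5, 1));     // 2000 seconds, about 33 minutes
        System.out.println(lowerBoundSeconds(10_000, 10, 1));    // 1000 seconds, about 17 minutes
        System.out.println(lowerBoundSeconds(10_000, 1_000, 1)); // 10 seconds
    }
}
```

These bounds line up closely with the measured times for 5, 10, and 1,000 platform threads in the table above.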
Caveats
Caveat: Project Loom is experimental and subject to change, not intended for production use yet. The team is asking for folks to give feedback now.
Caveat: CPU-bound tasks such as encoding video should stick with platform/kernel threads rather than virtual threads. Most common code doing blocking operations such as I/O, like accessing files, logging, hitting a database, or making network calls, will likely see massive performance boosts with virtual threads.
Caveat: You must have enough memory available for many or even all of your tasks to be running simultaneously. If not enough memory will be available, you must take additional steps to throttle the virtual threads.
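One possible way to throttle, offered here as my own assumption rather than anything prescribed by Project Loom, is a counting `Semaphore` that caps how many tasks may be in progress at once. The sketch below uses a cached pool of platform threads so that it runs on older JVMs; the same semaphore pattern should carry over to an executor backed by virtual threads. The names `ThrottleSketch` and `peakWithLimit` are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap the number of tasks doing work at the same time.
class ThrottleSketch {
    // Runs `taskCount` tasks, at most `limit` at a time, and returns the peak observed concurrency.
    static int peakWithLimit(int taskCount, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    permits.acquire(); // Wait here until fewer than `limit` tasks are working.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    Thread.sleep(10); // Simulate a blocking call.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    permits.release();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

The memory-hungry part of each task would go where the sleep is, so that no more than `limit` tasks hold their working data at any moment.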
The setting java.util.concurrent.ForkJoinPool.common.parallelism will have an effect on the threads available to use for operations which make use of the ForkJoinPool, such as Stream.parallel(). However: whether your task uses more threads depends on the number of items in the stream, and whether it takes less time to run depends on the nature of each task and your available processors.
This test program shows the effect of changing this system property with a trivial task:
public static void main(String[] args) {
ConcurrentHashMap<String,String> threads = new ConcurrentHashMap<>();
int max = Integer.parseInt(args[0]);
boolean parallel = args.length < 2 || !"single".equals(args[1]);
int [] arr = IntStream.range(0, max).toArray();
long start = System.nanoTime();
IntStream stream = Arrays.stream(arr);
if (parallel)
stream = stream.parallel();
stream.forEach(i -> {
threads.put("hc="+Thread.currentThread().hashCode()+" tn="+Thread.currentThread().getName(), "value");
});
long end = System.nanoTime();
System.out.println("parallelism: "+System.getProperty("java.util.concurrent.ForkJoinPool.common.parallelism"));
System.out.println("Threads: "+threads.keySet());
System.out.println("Array size: "+arr.length+" threads used: "+threads.size()+" ms="+TimeUnit.NANOSECONDS.toMillis(end-start));
}
Adding more threads won't necessarily speed things up. Here are some examples from test runs that count the threads used. They may help you decide on the best approach for your own task contained in getReplyForAnswerCombination().
java -cp example.jar -Djava.util.concurrent.ForkJoinPool.common.parallelism=1000 App 100000
Array size: 100000 threads used: 37
java -cp example.jar -Djava.util.concurrent.ForkJoinPool.common.parallelism=50 App 100000
Array size: 100000 threads used: 20
java -cp example.jar App 100000 single
Array size: 100000 threads used: 1
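To compare two machines directly, the relevant numbers can be printed on each of them. This is a small sketch using only standard calls (`Runtime.availableProcessors` and `ForkJoinPool.getParallelism`); the class name `ParallelismProbe` is hypothetical. Keep in mind that the thread calling a parallel stream also takes part in the work, so the number of threads observed can be one more than the pool's parallelism.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch: report the values that bound parallel-stream concurrency on this machine.
class ParallelismProbe {
    static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors: " + processors());
        System.out.println("commonPool parallelism: " + commonPoolParallelism());
        System.out.println("parallelism property: "
                + System.getProperty("java.util.concurrent.ForkJoinPool.common.parallelism"));
    }
}
```

Running this on Laptop1 and Laptop2 should show whether the difference in elapsed time tracks the difference in available processors.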
I suggest you look at the thread pooling (with or without Loom) in @Basil Bourque's answer. The JDK source code of the ForkJoinPool constructor also has some details on this system property.
private ForkJoinPool(byte forCommonPoolOnly)
In Java, I need to take two parameters, a Runnable and a delay in milliseconds, and run the Runnable after that delay. This needs to run on a single thread, and if the method is invoked with different parameter values while the previous task has not finished, the new task should be kept in a queue.
public void runScheduledTask(Runnable task, long delay) {
// ...
}
...
runScheduledTask(task1, 10); // at 00:00:00.000
runScheduledTask(task2, 10); // at 00:00:00.005
When the method is invoked with task2, task1 has not started/finished yet because of its delay of 10, so task2 should be stored in a queue.
Is there a way that I can check if the task1 has completed in this case?
As this should be run in the current thread, I have no idea what classes or tools I can use.
tl;dr
is there a way to check if Runnable is completed in single thread?
Yes.
Run your Runnable with a ScheduledExecutorService provided by a call to Executors.newSingleThreadScheduledExecutor.
Capture the returned ScheduledFuture object.
Interrogate that ScheduledFuture object for its completion status, asking if it is done or is cancelled.
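The three steps above can be sketched as a small wrapper around the `runScheduledTask` signature from the Question. This is an illustration under assumptions: the class name `SingleThreadSchedulerSketch` and the `demo` helper are hypothetical, and the delay is taken to be in milliseconds as the Question states.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: delayed tasks queued onto a single background thread.
class SingleThreadSchedulerSketch {
    // A single worker thread, so tasks never overlap; waiting tasks are queued by the executor.
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    // Schedules the task and returns a handle for checking its status later.
    ScheduledFuture<?> runScheduledTask(Runnable task, long delayMillis) {
        return executor.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        executor.shutdown();
    }

    // Demo helper: returns { isDone right after scheduling, isDone after waiting for completion }.
    static boolean[] demo() {
        SingleThreadSchedulerSketch scheduler = new SingleThreadSchedulerSketch();
        ScheduledFuture<?> future = scheduler.runScheduledTask(() -> System.out.println("task ran"), 500);
        boolean doneBefore = future.isDone(); // Typically false, as the delay has not yet elapsed.
        try {
            future.get(5, TimeUnit.SECONDS);  // Block until the task has completed.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
        boolean doneAfter = future.isDone();
        return new boolean[] { doneBefore, doneAfter };
    }
}
```

Calling `isDone` on the returned future never blocks, so it can be polled from the calling thread at any time; `get` is only used here to wait for the demo to finish.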
Executors framework
Modern Java offers the Executors framework to nicely manage spinning off tasks onto other threads. See tutorial by Oracle.
The Executors class (plural in name) has factory methods for producing an ExecutorService object.
You want a delay in execution rather than immediate execution. So that means you want a ScheduledExecutorService.
ScheduledExecutorService scheduledExecutorService = …
You asked for single-threaded execution. That means you want an executor backed by a thread pool sized to a single thread rather than several.
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor() ;
Tell that scheduled executor service to run your Runnable after your desired delay. Call the ScheduledExecutorService::schedule method.
If the executors framework were being written today, the delay would be represented as a java.time.Duration object. But java.time was not yet invented. So a delay is represented as some number along with a TimeUnit object to designate the granularity of time. You did not specify your desired granularity, so I'll go with seconds.
scheduledExecutorService.schedule( runnable , 10 , TimeUnit.SECONDS );
Put all that together.
Runnable runnable = ( ) -> {
System.out.println( "Running the runnable at " + Instant.now() );
};
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
scheduledExecutorService.schedule( runnable , 10 , TimeUnit.SECONDS ); // Wait ten seconds before running the runnable.
// Wait long enough on this `main` thread for the background thread to do its thing.
try
{
Thread.sleep( Duration.ofSeconds( 12 ).toMillis() );
}
catch ( InterruptedException e )
{
e.printStackTrace();
}
// Clean-up.
scheduledExecutorService.shutdown(); // Always shutdown your executor service when no longer needed, or when ending your app.
System.out.println( "Done at " + Instant.now() );
Notice how we gracefully shut down the executor service. Otherwise the thread(s) backing the executor might continue running zombie-like indefinitely after our app has ended.
See this code run live at IdeOne.com.
Running the runnable at 2020-11-16T05:30:04.106661Z
Done at 2020-11-16T05:30:06.105624Z
Check for completion
You asked:
Is there a way that I can check if the task1 has completed
Yes.
The ScheduledExecutorService::schedule method seen above returns a ScheduledFuture object. We can interrogate that returned ScheduledFuture object for its completion status.
Our code above ignores that returned object. Let's alter our code to capture the returned object.
Runnable runnable = ( ) -> {
System.out.println( "Running the runnable at " + Instant.now() );
};
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
ScheduledFuture<?> future = scheduledExecutorService.schedule( runnable , 10 , TimeUnit.SECONDS ); // Wait ten seconds before running the runnable.
try
{
Thread.sleep( Duration.ofSeconds( 4 ).toMillis() ); // Wait a short while, then see if runnable completed.
System.out.println( "isDone: " + future.isDone() + " | " + "isCancelled: " + future.isCancelled() );
Thread.sleep( Duration.ofSeconds( 4 ).toMillis() ); // Wait a short while, then see if runnable completed.
System.out.println( "isDone: " + future.isDone() + " | " + "isCancelled: " + future.isCancelled() );
Thread.sleep( Duration.ofSeconds( 4 ).toMillis() ); // Wait a short while, then see if runnable completed.
System.out.println( "isDone: " + future.isDone() + " | " + "isCancelled: " + future.isCancelled() );
}
catch ( InterruptedException e )
{
e.printStackTrace();
}
// Clean-up.
scheduledExecutorService.shutdown();
System.out.println( "Done at " + Instant.now() );
When run.
isDone: false | isCancelled: false
isDone: false | isCancelled: false
Running the runnable at 2020-11-16T05:44:07.803125Z
isDone: true | isCancelled: false
Done at 2020-11-16T05:44:09.821267Z
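To connect this back to the original two-task scenario, here is a minimal sketch of what happens when both tasks are scheduled with the same delay on one single-threaded scheduled executor. The class name SequentialScheduling and the method runBoth are hypothetical names invented for this illustration. The executor's own internal queue holds task2 until task1 has run, so there is no need to build a separate queue by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class SequentialScheduling {

    // Schedules two tasks with the same delay on a single worker thread
    // and returns the order in which they actually ran.
    static List<String> runBoth(long delayMillis) {
        List<String> order = new CopyOnWriteArrayList<>();
        Runnable task1 = () -> order.add("task1");
        Runnable task2 = () -> order.add("task2");
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> first = executor.schedule(task1, delayMillis, TimeUnit.MILLISECONDS);
            ScheduledFuture<?> second = executor.schedule(task2, delayMillis, TimeUnit.MILLISECONDS);
            second.get(); // block until task2 has finished, instead of sleeping and polling
            if (!first.isDone()) {
                throw new IllegalStateException("task1 should have completed before task2");
            }
            return new ArrayList<>(order);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBoth(100)); // prints [task1, task2]
    }
}
```

Calling get() on the future blocks until that task completes, which is usually simpler than sleeping and checking isDone() repeatedly.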
I have threads that are each given a random number (1 to n) and are instructed to print the numbers in sorted order. I used a semaphore such that each thread acquires a number of permits equal to its random number and releases one permit more than it acquired.
acquired = random number; released = 1+random number
Initial permit count for semaphore is 1. So thread with random number 1 should get permit and then 2 and so on.
This is supported as per the documentation given below
There is no requirement that a thread that releases a permit must have acquired that permit by calling acquire().
The problem is my program gets stuck after 1 for n>2.
My program is given below:
import java.util.concurrent.Semaphore;
public class MultiThreading {
public static void main(String[] args) {
Semaphore sem = new Semaphore(1,false);
for(int i=5;i>=1;i--)
new MyThread(i, sem);
}
}
class MyThread implements Runnable {
int var;Semaphore sem;
public MyThread(int a, Semaphore s) {
var =a;sem=s;
new Thread(this).start();
}
@Override
public void run() {
System.out.println("Acquiring lock -- "+var);
try {
sem.acquire(var);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(var);
System.out.println("Releasing lock -- "+var);
sem.release(var+1);
}
}
Output is :
Acquiring lock -- 4
Acquiring lock -- 5
Acquiring lock -- 3
Acquiring lock -- 2
Acquiring lock -- 1
1
Releasing lock -- 1
However, if I modify my code to use tryAcquire, it runs perfectly well.
Below is new run implementation
@Override
public void run() {
boolean acquired = false;
while(!acquired) {
acquired = sem.tryAcquire(var);
}
System.out.println(var);
sem.release(var+1);
}
Can someone please explain the semaphore's permit acquire mechanism when multiple threads are waiting with different permit requests?
It's a clever strategy, but you're misunderstanding how Semaphore hands out permits. If you run your code enough times you'll actually see it reach step two:
Acquiring lock -- 5
Acquiring lock -- 1
1
Releasing lock -- 1
Acquiring lock -- 3
Acquiring lock -- 2
2
Acquiring lock -- 4
Releasing lock -- 2
If you keep on re-running it enough times you'd actually see it successfully finish. This happens because of how Semaphore hands out permits. You're assuming Semaphore will try to accommodate an acquire() call as soon as it has enough permits to do so. If we look carefully at the documentation for Semaphore.acquire(int) we'll see that is not the case (emphasis mine):
If insufficient permits are available then the current thread becomes disabled for thread scheduling purposes and lies dormant until ... some other thread invokes one of the release methods for this semaphore, the current thread is next to be assigned permits and the number of available permits satisfies this request.
In other words, Semaphore keeps a queue of pending acquire requests and, upon each call to .release(), only checks the head of the queue. In particular, if you enable fair queuing (set the second constructor argument to true) you'll see even step one doesn't occur, because thread 5 is (usually) the first in the queue, and even new acquire() calls that could be fulfilled will be queued up behind the other pending calls.
In short this means you cannot rely on .acquire() to return as soon as possible, as your code assumes.
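The head-of-queue behavior can be seen in isolation with a small sketch. It assumes a fair Semaphore so the result is deterministic (the code in the question uses a non-fair one), and the class name SemaphoreQueueDemo and the method demo are hypothetical names for this illustration. One thread asks for 3 permits and waits in the queue. A timed tryAcquire for a single permit, which honors the fair ordering, then lines up behind it and times out even though one permit is available the whole time. The untimed tryAcquire ignores the queue and succeeds immediately.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class SemaphoreQueueDemo {

    // Returns { timedRequestSucceeded, bargingRequestSucceeded }.
    static boolean[] demo() {
        Semaphore sem = new Semaphore(1, true); // 1 permit, fair ordering
        Thread big = new Thread(() -> {
            try {
                sem.acquire(3); // only 1 permit exists, so this request waits in the queue
                sem.release(3);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        big.start();
        try {
            // Wait until the 3-permit request is really sitting in the queue.
            while (!sem.hasQueuedThreads()) {
                Thread.sleep(1);
            }
            // Honors fairness: queues behind the 3-permit request and times out.
            boolean timed = sem.tryAcquire(1, 100, TimeUnit.MILLISECONDS);
            // Ignores the queue: takes the available permit right away.
            boolean barged = sem.tryAcquire(1);
            // Give back whatever was taken, then add 2 more so the queued thread can finish.
            if (timed) sem.release(1);
            if (barged) sem.release(1);
            sem.release(2);
            big.join();
            return new boolean[] { timed, barged };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("timed=" + result[0] + " barged=" + result[1]); // timed=false barged=true
    }
}
```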
By using .tryAcquire() in a loop instead you avoid making any blocking calls (and therefore put a lot more load on your Semaphore) and as soon as the necessary number of permits becomes available a tryAcquire() call will successfully obtain them. This works but is wasteful.
Picture a wait-list at a restaurant. Using .acquire() is like putting your name on the list and waiting to be called. It may not be perfectly efficient, but they'll get to you in a (reasonably) fair amount of time. Imagine instead if everyone just shouted at the host "Do you have a table for n yet?" as often as they could - that's your tryAcquire() loop. It may still work out (as it does in your example) but it's certainly not the right way to go about it.
So what should you do instead? There's a number of possibly useful tools in java.util.concurrent, and which is best somewhat depends on what exactly you're trying to do. Seeing as you're effectively having each thread start the next one I might use a BlockingQueue as the synchronization aid, pushing the next step into the queue each time. Each thread would then poll the queue, and if it's not the activated thread's turn replace the value and wait again.
Here's an example:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
public class MultiThreading {
public static void main(String[] args) throws Exception{
// Use fair queuing to prevent an out-of-order task
// from jumping to the head of the line again
// try setting this to false - you'll see far more re-queuing calls
BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1, true);
for (int i = 5; i >= 1; i--) {
Thread.sleep(100); // not necessary, just helps demonstrate the queuing behavior
new MyThread(i, queue).start();
}
queue.add(1); // work starts now
}
static class MyThread extends Thread {
int var;
BlockingQueue<Integer> queue;
public MyThread(int var, BlockingQueue<Integer> queue) {
this.var = var;
this.queue = queue;
}
@Override
public void run() {
System.out.println("Task " + var + " is now pending...");
try {
while (true) {
int task = queue.take();
if (task != var) {
System.out.println(
"Task " + var + " got task " + task + " instead - re-queuing");
queue.add(task);
} else {
break;
}
}
} catch (InterruptedException e) {
// If a thread is interrupted, re-mark the thread interrupted and terminate
Thread.currentThread().interrupt();
return;
}
System.out.println("Finished task " + var);
System.out.println("Registering task " + (var + 1) + " to run next");
queue.add(var + 1);
}
}
}
This prints the following and terminates successfully:
Task 5 is now pending...
Task 4 is now pending...
Task 3 is now pending...
Task 2 is now pending...
Task 1 is now pending...
Task 5 got task 1 instead - re-queuing
Task 4 got task 1 instead - re-queuing
Task 3 got task 1 instead - re-queuing
Task 2 got task 1 instead - re-queuing
Finished task 1
Registering task 2 to run next
Task 5 got task 2 instead - re-queuing
Task 4 got task 2 instead - re-queuing
Task 3 got task 2 instead - re-queuing
Finished task 2
Registering task 3 to run next
Task 5 got task 3 instead - re-queuing
Task 4 got task 3 instead - re-queuing
Finished task 3
Registering task 4 to run next
Task 5 got task 4 instead - re-queuing
Finished task 4
Registering task 5 to run next
Finished task 5
Registering task 6 to run next
The Javadoc for Semaphore.acquire(int) says:
If insufficient permits are available then the current thread becomes
disabled for thread scheduling purposes and lies dormant until one of
two things happens:
Some other thread invokes one of the release methods for this semaphore,
the current thread is next to be assigned permits and the number of
available permits satisfies this request [or the thread is interrupted].
The thread that is "next to be assigned" is probably thread 4 in your example. It is waiting until there are 4 permits available. However, thread 1, which gets a permit upon calling acquire(), only releases 2 permits, which is not enough to unblock thread 4. Meanwhile, thread 2, which is the only thread for which there are sufficient permits, is not the next to be assigned, so it doesn't get the permits.
Your modified code runs fine because the threads don't block when they try to get a semaphore; they just try again, going to the back of the line. Eventually thread 2 reaches the front of the line and is thus next to be assigned, and so gets its permits.
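If the goal is simply "print 1 to n in order, each from its own thread", a blocking alternative to both the semaphore trick and the tryAcquire spin is a shared turn counter guarded by a monitor. This is a different technique from the one in the question, sketched here under the assumption that every number from 1 to n is handed to exactly one thread. OrderedPrinter, printInTurn and run are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class OrderedPrinter {
    private int turn = 1; // the number allowed to print next; guarded by this object's monitor

    // Blocks until it is this number's turn, prints it, then hands over to number + 1.
    synchronized void printInTurn(int number, List<Integer> out) throws InterruptedException {
        while (turn != number) {
            wait(); // releases the monitor while waiting, so no busy spinning
        }
        System.out.println(number);
        out.add(number);
        turn++;
        notifyAll(); // wake every waiter; only the one whose turn it is proceeds
    }

    // Starts n threads in reverse order and returns the order in which they printed.
    static List<Integer> run(int n) {
        OrderedPrinter printer = new OrderedPrinter();
        List<Integer> out = new ArrayList<>(); // only touched inside printInTurn, under the monitor
        List<Thread> threads = new ArrayList<>();
        for (int i = n; i >= 1; i--) {
            final int number = i;
            Thread t = new Thread(() -> {
                try {
                    printer.printInTurn(number, out);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        run(5); // prints 1 through 5, one per line
    }
}
```

Unlike the semaphore version, a waiter here is not blocked behind an unrelated larger request: every waiter re-checks its own condition each time the turn advances.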
I wrote some Java code to learn more about the Executor framework.
Specifically, I wrote code to verify the Collatz Hypothesis - this says that if you iteratively apply the following function to any integer, you get to 1 eventually:
f(n) = ((n % 2) == 0) ? n/2 : 3*n + 1
CH is still unproven, and I figured it would be a good way to learn about Executor. Each thread is assigned a range [l,u] of integers to check.
Specifically, my program takes 3 arguments - N (the number to which I want to check CH), RANGESIZE (the length of the interval that a thread has to process), and NTHREAD, the size of the threadpool.
My code works fine, but I saw much less speedup than I expected - of the order of 30% when I went from 1 to 4 threads.
My logic was that the computation is completely CPU bound, and each subtask (checking CH for a fixed size range) takes roughly the same time.
Does anyone have ideas as to why I'm not seeing a 3 to 4x increase in speed?
If you could report your runtimes as you increase the number of threads (along with the machine, JVM and OS) that would also be great.
Specifics
Runtimes:
java -d64 -server -cp . Collatz 10000000 1000000 4 => 4 threads, takes 28412 milliseconds
java -d64 -server -cp . Collatz 10000000 1000000 1 => 1 thread, takes 38286 milliseconds
Processor:
Quadcore Intel Q6600 at 2.4GHZ, 4GB. The machine is unloaded.
Java:
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02, mixed mode)
OS:
Linux quad0 2.6.26-2-amd64 #1 SMP Tue Mar 9 22:29:32 UTC 2010 x86_64 GNU/Linux
Code: (I can't get the code to post; I think it's too long for SO requirements. The source is available on Google Docs.)
import java.math.BigInteger;
import java.util.Date;
import java.util.List;
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
class MyRunnable implements Runnable {
public int lower;
public int upper;
MyRunnable(int lower, int upper) {
this.lower = lower;
this.upper = upper;
}
@Override
public void run() {
for (int i = lower ; i <= upper; i++ ) {
Collatz.check(i);
}
System.out.println("(" + lower + "," + upper + ")" );
}
}
public class Collatz {
public static boolean check( BigInteger X ) {
if (X.equals( BigInteger.ONE ) ) {
return true;
} else if ( X.testBit(0) ) {
// odd: bit 0 is set (getLowestSetBit() returns 0, not 1, for odd numbers)
BigInteger Y = (new BigInteger("3")).multiply(X).add(BigInteger.ONE);
return check(Y);
} else {
BigInteger Z = X.shiftRight(1); // fast divide by 2
return check(Z);
}
}
public static boolean check( int x ) {
BigInteger X = new BigInteger( new Integer(x).toString() );
return check(X);
}
static int N = 10000000;
static int RANGESIZE = 1000000;
static int NTHREADS = 4;
static void parseArgs( String [] args ) {
if ( args.length >= 1 ) {
N = Integer.parseInt(args[0]);
}
if ( args.length >= 2 ) {
RANGESIZE = Integer.parseInt(args[1]);
}
if ( args.length >= 3 ) {
NTHREADS = Integer.parseInt(args[2]);
}
}
public static void maintest(String [] args ) {
System.out.println("check(1): " + check(1));
System.out.println("check(3): " + check(3));
System.out.println("check(8): " + check(8));
parseArgs(args);
}
public static void main(String [] args) {
long lDateTime = new Date().getTime();
parseArgs( args );
List<Thread> threads = new ArrayList<Thread>();
ExecutorService executor = Executors.newFixedThreadPool( NTHREADS );
for( int i = 0 ; i < (N/RANGESIZE); i++) {
Runnable worker = new MyRunnable( i*RANGESIZE+1, (i+1)*RANGESIZE );
executor.execute( worker );
}
executor.shutdown();
while (!executor.isTerminated() ) {
}
System.out.println("Finished all threads");
long fDateTime = new Date().getTime();
System.out.println("time in milliseconds for checking to " + N + " is " +
(fDateTime - lDateTime ) +
" (" + N/(fDateTime - lDateTime ) + " per ms)" );
}
}
Busy waiting can be a problem:
while (!executor.isTerminated() ) {
}
You can use awaitTermination() instead:
while (!executor.awaitTermination(1, TimeUnit.SECONDS)) {}
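For context, a self-contained sketch of the shutdown-then-await pattern is below. AwaitTerminationDemo and runTasks are hypothetical names, and the trivial counter increment stands in for the real per-range work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AwaitTerminationDemo {

    // Runs taskCount trivial tasks on a fixed pool and returns how many completed.
    static int runTasks(int taskCount, int threads) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < taskCount; i++) {
            executor.execute(completed::incrementAndGet); // placeholder for real work
        }
        executor.shutdown(); // stop accepting new tasks; already queued tasks still run
        try {
            // The waiting thread is parked between checks instead of spinning on a core.
            while (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
                // still running; loop and wait again
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10, 4)); // prints 10
    }
}
```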
You are using BigInteger. It consumes a lot of register space. What you most likely have on the compiler level is register spilling that makes your process memory-bound.
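One way to test whether BigInteger is the limiting factor is to compare against a version that works on primitive long values and allocates nothing per step. The sketch below is an assumption-laden illustration, not a drop-in replacement for the posted code: CollatzSteps and steps are hypothetical names, it counts steps iteratively rather than recursing, and it uses Math.multiplyExact and Math.addExact so that an overflow throws instead of silently wrapping.

```java
class CollatzSteps {

    // Number of applications of f needed to reach 1, starting from n (n >= 1).
    static int steps(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int count = 0;
        while (n != 1) {
            if ((n & 1) == 0) {
                n >>= 1; // even: divide by 2
            } else {
                n = Math.addExact(Math.multiplyExact(3L, n), 1L); // odd: 3n + 1, overflow-checked
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(steps(27)); // prints 111
    }
}
```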
Also note that when you are timing your results you are not taking into account extra time taken by the JVM to allocate threads and work with the thread pool.
You could also have memory conflicts when you are using constant Strings. All strings are stored in a shared string pool and so it may become a bottleneck, unless java is really clever about it.
Overall, I wouldn't advise using Java for this kind of stuff. Using pthreads would be a better way to go for you.
As @axtavt answered, busy waiting can be a problem. You should fix that first, as it is part of the answer, but not all of it. It won't appear to help in your case (on the Q6600), because the program seems to be bottlenecked at 2 cores for some reason, so another core is available for the busy loop and there is no apparent slowdown. On my Core i5, however, it speeds up the 4-thread version noticeably.
I suspect that in the case of the Q6600 your particular app is limited by the amount of shared cache available or something else specific to the architecture of that CPU. The Q6600 has two 4MB L2 caches, which means CPUs are sharing them, and no L3 cache. On my Core i5, each CPU has a dedicated L2 cache (256K), and then there is a larger 8MB shared L3 cache. 256K more per-CPU cache might make a difference... otherwise something else architecture-wise does.
Here is a comparison of a Q6600 running your Collatz.java, and a Core i5 750.
On my work PC, which is also a Q6600 @ 2.4GHz like yours, but with 6GB RAM, Windows 7 64-bit, and JDK 1.6.0_21* (64-bit), here are some basic results:
10000000 500000 1 (avg of three runs): 36982 ms
10000000 500000 4 (avg of three runs): 21252 ms
Faster, certainly - but not completing in a quarter of the time like you would expect, or even half... (though it is just a bit more than half, more on that in a moment). Note in my case I halved the size of the work units, and have a default max heap of 1500m.
At home on my Core i5 750 (4 cores no hyperthreading), 4GB RAM, Windows 7 64-bit, jdk 1.6.0_22 (64-bit):
10000000 500000 1 (avg of 3 runs) 32677 ms
10000000 500000 4 (avg of 3 runs) 8825 ms
10000000 500000 4 (avg of 3 runs) 11475 ms (without the busy wait fix, for reference)
The 4-thread version takes 27% of the time the 1-thread version takes when the busy-wait loop is removed. Much better. Clearly the code can make efficient use of 4 cores...
NOTE: Java 1.6.0_18 and later have modified default heap settings - so my default heap size is almost 1500m on my work PC, and around 1000m on my home PC.
You may want to increase your default heap, just in case garbage collection is happening and slowing your 4 threaded version down a bit. It might help, it might not.
At least in your example, there's a chance your larger work unit size is skewing your results slightly...halving it may help you get closer to at least 2x the speed since 4 threads will be kept busy for a longer portion of the time. I don't think the Q6600 will do much better at this particular task...whether it is cache or some other inherent architecture thing.
In all cases, I am simply running "java Collatz 10000000 500000 X", where x = # of threads indicated.
The only changes I made to your java file were to make one of the println's into a print, so there were fewer line breaks for my runs with 500000 per work unit and I could see more results in my console at once, and I ditched the busy wait loop, which matters on the i5 750 but didn't make a difference on the Q6600.
You should try using the submit function and then checking the Futures it returns to see whether each task has finished.
isTerminated() doesn't return true until there has been a shutdown.
Future<?> submit(Runnable task)
Submits a Runnable task for execution and returns a Future representing that task.
isTerminated()
Returns true if all tasks have completed following shut down.
Try this...
public static void main(String[] args) {
long lDateTime = new Date().getTime();
parseArgs(args);
// Requires: import java.util.concurrent.Future;
List<Future<?>> futures = new ArrayList<Future<?>>();
ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
for (int i = 0; i < (N / RANGESIZE); i++) {
Runnable worker = new MyRunnable(i * RANGESIZE + 1, (i + 1) * RANGESIZE);
futures.add(executor.submit(worker)); // keep each Future so completion can be checked
}
boolean done = false;
while (!done) {
done = true; // assume finished until a pending task is found
for (Future<?> future : futures) {
if (!future.isDone()) {
done = false;
break;
}
}
try {
Thread.sleep(100);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
System.out.println("Finished all threads");
long fDateTime = new Date().getTime();
System.out.println("time in milliseconds for checking to " + N + " is " +
(fDateTime - lDateTime) +
" (" + N / (fDateTime - lDateTime) + " per ms)");
System.exit(0);
}
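A variation on the same idea, sketched here as an assumption rather than as part of the answer above, is to call get() on each Future instead of polling isDone() with a sleep. get() blocks until that task finishes, and with a Callable it also carries a result back. FutureGetDemo and sumRanges are hypothetical names, and summing each range is only a stand-in for the per-range Collatz check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureGetDemo {

    // Splits 1..n into ranges, processes each range on the pool, and combines the results.
    static long sumRanges(int n, int rangeSize, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int lower = 1; lower <= n; lower += rangeSize) {
                final int lo = lower;
                final int hi = Math.min(n, lower + rangeSize - 1);
                Callable<Long> work = () -> {
                    long sum = 0;
                    for (int i = lo; i <= hi; i++) {
                        sum += i; // placeholder for the real per-number check
                    }
                    return sum;
                };
                futures.add(executor.submit(work));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // blocks until this particular task is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumRanges(1000, 100, 4)); // prints 500500
    }
}
```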