Java Multithreading return value - java

I have code that runs a process in a loop, like this:
for (int z = 0; z < m_ID.length; z++) {
expretdata = expret.Get_Expected_Return(sStartDate, sEndDate, m_ID[z], sBookName, nHistReturn,nMarketReturn, nCustomReturn);
m_Alpha[z] = expretdata;
}
Get_Expected_Return() is an expensive method that takes a long time. If m_ID.length is more than 200, the task takes an hour to complete.
I want to optimize it with multithreading. I tried saving the return values to a static global Map and reordering them by key, because I need the data ordered by the index into m_ID.
But when I run it multithreaded, some of the threads return NULL; it looks like the thread doesn't run the method.
Is multithreading the right way to do this? Or can you give me any other advice to optimize it?

Multithreading can be very useful if your expensive methods are independent and don't use too much of a shared singular resource such as a single hard drive.
Your use case of ordered results can be solved using Callables and Futures:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
class Test
{
static final int CALLS = 10;
static int slowMethod(int n) throws InterruptedException
{
Thread.sleep(1000);
return n+1;
}
public static void main(String[] args) throws InterruptedException, ExecutionException
{
ExecutorService executor = Executors.newCachedThreadPool();
List<Future<Integer>> futures = new ArrayList<>();
for (int i = 0; i < CALLS ; i++)
{
final int finali = i;
futures.add(executor.submit(()->slowMethod(finali)));
}
for(Future<Integer> f: futures) {System.out.print(f.get());}
executor.shutdown();
}
}

If you are using Java 8 or later, you can use a parallel stream. Since m_ID is an array, stream over its indices:
m_Alpha = IntStream.range(0, m_ID.length).parallel()
.mapToObj(z -> expret.Get_Expected_Return(sStartDate, sEndDate, m_ID[z],
sBookName, nHistReturn, nMarketReturn, nCustomReturn))
.toArray(Integer[]::new);
The array constructor passed to toArray must match the element type of m_Alpha (Integer here is only a placeholder).
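To make the ordering guarantee concrete, here is a minimal, self-contained sketch. The names OrderedParallelMap, slowLookup, and computeAll are hypothetical stand-ins (slowLookup plays the role of Get_Expected_Return); the point is only that mapToObj followed by toArray keeps each result at the index of its input, even though the calls run in parallel.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class OrderedParallelMap {
    // Hypothetical stand-in for the expensive Get_Expected_Return call.
    static Integer slowLookup(int id) {
        try {
            Thread.sleep(50); // simulate slow work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id * 2;
    }

    // Runs slowLookup for every id in parallel; result[i] corresponds to ids[i].
    static Integer[] computeAll(int[] ids) {
        return IntStream.range(0, ids.length)
                .parallel()
                .mapToObj(i -> slowLookup(ids[i]))
                .toArray(Integer[]::new); // toArray preserves encounter order
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(computeAll(new int[] {5, 3, 8, 1})));
    }
}
```

Parallel streams run on the common fork-join pool by default, so this suits CPU-bound work better than calls that block on I/O for a long time.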

You can use CompletableFuture to execute this task in parallel. Here is an example.
// Define a helper method that is responsible for putting the calculated data into the array:
private void longExecution(int index, DataType[] m_Alpha, ... sStartDate, sEndDate, m_ID_index_z, sBookName, nHistReturn, nMarketReturn, nCustomReturn){
m_Alpha[index] = expret.Get_Expected_Return(sStartDate, sEndDate, m_ID_index_z, sBookName, nHistReturn, nMarketReturn, nCustomReturn);
}
// Now from your code:
...
CompletableFuture<?>[] futures = new CompletableFuture<?>[m_ID.length];
for (int z = 0; z < m_ID.length; z++) {
final int index = z; // a lambda can only capture effectively final variables
// runAsync, because longExecution returns nothing; store the future so allOf can wait on it
futures[z] = CompletableFuture.runAsync(() ->
longExecution(index, m_Alpha, sStartDate, sEndDate, m_ID[index], sBookName, nHistReturn, nMarketReturn, nCustomReturn));
}
// Wait for all of the futures to complete.
CompletableFuture.allOf(futures).join();
// After this line, the m_Alpha array holds the results.
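A variation on the same idea is to let each future return its value and collect the values by index afterwards, so no task writes into a shared array. The sketch below is self-contained; OrderedFutures, slowSquare, and the pool size of 4 are assumptions chosen for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OrderedFutures {
    // Hypothetical stand-in for the expensive method.
    static int slowSquare(int n) {
        try {
            Thread.sleep(50); // simulate slow work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n * n;
    }

    // Submits one task per input and reads the results back in submission order.
    static int[] computeAll(int[] inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // bounded pool (assumed size)
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int input : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> slowSquare(input), pool));
            }
            int[] results = new int[inputs.length];
            for (int i = 0; i < results.length; i++) {
                results[i] = futures.get(i).join(); // join() blocks until task i is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(computeAll(new int[] {1, 2, 3, 4, 5})));
    }
}
```

Passing an explicit executor keeps the number of concurrent calls under control, which may matter if the expensive method hits a database or another shared resource.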

Related

How to use CompletableFuture to use result of first Callable task as arg to all subsequent Callable tasks?

I have 3 tasks that need to run like so:
First blocking task runs and returns a value
2nd and 3rd tasks run asynchronously with argument supplied from first task and return values.
All 3 values summed up as a final result from all of it.
I tried to do this below, but I am stuck on the .thenApply clause.
I can't quite get this code to work. In the .thenApply clause, how do I pass an argument from the object response returned?
import com.google.common.util.concurrent.Uninterruptibles;
import java.util.concurrent.*;
public class ThreadPoolTest {
static ExecutorService threadPool = Executors.newFixedThreadPool(10);
public static void main(String[] args) {
CompletableFuture<SumCalculator> cf =
CompletableFuture.supplyAsync(() -> new SumCalculator(100000), threadPool);
Integer initialResult = cf.getNow(null).call();
CompletableFuture<SumCalculator> cf2 = CompletableFuture.completedFuture(initialResult)
.thenApplyAsync((i) -> new SumCalculator(i));
// i want to call 2 or more SumCalulator tasks here
System.out.println("DONE? " + cf2.isDone());
System.out.println("message? " + cf2.getNow(null).call());
threadPool.shutdown();
System.out.println("Program exit.");
}
public static class SumCalculator implements Callable<Integer> {
private int n;
public SumCalculator(int n) {
this.n = n;
}
public Integer call() {
int sum = 0;
for (int i = 1; i <= n; i++) {
sum += i;
}
Uninterruptibles.sleepUninterruptibly(800, TimeUnit.MILLISECONDS);
return sum;
}
}
}
NOTE: I do want to collect the responses from all 3 tasks together at the end of the Futures as a combined result list, perhaps as a stream of Integer values? In this case, I would want to sum the values. I am wanting to do this for a performance benefit with multiple threads.
If I understood correctly:
CompletableFuture<Integer> one =
CompletableFuture.supplyAsync(() -> new SumCalculator(100000).call(), threadPool);
CompletableFuture<Integer> two = one.thenApplyAsync(x -> new SumCalculator(x).call(), threadPool);
CompletableFuture<Integer> three = one.thenApplyAsync(x -> new SumCalculator(x).call(), threadPool);
Integer result = one.join() + two.join() + three.join();
System.out.println(result);

Parallel counting - Java

I do not have a background in CS. I am really new to parallel programming and I do not know how exactly the hardware works when running a program. However, I have noticed the following. Say I have:
public class Counter {
private static int parallelCount = 0;
private static int sequentialCount = 0;
public static void main(String[] args) {
int n = 1000;
// I count in parallel:
IntStream.range(0, n).parallel().forEach(i -> {
parallelCount++;
});
// I count sequentially:
for (int i = 0; i < n; i++) {
sequentialCount++;
}
System.out.println("parallelCount = " + parallelCount);
System.out.println("sequentialCount = " + sequentialCount);
}
}
Why might I get:
parallelCount = 984
sequentialCount = 1000
I guess this has to do with the hardware and the way the compiler access memory. I am really interested to know why this happens. And what is one possible solution?
Whenever more than one thread can access a mutable value without coordination, updates can be lost, which is the problem you are facing. No one can be sure what the result will be, and many times it will be wrong. You cannot guarantee which thread writes the value last.
Therefore, you need to synchronize access to the shared resource (the integer you are incrementing) so that every thread sees the latest value and the answer is always correct.
In your program, you can make the parallelCount variable an AtomicInteger, as in AtomicInteger parallelCount = new AtomicInteger(); An AtomicInteger is thread-safe, meaning it can be updated concurrently without losing updates.
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
public class Counter {
private static AtomicInteger parallelCount = new AtomicInteger();
private static int sequentialCount = 0;
public static void main(String[] args) {
int n = 1000;
// I count in parallel:
IntStream.range(0, n).parallel().forEach(i -> {
parallelCount.getAndIncrement();
});
// I count sequentially:
for (int i = 0; i < n; i++) {
sequentialCount++;
}
System.out.println("parallelCount = " + parallelCount);
System.out.println("sequentialCount = " + sequentialCount);
}
}
As you would expect, the standard for loop increments sequentialCount 1000 times.
With the parallel stream, the runtime uses multiple threads to execute your function in parallel. In this situation, several threads can increment the value at the same time and store it to the int.
For example, suppose two threads are running in parallel and both want to increment parallelCount while it holds the value 50. Both threads read 50, calculate the new value 51, and store it to memory, so one of the two increments is lost.
Unsynchronized access like this can cause other concurrency problems as well. To solve it, you can use synchronization, locking, atomic classes, or another approach.
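Multiple threads perform an operation that is not atomic (incrementing a value).
The code you wrote compiles to bytecode that reads the value, adds one, and writes the result back as separate steps, so two threads can interleave and overwrite each other's update.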
To avoid this, you need to synchronize the access to that critical code.
But note, that if all of your code is critical code, then it's redundant to use multiple threads.
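As a rough illustration of synchronizing the critical code, the sketch below guards the increment with a lock object. SynchronizedCounter, LOCK, and countInParallel are hypothetical names. Note that it also demonstrates the caveat above: every increment is serialized, so the parallel stream gains nothing here beyond correctness.

```java
import java.util.stream.IntStream;

public class SynchronizedCounter {
    private static final Object LOCK = new Object();
    private static int count = 0;

    // Increments the shared counter n times from a parallel stream.
    static int countInParallel(int n) {
        synchronized (LOCK) {
            count = 0;
        }
        IntStream.range(0, n).parallel().forEach(i -> {
            // Only one thread at a time may run the read-add-write sequence.
            synchronized (LOCK) {
                count++;
            }
        });
        synchronized (LOCK) {
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println("parallelCount = " + countInParallel(1000));
    }
}
```

If the goal is only to count or sum, a reduction such as IntStream.range(0, n).parallel().count() avoids the shared variable altogether.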
AtomicInteger
We can use the AtomicInteger class from the java.util.concurrent.atomic package when working with parallel streams, because the behavior is unpredictable when a shared primitive is updated without synchronization.
import java.util.stream.IntStream;
import java.util.concurrent.atomic.AtomicInteger;
public class Main
{
private static AtomicInteger parallelCount = new AtomicInteger();
private static int sequentialCount = 0;
public static void main(String[] args) {
System.out.println("Hello World");
int n = 100000;
// I count in parallel:
IntStream.range(0, n).parallel().forEach(i -> {
parallelCount.incrementAndGet();
});
// I count sequentially:
for (int i = 0; i < n; i++) {
sequentialCount++;
}
System.out.println("parallelCount = " + parallelCount);
System.out.println("sequentialCount = " + sequentialCount);
}
}

The issue for multi-thread in java [duplicate]

What is a way to simply wait for all threaded processes to finish? For example, let's say I have:
public class DoSomethingInAThread implements Runnable{
public static void main(String[] args) {
for (int n=0; n<1000; n++) {
Thread t = new Thread(new DoSomethingInAThread());
t.start();
}
// wait for all threads' run() methods to complete before continuing
}
public void run() {
// do something here
}
}
How do I alter this so the main() method pauses at the comment until all threads' run() methods exit? Thanks!
You put all threads in an array, start them all, and then have a loop
for (int i = 0; i < threads.length; i++)
threads[i].join();
Each join will block until the respective thread has completed. Threads may complete in a different order than you joining them, but that's not a problem: when the loop exits, all threads are completed.
One way would be to make a List of Threads, create and launch each thread, while adding it to the list. Once everything is launched, loop back through the list and call join() on each one. It doesn't matter what order the threads finish executing in, all you need to know is that by the time that second loop finishes executing, every thread will have completed.
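The list-based approach just described might look like the following sketch. JoinAll and runAndWait are hypothetical names, and the AtomicInteger exists only so the example can show that every thread really finished before the method returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class JoinAll {
    // Starts threadCount threads, waits for all of them, and returns how many ran.
    static int runAndWait(int threadCount) {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int n = 0; n < threadCount; n++) {
            Thread t = new Thread(() -> finished.incrementAndGet()); // "do something here"
            threads.add(t);
            t.start();
        }
        // Second loop: join() blocks until the given thread has terminated.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("threads finished: " + runAndWait(100));
    }
}
```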
A better approach is to use an ExecutorService and its associated methods:
List<Callable<Object>> callables = ... // assemble list of Callables here
// Like Runnable but can return a value
ExecutorService execSvc = Executors.newCachedThreadPool();
List<Future<Object>> results = execSvc.invokeAll(callables); // blocks until all tasks have completed
// Note: You may not care about the return values, in which case don't
// bother saving them
Using an ExecutorService (and all of the new stuff from Java 5's concurrency utilities) is incredibly flexible, and the above example barely even scratches the surface.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class DoSomethingInAThread implements Runnable
{
public static void main(String[] args) throws ExecutionException, InterruptedException
{
//limit the number of actual threads
int poolSize = 10;
ExecutorService service = Executors.newFixedThreadPool(poolSize);
List<Future<?>> futures = new ArrayList<Future<?>>();
for (int n = 0; n < 1000; n++)
{
// submit(Runnable) returns a Future<?> whose get() yields null on completion
Future<?> f = service.submit(new DoSomethingInAThread());
futures.add(f);
}
// wait for all tasks to complete before continuing
for (Future<?> f : futures)
{
f.get();
}
//shut down the executor service so that this thread can exit
service.shutdownNow();
}
public void run()
{
// do something here
}
}
Instead of join(), which is an old API, you can use CountDownLatch. I have modified your code as below to fulfil your requirement.
import java.util.concurrent.*;
class DoSomethingInAThread implements Runnable{
CountDownLatch latch;
public DoSomethingInAThread(CountDownLatch latch){
this.latch = latch;
}
public void run() {
try{
System.out.println("Do some thing");
latch.countDown();
}catch(Exception err){
err.printStackTrace();
}
}
}
public class CountDownLatchDemo {
public static void main(String[] args) {
try{
CountDownLatch latch = new CountDownLatch(1000);
for (int n=0; n<1000; n++) {
Thread t = new Thread(new DoSomethingInAThread(latch));
t.start();
}
latch.await();
System.out.println("In Main thread after completion of 1000 threads");
}catch(Exception err){
err.printStackTrace();
}
}
}
Explanation:
CountDownLatch is initialized with a count of 1000, as per your requirement.
Each worker thread (DoSomethingInAThread) decrements the CountDownLatch that was passed to its constructor.
The main thread in CountDownLatchDemo blocks in await() until the count reaches zero. Once it does, you will get the line below in the output.
In Main thread after completion of 1000 threads
More info from oracle documentation page
public void await()
throws InterruptedException
Causes the current thread to wait until the latch has counted down to zero, unless the thread is interrupted.
Refer to related SE question for other options:
wait until all threads finish their work in java
Avoid the Thread class altogether and instead use the higher abstractions provided in java.util.concurrent
The ExecutorService class provides the method invokeAll that seems to do just what you want.
Consider using java.util.concurrent.CountDownLatch. Examples in javadocs
Depending on your needs, you may also want to check out the classes CountDownLatch and CyclicBarrier in the java.util.concurrent package. They can be useful if you want your threads to wait for each other, or if you want more fine-grained control over the way your threads execute (e.g., waiting in their internal execution for another thread to set some state). You could also use a CountDownLatch to signal all of your threads to start at the same time, instead of starting them one by one as you iterate through your loop. The standard API docs have an example of this, plus using another CountDownLatch to wait for all threads to complete their execution.
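The two-latch pattern mentioned above (one latch as a start signal, another to wait for completion) can be sketched as follows. StartGate and runTogether are hypothetical names; this is an illustration in the spirit of the API documentation's example, not a copy of it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StartGate {
    // Releases all workers at once, waits for them, and returns how many completed.
    static int runTogether(int workers) {
        CountDownLatch startSignal = new CountDownLatch(1);       // opened once by the main thread
        CountDownLatch doneSignal = new CountDownLatch(workers);  // counted down once per worker
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                try {
                    startSignal.await();          // every worker blocks here until the gate opens
                    completed.incrementAndGet();  // "do something here"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    doneSignal.countDown();       // count down even if the work failed
                }
            }).start();
        }

        startSignal.countDown();                  // let all workers proceed
        try {
            doneSignal.await();                   // wait until every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("workers completed: " + runTogether(5));
    }
}
```

Calling countDown() in a finally block is a deliberate choice: if a worker throws or is interrupted, the main thread is still released instead of waiting forever.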
As Martin K suggested java.util.concurrent.CountDownLatch seems to be a better solution for this. Just adding an example for the same
public class CountDownLatchDemo
{
public static void main (String[] args)
{
int noOfThreads = 5;
// Declare the count down latch based on the number of threads you need
// to wait on
final CountDownLatch executionCompleted = new CountDownLatch(noOfThreads);
for (int i = 0; i < noOfThreads; i++)
{
new Thread()
{
@Override
public void run ()
{
System.out.println("I am executed by :" + Thread.currentThread().getName());
try
{
// Dummy sleep
Thread.sleep(3000);
// One thread has completed its job
executionCompleted.countDown();
}
catch (InterruptedException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}.start();
}
try
{
// Wait till the count down latch opens.In the given case till five
// times countDown method is invoked
executionCompleted.await();
System.out.println("All over");
}
catch (InterruptedException e)
{
e.printStackTrace();
}
}
}
If you make a list of the threads, you can loop through them and .join() against each, and your loop will finish when all the threads have. I haven't tried it though.
http://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#join()
Create the thread object inside the first for loop.
for (int i = 0; i < threads.length; i++) {
threads[i] = new Thread(new Runnable() {
public void run() {
// some code to run in parallel
}
});
threads[i].start();
}
And then do what everyone here is saying.
for (int i = 0; i < threads.length; i++)
threads[i].join();
You can also do it with a ThreadGroup and its activeCount() method, which returns an estimate of the number of active threads in the group.
As an alternative to CountDownLatch you can also use CyclicBarrier e.g.
public class ThreadWaitEx {
static CyclicBarrier barrier = new CyclicBarrier(100, new Runnable(){
public void run(){
System.out.println("clean up job after all tasks are done.");
}
});
public static void main(String[] args) {
for (int i = 0; i < 100; i++) {
Thread t = new Thread(new MyCallable(barrier));
t.start();
}
}
}
class MyCallable implements Runnable{
private CyclicBarrier b = null;
public MyCallable(CyclicBarrier b){
this.b = b;
}
@Override
public void run(){
try {
//do something
System.out.println(Thread.currentThread().getName()+" is waiting for barrier after completing his job.");
b.await();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (BrokenBarrierException e) {
e.printStackTrace();
}
}
}
To use CyclicBarrier in this case barrier.await() should be the last statement i.e. when your thread is done with its job. CyclicBarrier can be used again with its reset() method. To quote javadocs:
A CyclicBarrier supports an optional Runnable command that is run once per barrier point, after the last thread in the party arrives, but before any threads are released. This barrier action is useful for updating shared-state before any of the parties continue.
The join() was not helpful to me. See this sample in Kotlin:
val timeInMillis = System.currentTimeMillis()
ThreadUtils.startNewThread(Runnable {
for (i in 1..5) {
val t = Thread(Runnable {
Thread.sleep(50)
var a = i
kotlin.io.println(Thread.currentThread().name + "|" + "a=$a")
Thread.sleep(200)
for (j in 1..5) {
a *= j
Thread.sleep(100)
kotlin.io.println(Thread.currentThread().name + "|" + "$a*$j=$a")
}
kotlin.io.println(Thread.currentThread().name + "|TaskDurationInMillis = " + (System.currentTimeMillis() - timeInMillis))
})
t.start()
}
})
The result:
Thread-5|a=5
Thread-1|a=1
Thread-3|a=3
Thread-2|a=2
Thread-4|a=4
Thread-2|2*1=2
Thread-3|3*1=3
Thread-1|1*1=1
Thread-5|5*1=5
Thread-4|4*1=4
Thread-1|2*2=2
Thread-5|10*2=10
Thread-3|6*2=6
Thread-4|8*2=8
Thread-2|4*2=4
Thread-3|18*3=18
Thread-1|6*3=6
Thread-5|30*3=30
Thread-2|12*3=12
Thread-4|24*3=24
Thread-4|96*4=96
Thread-2|48*4=48
Thread-5|120*4=120
Thread-1|24*4=24
Thread-3|72*4=72
Thread-5|600*5=600
Thread-4|480*5=480
Thread-3|360*5=360
Thread-1|120*5=120
Thread-2|240*5=240
Thread-1|TaskDurationInMillis = 765
Thread-3|TaskDurationInMillis = 765
Thread-4|TaskDurationInMillis = 765
Thread-5|TaskDurationInMillis = 765
Thread-2|TaskDurationInMillis = 765
Now let me use the join() for threads:
val timeInMillis = System.currentTimeMillis()
ThreadUtils.startNewThread(Runnable {
for (i in 1..5) {
val t = Thread(Runnable {
Thread.sleep(50)
var a = i
kotlin.io.println(Thread.currentThread().name + "|" + "a=$a")
Thread.sleep(200)
for (j in 1..5) {
a *= j
Thread.sleep(100)
kotlin.io.println(Thread.currentThread().name + "|" + "$a*$j=$a")
}
kotlin.io.println(Thread.currentThread().name + "|TaskDurationInMillis = " + (System.currentTimeMillis() - timeInMillis))
})
t.start()
t.join()
}
})
And the result:
Thread-1|a=1
Thread-1|1*1=1
Thread-1|2*2=2
Thread-1|6*3=6
Thread-1|24*4=24
Thread-1|120*5=120
Thread-1|TaskDurationInMillis = 815
Thread-2|a=2
Thread-2|2*1=2
Thread-2|4*2=4
Thread-2|12*3=12
Thread-2|48*4=48
Thread-2|240*5=240
Thread-2|TaskDurationInMillis = 1568
Thread-3|a=3
Thread-3|3*1=3
Thread-3|6*2=6
Thread-3|18*3=18
Thread-3|72*4=72
Thread-3|360*5=360
Thread-3|TaskDurationInMillis = 2323
Thread-4|a=4
Thread-4|4*1=4
Thread-4|8*2=8
Thread-4|24*3=24
Thread-4|96*4=96
Thread-4|480*5=480
Thread-4|TaskDurationInMillis = 3078
Thread-5|a=5
Thread-5|5*1=5
Thread-5|10*2=10
Thread-5|30*3=30
Thread-5|120*4=120
Thread-5|600*5=600
Thread-5|TaskDurationInMillis = 3833
As it's clear when we use the join:
The threads are running sequentially.
The first sample takes 765 Milliseconds while the second sample takes 3833 Milliseconds.
Our solution to prevent blocking other threads was creating an ArrayList:
val threads = ArrayList<Thread>()
Now when we want to start a new thread we must add it to the ArrayList:
addThreadToArray(
ThreadUtils.startNewThread(Runnable {
...
})
)
The addThreadToArray function:
@Synchronized
fun addThreadToArray(th: Thread) {
threads.add(th)
}
The startNewThread function:
fun startNewThread(runnable: Runnable) : Thread {
val th = Thread(runnable)
th.isDaemon = false
th.priority = Thread.MAX_PRIORITY
th.start()
return th
}
Check the completion of the threads as below everywhere it's needed:
val notAliveThreads = ArrayList<Thread>()
for (t in threads)
if (!t.isAlive)
notAliveThreads.add(t)
threads.removeAll(notAliveThreads)
if (threads.size == 0){
// The size is 0 -> there is no alive threads.
}
The problem with:
for(i = 0; i < threads.length; i++)
threads[i].join();
...is that threads[i + 1] can never be joined before threads[i].
Except for the latch-based ones, all of the solutions above share this shortcoming.
No one here (yet) mentioned ExecutorCompletionService, it allows to join threads/tasks according to their completion order:
public class ExecutorCompletionService<V>
extends Object
implements CompletionService<V>
A CompletionService that uses a supplied Executor to execute tasks. This class arranges that submitted tasks are, upon completion, placed on a queue accessible using take. The class is lightweight enough to be suitable for transient use when processing groups of tasks.
Usage Examples.
Suppose you have a set of solvers for a certain problem, each returning a value of some type Result, and would like to run them concurrently, processing the results of each of them that return a non-null value, in some method use(Result r). You could write this as:
void solve(Executor e, Collection<Callable<Result>> solvers) throws InterruptedException, ExecutionException {
CompletionService<Result> cs = new ExecutorCompletionService<>(e);
solvers.forEach(cs::submit);
for (int i = solvers.size(); i > 0; i--) {
Result r = cs.take().get();
if (r != null)
use(r);
}
}
Suppose instead that you would like to use the first non-null result of the set of tasks, ignoring any that encounter exceptions, and cancelling all other tasks when the first one is ready:
void solve(Executor e, Collection<Callable<Result>> solvers) throws InterruptedException {
CompletionService<Result> cs = new ExecutorCompletionService<>(e);
int n = solvers.size();
List<Future<Result>> futures = new ArrayList<>(n);
Result result = null;
try {
solvers.forEach(solver -> futures.add(cs.submit(solver)));
for (int i = n; i > 0; i--) {
try {
Result r = cs.take().get();
if (r != null) {
result = r;
break;
}
} catch (ExecutionException ignore) {}
}
} finally {
futures.forEach(future -> future.cancel(true));
}
if (result != null)
use(result);
}
Since: 1.5 (!)
Assuming use(r) (of Example 1) is also asynchronous, this would be a big advantage.

Performing a long calculation that returns after a timeout

I want to perform a search using iterative deepening, meaning every time I do it, I go deeper and it takes longer. There is a time limit (2 seconds) to get the best result possible. From what I've researched, the best way to do this is using an ExecutorService, a Future and interrupting it when the time runs out. This is what I have at the moment:
In my main function:
ExecutorService service = Executors.newSingleThreadExecutor();
ab = new AB();
Future<Integer> f = service.submit(ab);
Integer x = 0;
try {
x = f.get(1990, TimeUnit.MILLISECONDS);
}
catch(TimeoutException e) {
System.out.println("cancelling future");
f.cancel(true);
}
catch(Exception e) {
throw new RuntimeException(e);
}
finally {
service.shutdown();
}
System.out.println(x);
And the Callable:
public class AB implements Callable<Integer> {
public AB() {}
public Integer call() throws Exception {
Integer x = 0;
int i = 0;
while (!Thread.interrupted()) {
x = doLongComputation(i);
i++;
}
return x;
}
}
I have two problems:
doLongComputation() isn't being interrupted, the program only checks if Thread.interrupted() is true after it completes the work. Do I need to put checks in doLongComputation() to see if the thread has been interrupted?
Even if I get rid of the doLongComputation(), the main method isn't receiving the value of x. How can I ensure that my program waits for the Callable to "clean up" and return the best x so far?
To answer part 1: Yes, you need to have your long task check the interrupted flag. Interruption requires the cooperation of the task being interrupted.
Also you should use Thread.currentThread().isInterrupted() unless you specifically want to clear the interrupt flag. Code that throws (or rethrows) InterruptedException uses Thread#interrupted as a convenient way to both check the flag and clear it, when you're writing a Runnable or Callable this is usually not what you want.
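A minimal sketch of what cooperative interruption looks like: the long-running work polls the flag itself and stops when it is set. CooperativeTask, countUntilInterrupted, and stopsWhenInterrupted are hypothetical names; the counting loop is only a placeholder for a real computation that would check the flag between steps.

```java
public class CooperativeTask {
    // Placeholder for a long computation that checks the interrupt flag between steps.
    static long countUntilInterrupted() {
        long steps = 0;
        // isInterrupted() reads the flag without clearing it.
        while (!Thread.currentThread().isInterrupted()) {
            steps++;
        }
        return steps;
    }

    // Interrupts a worker and reports whether its flag was still set after the loop ended.
    static boolean stopsWhenInterrupted() {
        boolean[] flagStillSet = new boolean[1];
        Thread worker = new Thread(() -> {
            countUntilInterrupted();
            flagStillSet[0] = Thread.currentThread().isInterrupted();
        });
        worker.start();
        worker.interrupt(); // request cancellation; the worker decides when to stop
        try {
            worker.join();  // join() makes the worker's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return flagStillSet[0];
    }

    public static void main(String[] args) {
        System.out.println("flag still set after stopping: " + stopsWhenInterrupted());
    }
}
```

Had the loop used Thread.interrupted() instead, the flag would be cleared by the check, and code further up the call stack could no longer see that an interrupt was requested.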
Now to answer part 2: Cancellation isn't what you want here.
Using cancellation to stop the computation and return an intermediate result doesn't work: once you cancel the future, you can't retrieve the return value from the get method. What you could do is make each refinement of the computation its own task, so that you submit one task, get the result, then submit the next using the result as a starting point, saving the latest result as you go.
Here's an example I came up with to demonstrate this, calculating successive approximations of a square root using Newton's method. Each iteration is a separate task which gets submitted (using the previous task's approximation) when the previous task completes:
import java.util.concurrent.*;
import java.math.*;
public class IterativeCalculation {
static class SqrtResult {
public final BigDecimal value;
public final Future<SqrtResult> next;
public SqrtResult(BigDecimal value, Future<SqrtResult> next) {
this.value = value;
this.next = next;
}
}
static class SqrtIteration implements Callable<SqrtResult> {
private final BigDecimal x;
private final BigDecimal guess;
private final ExecutorService xs;
public SqrtIteration(BigDecimal x, BigDecimal guess, ExecutorService xs) {
this.x = x;
this.guess = guess;
this.xs = xs;
}
public SqrtResult call() {
BigDecimal nextGuess = guess.subtract(guess.pow(2).subtract(x).divide(new BigDecimal(2).multiply(guess), RoundingMode.HALF_EVEN));
return new SqrtResult(nextGuess, xs.submit(new SqrtIteration(x, nextGuess, xs)));
}
}
public static void main(String[] args) throws Exception {
long timeLimit = 10000L;
ExecutorService xs = Executors.newSingleThreadExecutor();
try {
long startTime = System.currentTimeMillis();
Future<SqrtResult> f = xs.submit(new SqrtIteration(new BigDecimal("612.00"), new BigDecimal("10.00"), xs));
for (int i = 0; System.currentTimeMillis() - startTime < timeLimit; i++) {
f = f.get().next;
System.out.println("iteration=" + i + ", value=" + f.get().value);
}
f.cancel(true);
} finally {
xs.shutdown();
}
}
}

Parallelize search in a Java set

I have a List<String> called lines and a huge (~3G) Set<String> called voc. I need to find all lines from lines that are in voc. Can I do this multithreaded way?
Currently I have this straightforward code:
for(String line: lines) {
if (voc.contains(line)) {
// Great!!
}
}
Is there a way to search for a few lines at the same time? Maybe there are existing solutions?
PS: I am using javolution.util.FastMap, because it behaves better during filling up.
Here is a possible implementation. Please note that error/interruption handling has been omitted but this might give you a starting point. I included a main method so you could copy and paste this into your IDE for a quick demo.
Edit: Cleaned things up a bit to improve readability and List partitioning
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ParallelizeListSearch {
public static void main(String[] args) throws InterruptedException, ExecutionException {
List<String> searchList = new ArrayList<String>(7);
searchList.add("hello");
searchList.add("world");
searchList.add("java");
searchList.add("debian");
searchList.add("linux");
searchList.add("jsr-166");
searchList.add("stack");
Set<String> targetSet = new HashSet<String>(searchList);
Set<String> matchSet = findMatches(searchList, targetSet);
System.out.println("Found " + matchSet.size() + " matches");
for(String match : matchSet){
System.out.println("match: " + match);
}
}
public static Set<String> findMatches(List<String> searchList, Set<String> targetSet) throws InterruptedException, ExecutionException {
Set<String> locatedMatchSet = new HashSet<String>();
int threadCount = Runtime.getRuntime().availableProcessors();
List<List<String>> partitionList = getChunkList(searchList, threadCount);
if(partitionList.size() == 1){
//if we only have one "chunk" then don't bother with a thread-pool
locatedMatchSet = new ListSearcher(searchList, targetSet).call();
}else{
ExecutorService executor = Executors.newFixedThreadPool(threadCount);
CompletionService<Set<String>> completionService = new ExecutorCompletionService<Set<String>>(executor);
for(List<String> chunkList : partitionList)
completionService.submit(new ListSearcher(chunkList, targetSet));
for(int x = 0; x < partitionList.size(); x++){
Set<String> threadMatchSet = completionService.take().get();
locatedMatchSet.addAll(threadMatchSet);
}
executor.shutdown();
}
return locatedMatchSet;
}
private static class ListSearcher implements Callable<Set<String>> {
private final List<String> searchList;
private final Set<String> targetSet;
private final Set<String> matchSet = new HashSet<String>();
public ListSearcher(List<String> searchList, Set<String> targetSet) {
this.searchList = searchList;
this.targetSet = targetSet;
}
@Override
public Set<String> call() {
for(String searchValue : searchList){
if(targetSet.contains(searchValue))
matchSet.add(searchValue);
}
return matchSet;
}
}
private static <T> List<List<T>> getChunkList(List<T> unpartitionedList, int splitCount) {
int totalProblemSize = unpartitionedList.size();
int chunkSize = (int) Math.ceil((double) totalProblemSize / splitCount);
List<List<T>> chunkList = new ArrayList<List<T>>(splitCount);
int offset = 0;
int limit = 0;
for(int x = 0; x < splitCount; x++){
limit = offset + chunkSize;
if(limit > totalProblemSize)
limit = totalProblemSize;
List<T> subList = unpartitionedList.subList(offset, limit);
chunkList.add(subList);
offset = limit;
}
return chunkList;
}
}
Simply splitting lines among different threads would (in Oracle JVM at least) spread the work into all CPUs if you are looking for this.
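On Java 8 or later, one compact way to split the lines across threads is a parallel stream, which does the partitioning for you. This is a sketch under the assumption that voc is not modified while the search runs (concurrent reads of an unmodified HashSet are safe); StreamSearch and findMatches are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StreamSearch {
    // Returns the lines that are contained in voc, in their original order.
    static List<String> findMatches(List<String> lines, Set<String> voc) {
        return lines.parallelStream()
                .filter(voc::contains) // read-only lookups, so no locking is needed
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello", "world", "java", "debian");
        Set<String> voc = new HashSet<>(Arrays.asList("java", "hello", "linux"));
        System.out.println(findMatches(lines, voc));
    }
}
```

Since a hash lookup is already cheap, the parallel version probably pays off only when lines is large; it is worth measuring against the plain loop.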
I like using CyclicBarrier, makes those threads controlled in an easier way.
http://javarevisited.blogspot.cz/2012/07/cyclicbarrier-example-java-5-concurrency-tutorial.html
It's absolutely possible to parallelize this using multiple threads. You could do the following:
Break up the list into a different "blocks," one per thread that will do the search.
Have each thread look over its block, checking whether each string is in the set, and if so adding the string to the resulting set.
For example, you might have the following thread routine:
public void scanAndAdd(List<String> allStrings, Set<String> toCheck,
ConcurrentSet<String> matches, int start, int end) {
for (int i = start; i < end; i++) {
if (toCheck.contains(allStrings.get(i))) {
matches.add(allStrings.get(i));
}
}
}
You could then spawn off as many threads as you needed to run the above method and wait for all of them to finish. The resulting matches would then be stored in matches.
For simplicity, I've had the output set be a ConcurrentSet, standing in for any thread-safe set (the JDK has no class of that exact name; the set returned by ConcurrentHashMap.newKeySet() is one option), which eliminates race conditions due to writes. Since you are only doing reads on the list of strings and set of strings to check for, no synchronization is required when reading from allStrings or performing lookups in toCheck.
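Putting the pieces together, a runnable version of this block-per-thread scheme might look like the sketch below. It assumes ConcurrentHashMap.newKeySet() as the thread-safe output set and a positive threadCount; ParallelScan and findMatches are hypothetical names, while scanAndAdd follows the routine above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelScan {
    // Checks the strings in [start, end) and records those present in toCheck.
    static void scanAndAdd(List<String> allStrings, Set<String> toCheck,
                           Set<String> matches, int start, int end) {
        for (int i = start; i < end; i++) {
            String s = allStrings.get(i);
            if (toCheck.contains(s)) {
                matches.add(s);
            }
        }
    }

    // Splits lines into roughly equal blocks, one thread per block (threadCount must be > 0).
    static Set<String> findMatches(List<String> lines, Set<String> voc, int threadCount) {
        Set<String> matches = ConcurrentHashMap.newKeySet(); // thread-safe set from the JDK
        int blockSize = (lines.size() + threadCount - 1) / threadCount; // ceiling division
        List<Thread> threads = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += blockSize) {
            final int from = start;
            final int to = Math.min(start + blockSize, lines.size());
            Thread t = new Thread(() -> scanAndAdd(lines, voc, matches, from, to));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for every block to be scanned
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> lines = java.util.Arrays.asList("a", "b", "c", "d", "e");
        Set<String> voc = new java.util.HashSet<>(java.util.Arrays.asList("b", "d", "x"));
        System.out.println(findMatches(lines, voc, 3));
    }
}
```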
Hope this helps!
Another option would be to use Akka, it does these kinds of things quite simply.
Actually, having done some search work with Akka, one of the things I can tell you about this too is that it supports two ways of parallelizing such things: through Composable Futures or Agents. For what you want, the Composable Futures would be completely sufficient. Then, Akka is actually not adding that much: Netty is providing the massively parallel io infrastructure, and Futures are part of the jdk, but Akka does make it super simple to put these two together and extend them when/if needed.
