Java - Executor Framework

Please look at my following code....
private static final int NTHREDS = 10;
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
while (rs.next()) {
    webLink = rs.getString(1);
    FirstName = rs.getString(2);
    MiddleName = rs.getString(3);
    Runnable worker = new MyRunnable(webLink, FirstName, MiddleName); // this class implements Runnable and has a run method...
    executor.execute(worker);
}
//added
public class MyRunnable implements Runnable {
    private final String webLink;
    private final String firstName;
    private final String middleName;

    MyRunnable(String webLink, String firstName, String middleName) {
        // assigning values...
        this.webLink = webLink;
        this.firstName = firstName;
        this.middleName = middleName;
    }

    @Override
    public void run() {
        // ...calling the method that crawls, passing those values...
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
In this part, if the ResultSet (rs) has 100 records, the executor creates 100 threads. I need to run this process within 10 threads. I need your help to understand how to get control of the threads. If any thread has completed its task, it should process the next available task from the ResultSet. Is it possible to achieve this using the executor framework?
Thanks,
vijay365

The code you've already posted does this. Your code will not immediately spawn 100 threads. It will spawn 10 threads that consume tasks from a queue containing your Runnables.
From the Executors.newFixedThreadPool Javadocs:
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.
Instead of using a static number of threads (10 in this case), you should determine the number dynamically:
final int NTHREADS = Runtime.getRuntime().availableProcessors();
Also, I don't get why you are calling Thread.sleep.
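If you also want the main thread to wait until every queued task has finished, here is a minimal, untested fragment you could append after the submission loop (the timeout value is an arbitrary example):
executor.shutdown(); // stop accepting new tasks
try {
    // wait for the already-queued tasks to finish
    if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
        executor.shutdownNow(); // give up and interrupt the workers
    }
} catch (InterruptedException e) {
    executor.shutdownNow();
    Thread.currentThread().interrupt();
}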

ResultSet is probably a JDBC query result.
This design is almost certainly doomed to failure.
The JDBC interface implementations are not thread-safe.
ResultSets are scarce resources that should be closed in the same scope in which they were created. If you pass them around, you're asking for trouble.
Multi-threaded code is hard to write well and even harder to debug if incorrect.
You are almost certainly headed in the wrong direction with this design. I'd bet a large sum of money that you're guilty of premature optimization. You are hoping that multiple threads will make your code faster, but what will happen is ten threads time slicing on one CPU and taking the same time or longer. (Context switching takes time, too.)
A slightly better idea would be to load the ResultSet into an object or collection, close the ResultSet, and then do some multi-threaded processing on that returned object.
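A rough, untested sketch of that approach, reusing the classes from the question:
// read everything out of the ResultSet on the JDBC thread, then close it
List<String[]> rows = new ArrayList<String[]>();
while (rs.next()) {
    rows.add(new String[] { rs.getString(1), rs.getString(2), rs.getString(3) });
}
rs.close();

// only plain Java objects cross the thread boundary
ExecutorService executor = Executors.newFixedThreadPool(10);
for (String[] row : rows) {
    executor.execute(new MyRunnable(row[0], row[1], row[2]));
}
executor.shutdown();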

Try executor.submit(worker);

Related

How to avoid congesting/stalling/deadlocking an executorservice with recursive callable

All the threads in an ExecutorService are busy with tasks that wait for tasks that are stuck in the queue of the executor service.
Example code:
ExecutorService es = Executors.newFixedThreadPool(8);
Set<Future<Set<String>>> outerSet = new HashSet<>();
for (int i = 0; i < 8; i++) {
    outerSet.add(es.submit(new Callable<Set<String>>() {
        @Override
        public Set<String> call() throws Exception {
            Thread.sleep(10000); // to simulate work
            Set<Future<String>> innerSet = new HashSet<>();
            for (int j = 0; j < 8; j++) {
                int k = j;
                innerSet.add(es.submit(new Callable<String>() {
                    @Override
                    public String call() throws Exception {
                        return "number " + k + " in inner loop";
                    }
                }));
            }
            Set<String> out = new HashSet<>();
            while (!innerSet.isEmpty()) {            // we are stuck at this loop because all the
                for (Future<String> f : innerSet) {  // callables in innerSet are stuck in the queue
                    if (f.isDone()) {                // of es and can't start, since all the threads
                        out.add(f.get());            // in es are busy waiting for them to finish
                    }
                }
            }
            return out;
        }
    }));
}
Is there any way to avoid this, other than making a separate thread pool for each layer or using a thread pool that is not fixed in size?
A practical example would be if some callables are submitted to ForkJoinPool.commonPool() and then these tasks use objects that also submit to the commonPool in one of their methods.
You should use a ForkJoinPool. It was made for this situation.
Whereas your solution blocks a thread permanently while it's waiting for its subtasks to finish, the work stealing ForkJoinPool can perform work while in join(). This makes it efficient for these kinds of situations where you may have a variable number of small (and often recursive) tasks that are being run. With a regular thread-pool you would need to oversize it, to make sure that you don't run out of threads.
With CompletableFuture you need to handle a lot more of the actual planning/scheduling yourself, and it will be more complex to tune if you decide to change things. With FJP the only thing you need to tune is the amount of threads in the pool, with CF you need to think about then vs. thenAsync as well.
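A rough, untested sketch of the same outer/inner structure expressed as ForkJoinPool tasks (class names are illustrative, not from the question):
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.RecursiveTask;

public class OuterTask extends RecursiveTask<Set<String>> {
    @Override
    protected Set<String> compute() {
        // fork the inner tasks instead of submitting them to a fixed pool
        Set<InnerTask> inner = new HashSet<InnerTask>();
        for (int j = 0; j < 8; j++) {
            InnerTask t = new InnerTask(j);
            t.fork();
            inner.add(t);
        }
        Set<String> out = new HashSet<String>();
        for (InnerTask t : inner) {
            out.add(t.join()); // join() lets the worker run queued tasks instead of idling
        }
        return out;
    }
}

class InnerTask extends RecursiveTask<String> {
    private final int k;

    InnerTask(int k) {
        this.k = k;
    }

    @Override
    protected String compute() {
        return "number " + k + " in inner loop";
    }
}
You would then run it with something like new ForkJoinPool(8).invoke(new OuterTask()); the join() inside compute() lets a worker steal and execute the inner tasks instead of merely blocking.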
I would recommend trying to decompose the work into completion stages via CompletableFuture:
CompletableFuture.supplyAsync(outerTask)
    .thenCompose(result -> CompletableFuture.allOf(innerTasks));
That way your outer task doesn't hog the execution thread while processing inner tasks, but you still get a Future that resolves when the entire job is done. It can be hard to split those stages up if they're too tightly coupled, though.
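A slightly fuller, untested version of that shape, where prepareOuterWork() is a hypothetical stand-in for the outer stage:
ExecutorService pool = Executors.newFixedThreadPool(8);

CompletableFuture<Void> job = CompletableFuture
        .supplyAsync(() -> prepareOuterWork(), pool)   // outer stage
        .thenCompose(prepared -> {
            // fan out the inner stages without blocking a pool thread
            List<CompletableFuture<String>> inner = new ArrayList<>();
            for (int j = 0; j < 8; j++) {
                final int k = j;
                inner.add(CompletableFuture.supplyAsync(
                        () -> "number " + k + " in inner loop", pool));
            }
            // resolves only once every inner stage has completed
            return CompletableFuture.allOf(inner.toArray(new CompletableFuture[0]));
        });

job.join(); // completes when the outer stage and all inner stages are done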
The approach you are suggesting, which is essentially based on the hypothesis that the problem goes away if the number of threads exceeds the number of tasks, will not work here if you are allocating a single thread pool. You may try it and see for yourself. It's a simple case of deadlock, as you have stated in the comments of your code.
In such a case, use two separate thread pools, one for the outer tasks and another for the inner ones. When a task from the inner pool completes, simply return the value to the outer one.
Or you can simply create a thread on the fly, do the work in it, get the result, and return it to the outer task.

Frequent concurrent method calls in Java data-logger

I'm implementing a Java data logger which reads, at precise intervals of time, some data from different production machines. To avoid having one call block the following ones, I was thinking of creating a new thread for every call to the parser class.
However, this would require creating and then stopping many threads every 10 seconds (which is my reading interval). A non-concurrent approach would cause long delays whenever the parser gets an exception (due to possible timeouts of the IoT devices I'm using), making the following calls late.
while (!error) {
    // JDBC connections and other calls here
    // queryresult is a ResultSet that returns all the machine addresses needing to be read
    while (queryresult.next()) {
        // Parser.ParseSpeedV is the method I need to call concurrently
        Double v = Parser.ParseSpeedV(..Params..);
        Double s = v * queryresult.getDouble("const");
        st = conn.createStatement();
        st.executeUpdate("INSERT INTO ...");
    }
    st.close();
    Thread.sleep(10000);
}
What is the best way to achieve concurrent calls to the method ParseSpeedV without the overhead caused by thousands of threads starting every day?
What you want to use is a ScheduledExecutorService. It allows you to add tasks that are repeated at a fixed rate or with a fixed delay, so you can, for example, add a task that fetches data from a device every 10 seconds. The executor service then makes sure that it is run at that interval with reasonably low deviation.
final ScheduledExecutorService myScheduledExecutor = Executors.newScheduledThreadPool(16);
myScheduledExecutor.scheduleAtFixedRate(myTask, 0L, 10L, TimeUnit.SECONDS);
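Here, myTask would just wrap one read-and-store cycle from the question; a rough, untested sketch (the parser and SQL calls are the question's own and are not defined here):
Runnable myTask = () -> {
    try {
        Double v = Parser.ParseSpeedV(/* ..Params.. */); // may block or time out
        // multiply by the constant and INSERT into the database here
    } catch (Exception e) {
        // log and swallow: an exception that escapes a scheduled task
        // suppresses all of its future executions
        e.printStackTrace();
    }
};
One such task can be scheduled per machine, so a slow or failing device only delays its own readings.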
Your situation is the perfect use case for a thread pool. This is part of Java's library, built on top of plain Threads, and it allows you to create a fixed-size pool of threads and reuse them over and over:
ExecutorService executor = Executors.newFixedThreadPool(5);
Any time you want to do some work, you add it to the executor:
executor.execute(new Runnable() {
    @Override
    public void run() {
        // Do some work
    }
});
If you call execute more than 5 times, the extra runnables are held in a queue until there's room.
Now, if you need to receive information from these running tasks, you need to write a class that implements Runnable and accepts some kind of object that wishes to have the information your runnable produces:
public class Worker implements Runnable {

    private final Consumer consumer;

    public Worker(Consumer consumer) {
        this.consumer = consumer;
    }

    @Override
    public void run() {
        // Do work
        Object value = /* get value */ null;
        consumer.put(value);
    }
}
Now all you have to do is define a Consumer class that operates on the value (has that put() method, or whatever) and create your Workers like this:
Consumer consumer = new Consumer();
Worker worker = new Worker(consumer);
executor.execute(worker);
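For completeness, a minimal sketch of such a Consumer; since several Workers may call put() concurrently, it should be thread-safe (this class is purely illustrative):
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class Consumer {
    private final Queue<Object> values = new ConcurrentLinkedQueue<Object>();

    // called from the worker threads, so it must tolerate concurrent use
    public void put(Object value) {
        values.add(value);
    }

    public Queue<Object> getValues() {
        return values;
    }
}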

Java - How to log metrics from a ThreadPool?

I wrapped a ThreadPoolExecutor in an implementation of ExecutorService of my own, just to send it any filesystem writing task, so they would be treated sequentially and one by one. (No need to harass this poor disk writing head.)
The wrapper comes in handy by:
allowing me to Inject this ThreadPool as a Guice Singleton pretty much everywhere I need it
telling me in real-time how much more work there is left to do
This last feature is accomplished by the call to logUtils.writingHeartbeat(int), which logs a message about how many jobs are still in the queue if "sufficient" time has elapsed since the last logging. It works pretty well as far as writing logs at the desired intervals goes, but it always tells me there are 0 files remaining to write, which sounds fishy given the execution times.
What am I doing wrong?
@Singleton
public class WritersThreadPool implements ExecutorService {

    private final ThreadPoolExecutor innerPool;
    private final LogUtils logUtils;

    @Inject
    public WritersThreadPool(LogUtils logUtils) {
        innerPool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        this.logUtils = logUtils;
    }

    @Override
    public Future<?> submit(final Runnable r) {
        return innerPool.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                r.run();
                logUtils.writingHeartbeat(innerPool.getQueue().size());
                return null;
            }
        });
    }

    // (...) Other implemented methods with no special behavior.
}
I share @chubbsondubs' opinion that there must be a synchronization issue elsewhere in the code.
My suggestions for verifying a few things were:
Try logging getTaskCount and getCompletedTaskCount.
This led to your observation that there is indeed only one task in the queue at any given time.
Instead of composition, extend ThreadPoolExecutor and use the afterExecute hook (rough sketch below). That way you may be able to track down who is synchronizing where they should not.
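A rough sketch of that afterExecute variant (keeping the LogUtils call from the question; everything else is illustrative and untested):
public class WritersThreadPool extends ThreadPoolExecutor {

    private final LogUtils logUtils;

    public WritersThreadPool(LogUtils logUtils) {
        super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        this.logUtils = logUtils;
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // runs on the worker thread immediately after each task finishes
        logUtils.writingHeartbeat(getQueue().size());
    }
}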
So I think the problem is that you are checking the queue size only after you have fully run the Runnable being submitted. The Runnable has fully completed its work by the time it checks the queue size, which is going to be empty unless you had exhausted the number of threads in the innerPool. In other words, there has to be something waiting in the queue for it to print out anything other than 0. The current job is being run by a thread, so it's not on the queue.

Weak performance of CyclicBarrier with many threads: Would a tree-like synchronization structure be an alternative?

Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help to build a synchronization tree? Thread 1 waits for threads 2 and 3, which in turn wait for threads 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, threads 4 and 5 wait for thread 2, and so on:
1
| \
2 3
|\ |\
4 5 6 7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?
With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.
You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.
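A minimal sketch of such a counter (not from the original answer, just one way to express it with intrinsic locks):
class WorkCounter {
    private int outstanding;

    synchronized void workAdded() {
        outstanding++;
    }

    synchronized void workCompleted() {
        if (--outstanding == 0) {
            notifyAll(); // wake anyone waiting for the batch to drain
        }
    }

    synchronized void awaitQuiescence() throws InterruptedException {
        while (outstanding > 0) {
            wait();
        }
    }
}
Producers call workAdded() before handing a task to the pool, every task ends with workCompleted(), and the coordinating thread blocks in awaitQuiescence().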
If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
    threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
    thread.start();
}
for (CustomThread thread : threads) {
    thread.join(); // Blocks until thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
    results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
Edit: I was under the impression that you wouldn't be creating that many threads, so this should be better for your scenario. The example won't be horribly short, but it is in the same vein as the discussion you're having in the other answer: you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {

    private final Input input;

    public SubJob(Input input) {
        this.input = input;
    }

    public Result call() {
        // Actually process input here and return a result
        return JobWorker.processInput(input);
    }
}
Then, to use it, create an ExecutorService with a fixed-size thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
public class MainJob extends Thread {

    // Adjust the pool to the appropriate number of concurrent
    // threads you want running at the same time
    private static final ExecutorService pool = Executors.newFixedThreadPool(30);

    private final List<Input> inputs;

    public MainJob(List<Input> inputs) {
        super("MainJob");
        this.inputs = new ArrayList<Input>(inputs);
    }

    public void run() {
        CompletionService<Result> compService = new ExecutorCompletionService<Result>(pool);
        List<Result> results = new ArrayList<Result>();
        int submittedJobs = inputs.size();
        for (Input input : inputs) {
            // Starts the job when a thread is available
            compService.submit(new SubJob(input));
        }
        try {
            for (int i = 0; i < submittedJobs; i++) {
                // Blocks until a job is completed
                results.add(compService.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        }
        // Do something with results
    }
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse threads if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there are so many jobs in the fixed thread pool waiting that sub-jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shut down the ExecutorService manually, perhaps via a shutdown hook, or the threads it contains will not allow the JVM to terminate.
Hope that helps/makes sense.
If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and wasted time/memory from allocating hundreds of threads when the physical CPU is bounded.
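A small, untested sketch of the invokeAll variant for a single generation (buildTasksForThisGeneration and use are hypothetical placeholders; exception handling is omitted):
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

List<Callable<Result>> generation = buildTasksForThisGeneration();
// invokeAll blocks until every task of this generation has finished
List<Future<Result>> done = pool.invokeAll(generation);
for (Future<Result> f : done) {
    use(f.get());
}
// ...only now build and submit the next generation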
If your tasks are not generational but instead form more of a tree-like structure, in which only a subset needs to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool, and you don't need Java 7 to do that. You can get a reference implementation for Java 6; it can be found under the JSR that introduced the ForkJoinPool library code.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {

    int n;
    Executor exec;

    Fib(final int n, final Executor exec) {
        this.n = n;
        this.exec = exec;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public Integer call() throws Exception {
        if (n == 0 || n == 1) {
            return n;
        }

        // Divide the problem
        final Fib n1 = new Fib(n - 1, exec);
        final Fib n2 = new Fib(n - 2, exec);

        // FutureTask only allows run to complete once
        final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
        // Ask the Executor for help
        exec.execute(n2Task);

        // Do half the work ourselves
        final int partialResult = n1.call();
        // Do the other half of the work if the Executor hasn't
        n2Task.run();

        // Return the combined result
        return partialResult + n2Task.get();
    }
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will be negative performance impacts. For example, the above code is a terribly slow way to compute Fibonacci numbers.

Java: Best way to retrieve timings from multiple threads

We have 1000 threads that hit a web service and time how long the call takes. We wish for each thread to return its own timing result to the main application, so that various statistics can be recorded.
Please note that various tools were considered for this, but for various reasons we need to write our own.
What would be the best way for each thread to return the timing - we have considered two options so far :-
1. Once a thread has its timing result, it calls a singleton that provides a synchronised method to write to the file. This ensures that each thread writes to the file in turn (although in an undetermined order, which is fine), and since the call happens after the thread has taken its timing result, being blocked waiting to write is not really an issue. When all threads have completed, the main application can then read the file to generate the statistics.
2. Using the Executor, Callable and Future interfaces
Which would be the best way, or are there any other, better ways?
Thanks very much in advance
Paul
Use the latter method.
Your workers implement Callable. You then submit them to a threadpool, and get a Future instance for each.
Then just call get() on the Futures to get the results of the calculations.
import java.util.*;
import java.util.concurrent.*;

public class WebServiceTester {

    public static class Tester implements Callable<Integer> {

        public Integer call() {
            Integer start = now();
            // Do your test here
            Integer end = now();
            return end - start;
        }

        private static Integer now() {
            // placeholder clock; only the difference between two calls matters here
            return (int) System.currentTimeMillis();
        }
    }

    public static void main(String args[]) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1000);
        Set<Future<Integer>> set = new HashSet<Future<Integer>>();
        for (int i = 0; i < 1000; i++) {
            set.add(pool.submit(new Tester()));
        }
        Set<Integer> results = new HashSet<Integer>();
        for (Future<Integer> future : set) {
            results.add(future.get());
        }
        // Manipulate results however you wish....
        pool.shutdown();
    }
}
Another possible solution I can think of would be to use a CountDownLatch (from the Java concurrency packages): each thread decrements it (flagging that it is finished), then once all are complete (and the CountDownLatch reaches 0) your main thread can happily go through them all, asking each what its time was.
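A brief, untested sketch of that CountDownLatch variant (exception handling omitted; names are illustrative):
final int workers = 1000;
final CountDownLatch done = new CountDownLatch(workers);
final long[] timings = new long[workers];

for (int i = 0; i < workers; i++) {
    final int id = i;
    new Thread(() -> {
        long start = System.nanoTime();
        // call the web service here
        timings[id] = System.nanoTime() - start; // each thread writes only its own slot
        done.countDown();
    }).start();
}

done.await(); // the main thread blocks until every worker has counted down
// timings[] is now safe to read: await() establishes the happens-before edge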
The executor framework can be used here. The timing can be done by the Callable object, and the Future can help you identify whether the thread has completed processing.
You could pass an ArrayBlockingQueue to the threads to report their results to. You could then have a file writing thread that takes from the queue to write to the file.
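A rough sketch of that queue-based approach (the capacity, file name, and elapsedMillis variable are arbitrary placeholders):
BlockingQueue<Long> timings = new ArrayBlockingQueue<Long>(1024);

// each worker thread, once it has its measurement:
// timings.put(elapsedMillis);

// a single writer thread drains the queue, so only one thread ever touches the file
Thread writer = new Thread(() -> {
    try (PrintWriter out = new PrintWriter(new FileWriter("timings.log", true))) {
        while (!Thread.currentThread().isInterrupted()) {
            Long t = timings.take(); // blocks until a worker reports a result
            out.println(t);
        }
    } catch (InterruptedException e) {
        // interrupted by the main thread once all workers are done
    } catch (IOException e) {
        e.printStackTrace();
    }
});
writer.start();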
