How can I fix my program with ExecutorService? - java

So, I'm new to Java.
I wrote a relatively simple program that does something with a lot of files.
It was slow, so I wanted to run more than one thread. With a little help from the StackOverflow community I ended up with something like this:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FileProcessor {
    public static void main(String[] args)
    {
        // run 5 threads
        ExecutorService executor = Executors.newFixedThreadPool(5);
        int i;
        // get first and last file ID to process
        int start = Integer.parseInt(args[0]);
        int end = Integer.parseInt(args[1]);
        for (i = start; i < end; i++)
        {
            final int finalId = i; // final necessary in anonymous class
            executor.submit(new Runnable()
            {
                public void run()
                {
                    processFile(finalId);
                }
            });
        }
    }

    public static void processFile(int id)
    {
        //doing work here
    }
}
This is a really simple multithreading solution and it does what I want. Now I want to fix/improve it, because I guess I'm doing something wrong (the program never ends, it uses more memory than it should, etc.).
Should I reduce the number of Runnable objects existing in memory at the same time? If so, how can I do it?
How can I detect that all the work is done, so I can exit the program (and stop the threads)?

When you say the program never ends and uses more memory than it should, it could be due to several reasons:
1) processFile() might be doing heavy I/O operations, or it might be blocked waiting on I/O.
2) There could be a potential deadlock if any data object is shared between the tasks.
Your threading logic itself is pretty straightforward with ThreadPoolExecutor, and I believe the problem is with the code in processFile().
Since you initialized the pool with 5, the ThreadPoolExecutor makes sure that there are only 5 active threads doing the work, irrespective of how many tasks you submit.
So in this case I would focus more on optimizing the application logic than on thread management.
If you are really concerned about how many Runnable objects you create, that's a trade-off between your application requirements and the available resources.
If each task is independent and there is an execution time limit for all of them, you can add more threads to the pool along with more resources.
If you define a pool with 5 threads and submit 10,000 tasks, then obviously most of them have to wait in memory as queued future tasks until a thread becomes available.
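If the memory taken by queued Runnables is the real worry, one option (a sketch, not from the original code; the capacity of 100 is arbitrary) is a ThreadPoolExecutor with a bounded queue and a CallerRunsPolicy, so the submitting loop slows down instead of piling up thousands of queued tasks:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
...
ExecutorService executor = new ThreadPoolExecutor(
        5, 5,                                       // same 5 worker threads
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(100),      // at most 100 tasks waiting in memory
        new ThreadPoolExecutor.CallerRunsPolicy()); // when full, the submitting thread runs the task itself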

If you want to reduce the number of threads running, just reduce the size you pass to the fixed thread pool constructor. As for termination, call shutdown() and awaitTermination() on the executor service. Note that this only reduces the number of active threads, not the number of Runnable objects you create in your loop.
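A minimal sketch of the shutdown part for the loop above (the one-hour timeout is just an example; it needs java.util.concurrent.TimeUnit):
executor.shutdown(); // stop accepting new tasks; already-submitted tasks keep running
try {
    if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
        executor.shutdownNow(); // timed out: interrupt whatever is still running
    }
} catch (InterruptedException e) {
    executor.shutdownNow();
    Thread.currentThread().interrupt();
}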

Related

Can I run background tasks in a ThreadPool?

I have an ExecutorService to execute my tasks concurrently. Most of these tasks are simple actions that require ~300ms to complete each. But a few of these tasks are background processing queues that take in new sub-tasks all the time and execute them in order. These background tasks remain active as long as there are normal tasks running.
The ThreadPool is generated through one of the Executors' methods (I don't know which yet) with a user-specified thread count. My fear is that the following situation might happen: there are fewer threads than there are background queues. At a given moment, all background queues are working, blocking all the threads of the ExecutorService. No normal tasks will thus be started and the program hangs forever.
Is there a possibility this might happen, and how can I avoid it? I'm thinking of a possibility to interrupt the background tasks to leave room for the normal ones.
The goal is to limit the number of threads in my application, because Google says having a lot of threads is bad, and having them idle for most of the time is bad too.
There are ~10000 tasks that are going to be submitted in a very short amount of time at the beginning of the program execution. About ~50 background task queues are needed, and most of the time will be spent waiting for a background job to do.
Don't mix long-running tasks with short-running tasks in the same ExecutorService.
Use two different ExecutorService instances with the right pool sizes. Even if you set the size to 50 for the background threads with long-running tasks, the performance of the pool is not optimal, since the number of available cores (2, 4, 8, etc.) is nowhere near that number.
I would create two separate ExecutorService instances, each initialized with Runtime.getRuntime().availableProcessors() / 2.
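A minimal sketch of that split (the class and pool names are only illustrative), assuming both the short actions and the background queue drainers are plain Runnables:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Pools {
    // give half the cores to the ~300ms actions and half to the background queues
    private static final int HALF =
            Math.max(1, Runtime.getRuntime().availableProcessors() / 2);

    public static final ExecutorService SHORT_TASKS = Executors.newFixedThreadPool(HALF);
    public static final ExecutorService BACKGROUND  = Executors.newFixedThreadPool(HALF);
}
Submitting the normal actions only to SHORT_TASKS means a stalled background queue can never starve them.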
Have a look at the posts below for more details on how to effectively utilize the available cores:
How to implement simple threading with a fixed number of worker threads
Dynamic Thread Pool
You can have an unlimited number of threads; check out the cached thread pool:
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.
Another option is to create two different pools and reserve one for the priority tasks.
The solution is to have the background tasks stop instead of sitting idle when there is no work, and get resubmitted when new tasks arrive:
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class BackgroundQueue implements Runnable {
    private final ExecutorService service;
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean running = new AtomicBoolean(false);
    private Future<?> future;

    public BackgroundQueue(ExecutorService service) {
        this.service = Objects.requireNonNull(service);
        // Create a Future that is already completed and returns null,
        // so awaitQueueTermination() works before the first submit.
        FutureTask<Void> f = new FutureTask<>(() -> null);
        f.run();
        future = f;
    }

    public void awaitQueueTermination() throws InterruptedException, ExecutionException {
        do {
            future.get();
        } while (!tasks.isEmpty() || running.get());
    }

    public synchronized void submit(Runnable task) {
        tasks.add(task);
        // If this queue is not currently scheduled, schedule it on the shared pool.
        if (running.compareAndSet(false, true))
            future = service.submit(this);
    }

    @Override
    public void run() {
        // Keep draining while the queue is non-empty; once it is empty,
        // the compareAndSet flips running back to false and the loop exits,
        // freeing the pool thread for other work.
        while (!running.compareAndSet(tasks.isEmpty(), false)) {
            tasks.remove().run();
        }
    }
}
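A hypothetical usage sketch (pool size is arbitrary, exception handling omitted):
ExecutorService service = Executors.newFixedThreadPool(4);
BackgroundQueue background = new BackgroundQueue(service);

background.submit(new Runnable() {
    public void run() { System.out.println("background sub-task"); }
});

background.awaitQueueTermination(); // blocks until the queue has drained
service.shutdown();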

Is this a good approach for threading and synchronization of multiple functions of a Java class?

I have a class Prefs, which has various methods. I need to rewrite it using threading and synchronization.
I'm looking at this variant: http://tutorials.jenkov.com/java-concurrency/synchronized.html
So currently:
class T_readConfigFile extends Thread {
    protected Prefs p = null;

    public T_readConfigFile(Prefs p) {
        this.p = p;
    }

    @Override
    public void run() {
        p.readConfigFile();
    }
}
and
public synchronized void readConfigFile() { ...
But somehow making N identical classes, one for each of the methods I want to run in a thread, doesn't look like a good idea. I assume the whole Prefs object gets referenced by this.p = p; — do I really need that if I'll only be using one method from it?
So: this works, but I don't like it. Are there better ways?
Suppose you want to call some method foo() in a background thread. You have already discovered the most basic way. Here's a somewhat preferable variation on what you did:
new Thread(new Runnable() {
    @Override
    public void run() {
        foo();
    }
}).start();
OK, so I wrote six lines of Java code to call one function. Yes, that's kind of verbose. Welcome to Java (or at least, Welcome to Java7. If it can be done more concisely in Java8, I haven't yet learned how.)
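As an aside, in Java 8 the same call can be written with a lambda, which removes most of the boilerplate:
new Thread(() -> foo()).start();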
This approach has a couple of problems that are worse than verbosity though:
1) You create and destroy a new thread each time you want to call a background method. Creating and destroying threads is relatively expensive.
2) If the background tasks take significant time to perform relative to how often you invoke them, you have no means to control the number of them that are running at the same time. In a busy application, it could keep growing until you get an OutOfMemoryError.
A better approach is to use a thread pool:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
...
final int NUM_THREADS = ...;
final ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
...
executorService.submit(new Runnable() {
    @Override
    public void run() {
        foo();
    }
});
Each time you submit a new task to the thread pool, it will wake an already existing thread, and the thread will perform the task and then go back to sleep. No threads are created or destroyed except when the pool starts up.
Also, if all of the threads are busy when you submit the new task, the task will be added to a queue, and it will be performed later when a worker thread becomes available.
This is just a simple example: the java.util.concurrent package gives you many more options, including the ability to limit the size of the queue, the ability to make thread pools that grow or shrink depending on demand, and, perhaps most important of all, the ability to wait for a task to complete and to get a return value from a completed task.
Check it out. http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/package-frame.html
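A sketch of that last point, submitting a Callable to the same executorService and reading its result through a Future (expensiveComputation() is a hypothetical method, not from the question):
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
...
Future<Integer> result = executorService.submit(new Callable<Integer>() {
    @Override
    public Integer call() {
        return expensiveComputation(); // hypothetical: any method that returns a value
    }
});
try {
    Integer value = result.get(); // blocks until the task has completed
    System.out.println("result = " + value);
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}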
A synchronized instance method locks the object it is called on (a synchronized static method locks the class object), so that only one thread at a time can be executing that method on a given object. This is useful for situations where you don't want multiple threads to be reading or writing the same file or stream at the same time, for example.
If you need each thread to read its own separate configuration file, you probably don't need to synchronize the readConfigFile() method. On the other hand, you do need to synchronize it if every thread reads the same config file.
But if all of the threads are reading the same config file, perhaps you should only have one thread (or perhaps the main parent thread) read the file once, and then pass the resulting parsed config values to each thread. This saves a lot of I/O.

Balancing multiple queues

I suspect this is really easy but I’m unsure if there’s a naïve way of doing it in Java. Here’s my problem, I have two scripts for processing data and both have the same inputs/outputs except one is written for the single CPU and the other is for GPUs. The work comes from a queue server and I’m trying to write a program that sends the data to either the CPU or GPU script depending on which one is free.
I do not understand how to do this.
I know that with an ExecutorService I can specify how many threads I want to keep running, but I'm not sure how to balance between two different ones. I have 2 GPUs and 8 CPU cores on the system and thought I could have an ExecutorService keep 2 GPU and 8 CPU processes running, but I'm unsure how to balance between them, since the GPU tasks will be done a lot quicker than the CPU tasks.
Any suggestions on how to approach this? Should I create two queues and keep polling them to see which one is less busy? Or is there a way to just put all the work units (all the same) into one queue and have the GPU or CPU processes take from that same queue as they become free?
UPDATE: just to clarify, the CPU/GPU programs are outside the scope of the program I'm making; they are simply scripts that I call via two different methods. I guess the simplified version of what I'm asking is whether two methods can take work from the same queue.
Can two methods take work from the same queue?
Yes, but you should use a BlockingQueue to save yourself some synchronization heartache.
Basically, one option would be to have a producer which places tasks into the queue via BlockingQueue.offer. Then design your CPU/GPU threads to call BlockingQueue.take and perform work on whatever they receive.
For example:
public static void main(String[] args) {
    BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    for (int i = 0; i < CPUs; i++) {
        new CPUThread(queue).start();
    }
    for (int i = 0; i < GPUs; i++) {
        new GPUThread(queue).start();
    }
    for (/*all data*/) {
        queue.offer(task);
    }
}

class CPUThread extends Thread {
    private final BlockingQueue<Task> queue;

    CPUThread(BlockingQueue<Task> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        while (/*some condition*/) {
            try {
                Task task = queue.take(); // blocks until work is available
                //do task work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
//etc...
Obviously there is more than one way to do it; usually the simplest is the best. I would suggest thread pools: one with 2 threads for the GPU tasks, and a second with 8 threads for the CPU tasks. Your work unit manager can submit work to whichever pool has an idle thread at the moment (I would recommend synchronizing that block of code). The standard Java ThreadPoolExecutor has a getActiveCount() method you can use for that, see
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html#getActiveCount().
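A rough sketch of that dispatch idea (the WorkDispatcher class and the two-Runnable signature are only illustrative; callers wrap the external GPU and CPU scripts in Runnables):
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class WorkDispatcher {
    private final ThreadPoolExecutor gpuPool =
            (ThreadPoolExecutor) Executors.newFixedThreadPool(2); // 2 GPUs
    private final ThreadPoolExecutor cpuPool =
            (ThreadPoolExecutor) Executors.newFixedThreadPool(8); // 8 CPU cores

    // Prefer the GPU pool whenever it has a free thread and nothing waiting;
    // otherwise fall back to the CPU pool.
    public synchronized void dispatch(Runnable gpuVersion, Runnable cpuVersion) {
        if (gpuPool.getActiveCount() < gpuPool.getMaximumPoolSize()
                && gpuPool.getQueue().isEmpty()) {
            gpuPool.submit(gpuVersion);
        } else {
            cpuPool.submit(cpuVersion);
        }
    }
}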
Use Runnables like this:
class CPUGPURunnable implements Runnable {
    @Override
    public void run() {
        if (Thread.currentThread() instanceof CPUGPUThread) {
            CPUGPUThread t = (CPUGPUThread) Thread.currentThread();
            if (t.isGPU())
                runGPU();
            else
                runCPU();
        }
    }
}
CPUGPUThread is a Thread subclass that knows whether it runs in CPU or GPU mode, using a flag. Have a ThreadFactory for the ThreadPoolExecutor that creates either a CPU or a GPU thread. Set up a ThreadPoolExecutor with two workers, and make sure the ThreadFactory creates one CPU and then one GPU thread instance.
I suppose you have two objects that represent the two GPUs, with methods like boolean isFree() and void execute(Runnable). Then you could start 8 threads which, in a loop, take the next job from the queue and put it on a free GPU if there is one, otherwise execute the job themselves.

Kind of load balanced thread pool in java

I am looking for a load-balanced thread pool, with no success so far. (Not sure whether load balancing is the correct wording.)
Let me explain what I try to achieve.
Part 1:
I have jobs, each consisting of 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is ready, another one can start. Once all ten tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part 2:
Sometimes, unfortunately, a job takes more than two hours. This is correct due to the amount of data that has to be calculated.
The bad thing is that no other job can start while job 1 is running (assuming that all threads have the same duration), because it is using all the threads.
My first idea:
Have 12 threads and allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only 1 job.
I am looking for a solution that gives full CPU power to job one when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all of them.
I appreciate your answers...
thanks in advance
One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TaskRunner {

    private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
        private final Runnable theRunnable;
        private final int priority;

        public PriorityRunnable(Runnable r, int priority) {
            this.theRunnable = r;
            this.priority = priority;
        }

        public int getPriority() {
            return priority;
        }

        public void run() {
            theRunnable.run();
        }

        public int compareTo(PriorityRunnable that) {
            // lower priority number = runs earlier
            return Integer.compare(this.priority, that.priority);
        }
    }

    private BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
    private ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);

    public void runTasks(Runnable... tasks) {
        // Start the new job just after the current head of the queue, so its
        // tasks interleave with the tasks of any job that is already waiting.
        int priority = 0;
        Runnable nextTask = taskQueue.peek();
        if (nextTask instanceof PriorityRunnable) {
            priority = ((PriorityRunnable) nextTask).getPriority() + 1;
        }
        for (Runnable t : tasks) {
            exec.execute(new PriorityRunnable(t, priority));
            priority += 100;
        }
    }
}
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect, however: if all threads are currently working on tasks from job 1, then the first task of job 2 can't start until one of job 1's tasks completes (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break the longer-running tasks into smaller segments and queue those as separate tasks. You need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks always start off in the queue rather than being handed directly to threads (note that while the pool is still below its core size, exec.execute() hands the task straight to a new thread without going through the queue).
The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads each. There may be some overhead from the competition, but if you get to a single-job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from the 8 working threads if they aren't actually executing a task. This wouldn't distribute the CPU resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?
Since your machine has a 6-core CPU, it might be better to have 6 worker threads per job thread, so that whenever a thread gets a new job it starts up to six parallel workers on that single job. This will ensure that the full CPU power is consumed when there is only one job at a time.
Also, please have a look at the Fork/Join framework introduced in Java 7.
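A minimal Fork/Join sketch of that idea (class and variable names are only illustrative); one shared pool sized to the machine runs the subtasks of whatever jobs are submitted:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class Job extends RecursiveAction {
    private final List<Runnable> tasks; // the 8-10 single tasks of one job

    Job(List<Runnable> tasks) {
        this.tasks = tasks;
    }

    @Override
    protected void compute() {
        // Fork one subtask per unit of work; the pool's work-stealing scheduler
        // spreads them across the available worker threads.
        List<RecursiveAction> subtasks = new ArrayList<>();
        for (final Runnable work : tasks) {
            subtasks.add(new RecursiveAction() {
                @Override
                protected void compute() {
                    work.run();
                }
            });
        }
        invokeAll(subtasks);
    }
}
// usage: new ForkJoinPool().invoke(new Job(tasksOfOneJob));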
Also learn about newCachedThreadPool():
Java newCachedThreadPool() versus newFixedThreadPool

Synchronizing threads in java

Good day!
I have a problem with synchronizing threads in Java. I am developing a program which creates timers and allows the user to reset, delete and stop them, just to learn how to use threads.
The problem is that the code gives synchronization only for some time... I can't understand my mistake. Maybe my approach is wrong, so I would like to know how to solve this issue.
I have the following code:
public class StopWatch
{
    //Create and start our timer
    public synchronized void startClock( final int id )
    {
        //Creating new thread.
        thisThread = new Thread()
        {
            @Override
            public void run()
            {
                try
                {
                    while( true )
                    {
                        System.out.printf( "Thread [%d] = %d\n", id, timerTime );
                        timerTime += DELAY; //Count 100 ms
                        Thread.sleep( DELAY );
                    }
                }
                catch( InterruptedException ex )
                {
                    ex.printStackTrace();
                }
            }
        };
        thisThread.start();
    }
    …
    //Starting value of timer
    private long timerTime = 0;
    //Number of ms to add and sleep
    private static final int DELAY = 100;
    private Thread thisThread;
}
I call this Class like:
StopWatch s = new StopWatch(1);
s.startClock();
StopWatch s2 = new StopWatch(2);
s2.startClock();
I think you may have misunderstood "synchronized".
It does not mean that the threads run in exactly synchronized time - rather that only one thread at a time is allowed to be executing the synchronized code block. In your case "synchronized" makes no difference, since you are calling the startClock method from the same thread....
In general, it is impossible in Java (and indeed most high level languages) to guarantee that two threads perform actions at exactly the same clock time even if you have multiple cores, since they are always vulnerable to being delayed by the OS scheduler or JVM garbage collection pauses etc.
Also, Thread.sleep(...) is unreliable as a timing mechanism, as the amount it sleeps for is only approximate. You're at the mercy of the thread scheduler.
Suggested solution:
use System.currentTimeMillis() if you want a thread-independent timing mechanism.
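A sketch of that change inside the run() method's existing try block; the printed value is derived from the clock instead of accumulating sleeps, so it no longer drifts with scheduling delays:
final long startTime = System.currentTimeMillis();
while (true) {
    long elapsed = System.currentTimeMillis() - startTime;
    System.out.printf("Thread [%d] = %d%n", id, elapsed);
    Thread.sleep(DELAY); // only controls how often we print, not the measured time
}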
What do you mean it "only gives you synchronizing for some time?" The only thing you have synchronized here is the startClock method, which just means that two threads will not be within that method at the same time (and it doesn't look like you are doing that anyway). If you wanted to synchronize access to timerTime for example, you would need to put a synchronized block inside thread run method around the incrementing timerTime (or you could use an AtomicLong).
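A small sketch of the AtomicLong variant, in case the counter ever is shared between threads (the tick() method is only illustrative):
import java.util.concurrent.atomic.AtomicLong;

public class StopWatch {
    private static final int DELAY = 100;
    private final AtomicLong timerTime = new AtomicLong(0);

    // called from the timer thread every DELAY milliseconds
    void tick() {
        long current = timerTime.addAndGet(DELAY); // atomic, no synchronized block needed
        System.out.println("timer = " + current);
    }
}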
You should probably re-read the documentation for the synchronized keyword. I'm pretty sure in this case all it does is keep the two calls to startClock() from executing at the same time, which wouldn't happen given this code anyway, because they're called one after the other from a single thread. Once the timer threads begin, there's nothing keeping them synchronized, if that's your goal.
Your first problem is that this is a time-based-only solution. This is bad because the program has no control over how long execution takes. Some operations take more time than others, and the threads within your process don't execute at the same time. In general this won't synchronize anything unless you can guarantee everything else is the same . . .
Read about java.util.concurrent.Semaphore (http://download.oracle.com/javase/6/docs/api/java/util/concurrent/Semaphore.html). You can also call Thread.join() to make the main thread wait for the execution of a child thread to finish before continuing.
I think you misunderstood what synchronized means. Synchronized is to ensure that multiple threads have limited access to a certain block of code so that you don't get conflicts between the two threads.
I think what you may be more interested in is a CyclicBarrier or a CountDownLatch. Both can be used to "synchronize" (overloaded use in this case) multiple threads so that they try to start doing things at the same time.
However, be aware that it's impossible to have multiple threads do things at exactly the same instant. You can only try to encourage to do them at about the same time. The rest is subject to OS scheduling on the cores in the system. And if you have a single core, they will never run at the same time.
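A small CountDownLatch sketch of that "start at about the same time" pattern (the thread count and the sleep are only illustrative):
import java.util.concurrent.CountDownLatch;

public class StartTogether {
    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch startSignal = new CountDownLatch(1);
        for (int i = 0; i < 2; i++) {
            final int id = i;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        startSignal.await(); // every thread blocks here...
                        System.out.println("Timer " + id + " started at "
                                + System.currentTimeMillis());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }).start();
        }
        Thread.sleep(100);       // give both threads time to reach await()
        startSignal.countDown(); // ...and they are released at (roughly) the same time
    }
}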
