I'm using a java.util.concurrent.ExecutorService that I obtained by calling Executors.newSingleThreadExecutor(). This ExecutorService can sometimes stop processing tasks, even though it has not been shut down and continues to accept new tasks without throwing exceptions. Eventually, it builds up enough of a queue that my app dies with OutOfMemoryError exceptions.
The documentation seems to indicate that this single-thread executor should survive task processing errors by firing up a new worker thread if necessary to replace one that has died. Am I missing something?
It sounds like you have two different issues:
1) You're over-feeding the work queue. You can't just keep stuffing new tasks into the queue with no regard for the consumption rate of the task executors. You need to figure out some logic for knowing when to block new additions to the work queue.
2) Any uncaught exception in a task's thread can completely kill the thread. When that happens, the ExecutorService spins up a new thread to replace it. But that doesn't mean you can ignore whatever problem is causing the thread to die in the first place! Find those uncaught exceptions and catch them! (See the sketch after this list for both points.)
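Here's a minimal sketch of both points, assuming a single worker like yours: a bounded queue plus CallerRunsPolicy throttles the producer instead of letting the queue grow without limit, and the try/catch keeps task failures visible. The class name, queue size, and task body are all made up for illustration.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedSingleThreadExecutor {
        public static void main(String[] args) {
            ExecutorService executor = new ThreadPoolExecutor(
                    1, 1,                                       // one worker, like newSingleThreadExecutor()
                    0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<Runnable>(100),      // bounded queue caps memory use
                    new ThreadPoolExecutor.CallerRunsPolicy()); // producer runs the task itself when the queue is full

            for (int i = 0; i < 1000; i++) {
                final int id = i;
                executor.execute(() -> {
                    try {
                        System.out.println("task " + id);       // stands in for your real work
                    } catch (RuntimeException e) {
                        e.printStackTrace();                    // log it; otherwise an uncaught exception kills the worker
                    }
                });
            }
            executor.shutdown();
        }
    }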
This is just a hunch (cuz there's not enough info in your post to know otherwise), but I don't think your problem is that the task executor stops processing tasks. My guess is that it just doesn't process tasks as fast as you're creating them. (And the fact that your tasks sometimes die prematurely is probably orthogonal to the problem.)
At least, that's been my experience working with thread pools and task executors.
Okay, here's another possibility that sounds feasible based on your comment (that everything will run smoothly for hours until suddenly coming to a crashing halt)...
You might have a rare deadlock between your task threads. Most of the time, you get lucky, and the deadlock doesn't manifest itself. But occasionally, two or more of your task threads get into a state where they're waiting for the release of a lock held by the other thread. At that point, no more task processing can take place, and your work queue will pile up until you get the OutOfMemoryError.
Here's how I'd diagnose that problem:
Eliminate ALL shared state between your task threads. At first, this might require each task thread making a defensive copy of all shared data structures it requires. Once you've done that, it should be completely impossible to experience a deadlock.
At this point, gradually reintroduce the shared data structures, one at a time (with appropriate synchronization). Re-run your application after each tiny modification to test for the deadlock. When you get that crashing situation again, take a close look at the access patterns for the shared resource and determine whether you really need to share it.
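As an illustration of the defensive-copy step, here's a minimal sketch (the class name and data type are invented): each task copies the shared list once, up front, and then works only on its private snapshot, so it never needs a lock while running.

    import java.util.ArrayList;
    import java.util.List;

    class SnapshotTask implements Runnable {
        private final List<String> localData;

        SnapshotTask(List<String> sharedData) {
            synchronized (sharedData) {
                this.localData = new ArrayList<>(sharedData); // defensive copy, made once
            }
        }

        @Override public void run() {
            for (String item : localData) {
                // ... work on the private copy only; no locks, so no deadlock ...
            }
        }
    }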
As for me, whenever I write code that processes parallel tasks with thread pools and executors, I always try to eliminate ALL shared state between those tasks. As far as the application is concerned, they may as well be completely autonomous applications. Hunting down deadlocks is a drag, and in my experience, the best way to eliminate deadlocks is for each thread to have its own local state rather than sharing any state with other task threads.
Good luck!
My guess would be that your tasks are blocking indefinitely rather than dying. Do you have evidence, such as a log statement at the end of each task, suggesting that your tasks are successfully completing?
This could be a deadlock, or an interaction with some external process that is blocking.
Although you don't leave enough detail to be sure, the first thing I'd try is to have your tasks catch "Exception" at the top level and log the message.
I know it doesn't seem right, but occasionally (depending on a lot of variables) I've worked on code where something happening in a thread throws an exception that is never logged, or just doesn't show up on the console, yet the executing code exits its top-level loop or whatever code is causing your task to run.
I guess I'm just saying, make sure your tasks are not throwing an exception out.
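A minimal sketch of that advice, assuming your tasks are plain Runnables (the wrapper class name is made up): wrap the real task so nothing it throws can disappear silently.

    class LoggingRunnable implements Runnable {
        private final Runnable delegate;

        LoggingRunnable(Runnable delegate) {
            this.delegate = delegate;
        }

        @Override public void run() {
            try {
                delegate.run();                 // the real task
            } catch (RuntimeException e) {
                e.printStackTrace();            // or your logging framework of choice
            }
        }
    }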
Related
I'm trying to use a ForkJoinPool to parallelize my CPU-intensive calculations.
My understanding of a ForkJoinPool is that it continues to work as long as any task is available to be executed. Unfortunately, I frequently observed worker threads idling/waiting, so not all CPUs were kept busy. Sometimes I even observed additional worker threads.
I did not expect this, as I strictly tried to use non-blocking tasks.
My observation is very similar to those of ForkJoinPool seems to waste a thread.
After debugging a lot into ForkJoinPool I have a guess:
I used invokeAll() to distribute work over a list of subtasks. After invokeAll() has executed the first task itself, it starts joining the other ones. This works fine until the next task to join is on top of the executing queue. Unfortunately, I had submitted additional tasks asynchronously without joining them. I expected the ForkJoin framework to execute those tasks first and then turn back to joining any remaining tasks.
But it seems not to work this way. Instead, the worker thread stalls, calling wait(), until the task it is joining becomes ready (presumably executed by another worker thread). I did not verify this, but it seems to be a general flaw of calling join().
ForkJoinPool provides an asyncMode, but this is a global parameter and cannot be set for individual submissions. And I would like to see my asynchronously forked tasks executed soon.
So, why does ForkJoinTask.doJoin() not simply execute any available task on top of its queue until the joined task is ready (either executed by itself or stolen by others)?
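To make the pattern concrete, here is a rough sketch of what I'm doing (class names invented, work bodies elided): invokeAll() over a list of subtasks, each of which also fork()s extra work that is never join()ed.

    import java.util.Arrays;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    public class ForkWithoutJoinDemo {
        static class AsyncWork extends RecursiveAction {
            @Override protected void compute() {
                // ... independent work; nobody ever join()s this task ...
            }
        }

        static class Subtask extends RecursiveAction {
            @Override protected void compute() {
                new AsyncWork().fork();       // asynchronous work, deliberately not joined here
                // ... CPU-intensive work for this subtask ...
            }
        }

        static class Root extends RecursiveAction {
            @Override protected void compute() {
                // invokeAll() runs the first subtask in this thread and join()s the rest;
                // while joining, the worker can end up in wait() instead of executing the
                // forked AsyncWork sitting on its queue -- the behaviour described above.
                invokeAll(Arrays.asList(new Subtask(), new Subtask(), new Subtask(), new Subtask()));
            }
        }

        public static void main(String[] args) {
            new ForkJoinPool().invoke(new Root());
        }
    }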
Since nobody else seems to understand my question I try to explain what I found after some nights of debugging:
The current implementation of ForkJoinTask works well if all fork/join calls are strictly paired. Illustrating a fork by an opening bracket and a join by a closing one, a perfect binary fork-join pattern may look like this:
{([][]) ([][])} {([][]) ([][])}
If you use invokeAll() you may also submit a list of subtasks, like this:
{([][][][]) ([][][][]) ([][][][])}
What I did however looks like this pattern:
{([) ([)} ... ]]
You may argue this looks ill-formed or is a misuse of the fork-join framework. But the only constraint is that the tasks' completion dependencies are acyclic; otherwise you may run into a deadlock. As long as my [] tasks do not depend on the () tasks, I don't see any problem with it. The offending ]]'s just express that I do not wait for them explicitly; they may finish some day, and it does not matter to me (at that point).
Indeed the current implementation is able to execute my interlocked tasks, but only by spawning additional helper threads which is quite inefficient.
The flaw seems to be in the current implementation of join(): joining a ) expects to see its corresponding ( on top of its execution queue, but it finds a [ and is perplexed. Instead of simply executing [] to get rid of it, the current thread suspends (calling wait()) until someone else comes around to execute the unexpected task. This causes a drastic performance breakdown.
My primary intent was to put additional work onto the queue to prevent the worker thread from suspending when the queue runs empty. Unfortunately, the opposite happens :-(
You are dead right about join(). I wrote this article two years ago that points out the problem with join().
As I said there, the framework cannot execute newly submitted requests until it finishes the earlier ones, and each worker thread cannot steal until its current request finishes, which results in the wait().
The additional threads you see are "continuation threads." Since join() eventually issues a wait(), these threads are needed so the entire framework doesn't stall.
You’re not using this framework for the very narrow purpose for which it was intended.
The framework started life as the experiment in the 2000 research paper. It's been modified since then, but the basic design, fork-and-join on large arrays, remains the same. The basic purpose is to teach undergraduates how to walk down the leaves of a balanced tree. When people use it for anything other than simple array processing, weird things happen. What it is doing in Java 7 is beyond me, which is the point of the article.
The problems only get worse in Java 8. There it's the engine driving all parallel stream work. Have a read of part two of that article. The lambda interest lists are filled with reports of thread stalls, stack overflows, and out-of-memory errors.
You use it at your own risk when you don’t use it for pure recursive decomposition of large data structures. Even then, the excessive threads it creates can cause havoc. I’m not going to pursue this discussion any further.
I am working on a multithreaded game in java. I have several worker threads that fetch modules from a central thread manager, which then executes it on its own. Now I would like to be able to pause such a thread if it temporarily has nothing to execute. I have tried calling the wait() method on it from the thread manager, but that only resulted in it ignoring the notify() call that followed it.
I googled a bit on it too, only to find that most sites refer to functions like suspend(), pause(), etc., which are now marked as deprecated on the Java documentation pages.
So in general, what is the way to pause or continue a thread on demand?
You can use an if block in the thread with a sentinel variable that is set to false if you want to halt the thread's action. This works best if the thread is performing loops.
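A minimal sketch of that idea (class and method names are made up): the worker loop checks a paused flag and parks on a monitor with wait() until the manager calls resumeWork(), so no deprecated suspend()/resume() is needed.

    public class PausableWorker implements Runnable {
        private final Object pauseLock = new Object();
        private volatile boolean paused = false;
        private volatile boolean running = true;

        @Override public void run() {
            while (running) {
                synchronized (pauseLock) {
                    while (paused && running) {
                        try {
                            pauseLock.wait();            // sleep until resumed
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
                doOneUnitOfWork();                       // hypothetical work item
            }
        }

        public void pauseWork() { paused = true; }

        public void resumeWork() {
            synchronized (pauseLock) {
                paused = false;
                pauseLock.notifyAll();
            }
        }

        public void stopWork() { running = false; resumeWork(); }

        private void doOneUnitOfWork() { /* ... fetch a module and execute it ... */ }
    }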
Maybe I'm missing the point, but if they have nothing to do, why not just let them die? Then spawn a new thread when you have work for one to do again.
It sounds to me like you're trying to have the conversation both ways. In my (humble) opinion, you should either have the worker threads responsible for asking the central thread manager for work (or 'modules'), or you should have the central thread manager responsible for doling out work and kicking off the worker threads.
What it sounds like is that most of the time the worker threads are responsible for asking for work. Then, sometimes, the responsibility flips round to the thread manager to tell the workers not to ask for a while. I think the system will stay simpler if this responsibility stays on only one side.
So, given this, and with my limited knowledge of what you're developing, I would suggest either:
Have the thread manager kick off worker threads when there's stuff to do and keep track of their progress, letting them die when they're done and only creating new ones when there's new stuff to do. Or
Have a set number of always-existing worker threads that poll the thread manager for work and (if there isn't any) sleep for a period of time using Thread.sleep() before trying again. This seems pretty wasteful to me, so I would lean towards option 1 unless you have a good reason not to.
In the grand tradition of not answering your question and suggesting that You Are Doing It Wrong, I offer this :-)
Maybe you should refactor your code to use an ExecutorService; it's a rather good design.
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
There are many ways to do this, but in the commonest (IMO), the worker thread calls wait() on the work queue, while the work generator should call notify(). This causes the worker thread to stop, without the thread manager doing anything. See e.g. this article on thread pools and work queues.
Use a blocking queue and fetch those modules using take(), or poll(time, unit) for a timed wait so you can cleanly shut down. These will block the current thread until a module is available.
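A minimal sketch of that approach, with invented names (QueueWorker, the Runnable "module" type, and the 500 ms timeout are all just placeholders): the worker blocks inside poll() whenever the queue is empty, so it pauses and resumes by itself.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    public class QueueWorker implements Runnable {
        private final BlockingQueue<Runnable> modules = new LinkedBlockingQueue<>();
        private volatile boolean shutdown = false;

        public void submit(Runnable module) { modules.add(module); }
        public void shutdown()              { shutdown = true; }

        @Override public void run() {
            try {
                while (!shutdown) {
                    // poll with a timeout so the loop can notice the shutdown flag
                    Runnable module = modules.poll(500, TimeUnit.MILLISECONDS);
                    if (module != null) {
                        module.run();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // preserve the interrupt and exit
            }
        }
    }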
When writing a multithreaded internet server in Java, the main thread starts new ones to serve incoming requests in parallel.
Is there any problem if the main thread does not wait (with .join()) for them? (It seems obviously absurd to create a new thread and then wait for it.)
I know that, in a practical situation, you should (or "must"?) implement a pool of threads to re-use them for new requests when they become idle.
But for small applications, should we use a pool of threads?
You don't need to wait for threads.
They can either complete running on their own (if they've been spawned to perform one particular task), or run indefinitely (e.g. in a server-type environment).
They should handle interrupts and respond to shutdown requests, however. See this article on how to do this correctly.
If you need a set of threads I would use a pool and executor methods, since they'll look after thread resource management for you. If you're writing a multi-threaded network server then I would investigate using (say) a servlet container or a framework such as Mina.
The only problem in your approach is that it does not scale well beyond a certain request rate. If the requests are coming in faster than your server is able to handle them, the number of threads will rise continuously. As each thread adds some overhead and uses CPU time, the time for handling each request will get longer, so the problem will get worse (because the number of threads rises even faster). Eventually no request will be able to get handled anymore because all of the CPU time is wasted with overhead. Probably your application will crash.
The alternative is to use a ThreadPool with a fixed upper bound of threads (which depends on the power of the hardware). If there are more requests than the threads are able to handle, some requests will have to wait too long in the request queue, and will fail due to a timeout. But the application will still be able to handle the rest of the incoming requests.
Fortunately the Java API already provides a nice and flexible ThreadPool implementation, see ThreadPoolExecutor. Using this is probably even easier than implementing everything with your original approach, so no reason not to use it.
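A rough sketch of that fixed-bound approach for a socket server, assuming a plain ServerSocket accept loop (the port, pool size, and handleRequest() body are placeholders, not a tuned implementation):

    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PooledServer {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors() * 2);   // fixed upper bound on threads
            try (ServerSocket server = new ServerSocket(8080)) {
                while (true) {
                    Socket client = server.accept();
                    pool.execute(() -> handleRequest(client));         // queued if all workers are busy
                }
            }
        }

        private static void handleRequest(Socket client) {
            try {
                // ... read the request and write a response ...
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                try { client.close(); } catch (Exception ignored) { }
            }
        }
    }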
Thread.join() lets you wait for the Thread to end, which is mostly contrary to what you want when starting a new Thread. After all, you start the new thread to do stuff in parallel to the original Thread.
Only if you really need to wait for the spawned thread to finish should you join() it.
You should wait for your threads if you need their results or need to do some cleanup which is only possible after all of them are dead, otherwise not.
For the Thread-Pool: I would use it whenever you have some non-fixed number of tasks to run, i.e. if the number depends on the input.
I would like to collect the main ideas of this interesting (for me) question.
1) I can't totally agree with "you don't need to wait for threads". It's true only in the sense that if you don't join a thread (and don't keep a reference to it), its resources are freed once the thread is done (right? I'm not sure).
2) The use of a thread pool is only necessary to avoid the overhead of thread creation, because ...
3) You can limit the number of parallel running threads by accounting, with shared variables (and without a thread pool), how many of them were started but not yet finished. (A sketch of this follows.)
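One hedged way to do that accounting without a pool is a counting semaphore (the limit of 50 and the class name are arbitrary for this sketch): each spawned thread occupies a slot and releases it when it finishes.

    import java.util.concurrent.Semaphore;

    public class BoundedSpawner {
        private static final int MAX_THREADS = 50;
        private static final Semaphore slots = new Semaphore(MAX_THREADS);

        public static void spawn(Runnable handler) throws InterruptedException {
            slots.acquire();                   // blocks if MAX_THREADS are already running
            new Thread(() -> {
                try {
                    handler.run();
                } finally {
                    slots.release();           // free the slot when this thread finishes
                }
            }).start();
        }
    }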
I've been reading around about InterruptedException, and it's immediately apparent that there's no silver bullet solution to handle it properly in all cases.
What I haven't seen yet, is some sample code demonstrating what can go wrong if the exception is handled improperly. Of course I realize some of the effects (such as thread starvation, which I think is one of them) are hard to demonstrate. I want to keep it limited to demonstrating the proper use of Thread.sleep().
How would you go about designing a somewhat realistic sample program for this?
Here are my ideas so far:
Make a simple GUI application to demonstrate reduced responsiveness. There'd be a UI thread, and a simple thread pool to perform some blocking task. The thread pool manager checks the interrupted status of the running threads to manage them. Swallowed InterruptedExceptions cause the pool to run out of threads, so the application becomes less responsive.
This can help point out the different handling strategies for when sleeping in a managed thread vs. an unmanaged one.
Have a bunch of threads that create garbage and sleep. There would be two types of threads: ones that restore the interrupted status when interrupted, and those that don't (swallow the exception). Then the demonstration would be to run the application in a JVM with little memory, and (hopefully) show that swallowing the exception somehow inhibits garbage collection or increases its overhead (due to long interval between invocations).
Do these ideas make sense? Any other (maybe simpler) ideas?
Say you have a Thread which you want to be able to shut down by interrupting it.
public void run() {
    // isInterrupted() tests the flag without clearing it; the static interrupted()
    // call in the original clears the flag as a side effect.
    while (!Thread.currentThread().isInterrupted()) {
        doWork();
        callMethodWhichIgnoresInterrupted();   // swallows the interrupt, so the loop never sees it
    }
}
By discarding the interrupt, you can have a thread which sometimes fails to die causing a resource leak you cannot fix without restarting the application.
Ignoring any exception is a very bad idea 95+% of the time. This is why they are checked exceptions in Java. These problems are not limited to interrupts.
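For contrast, here is a minimal sketch of the non-swallowing version (Thread.sleep() stands in for whatever blocking call the real method makes): restoring the flag lets the loop above notice the interrupt and exit.

    class InterruptFriendlyWork {
        void callMethodWhichPreservesInterrupt() {
            try {
                Thread.sleep(1000);                    // any blocking, interruptible call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // restore the status instead of discarding it
            }
        }
    }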
I'm currently working on a daemon that will be doing A LOT of different tasks. It's multi threaded and is being built to handle almost any kind of internal-error without crashing. Well I'm getting to the point of handling a shutdown request and I'm not sure how I should go about doing it.
I have a shutdown hook setup, and when it's called it sets a variable telling the main daemon loop to stop running. The problem is, this daemon spawns multiple threads and they can take a long time. For instance, one of these threads could be converting a document. Most of them will be quick (I'm guessing under 10 seconds), but there will be threads that can last as long as 10+ minutes.
What I'm thinking of doing right now is, when the shutdown hook has been called, loop for about 5 seconds on ThreadGroup.activeCount() with a 500ms (or so) sleep (all these threads are in a ThreadGroup), and before this loop I will send a notification to all threads telling them a shutdown request has been made. Then they will have to clean up and shut down immediately, no matter what they're doing.
Anyone else have any suggestions? I'm interested in what a daemon like MySQL, for instance, does when it's told to stop: it stops instantly. What happens if, say, 10 very slow queries are running? Does it wait, or does it just end them? I mean, servers are really quick, so there really isn't any kind of operation that I shouldn't be able to do in less than a second. You can do A LOT in 1000ms nowadays.
Thanks
The java.util.concurrent package provides a number of utilities, such as ThreadPoolExecutor (along with various specialized types of other Executor implementations from the Executors class) and ThreadPoolExecutor.awaitTermination(), which you might want to look into - as they provide the same exact functionality you are looking to implement. This way you can concentrate on implementing the actual functionality of your application/tasks instead of worrying about things like thread and task scheduling.
Are your thread jobs amenable to interruption via Thread#interrupt()? Do they mostly call on functions that themselves advertise throwing InterruptedException? If so, then the aforementioned java.util.concurrent.ExecutorService#shutdownNow() is the way to go. It will interrupt any running threads and return the list of jobs that were never started.
Similarly, if you hang on to the Futures produced by ExecutorService#submit(), you can use Future#cancel(boolean) and pass true to request that a running job be interrupted.
Unless you're calling on code out of your control that swallows interrupt signals (say, by catching InterruptedException without calling Thread.currentThread().interrupt()), using the built-in cooperative interruption facility is a better choice than introducing your own flags to approximate what's already there.
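A short sketch of that cooperative shutdown, along the lines of the pattern shown in the ExecutorService javadoc (the 5-second grace period is just an example value): attempt a graceful shutdown first, then interrupt whatever is still running.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.TimeUnit;

    class PoolShutdown {
        static void shutdownAndAwait(ExecutorService pool) {
            pool.shutdown();                                    // stop accepting new tasks
            try {
                if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                    pool.shutdownNow();                         // interrupt tasks still running
                    pool.awaitTermination(5, TimeUnit.SECONDS);
                }
            } catch (InterruptedException e) {
                pool.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
    }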