Idle threads left with ScheduledThreadPoolExecutor.schedule - java

I have a Java application that is structured as:
One thread watching a java.nio.Selector for IO.
A java.util.concurrent.ScheduledThreadPoolExecutor thread pool handling either work to be done immediately (dispatching IO reads detected by the IO thread) or work to be done after a delay, usually error handling.
The ScheduledThreadPoolExecutor has an upper bound on the number of threads to create; currently 5000 in the app, but I haven't tuned that number at all.
After running the app for a while, I get thousands and thousands of threads that have this stack trace:
"pool-1-thread-5262" prio=10 tid=0x00007f636c2df800 nid=0x2516 waiting on condition [0x00007f60246a5000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000581c49520> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.poll(DelayQueue.java:209)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:611)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:602)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:945)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)
I assume that the above is being caused by my calls to schedule(java.lang.Runnable, long, java.util.concurrent.TimeUnit), which certainly happens often in the app. Is this the expected behavior?
Having all of these threads hanging around doesn't seem to impact the application at all: when a worker thread is needed, these TIMED_WAITING threads do not appear to prevent tasks submitted through the submit method from running, but I'm not totally sure of that. Does having thousands of threads hanging around in this parked state impact the app or system performance?
Tasks that are submitted via the schedule method are very simple: they basically just re-schedule the Channel back with the Selector. So, these tasks are not very long-lived, they just need to execute at some point in the future. Normal worker threads will do traditional blocking-IO to perform their work, and are generally more long-lived.
A related question: is it better to do delayed tasks in an explicit, single thread instead of using the schedule method? That is, have a loop like this:
DelayQueue<SomeTaskClass> tasks = ...; // SomeTaskClass implements Delayed
while (true) {
    SomeTaskClass task = tasks.take();
    threadpool.submit(task);
}
Does DelayQueue use any worker threads to implement its functionality? I was going to just experiment with it today, but advice would be well appreciated.

After running the app for a while, I get thousands and thousands of threads that have this stack trace.
Unless you actually plan on having 5000 threads all operating at once, that is too high a number. If they are blocked on IO then that should be fine. Unless you started with a core pool size that is too large, their existence in your thread dump means that at some point they were all needed to process the tasks submitted to the executor; in other words, at some point you had 5000 tasks running at once, blocking or otherwise. If you show the actual executor constructor call I can be more specific.
If you have the time, playing with that upper bound might be good to see if it does affect application behavior.
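For illustration, here is a minimal sketch of constructing the ScheduledThreadPoolExecutor so that idle workers time out instead of parking forever. setKeepAliveTime and allowCoreThreadTimeOut are standard ThreadPoolExecutor API, but the pool size and timeout below are assumptions to experiment with, not recommendations:
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TunedScheduler {
    static ScheduledThreadPoolExecutor newScheduler() {
        // 16 is an arbitrary starting point; tune against your real workload.
        ScheduledThreadPoolExecutor stpe = new ScheduledThreadPoolExecutor(16);
        // Let workers that have been idle for a minute exit instead of
        // parking indefinitely in the DelayedWorkQueue.
        stpe.setKeepAliveTime(60, TimeUnit.SECONDS);
        stpe.allowCoreThreadTimeOut(true);
        return stpe;
    }
}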
Does having thousands of threads hanging around in this parked state impact the app or system performance?
They will take up more memory, which may affect JVM performance, but otherwise they should not impact the application unless too many are running at once. They may just be wasting some system resources, which is the only reason why I'd play with the 5000 and the other executor constructor args.
is it better to do delayed tasks in an explicit, single thread instead of using the schedule method?
I'd say no. Just about any time you can replace hand-rolled thread code with the ExecutorService classes, it is a good thing. I think the idea of doing a task and then delaying for a while is a great use of the ScheduledThreadPoolExecutor.
Does DelayQueue use any worker threads to implement its functionality?
No. It is just a BlockingQueue implementation that helps with delaying of tasks. I've never used the class actually, although I would have if I'd known about it. The ScheduledThreadPoolExecutor uses this class to do its job so using DelayQueue yourself is again a waste. Just stick with STPE.
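To make the comparison concrete, here is a minimal, runnable sketch; the println Runnable is just a stand-in for "re-register the channel with the Selector", and the pool size and delay are arbitrary. A single schedule() call replaces the hand-rolled DelayQueue take/submit loop:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RescheduleSketch {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(4);

        // Stand-in for "re-register the channel with the Selector".
        Runnable reRegister = () -> System.out.println("re-registering channel");

        // One call replaces the DelayQueue + take() + submit() loop.
        executor.schedule(reRegister, 500, TimeUnit.MILLISECONDS);

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}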

Related

Does the current worker participate in work stealing?

In a ForkJoinPool ForkJoinTask, does the current worker thread participate in work stealing?
I have read implications that a fork/join pool can steal work from blocked or waiting threads. The current worker seems an obvious candidate: once the worker calls .join() on another task, its current task is essentially blocked.
On the other hand, I have seen many articles that imply different conclusions, for example the general consensus that the current worker thread should do some of the work itself before waiting on forked tasks.
There are a few articles that discuss the use of ForkJoinTask.getSurplusQueuedTaskCount as a method of balancing the work in the queue by having the current worker do some of the work. If the current worker is also stealing, then this doesn't seem necessary.
Naturally, I would like to keep all workers as busy as possible. Understanding whether the current thread also steals work (for example, when .join() is called) would help clarify this.
It is the responsibility of the ForkJoinPool to manage threads. Client code should feed it tasks, not micromanage the threading. Note that tasks and threads are two different things; tasks are units of work to be executed, and threads execute that work.
ForkJoinTask.compute() should fork() into smaller subtasks if the task is large enough to benefit from running parts of the task in parallel, and simply process the task if the task is small enough that it would better be run in a single thread. If the work turns out to be more than expected, it can fork() some of the work and do the rest of it.
If ForkJoinTask.compute() forks into smaller subtasks, it can call join() before returning. The ForkJoinPool will then either free the thread to work on other tasks, or spawn a temporary thread to work on other tasks to ensure the available parallelism is fully utilized.
I think it's reasonable to assume that the appropriate number of worker threads are kept busy for as long as there are uncompleted tasks, unless you explicitly block the thread in the compute() method.
The Sun tutorial provides more specifics on how to use these classes:
https://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
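As an illustration of the fork-one-half, compute-the-other pattern described above, here is a minimal, self-contained sketch; the SumTask class and the threshold are hypothetical, chosen only for the example:
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // illustrative cut-off
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                        // hand one half to the pool
        long rightResult = right.compute(); // do the other half in this thread
        return rightResult + left.join();   // then wait for the forked half
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(new ForkJoinPool().invoke(new SumTask(data, 0, data.length)));
    }
}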

What is busy spin in a multi-threaded environment?

What is "Busy Spin" in multi-threaded environment?
How it is useful and how can it be implemented in java in a multi-threaded environment?
In what way can it be useful in improving the performance of an application?
Some of the other answers miss the real problem with busy waiting.
Unless you're talking about an application where you are concerned with conserving electrical power, burning CPU time is not, in and of itself, a Bad Thing. It's only bad when there is some other thread or process that is ready to run. It's really bad when one of the ready-to-run threads is the thread that your busy-wait loop is waiting for.
That's the real issue. A normal, user-mode program running on a normal operating system has no control over which threads run on which processors; a normal operating system has no way to tell the difference between a thread that is busy waiting and a thread that is doing work; and even if the OS knew that a thread was busy waiting, it would have no way to know what the thread was waiting for.
So, it's entirely possible for the busy waiter to wait for many milliseconds (practically an eternity) for an event while the only thread that could make the event happen sits on the sideline (i.e., in the run queue), waiting for its turn to use a CPU.
Busy waiting is often used in systems where there is tight control over which threads run on which processors. Busy waiting can be the most efficient way to wait for an event when you know that the thread that will cause it is actually running on a different processor. That often is the case when you're writing code for the operating system itself, or when you're writing an embedded, real-time application that runs under a real-time operating system.
Kevin Walters wrote about the case where the time to wait is very short. A CPU-bound, ordinary program running on an ordinary OS may be allowed to execute millions of instructions in each time slice. So, if the program uses a spin-lock to protect a critical section consisting of just a few instructions, then it is highly unlikely that any thread will lose its time slice while it is in the critical section. That means, if thread A finds the spin-lock locked, then it is highly likely that thread B, which holds the lock, actually is running on a different CPU. That's why it can be OK to use spin-locks in an ordinary program when you know it's going to run on a multi-processor host.
Busy-waiting or spinning is a technique in which a process repeatedly checks whether a condition is true instead of calling a wait or sleep method and releasing the CPU.
1. It is mainly useful on multicore processors where the condition is going to become true quite quickly, i.e. within milliseconds or microseconds.
2. The advantage of not releasing the CPU is that all cached data and instructions remain unaffected; they might be lost had the thread been suspended on one core and later resumed on another.
Busy spin is one of the techniques for waiting for an event without releasing the CPU. It's often done to avoid losing data in the CPU cache, which would be lost if the thread were paused and resumed on some other core.
So, if you are working on a low-latency system where your order-processing thread currently doesn't have any orders, instead of sleeping or calling wait(), you can just loop and then check the queue again for new messages. It's only beneficial if you need to wait for a very small amount of time, e.g. microseconds or nanoseconds.
The LMAX Disruptor framework, a high-performance inter-thread messaging library, has a BusySpinWaitStrategy based on this concept: it uses a busy-spin loop for EventProcessors waiting on the barrier.
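As a rough, hypothetical sketch of that spin-on-the-queue idea (class and field names are made up; Thread.onSpinWait() requires Java 9+), a consumer might spin-poll its queue like this instead of blocking:
import java.util.concurrent.ConcurrentLinkedQueue;

public class SpinningConsumer {
    private final ConcurrentLinkedQueue<String> orders = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    void processLoop() {
        while (running) {
            String order = orders.poll();
            if (order == null) {
                Thread.onSpinWait(); // hint to the CPU that we are busy-spinning (Java 9+)
            } else {
                handle(order);
            }
        }
    }

    void handle(String order) { /* process the order */ }
    void stop() { running = false; }
}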
A "busy spin" is constantly looping in one thread to see if the other thread has completed some work. It is a "Bad Idea" as it consumes resources as it is just waiting. The busiest of spins don't even have a sleep in them, but spin as fast as possible waiting for the work to get finished. It is less wasteful to have the waiting thread notified by the completion of the work directly and just let it sleep until then.
Note, I call this a "Bad Idea", but it is used in some cases on low-level code to minimize latency, but this is rarely (if ever) needed in Java code.
Busy spinning/waiting is normally a bad idea from a performance standpoint. In most cases, it is preferable to sleep and wait for a signal when you are ready to run than to spin. Take the scenario where there are two threads, and thread 1 is waiting for thread 2 to set a variable (say, it waits until var == true). It could busy spin by just doing
while (var == false)
;
In this case, you will take up a lot of CPU time that thread 2 could otherwise be using, because when you wake up you are just executing the loop mindlessly. So, in a scenario where you are waiting for something like this to happen, it is better to give thread 2 full control by putting yourself to sleep and having it wake you up when it is done.
BUT, in rare cases where the time you need to wait is very short, it is actually faster to spin-lock. This is because of the time it takes to perform the signaling functions: spinning is preferable when the time spent spinning is less than the time the signaling would take. So, in that case it may be beneficial and could actually improve performance, but it is definitely not the most frequent case.
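For contrast, here is a minimal sketch (hypothetical FlagWait class) of the sleep-and-be-signaled alternative recommended above, using wait()/notifyAll() instead of spinning on the flag:
public class FlagWait {
    private final Object lock = new Object();
    private boolean var = false;

    // Thread 1: block instead of spinning on var.
    void awaitFlag() throws InterruptedException {
        synchronized (lock) {
            while (!var) {
                lock.wait();
            }
        }
    }

    // Thread 2: set the flag and wake the waiter.
    void setFlag() {
        synchronized (lock) {
            var = true;
            lock.notifyAll();
        }
    }
}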
Spin waiting means you constantly check whether a condition has become true. The opposite is waiting for a signal (such as being woken up via notify() and wait()).
There are two ways of waiting: semi-active (sleep / yield) and active (busy waiting).
In busy waiting, a program idles actively, using special opcodes like HLT or NOP or other time-consuming operations; other implementations just use a while loop that checks for a condition coming true.
The Java framework provides the Thread.sleep, Thread.yield and LockSupport.parkXXX() methods for a thread to hand over the CPU. Sleep waits for a specific amount of time but always takes over a millisecond, even if a nanosecond was specified. The same is true for LockSupport.parkNanos(1). Thread.yield allows for a resolution of 100 ns on my example system (Win7 + mobile i5).
The problem with yield is the way it works. If the system is fully utilized, yield can take up to 800 ms in my test scenario (100 worker threads all counting up a number (a += a;) indefinitely). Since yield frees the CPU and adds the thread to the end of all threads within its priority group, yield is unreliable unless the CPU is utilized below a certain extent.
Busy waiting will block a CPU (core) for multiple milliseconds.
The Java framework (check the Condition class implementations) uses active (busy) waiting for periods of less than 1000 ns (1 microsecond). On my system an average invocation of System.nanoTime takes 160 ns, so busy waiting amounts to: check the condition, spend 160 ns on nanoTime, and repeat.
So, broadly, Java's concurrency framework (queues etc.) busy-spins for waits under a microsecond, hits the waiting period to within a granularity of N (where N is the number of nanoseconds needed to check the time constraint), and parks for waits of one millisecond or longer (on my current system).
So active busy waiting increases CPU utilization but aids the overall responsiveness of the system.
While burning CPU time, one should use special instructions that reduce the power consumption of the core executing the time-consuming operations.
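As a rough, hypothetical sketch of that spin-then-park strategy (the class, threshold handling, and 1 ms park interval are assumptions for illustration, not the JDK's actual implementation; Thread.onSpinWait() needs Java 9+):
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

public final class HybridWait {
    // Spin for roughly spinNanos, then fall back to parking about 1 ms at a time.
    static void awaitCondition(BooleanSupplier condition, long spinNanos) {
        long spinDeadline = System.nanoTime() + spinNanos;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() < spinDeadline) {
                Thread.onSpinWait();              // active busy wait for short delays
            } else {
                LockSupport.parkNanos(1_000_000); // semi-active wait for longer delays
            }
        }
    }
}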
Busy spin is nothing but looping until the other thread(s) complete. E.g. say you have 10 threads and you want to wait for all of them to finish before continuing:
while(ALL_THREADS_ARE_NOT_COMPLETE);
//Continue with rest of the logic
For example, in Java you can manage multiple threads with an ExecutorService:
ExecutorService executor = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
    Runnable worker = new WorkerThread("" + i);
    executor.execute(worker);
}
executor.shutdown();
// Busy spin: loop until all the tasks have finished.
while (!executor.isTerminated());
This is a busy spin: it consumes resources because the CPU is not sitting idle but keeps running through the loop. We should instead have a mechanism to notify the main (parent) thread that all the threads are done, so that it can continue with the rest of the work.
With the preceding example, instead of busy spinning, you can use a different mechanism to improve performance:
ExecutorService executor = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
    Runnable worker = new WorkerThread("" + i);
    executor.execute(worker);
}
executor.shutdown();
try {
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
    log.fatal("Exception ", e);
}

Java's ExecutorService performance

I have a main thread which dispatches jobs to a thread pool. I'm using Java's Executor framework.
From the profiler (VisualVM) I can see each thread's activity: the main thread is waiting a lot (because the executor's queue has an upper limit), which means the executor's queue is full most of the time. However, the executor's threads are not as busy as I would have thought; most of them have a waiting time of 75%. VisualVM says they are waiting on a monitor.
Can anyone explain why this is happening? Why would the executor threads wait while there is still plenty of work available to do? And how can I improve the performance of the executor, and thus the overall performance? More detail on the executor's wait on a monitor would be great.
The job that runs in the workers is just some computation; it doesn't depend on anything else and doesn't communicate with any other thread (no synchronisation), except that at the end it puts data into the database using its own connection.
Parallel execution will yield significantly better results than synchronous execution if:
the work to be done is independent (no, or only a few very short, critical sections)
each unit of work takes enough time to make up for thread startup and the executor's internal synchronization
the work does not contend for the same resource - for example, reading multiple files from the same disk in parallel will probably be slower than reading them sequentially
you actually have enough system resources (processor cores, memory, network bandwidth) to use at once
Threading does not mean that all the threads will work in parallel all the time. Threads will surely go into a waiting state for various reasons, mostly depending on how the scheduler assigns the CPU to each of them. Is there some synchronized code in your thread class? If so, then while one thread is executing a synchronized method, all the other threads have to wait. If there is too much synchronized code, the threads' waiting time will increase.
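As a toy illustration of that effect (a hypothetical class, not the poster's code): if every worker funnels through one synchronized method, the pool effectively runs single-threaded at that point.
public class SharedIdGenerator {
    private long next = 0;

    // Every worker blocks here one at a time, no matter how large the pool is.
    synchronized long nextId() {
        return next++;
    }
}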
After doing a thread dump, it turns out that it is the database layer that has the synchronisation. Hibernate's Sequence Generator is synchronised.
"pool-2-thread-1" - Thread t#13
java.lang.Thread.State: BLOCKED
at org.hibernate.id.SequenceHiLoGenerator.generate(SequenceHiLoGenerator.java:73)
- waiting to lock <61fcb35> (a org.hibernate.id.SequenceHiLoGenerator) owned by "pool-2-thread-5" t#23
at org.hibernate.internal.StatelessSessionImpl.insert(StatelessSessionImpl.java:117)
at org.hibernate.internal.StatelessSessionImpl.insert(StatelessSessionImpl.java:110)
at ac.uk.ebi.kraken.unisave.storage.impl.HibernateStorageEngine.saveEntryIndex(HibernateStorageEngine.java:269)
at ac.uk.ebi.kraken.unisave.storage.impl.EntryStoreImpl.storeEntryIndex(EntryStoreImpl.java:302)
at ac.uk.ebi.kraken.unisave.impl.MTEntryIndexLoader$EntryIndexLoader.run(MTEntryIndexLoader.java:129)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Locked ownable synchronizers:
- locked <3d360c93> (a java.util.concurrent.ThreadPoolExecutor$Worker)
Threads are scheduled by the OS scheduler, which assigns them CPU cycles; that means if the machine has 4 CPUs, only 4 threads can run in parallel at any instant, so the other threads have to wait for the scheduler to assign a CPU to them.

Is it dangerous to start threads in Java and not wait for them (with .join())?

When writing a multithreaded internet server in Java, the main thread starts new ones to serve incoming requests in parallel.
Is there any problem if the main thread does not wait (with .join()) for them? (It would obviously be absurd to create a new thread and then immediately wait for it.)
I know that, in a practical situation, you should (or "must"?) implement a pool of threads to "re-use" them for new requests when they become idle. But for small applications, should we use a pool of threads?
You don't need to wait for threads.
They can either complete running on their own (if they've been spawned to perform one particular task), or run indefinitely (e.g. in a server-type environment).
They should handle interrupts and respond to shutdown requests, however. See this article on how to do this correctly.
If you need a set of threads I would use a pool and the executor methods, since they'll look after thread resource management for you. If you're writing a multi-threaded network server then I would investigate using (say) a servlet container or a framework such as Mina.
The only problem with your approach is that it does not scale well beyond a certain request rate. If requests come in faster than your server can handle them, the number of threads will rise continuously. As each thread adds some overhead and uses CPU time, the time to handle each request gets longer, so the problem gets worse (because the number of threads rises even faster). Eventually no request can be handled anymore because all of the CPU time is wasted on overhead. Your application will probably crash.
The alternative is to use a thread pool with a fixed upper bound of threads (which depends on the power of the hardware). If there are more requests than the threads can handle, some requests will wait too long in the request queue and fail due to a timeout, but the application will still be able to handle the rest of the incoming requests.
Fortunately the Java API already provides a nice and flexible thread pool implementation, see ThreadPoolExecutor. Using this is probably even easier than implementing everything with your original approach, so there is no reason not to use it.
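A minimal sketch of such a bounded pool, with illustrative, untuned sizes; the bounded queue plus AbortPolicy rejects new work when the server is saturated (how rejected requests are reported back to clients is up to the caller):
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedServerPool {
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                8, 32,                               // core and maximum threads (untuned guesses)
                60, TimeUnit.SECONDS,                // idle timeout for the extra threads
                new ArrayBlockingQueue<>(1000),      // bounded request queue
                new ThreadPoolExecutor.AbortPolicy() // reject new requests when saturated
        );
    }
}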
Thread.join() lets you wait for the Thread to end, which is mostly contrary to what you want when starting a new Thread. After all, you start the new thread to do work in parallel with the original thread.
Only if you really need to wait for the spawned thread to finish should you join() it.
You should wait for your threads if you need their results or need to do some cleanup which is only possible after all of them are dead, otherwise not.
For the Thread-Pool: I would use it whenever you have some non-fixed number of tasks to run, i.e. if the number depends on the input.
I would like to collect the main ideas of this interesting (for me) question.
I can't totally agree with "you don't need to wait for threads". Only in the sense that if you don't join a thread (and don't have a pointer to it), its resources are freed once the thread is done (right? I'm not sure).
The use of a thread pool is only necessary to avoid the overhead of thread creation, because ...
You can limit the number of parallel running threads by accounting, with shared variables (and without a thread pool), how many of them were started but not yet finished, as in the sketch below.
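One concrete way to do that accounting (a hypothetical sketch; a java.util.concurrent.Semaphore plays the role of the shared counter):
import java.util.concurrent.Semaphore;

public class BoundedSpawner {
    private static final int MAX_CONCURRENT = 16; // illustrative limit
    private final Semaphore permits = new Semaphore(MAX_CONCURRENT);

    void handle(Runnable request) throws InterruptedException {
        permits.acquire(); // blocks while MAX_CONCURRENT handlers are already running
        Thread t = new Thread(() -> {
            try {
                request.run();
            } finally {
                permits.release(); // the thread frees its slot when it finishes
            }
        });
        t.start(); // no join(): the thread exits on its own after releasing its permit
    }
}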

Java daemon - handling shutdown requests

I'm currently working on a daemon that will be doing A LOT of different tasks. It's multi-threaded and is being built to handle almost any kind of internal error without crashing. Well, I'm getting to the point of handling a shutdown request and I'm not sure how I should go about doing it.
I have a shutdown hook setup, and when it's called it sets a variable telling the main daemon loop to stop running. The problem is, this daemon spawns multiple threads and they can take a long time. For instance, one of these threads could be converting a document. Most of them will be quick (I'm guessing under 10 seconds), but there will be threads that can last as long as 10+ minutes.
What I'm thinking of doing right now is, when the shutdown hook has been called, loop for about 5 seconds on ThreadGroup.activeCount() with a 500 ms (or so) sleep (all these threads are in a ThreadGroup), and before this loop send a notification to all threads telling them a shutdown request has been made. They will then have to clean up and shut down immediately, no matter what they're doing.
Anyone else have any suggestions? I'm interested in what a daemon like MySQL, for instance, does when it is told to stop: it stops instantly. What happens if, say, 10 very slow queries are running? Does it wait or does it just end them? I mean, servers are really quick, so there really isn't any kind of operation that I shouldn't be able to do in less than a second. You can do A LOT in 1000 ms these days.
Thanks
The java.util.concurrent package provides a number of utilities, such as ThreadPoolExecutor (along with various specialized types of other Executor implementations from the Executors class) and ThreadPoolExecutor.awaitTermination(), which you might want to look into - as they provide the same exact functionality you are looking to implement. This way you can concentrate on implementing the actual functionality of your application/tasks instead of worrying about things like thread and task scheduling.
Are your thread jobs amenable to interruption via Thread#interrupt()? Do they mostly call on functions that themselves advertise throwing InterruptedException? If so, then the aforementioned java.util.concurrent.ExecutorService#shutdownNow() is the way to go. It will interrupt any running threads and return the list of jobs that were never started.
Similarly, if you hang on to the Futures produced by ExecutorService#submit(), you can use Future#cancel(boolean) and pass true to request that a running job be interrupted.
Unless you're calling on code out of your control that swallows interrupt signals (say, by catching InterruptedException without calling Thread.currentThread().interrupt()), using the built-in cooperative interruption facility is a better choice than introducing your own flags to approximate what's already there.
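Putting those pieces together, here is a minimal, hypothetical sketch of a shutdown hook that gives running tasks a short grace period and then interrupts the stragglers (the pool size and the 5-second grace period are arbitrary choices for illustration):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdown {
    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(8);

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            workers.shutdown(); // stop accepting new tasks
            try {
                // Give running tasks a grace period, then interrupt the stragglers.
                if (!workers.awaitTermination(5, TimeUnit.SECONDS)) {
                    workers.shutdownNow();
                }
            } catch (InterruptedException e) {
                workers.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }));

        // ... submit tasks to workers here ...
    }
}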
