Finished Threads: should I ignore them? - java

I wrote a web crawler that opens many web pages. As you can see in the image below, some threads appear to be finished (white color), but what does that mean? Do I have a bug? Is there a resource leak? And how can I find out where those threads were created and why they finished? Should I worry about them?
[VisualVM screenshot]
The problem is that if I keep it running for a day, I get thousands of those threads, so I'm worried about it.

It's fine to start lots of threads, as long as not too many of them are alive at the same time. "Finished" threads are no longer alive, so they won't cause issues.
Having said that, threads in Java are rather expensive to create (this can be different in other languages, like Erlang), and you usually don't want or need to create lots of threads over the course of your app's life. You may want to use a thread pool, which re-uses a fixed set of threads instead of starting a new one for each task.

The finished threads will not kill your application. But instead of creating new threads that will be finished, use a thread pool that will re-use them.
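A minimal sketch of that thread-pool approach (the pool size and task bodies are illustrative, not taken from the original crawler):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CrawlerPool {
    public static void main(String[] args) throws InterruptedException {
        // A fixed pool re-uses 4 threads instead of creating one per page.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            final int page = i;
            pool.submit(() -> System.out.println("crawling page " + page));
        }
        pool.shutdown();                            // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for queued tasks to drain
    }
}
```

The 4 pool threads stay alive between tasks, so VisualVM will show a handful of long-lived threads instead of thousands of finished ones.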

Related

How to decide on number of threads for an application?

I have read that each core can run two threads. So if my application in prod uses three octa-core servers, does that mean it can only handle 48 concurrent requests? Or am I mixing up two different things here?
Would appreciate any clarity here.
In Java, you can have as many threads as you like; you're not limited by the number of CPU cores. Even if you only had one processor with a single core, you could still write a multi-threaded application.
The JVM will perform context switching - it will execute Thread 1 for some time, then Thread 2 for some time, then maybe Thread 1 again, and so on, switching between the threads. These switches between threads can occur after just a few milliseconds, so it can give the illusion that the threads are running in parallel.
For some applications it is actually faster to use a single thread, because context switching itself adds overhead.
I did actually write a small multi-threaded application the other day though. It had about 30 threads, and this was a use case where multithreading did make the app more efficient.
I had about 30 URLs that I needed to hit to retrieve some data. Done in a single thread, each request would block the application while waiting for its response. With multi-threading, other threads can run while one thread waits on the network.
I hope this makes sense. It'll be worth reading up on Java Context Switching for more info.
This is a good source on the topic: https://docs.oracle.com/javase/tutorial/essential/concurrency/index.html
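A sketch of that 30-URL scenario. The URLs are placeholders and the network latency is simulated with a sleep, so this is an illustration of the pattern rather than the original app's code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFetch {
    // Simulates one blocking request; a real version would open an HTTP connection.
    static String fetch(String url) throws InterruptedException {
        Thread.sleep(100); // stand-in for network latency
        return "response from " + url;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            final String url = "https://example.com/page/" + i; // placeholder URL
            results.add(pool.submit(() -> fetch(url)));
        }
        for (Future<String> f : results) {
            System.out.println(f.get()); // blocks until that fetch completes
        }
        pool.shutdown();
    }
}
```

With 10 threads, the 30 waits overlap, so total wall-clock time is roughly 3 batches of latency instead of 30 sequential waits.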

Terminating a Swing Worker

Good Day to all:
My job is to create custom tools for our customers. My employer has a flagship product that we sell that has a Java interface. I then build tools (with Java) for our customers to make life easier for them. These tools run as plugins launched from within the Java interface.
My problem is that, during normal operation, I like to use SwingWorkers to do things. During debugging, I began to notice (with the NetBeans debugger) that even though a given SwingWorker was done, the thread was still running. Normally, this would not concern me, but I then began to notice that, even if the tool I was working on was closed, the Swing Worker was still hanging out in the pool. If I closed the primary application, of course all the threads would die as the JVM terminated, but if the primary app was still running, my SwingWorkers would hang around (even though the done & cancel flags are set).
Clearly this means that my Java app is using the primary application's EDT (which makes sense), but it leaves me with a problem. If a user runs my tool multiple times during a single session with the primary app, I'll start stacking up zombie SwingWorkers that aren't doing anything except chewing up a spot of memory and CPU.
So, the question I have for the hive mind: is there any way to force-terminate a zombie SwingWorker? Or, absent that, is there any way to re-attach to a zombie SwingWorker if, for instance, I know its name?
Thank you!
I think this is expected if you use a thread pool, which keeps threads hanging around (probably with an idle loop, causing them to show up as 'running').
The docs say that a SwingWorker can only execute once, so re-attaching isn't possible. SwingWorkers aren't threads themselves; they are executed on a worker thread.
You could limit the number of threads in the pool by using an ExecutorService.
See this SO question for more details.
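Since SwingWorker implements RunnableFuture, it can be submitted to your own ExecutorService instead of calling execute(), which puts a hard cap on how many worker threads can ever exist. A minimal sketch (the pool size and the trivial task are illustrative):

```java
import javax.swing.SwingWorker;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPool {
    // One small shared pool caps how many worker threads can ever exist,
    // instead of relying on SwingWorker's default internal executor.
    private static final ExecutorService EXEC = Executors.newFixedThreadPool(3);

    public static void main(String[] args) throws Exception {
        SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
            @Override
            protected String doInBackground() {
                return "done"; // the long-running work would go here
            }
        };
        // SwingWorker implements RunnableFuture, so it can be submitted
        // directly rather than calling worker.execute():
        EXEC.execute(worker);
        System.out.println(worker.get()); // waits for doInBackground to finish
        EXEC.shutdown();
    }
}
```

The three pool threads will still show as alive between tasks, but they are re-used rather than accumulating one per SwingWorker.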

Java threads, keep same number of threads running

I need to process 80 files, and I'm doing it in groups of 8 threads. What I would like is to always have 8 threads running (right now I launch 8 threads, and only after those 8 finish their job is another batch of 8 started, and so on).
So I would like to know if there is a way to do this:
launch 8 threads.
after 1 thread finishes its job, launch another thread (so all the time I have 8 threads running, until the job is done)
Why not use a thread pool, and in particular a fixed-size thread pool? Configure your thread pool size to be 8 threads, and then submit all your work items as Runnable/Callable objects. The thread pool will execute these using the 8 configured threads.
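A minimal sketch of that setup, assuming 80 placeholder file names and a trivial process method:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileProcessor {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 80; i++) {
            final String name = "file-" + i + ".dat"; // placeholder file name
            pool.submit(() -> process(name));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void process(String name) {
        // As soon as a thread finishes one file it picks up the next from the
        // queue, so up to 8 tasks are running at any time until the queue drains.
        System.out.println(Thread.currentThread().getName() + " -> " + name);
    }
}
```

This gives exactly the behavior asked for: no batching, just 8 threads that each take the next file as soon as they finish the previous one.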
So, everyone is quick to jump in and tell you to use a thread pool. Sure, that's the right way to achieve what you want. The question is, is it the right thing to want? It's not as simple as throw a bunch of threads at the problem, and magically everything is solved.
You haven't told us the nature of the processing. Are the jobs I/O bound, or CPU bound [1]? If they are CPU bound, extra threads beyond your core count gain you nothing. If they are I/O bound, threading might help.
You haven't told us if you have eight cores (or compute units). If you can't guarantee that you'll have that, it might not be best to have eight threads running.
There's a lot to think about. You're increasing the complexity of your solution. Maybe it's getting you what you want, maybe not.
[1]: Yes, you said you're processing files, but that doesn't tell us enough. Maybe the processing is intensive (think: rendering a video file). Or maybe you're reading the files from a very fast disk (think: SSD or memory-mapped files).

Java Memory Usage / Thread Pool Performance Problem

These things obviously require close inspection and availability of code to thoroughly analyze and give good suggestions. Nevertheless, that is not always possible and I hope it may be possible to provide me with good tips based on the information I provide below.
I have a server application that uses a listener thread to listen for incoming data. The incoming data is interpreted into application specific messages and these messages then give rise to events.
Up to that point I don't really have any control over how things are done.
Because this is a legacy application, these events were previously taken care of by that same listener thread (largely a single-threaded application). The events are sent to a blackbox and out comes a result that should be written to disk.
To improve throughput, I wanted to employ a threadpool to take care of the events. The idea being that the listener thread could just spawn new tasks every time an event is created and the threads would take care of the blackbox invocation. Finally, I have a background thread performing the writing to disk.
With just the previous setup and the background writer, everything works OK and the throughput is ~1.6 times more than previously.
When I add the thread pool, however, performance degrades. At the start everything seems to run smoothly, but after a while everything is very slow and finally I get OutOfMemoryErrors. The weird thing is that when I print the number of active threads each time a task is added to the pool (along with info on how many tasks are queued and so on), it looks as if the thread pool has no problem keeping up with the producer (the listener thread).
Using top -H to check for CPU usage, it's quite evenly spread out at the outset, but at the end the worker threads are barely ever active and only the listener thread is active. Yet it doesn't seem to be submitting more tasks...
Can anyone hypothesize a reason for these symptoms? Do you think it's more likely that there's something in the legacy code (that I have no control over) that just goes bad when multiple threads are added? The out of memory issue should be because some queue somewhere grows too large but since the threadpool almost never contains queued tasks it can't be that.
Any ideas are welcome, especially on how to diagnose a situation like this more efficiently. How can I get a better profile of what my threads are doing, etc.?
Thanks.
Slowing down then out of memory implies a memory leak.
So I would start by using some Java memory analyzer tools to identify if there is a leak and what is being leaked. Sometimes you get lucky and the leaked object is well-known and it becomes pretty clear who is hanging on to things that they should not.
Thank you for the answers. I read up on Java VisualVM and used that as a tool. The results and conclusions are detailed below. Hopefully the pictures will work long enough.
I first ran the program and created some heap dumps thinking I could just analyze the dumps and see what was taking up all the memory. This would probably have worked except the dump file got so large and my workstation was of limited use in trying to access it. After waiting two hours for one operation, I realized I couldn't do this.
So my next option was something I, stupidly enough, hadn't thought about. I could just reduce the number of messages sent to the application, and the trend of increasing memory usage should still be there. Also, the dump file will be smaller and faster to analyze.
It turns out that when sending messages at a slower rate, no out-of-memory issue occurred! A graph of the memory usage can be seen below.
The peaks are results of cumulative memory allocations and the troughs that follow are after the garbage collector has run. Although the amount of memory usage certainly is quite alarming and there are probably issues there, no long term trend of memory leakage can be observed.
I started to incrementally increase the rate of messages sent per second to see where the application hits the wall. The image below shows a very different scenario than the previous one...
Because this happens when the rate of messages is increased, my guess is that freeing up the listener thread allows it to accept a lot of messages very quickly, which causes more and more allocations. The garbage collector can't keep up, and memory usage hits a wall.
There's of course more to this issue but given what I have found out today I have a fairly good idea of where to go from here. Of course, any additional suggestions/comments are welcome.
This question should probably be recategorized as dealing with memory usage rather than thread pools... The thread pool wasn't the problem at all.
I agree with djna.
The thread pool in the java.util.concurrent package works correctly: it does not create threads it does not need, and you can see that the number of threads is as expected. This means that something in your legacy code is probably not ready for multithreading. For example, some code fragment may not be synchronized; as a result, an element is never removed from a collection, or extra elements are stored in one, so memory usage grows.
BTW I did not understand exactly which part of the application uses threadpool now. Did you have one thread that processes events and now you have several threads that do this? Have you probably changed the inter-thread communication mechanism? Added queues? This may be yet another direction of your investigation.
Good luck!
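One way to pursue the queue/back-pressure angle raised above is to bound the hand-off between the listener and the workers. This sketch (pool and queue sizes are illustrative) uses a ThreadPoolExecutor with a bounded queue and CallerRunsPolicy, so a full queue throttles the producer instead of letting allocations pile up:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    public static void main(String[] args) throws InterruptedException {
        // At most 4 workers and 100 queued events. When the queue is full,
        // CallerRunsPolicy makes the submitting (listener) thread execute the
        // task itself, which naturally slows the producer down.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 10_000; i++) {
            pool.execute(() -> { /* blackbox invocation would go here */ });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

If the slowdown disappears with a bounded queue, that points at the unbounded queue (or the allocations behind it) as the source of the memory pressure.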
As mentioned by djna, it's likely some type of memory leak. My guess would be that you're keeping a reference to the request around somewhere:
In the dispatcher thread that's queuing the requests
In the threads that deal with the requests
In the black box that's handling the requests
In the writer thread that writes to disk.
Since you said everything works fine before you add the thread pool into the mix, my guess would be that the threads in the pool are keeping a reference to the request somewhere. The idea being that, without the thread pool, you aren't reusing threads, so the information goes away.
As recommended by djna, you can use a Java memory analyzer to help figure out where the data is stacking up.

How can I limit the performance of sandboxed Java code?

I'm working on a multi-user Java webapp, where it is possible for clients to use the webapp API to do potentially naughty things, by passing code which will execute on our server in a sandbox.
For example, it is possible for a client to write a tight while(true) loop that impacts the performance of other clients.
Can you guys think of ways to limit the damage caused by these sorts of behaviors to other clients' performance?
We are using Glassfish for our application server.
The halting problem shows that there is no way for a computer to reliably identify code that will not terminate.
The only way to do this reliably is to execute the code in a separate JVM, which you then ask the operating system to shut down when it times out. A JVM that doesn't time out can process more tasks, so you can just reuse it.
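A rough sketch of the separate-JVM idea, where SandboxMain is a placeholder for whatever entry point would run the client's code:

```java
import java.util.concurrent.TimeUnit;

public class SandboxRunner {
    public static void main(String[] args) throws Exception {
        // Launch the untrusted code in its own JVM; "SandboxMain" is a
        // placeholder for the entry point that runs the client's code.
        Process jvm = new ProcessBuilder("java", "SandboxMain")
                .inheritIO()
                .start();

        // If the child doesn't finish in time, the OS kills it. This works
        // even against while(true) loops that ignore interrupts, and it
        // contains a System.exit() to the child JVM.
        if (!jvm.waitFor(30, TimeUnit.SECONDS)) {
            jvm.destroyForcibly();
            System.out.println("sandboxed task timed out and was killed");
        }
    }
}
```

The cost is a JVM startup per sandbox, which is why reusing a well-behaved child JVM for further tasks is worthwhile.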
One more idea would be byte-code instrumentation. Before you load the code sent by your client, manipulate it so it adds a short sleep in every loop and for every method call (or method entry).
This prevents clients from clogging a whole CPU until they are done. Of course, they still tie up a Thread object (which takes some memory), and the slowdown applies to every client, not only the malicious ones. Maybe make the first few tries free, then scale the waiting time up with each one (and set it down again if the thread has to wait for other reasons).
Modern-day app servers use thread pooling for better performance. The problem is that one bad apple can spoil the bunch. What you need is an app server with one thread, or maybe one process, per request. Of course there are going to be trade-offs, but the OS will handle making sure that processing time gets allocated evenly.
NOTE: After researching a little more, what you need is an engine that will create a separate process per request. Otherwise a user can cripple your servlet engine by writing servlets with infinite loops and then posting multiple requests. Or he could simply call System.exit in his code and bring everybody down.
You could use a parent thread to launch each request in a separate thread as suggested already, but then monitor the CPU time used by the threads using the ThreadMXBean class. You could then have the parent thread kill any threads that are misbehaving. This is if, of course, you can establish some kind of reasonable criteria for how much CPU time a thread should or should not be using. Maybe the rule could be that a certain initial amount of time plus a certain additional amount per second of wall clock time is OK?
I would make these client request threads have lower priority than the thread responsible for monitoring them.
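A minimal sketch of the ThreadMXBean idea. Note that forcibly stopping a thread is deprecated in Java, so the "kill" here is a cooperative interrupt, and the CPU-time threshold is an arbitrary placeholder for whatever policy the server chooses:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuWatchdog {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        // A deliberately busy thread standing in for a misbehaving request.
        Thread hog = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) { /* spin */ }
        });
        hog.setPriority(Thread.MIN_PRIORITY); // monitor runs at higher priority
        hog.start();

        Thread.sleep(500);
        long cpuNanos = mx.getThreadCpuTime(hog.getId());
        System.out.println("CPU time used: " + cpuNanos + " ns");

        // Placeholder policy: anything over 100 ms of CPU gets interrupted.
        if (cpuNanos > 100_000_000L) {
            hog.interrupt();
        }
        hog.join(1000);
    }
}
```

In a real server the request code would need to check for interruption (a truly hostile loop won't), which is why the separate-JVM approach above is the only fully reliable option.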
