Under certain conditions, one of our servers, which runs legacy code on a WildFly application server, suffers thread starvation and needs to be restarted.
After an arduous investigation, I stumbled upon this silly code:
private void addToQueue(Item e) {
    if (!_queue.offer(e, 200L, TimeUnit.MILLISECONDS)) {
        ThreadService.getInstance().schedule("retry process", () -> {
            addToQueue(e);
            return null;
        }, 5L, TimeUnit.SECONDS);
    }
}
ThreadService is the WildFly implementation of the Java SE ExecutorService, which provides a limited number of threads (16).
Sometimes, during reconnections between services, we receive a huge number of items to process (~100k) in a short time, and as you can see this spams the ThreadService with scheduled tasks.
An obvious solution would be to increase the queue, which currently has a capacity of 20k. However, I am afraid this would lead to other problems. Obviously this spamming of scheduled retry tasks needs to be eliminated.
Since processing these items is a non-critical task, I was thinking of using a disk-backed queue, so the work can be done in a separate process at a slower pace.
Searching a bit, I came across this project: Tape by Square.
I would like to know your opinion about this solution, which reminds me a bit of the pipes I used in Linux years ago. What do you think?
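(As an aside, and independent of the disk-backed-queue idea, here is a minimal sketch of how the per-item retry scheduling could be replaced by one periodic drainer task. Item is the type from the code above; the overflow deque, the dedicated scheduler and the drainOverflow method are illustrative, not part of ThreadService.)

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class QueueWithOverflow {

    private final BlockingQueue<Item> queue = new ArrayBlockingQueue<>(20_000);
    // Items that did not fit land here instead of each spawning a scheduled retry task.
    private final ConcurrentLinkedDeque<Item> overflow = new ConcurrentLinkedDeque<>();
    private final ScheduledExecutorService retryScheduler =
            Executors.newSingleThreadScheduledExecutor();

    public QueueWithOverflow() {
        // A single periodic drainer replaces the per-item scheduled retries.
        retryScheduler.scheduleWithFixedDelay(this::drainOverflow, 5, 5, TimeUnit.SECONDS);
    }

    void addToQueue(Item e) throws InterruptedException {
        if (!queue.offer(e, 200L, TimeUnit.MILLISECONDS)) {
            overflow.addLast(e); // remember the item; do not schedule another task
        }
    }

    private void drainOverflow() {
        Item e;
        while ((e = overflow.pollFirst()) != null) {
            if (!queue.offer(e)) {   // queue still full: put the item back and retry later
                overflow.addFirst(e);
                return;
            }
        }
    }
}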
Related
I'm using Executors.newFixedThreadPool(THREADS), where THREADS = 20, to send events asynchronously through a REST service.
A couple of weeks ago I experienced some unexpected problems when the REST service I'm consuming went down or began to respond more slowly.
1. First, the LinkedBlockingQueue managed by the thread pool became a bottleneck, since tasks could not be completed faster than new ones were submitted. It eventually approached its maximum capacity (Integer.MAX_VALUE), and a large part of the heap was taken up by the queue.
2. Second, although the REST service recovered and began to respond in time again, the thread pool seemed to be in a "hanging state" where no more events were sent. I had to restart the application to fix the problem and resume sending events.
Here's the output of the Memory Analyzer Tool with the heap dump at issue:
One instance of "java.util.concurrent.ThreadPoolExecutor" loaded by
"" occupies 2,518,687,016 (94.69%) bytes. The
instance is referenced by
com.despegar.flights.keeper.concurrent.trace.TraceAwareExecutorServiceImpl
# 0x695cf3550 , loaded by "sun.misc.Launcher$AppClassLoader #
0x694cfd810".
Keywords java.util.concurrent.ThreadPoolExecutor
sun.misc.Launcher$AppClassLoader # 0x694cfd810
And the static state of the ThreadPoolExecutor:
Type |Name |Value
----------------------------------
int |TERMINATED |1610612736
int |TIDYING |1073741824
int |STOP |536870912
int |SHUTDOWN |0
int |RUNNING |-536870912
int |CAPACITY |536870911
int |COUNT_BITS |29
----------------------------------
Here's the code:
public AsyncDispatchManager() {
    this.executorService = new TraceAwareExecutorServiceImpl(Executors.newFixedThreadPool(THREADS));
}

public void execute(String name, Runnable runnable) {
    this.executorService.submit(this.adaptRunnable(name, runnable));
}
Regarding the queue, I don't want to spend more resources on it, so I would implement a bounded queue with a rejection policy for these situations, since I don't mind losing some events.
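(For illustration, a minimal sketch of that idea: a bounded work queue plus a discard policy on the ThreadPoolExecutor. The pool size and queue capacity shown are placeholders to tune; CallerRunsPolicy could be used instead if backpressure is preferred over dropping events.)

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class BoundedDispatcherPool {

    private static final int THREADS = 20;

    // When 10,000 events are already waiting, new submissions are silently
    // dropped instead of growing the heap without bound.
    public static ExecutorService create() {
        return new ThreadPoolExecutor(
                THREADS, THREADS,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(10_000),
                new ThreadPoolExecutor.DiscardPolicy());
    }
}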
Do you think there's a more appropriate solution?
With the hanging threads I'm stuck. I don't understand the underlying problem or how to resolve it.
Has anyone had to deal with such problems?
I'd really appreciate any proposed solution or information.
This question already has answers here:
Does multi-threading improve performance? How?
(2 answers)
Closed 8 years ago.
I have a List<Object> objectsToProcess. Let's say it contains 1,000,000 items. For all the items in the list, you then process each one like this:
for (Object object : objectsToProcess) {
    // go to database, retrieve data
    // process
    // save data
}
My question is: would multi-threading improve performance? I would have thought that multiple threads are allocated by the processor by default anyway?
In the described scenario, given that process is a time-consuming task, and given that the CPU has more than one core, multi-threading will indeed improve the performance.
The processor does not allocate threads. The processor provides the resources (virtual CPUs / virtual processors) that threads can use, by offering more than one execution unit / execution context. Programs need to create multiple threads themselves in order to utilize multiple CPU cores at the same time.
The two major reasons for multi-threading are:
Making use of multiple CPU cores which would otherwise be unused or at least not contribute to reducing the time it takes to solve a given problem - if the problem can be divided into subproblems which can be processed independently of each other (parallelization possible).
Making the program act and react to multiple things at the same time (e.g. the event dispatch thread vs. a SwingWorker).
There are programming languages and execution environments in which threads are created automatically in order to process problems that can be parallelized. Java is not (yet) one of them, but since Java 8 it is well on the way there, and Java 9 may bring even more.
Usually you do not want significantly more threads than the CPU provides cores, for the simple reason that thread switching and thread synchronization add overhead that slows things down.
The package java.util.concurrent provides many classes that help with typical problems of multithreading. What you want is an ExecutorService to which you assign the tasks that should be run and completed in parallel. The class Executors provides factory methods for creating popular types of ExecutorService. If your problem just needs to be solved in parallel, you might want to go for Executors.newCachedThreadPool(). If your problem is urgent, you might want to go for Executors.newWorkStealingPool().
Your code thus could look like this:
final ExecutorService service = Executors.newWorkStealingPool();
for (final Object object : objectsToProcess) {
    service.submit(() -> {
        // go to database, retrieve data
        // process
        // save data
    });
}
Please note that the order in which the objects are processed is no longer guaranteed if you go for this approach to multithreading.
If your objectsToProcess are something which can provide a parallel stream, you could also do this:
objectsToProcess.parallelStream().forEach(object -> {
    // go to database, retrieve data
    // process
    // save data
});
This will leave the decisions about how to handle the threads to the VM, which often will be better than implementing the multi-threading ourselves.
Further reading:
http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html#executing_streams_in_parallel
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
Depends on where the time is spent.
If you have a load of calculations to do then allocating work to more threads can help, as you say each thread may execute on a separate CPU. In such a situation there is no value in having more threads than CPUs. As Corbin says you have to figure out how to split the work across the threads and have responsibility for starting the threads, waiting for completion and aggregating the results.
If, as in your case, you are waiting for a database, then there can be additional value in using threads. A database can serve several requests in parallel (the database server itself is multi-threaded), so instead of coding
for (Object object : objectsToProcess) {
    // go to database, retrieve data
    // process
    // save data
}
where you wait for each response before issuing the next, you want to have several worker threads, each performing
Go to database retrieve data.
process
save data
Then you get better throughput. The trick, though, is not to have too many worker threads. There are several reasons for that:
Each thread uses some resources: it has its own stack and its own connection to the database. You would not want 10,000 such threads.
Each request uses resources on the server, each connection uses memory, and a database server will only serve so many requests in parallel. There is no benefit in submitting thousands of simultaneous requests if it can only serve tens of them in parallel. Also, if the database is shared, you probably don't want to saturate the database with your requests; you need to be a "good citizen".
Net: you will almost certainly benefit from having a number of worker threads. The number of threads that helps will be determined by factors such as the number of CPUs you have and the ratio between the amount of processing you do and the response time from the DB. You can only really determine that by experiment, so make the number of threads configurable and investigate. Start with, say, 5, then 10. Keep an eye on the load on the DB as you increase the number of threads.
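(A rough sketch of that setup. The system property name, the default of 5 workers, and the commented pseudocode body are placeholders; the point is only that the worker count is a single configurable knob.)

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchProcessor {

    // Make the worker count configurable and experiment: start with 5, then 10,
    // while watching the DB load.
    private static final int WORKERS = Integer.getInteger("db.worker.threads", 5);

    public void processAll(List<Object> objectsToProcess) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(WORKERS);
        for (Object object : objectsToProcess) {
            workers.submit(() -> {
                // go to database, retrieve data
                // process
                // save data
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS); // wait for the batch to drain
    }
}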
In my web application running on Tomcat 6, an object (not a Servlet) is scheduled to read files from a defined folder. After a file is read, its content is saved into a database.
In order to get higher performance, multitasking is needed. My original approach is to spawn a new thread once a file is read, so the tasks for each file run in parallel in the background. For example, if three files are found, three threads are created.
However, although the Tomcat configuration sets maxThreads to more than 200 and 32 GB of memory has been assigned, only 7-8 threads are ever running simultaneously. What's wrong? Or is multithreading not a best practice for multitasking? Please help.
Addition (14 Mar 2014)
Thanks for your advice. So my question can be more specific:
1. Can ThreadPoolExecutor improve the performance?
2. Can NIO improve the performance?
Here is the original code:
String[] listFiles = folder.list();
for (int i = 0; i < listFiles.length; i++) {
    synchronized (globalHashMap) {
        MyTask myTask = new MyTask(listFiles[i]);
        globalHashMap.put(listFiles[i], myTask);
        myTask.start();
    }
}

class MyTask {
    String myFile;
    Thread myThread;

    public MyTask(String file) {
        myFile = file;
    }

    public void start() {
        myThread = new Thread(new Runnable() {
            public void run() {
                do {
                    readCnt = bufferedInputStream.read(bytesArray, 1024, 1);
                    ...
                } while (not end);
                postProcessFunction();
                synchronized (globalHashMap) {
                    globalHashMap.remove(myFile);
                    globalHashMap.notifyAll();
                }
            }
        });
        myThread.start();
    }
}
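(For comparison, a minimal sketch of the same work handed to a fixed-size pool, which is what question 1 above is getting at. The pool size and the processFile method are illustrative; processFile stands for the read-and-save logic in MyTask.)

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FolderImporter {

    // A fixed pool bounds the number of concurrent readers; sizing it around the
    // number of cores (or a measured sweet spot for the disk) is the usual start.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public void importFolder(File folder) throws InterruptedException {
        for (String name : folder.list()) {      // assumes folder is a readable directory
            pool.submit(() -> processFile(name)); // one task per file, threads are reused
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private void processFile(String name) {
        // read the file and save its content to the database (as in MyTask above)
    }
}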
The maxThreads setting in Tomcat does not mean the max. # of threads the JVM can have. Tomcat has no control over that. It specifies the max. # of worker threads Tomcat itself will create to service incoming HTTP requests. Your Java code can still create any threads it needs.
As for why you only get 7 - 8 threads, I'd have to see the code to know for sure. How many files are in this directory?
I am not sure what analysis you've done, but I often hear "multi-threading" offered as the canned solution for making something faster, and that is a very dangerous way to tackle things. Threading is meant to tackle a very specific set of problems. It should be a last resort, especially in a web application. Web containers use multiple class loaders to deploy, undeploy and redeploy applications on the fly. Threads create a maintenance nightmare and often prevent proper class loader cleanup.
I have actually seen occasions where multi-threading masks a problem. When I first joined my current company, an effort was underway to multi-thread the process that deploys SQL scripts against our databases to apply bug fixes. The complaint was that the process was too slow, so the solution of course was to run multiple DBs in parallel via multi-threading. I recently discovered that the script execution process runs a SQL statement (for GRANT) at the end of every script against every database, and that statement takes 2 minutes. It is rarely ever needed. If this process had been properly profiled to begin with, my recommendation would have been to remove the unneeded code, which would have dropped the process from 2-3 hours to under 10 minutes. Now we are stuck maintaining a mess of thread management code.
So, now my question to you is: have you profiled your code? As @wallenborn pointed out, disk I/O may be the bottleneck. There could also be optimizations in your code that could be made.
The MaxThreads parameter in Tomcat only controls how many threads are used for serving web requests. There is no limit (besides available memory) on how many additional threads your web application can create. There must be something wrong with the code.
Creating new threads inside an application that runs on an application server is not a good idea; it is bad practice. People usually say never to do this, because you can run out of threads for processing HTTP requests.
The best way to solve your problem is to use JMS. Your background task sends a message to a JMS broker for each file found on disk. The JMS broker can process messages in a multithreaded and very efficient way and will control all the multithreading for you.
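(A minimal sketch of the producer side of that idea, assuming an ActiveMQ broker; the broker URL and queue name are placeholders. The consumers, e.g. MDBs or message listeners, would do the actual file processing.)

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FileMessageSender {

    // Broker URL and queue name are illustrative; use whatever the server provides.
    private final ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");

    public void enqueueFile(String fileName) throws JMSException {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("file.import"));
            producer.send(session.createTextMessage(fileName)); // consumers do the heavy work
        } finally {
            connection.close();
        }
    }
}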
I am working on a project that is both memory and computationally intensive. A significant portion of the execution uses multi-threading via a FixedThreadPool. In short, I have one thread fetching data from several remote locations (using URL connections) and populating a BlockingQueue with objects to be analyzed, and n threads that pick up these objects and run the analysis. edit: see code below
Now this setup works like a charm on my Linux machine running openSUSE 11.3, but a colleague testing it on a very similar machine running Win7 is getting custom notifications of timeouts on the queue polling (see code below), lots of them actually. I have been trying to monitor the processor use on her machine, and it appears that the software never gets more than 15% of the CPUs, while on my machine the processor usage hits the roof, just as I intended.
My question, then, is: can this be a sign of "starvation" of the queue? Could it be that the producer thread is not getting enough CPU time? If so, how do I go about giving one particular thread in the pool a higher priority?
UPDATE:
I have been trying to pinpoint the problem, with no joy... I did however gain some new insights.
Profiling the execution of the code with JVisualVM demonstrates a very peculiar behavior. The methods are called in short bursts of CPU time with several seconds of no progress in between. To me this means that somehow the OS is hitting the brakes on the process.
Disabling the anti-virus and back-up daemons does not have any significant effect on the matter.
Changing the priority of java.exe (the only instance) through Task Manager (as advised here) does not change anything either. (That being said, I could not give "realtime" priority to java, and had to be content with "high" priority.)
Profiling the network usage shows a good flow of data in and out, so I am guessing that is not the bottleneck (while it is a considerable part of the execution time of the process, I know that already, and it is pretty much the same percentage as what I get on my Linux machine).
Any ideas as to how the Win7 OS might be limiting the CPU time for my project? If it's not the OS, what could be the limiting factor? I would like to stress yet again that the machine is NOT running any other computation-intensive task at the same time, and there is almost no load on the CPUs other than my software. This is driving me crazy...
EDIT: relevant code
public ConcurrencyService(Dataset d, QueryService qserv, Set<MyObject> s) {
    timeout = 3;
    this.qs = qserv;
    this.bq = qs.getQueue();
    this.ds = d;
    this.analyzedObjects = s;
    this.drc = DebugRoutineContainer.getInstance();
    this.started = false;
    int nbrOfProcs = Runtime.getRuntime().availableProcessors();
    poolSize = nbrOfProcs;
    pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);
    drc.setScoreLogStream(new PrintStream(qs.getScoreLogFile()));
}

public void serve() throws InterruptedException {
    try {
        this.ds.initDataset();
        this.started = true;
        pool.execute(new QueryingAction(qs));
        for (;;) {
            MyObject p = bq.poll(timeout, TimeUnit.MINUTES);
            if (p != null) {
                if (p.getId().equals("0"))
                    break;
                pool.submit(new AnalysisAction(ds, p, analyzedObjects, qs.getKnownAssocs()));
            } else
                drc.log("Timed out while waiting for an object...");
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        String exit_msg = "Unexpected error in core analysis, terminating execution!";
    } finally {
        drc.log("--DEBUG: Termination criteria found, shutdown initiated..");
        drc.getMemoryInfo(true); // dump meminfo to log
        pool.shutdown();
        int mins = 2;
        int nCores = poolSize;
        long totalTasks = pool.getTaskCount(),
             compTasks = pool.getCompletedTaskCount(),
             tasksRemaining = totalTasks - compTasks,
             timeout = mins * tasksRemaining / nCores;
        drc.log("--DEBUG: Shutdown commenced, thread pool will terminate once all objects are processed, " +
                "or will timeout in : " + timeout + " minutes... \n" + compTasks + " of " + (totalTasks - 1) +
                " objects have been analyzed so far, " + "mean process time is: " +
                drc.getMeanProcTimeAsString() + " milliseconds.");
        pool.awaitTermination(timeout, TimeUnit.MINUTES);
    }
}
The class QueryingAction is a simple Runnable that calls the data acquisition method in the designated QueryService object which then populates a BlockingQueue. The AnalysisAction class does all the number-crunching for a single instance of MyObject.
I suspect the producer thread is not getting/loading the source data fast enough. This might not be a lack of CPU but an I/O-related issue. (I'm not sure why you have timeouts on your BlockingQueue.)
It might be worth having a thread which periodically logs things like the number of tasks added and the length of the queue (e.g. every 5-15 seconds).
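(A small sketch of such a monitoring thread, assuming access to the bq and pool fields from the ConcurrencyService above; the 10-second interval is arbitrary.)

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolMonitor {

    // Logs producer/consumer statistics periodically so you can see whether the
    // queue is starving (stays near 0) or backing up (keeps growing).
    public static ScheduledExecutorService start(BlockingQueue<?> bq, ThreadPoolExecutor pool) {
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        monitor.scheduleAtFixedRate(() -> System.out.println(
                "queue=" + bq.size()
                + " submitted=" + pool.getTaskCount()
                + " completed=" + pool.getCompletedTaskCount()
                + " active=" + pool.getActiveCount()),
                10, 10, TimeUnit.SECONDS);
        return monitor;
    }
}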
So, if I understand your problem correctly, you have one thread to fetch data and several threads to analyse the fetched data. Your problem is that the threads are not correctly synchronized to run together and take full advantage of the processor.
You have a typical producer-consumer problem with a single producer and several consumers.
I advise you to rework your code a bit so that you have, instead, several independent consumer threads that are always waiting for resources to be available and only then running. This way you guarantee maximum processor use.
Consumer thread:
while (!terminate)
{
    synchronized (Producer.getLockObject())
    {
        try
        {
            // sleep (no processing at all) until the producer signals new data
            Producer.getLockObject().wait();
        }
        catch (InterruptedException e) { /* handle or ignore */ }
    }
    MyObject p = Producer.getObjectFromQueue(); // this function should be synchronized
    // analyse fetched data, and submit it to somewhere...
}
Producer thread:
while (!terminate)
{
    MyObject newData = fetchData();  // fetch data from remote location
    addDataToQueue(newData);         // this should also be synchronized
    synchronized (getLockObject())
    {
        // wake up one thread to deal with the data
        getLockObject().notify();
    }
}
You see that this way, your threads are always performing useful work or sleeping.
This is just draft code to exemplify.
See more explanation here: http://www.javamex.com/tutorials/wait_notify_how_to.shtml
and here: http://www.java-samples.com/showtutorial.php?tutorialid=306
Priority won't help, since the problem is not an issue of deciding who gets precious resources; resource usage isn't maxed out. The only way the producer thread would not be getting enough CPU time is if it weren't ready to run.
How many cores does the machine have? It's possible that the producer thread is running full speed and there still just isn't enough CPU to go around. It's also possible the producer is I/O bound.
You can try separating the producer thread from the pool (i.e. create a distinct Thread and reduce the pool size by one) and then set its priority to maximum via setPriority. See what happens, although priority rarely accounts for such a difference in performance.
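(A rough sketch of that change against the code posted above; field and class names are the ones from the question, and the exact pool sizing is a guess.)

// In the ConcurrencyService constructor, leave one core for the producer:
int nbrOfProcs = Runtime.getRuntime().availableProcessors();
poolSize = Math.max(1, nbrOfProcs - 1);
pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);

// In serve(), run the producer on its own high-priority thread instead of the pool:
Thread producer = new Thread(new QueryingAction(qs), "producer");
producer.setPriority(Thread.MAX_PRIORITY);  // only a hint; rarely decisive
producer.start();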
When you say URL connection, do you mean local or remote? It could be that network speed is slowing your producer down
So, after weeks of fiddling, wrestling with code and other types of suffering, I think I had a breakthrough, "a moment of clarity" if you will...
I managed to show that the program can exhibit the same slow behavior on my Linux machine and can indeed run at full throttle on the problematic Win7 machine. The crux of the problem appears to be some sort of corruption of the system/cache files that are used to store the results of previous queries and, overall, speed up the analysis. You've got to love the irony: in this case they appeared to be the reason for the EXTREMELY slow analysis. In retrospect, I should have known (a la Occam's razor)...
I am still not sure how the corruption occurs, but at least it's probably not related to the different OS. Using the system files from my machine increases the output on the Win7 host only up to about 40%, however. Profiling the process further also revealed that, oddly enough, there is significantly more GC activity on Win7, which apparently took a lot of CPU time away from the number crunching. Giving -Xmx2g takes care of the excessive garbage collection, the CPU usage for the process shoots up to 95-96%, and the threads run smoothly.
Now that my original question is answered, I have to say that overall Java responsiveness is definitely better in the Linux environment; even without allocating more heap memory, I can easily multi-task while running an extensive analysis in the background. Things are not as smooth on Win7, e.g. resizing the GUI is significantly slower once the analysis takes off at full speed.
Thanks for all the replies, and I am sorry for the partially misleading problem description. I merely shared what I found out while debugging to the best of my abilities. Anyway, I believe the bounty goes to Peter Lawrey, since he pointed to an I/O issue early on, and it was his suggestion about a logger thread that eventually led me to the answer.
I would think it is some OS-specific issue, because that is the core difference between the two units. More specifically, something is slowing down the data arriving through the remote connection.
Find a traffic analysis tool such as Wireshark and/or Networx and try to discover whether anything is throttling the Windows PC. Perhaps it is going through a proxy that has some kind of rate cap configured.
Sorry, this is not really an answer, but it did not fit inside a comment, and I still think it is worth the read:
Well, I am not Java-friendly,
but I recently had the same problem with C++ projects for machine control through USB.
On XP or W2K everything runs perfectly for months of 24/7 operation on any machine with 2 or more cores.
On W7 and a strong enough machine everything goes OK, but sometimes (about once per few hours) it freezes for a few seconds without obvious reason.
On W7 and a relatively weak machine (2-core 1.66 GHz T2300E notebook) the threads freeze for some time and then run again, which under/overflows the USB/Windows/application FIFOs and collapses the communication...
It appears that nothing is blocked; the W7 scheduler just occasionally does not give CPU to the right threads.
I thought the USB driver (JUNGO) communication was freezing, but that is not true; I measured it and it is OK even during a freeze.
The freeze was about 6-15 seconds, roughly once per minute.
After adding some safety sleeps to the thread loops, the freeze shortened to about 0.5 s,
but it is still there.
Even if the application does not under/overflow the FIFOs, the Windows USB driver side does (a few times per minute for a few ms).
Changing the exe/thread priority and class does not affect performance on W7 (on XP and W2K it works as it should).
As you can see, it seems we most likely have the same problem. In my case:
it is not I/O related (when I replace the USB thread with a simulation of the device it behaves similarly);
adding Sleep to time-critical code helps a lot;
the error is also present with a low number of threads [2 fast (17 ms) + 1 slow (250 ms) + application code = 4];
my CPU consumption on the slow W7 machine is also not 100% but about 95%, which is OK because I have sleeps everywhere;
my applications use about 40-100 MB of memory but are demanding in terms of CPU computation...
but not so much that they could not run safely on much slower machines;
but because of the USB driver connection and multiple-device support they need at least 2 cores.
My next step is to add some kind of execution-time logging/analysis to see what is happening in more detail,
and also a little rewrite of the send/receive threads to see if it helps.
When I learn something new/useful I will add it.
I have a Spring MVC, Hibernate (Postgres 9 DB) web app. An admin user can send in a request to process nearly 200,000 records (each record collected from various tables via joins). Such an operation is requested on a weekly or monthly basis (or whenever the data reaches a limit of around 100,000-200,000 records). On the database end, I am correctly implementing batching.
PROBLEM: Such a long-running request holds up the server thread, and that causes the normal users to suffer.
REQUIREMENT: The high response time of this request is not an issue. What's required is to not make other users suffer because of this time-consuming process.
MY SOLUTION:
Implement a thread pool using the Spring TaskExecutor abstraction. I can initialize my thread pool with, say, 5 or 6 threads, break the 200,000 records into smaller chunks of, say, 1,000 each, and queue up these chunks. To further allow the normal users faster DB access, maybe I can make every runnable task sleep for 2 or 3 seconds.
The advantage of this approach as I see it: instead of executing one huge DB-interacting request in one go, we have an asynchronous design spanning a larger time window, thus behaving like multiple normal user requests.
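(As an illustration of that proposed solution, a rough sketch using Spring's ThreadPoolTaskExecutor. The pool sizes, queue capacity, chunk size, sleep duration and the generic record type are all placeholders; processChunk stands for the batched Hibernate work.)

import java.util.List;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class BatchRecordProcessor<T> {

    private final ThreadPoolTaskExecutor executor;

    public BatchRecordProcessor() {
        // Small, bounded pool so the batch never competes hard with user requests.
        executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(6);
        executor.setQueueCapacity(250);       // ~200,000 records / 1,000 per chunk
        executor.setThreadNamePrefix("batch-");
        executor.initialize();
    }

    public void processInChunks(List<T> allRecords) {
        for (int i = 0; i < allRecords.size(); i += 1000) {
            final List<T> chunk =
                    allRecords.subList(i, Math.min(i + 1000, allRecords.size()));
            executor.execute(() -> {
                processChunk(chunk);          // the batched DB work for one chunk
                try {
                    Thread.sleep(2000);       // breathe so normal users get the DB
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    private void processChunk(List<T> chunk) {
        // Hibernate batch insert/update for this chunk goes here.
    }
}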
Can some experienced people please give their opinion on this?
I have also read about implementing the same behaviour with message-oriented middleware like JMS/AMQP, or with Quartz scheduling. But frankly speaking, I think internally they are also going to do the same thing, i.e. make a thread pool and queue up the jobs. So why not go with the Spring TaskExecutor instead of adding a completely new infrastructure in my web app just for this feature?
Please share your views on this and let me know if there are other, better ways to do this.
Once again: the time to completely process all the records is not a concern; what's required is that normal users accessing the web app during that time should not suffer in any way.
You can parallelize the tasks and wait for all of them to finish before returning the call. For this, you want to use an ExecutorCompletionService, which has been available in standard Java since 5.0.
In short, you use your container's service locator to create an instance of ExecutorCompletionService:
ExecutorCompletionService<List<MyResult>> queue = new ExecutorCompletionService<List<MyResult>>(executor);
// do this in a loop
queue.submit(aCallable);
// after looping: call take() once per submitted task;
// each take() blocks until the next submitted task has completed
queue.take().get();
If you do not want to wait, you can process the jobs in the background without blocking the current thread, but then you will need some mechanism to inform the client when the job has finished. That can be through JMS or, if you have an Ajax client, it can poll for updates.
Quartz also has a job scheduling mechanism, but Java provides a standard way.
EDIT:
I might have misunderstood the question. If you do not want a faster response but rather want to throttle the CPU, use this approach.
You can make an inner class like this PollingThread, where the batches (containing a java.util.UUID for each job) and the number of PollingThreads are defined in the outer class. This will keep going forever and can be tuned to keep your CPUs free to handle other requests.
class PollingThread implements Runnable {

    @SuppressWarnings("unchecked")
    public void run() {
        Thread.currentThread().setName("MyPollingThread");
        while (!Thread.interrupted()) {
            try {
                synchronized (incomingList) {
                    if (incomingList.size() == 0) {
                        // incoming is empty, wait for some time
                    } else {
                        // copy and clear the original
                        list = (LinkedHashSet<UUID>) incomingList.clone();
                        incomingList.clear();
                    }
                }
                if (list != null && list.size() > 0) {
                    processJobs(list);
                }
                // sleep for some time
                try {
                    Thread.sleep(seconds * 1000);
                } catch (InterruptedException e) {
                    // ignore
                }
            } catch (Throwable e) {
                // ignore
            }
        }
    }
}
Huge DB operations are usually triggered in the wee hours, when user traffic is pretty low (say something like 1 AM to 2 AM). Once you find that window, you can simply schedule a job to run at that time. Quartz can come in handy here, with time-based triggers. (Note: manually triggering a job is also possible.)
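(A minimal sketch of such a time-based trigger with Quartz 2.x; the job class, identities and cron expression are illustrative.)

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class NightlyEtlScheduler {

    // The heavy record processing goes into a Job implementation.
    public static class HeavyEtlJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // process the ~200,000 records and write into the result tables
        }
    }

    public static void schedule() throws SchedulerException {
        JobDetail job = JobBuilder.newJob(HeavyEtlJob.class)
                .withIdentity("heavyEtl", "batch")
                .build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("heavyEtlTrigger", "batch")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 1 * * ?")) // 01:00 every day
                .build();
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}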
The processed result could then be stored in different table(s) (I'll refer to them as result tables). Later, when a user wants this result, the DB operations run against these result tables, which have minimal records and hardly any joins involved.
instead of adding a completely new infrastructure in my web app just for this feature?
Quartz.jar is ~350 KB, and adding this dependency shouldn't be a problem. Also note that there's no reason this needs to live in the web app; these few classes that do the ETL could be placed in a standalone module. The request from the web app then only needs to fetch from the result tables.
Apart from all this, if you already have a master-slave DB model (discuss that with your DBA), you could run the huge DB operations against the slave DB rather than the master, which the normal users would be pointed to.