ExecutorService task execution intermittently delayed - java

I'm running a Java 7 Dropwizard app on a CentOS 6.4 server that basically serves as a layer on top of a data store (Cassandra) and does some additional processing. It also has an interface to Zookeeper using the Curator framework for some other stuff. This all works well and good most of the time, CPU and RAM load is never above 50% and usually about 10% and our response times are good.
My problem is that recently we've discovered that occasionally we get blips of about 1-2 seconds where seemingly all tasks scheduled via thread pools get delayed. We noticed this because of connection timeouts to Cassandra and session timeouts with Zookeeper. What we've done to narrow it down:
1. Used Wireshark and Boundary to make sure all network activity from our app was getting stalled, not just a single component. All network activity was stalling at the same time.
2. Wrote a quick little Python script to send timestamp strings to netcat on one of the servers we were seeing connection timeouts to, to make sure it's not an overall network issue between the boxes. We saw all timestamps come through smoothly during periods where our app had timeouts.
3. Disabled hyperthreading on the server.
4. Checked garbage collection timing logs for the timeout periods. They were consistent and well under 1ms through the timeout periods.
5. Checked our CPU and RAM resources during the timeout periods. Again, consistent, and well under significant load.
6. Added an additional Dropwizard resource to our app for diagnostics that would send timestamp strings to netcat on another server, just like the Python script. In this case, we did see delays in the timestamps when we saw timeouts in our app. With half-second pings, we would generally see a whole second missing entirely, and then four pings in the next second, the extra two being the delayed pings from the previous second.
7. To remove the network from the equation, we changed the above to just write to the console and a local file instead of to the network. We saw the same results (delayed pings) with both of those.
8. Profiled and checked our thread pool settings to see if we were using too many OS threads. /proc/sys/kernel/threads-max is 190115 and we never get above 1000.
Code for #7 (#6 is identical except for using a Socket and PrintWriter in place of the FileWriter):
public void start() throws IOException {
    fileWriter = new FileWriter(this.fileName, false);
    executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(this, 0, this.delayMillis, TimeUnit.MILLISECONDS);
}

@Override
public synchronized void run() {
    try {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        Date now = new Date();
        String debugString = "ExecutorService test " + this.content + " : " + sdf.format(now) + "\n";
        fileWriter.write(debugString);
        fileWriter.flush();
    } catch (Exception e) {
        logger.error("Error running ExecutorService test: " + e.toString());
    }
}
So it seems like the Executor is scheduling the tasks to be run, but they're being delayed in starting (because the timestamps are delayed and there's no way the first two lines of the try block in the run method are delaying the task execution). Any ideas on what might cause this or other things we can try? Hopefully we won't get to the point where we start reverting the code until we find what change caused it...
TL;DR: Scheduled tasks are being delayed and we don't know why.
UPDATE 1: We modified the executor task to push timestamps every half-second into a ring buffer instead of straight out to a file, and then dump the buffer every 20 seconds. This removes I/O as a possible cause of blocking task execution but still gives us the same info. From this, we still saw the same pattern of timestamps, from which it appears that the issue is not something in the task occasionally blocking the next execution of the task, but something in the task execution engine itself delaying execution for some reason.

When you use scheduleAtFixedRate, you're expressing a desire that your task should be run as close to that rate as possible. The executor will do its best to keep to it, but sometimes it can't.
You're using Executors.newSingleThreadScheduledExecutor(), so the executor only has a single thread to play with. If an execution of the task takes longer than the period you specified in your schedule, the executor won't be able to keep up, since the single thread may not have finished the previous run when the schedule calls for the next one. The result would manifest itself as delays in the schedule. This seems a plausible explanation, since you say your real code is writing to a socket. That can easily block and send your timing off kilter.
You can find out if this is indeed the case by adding more logging at the end of the run method (i.e. after the flush). If the IO is taking too long, you'll see that in the logs.
As a fix, you could consider using scheduleWithFixedDelay instead, which will add a delay between each execution of the task, so long-running tasks don't run into each other. Failing that, then you need to ensure that the socket write completes on time, allowing each subsequent task execution to start on schedule.
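The logging suggested above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the class name LatenessProbe and the measure helper are made up for the example. Each run records how late it started relative to its fixed-rate slot and how long it took, which separates "the task itself was slow" from "the task was started late". Lateness is measured relative to the first actual start, so it is only an approximation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic task: records start lateness and run duration.
public class LatenessProbe implements Runnable {
    private final long periodNanos;
    private final CountDownLatch remaining;
    private long expectedStart = -1;           // only touched by the single executor thread
    private volatile long worstLatenessNanos;  // how late a run started
    private volatile long worstDurationNanos;  // how long a run took

    public LatenessProbe(long periodMillis, int runs) {
        this.periodNanos = TimeUnit.MILLISECONDS.toNanos(periodMillis);
        this.remaining = new CountDownLatch(runs);
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        if (expectedStart < 0) {
            expectedStart = start;             // the first run defines the schedule
        }
        worstLatenessNanos = Math.max(worstLatenessNanos, start - expectedStart);
        expectedStart += periodNanos;          // fixed rate: next slot is one period later

        // ... the real work (file or socket write) would go here ...

        worstDurationNanos = Math.max(worstDurationNanos, System.nanoTime() - start);
        remaining.countDown();
    }

    /** Runs the probe and returns {worst lateness, worst duration} in nanoseconds. */
    public static long[] measure(long periodMillis, int runs) {
        LatenessProbe probe = new LatenessProbe(periodMillis, runs);
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            executor.scheduleAtFixedRate(probe, 0, periodMillis, TimeUnit.MILLISECONDS);
            probe.remaining.await();           // wait for the requested number of runs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return new long[] { probe.worstLatenessNanos, probe.worstDurationNanos };
    }

    public static void main(String[] args) {
        long[] r = measure(500, 10);
        System.out.printf("worst lateness %d us, worst duration %d us%n", r[0] / 1000, r[1] / 1000);
    }
}
```

If the worst duration stays tiny while the worst lateness jumps to a second or more, the delay is happening before the task starts, not inside it.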

The first step in diagnosing a liveness issue is usually taking a thread dump while the system is stalled and checking what the threads are doing. In your case, the executor threads would be of particular interest. Are they processing, or are they waiting for work?
If they are all processing, the executor service has run out of worker threads, and can only schedule new tasks once a current task has been completed. This may be caused by tasks temporarily taking longer to complete. The stack traces of the worker threads may yield a clue just what is taking longer.
If many worker threads are idle, you have found a bug in the JDK. Congratulations!
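A thread dump is normally taken from outside the process with the JDK's jstack tool, but it can also be produced from inside the application, for example by a watchdog that notices a stall. The following is a rough sketch using the standard ThreadMXBean API; the class name ThreadDumper is invented for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: renders the state and stack of every live thread.
public class ThreadDumper {
    public static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details to keep the dump cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Worker threads shown as WAITING or TIMED_WAITING inside the executor's queue are idle; threads that are RUNNABLE or BLOCKED in application code are the ones to look at.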

Related

Selenium Fixed Thread Pool performance issues

I am currently trying to fire off a large number of selenium processes via
ExecutorService executor = Executors.newFixedThreadPool(10);
and am noticing performance issues.
The code that I'm deploying I first test and keep a copy of in a TestModule that runs the process once and uses the apache Stopwatch to monitor the time it takes to run the single thread.
When I run this code I see the following results:
Stopwatch time: 00:00:11.043
This is the time it takes from the initial driver.get(MY_WEBSITE_URL) all the way through inspecting elements and other tasking I wish to accomplish.
However, if I do the following in code, I get really slow results.
QueryAgent queryAgent = new QueryAgent();
queryAgent.startUp();
new Thread(queryAgent).start();
Inside of QueryAgent
private ExecutorService executor = Executors.newFixedThreadPool(10);
MyPojo pojo = MyPojoImpl.doStuff();
All of the code inside of "doStuff()" is the same code as in my Test Module. If I am running 10 threads and each one takes no more than ~20 seconds to process, I would expect to see ~30 runs accomplished in a minute, or ~1800 in an hour.
Yet, looking at my logs I'm getting no more than 5 requests in a minute.
Is there a better way to run these requests in parallel?
EDIT1:
After looking at the comment below and running "top", it appears that once I hit about 4 instances of phantomjs, my CPUs are hitting 100%.
But my memory usage is shy of 1GB. At this point it appears the bottleneck is the CPU(s). Any ideas?

Loop a java application in ticks

I'm making a Java server application. The application would consume a lot of resources if it just ran whenever possible.
As far as I know if I added a sleep method, it would run like this:
Do task (Might take 10ms to do. Can also take longer or less)
Sleep 50ms
Do task (Might take 10ms to do. Can also take longer or less)
Sleep 50ms
So how can I make it run every 50ms (20 ticks per second)?
Thanks
You can use a ScheduledExecutorService:
ScheduledExecutorService service = Executors.newScheduledThreadPool(10);
service.scheduleAtFixedRate(() -> {
    System.out.println("whatever");
}, 0, 50, TimeUnit.MILLISECONDS); // 50 is the period, i.e. the rate
The scheduleAtFixedRate() method schedules the given task for execution at a fixed rate, measured from start to start rather than from the end of the previous run. If one execution takes longer than 50ms, the next one starts late rather than running concurrently: the same task is never executed on two threads at once, regardless of how many threads the pool has.
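Whether a slow run can overlap with the next one is easy to check empirically. The sketch below is only an illustration (the class and method names are invented): it schedules a task that deliberately takes longer than its period on a pool with several threads and records the highest number of simultaneous executions it observes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: counts how many executions of one fixed-rate task run at once.
public class FixedRateOverlapDemo {
    public static int maxConcurrentRuns(long periodMillis, long workMillis, int runs) {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        ScheduledExecutorService service = Executors.newScheduledThreadPool(4);
        try {
            service.scheduleAtFixedRate(() -> {
                int now = active.incrementAndGet();
                maxActive.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(workMillis);          // simulate a tick that overruns its period
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                active.decrementAndGet();
                done.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            done.await();                              // wait for the requested number of runs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            service.shutdownNow();
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent runs: " + maxConcurrentRuns(50, 120, 5));
    }
}
```

The ScheduledExecutorService documentation states that executions of a periodic task may start late but will not execute concurrently, so the reported maximum should be 1 even though the pool has spare threads. A tick loop that must never fall behind therefore needs its work to fit inside the period.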
Without knowing what your application does (you could've included it in your question), you could use a scheduler (Quartz, java.util.Timer). Which task are you trying to perform every 50ms?
Edit:
While the "game loop" is all well and good in games, servers rarely have them. Receiving data is a continuous action, and the state should change accordingly. This is a larger design issue in the server. With proper design you don't need to create artificial pauses.
For example a simple design would be having threads waiting to receive input from the clients, and when a message is received, it's processed, and a message is sent to all clients to inform of the changes. No busy waiting, nothing will happen unless a message arrives from a client.

Appengine Deferred task limited to 60 seconds

The Google Appengine documentation says that tasks are limited to 10 minutes. However, when I run deferred tasks they die after 60 seconds. I couldn't find this mentioned anywhere.
Does it mean that Appengine deferred tasks are limited to 60 seconds, or maybe I am doing something wrong?
UPDATE: The first task is triggered from a request, but I am not waiting for it to return (and how could I anyway, there are no callbacks). The subsequent ones I am triggering, kind of recursively, from within the task itself.
DeferredTask df = new QuoteReader(params);
QueueFactory.getDefaultQueue().add(withPayload(df));
Many of them just work, but for the ones which reach 1 minute limit I get ApiProxy$ApiDeadlineExceededException
com.googlecode.objectify.cache.Pending completeAllPendingFutures: Error cleaning up pending Future: com.googlecode.objectify.cache.CachingAsyncDatastoreService$3@17f5ddc
java.util.concurrent.ExecutionException: com.google.apphosting.api.ApiProxy$ApiDeadlineExceededException: The API call datastore_v3.Get() took too long to respond and was cancelled.
Another thing I noticed, this affects the other request to that server happening at the same time and that goes down with DeadlineExceededException.
The error is coming from a Datastore operation that is exceeding 60s. It's not really related to Taskqueue deadlines as such. You are correct that those are 10 minutes.
However, as per an old related issue (the limit may have changed to 60s since), Google stated:
Even though offline requests can currently live up to 10 minutes (and background instances can live forever) datastore queries can still only live for 30 seconds.
It seems from the exception that your code completed and it's Objectify (later in the request filters) that's actually where the timeout occurs. I'd suggest you split up your data operations so datastore queries are quicker and if necessary use .now() on your data operations so exceptions occur in your code.

Sporadic problems in running a multi-threaded Java project in Win7

I am working on a project that is both memory and computationally intensive. A significant portion of the execution utilizes multi-threading by a FixedThreadPool. In short; I have 1 thread for fetching data from several remote locations (using URL connections) and populating a BlockingQueue with objects to be analyzed and n threads that pick these objects and run the analysis. edit: see code below
Now this setup works like a charm on my Linux machine running OpenSUSE 11.3, but a colleague who is testing it on a very similar machine running Win7 is getting custom notifications of timeouts on the queue polling (see code below), lots of them actually. I have been trying to monitor the processor use on her machine, and it appears that the software does not get any more than 15% of the CPUs, while on my machine the processor usage hits the roof, just as I intended.
My question is, then, can this be a sign of "starvation" of the queue? Could it be so that the producer thread is not getting enough cpu time? If so how do I go about giving one particular thread in the pool higher priority?
UPDATE:
I have been trying to pinpoint the problem, with no joy... I did however gain some new insights.
Profiling the execution of the code with JVisualVM demonstrates a very peculiar behavior. The methods are called in short bursts of CPU-time with several seconds of no progress in between. This to me means that somehow the OS is hitting the brakes on the process.
Disabling the anti-virus and back-up daemons does not have any significant effect on the matter.
Changing the priority of java.exe (the only instance) through Task Manager (advised here) does not change anything either. (That being said, I could not give "realtime" priority to java, and had to be content with "high" prio.)
Profiling the network usage shows good flow of data in and out, so I am guessing that is not the bottleneck (while it is a considerable part of the execution time of the process, but that I know already and is pretty much the same percentage as what I get on my Linux machine).
Any ideas as to how the Win7 OS might be limiting the cpu time to my project? if it's not the OS, what could be the limiting factor? I would like to stress yet again that the machine is NOT running any other computation intensive at the same time and there is almost no load on the cpus other than my software. This is driving me crazy...
EDIT: relevant code
public ConcurrencyService(Dataset d, QueryService qserv, Set<MyObject> s){
    timeout = 3;
    this.qs = qserv;
    this.bq = qs.getQueue();
    this.ds = d;
    this.analyzedObjects = s;
    this.drc = DebugRoutineContainer.getInstance();
    this.started = false;
    int nbrOfProcs = Runtime.getRuntime().availableProcessors();
    poolSize = nbrOfProcs;
    pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);
    drc.setScoreLogStream(new PrintStream(qs.getScoreLogFile()));
}

public void serve() throws InterruptedException {
    try {
        this.ds.initDataset();
        this.started = true;
        pool.execute(new QueryingAction(qs));
        for(;;){
            MyObject p = bq.poll(timeout, TimeUnit.MINUTES);
            if(p != null){
                if (p.getId().equals("0"))
                    break;
                pool.submit(new AnalysisAction(ds, p, analyzedObjects, qs.getKnownAssocs()));
            }else
                drc.log("Timed out while waiting for an object...");
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        String exit_msg = "Unexpected error in core analysis, terminating execution!";
    }finally{
        drc.log("--DEBUG: Termination criteria found, shutdown initiated..");
        drc.getMemoryInfo(true); // dump meminfo to log
        pool.shutdown();
        int mins = 2;
        int nCores = poolSize;
        long totalTasks = pool.getTaskCount(),
             compTasks = pool.getCompletedTaskCount(),
             tasksRemaining = totalTasks - compTasks,
             timeout = mins * tasksRemaining / nCores;
        drc.log("--DEBUG: Shutdown commenced, thread pool will terminate once all objects are processed, " +
                "or will timeout in : " + timeout + " minutes... \n" + compTasks + " of " + (totalTasks -1) +
                " objects have been analyzed so far, " + "mean process time is: " +
                drc.getMeanProcTimeAsString() + " milliseconds.");
        pool.awaitTermination(timeout, TimeUnit.MINUTES);
    }
}
The class QueryingAction is a simple Runnable that calls the data acquisition method in the designated QueryService object which then populates a BlockingQueue. The AnalysisAction class does all the number-crunching for a single instance of MyObject.
I suspect the producer thread is not getting/loading the source data fast enough. This might not be a lack of CPU but an IO related issue. (not sure why you have time outs on your BlockingQueue)
It might be worth having a thread which periodically logs things like the number of tasks added and the length of the queue (e.g. every 5-15 seconds)
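Such a monitoring thread could look roughly like the sketch below. The class name QueueMonitor and its methods are invented for the example; it only relies on BlockingQueue.size() and the counters that ThreadPoolExecutor already exposes, and it prints to standard output where the real code would use its own logger.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: periodically reports queue length and pool progress.
public class QueueMonitor {
    /** Formats one line describing the current state of the queue and the pool. */
    public static String snapshot(BlockingQueue<?> queue, ThreadPoolExecutor pool) {
        return "queue=" + queue.size()
                + " submitted=" + pool.getTaskCount()
                + " completed=" + pool.getCompletedTaskCount()
                + " active=" + pool.getActiveCount();
    }

    /** Starts a background thread that prints a snapshot every periodSeconds seconds. */
    public static ScheduledExecutorService start(BlockingQueue<?> queue,
                                                 ThreadPoolExecutor pool,
                                                 long periodSeconds) {
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        monitor.scheduleAtFixedRate(
                () -> System.out.println(snapshot(queue, pool)),
                0, periodSeconds, TimeUnit.SECONDS);
        return monitor; // call shutdownNow() on this when the analysis finishes
    }
}
```

A queue that stays near zero while few workers are active points at a slow producer; a steadily growing queue points at slow consumers.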
So, if I correctly understand your problem, you have one thread to fetch data, and several threads to analyse the fetched data. Your problem is that the threads are not correctly synchronized to run together and take full advantage of the processor.
You have a typical producer-consumer problem with a single producer and several consumers.
I advise you to remake your code a bit to have, instead, several independent consumer threads that are always waiting for resources to be available and only then running. This way you guarantee the maximum processor use.
Consumer thread:
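while (!terminate)
{
    MyObject p;
    synchronized (Producer.getLockObject())
    {
        try
        {
            // sleep (no processing at all) until there is something to fetch;
            // Producer.isQueueEmpty() is a hypothetical helper, and the loop guards
            // against spurious wakeups and notifications sent before wait()
            while (Producer.isQueueEmpty())
                Producer.getLockObject().wait();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            break;
        }
        p = Producer.getObjectFromQueue(); //this function should be synchronized
    }
    //Analyse fetched data, and submit it to somewhere...
}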
Producer thread:
while (!terminate)
{
    MyObject newData = fetchData(); //fetch data from remote location
    addDataToQueue(newData); //this should also be synchronized
    synchronized (getLockObject())
    {
        //wake up one thread to deal with the data
        getLockObject().notify();
    }
}
You see that this way, your threads are always performing useful work or sleeping.
This is just draft code to exemplify.
See more explanation here: http://www.javamex.com/tutorials/wait_notify_how_to.shtml
and here: http://www.java-samples.com/showtutorial.php?tutorialid=306
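The same "always working or sleeping" behaviour can also be had without hand-written wait/notify, since a BlockingQueue already blocks consumers in take() until data is available. The following is a minimal sketch under that assumption, not the asker's code: items are plain strings, the class name is invented, and "0" is used as a termination marker in the spirit of the question's id check.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: independent consumers blocking on a shared queue.
public class QueueConsumers {
    private static final String POISON = "0"; // termination marker, one per consumer

    /** Feeds the items to the given number of consumers and returns how many were processed. */
    public static int process(List<String> items, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String item = queue.take();   // blocks without using CPU
                        if (POISON.equals(item)) {
                            return;                   // this consumer is done
                        }
                        processed.incrementAndGet();  // stand-in for the real analysis
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The calling thread plays the producer here.
        queue.addAll(items);
        for (int i = 0; i < consumers; i++) {
            queue.add(POISON);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("a", "b", "c"), 2) + " items processed");
    }
}
```

Because the queue is FIFO and one marker is added per consumer after all real items, every item is processed before the consumers exit.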
Priority won't help, since the problem is not an issue of deciding who gets precious resources -- resource usage isn't maxed. The only way the producer thread would not be getting enough CPU time is if it wasn't ready-to-run.
How many cores does the machine have? It's possible that the producer thread is running full speed and there still just isn't enough CPU to go around. It's also possible the producer is I/O bound.
You can try to separate the producer thread from the pool (i.e. create a distinct Thread and reduce the pool's capacity by one) and then set its priority to maximum via setPriority. See what happens, although priority rarely accounts for such a difference in performance.
When you say URL connection, do you mean local or remote? It could be that network speed is slowing your producer down
So after weeks of fiddling, wrestling in code and other types of suffering I think I had a breakthrough, "a moment of clarity" if you will...
I managed to show that the program can exhibit the same slow behavior on my Linux machine and can indeed run full throttle on the problematic Win-7 machine. The crux of the problem appears to be some sort of corruption of the system/cache files that are used to store the results of previous queries and, overall, speed up the analysis. You got to love the irony: in this case they appeared to be the reason for EXTREMELY slow analysis. In retrospect, I should have known (a la Occam's razor)...
I am still not sure how the corruption occurs, but at least it's probably not related to the different OS. Using the system files from my machine, however, only increases the throughput on the Win7 host to about 40%. Profiling the process more has also revealed that, oddly enough, there is significantly more GC activity on Win7, which apparently took lots of CPU time from number crunching. Giving -Xmx2g takes care of the excessive garbage collection, and the CPU usage for the process shoots up to 95-96% and the threads run smoothly.
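One way to confirm that garbage collection is eating into the analysis time is to read the collectors' accumulated time from inside the JVM. This is a small sketch using the standard GarbageCollectorMXBean API; the class name GcStats is made up for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports accumulated GC time and the configured heap limit.
public class GcStats {
    /** Total time in milliseconds spent in all collectors since JVM start. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Maximum heap the JVM will attempt to use, i.e. the effect of -Xmx. */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("GC time so far: " + totalGcMillis() + " ms, max heap: "
                + maxHeapBytes() / (1024 * 1024) + " MB");
    }
}
```

Logging these two values periodically on both machines would show whether the default heap limit differs and how much wall-clock time goes to collection.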
Now that my original question is answered, I have to say that overall Java responsiveness is definitely better in the Linux environment: even without allocating more heap memory, I can easily multi-task while running an extensive analysis in the background. Things are not as smooth in Win-7, e.g. resizing the GUI is significantly slow once the analysis takes off at full speed.
Thanks for all the replies, I am sorry for the partially misleading problem description. I merely shared what I found out while debugging to the best of my abilities. Anyways, I believe the bounty goes to Peter Lawrey, since he early on pointed to an I/O issue and it was his suggestion about a logger thread which eventually led me to the answer.
I would think it was some OS specific issue because that is the core difference between the two units. More specifically, something is slowing down the data arriving through the remote connection.
Find some traffic analysis tool such as Wireshark and/or Networx and try to discover if there is anything throttling the Win PC. Perhaps it is going through a proxy that has some kind of rate cap configured.
Sorry, this is not really an answer, but it did not fit in a comment and I think it is still worth a read:
I am not a Java person,
but I recently had the same problem with C++ projects for machine control over USB.
On XP or W2K everything runs perfectly for months of 24/7 operation on any machine with 2 or more cores.
On W7 with a strong enough machine everything runs OK, but sometimes (about once every few hours) it freezes for a few seconds for no obvious reason.
On W7 with a relatively weak machine (2-core 1.66GHz T2300E notebook) the threads freeze for some time and then run again, which under/overflows the USB/WIN/App FIFOs and collapses the communication ...
It appears that nothing is blocked, but the W7 scheduler occasionally just does not give CPU to the right threads.
I thought that the USB driver (JUNGO) communication was freezing, but that is not true; I measured it and it is OK even during a freeze.
The freeze lasted about 6-15 seconds, about once per minute.
After adding some safety sleeps to the thread loops the freeze shortened to about 0.5 sec,
but it is still there.
Even if the App does not under/overflow the FIFOs, the Windows USB driver side does (a few times per minute, for a few ms).
Changing the exe/thread priority and class does not affect performance on W7 (on XP and W2K it works as it should).
As you can see, it seems we most likely have the same problem. In my case:
it is not I/O related (when I replace the USB thread with a simulation of the device it behaves similarly)
adding Sleep to time-critical code helps a lot
the error is also present with a low thread count [2 fast (17ms) + 1 slow (250ms) + App code = 4]
my CPU consumption on the slow W7 machine is also not 100% but about 95%, which is OK because I have sleeps everywhere
my Apps use about 40-100MB of memory but are CPU computation demanding ...
but not that much; they could run safely on much slower machines
but because of the USB driver connection and multiple device support they need at least 2 cores
My next step is to add some kind of execution time logging/analysis to see what is happening in more detail,
and also a small rewrite of the send/receive threads to see if it helps.
When I learn something new/useful I will add it.

Scheduling tasks, making sure task is ever being executed

I have an application that checks a resource on the internet for new mails. If there are new mails it does some processing on them. This means that, depending on the amount of mails, processing might take anywhere from a few seconds to hours.
Now the object/program that does the processing is already a singleton. So right now I already took care of there really only being 1 instance that's handling the checking and processing.
However I only have it running once now and I'd like to have it continuously running, checking for new mails more or less every 10 minutes or so to handle them in a timely manner.
I understand I can take care of this with Timer/TimerTask, or even better, I found a resource here: http://www.ibm.com/developerworks/java/library/j-schedule/index.html that uses Scheduler/SchedulerTask. But what I am afraid of is that if I set it to run every 10 minutes and a previous session is still processing data, it will put the new task on a stack waiting to be executed once the previous one is done. So, for instance, the first run might take 5 hours and then, because it was busy all that time, it will launch 5*6-1=29 runs immediately after each other, checking for mails and/or doing some processing without giving the server a break.
Does anyone know how I can solve this?
P.S. the way I have my application set up right now is I'm using a Java Servlet on my tomcat server that's launched upon server start where it creates a Singleton instance of my main program, then calls some method to do the fetching/processing. And what I want is to repeat that fetching/processing every "x" amount of time (10 minutes or so), making sure that really only 1 instance is doing this and that really after each run 10 minutes or so are given to rest.
Actually, Timer + TimerTask can deal with this pretty cleanly. If you schedule something with Timer.scheduleAtFixedRate(), you will notice that the docs say that it will attempt to "make up" late events to maintain the long-term period of execution. However, this can be overcome by using TimerTask.scheduledExecutionTime(). The example in its documentation lets you figure out if the task is too tardy to run, and you can just return instead of doing anything. This will, in effect, "clear the queue" of TimerTask.
Of note: a Timer uses a single thread to execute its tasks, so it won't spawn two copies of your task side-by-side.
On the side note part, you don't have to process all 10k emails in the queue in a single run. I would suggest processing for a fixed amount of time using TimerTask.scheduledExecutionTime() to figure out how long you have, then returning. That keeps your process more limber, cleans up the stack between runs, and if you are doing aggregates, ensures that you don't have to rebuild too much data if, for example, the server is restarted in the middle of the task. But this recommendation is based on generalities, since I don't know what you're doing in the task :)
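The tardiness check described above can be sketched as follows. The class name MailCheckTask, the checkMail method and the tardiness threshold are all made up for the illustration; only Timer.scheduleAtFixedRate() and TimerTask.scheduledExecutionTime() come from the JDK.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical task: skips "catch-up" runs that start too long after their slot.
public class MailCheckTask extends TimerTask {
    private final long maxTardinessMillis;

    public MailCheckTask(long maxTardinessMillis) {
        this.maxTardinessMillis = maxTardinessMillis;
    }

    /** True if a run scheduled for scheduledMillis is too late to be worth starting at nowMillis. */
    public static boolean tooLate(long scheduledMillis, long nowMillis, long maxTardinessMillis) {
        return nowMillis - scheduledMillis >= maxTardinessMillis;
    }

    @Override
    public void run() {
        if (tooLate(scheduledExecutionTime(), System.currentTimeMillis(), maxTardinessMillis)) {
            return; // a previous run overran; drop this make-up execution
        }
        checkMail();
    }

    /** Placeholder for the real fetching/processing. */
    protected void checkMail() {
        System.out.println("checking mail");
    }

    public static void main(String[] args) {
        long period = 10 * 60 * 1000L; // every 10 minutes
        new Timer("mail-check").scheduleAtFixedRate(new MailCheckTask(period / 2), 0, period);
    }
}
```

If the goal is simply "rest for 10 minutes after each run", ScheduledExecutorService.scheduleWithFixedDelay() is an alternative worth considering, since it measures the delay from the end of one run to the start of the next and therefore never queues up make-up runs.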
