I am working on a project that is both memory and computationally intensive. A significant portion of the execution uses multi-threading via a FixedThreadPool. In short: I have one thread for fetching data from several remote locations (using URL connections) and populating a BlockingQueue with objects to be analyzed, and n threads that pick these objects and run the analysis. Edit: see code below.
Now this setup works like a charm on my Linux machine running OpenSUSE 11.3, but a colleague testing it on a very similar machine running Win7 is getting custom timeout notifications from the queue polling (see code below), lots of them actually. I have been trying to monitor the processor use on her machine, and it appears that the software never gets more than 15% of the CPUs, while on my machine the processor usage hits the roof, just as I intended.
My question is, then: can this be a sign of "starvation" of the queue? Could it be that the producer thread is not getting enough CPU time? If so, how do I go about giving one particular thread in the pool higher priority?
UPDATE:
I have been trying to pinpoint the problem, with no joy... I did however gain some new insights.
Profiling the execution of the code with JVisualVM demonstrates a very peculiar behavior. The methods are called in short bursts of CPU-time with several seconds of no progress in between. This to me means that somehow the OS is hitting the brakes on the process.
Disabling the anti-virus and back-up daemons does not have any significant effect on the matter.
Changing the priority of java.exe (the only instance) through Task Manager (advised here) does not change anything either. (That being said, I could not give "realtime" priority to java, and had to be content with "high" priority.)
Profiling the network usage shows good flow of data in and out, so I am guessing that is not the bottleneck (network I/O is a considerable part of the execution time of the process, but I know that already, and it is pretty much the same percentage as on my Linux machine).
Any ideas as to how the Win7 OS might be limiting the CPU time for my project? If it's not the OS, what could be the limiting factor? I would like to stress yet again that the machine is NOT running any other computationally intensive task at the same time, and there is almost no load on the CPUs other than my software. This is driving me crazy...
EDIT: relevant code
public ConcurrencyService(Dataset d, QueryService qserv, Set<MyObject> s){
    timeout = 3;
    this.qs = qserv;
    this.bq = qs.getQueue();
    this.ds = d;
    this.analyzedObjects = s;
    this.drc = DebugRoutineContainer.getInstance();
    this.started = false;
    int nbrOfProcs = Runtime.getRuntime().availableProcessors();
    poolSize = nbrOfProcs;
    pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);
    drc.setScoreLogStream(new PrintStream(qs.getScoreLogFile()));
}
public void serve() throws InterruptedException {
    try {
        this.ds.initDataset();
        this.started = true;
        pool.execute(new QueryingAction(qs));
        for(;;){
            MyObject p = bq.poll(timeout, TimeUnit.MINUTES);
            if(p != null){
                if (p.getId().equals("0"))
                    break;
                pool.submit(new AnalysisAction(ds, p, analyzedObjects, qs.getKnownAssocs()));
            } else
                drc.log("Timed out while waiting for an object...");
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        String exit_msg = "Unexpected error in core analysis, terminating execution!";
    } finally {
        drc.log("--DEBUG: Termination criteria found, shutdown initiated..");
        drc.getMemoryInfo(true); // dump meminfo to log
        pool.shutdown();
        int mins = 2;
        int nCores = poolSize;
        long totalTasks = pool.getTaskCount(),
             compTasks = pool.getCompletedTaskCount(),
             tasksRemaining = totalTasks - compTasks,
             timeout = mins * tasksRemaining / nCores;
        drc.log("--DEBUG: Shutdown commenced, thread pool will terminate once all objects are processed, " +
                "or will timeout in : " + timeout + " minutes... \n" + compTasks + " of " + (totalTasks - 1) +
                " objects have been analyzed so far, " + "mean process time is: " +
                drc.getMeanProcTimeAsString() + " milliseconds.");
        pool.awaitTermination(timeout, TimeUnit.MINUTES);
    }
}
The class QueryingAction is a simple Runnable that calls the data acquisition method in the designated QueryService object which then populates a BlockingQueue. The AnalysisAction class does all the number-crunching for a single instance of MyObject.
I suspect the producer thread is not getting/loading the source data fast enough. This might not be a lack of CPU but an I/O related issue. (I'm not sure why you have timeouts on your BlockingQueue.)
It might be worth having a thread which periodically logs things like the number of tasks added and the length of the queue (e.g. every 5-15 seconds).
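For example, a minimal sketch of such a monitor, reusing the bq, pool and drc fields from your posted code (the 5-second period is arbitrary):

ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
monitor.scheduleAtFixedRate(new Runnable() {
    @Override
    public void run() {
        // If the queue is almost always empty, the producer is the bottleneck;
        // if it keeps growing, the consumers are.
        drc.log("queue size=" + bq.size()
                + ", submitted=" + pool.getTaskCount()
                + ", completed=" + pool.getCompletedTaskCount());
    }
}, 5, 5, TimeUnit.SECONDS);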
So, if I understand your problem correctly, you have one thread to fetch data and several threads to analyse the fetched data. Your problem is that the threads are not correctly synchronized to run together and take full advantage of the processor.
You have a typical producer-consumer problem with a single producer and several consumers.
I advise you to rework your code a bit so that, instead, you have several independent consumer threads that are always waiting for resources to be available and run only then. This way you guarantee maximum processor use.
Consumer thread:
while (!terminate)
{
    synchronized (Producer.getLockObject())
    {
        try
        {
            // Sleep (no processing at all) until the producer signals.
            Producer.getLockObject().wait();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            break;
        }
    }
    MyObject p = Producer.getObjectFromQueue(); // this method should be synchronized
    // Analyse the fetched data, and submit it to somewhere...
}
Producer thread:
while (!terminate)
{
    MyObject newData = fetchData(); // fetch data from the remote location
    addDataToQueue(newData);        // this should also be synchronized
    synchronized (getLockObject())
    {
        // wake up one consumer thread to deal with the data
        getLockObject().notify();
    }
}
You see that this way, your threads are always performing useful work or sleeping.
This is just draft code to exemplify.
See more explanation here: http://www.javamex.com/tutorials/wait_notify_how_to.shtml
and here: http://www.java-samples.com/showtutorial.php?tutorialid=306
Priority won't help, since the problem is not an issue of deciding who gets precious resources -- resource usage isn't maxed. The only way the producer thread would not be getting enough CPU time is if it wasn't ready-to-run.
How many cores does the machine have? It's possible that the producer thread is running full speed and there still just isn't enough CPU to go around. It's also possible the producer is I/O bound.
You can try to separate the producer thread from the pool (i.e. create a distinct Thread and shrink the pool by one) and then raise its priority via setPriority. See what happens, although priority rarely accounts for such a difference in performance.
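A rough sketch of that separation, reusing QueryingAction and qs from your posted code:

// Run the producer on its own thread instead of inside the pool,
// and give it the highest priority the JVM allows.
Thread producer = new Thread(new QueryingAction(qs), "producer");
producer.setPriority(Thread.MAX_PRIORITY); // only a hint to the OS scheduler
producer.start();

// Size the pool for the consumers only, now that the producer no longer takes a slot.
int consumers = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
ExecutorService pool = Executors.newFixedThreadPool(consumers);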
When you say URL connection, do you mean local or remote? It could be that network speed is slowing your producer down.
So after weeks of fiddling, wrestling in code and other types of suffering I think I had a breakthrough, "a moment of clarity" if you will...
I managed to show that the program can exhibit the same slow behavior on my Linux machine, and can indeed run at full throttle on the problematic Win-7 machine. The crux of the problem appears to be some sort of corruption of the system/cache files that are used to store the results of previous queries and, overall, speed up the analysis. You have got to love the irony: in this case they appeared to be the reason for EXTREMELY slow analysis. In retrospect, I should have known (a la Occam's razor)...
I am still not sure how the corruption occurs, but at least it is probably not related to a different OS. Using the system files from my machine only increases the throughput on the Win7 host to about 40%, however. Profiling the process further also revealed that, oddly enough, there is significantly more GC activity on Win7, which apparently took lots of CPU time away from the number crunching. Giving -Xmx2g takes care of the excessive garbage collection; the CPU usage for the process shoots up to 95-96%, and the threads run smoothly.
Now that my original question is answered, I have to say that overall Java responsiveness is definitely better in a Linux environment; even without allocating more heap memory, I can easily multi-task while running an extensive analysis in the background. Things are not as smooth in Win-7, e.g. resizing the GUI is significantly slower once the analysis takes off at full speed.
Thanks for all the replies; I am sorry for the partially misleading problem description. I merely shared what I found out while debugging to the best of my abilities. Anyway, I believe the bounty goes to Peter Lawrey, since he pointed to an I/O issue early on, and it was his suggestion about a logger thread which eventually led me to the answer.
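(For reference, that just means launching with a larger heap cap, e.g., with the jar name being a placeholder:

java -Xmx2g -jar analysis-tool.jar

so the collector no longer has to run constantly near a much smaller default limit.)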
I would think it was some OS specific issue because that is the core difference between the two units. More specifically, something is slowing down the data arriving through the remote connection.
Find some traffic analysis tool such as Wireshark and/or Networx and try to discover if there is anything throttling the Win PC. Perhaps it is going through a proxy that has some kind of rate cap configured.
Sorry, this is not really an answer, but it did not fit in a comment and I still think it is worth the read:
I am not a Java person, but I recently had the same problem with C++ projects for machine control through USB.
On XP or W2K everything runs perfectly for months of 24/7 operation on any machine with 2 or more cores.
On W7 and a strong enough machine everything is OK, but sometimes (about once per few hours) it freezes for a few seconds without an obvious reason.
On W7 and a relatively weak machine (2-core 1.66GHz T2300E notebook) the threads freeze for some time and then run again, which under/overflows the USB/WIN/App FIFOs and collapses communication...
It appears that nothing is blocked; the W7 scheduler just occasionally does not give CPU to the right threads.
I thought that the USB driver (JUNGO) communication was freezing, but that is not true; I measured it and it is OK even during a freeze.
The freeze was about 6-15 seconds, roughly once per minute.
After adding some safety sleeps to the thread loops, the freeze shortened to about 0.5 s,
but it is still there.
Even when the app does not under/overflow the FIFOs, the Windows USB driver side does (a few times per minute, for a few ms).
Changing the exe/thread priority and class does not affect performance on W7 (on XP and W2K it works as it should).
As you can see, it seems we most likely have the same problem. In my case:
it is not I/O related (when I replace the USB thread with a simulation of the device, it behaves similarly)
adding Sleep to time-critical code helps a lot
the error is present even with a low thread count [2 fast (17ms) + 1 slow (250ms) + app code = 4]
my CPU consumption on the slow W7 machine is also not 100% but about 95%, which is OK because I have sleeps everywhere
my apps use about 40-100MB of memory but are demanding in CPU computation...
though not so much that they could not run safely on much slower machines
but because of the USB driver connection and multiple-device support they need at least 2 cores
My next step is to add some kind of execution-time logging/analysis to see what is happening in more detail,
and also a little rewrite of the send/receive threads to see if that helps.
When I learn something new/useful I will add it.
I tested this with 2.14.0 and 2.13.3.
I used the JDBC Appender in combination with the DynamicThresholdFilter and tried out a normal Logger
and also the AsyncLogger.
In the JDBC Appender I also tried out the PoolingDriver and the ConnectionFactory approach.
It turns out that the threads are not run in parallel, because of Log4j2.
Using the AsyncLogger made it even worse, since the output said that the appender is not started, and of the 15,000 expected logs only 13,517 are in the DB.
To reproduce the issue I made a GitHub repo, see here: https://github.com/stefanwendelmann/Log4j_JDBC_Test
EDIT
I replaced the mssql-jdbc driver with an H2 database and the threads don't block.
JMC auto-analysis says that there are locking instances of JdbcDatabaseManager.
Is there a configuration problem in my PoolableConnectionFactory for mssql-jdbc, or is there a general problem with DBCP / JDBC driver pooling?
Edit 2
Created a ticket on Apache's LOG4J2 Jira: https://issues.apache.org/jira/browse/LOG4J2-3022
Edit 3
Added longer flight recordings for mssql and h2:file:
https://github.com/stefanwendelmann/Log4j_JDBC_Test/blob/main/recording_local_docker_mssql_asynclogger_10000_runs.jfr
https://github.com/stefanwendelmann/Log4j_JDBC_Test/blob/main/recording_local_h2_file_asynclogger_10000_runs.jfr
Thanks for getting the flight recordings up. This is a pretty interesting scenario, but I'm afraid I can't give conclusive answers, mostly because:
The information in your flight recordings is weirdly incomplete. I'll explain a little more shortly.
There seem to be other things going on in your system that may be muddying the diagnosis. You might benefit from killing any other running processes on your machine.
So, what now/(TL;DR)
You need to be sure that your connection source to the database is pooled
Make sure you start your load test on a calm, clear-headed CPU
Configure your next flight recording to take sufficient, intermittent thread dumps. This is probably the most important next step, if you're interested in figuring out what exactly all these threads are waiting for. Don't post up another flight recording until you're positive it contains multiple thread dumps that feature all the live threads in your JVM.
Maybe 10k threads isn't reasonable for your local machine
I also noticed from the flight recording that you have a heap size maxed at 7GB. If you're not on a 64bit OS, that could actually be harmful. A 32-bit OS can address a max of 4GB.
Make sure there aren't any actual database failures causing the whole thing to thrash. Are you running out of connections? Are there any SQLExceptions blowing up somewhere? Any exceptions at all?
Here's what I could tell from your recordings:
CPU
Both flight recordings show that your CPU was struggling for good chunks of the recorded time:
The MSSQL recording (46 mins total)
JFR even warns in the MSSQL recording that:
An average CPU load of 42 % was caused by other processes during 1 min 17 s starting at 2/18/21 7:28:58 AM.
The H2 recording (20.3s total)
I noticed that your flight recordings are titled XXXX_10000. If this means "10k concurrent requests", it may mean that your machine simply can't deal with the load you're putting on it. You may also benefit from first ensuring that your cores don't have a bunch of other things hogging their time before you kick off another test. At any rate, hitting 100% CPU utilization is bound to cause lock contention as a matter of course, due to context switching. Your flight recording shows that you're running on an 8-core machine; but you noted that you're running a dockerized MSSQL. How many cores did you allocate to Docker?
Blocked Threads
There's a tonne of blocking in your setup, and there are smoking guns everywhere. The thread identified by Log4j2-TF-1-AsyncLoggerConfig-1 was blocked a lot by the garbage collector, just as the CPU was thrashing:
The H2 flight recording:
All but the last 3 ticks across that graph were blockings of the log4j2 thread. There was still significant blocking of the other pooled threads by GC (more on that further down)
The MSSQL flight recording had smoother GC, but both flight recordings featured blocking by GC and the consequent super high CPU utilization. One thing was clear from the MSSQL and H2 recording: every other pooled thread was blocked, waiting for a lock on the same object ID
For MSSQL, lock ID: 0x1304EA4F40; for H2, lock ID: 0x21A1100D7D0
Every thread except the main thread and pool-1-thread-1 (which was blocked by garbage collection) exhibits this behavior.
These 7 threads are all waiting for the same object. There is definitely some blocking or even a deadlock somewhere in your setup.
The small specks of green also corroborate the intermittent transfer of monitor locks between the various threads, confirming that they're sort of deadlocked. The pane that shows the threads at the bottom gives a timeline of each thread's blockage. Red indicates blocked; green indicates running. If you hover over each thread's red portion, it shows you that
The thread is blocked, waiting to acquire a lock (RED)
The ID of the lock that the thread is trying to acquire and is currently unable
The ID of the thread that last held the lock
Green indicates a running, unblocked thread.
When you hover over the red slices in your flight recording, you'll see that they're all waiting to acquire the same lock. That lock is intermittently held between the various pooled threads.
MSSQL (threads blocked waiting for 0x1304EA4F40):
H2 (threads blocked waiting for 0x21A1100D7D0):
In both flight recordings, pool-1-thread-1 is the sole thread that isn't blocked while trying to acquire a lock. That blank row for pool-1-thread-1 is solely due to garbage collection, which I covered earlier.
Dumps
Ideally, your flight recordings should contain a bunch of thread dumps, especially the one that you ran for over 40 mins; never mind the 20s one. Unfortunately, both recordings contain just 2 thread dumps each; only one of them even contains the stacktrace for pool-1-thread-1. Singular thread dumps are worthless. You'll need multiple snapshots over a length of time to make use of them. With a thread dump (or a heap dump), one could identify what objects the IDs 0x1304EA4F40 and 0x21A1100D7D0 refer to. The most I could figure out from the dumps is that they're all waiting for an instance of "Object":
It literally could be anything. Your very first flight recording at least showed that the threads were locked on org.apache.logging.log4j.core.appender.db.jdbc.JdbcDatabaseManager:
That very first recording shows the same pattern in the locks pane, that all the threads were waiting for that single object:
That first recording also shows us what pool-1-thread-1 was up to at that one instant:
From there, I would hazard a guess that that thread was in the middle of closing a database connection? Nothing conclusive can be said until multiple successive thread dumps show the thread activity over a span of time.
I tested on a MySQL DB and I found a lock on the following method:
org.apache.logging.log4j.core.appender.db.AbstractDatabaseManager.write(org.apache.logging.log4j.core.LogEvent, java.io.Serializable) (line: 261)
because in the source code you can see synchronization on the write method:
/**
 * This method manages buffering and writing of events.
 *
 * @param event The event to write to the database.
 * @param serializable Serializable event
 */
public final synchronized void write(final LogEvent event, final Serializable serializable) {
    if (isBuffered()) {
        buffer(event);
    } else {
        writeThrough(event, serializable);
    }
}
I think if you specify a buffer size it will increase throughput, because logs will be collected into batches and contention on the synchronized write method will be pretty low.
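For example, something along these lines in the appender configuration (a sketch only; the table, column and data source definitions are placeholders for whatever your setup already uses):

<JDBC name="databaseAppender" tableName="LOGS" bufferSize="100">
    <!-- With bufferSize set, events are buffered and flushed in batches,
         so the synchronized write() is entered far less often. -->
    <DataSource jndiName="java:/comp/env/jdbc/LoggingDataSource" />
    <Column name="EVENT_DATE" isEventTimestamp="true" />
    <Column name="LEVEL" pattern="%level" />
    <Column name="MESSAGE" pattern="%message" />
</JDBC>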
After updating the Log4j2 config file to use the AsyncLogger, you will see a lock on:
org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueue(org.apache.logging.log4j.core.LogEvent, org.apache.logging.log4j.core.async.AsyncLoggerConfig) (line: 375)
and implementation of that method:
private void enqueue(final LogEvent logEvent, final AsyncLoggerConfig asyncLoggerConfig) {
    if (synchronizeEnqueueWhenQueueFull()) {
        synchronized (queueFullEnqueueLock) {
            disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig);
        }
    } else {
        disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig);
    }
}
synchronizeEnqueueWhenQueueFull is true by default, and it produces locks on threads; you can manage these parameters:
/**
 * LOG4J2-2606: Users encountered excessive CPU utilization with Disruptor v3.4.2 when the application
 * was logging more than the underlying appender could keep up with and the ringbuffer became full,
 * especially when the number of application threads vastly outnumbered the number of cores.
 * CPU utilization is significantly reduced by restricting access to the enqueue operation.
 */
static final boolean ASYNC_LOGGER_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL = PropertiesUtil.getProperties()
        .getBooleanProperty("AsyncLogger.SynchronizeEnqueueWhenQueueFull", true);
static final boolean ASYNC_CONFIG_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL = PropertiesUtil.getProperties()
        .getBooleanProperty("AsyncLoggerConfig.SynchronizeEnqueueWhenQueueFull", true);
But you should know about the side effects of changing these parameters, as mentioned in the code snippet above.
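A sketch of how you could relax the lock; these properties must be set before Log4j2 initializes (e.g. first thing in main, or equivalently in log4j2.component.properties), and the warning in the snippet above about CPU usage when the ring buffer fills applies:

public class LoggingBootstrap {
    public static void main(String[] args) {
        // Allow concurrent enqueues when the ring buffer is full, trading the
        // lock contention you observed for potentially higher CPU utilization.
        System.setProperty("AsyncLogger.SynchronizeEnqueueWhenQueueFull", "false");
        System.setProperty("AsyncLoggerConfig.SynchronizeEnqueueWhenQueueFull", "false");
        // ... only now touch any Logger, so the properties take effect.
    }
}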
Ideas why a DB can become a bottleneck:
a remote DB (VPN etc.)
check which strategy is used for the id column (SEQUENCE, TABLE, IDENTITY) to avoid an additional DB call
are there indexes on the columns? (each transaction commit can trigger a reindex operation)
I'm running a Java 7 Dropwizard app on a CentOS 6.4 server that basically serves as a layer on top of a data store (Cassandra) and does some additional processing. It also has an interface to Zookeeper using the Curator framework for some other stuff. This all works well and good most of the time, CPU and RAM load is never above 50% and usually about 10% and our response times are good.
My problem is that recently we've discovered that occasionally we get blips of about 1-2 seconds where seemingly all tasks scheduled via thread pools get delayed. We noticed this because of connection timeouts to Cassandra and session timeouts with Zookeeper. What we've done to narrow it down:
1. Used Wireshark and Boundary to make sure all network activity from our app was getting stalled, not just a single component. All network activity was stalling at the same time.
2. Wrote a quick little Python script to send timestamp strings to netcat on one of the servers we were seeing timeouts connecting to, to make sure it's not an overall network issue between the boxes. We saw all timestamps come through smoothly during periods where our app had timeouts.
3. Disabled hyperthreading on the server.
4. Checked garbage collection timing logs for the timeout periods. They were consistent and well under 1ms through the timeout periods.
5. Checked our CPU and RAM resources during the timeout periods. Again, consistent, and well under significant load.
6. Added an additional Dropwizard resource to our app for diagnostics that would send timestamp strings to netcat on another server, just like the Python script. In this case, we did see delays in the timestamps when we saw timeouts in our app. With half-second pings, we would generally see a whole second missing entirely, and then four pings in the next second, the extra two being the delayed pings from the previous second.
7. To remove the network from the equation, we changed the above to just write to the console and a local file instead of to the network. We saw the same results (delayed pings) with both of those.
8. Profiled and checked our thread pool settings to see if we were using too many OS threads. /proc/sys/kernel/threads-max is 190115 and we never get above 1000.
Code for #7 (#6 is identical except for using a Socket and PrintWriter in place of the FileWriter):
public void start() throws IOException {
    fileWriter = new FileWriter(this.fileName, false);
    executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(this, 0, this.delayMillis, TimeUnit.MILLISECONDS);
}

@Override
public synchronized void run() {
    try {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        Date now = new Date();
        String debugString = "ExecutorService test " + this.content + " : " + sdf.format(now) + "\n";
        fileWriter.write(debugString);
        fileWriter.flush();
    } catch (Exception e) {
        logger.error("Error running ExecutorService test: " + e.toString());
    }
}
So it seems like the Executor is scheduling the tasks to be run, but they're being delayed in starting (because the timestamps are delayed and there's no way the first two lines of the try block in the run method are delaying the task execution). Any ideas on what might cause this or other things we can try? Hopefully we won't get to the point where we start reverting the code until we find what change caused it...
TL;DR: Scheduled tasks are being delayed and we don't know why.
UPDATE 1: We modified the executor task to push timestamps every half-second into a ring buffer instead of straight out to a file, and then dump the buffer every 20 seconds. This removes I/O as a possible cause of blocking task execution but still gives us the same info. We still saw the same pattern of timestamps, so it appears that the issue is not something in the task occasionally blocking the next execution, but something in the task execution engine itself delaying execution for some reason.
When you use scheduleAtFixedRate, you're expressing a desire that your task should be run as close to that rate as possible. The executor will do its best to keep to it, but sometimes it can't.
You're using Executors.newSingleThreadScheduledExecutor(), so the executor only has a single thread to play with. If each execution of the task takes longer than the period you specified in your schedule, then the executor won't be able to keep up, since the single thread may not have finished executing the previous run before the schedule kicked in to execute the next run. The result would manifest itself as delays in the schedule. This seems a plausible explanation, since you say your real code is writing to a socket. That can easily block and send your timing off kilter.
You can find out if this is indeed the case by adding more logging at the end of the run method (i.e. after the flush). If the IO is taking too long, you'll see that in the logs.
As a fix, you could consider using scheduleWithFixedDelay instead, which will add a delay between each execution of the task, so long-running tasks don't run into each other. Failing that, then you need to ensure that the socket write completes on time, allowing each subsequent task execution to start on schedule.
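A minimal variation of your posted start() method, as a sketch (the extra timing log at the end of run() shows whether the write itself is what's eating the time; it reuses your now, delayMillis and logger):

public void start() throws IOException {
    fileWriter = new FileWriter(this.fileName, false);
    executor = Executors.newSingleThreadScheduledExecutor();
    // Fixed *delay*: the next run is scheduled only after the previous one
    // finishes, so a slow write can no longer stack up behind itself.
    executor.scheduleWithFixedDelay(this, 0, this.delayMillis, TimeUnit.MILLISECONDS);
}

and, at the end of run(), something like:

long elapsed = System.currentTimeMillis() - now.getTime();
if (elapsed > this.delayMillis) {
    logger.warn("write+flush took " + elapsed + " ms, longer than the period");
}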
The first step to diagnose a liveness issue is usually taking a thread dump when the system is stalled, and check what the threads were doing. In your case, the executor threads would be of particular interest. Are they processing, or are they waiting for work?
If they are all processing, the executor service has run out of worker threads, and can only schedule new tasks once a current task has been completed. This may be caused by tasks temporarily taking longer to complete. The stack traces of the worker threads may yield a clue just what is taking longer.
If many worker threads are idle, you have found a bug in the JDK. Congratulations!
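A cheap way to capture such dumps is running jstack <pid> a few times during a stall; programmatically, the standard JMX thread bean does the same job (a minimal sketch, standard APIs only):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StallSampler {
    public static void dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // true, true: include monitor and synchronizer info, so you can see
        // exactly which lock each executor thread is parked on.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.print(info);
        }
    }
}

Run it several times during a stall and compare what the executor threads are doing across snapshots.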
I have a thread pool (executor) which I would like to monitor for excessive resource usage (time, since CPU and memory seem to be quite a bit harder to track). I would like to 'kill' threads that are running too long, like killing an OS process. The workers spend most of their time calculating, but significant time is also spent waiting for I/O, mostly database...
I have been reading up on stopping threads in Java and how it is deprecated for resource-cleanup reasons (not properly releasing locks, closing sockets and files, and so on). The recommended way is to periodically check in a worker thread whether it should stop, and then exit. This obviously expects that worker threads be written in a certain way and that they are not blocked waiting on some external I/O. There are also ThreadDeath and InterruptedException, which might be able to do the job, but they may actually be circumvented in improperly/maliciously written worker threads, and I also got the impression (though no testing yet) that InterruptedException might not work properly in some (or even all) cases when the worker thread is waiting for I/O.
Another way to mitigate it would be to use multiple OS processes to isolate parts of the system, but it brings some unwanted increases in resource consumption.
That led me to that old story about isolates and/or MVM from more than five years ago, but nothing seems to have happened on that front, maybe in java 8 or 9...
So, actually, all this has made me wonder whether some poor man's simulation of processes could be achieved through threads that would each have their own classloader? Could that be used to simulate processes if each thread (or group) were loaded in its own classloader? I am not sure how much of an increase in resource consumption that would bring (as there would not be much code sharing, and the code is not tiny). At least process copy-on-write semantics enable code sharing..
Any recommendations/ideas?
EDIT:
I am asking because of general interest and a kind of disappointment that no solutions for this exist in the JVM to date (I mean shared application servers are not really possible; application domains, or something like that, in .NET seem to address exactly this kind of problem). I understand that killing a process does not guarantee reverting all system state to some initial condition, but at least all resources like handles, memory and CPU are released. I was thinking of using classloaders since they might help with releasing locks held by the thread, which is one of the reasons that Thread.stop is deprecated. In my current situation the only other thing that should be released (that I can think of currently) is the database connection, which could be handled separately/externally (by a watchdog thread) if needed..
Though, really, in my case Thread.stop might actually be workable; I just dislike using deprecated methods..
Also, I am considering this as a safety net for misbehaving processes; ideally they should behave nicely, and they are to a quite high degree under my control.
So, to clarify, I am asking how, for example, Java people on the server side handle runaway threads. I suspect by using many machines in the cluster to offset the problem and restarting misbehaving ones - when the application is stateless, at least..
The difference between a thread and a process is that threads implicitly share memory and resources like sockets and files (making thread-local memory a workaround). Processes implicitly have private memory and resources.
Killing the thread is not the problem. The problem is that a poorly behaving thread, or even a reasonably behaving one, can leave resources in an inconsistent state. Using a classloader will not help you track this, or solve the problem for you. For processes it's easier to track what resources they are using, as most of them are isolated. Even processes can leave locks, temporary files and shared IPC resources in an incorrect state if killed.
The real solution is to write code which behaves properly so it can be managed; working around that by trying to handle every possible kind of poorly behaving code is next to impossible. If you have a bad third-party library you have to use, you can try killing and cleaning it up, and you can come up with an OK solution, but you can't expect it to be a clean one.
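If you only want a safety net rather than a guarantee, the usual compromise is a watchdog around a Future, which I'd sketch like this (task and Result are placeholder names; cancellation still relies on the task responding to interruption, which is exactly the cooperative behaviour discussed above):

ExecutorService pool = Executors.newFixedThreadPool(4);
Future<Result> f = pool.submit(task); // task is a Callable<Result>
try {
    Result r = f.get(30, TimeUnit.SECONDS); // per-task time budget
} catch (TimeoutException e) {
    // Interrupt the worker: sleeps, waits and interruptible channels abort,
    // but a tight CPU loop that never checks Thread.interrupted() keeps running.
    f.cancel(true);
} catch (InterruptedException | ExecutionException e) {
    f.cancel(true);
}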
EDIT: Here is a simple program which will deadlock between two processes or machines because it has a bug in it. (Both sides construct their ObjectInputStream first, and that constructor blocks waiting for a stream header that only the other side's ObjectOutputStream constructor would write.) The way to stop deadlocks is to fix the code.
public static void main(String... args) throws IOException {
    switch (args.length) {
        case 1: {
            // server
            ServerSocket ss = new ServerSocket(Integer.parseInt(args[0]));
            Socket s = ss.accept();
            ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
            ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
            // will deadlock before it gets here
            break;
        }
        case 2: {
            // client
            Socket s = new Socket(args[0], Integer.parseInt(args[1]));
            ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
            ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
            // will deadlock before it gets here
            break;
        }
        default:
            System.err.println("Must provide either a port as server or hostname port as client");
    }
}
When programming animations and little games I've come to know the incredible importance of Thread.sleep(n); I rely on this method to tell the operating system when my application won't need any CPU, and using this makes my program progress at a predictable speed.
My problem is that the JRE uses different implementations of this functionality on different operating systems. On UNIX-based (or influenced) OSes such as Ubuntu and OS X, the underlying JRE implementation uses a well-functioning and precise system for distributing CPU time to different applications, making my 2D game smooth and lag-free. However, on Windows 7 and older Microsoft systems, the CPU-time distribution seems to work differently: you usually get back your CPU time after the given amount of sleep, varying by about 1-2 ms from the target sleep, but you get occasional bursts of an extra 10-20 ms of sleep time. This causes my game to lag once every few seconds when it happens. I've noticed this problem exists in most Java games I've tried on Windows, Minecraft being a notable example.
Now, I've been looking around on the Internet to find a solution to this problem. I've seen a lot of people using only Thread.yield() instead of Thread.sleep(n), which works flawlessly at the cost of the currently used CPU core getting full load, no matter how much CPU your game actually needs. This is not ideal for playing your game on laptops or high-energy-consumption workstations, and it's an unnecessary trade-off on Macs and Linux systems.
Looking around further I found a commonly used method of correcting sleep time inconsistencies called "spin-sleep", where you only order sleep for 1 ms at a time and check for consistency using the System.nanoTime() method, which is very accurate even on Microsoft systems. This helps with the normal 1-2 ms of sleep inconsistency, but it won't help against the occasional bursts of +10-20 ms of sleep inconsistency, since those often result in more time spent than one whole cycle of my loop should take.
After tons of looking I found this cryptic article by Andy Malakov, which was very helpful in improving my loop: http://andy-malakov.blogspot.com/2010/06/alternative-to-threadsleep.html
Based on his article I wrote this sleep method:
// Variables for calculating optimal sleep time. In nanoseconds (1 s = 10^9 ns).
private long timeBefore = 0L;
private long timeSleepEnd, timeLeft;

// The estimated game update rate.
private double timeUpdateRate;

// The time one game loop cycle should take in order to reach the max FPS.
private long timeLoop;

private void sleep() throws InterruptedException {
    // Skip the first game loop cycle.
    if (timeBefore != 0L) {
        // Calculate the optimal game loop sleep time.
        timeLeft = timeLoop - (System.nanoTime() - timeBefore);

        // If all necessary calculations took LESS time than one loop cycle allows,
        // the max update rate was reached and there is time left to sleep.
        if (timeLeft > 0 && isUpdateRateLimited) {
            // Determine when to stop sleeping.
            timeSleepEnd = System.nanoTime() + timeLeft;

            // Sleep, yield or keep the thread busy until there is no time left to sleep.
            do {
                if (timeLeft > SLEEP_PRECISION) {
                    Thread.sleep(1); // Sleep for approximately 1 millisecond.
                }
                else if (timeLeft > SPIN_YIELD_PRECISION) {
                    Thread.yield(); // Yield the thread.
                }
                if (Thread.interrupted()) {
                    throw new InterruptedException();
                }
                timeLeft = timeSleepEnd - System.nanoTime();
            }
            while (timeLeft > 0);
        }
        // Save the calculated update rate.
        timeUpdateRate = 1000000000D / (double) (System.nanoTime() - timeBefore);
    }
    // Starting point for time measurement.
    timeBefore = System.nanoTime();
}
SLEEP_PRECISION I usually put at about 2 ms, and SPIN_YIELD_PRECISION at about 10,000 ns for best performance on my Windows 7 machine.
After tons of hard work, this is the absolute best I can come up with. So, since I still care about improving the accuracy of this sleep method, and I'm still not satisfied with the performance, I would like to appeal to all of you java game hackers and animators out there for suggestions on a better solution for the Windows platform. Could I use a platform-specific way on Windows to make it better? I don't care about having a little platform specific code in my applications, as long as the majority of the code is OS independent.
I would also like to know if there is anyone who knows about Microsoft and Oracle working out a better implementation of the Thread.sleep(n); method, or what's Oracle's future plans are on improving their environment as the basis of applications requiring high timing accuracy, such as music software and games?
Thank you all for reading my lengthy question/article. I hope some people might find my research helpful!
You could use a cyclic timer associated with a mutex. This is IMHO the most efficient way of doing what you want. But then you should think about skipping frames in case the computer lags (you can do it with another non-blocking mutex in the timer code).
Edit: Some pseudo-code to clarify
Timer code:
while (true):
    if acquireIfPossible(mutexSkipRender):
        release(mutexSkipRender)
        release(mutexRender)
Sleep code:
acquire(mutexSkipRender)
acquire(mutexRender)
release(mutexSkipRender)
Starting values:
mutexSkipRender = 1
mutexRender = 0
Edit: corrected initialization values.
The following code works pretty well on Windows (it loops at exactly 50 fps with a precision to the millisecond):
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.Semaphore;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        final Semaphore mutexRefresh = new Semaphore(0);
        final Semaphore mutexRefreshing = new Semaphore(1);
        int refresh = 0;
        Timer timRefresh = new Timer();
        timRefresh.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                if (mutexRefreshing.tryAcquire()) {
                    mutexRefreshing.release();
                    mutexRefresh.release();
                }
            }
        }, 0, 1000 / 50);

        // The timer is started and configured for 50 fps.
        Date startDate = new Date();
        while (true) { // Refreshing loop
            mutexRefresh.acquire();
            mutexRefreshing.acquire();

            // Refresh
            refresh += 1;
            if (refresh % 50 == 0) {
                Date endDate = new Date();
                System.out.println(String.valueOf(50.0 * 1000 / (endDate.getTime() - startDate.getTime())) + " fps.");
                startDate = new Date();
            }

            mutexRefreshing.release();
        }
    }
}
Your options are limited, and they depend on what exactly you want to do. Your code snippet mentions the max FPS, but the max FPS would require that you never sleep at all, so I'm not entirely sure what you intend with that. None of that sleep or yield checking is going to make any difference in most of the problem situations however - if some other app needs to run now and the OS doesn't want to switch back soon, it doesn't matter which one of those you call, you'll get control back when the OS decides to do so, which will almost certainly be more than 1ms in the future. However, the OS can certainly be coaxed into making switches more often - Win32 has the timeBeginPeriod call for precisely this purpose, which you may be able to use somehow. But there is a good reason for not switching too often - it's less efficient.
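If you want to experiment with timeBeginPeriod from Java, it lives in winmm.dll, so you would need something like JNA; a sketch, assuming JNA 5.x on the classpath:

import com.sun.jna.Library;
import com.sun.jna.Native;

public interface WinMM extends Library {
    WinMM INSTANCE = Native.load("winmm", WinMM.class);
    int timeBeginPeriod(int periodMillis); // request e.g. 1 ms timer resolution
    int timeEndPeriod(int periodMillis);   // must match the earlier request
}

Calling WinMM.INSTANCE.timeBeginPeriod(1) at startup typically tightens Thread.sleep on Windows to roughly millisecond accuracy, at a small cost in power efficiency; pair every call with timeEndPeriod when you're done.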
The best thing to do, although somewhat more complex, is usually to go for a game loop that doesn't require real-time updates, but instead performs logic updates at fixed intervals (eg. 20x a second) and renders whenever possible (perhaps with arbitrary short sleeps to free up CPU for other apps, if not running in full-screen). By buffering a past logic state as well as the current one you can interpolate between them to make the rendering appear as smooth as if you were doing logic updates each time. For more information on this approach, you can see the Fix Your Timestep article.
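The shape of such a loop, as a sketch (running, update() and render() are placeholders for your own game state and drawing code):

final long STEP_NANOS = 1000000000L / 20; // 20 logic updates per second
long previous = System.nanoTime();
long lag = 0;
while (running) {
    long now = System.nanoTime();
    lag += now - previous;
    previous = now;
    while (lag >= STEP_NANOS) { // catch up on fixed-size logic steps
        update();               // advances game state by exactly one step
        lag -= STEP_NANOS;
    }
    // Render between states: alpha in [0,1) says how far we are into the
    // next step, so drawing can interpolate instead of stutter.
    render(lag / (double) STEP_NANOS);
}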
I would also like to know if there is anyone who knows about Microsoft and Oracle working out a better implementation of the Thread.sleep(n); method, or what's Oracle's future plans are on improving their environment as the basis of applications requiring high timing accuracy, such as music software and games?
No, this won't be happening. Remember, sleep is just a method saying how long you want your program to be asleep for. It is not a specification for when it will or should wake up, and never will be. By definition, any system with sleep and yield functionality is a multitasking system, where the requirements of other tasks have to be considered, and the operating system always gets the final call on the scheduling of this. The alternative wouldn't work reliably, because if a program could somehow demand to be reactivated at a precise time of its choosing it could starve other processes of CPU power. (eg. A program that spawned a background thread and had both threads performing 1ms of work and calling sleep(1) at the end could take turns to hog a CPU core.) Thus, for a user-space program, sleep (and functionality like it) will always be a lower bound, never an upper bound. To do better than that requires the OS itself to allow certain apps to pretty much own the scheduling, and this is not a desirable feature in operating systems for consumer hardware (while being a common and useful feature for industrial applications).
Thread.sleep says your app needs no more time. This means that in a worst-case scenario you'll have to wait for an entire time slice (40ms or so).
Now in bad cases, when a driver or something takes up more time, you might have to wait 120ms (3 * 40ms), so Thread.sleep is not the way to go. Go another way, like registering a 1ms callback and starting the draw code every X callbacks.
(This is on Windows; I'd use the multimedia timer tools to get those 1ms-resolution callbacks.)
Timing stuff is notoriously bad on Windows. This article is a good place to start. Not sure if you care, but also note that there can be worse problems (especially with System.nanoTime) on virtual systems as well (when Windows is the guest operating system).
Thread.sleep is inaccurate and makes the animation jittery most of the time.
If you replace it completely with Thread.yield you'll get a solid FPS without lag or jitter, however the CPU usage increases greatly. I moved to Thread.yield a long time ago.
This problem has been discussed on Java Game Development forums for years.