Classical unit testing is basically putting x in and expecting y out, and automating that process, so it's good for testing anything that doesn't involve time. But most of the nontrivial bugs I've come across have had something to do with timing. Threads corrupt each other's data, or cause deadlocks. Nondeterministic behavior happens – in one run out of a million. Hard stuff.
Is there anything useful out there for "unit testing" parts of multithreaded, concurrent systems? How do such tests work? Isn't it necessary to run the subject of such a test for a long time and vary the environment in some clever manner, to become reasonably confident that it works correctly?
Most of the work I do these days involves multi-threaded and/or distributed systems. The majority of bugs involve "happens-before" type errors, where the developer assumes (wrongly) that event A will always happen before event B. But every 1000000th time the program is run, event B happens first, and this causes unpredictable behavior.
Additionally, there aren't really any good tools to detect timing issues, or even data corruption caused by race conditions. Tools like Helgrind and DRD from the Valgrind toolkit work great for trivial programs, but they are not very useful in diagnosing large, complex systems. For one thing, they report false positives quite frequently (Helgrind especially). For another, it's difficult to detect certain errors under Helgrind/DRD at all, simply because programs running under Helgrind run almost 1000x slower, and you often need to run a program for quite a long time to even reproduce the race condition. And since running under Helgrind totally changes the timing of the program, it may become impossible to reproduce a given timing issue at all. That's the problem with subtle timing issues: they're almost Heisenbergian, in the sense that altering a program to detect timing issues may obscure the original issue.
The sad fact is, the human race still isn't adequately prepared to deal with complex, concurrent software. So unfortunately, there's no easy way to unit-test it. For distributed systems especially, you should plan your program carefully using Lamport's happens-before diagrams to help you identify the necessary order of events in your program. But ultimately, you can't really get away from brute-force unit testing with randomly varied inputs. It also helps to vary the frequency of thread context switches during your unit test by, e.g., running another background process that just takes up CPU cycles, as in the sketch below. Also, if you have access to a cluster, you can run multiple unit tests in parallel, which can surface bugs much more quickly and save you a lot of time.
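To make that concrete, here is a minimal sketch of such a brute-force test (the AtomicLong is just a stand-in for whatever shared component you are actually testing): several threads hammer the subject while a daemon thread burns CPU cycles to perturb scheduling, and an invariant is checked at the end. A racy subject fails the check only in some fraction of runs, which is exactly why you repeat it, ideally across a cluster.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class BruteForceConcurrencyTest {
    public static void main(String[] args) throws Exception {
        final int threads = 8;
        final int iterations = 1_000_000;
        final AtomicLong subject = new AtomicLong(); // stand-in for the component under test

        // Background thread that just burns CPU cycles, to perturb the
        // scheduler and vary context-switch timing from run to run.
        Thread noise = new Thread(() -> { while (true); });
        noise.setDaemon(true);
        noise.start();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    subject.incrementAndGet();
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();

        // Invariant check: a correct subject always satisfies this; a racy
        // one fails it only in some small fraction of runs.
        long expected = (long) threads * iterations;
        if (subject.get() != expected) {
            throw new AssertionError("lost updates: " + subject.get() + " vs " + expected);
        }
        System.out.println("OK");
    }
}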
If you can run your tests under Linux, Valgrind includes a tool called Helgrind which purports to detect race conditions and potential deadlocks in programs that use pthreads; you might get some benefit from running your multithreaded code under it, since it will report potential errors even if they didn't actually occur in that particular test run.
I have never heard of anything that can.
I guess if someone were to design one, it would have to have exact control over the execution of the threads and execute all possible interleavings of their steps.
Sounds like a major task, not to mention the combinatorial explosion for non-trivially sized threads once there are a handful or more of them...
Although, a quick search of Stack Overflow turns up: Unit testing a multithreaded application?
If the system under test is simple enough, you can control the concurrency quite well by blocking operations in mocked external systems. This blocking can be done, for example, by making the mocked call wait for some other operation to be started. If you can control all external calls, this can work quite well: implement different blocking sequences and you exercise the interleavings you care about. I have tried this, and it reveals lock-level bugs quite well if you know the potentially problematic sequences. Compared to most other concurrency testing it is quite deterministic. However, this approach doesn't detect low-level race conditions very well. I usually just go for load testing to find those, but I guess that isn't exactly unit testing.
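A rough sketch of that blocking technique (all names here are made up for illustration): the mock signals when a thread has entered the external call, then parks until the test releases it, so the test can deterministically run a competing operation in between.

import java.util.concurrent.CountDownLatch;

public class BlockingMockExample {
    // Hypothetical external dependency of the code under test.
    interface ExternalService { void call(); }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch entered = new CountDownLatch(1); // mock has been reached
        CountDownLatch release = new CountDownLatch(1); // test allows it to proceed

        ExternalService mock = () -> {
            entered.countDown();      // signal: thread A is now inside the call
            try {
                release.await();      // block until the test says go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread a = new Thread(mock::call);
        a.start();

        entered.await();    // thread A is parked inside the external call;
                            // run the competing operation here and assert on
                            // lock-level behaviour before releasing A.
        release.countDown();
        a.join();
    }
}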
I have seen these concurrency testing frameworks for .NET; I'd assume it's only a matter of time before someone writes one for Java (hopefully).
And not to forget good old code reading. One of the best ways to find concurrency bugs is to just read through the code once again giving it your full concentration.
Perhaps the answer is that you shouldn't. In concurrent systems, there may not always be a single deterministic answer that is correct.
Take the example of people boarding a train and choosing a seat. You are going to end up with different results every time.
Awaitility is a useful framework when you need to deal with asynchronicity in your tests. It allows you to wait until some state somewhere in your system is updated. For example:
await().untilCall( to(myService).myMethod(), equalTo(3) );
or
await().until( fieldIn(myObject).ofType(int.class), greaterThan(1));
It also has Scala and Groovy support.
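For a self-contained illustration, assuming a recent Awaitility version where the entry point is org.awaitility.Awaitility, a test can poll until a background task flips a flag:

import static org.awaitility.Awaitility.await;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class AwaitilityExample {
    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);

        // Simulated asynchronous work that completes after a short delay.
        new Thread(() -> {
            try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            done.set(true);
        }).start();

        // Poll until the condition holds, failing if it takes over 5 seconds.
        await().atMost(5, TimeUnit.SECONDS).until(done::get);
    }
}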
I wonder why I should use JMH for benchmarking if I can just switch off the JIT?
Isn't JMH simply suppressing optimizations that could equally be prevented by disabling the JIT?
TL;DR: Assess Formula 1 performance by riding a bicycle on the same track.
The question is very odd, especially if you ask yourself a simple follow-up question: what would be the point of running the benchmark in conditions that are drastically different from your production environment? In other words, how would knowledge gained by running in interpreted mode apply to the real world?
The issue is not black and white here: you need optimizations to happen as they happen in the real world, and you need them broken in some carefully selected places to make a good experimental setup. That's what JMH does: it provides the means for constructing such experimental setups. The JMH samples explain the intricacies and scenarios quite well.
And, well, benchmarking is not only about fighting the compiler. Lots and lots of non-compiler (and non-JVM!) issues need to be addressed. Of course, this can be done by hand (JMH is not magical, it's just a tool that was also written by humans), but you will spend most of your time budget addressing simple issues, while having no time left for the really important ones, specific to your experiment.
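For reference, a minimal JMH benchmark looks roughly like this. Returning the computed value, or sinking it into a Blackhole, is what stops the compiler from dead-code-eliminating the measured work - one of those simple issues JMH addresses for you:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class MyBenchmark {
    int x = 42;

    @Benchmark
    public int measureReturn() {
        // Returning the result keeps the computation alive.
        return x * x;
    }

    @Benchmark
    public void measureConsume(Blackhole bh) {
        // Equivalent: explicitly sink the result into a Blackhole.
        bh.consume(x * x);
    }
}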
The JIT is not bulletproof and almighty. For instance, it will not kick in before a certain piece of code is run a certain number of times, or it will not kick in if a piece of bytecode is too large/too deeply buried/etc. Also, consider live instrumentation which, more often than not, prevents the JIT from operating at all (think profiling).
Therefore the interest remains in being able to either turn it off or on; if a piece of code is indeed time critical, you may want to know how it will perform depending on whether the JIT kicks in.
But those situations are very rare, and are becoming more and more rare as the JVM (therefore the JIT) evolves.
I have a Java web application running in production. I need some way to see which parts of the code are actually being used as a result of end users' actions.
Just to clarify my requirement further:
I do not want a logging-based solution. Any solution that requires me to add log statements and analyse the logs is not what I am looking for.
I need a solution that works along the same lines as a unit-test coverage reporter. Just as Cobertura or EMMA reports show me, after running the unit tests, which parts of my code were exercised by the tests, I need something that will listen to the JVM in production and tell me which parts of my code are exercised by the actions of end users.
Why am I trying to do this?
I have inherited this code. It is a big piece - some 25,000 classes. One of the things I need to do is chop off parts of the application that are not being used much. If I can show management that parts of the application are scarcely used, I can cut those parts from the product and effectively make it a little more manageable (for instance, the manual regression test suite that has to run every week or so and takes a couple of days could be shortened).
Hope there is some ready solution to this.
As Joachim Sauer said in the comments below your question: the most straightforward approach is to take a code coverage tool that you'd normally use for unit testing and instrument the production code with it.
There's a major catch: overhead. Code coverage analysis can really slow things down, and while an informed user base will tolerate some temporary performance degradation, the whole thing needs to remain usable.
From my experience JaCoCo is relatively light and doesn't impose much overhead, whereas Cobertura will impose a tremendous slowdown. On the other hand, JaCoCo merely flags "hit or no hit" whereas Cobertura gives you per-line hit counts. This means that JaCoCo will only let you find dead spots, whereas Cobertura will let you find rarely hit spots.
Whichever of these two tools you use (possibly one after the other), you may end up with giant class whitelists and class blacklists to restrict the coverage counting to places where it makes sense to do so, thereby keeping the performance overhead down. For example, if the entire thing has a single front controller Servlet, including that in the analysis will maximize the performance overhead while providing no information of value. This could turn into a lot of work and a lot of application deployments.
It may actually be quicker and less work to identify bottlenecks/gateways into specific subsystems and slap a counter on each of those (e.g. perf4j or even a full blown Nagios). Queries are another good place to slap a counter on. If you suspect some part of the application is rarely used, put a few counters there and see what happens.
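Such a gateway counter doesn't need a framework at all. A sketch along these lines (the class and gateway names are illustrative) costs one atomic increment per call:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative usage counters for subsystem entry points; far cheaper
// than full coverage instrumentation.
public final class UsageCounters {
    private static final Map<String, AtomicLong> HITS = new ConcurrentHashMap<>();

    private UsageCounters() { }

    public static void hit(String gateway) {
        HITS.computeIfAbsent(gateway, k -> new AtomicLong()).incrementAndGet();
    }

    // Call this from a shutdown hook or a periodic task to see what is used.
    public static void dump() {
        HITS.forEach((name, count) -> System.out.println(name + " = " + count));
    }
}

// At the top of a suspected-dead subsystem's entry point:
//     UsageCounters.hit("reporting.legacyExport");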
I was wondering: is there any framework or application/program out there that can analyze the concurrency of any Java code?
If the tool knows all the implementations of the JRE-shipped classes and methods, then it comes down to simply analyzing synchronized blocks and methods and their call hierarchies. From there it could build a Petri net and tell you for sure whether you could ever experience a deadlock.
Am I missing something, or is it really that easy? In that case there must be some cool tool doing this kind of stuff? Or would such a tool report too many possible deadlocks that are actually completely safe because of some underlying program/business logic? Shouldn't Petri nets be powerful enough to handle those situations?
This would save so many man-hours of searching for bugs that might or might not be related to deadlocking issues.
Although (many) concurrency related bugs can be found using static code analysis, it doesn't apply to every type of bug. Some bugs only appear at runtime under certain conditions.
IBM has a tool called ConTest that "schedules the execution of program threads such that program scenarios that are likely to contain race conditions, deadlocks, and other intermittent bugs (collectively called synchronization problems) are forced to appear with high frequency".
This requires running (unit)tests against an instrumented version of your app. More background info in this developerWorks article.
This paper describes a tool that performs static analysis of a library and determines if deadlock is possible.
Some more:
Klocwork
CheckThread
For university, I am performing bytecode modifications and analyzing their influence on the performance of Java programs. Therefore, I need Java programs, ideally ones used in production, together with appropriate benchmarks. For instance, I already have HyperSQL, whose performance I measure with the PolePosition benchmark. The Java programs are run on a JVM without a JIT compiler. Thanks for your help!
P.S.: I cannot use programs to benchmark the performance of the JVM or of the Java language itself (such as Wide Finder).
Brent Boyer wrote a nice article series for IBM developerWorks, Robust Java benchmarking, which is accompanied by a micro-benchmarking framework based on a sound statistical approach. See the article and its resources page.
Since you are doing this for university, you might also be interested in Andy Georges, Dries Buytaert, Lieven Eeckhout: Statistically Rigorous Java Performance Evaluation, OOPSLA 2007.
Caliper is a tool provided by Google for micro-benchmarking. It will provide you with graphs and everything. The folks who put this tool together are very familiar with the principle of "Premature Optimization is the root of all evil," (to jwnting's point) and are very careful in explaining the role of benchmarking.
Any experienced programmer will tell you that premature optimisation is worse than no optimisation.
It's a waste of resources at best, and a source of infinite future (and current) problems at worst.
Without context, any application, even with benchmark logs, will tell you nothing.
I may have a loop in there that takes 10 hours to complete; the benchmark will show it taking almost forever, but I don't care, because it's not performance critical.
Another loop takes only a millisecond, but that may be too long because it causes me to fail to catch incoming data packets arriving at 100-microsecond intervals.
Two extremes, but both can happen (even in the same application), and you'd never know unless you knew that application, how it is used, what it does, under which conditions and requirements.
If a user interface takes 1/2 second to render it may be too long or no problem, what's the context? What are the user expectations?
I have built a tool and provide an API to the external world, but I am not sure whether it is thread-safe, and users may want to use it in a multi-threaded environment. Is there any way or tool I can use to verify whether my API is thread-safe in Java?
No. There is no such tool. Proving that a complex program is thread safe is very hard.
You have to analyze your program very carefully to ensure that it is thread-safe. Consider buying "Java Concurrency in Practice" (a very good explanation of concurrency in Java).
Stress tests, or static analysis tools like PMD and FindBugs can uncover some concurrency bugs in your code. So these can show if your code is not thread-safe. However they can never prove if it is thread-safe.
The most effective method is a thorough code review by developer(s) experienced in concurrency.
You can always stress-test it with tools like JMeter.
But the main problem with threads is that they're mostly unpredictable, so even with stress tests etc. you can't be 100% sure that it is totally thread-safe (see the sketch below the resources).
Resources:
Wikipedia - Thread-safety
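As an in-process illustration of why stress tests can only disprove thread safety, the following sketch usually, though not on every run, exposes HashMap's lack of it: concurrent puts of distinct keys lose entries. A run that happens to pass proves nothing.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class StressTestExample {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>(); // deliberately not thread-safe
        int threads = 8, perThread = 10_000;
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                try { start.await(); } catch (InterruptedException ignored) { }
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i); // distinct keys, so no overwrites
                }
            });
            workers[t].start();
        }
        start.countDown(); // release all threads at once
        for (Thread w : workers) w.join();

        // With a correct map these numbers always match; here they usually don't.
        System.out.println("expected " + threads * perThread + ", got " + map.size());
    }
}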
This is a variant (a so-called "reduction") of the Halting Problem, and therefore provably unsolvable for all non-trivial cases.
That means you can find errors by the usual means (statistics, logic), but you can never completely prove that there are none.
I suppose those people saying proving an arbitrary multithreaded program is thread-safe is impossible are, in a way, correct. An arbitrary multithreaded program, coded without following strict guidelines, simply will have threading bugs, and you can't validly prove something that isn't true.
The trick is not to write an arbitrary program, but one with threading logic simple enough to possibly be correct. This then can be unambiguously validated by a tool.
The best such tool I'm aware of is CheckThread. It works on the basis of either annotations or XML config files. If you mark a method as @ThreadSafe and it isn't, you get a compile-time error. This is checked by inspecting the bytecode for thread-unsafe operations, e.g. read/write sequences on unsynchronised data fields.
It also handles those APIs that require methods to be called on specific threads, e.g. Swing.
It doesn't actually handle deadlocks, but those can be statically eliminated without even requiring annotations, by using a tool such as Jlint. You just need to follow some minimal standards, like ensuring locks are acquired according to a DAG rather than willy-nilly; a sketch of that rule follows below.
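That DAG rule can be as simple as a fixed global acquisition order. In the classic transfer example below (illustrative names; distinct ids assumed), every thread locks the account with the smaller id first, so no cycle of waiting threads can form:

public class Account {
    private final long id;      // unique, totally ordered
    private long balance;

    public Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    public static void transfer(Account from, Account to, long amount) {
        // Always acquire the lock with the smaller id first. Every thread
        // follows the same order, so the lock graph stays acyclic (a DAG)
        // and deadlock is impossible.
        Account first  = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to   : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance   += amount;
            }
        }
    }
}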
You cannot, and never will be able to, automatically prove that a program is thread-safe, any more than you can prove that a program is correct (unless you think you've solved the halting problem, which you haven't).
So, no, you cannot verify that an API is thread-safe.
However, in quite a few cases you can prove that it is broken, which is great!
You may also be interested in automatic deadlock detection, which in quite a few cases simply "just works". I'm shipping a Java program to hundreds of desktops with such a deadlock detector installed, and it is a wonderful tool. For example:
http://www.javaspecialists.eu/archive/Issue130.html
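The JDK's standard ThreadMXBean provides this kind of runtime deadlock detection out of the box; a minimal watchdog sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal in-process deadlock watchdog built on the standard JMX thread bean.
public class DeadlockWatchdog {
    public static void install() {
        Thread t = new Thread(() -> {
            ThreadMXBean bean = ManagementFactory.getThreadMXBean();
            while (!Thread.currentThread().isInterrupted()) {
                long[] ids = bean.findDeadlockedThreads(); // null if none found
                if (ids != null) {
                    for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
                        System.err.println("DEADLOCKED: " + info);
                    }
                }
                try {
                    Thread.sleep(10_000);  // poll every ten seconds
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "deadlock-watchdog");
        t.setDaemon(true);
        t.start();
    }
}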
You can also stress test your application in various ways.
Bogus multi-threaded programs tend to not work very well when a high load is present on the system.
Here's a question I asked about how to easily create a high CPU load on a Un*x system, for example:
Bash: easy way to put a configurable load on a system?