How to measure robustness? - java

I am working on a thesis about measuring the quality of a product. The product in this case is a website. I have identified several quality attributes and measurement techniques.
One quality attribute is "Robustness". I want to measure it somehow, but I can't find any useful information on how to do this in an objective manner.
Is there any static or dynamic metric that could measure robustness? For instance, is there a way to measure robustness in the same way as unit test coverage? If so, is there any (free) tool that can do such a thing?
Does anyone have any experience with such tooling?
Last but not least, perhaps there are other ways to determine robustness, if you have any ideas about that I am all ears.
Thanks a lot in advance.

Well, the short answer is "no." Robust can mean a lot of things, but the best definition I can come up with is "performing correctly in every situation." If you send a bad HTTP header to a robust web server, it shouldn't crash. It should return exactly the right kind of error, and it should log the event somewhere, perhaps in a configurable way. If a robust web server runs for a very long time, its memory footprint should stay the same.
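To make that first example concrete, a check like the one below can be automated. This is only a minimal sketch: it assumes a server under test listening on localhost:8080, and it treats a clean 4xx reply to a malformed request as the robust outcome.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class BadHeaderCheck {
    public static void main(String[] args) throws Exception {
        // Assumes a server under test is listening on localhost:8080.
        try (Socket socket = new Socket("localhost", 8080)) {
            OutputStream out = socket.getOutputStream();
            // Deliberately malformed header: the colon after "Host" is missing.
            out.write("GET / HTTP/1.1\r\nHost localhost\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String statusLine = in.readLine();

            // A robust server should answer with a clean 4xx error, not drop the
            // connection, hang, or return a 5xx with a stack trace.
            if (statusLine == null || !statusLine.matches("HTTP/1\\.[01] 4\\d\\d.*")) {
                System.out.println("NOT ROBUST: " + statusLine);
            } else {
                System.out.println("OK: " + statusLine);
            }
        }
    }
}

A whole battery of such deliberately broken inputs is essentially what the fuzzing answer further down describes in more depth.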
A lot of what makes a system robust is its handling of edge cases. Good unit tests are a part of that, but it's quite likely that there will not be unit tests for any of the problems that a system has (if those problems were known, the developers probably would have fixed them and only then added a test).
Unfortunately, it's nearly impossible to measure the robustness of an arbitrary program because in order to do that you need to know what that program is supposed to do. If you had a specification, you could write a huge number of tests and then run them against any client as a test. For example, look at the Acid2 browser test. It carefully measures how well any given web browser complies with a standard in an easy, repeatable fashion. That's about as close as you can get, and people have pointed out many flaws with such an approach (for instance, is a program that crashes more often but does one extra thing according to spec more robust?)
There are, though, various checks that you could use as a rough numerical estimate of the health of a system. Unit test coverage is a pretty standard one, as are its siblings: branch coverage, function coverage, statement coverage, etc. Another good choice is "lint"-style programs like FindBugs. These can indicate potential problems. Open source projects are often judged by how frequently and recently commits are made or releases are published. If a project has a bug tracker, you can measure how many bugs have been fixed and what percentage remain open. If there's a specific instance of the program you're measuring, especially one with a lot of activity, MTBF (Mean Time Between Failures) is a good measure of robustness (see Philip's answer).
These measurements, though, don't really tell you how robust a program is. They're merely ways to guess at it. If it were easy to figure out if a program was robust, we'd probably just make the compiler check for it.
Good luck with your thesis! I hope you come up with some cool new measurements!

You could look into mean time between failures as a robustness measure. The problem is that it is a theoretical quantity which is difficult to measure, particularly before you have deployed your product to a real-world situation with real-world loads. Part of the reason for this is that testing often does not cover real-world scalability issues.
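For what it's worth, the arithmetic itself is trivial: MTBF is just total operational time divided by the number of failures observed. A small sketch; the incident data you feed it is whatever your monitoring or incident log gives you:

import java.time.Duration;

public class MtbfCalculator {

    // MTBF = total operational time / number of observed failures.
    public static Duration meanTimeBetweenFailures(Duration totalUptime, int failureCount) {
        if (failureCount == 0) {
            // No failures observed yet: MTBF is undefined, but it is at least the uptime so far.
            return totalUptime;
        }
        return totalUptime.dividedBy(failureCount);
    }

    public static void main(String[] args) {
        // Example: 90 days of operation with 3 recorded failures.
        Duration mtbf = meanTimeBetweenFailures(Duration.ofDays(90), 3);
        System.out.println("MTBF = " + mtbf.toDays() + " days"); // 30 days
    }
}

The hard part, as noted above, is getting failure counts that reflect real-world traffic rather than lab conditions.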

In our Fuzzing book (by Takanen, DeMott, Miller) we have several chapters dedicated to metrics and coverage in negative testing (robustness, reliability, grammar testing, fuzzing: many names for the same thing). I also tried to summarize the most important aspects in our company whitepaper here:
http://www.codenomicon.com/products/coverage.shtml
Snippet from there:
Coverage can be seen as the sum of two features, precision and accuracy. Precision is concerned with protocol coverage. The precision of testing is determined by how well the tests cover the different protocol messages, message structures, tags and data definitions. Accuracy, on the other hand, measures how accurately the tests can find bugs within different protocol areas. Therefore, accuracy can be regarded as a form of anomaly coverage. However, precision and accuracy are fairly abstract terms, thus, we will need to look at more specific metrics for evaluating coverage.
The first coverage analysis aspect is related to the attack surface. Test requirement analysis always starts off by identifying the interfaces that need testing. The number of different interfaces and the protocols they implement in various layers set the requirements for the fuzzers. Each protocol, file format, or API might require its own type of fuzzer, depending on the security requirements.
The second coverage metric is related to the specification that a fuzzer supports. This type of metric is easy to use with model-based fuzzers, as the basis of the tool is formed by the specifications used to create the fuzzer, and therefore they are easy to list. A model-based fuzzer should cover the entire specification. Mutation-based fuzzers, on the other hand, do not necessarily cover the specification fully, as implementing or including one message exchange sample from a specification does not guarantee that the entire specification is covered. Typically, when a mutation-based fuzzer claims specification support, it means it is interoperable with test targets implementing that specification.
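To illustrate the mutation-based side of that distinction, here is a minimal sketch of a mutation fuzzer. HttpRequestParser and MalformedRequestException are hypothetical stand-ins for whatever component you actually target; note that it only exercises neighbourhoods of the one sample message, which is exactly the coverage limitation described above.

import java.nio.charset.StandardCharsets;
import java.util.Random;

public class NaiveMutationFuzzer {
    public static void main(String[] args) {
        byte[] sample = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        Random random = new Random(42); // fixed seed so failures are reproducible

        for (int i = 0; i < 100_000; i++) {
            byte[] mutated = sample.clone();
            // Flip a handful of random bytes in the otherwise valid sample.
            for (int flips = 0; flips < 1 + random.nextInt(4); flips++) {
                mutated[random.nextInt(mutated.length)] = (byte) random.nextInt(256);
            }
            try {
                // Hypothetical parser under test; replace with the real target.
                HttpRequestParser.parse(mutated);
            } catch (MalformedRequestException expected) {
                // Rejecting garbage input cleanly is the robust outcome.
            } catch (RuntimeException unexpected) {
                System.out.println("Iteration " + i + " crashed the parser: " + unexpected);
            }
        }
    }
}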
Especially regarding protocol fuzzing, the third and most critical metric is the level of statefulness of the selected fuzzing approach. An entirely random fuzzer will typically only test the first messages in complex stateful protocols. The more state-aware your fuzzing approach is, the deeper the fuzzer can go in complex protocol exchanges. Statefulness is a difficult requirement to define for fuzzing tools, as it is really a metric for the quality of the protocol model being used, and can thus only be verified by running the tests.
I hope this was helpful. We also have studies on other metrics, such as looking at code coverage and other more or less useless data. ;) Metrics are a great topic for a thesis. Email me at ari.takanen#codenomicon.com if you are interested in getting access to our extensive research on this topic.

Robustness is very subjective, but you could have a look at FindBugs, Cobertura and Hudson, which, when correctly combined together, could give you a sense of security over time that the software is robust.

You could look into mean time between failures as a robustness measure.
The problem with MTBF is that it is usually measured with positive traffic, whereas failures often happen in unexpected situations. It does not give any real indication of robustness or reliability. Even if a web site stays up indefinitely in a lab environment, it can still be hacked in a second on the Internet if it has a weakness.

Related

Using JMH as a framework for performance testing on functional/user level. Is it wrong?

I want to use JMH as a framework for performance testing on a functional/user level for a web application. Imagine me using JMH to, say, measure how long it takes from the moment when 100 users concurrently click "Post Your Question" on this site to the moment when each user sees their question posted.
Is this entirely wrong? What are the drawbacks of such approach?
I do not expect a nanosecond accuracy for those tests: half a second to a second accuracy are just fine.
I created a first realistic test and really liked how it looked and worked - exactly what I need. But am I missing some big trouble ahead by using a micro-benchmark framework for something it's not intended to do?
I am not looking for tool recommendations.
Having now used this approach for approximately 6 months, I can say that I still have not seen any drawbacks. A few things I learned:
Even though the accuracy required at the functional/user level is lower, it's important to learn how the various configuration parameters work (especially the JVM-related ones, e.g. fork). They may influence how you build your tests, how you run them, and what you measure.
JMH is really lightweight and efficient, so comparing its results to results obtained with other frameworks may not be valid (we basically saw a 10-20% performance boost when running under JMH); I had to establish a new baseline.
The JMH Jenkins plug-in helps with visualizing the results.
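For what it's worth, a functional-level JMH benchmark along those lines might look roughly like the sketch below. The SiteClient class and its postQuestion method are hypothetical stand-ins for your own end-to-end client; the annotations are standard JMH.

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)                      // one forked JVM is enough at this level of accuracy
@Warmup(iterations = 2)
@Measurement(iterations = 5)
public class PostQuestionBenchmark {

    // Hypothetical end-to-end client for the application under test.
    private final SiteClient client = new SiteClient("https://example.test");

    @Benchmark
    @Threads(100)             // models 100 concurrent "users" posting at once
    public void postQuestion() {
        // End-to-end user action: submit a question and wait until it is visible.
        client.postQuestion("title", "body");
    }
}

The @Threads(100) setting is what models the "100 concurrent users" scenario; at second-level accuracy the warmup and fork settings matter far less than they would in a micro-benchmark.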

Java application - which parts of my code are actually being used in production?

I have a Java web-based application running in production. I need some way to see which parts of the code are actually being used as a result of end-user actions.
Just to clarify my requirement further.
I do not want a logging-based solution. Any solution that requires me to add log statements and analyse the logs is not what I am looking for.
I need a solution that works along similar lines to a unit test coverage reporter. Like the Cobertura or Emma reports produced after running unit tests, which show me which parts of my code were exercised by the tests, I need something that will listen to the JVM in production and tell me which parts of my code are being exercised by the actions of end users.
Why am I trying to do this?
I have inherited this codebase. It is a big piece of software - some 25,000 classes. One of the things I need to do is chop off parts of the application that are not being used much. If I can show management that parts of the application are scarcely used, I can remove those parts and effectively make the product a little more manageable (for example, the manual regression test suite that needs to run every week or so, and takes a couple of days, could be shortened).
Hope there is some ready solution to this.
As Joachim Sauer said in the comments below your question: the most straightforward approach is to just use a Code Coverage Tool that you'd use for unit testing and instrument the production code with it.
There's a major catch: overhead. Code Coverage analysis can really slow things down and while an informed user-base will tolerate some temporary performance degradation, the whole thing needs to remain useable.
From my experience JaCoCo is relatively light and doesn't impose much overhead, whereas Cobertura will impose a tremendous slowdown. On the other hand, JaCoCo merely flags "hit or no hit" whereas Cobertura gives you per-line hit counts. This means that JaCoCo will only let you find dead spots, whereas Cobertura will let you find rarely hit spots.
Whichever of these two tools you use (possibly one after the other), you may end up with giant class whitelists and class blacklists to restrict the coverage counting to places where it makes sense to do so, thereby keeping the performance overhead down. For example, if the entire thing has a single front controller Servlet, including that in the analysis will maximize the performance overhead while providing no information of value. This could turn into a lot of work and a lot of application deployments.
It may actually be quicker and less work to identify bottlenecks/gateways into specific subsystems and slap a counter on each of those (e.g. perf4j or even a full blown Nagios). Queries are another good place to slap a counter on. If you suspect some part of the application is rarely used, put a few counters there and see what happens.
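If you go the counter route, even a simple servlet filter that bumps a per-URI counter can be enough to rank parts of the application by real traffic. A rough sketch (expose the snapshot however you like: a JMX bean, a scheduled log line, or an admin endpoint):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;

@WebFilter("/*")
public class UsageCounterFilter implements Filter {

    // One counter per request URI; LongAdder keeps contention low under production load.
    private static final Map<String, LongAdder> HITS = new ConcurrentHashMap<>();

    @Override
    public void init(FilterConfig filterConfig) {
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        if (request instanceof HttpServletRequest) {
            String uri = ((HttpServletRequest) request).getRequestURI();
            HITS.computeIfAbsent(uri, key -> new LongAdder()).increment();
        }
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
    }

    // Snapshot for whatever reporting mechanism you choose.
    public static Map<String, Long> snapshot() {
        Map<String, Long> copy = new ConcurrentHashMap<>();
        HITS.forEach((uri, count) -> copy.put(uri, count.sum()));
        return copy;
    }
}

After a few weeks in production, URIs that never show up in the snapshot point at candidate areas to investigate for removal.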

Code refactoring on bad system design

I am a junior software engineer who has been given the task of taking over an old system. This system has several problems, based on my preliminary assessment.
spaghetti code
repetitive code
classes with 10k lines and above
misuse and over-logging using log4j
bad database table design
Missing source control -> I have setup Subversion for this
Missing documentation -> I have no idea of the business rules, except by reading the code
How should I go about enhancing the quality of the system and resolving these issues? I can think of using static code analysis software to address bad coding practices.
However, that can't detect bad design issues or deeper problems. How should I go about resolving these issues step by step?
Get and read Working Effectively With Legacy Code. It deals exactly with this situation.
As others have also advised, for refactoring you need a solid set of unit tests. However, legacy code is typically very difficult to unit test as is, since it has not been written to be unit testable. So you need to refactor first to allow unit testing, which would allow you to start refactoring... a classic catch-22.
This is where the book will help you. It gives lots of practical advice on how to make badly designed code unit testable with the minimal, and safest possible, code changes. Automatic refactorings can also help you here, but there are tricks described in the book which can only be done by hand. Then once the first set of unit tests are in place, you can start gradually refactoring towards better, more maintainable code.
Update: For hints on how to take over legacy code, you may find this earlier answer of mine useful.
As @Alex noted, unit tests are also very useful for understanding and documenting the actual behaviour of the code. This is especially useful when documentation about the system is nonexistent or outdated.
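In practice those first tests are usually characterization tests: you pin down what the code does today, whatever that is, so later refactorings can be checked against it. A minimal sketch, with LegacyPriceCalculator standing in for whichever legacy class you start with:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LegacyPriceCalculatorCharacterizationTest {

    @Test
    public void recordsCurrentBehaviourForAKnownInput() {
        LegacyPriceCalculator calculator = new LegacyPriceCalculator(); // hypothetical legacy class

        // The expected value was obtained by running the legacy code once and
        // copying its output; the test documents behaviour, not a specification.
        double actual = calculator.priceFor("WIDGET-42", 3);
        assertEquals(17.85, actual, 0.001);
    }
}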
Focus on stability first. You can't enhance or refactor until you have some kind of stable environment in-place around the application.
Some thoughts:
Revision control. You've made a start by setting-up subversion. Now make sure that your database schemas, stored procedures, scripts, third-party components, etc. are under revision control too. Have a version labelling system, make sure you label versions and can accurately access old versions in the future.
Build and release. Have a way to build stable releases on a machine other than your dev machine. You may want to use ant/nant, make, msbuild, or even a batch file or shell script. You may need deployment scripts / installers too if they don't exist.
Get it under test. Do not change the app until you have a way to know whether your change has broken it. For this you need tests. You should hopefully be able to write xunit unit tests for some of the simpler, stand-alone classes, but try to build some system/integration tests that exercise the application as a whole. Without high code coverage (which you won't have to begin with) integration tests are your best bet. Get into the habit of running the tests as often as possible. Take every opportunity to extend them.
Make small, focussed changes. Try to identify systems/subsystems within the application, and improve the boundaries between them. This reduces the knock-on effects of changes you may make. Beware the temptation to "pretty-up" the code by reformatting it or imposing the latest fashionable design pattern. Turning-around a system like this takes time.
Documentation. It's necessary, but don't worry too much about it. System documentation is rarely used, in my experience. Good tests are usually better than good documentation. Concentrate on documenting the interfaces between the application and the system context that it runs in (inputs, outputs, file structures, db schemas, etc).
Manage expectations. If it's in bad shape, then it will probably resist your efforts to make changes, and timescales may be harder than usual to estimate. Make sure management and stakeholders understand that.
At all costs, beware the temptation to just rewrite the whole thing. It's almost never the right thing to do in this situation. If it works, concentrate on keeping it working.
As a junior developer, don't be afraid to ask for help. As others have said, Working Effectively With Legacy Code is a good book to read, as is Martin Fowler's Refactoring.
Good luck!
First, don't fix what isn't broken. As long as the system you are to take over works, leave functionality alone.
The system is obviously broken when it comes to maintainability, however, so that is what you tackle. As mentioned above, write some tests first, get the source into version control, and THEN start by cleaning up small pieces first, then the larger ones, and so on. Do NOT attack the bigger architectural issues until you have gained a good understanding of how the system works. Tools won't help you as long as you don't dive into the code yourself, but when you do, they help a lot.
Remember, nothing is "perfect". Don't over-engineer. Obey the KISS and YAGNI principles.
Your issue #7 is by far the most important. As long as you have no idea how the system is supposed to behave, all technical considerations are secondary. Everyone is suggesting unit tests - but how can you write a useful test if you can't distinguish between wanted and unwanted behaviour?
So before you start touching the code, you have to understand the system from the user's point of view: talk to users, observe them using the system, write documentation on the use case level.
Yes, I am seriously suggesting that you spend days, more likely weeks, without changing a single line of code. Because right now, any change you make is likely to break things without you realizing it.
Once you understand the app, you'll at least know which functionality is important to test (manually or automated).
Write some unit tests first, and make sure they pass. Then with each refactoring change you make, just keep making sure the tests keep passing. Then you can be confident that your application behaviour to the outside world hasn't changed.
This also has the added benefit that the tests will always be there, so for any future changes the tests should still pass, guarding against any regressions in the new changes.
First and foremost, make sure you have a source control system in place and that all source code is versioned and can be built.
Next, you can try writing unit test for core parts of your system. From there, when you have a more or less solid body of regression tests, you can actually proceed with refactoring.
When I encounter a messy codebase, I usually start by renaming poorly-named types and methods to better reflect their intent. Next you can try splitting huge methods into smaller ones.
Keep in mind that this legacy system, with all its spaghetti code, currently works. Don't go changing things just because they don't look as pretty as they should. Focus on stability, new features and familiarity before ripping old code out left, right and centre.
Firstly, let me say that Working Effectively with Legacy Code is probably a really good book to read, judging by three answers within a minute of each other.
bad database table design
This one, you are probably stuck with. If you try to change an existing database design you are probably committing yourself to redesigning the whole system and writing migration tools for the existing data. Leave well alone.
My standard answer to this question is: Refactor the Low-hanging Fruit. In this case, I'd be inclined to take one of the 10K-line classes and seek out opportunities to Sprout Class, but that's just my own proclivity; you might be more comfortable changing other things first (setting up source control was an excellent first step!) Test what you can; refactor what can't be tested, take a step at a time, and make it better.
Keep in mind as you progress how much better you are making things; if you concentrate only on how bad things still are, you're likely to become discouraged.
As others have noted, don't change something that works just to make it prettier. The risk that you will introduce errors is great.
My philosophy is: As I have to make changes to satisfy new requirements or to fix reported bugs, I try to make the piece of code that I have to change a little cleaner. I'm going to have to test the changed code anyway, so now is a good time to do a little clean-up at small additional cost.
Fundamental design changes are the toughest and must be saved for occasions where you have to make a big enough change that you would be testing all the changed code anyway.
Changing bad database design is hardest of all because the poorly designed tables are likely used by many programs. Any change to the database requires changing every program that reads or writes it. The best way to accomplish this is usually to try to reduce the number of places that access any given part of the database. To take a simple example: suppose there are 20 places that read through customer records and calculate the customer account balance. Replace this with one function that reads the database and returns the total, and twenty calls to that function. Now you can change the schema for the customer records and there is only one piece of code to change instead of 20.
The principle is simple enough, but in practice it is unlikely that every function that accesses a given record is doing the same thing. Even if the original programmer was clumsy enough to write the same code 20 times (not unlikely -- I've seen plenty of that), the real situation is probably not that he wrote 1 function 20 times, period, but that he wrote function A 20 times, function B 12 times, function C 4 times, etc.
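In code, that consolidation step might look something like the sketch below (the table and class names are made up; the point is that knowledge of the schema now lives in one method instead of twenty call sites):

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerAccountRepository {

    private final Connection connection;

    public CustomerAccountRepository(Connection connection) {
        this.connection = connection;
    }

    // The single place that knows how a balance is derived from the schema.
    // The twenty former call sites now call this method, so a table change
    // touches one piece of code instead of twenty.
    public BigDecimal balanceFor(long customerId) throws SQLException {
        String sql = "SELECT SUM(amount) FROM customer_transactions WHERE customer_id = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setLong(1, customerId);
            try (ResultSet resultSet = statement.executeQuery()) {
                resultSet.next();
                BigDecimal sum = resultSet.getBigDecimal(1);
                return sum != null ? sum : BigDecimal.ZERO;
            }
        }
    }
}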
Working Effectively With Legacy Code might be helpful.
Design issues are very difficult to catch. The first place to start is understanding the design of the application. I find it useful to diagram using either UML or a process flow diagram, anything works that communicates the design and working for the application.
From there I go into more detail and ask myself the question "Would I have done it this way?" and what other options there are. It is easy to see code debt, i.e. the debt we get from making bad choices, as always bad, but sometimes there are other factors involved, like budget, time, availability of resources, etc. There you have to ask whether it is worth refactoring a working but badly designed application.
If there are many upcoming new features, changes, bug fixes, etc I would say it is good to refactor, but if the application rarely changes and is stable, then maybe leaving it as is is a better approach.
Another side point to note is that if the code is used by another application as a service or module, then refactoring might first mean creating a stub around the code that serves as the interface; once that is defined clearly and has unit tests to prove it works, you can choose any technology to fill in the details.
A good book on this subject is Working Effectively with Legacy Code By Michael Feathers (2004). It goes through the process of making small changes, while working towards a bigger clean up.
Write unit tests & find and remove duplicate code.
Write unit tests & break long methods into a series of short methods (see the sketch after this list).
Write unit tests & find and remove duplicate methods.
Write unit tests & break apart classes so that they follow the single responsibility principle.
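As referenced in the second item, here is a sketch of what breaking up a long method can look like once a test is in place. Invoice and its accessors are hypothetical, and the flat 20% tax rate is just an assumption for illustration.

// Before: one long method mixing validation, calculation and formatting.
// After: each concern extracted into a small, separately testable method.
public class InvoiceFormatter {

    public String format(Invoice invoice) {          // Invoice is a hypothetical domain class
        validate(invoice);
        double total = totalWithTax(invoice);
        return render(invoice, total);
    }

    private void validate(Invoice invoice) {
        if (invoice.lines().isEmpty()) {
            throw new IllegalArgumentException("invoice has no lines");
        }
    }

    private double totalWithTax(Invoice invoice) {
        return invoice.lines().stream()
                .mapToDouble(line -> line.amount() * 1.2)   // assumed flat 20% tax
                .sum();
    }

    private String render(Invoice invoice, double total) {
        return invoice.number() + ": " + String.format("%.2f", total);
    }
}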
Try to create some unit tests first that can trigger some actions in your code.
Commit everything to SVN and tag it (in case something goes bad you'll have an escape pod).
Use the inCode Eclipse plugin http://www.intooitus.com/inCode.html and look at what refactorings it proposes. Check whether the proposed refactorings seem OK for your problem. Try to understand them.
Re-test with the unit tests created before.
Now you can use FindBugs and/or PMD to check for other subtle issues.
If everything is okay, you might want to check in again.
I'd also try reading the source in order to detect some cases where patterns can be applied.

What is the easiest straightforward way of telling which version performs better?

I have an application, which I have re-factored so that I believe it is now faster. One can't possibly feel the difference, but in theory, the application should run faster. Normally I would not care, but as this is part of my project for my master's degree, I would like to support my claim that the re-factoring did not only lead to improved design and 'higher quality', but also an increase in performance of the application (a small toy-thing - a train set simulation).
I have toyed with the latest VisualVM today for about four hours, but I couldn't get anything helpful out of it. There isn't (or I haven't found) a way to simply compare the profiling results taken from the two versions (pre- and post-refactoring).
What would be the easiest, most straightforward way of simply telling the slower from the faster version of the application? The difference between the two must have had an impact on performance. Thank you.
I would suggest creating a few automated tests that simulate real usage of the application. Create enough tests to have a decent benchmark.
Run the test suite for both versions of the app under various loads.
That should give you a pretty realistic measure of real-world performance. Measuring at a lower level may not give you an accurate picture.
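If you don't want a full benchmarking framework, even a crude harness that drives the same scripted scenario and reports wall-clock times, run once against each version, can settle the question at the accuracy you need. A sketch, where runSimulationScenario stands in for whatever drives your train-set simulation through a fixed scenario:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SimpleTimingHarness {

    public static void main(String[] args) {
        int warmupRuns = 3;       // let the JIT settle before measuring
        int measuredRuns = 20;

        for (int i = 0; i < warmupRuns; i++) {
            runSimulationScenario();
        }

        List<Long> timesMillis = new ArrayList<>();
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            runSimulationScenario();
            timesMillis.add((System.nanoTime() - start) / 1_000_000);
        }

        Collections.sort(timesMillis);
        System.out.println("median = " + timesMillis.get(measuredRuns / 2) + " ms, "
                + "min = " + timesMillis.get(0) + " ms, "
                + "max = " + timesMillis.get(measuredRuns - 1) + " ms");
    }

    private static void runSimulationScenario() {
        // Hypothetical: drive the train-set simulation through a fixed, scripted scenario.
        // new TrainSetSimulation().runScenario("rush-hour");
    }
}

Run it against the pre-refactoring build and the post-refactoring build on the same machine and compare the medians.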
I assume you can find a good way to measure the difference, and you can say it is due to the refactoring if there's nothing else you did, but I would be bothered by that, because that's not really understanding why it's faster.
Here's an example of really aggressive performance tuning.
What convinces me is if it can be shown that
A particular line of code, or small set of lines, is directly responsible for approximate fraction F% of overall wall-clock time,
That line or those lines are shown to be not really necessary, in the sense that a way can be found to use them a lot less or perhaps not at all,
That change results in a reduction in overall wall-clock time of approximately F%.

Unit testing real-time / concurrent software [duplicate]

Possible Duplicate:
How should I unit test threaded code?
Classical unit testing is basically just putting x in and expecting y out, and automating that process. So it's good for testing anything that doesn't involve time. But then, most of the nontrivial bugs I've come across have had something to do with timing. Threads corrupt each other's data, or cause deadlocks. Nondeterministic behavior happens, in maybe one run out of a million. Hard stuff.
Is there anything useful out there for "unit testing" parts of multithreaded, concurrent systems? How do such tests work? Isn't it necessary to run the subject of such test for a long time and vary the environment in some clever manner, to become reasonably confident that it works correctly?
Most of the work I do these days involves multi-threaded and/or distributed systems. The majority of bugs involve "happens-before" type errors, where the developer assumes (wrongly) that event A will always happen before event B. But every 1000000th time the program is run, event B happens first, and this causes unpredictable behavior.
Additionally, there aren't really any good tools to detect timing issues, or even data corruption caused by race conditions. Tools like Helgrind and drd from the Valgrind toolkit work great for trivial programs, but they are not very useful in diagnosing large, complex systems. For one thing, they report false positives quite frequently (Helgrind especially). For another thing, it's difficult to actually detect certain errors while running under Helgrind/drd simply because programs running under Helgrind run almost 1000x slower, and you often need to run a program for quite a long time to even reproduce the race condition. Additionally, since running under Helgrind totally changes the timing of the program, it may become impossible to reproduce a certain timing issue. That's the problem with subtle timing issues; they're almost Heisenbergian in the sense that altering a program to detect timing issues may obscure the original issue.
The sad fact is, the human race still isn't adequately prepared to deal with complex, concurrent software. So unfortunately, there's no easy way to unit-test it. For distributed systems especially, you should plan your program carefully using Lamport's happens-before diagrams to help you identify the necessary order of events in your program. But ultimately, you can't really get away from brute-force unit testing with randomly varying inputs. It also helps to vary the frequency of thread context-switching during your unit-test by, e.g. running another background process which just takes up CPU cycles. Also, if you have access to a cluster, you can run multiple unit-tests in parallel, which can detect bugs much quicker and save you a lot of time.
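Such brute-force tests can still live in an ordinary unit test framework. The sketch below (Counter is a hypothetical class under test) starts many threads behind a latch so they hit the shared object at the same instant, and repeats the whole exercise to give rare interleavings a chance to appear:

import static org.junit.Assert.assertEquals;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.junit.Test;

public class CounterStressTest {

    @Test
    public void concurrentIncrementsAreNotLost() throws Exception {
        int threads = 50;
        int incrementsPerThread = 10_000;

        for (int repetition = 0; repetition < 100; repetition++) {   // repeat to surface rare interleavings
            Counter counter = new Counter();                         // hypothetical class under test
            CountDownLatch startGun = new CountDownLatch(1);
            ExecutorService pool = Executors.newFixedThreadPool(threads);

            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    try {
                        startGun.await();                            // all threads start together
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < incrementsPerThread; i++) {
                        counter.increment();
                    }
                });
            }

            startGun.countDown();
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);

            assertEquals(threads * incrementsPerThread, counter.value());
        }
    }
}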
If you can run your tests under Linux, valgrind includes a tool called helgrind which purports to detect race conditions and potential deadlocks in programs that use pthreads; you might get some benefit from running your multithreaded code under that, since it will report potential errors even if they didn't actually occur in that particular test run.
I have never heard of anything that can.
I guess if someone were to design one, it would have to have exact control over the execution of the threads and execute all possible interleavings of them.
Sounds like a major task, not to mention the combinatorial explosion for non-trivially sized threads once there are a handful or more of them...
Although, a quick search of stackoverflow... Unit testing a multithreaded application?
If the system under test is simple enough, you could control the concurrency quite well by blocking operations in external mock systems. This blocking can be done, for example, by waiting for some other operation to be started. If you can control all external calls, this can work quite well if you implement different blocking sequences. I have tried this and it does reveal lock-level bugs quite well if you know the possible problematic sequences. And compared to many other kinds of concurrency testing it is quite deterministic. However, this approach doesn't detect low-level race conditions very well. I usually just go for load testing to find those, but I guess that isn't exactly unit testing.
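One way to implement that kind of blocking mock in Java is with latches inside the stub, so the test decides exactly when each external call may return. A sketch, where PaymentGateway is a hypothetical external interface:

import java.util.concurrent.CountDownLatch;

public class BlockingPaymentGatewayStub implements PaymentGateway {   // hypothetical interface

    private final CountDownLatch callStarted = new CountDownLatch(1);
    private final CountDownLatch allowedToReturn = new CountDownLatch(1);

    @Override
    public boolean charge(String accountId, long amountCents) {
        callStarted.countDown();          // tell the test the call is in flight
        try {
            allowedToReturn.await();      // park here until the test releases us
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return true;
    }

    // The test blocks here until the system under test has entered charge().
    public void awaitCallStarted() throws InterruptedException {
        callStarted.await();
    }

    // The test calls this once it has triggered the second, conflicting operation.
    public void release() {
        allowedToReturn.countDown();
    }
}

The test starts the first operation, waits on awaitCallStarted(), triggers the conflicting operation, and only then calls release(), forcing the suspect interleaving deterministically instead of hoping it occurs under load.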
I have seen concurrency testing frameworks for .NET; I'd assume it's only a matter of time before someone writes one for Java (hopefully).
And not to forget good old code reading. One of the best ways to find concurrency bugs is to just read through the code once again giving it your full concentration.
Perhaps the answer is that you shouldn't. In concurrent systems, there may not always be a single deterministic answer that is correct.
Take the example of people boarding a train and choosing a seat. You are going to end up with different results every time.
Awaitility is a useful framework when you need to deal with asynchronicity in your tests. It allows you to wait until some state somewhere in your system is updated. For example:
await().untilCall( to(myService).myMethod(), equalTo(3) );
or
await().until( fieldIn(myObject).ofType(int.class), greaterThan(1));
It also has Scala and Groovy support.
