What tools do you use to find unused/dead code in large Java projects? Our product has been in development for some years, and it is getting very hard to manually detect code that is no longer in use. We do, however, try to delete as much unused code as possible.
Suggestions for general strategies/techniques (other than specific tools) are also appreciated.
Edit: Note that we already use code coverage tools (Clover, IntelliJ), but these are of little help. Dead code still has unit tests, and shows up as covered. I guess an ideal tool would identify clusters of code which have very little other code depending on them, allowing for focused manual inspection.
An Eclipse plugin that works reasonably well is Unused Code Detector.
It processes an entire project, or a specific file and shows various unused/dead code methods, as well as suggesting visibility changes (i.e. a public method that could be protected or private).
CodePro was recently released by Google with the Eclipse project. It is free and highly effective. The plugin has a 'Find Dead Code' feature with one/many entry point(s). Works pretty well.
I would instrument the running system to keep logs of code usage, and then start inspecting code that is not used for months or years.
For example if you are interested in unused classes, all classes could be instrumented to log when instances are created. And then a small script could compare these logs against the complete list of classes to find unused classes.
Of course, if you go to the method level you should keep performance in mind. For example, the methods could log only their first use. I don't know how this is best done in Java. We have done this in Smalltalk, which is a dynamic language and thus allows for code modification at runtime. We instrument all methods with a logging call and uninstall the logging code after a method has been logged for the first time, so after some time no more performance penalties occur. Maybe a similar thing can be done in Java with static boolean flags...
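A minimal Java sketch of the class-level idea described above, assuming instrumented constructors call a shared recorder. All names here (UsageLogDemo, recordUse, unusedAmong, InvoiceService, LegacyExporter) are hypothetical, and a concurrent set stands in for the per-method boolean flags the answer speculates about:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class UsageLogDemo {
    // Concurrent set so that instrumented code can report from any thread.
    private static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

    /** Called from instrumented code; repeated calls for the same class add nothing new. */
    static void recordUse(Class<?> type) {
        SEEN.add(type.getName());
    }

    /** Returns the names in allClasses that were never reported as used. */
    static Set<String> unusedAmong(Set<String> allClasses) {
        Set<String> unused = new TreeSet<>(allClasses);
        unused.removeAll(SEEN);
        return unused;
    }

    /** Example of an instrumented class: the constructor reports its own use. */
    static class InvoiceService {
        InvoiceService() {
            recordUse(InvoiceService.class);
        }
    }

    /** Instrumented the same way, but nothing instantiates it in this run. */
    static class LegacyExporter {
        LegacyExporter() {
            recordUse(LegacyExporter.class);
        }
    }

    public static void main(String[] args) {
        new InvoiceService();
        Set<String> all = Set.of(InvoiceService.class.getName(), LegacyExporter.class.getName());
        System.out.println("Never instantiated: " + unusedAmong(all));
    }
}
```

In a real system the set would be written to a log periodically and compared offline against the full class list, as the answer suggests. A class that never shows up is only a candidate for inspection, not proof that it is dead.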
I'm surprised ProGuard hasn't been mentioned here. It's one of the most mature products around.
ProGuard is a free Java class file shrinker, optimizer, obfuscator,
and preverifier. It detects and removes unused classes, fields,
methods, and attributes. It optimizes bytecode and removes unused
instructions. It renames the remaining classes, fields, and methods
using short meaningless names. Finally, it preverifies the processed
code for Java 6 or for Java Micro Edition.
Some uses of ProGuard are:
Creating more compact code, for smaller code archives, faster transfer across networks, faster loading, and smaller memory footprints.
Making programs and libraries harder to reverse-engineer.
Listing dead code, so it can be removed from the source code.
Retargeting and preverifying existing class files for Java 6 or higher, to take full advantage of their faster class loading.
Here is an example for listing dead code: https://www.guardsquare.com/en/products/proguard/manual/examples#deadcode
One thing I've been known to do in Eclipse, on a single class, is change all of its methods to private and then see what complaints I get. For methods that are used, this will provoke errors, and I return them to the lowest access level I can. For methods that are unused, this will provoke warnings about unused methods, and those can then be deleted. And as a bonus, you often find some public methods that can and should be made private.
But it's very manual.
Use a test coverage tool to instrument your codebase, then run the application itself, not the tests.
Emma and Eclemma will give you nice reports of what percentage of what classes are run for any given run of the code.
We've started to use Find Bugs to help identify some of the funk in our codebase's target-rich environment for refactorings. I would also consider Structure 101 to identify spots in your codebase's architecture that are too complicated, so you know where the real swamps are.
In theory, you can't deterministically find unused code. There's a mathematical proof of this (well, this is a special case of a more general theorem). If you're curious, look up the Halting Problem.
This can manifest itself in Java code in many ways:
Loading classes based on user input, config files, database entries, etc;
Loading external code;
Passing object trees to third party libraries;
etc.
That being said, I use IntelliJ IDEA as my IDE of choice, and it has extensive analysis tools for finding dependencies between modules, unused methods, unused members, unused classes, etc. It's quite intelligent too: a private method that isn't called is tagged unused, but a public method requires more extensive analysis.
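To illustrate the first item in the list above (loading classes based on config), here is a small sketch. The class names and the `plugin.class` property key are made up for the example. No code refers to CsvReportPlugin through a compile-time symbol, so a purely static tool would likely flag it as unused even though it runs:

```java
import java.util.Properties;

class ReflectiveLoadingDemo {
    /** Looks unused to a static tool: nothing references this class by symbol. */
    static class CsvReportPlugin implements Runnable {
        @Override
        public void run() {
            System.out.println("CSV report generated");
        }
    }

    /** Instantiates whatever class the configuration names. */
    static Runnable loadPlugin(Properties config) throws ReflectiveOperationException {
        String className = config.getProperty("plugin.class");
        Class<?> type = Class.forName(className);
        return (Runnable) type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Properties config = new Properties();
        // In a real system this value would come from a file, a database or user input.
        // The name is assembled as a string, so no compile-time reference exists.
        config.setProperty("plugin.class", ReflectiveLoadingDemo.class.getName() + "$CsvReportPlugin");
        loadPlugin(config).run();
    }
}
```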
In Eclipse, go to Window > Preferences > Java > Compiler > Errors/Warnings and change all of them to errors. Fix all the errors. This is the simplest way. The beauty is that this will allow you to clean up the code as you write.
IntelliJ has code analysis tools for detecting unused code. You should try making as many fields/methods/classes non-public as possible; that will reveal more unused methods/fields/classes.
I would also try to locate duplicate code as a way of reducing code volume.
My last suggestion is to look for open source code which, if used, would make your code simpler.
The Structure101 slice perspective will give a list (and dependency graph) of any "orphans" or "orphan groups" of classes or packages that have no dependencies to or from the "main" cluster.
DCD is not a plugin for some IDE but can be run from ant or standalone. It looks like a static tool and it can do what PMD and FindBugs can't. I will try it.
P.S. As mentioned in a comment below, the Project lives now in GitHub.
There are tools which profile code and provide code coverage data. This lets you see (as code is run) how much of it is being called. You can get any of these tools to find out how much orphan code you have.
FindBugs is excellent for this sort of thing.
PMD (Project Mess Detector) is another tool that can be used.
However, neither can find public static methods that are unused in a workspace. If anyone knows of such a tool then please let me know.
Use coverage tools, such as EMMA. But it's not a static tool (i.e. it requires actually running the application through regression testing, and through all possible error cases, which is, well, impossible :) )
Still, EMMA is very useful.
Code coverage tools, such as Emma, Cobertura, and Clover, will instrument your code and record which parts of it get invoked by running a suite of tests. This is very useful, and should be an integral part of your development process. It will help you identify how well your test suite covers your code.
However, this is not the same as identifying real dead code. It only identifies code that is covered (or not covered) by tests. This can give you false positives (if your tests do not cover all scenarios) as well as false negatives (if your tests access code that is actually never used in a real world scenario).
I imagine the best way to really identify dead code would be to instrument your code with a coverage tool in a live running environment and to analyse code coverage over an extended period of time.
If you are running in a load-balanced, redundant environment (and if not, why not?) then I suppose it would make sense to instrument only one instance of your application and to configure your load balancer such that a random, but small, portion of your users run on your instrumented instance. If you do this over an extended period of time (to make sure that you have covered all real world usage scenarios, such as seasonal variations), you should be able to see exactly which areas of your code are accessed under real world usage and which parts are really never accessed and hence dead code.
I have never personally seen this done, and do not know how the aforementioned tools can be used to instrument and analyse code that is not being invoked through a test suite - but I am sure they can be.
There is a Java project - Dead Code Detector (DCD). For source code it doesn't seem to work well, but for .jar file - it's really good. Plus you can filter by class and by method.
NetBeans: here is a dead code detector plugin for NetBeans.
It would be better if it could link to and highlight the unused code. You can vote and comment here: Bug 181458 - Find unused public classes, methods, fields
Eclipse can show/highlight code that can't be reached. JUnit can show you code coverage, but you'd need some tests and have to decide if the relevant test is missing or the code is really unused.
I found Clover coverage tool which instruments code and highlights the code that is used and that is unused. Unlike Google CodePro Analytics, it also works for WebApplications (as per my experience and I may be incorrect about Google CodePro).
The only drawback that I noticed is that it does not take Java interfaces into account.
I use Doxygen to develop a method call map to locate methods that are never called. On the graph you will find islands of method clusters without callers. This doesn't work for libraries, since you always need to start from some main entry point.
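The "islands without callers" idea amounts to a reachability search over the call graph. The sketch below assumes the graph has already been extracted by some tool into a map from caller to callees; the map contents and method names are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CallGraphOrphans {
    /** Returns every method in the graph that cannot be reached from the entry points. */
    static Set<String> unreachable(Map<String, List<String>> calls, Set<String> entryPoints) {
        Set<String> reached = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(entryPoints);
        while (!pending.isEmpty()) {
            String method = pending.pop();
            if (reached.add(method)) {
                // First visit: queue everything this method calls.
                pending.addAll(calls.getOrDefault(method, List.of()));
            }
        }
        // Collect all known methods, callers and callees alike, then drop the reached ones.
        Set<String> all = new TreeSet<>(calls.keySet());
        calls.values().forEach(all::addAll);
        all.removeAll(reached);
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = Map.of(
                "App.main", List.of("Orders.load", "Orders.print"),
                "Orders.load", List.of("Db.query"),
                "OldReport.render", List.of("OldReport.format"));
        // prints [OldReport.format, OldReport.render]
        System.out.println(unreachable(calls, Set.of("App.main")));
    }
}
```

Note that OldReport.format does have a caller, yet it is still reported, because its only caller is itself unreachable. Counting callers alone would miss that; searching from the entry points does not.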
I am wondering if there is a tool that can identify cohesive blocks of code within Java source code.
For example if I had a long method that I would like to extract another method from - is there any tool that automatically can tell me large chunks of code that would be worth extracting?
There are plug-ins like PMD (for Eclipse) and FindBugs that do static code review and flag code based on rules you have configured.
Google CodePro Analytics has an Eclipse plug-in that can provide a bunch of statistics like lines of code and cyclomatic complexity that can be good indicators that a method should be refactored.
I don't think you will find a tool that can automatically refactor 'cohesive' blocks of code into methods. There is too much subjectivity in that.
I looked for a similar tool with another question: https://stackoverflow.com/questions/12016289/tool-for-visualizing-dependencies-inside-a-java-class just on a slightly higher level: a single class.
I think the same answer applies: There isn't anything like that. There are tools though that provide information from which you might extract the information you are looking for.
I'd look into DependencyFinder. It provides access to all the bits and pieces of the code, so you could find clusters of code elements that access a common set of variables. Unfortunately I found the API a little confusing and not well documented, so you'll need some trial and error, or to get in contact with the author. I also think it probably won't give you access to whitespace. But I don't think this is a valid approach anyway.
Another tool you might want to look into is JaMoPP. It should even have information about whitespace. Although it is a Java Plugin, you can use the underlying library independently of Eclipse (I think).
Check out Sonar. It has very good support for finding duplicate code blocks.
Sonar uses PMD and FindBugs under the hood. It also generates some custom metrics, like class complexity and method complexity, which point to classes / methods that are too large and are candidates for breaking down.
Control blocks (i.e. conditionals and loops) are "cohesive" in that you cannot readily extract blocks of code that cross control block boundaries. Find blocks that can be replaced by a method call, that makes the original method easier to understand. You will have the best impact on complexity by extracting out the regions of deepest control flow nesting, so this is a good place to start. You don't need a tool as such - the code itself has the info you need.
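As a rough illustration of "regions of deepest control flow nesting", the sketch below counts braces line by line and reports where the deepest block opens. It is deliberately naive: braces inside strings, character literals and comments are not handled, and the method name deepestBlockLine is made up for this example.

```java
class NestingDepth {
    /** Returns the 1-based number of the line that opens the most deeply nested block. */
    static int deepestBlockLine(String source) {
        int depth = 0;
        int maxDepth = 0;
        int deepestLine = 0; // stays 0 when the source has no braces
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            for (char c : lines[i].toCharArray()) {
                if (c == '{') {
                    depth++;
                    if (depth > maxDepth) {
                        maxDepth = depth;
                        deepestLine = i + 1;
                    }
                } else if (c == '}') {
                    depth--;
                }
            }
        }
        return deepestLine;
    }

    public static void main(String[] args) {
        String method = String.join("\n",
                "void process(List<Order> orders) {",
                "    for (Order o : orders) {",
                "        if (o.isOpen()) {",
                "            while (o.hasNext()) {",
                "                o.advance();",
                "            }",
                "        }",
                "    }",
                "}");
        // prints: Deepest block opens on line 4
        System.out.println("Deepest block opens on line " + deepestBlockLine(method));
    }
}
```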
I am evaluating clover currently and wonder how to use it best. First I'd like to understand how it works conceptually.
1) What does instrumentation mean? Are the test-calls attached to implementation's statements?
2) How is this done? Are the tests actually executed with some fancy execution context (similar to JRebel e.g.) for this? Or is it more like static analysis ?
3) After a "clover-run", some DB is saved to disk, and based on this, reports are generated right? Is the DB-Format accessible? I mean Can I launch my own analysis on it, e.g. using my own reporting tools ? What information does the DB contain exactly? Can I see the mapping between test and implementation there ?
4) Are there other tools that find the mapping between test and implementation? Not just the numbers, but which test, actually covers a line of code ...
Thanks, Bastl.
How is this done? Are the tests actually executed with some fancy execution context (similar to JRebel e.g.) for this? Or is it more like static analysis?
During code instrumentation, Clover detects which methods are test methods (by default it recognizes JUnit3/4 and TestNG). Such methods get additional instrumentation instructions. In short, entering a test method will usually instantiate a dedicated coverage recorder which measures coverage exclusively for this test. More information about the per-test recording strategies available in Clover:
https://confluence.atlassian.com/display/CLOVER/Clover+Performance+Results
https://confluence.atlassian.com/display/CLOVER/About+Distributed+Per-Test+Coverage
After a "clover-run", some DB is saved to disk, and based on this, reports are generated right?
A Clover database (clover.db) contains information about code structure (packages, files, classes, methods, statements, branches); it also has information about test methods. There are also separate coverage recording files (produced at runtime) containing information about the number of "hits" of a given code element. Clover supports both global coverage (i.e. for the whole run) and per-test coverage (i.e. coverage from a single test).
More information is here:
https://confluence.atlassian.com/display/CLOVER/Managing+the+Coverage+Database
Is the DB-Format accessible?
The API is still in development (https://jira.atlassian.com/browse/CLOV-1375), but there is a possibility to get basic information. See:
https://confluence.atlassian.com/display/CLOVER/Database+Structure
for more details about DB model and code samples.
But the question is: do you really need to manually read this DB? You wrote that:
Can I see the mapping between test and implementation there ?
Such mapping is already provided by Clover - in the HTML report for example if you click on a source line it will pop up a list of test methods hitting this line.
PS: I'm a Clover developer at Atlassian, feel free to contact me if you have any questions.
What does instrumentation mean?
Additional code is woven in with your code.
Are the test-calls attached to implementation's statements?
I am not sure what you mean, but it could be instructions or calls to methods. Trivial methods will be inlined by the JIT at runtime.
How is this done?
There are many ways to do it, but often the Instrumentation class is used to capture when a class is being loaded, and a library like ObjectWeb's ASM is used to manipulate the code.
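A minimal sketch of the class-loading hook mentioned above, using the JDK's java.lang.instrument package. This is not how Clover itself works (Clover's internals are not shown in this thread); it only records class names and leaves the bytecode untouched, where a coverage tool would rewrite it with a library such as ASM. The class names LoadLoggingAgent and Recorder are hypothetical.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// In a real agent jar this class would be public and named in the
// Premain-Class attribute of the jar manifest.
class LoadLoggingAgent {
    /** Names of classes seen by the transformer, in internal form (e.g. "com/example/Foo"). */
    static final Set<String> LOADED = ConcurrentHashMap.newKeySet();

    /** Observes each class as it is loaded; a coverage tool would rewrite the bytes here. */
    static class Recorder implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null) {
                LOADED.add(className);
            }
            return null; // null tells the JVM to keep the bytecode unchanged
        }
    }

    /** Entry point the JVM calls when started with -javaagent:<path to agent jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Recorder());
    }

    public static void main(String[] args) {
        // Without a packaged agent jar, call the transformer directly to show the flow.
        new Recorder().transform(null, "com/example/Foo", null, null, new byte[0]);
        System.out.println(LOADED);
    }
}
```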
Are the tests actually executed with some fancy execution context
The context counts which lines have been executed.
Or is it more like static analysis ?
No, it is based on what is called.
After a "clover-run", some DB is saved to disk, and based on this, reports are generated right? Is the DB-Format accessible?
You had best ask the producers of clover as to the content of their files.
Are there other tools that find the mapping between test and implementation? Not just the numbers, but which test, actually covers a line of code ...
There are many code coverage tools available, including EMMA, JaCoCo and Cobertura; IDEA also has one built in.
Is it possible to dump the complete program execution in Java? I have to go through the complete process flow of an execution for specific input values. Using step over and step into is a bit time-consuming, and I wanted to find out if any Java command dumps the execution.
Maybe you want to have a look at the Chronon Time Travel Debugger.
I haven't tried it out yet, after a long beta period it seems to be now officially available and may satisfy your demands. It's a commercial product, but offers a free time trial.
Another alternative may be debugging against a core file using the jsadebugd utility provided with the JDK (you can't step forwards and backwards, but you can examine the stack/monitors of all threads, which might already help you).
If you only need the method calls, as stated in a comment, maybe a profiler which uses instrumentation like jprofiler or yourkit will also be helpful.
Or you want to have a look at btrace, a dtrace-like tool.
If you're able to modify/build the application, also some sort of a small AOP method interceptor will do the job.
If I understand correctly, you want something like a view of all the method calls that happen when your program processes some set of inputs. You can often get this kind of information out of a profiler, such as JProbe:
http://www.quest.com/jprobe/
You can run the program under JProbe, and then it will present a visual call graph of all of the method calls or a list of all method calls along with their frequency of execution.
Somewhat related are static analysis tools, such as Understand:
http://www.scitools.com/
Static analysis tools tend to focus on figuring out overall code structure rather than what happens with a specific set of inputs though.
Of course, you can always change code, but it's probably too much work to change every method in a large system to print a debugging string. Aspect-oriented programming tends to be a good approach for this kind of problem, because it's a cross-cutting concern across the codebase. There are a few different Java AOP solutions. I've used Spring AOP with dynamic proxies, which isn't enough to cover all method executions, but it is good enough for covering any method execution defined on an interface for a bean managed in a Spring container:
http://static.springsource.org/spring/docs/3.1.0.M1/spring-framework-reference/html/aop.html
For example, I've written a TimingAspect that wraps the execution of a method and logs its execution time after it completes. When I want to use it, I update my Spring applicationContext.xml to specify pointcuts for the methods I want to measure. You could define a similar TracingAspect to print a debugging message at the start of each method execution. Just remember to leave this off for production deployment.
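The TracingAspect idea can be sketched without Spring by using a plain JDK dynamic proxy. This is not the answerer's aspect, just a stand-in for the same technique with the same limitation the answer mentions: only methods declared on an interface are intercepted. Greeter and traced are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class TracingProxyDemo {
    /** Hypothetical service interface; only interface methods can be intercepted this way. */
    interface Greeter {
        String greet(String name);
    }

    /** Wraps target so that each interface call is appended to trace before it runs. */
    static <T> T traced(T target, Class<T> iface, List<String> trace) {
        InvocationHandler handler = (proxy, method, args) -> {
            trace.add(iface.getSimpleName() + "." + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Greeter greeter = traced(name -> "Hello, " + name, Greeter.class, trace);
        System.out.println(greeter.greet("world")); // prints Hello, world
        System.out.println(trace);                  // prints [Greeter.greet]
    }
}
```

In a real application the trace would go to a logger rather than a list, and, as the answer says, it should be left off in production.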
For all of these approaches, measuring every single method call is probably going to cause information overload. You'll probably want to selectively measure just a few important pieces of your own codebase, filtering out core JDK methods and third-party libraries.
We're trying to separate a big code base into logical modules. I would like some recommendations for tools as well as whatever experiences you might have had with this sort of thing.
The application consists of a server WAR and several rich-clients distributed in JARs. The trouble is that it's all in one big, hairy code base, one source tree of > 2k files war. Each JAR has a dedicated class with a main method, but the tangle of dependencies ensnares quickly. It's not all that bad, good practices were followed consistently and there are components with specific tasks. It just needs some improvement to help our team scale as it grows.
The modules will each be in a Maven project, built by a parent POM. The process has already started with moving each JAR/WAR into its own project, but it's obvious that this will only scratch the surface: a few classes in each app JAR and a mammoth "legacy" project with everything else. Also, there are already some unit and integration tests.
Anyway, I'm interested in tools, techniques, and general advice for breaking up an overly large and entangled code base into something more manageable. Free/open source is preferred.
Have a look at Structure 101. It is awesome for visualizing dependencies, and showing the dependencies to break on your way to a cleaner structure.
We recently have accomplished a similar task, i.e. a project that consisted of > 1k source files with two main classes that had to be split up. We ended up with four separate projects, one for the base utility classes, one for the client database stuff, one for the server (the project is a rmi-server-client application), and one for the client gui stuff. Our project had to be separated because other applications were using the client as a command line only and if you used any of the gui classes by accident you were experiencing headless exceptions which only occurred when starting on the headless deployment server.
Some things to keep in mind from our experience:
Use an entire sprint for separating the projects (don't let other tasks interfere with the split-up, for you will need the whole sprint)
Use version control
Write unit tests before you move any functionality somewhere else
Use a continuous integration system (doesn't matter if home grown or out of the box)
Minimize the number of files in the current changeset (you will save yourself a lot of work when you have to undo some changes)
Use a dependency analysis tool all the way before moving classes (we have had good experiences with DependencyFinder)
Take the time to restructure the packages into reasonable per project package sets
Don't fear to change interfaces but have all dependent projects in the workspace so that you get all the compilation errors
Two pieces of advice: the first thing you need is test suites. The second is to work in small steps.
If you have a strong test suite already then you're in a good position. Otherwise, I would write some good high-level tests (aka system tests).
The main advantage of high-level tests is that a relatively small number of tests can get you great coverage. They will not help you pinpoint a bug, but you won't really need that: if you work in small steps and make sure to run the tests after each change, you'll be able to quickly detect (accidentally introduced) bugs, because the root of the bug is in the small portion of the code that has changed since the last time you ran the tests.
I would start with the various tasks that you need to accomplish.
I was faced with a similar task recently, given a 15 year old code base that had been made by a series of developers who didn't have any communication with one another (one worked on the project, left, then another got hired, etc, with no crosstalk). The result is a total mishmash of very different styles and quality.
To make it work, we've had to isolate the necessary functionality from the decorative fluff. For instance, there are a lot of different string classes in there, and one person spent what must have been a great deal of time making a 2k line conversion from COleDateTime to const char* and back again; that was fluff, code to solve a task ancillary to the main goal (getting things into and out of a database).
What we ended up having to do was identify a large goal that this code accomplished, and then writing the base logic for that. When there was a task we needed to accomplish that we know had been done before, we found it and wrapped it in library calls, so that it could exist on its own. One code chunk, for instance, activates a USB device driver to create an image; that code is untouched by this current project, but called when necessary via library calls. Another code chunk works the security dongle, and still another queries remote servers for data. That's all necessary code that can be encapsulated. The drawing code, though, was built over 15 years and such an edifice to insanity that a rewrite in OpenGL over the course of a month was a better use of time than to try to figure out what someone else had done and then how to add to it.
I'm being a bit handwavy here, because our project was MFC C++ to .NET C#, but the basic principles apply:
find the major goal
identify all the little goals that make the major goal possible
Isolate the already encapsulated portions of code, if any, to be used as library calls
figure out the logic to piece it all together.
I hope that helps...
To continue Itay's answer, I suggest reading Michael Feathers' "Working Effectively With Legacy Code" (pdf). He also recommends that every step be backed by tests. There is also a book-length version.
Maven allows you to set up small projects as children of a larger one. If you want to extract a portion of your project as a separate library for other projects, then Maven lets you do that as well.
Having said that, you definitely need to document your tasks, what each smaller project will accomplish, and then (as has been stated here multiple times) test, test, test. You need tests which work through the whole project, then have tests that work with the individual portions of the project which will wind up as child projects.
When you start to pull out functionality, you need additional tests to make sure that your functionality is consistent, and that you can mock input into your various child projects.