Tool to identify cohesive blocks of (JAVA) code

Tool to identify cohesive blocks of (JAVA) code - java

I am wondering if there is a tool that can identify cohesive blocks of code within JAVA source code.
For example if I had a long method that I would like to extract another method from - is there any tool that automatically can tell me large chunks of code that would be worth extracting?

There are plug-ins like PMD (for eclipse) & FindBugs etc., to do static code review which flags code based on rules your configured.

Google CodePro Analytics has an Eclipse plug-in that can provide a bunch of statistics like lines of code and cyclomatic complexity that can be good indicators that a method should be refactored.
I don't think you will find a tool that can automatically refactor 'cohesive' blocks of code into methods. There is too much subjectivity in that.

I looked for a similar tool with another question: https://stackoverflow.com/questions/12016289/tool-for-visualizing-dependencies-inside-a-java-class just on a slightly higher level: a single class.
I think the same answer applies: There isn't anything like that. There are tools though that provide information from which you might extract the information you are looking for.
I'd look into DependencyFinder. It provides access to all the bits and pieces of the code, so you could find clusters of code elements that access a common set of variables. Unfortunately I found the API a little confusing and not well documented, so you'll need some try and error or get into contact with the author. It also probably won't give you access to whitespace I think. But I don't think this is a valid approach anyway.
Another Tool you might want to look into is JaMoPP It should even have information about whitespace. Although it is a Java Plugin you can use the underlying library independent of eclipse (I think).

Check out Sonar It has very good support for finding duplicate code blocks.
Sonar uses PMD and FindBugs underlying. It also generates some custom metrics like class complexity, method complexity which points to classes / methods that are too large and which are candidate for breaking down.

Control blocks (i.e. conditionals and loops) are "cohesive" in that you cannot readily extract blocks of code that cross control block boundaries. Find blocks that can be replaced by a method call, that makes the original method easier to understand. You will have the best impact on complexity by extracting out the regions of deepest control flow nesting, so this is a good place to start. You don't need a tool as such - the code itself has the info you need.

Related

How to find a list of methods used only within tests [duplicate]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
What tools do you use to find unused/dead code in large java projects? Our product has been in development for some years, and it is getting very hard to manually detect code that is no longer in use. We do however try to delete as much unused code as possible.
Suggestions for general strategies/techniques (other than specific tools) are also appreciated.
Edit: Note that we already use code coverage tools (Clover, IntelliJ), but these are of little help. Dead code still has unit tests, and shows up as covered. I guess an ideal tool would identify clusters of code which have very little other code depending on it, allowing for docues manual inspection.

An Eclipse plugin that works reasonably well is Unused Code Detector.
It processes an entire project, or a specific file and shows various unused/dead code methods, as well as suggesting visibility changes (i.e. a public method that could be protected or private).

CodePro was recently released by Google with the Eclipse project. It is free and highly effective. The plugin has a 'Find Dead Code' feature with one/many entry point(s). Works pretty well.

I would instrument the running system to keep logs of code usage, and then start inspecting code that is not used for months or years.
For example if you are interested in unused classes, all classes could be instrumented to log when instances are created. And then a small script could compare these logs against the complete list of classes to find unused classes.
Of course, if you go at the method level you should keep performance in mind. For example, the methods could only log their first use. I dont know how this is best done in Java. We have done this in Smalltalk, which is a dynamic language and thus allows for code modification at runtime. We instrument all methods with a logging call and uninstall the logging code after a method has been logged for the first time, thus after some time no more performance penalties occur. Maybe a similar thing can be done in Java with static boolean flags...

I'm suprised ProGuard hasn't been mentioned here. It's one of the most mature products around.
ProGuard is a free Java class file shrinker, optimizer, obfuscator,
and preverifier. It detects and removes unused classes, fields,
methods, and attributes. It optimizes bytecode and removes unused
instructions. It renames the remaining classes, fields, and methods
using short meaningless names. Finally, it preverifies the processed
code for Java 6 or for Java Micro Edition.
Some uses of ProGuard are:
Creating more compact code, for smaller code archives, faster transfer across networks, faster loading, and smaller memory
footprints.
Making programs and libraries harder to reverse-engineer.
Listing dead code, so it can be removed from the source code.
Retargeting and preverifying existing class files for Java 6 or higher, to take full advantage of their faster class loading.
Here example for list dead code: https://www.guardsquare.com/en/products/proguard/manual/examples#deadcode

One thing I've been known to do in Eclipse, on a single class, is change all of its methods to private and then see what complaints I get. For methods that are used, this will provoke errors, and I return them to the lowest access level I can. For methods that are unused, this will provoke warnings about unused methods, and those can then be deleted. And as a bonus, you often find some public methods that can and should be made private.
But it's very manual.

Use a test coverage tool to instrument your codebase, then run the application itself, not the tests.
Emma and Eclemma will give you nice reports of what percentage of what classes are run for any given run of the code.

We've started to use Find Bugs to help identify some of the funk in our codebase's target-rich environment for refactorings. I would also consider Structure 101 to identify spots in your codebase's architecture that are too complicated, so you know where the real swamps are.

In theory, you can't deterministically find unused code. Theres a mathematical proof of this (well, this is a special case of a more general theorem). If you're curious, look up the Halting Problem.
This can manifest itself in Java code in many ways:
Loading classes based on user input, config files, database entries, etc;
Loading external code;
Passing object trees to third party libraries;
etc.
That being said, I use IDEA IntelliJ as my IDE of choice and it has extensive analysis tools for findign dependencies between modules, unused methods, unused members, unused classes, etc. Its quite intelligent too like a private method that isn't called is tagged unused but a public method requires more extensive analysis.

In Eclipse Goto Windows > Preferences > Java > Compiler > Errors/Warnings
and change all of them to errors. Fix all the errors. This is the simplest way. The beauty is that this will allow you to clean up the code as you write.
Screenshot Eclipse Code :

IntelliJ has code analysis tools for detecting code which is unused. You should try making as many fields/methods/classes as non-public as possible and that will show up more unused methods/fields/classes
I would also try to locate duplicate code as a way of reducing code volume.
My last suggestion is try to find open source code which if used would make your code simpler.

The Structure101 slice perspective will give a list (and dependency graph) of any "orphans" or "orphan groups" of classes or packages that have no dependencies to or from the "main" cluster.

DCD is not a plugin for some IDE but can be run from ant or standalone. It looks like a static tool and it can do what PMD and FindBugs can't. I will try it.
P.S. As mentioned in a comment below, the Project lives now in GitHub.

There are tools which profile code and provide code coverage data. This lets you see (as code is run) how much of it is being called. You can get any of these tools to find out how much orphan code you have.

FindBugs is excellent for this sort of thing.
PMD (Project Mess Detector) is another tool that can be used.
However, neither can find public static methods that are unused in a workspace. If anyone knows of such a tool then please let me know.

User coverage tools, such as EMMA. But it's not static tool (i.e. it requires to actually run the application through regression testing, and through all possible error cases, which is, well, impossible :) )
Still, EMMA is very useful.

Code coverage tools, such as Emma, Cobertura, and Clover, will instrument your code and record which parts of it gets invoked by running a suite of tests. This is very useful, and should be an integral part of your development process. It will help you identify how well your test suite covers your code.
However, this is not the same as identifying real dead code. It only identifies code that is covered (or not covered) by tests. This can give you false positives (if your tests do not cover all scenarios) as well as false negatives (if your tests access code that is actually never used in a real world scenario).
I imagine the best way to really identify dead code would be to instrument your code with a coverage tool in a live running environment and to analyse code coverage over an extended period of time.
If you are runnning in a load balanced redundant environment (and if not, why not?) then I suppose it would make sense to only instrument one instance of your application and to configure your load balancer such that a random, but small, portion of your users run on your instrumented instance. If you do this over an extended period of time (to make sure that you have covered all real world usage scenarios - such seasonal variations), you should be able to see exactly which areas of your code are accessed under real world usage and which parts are really never accessed and hence dead code.
I have never personally seen this done, and do not know how the aforementioned tools can be used to instrument and analyse code that is not being invoked through a test suite - but I am sure they can be.

There is a Java project - Dead Code Detector (DCD). For source code it doesn't seem to work well, but for .jar file - it's really good. Plus you can filter by class and by method.

Netbeans here is a plugin for Netbeans dead code detector.
It would be better if it could link to and highlight the unused code. You can vote and comment here: Bug 181458 - Find unused public classes, methods, fields

Eclipse can show/highlight code that can't be reached. JUnit can show you code coverage, but you'd need some tests and have to decide if the relevant test is missing or the code is really unused.

I found Clover coverage tool which instruments code and highlights the code that is used and that is unused. Unlike Google CodePro Analytics, it also works for WebApplications (as per my experience and I may be incorrect about Google CodePro).
The only drawback that I noticed is that it does not takes Java interfaces into account.

I use Doxygen to develop a method call map to locate methods that are never called. On the graph you will find islands of method clusters without callers. This doesn't work for libraries since you need always start from some main entry point.

Is there a Java tool to reduce the visibility of methods

I have a legacy code base that I want to refactor to reduce the visibility of methods to the minimum possible (private, protected, default) such that the code still works. Many of the methods in the codebase are unnecessarily public, and I'd like to change that to reduce the interface burden and simplify documentation as the code evolves in the future. Is there a tool that will analyze the codebase and generate a list of suggested methods whose visibility can be reduced? I can specify all the entry points into the code (just the main methods), and the codebase doesn't use reflection.
An Eclipse plugin will be even better.

Take a look at UCDetector (Unnecessary Code Detector - pronounced "You See Detector")
http://www.ucdetector.org/
It is an eclipse plugin and it helps to reduce visibility. I have used it for long time and it works very well.

Although there are tools that you can use to report the occurrence of methods which visibility can be reduced, I am not aware of something that allows you to transform the code to solve those issues.
However, you may find interesting taking a look to JTransformer and Ekeko.
Both allows you to query and accomplish custom code transformations based on logic programming techniques. JTransformer may be a bit more mature, but Ekeko also looks quite interesting. To the best of my knowledge they both are open source and include an Eclipse plugin.

Keeping track of utility classes

I've recently been more and more frustrated with a problem I see emerging in my projects code-base.
I'm working on a large scale java project that has >1M lines of code. The interfaces and class structure are designed very well and the engineers writing the code are very proficient. The problem is that in an attempt to make the code cleaner people write Utility classes whenever they need to reuse some functionality, as a result over time and as the project grows more and more utility methods crop up. However, when the next engineer comes across the need for the same functionality he has no way of knowing that someone had already implemented a utility class (or method) somewhere in the code and implements another copy of the functionality in a different class. The result is a lot of code duplication and too many utility classes with overlapping functionality.
Are there any tools or any design principles which we as a team can implement in order to prevent the duplication and low visibility of the utility classes?
Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it. Engineer B has several places where he serializes Documents into various formats including String, so he writes a utility class called SerializationUtil and has a static method called serialize(Document) which returns a String.
Note that this is more than just code-duplication as it is quite possible that the 2 implementations of the above example are different (say one uses transformer API and the other uses Xerces2-J) so this can be seen as a "best-practices" problem as well...
Update: I guess I better describe the current environment we develop in.
We use Hudson for CI, Clover for code coverage and Checkstyle for static code analysis.
We use agile development including daily talks and (perhaps insufficient) code reviews.
We define all our utility classes in a .util which due to it's size now has 13 sub-packages and about 60 classes under the root (.util) class. We also use 3rd party libraries such as most of the apache commons jars and some of the jars that make up Guava.
I'm positive that we can reduce the amount of utilities by half if we put someone on the task of refactoring that entire package, I was wondering if there are any tools which can make that operation less costly, and if there are any methodologies which can delay as much as possible the problem from recurring.

A good solution to this problem is to start adding more object-orientation. To use your example:
Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it
The solution is to stop using primitive types or types provided by the JVM (String, Integer, java.util.Date, java.w3c.Document) and wrap them in your own project-specific classes. Then your XmlDocument class can provide a convenient toString method and other utility methods. Your own ProjectFooDate can contain the parsing and formatting methods that would otherwise end up in various DateUtils classes, etc.
This way, the IDE will prompt you with your utility methods whenever you try to do something with an object.

Your problem is a very common one. And a real problem too, because there is no good solution.
We are in the same situation here, well I'd say worse, with 13 millions line of code, turnover and more than 800 developers working on the code. We often discuss about the very same problem that you describe.
The first idea - that your developers have already used - is to refactor common code in some utility classes. Our problem with that solution, even with pair programming, mentoring and discussion, is that we are simply too many for this to be effective. In fact we grow in subteams, with people sharing knowledge in their subteam, but the knowledge doesn't transit between subteams. Maybe we are wrong but I think that even pair programming and talks can't help in this case.
We also have an architecture team. This team is responsible to deal with design and architecture concerns and to make common utilities that we might need. This team in fact produces something we could call a corporate framework. Yes, it is a framework, and sometimes it works well. This team is also responsible to push best practices and to raise awareness of what should be done or not, what is available or what is not.
Good core Java API design is one of the reason for Java success. Good third party open sources libraries count a lot too. Even a small well crafted API allows to offer a really useful abstraction and can help reduce code size a lot. But you know, making framework and public API is not the same thing at all as just coding an utility class in 2 hours. It has a really high cost. An utility class costs 2 hours for the initial coding, maybe 2 days with debugging and unit tests. When you start sharing common code on big projects/teams, you really make an API. You must ensure perfect documentation then, really readable and maintainable code. When you release new version of this code, you must stay backward compatible. You have to promote it company wide (or at least team wide). From 2 days for your small utility class you grow to 10 days, 20 days or even 50 days for a full-fledged API.
And your API design may not be so great. Well, it is not that your engineers are not bright - indeed they are. But are you willing to let them work 50 days on a small utility class that just help parsing number in a consistent way for the UI? Are you willing to let them redesign the whole thing when you start using a mobile UI with totally different needs? Also have you noticed how the brightest engineers in the word make APIs that will never be popular or will fade slowly? You see, the first web project we made used only internal frameworks or no framework at all. We then added PHP/JSP/ASP. Then in Java we added Struts. Now JSF is the standard. And we are thinking about using Spring Web Flow, Vaadin or Lift...
All I want to say is that there is no good solution, the overhead grows exponentially with code size and team size. Sharing a big codebase restricts your agility and responsiveness. Any change must be done carefully, you must think of all potential integration problems and everybody must be trained of the new specificities and features.
But the main productivity point in a software company is not to gain 10 or even 50 lines of code when parsing XML. A generic code to do this will grow to a thousand lines of code anyway and recreates a complex API that will be layered by utility classes. When the guy make an utility class for parsing XML, it is good abstraction. He give a name to one dozen or even one hundred lines of specialized code. This code is useful because it is specialized. The common API allows to work on streams, URL, strings, whatever. It has a factory so you can choose you parser implementation. The utility class is good because it work only with this parser and with strings. And because you need one line of code to call it. But of course, this utility code is of limited use. It works well for this mobile application, or for loading XML configuration. And that's why the developer added the utility class for it in the first place.
In conclusion, what I would consider instead of trying to consolidate the code for the whole codebase is to split code responsibility as the teams grow:
transform your big team that work on one big project into small teams that work on several subprojects;
ensure that interfacing is good to minimize integration problems, but let team have their own code;
inside theses teams and corresponding codebases, ensure you have the best practices. No duplicate code, good abstractions. Use existing proven APIs from the community. Use pair programming, strong API documentation, wikis... But you should really let different teams make their choices, build their own code, even if this means duplicate code across teams or different design decisions. You know, if the design decisions are different this may be because the needs are different.
What you are really managing is complexity. In the end if you make one monolithic codebase, a very generic and advanced one, you increase the time for newcomers to ramp up, you increase the risk that developers will not use your common code at all, and you slow down everybody because any change has far greater chances to break existing functionality.

There are several agile/ XP practices you can use to address this, e.g.:
talk with each other (e.g. during daily stand-up meeting)
pair programming/ code review
Then create, document & test one or several utility library projects which can be referenced. I recommend to use Maven to manage dependecies/ versions.

You might consider suggesting that all utility classes be placed in a well organized package structure like com.yourcompany.util.. If people are willing to name sub packages and classes well, then at least if they need to find a utility, they know where to look. I don't think there is any silver bullet answer here though. Communication is important. Maybe if a developer sends a simple email to the rest of the development staff when they write a new utility, that will be enough to get it on people's radar. Or a shared wiki page where people can list/document them.

Team communication (shout out "hey does someone have a Document toString?")
Keep utility classes to an absolute minimum and restrict them to a single namespace
Always think: how can I do this with an object. In your example, I would extend the Document class and add those toString and serialize methods to it.

This problem is helped when combining IDE "code-completion" features with languages which support type extensions (e.g. C# and F#). So that, imagining Java had a such a feature, a programmer could explore all the extension methods on a class easily within the IDE like:
Document doc = ...
doc.to //list pops up with toXmlString, toJsonString, all the "to" series extension methods
Of course, Java doesn't have type extensions. But you could use grep to search your project for "all static public methods which take SomeClass as the first argument" to gain similar insight into what utility methods have already been written for a given class.

Its pretty hard to build a tool that recognizes "same functionality". (In theory this is in fact impossible, and where you can do it in practice you likely need a theorem prover).
But what often happens is people clone clode that is close to what they want, and then customize it. That kind of code you can find, using a clone detector.
Our CloneDR is a tool for detecting exact and near-miss cloned code based on using parameterized syntax trees. It matches parsed versions of the code, so it isn't confused by layout, changed comments, revised variable names, or in many cases, inserted or deleted statements. There are versions for many languages (C++, COBOL, C#, Java, JavaScript, PHP, ...) and you can see examples of clone detection runs at the provided
link. It typically finds 10-20% duplicated code, and if you abstract that code into library methods on a religious base, your code base can actually shrink (that has occurred with one organization using CloneDR).

You are looking for a solution that can you help you manage this inevitable problem, then I can suggest a tool:
TeamCity: an amazing easy to use product that manages all your automated code building from your repository and runs unit tests etc.
It's even a free product for most people.
The even better part: it has built in code duplicate detection across all your code.
More stuff to read up:
Tools to detect duplicated code (Java)

a standard application utility project. build a jar with the restricted extensibility scope and package based on functionality.
use common utilities like apache-commons or google collections and provide an abstraction
maintain knowledge-base and documentation and JIRA tracking for bugs and enhancements
evolutionary refactoring
findbugs and pmd for finding code duplication or bugs
review and test utility tools for performance
util karma! ask team members to contribute to the code base, whenever they find one in the existing jungle code or requiring new ones.

Functional depedency analysis

Developers who have used eclipse cannot miss out the Cntrl+Shift+G combo - the easiest way to find all references to a particular member/method/class in your workspace.
Consider a scenario where you are a new guy maintaining a web application written in java. Now, you are about to change a method signature, and you do a Cntl+Shift+G to find all references to the said method (yes, hoping that you are not doing depedency injection / reflection etc). However, a new guy, would want not to screw up any functionality in the application. How would ensure that the functional dependencies are not affected?
I guess..the question is a bit unclear.. lemme rephrase... Say you are changing something functional (an if loop in a business rule or whateva) - this will definetly CHANGE something else in the context of the application.. and at this point you wish there was something (a plugin?) in eclipse, that would tell you - "hey noob..don't change this - it would affect this..." - Now, if you were to create something that does this for eclipse (plugin?) - where would you start? (tagging parts of scr code and introducing a depdency tree? etc?)

Perhaps I failed to understand your question, but I think I might have an answer. Take a look at nWire for Java (or PHP). It is a plugin for code exploration. Focusing on a piece of code, the developer can quickly determine where the method is invoked, where the class is used, etc. This makes it easier to understand what you are about to change.
I am the developer of this plugin. If it is not exactly what you are looking for, let me know, I'll be happy to better understand what you are looking for.

Besides: ALT+SHIFT+C is the way to change a method signature. ALT+SHIFT+G "only" finds references, which is helpful of course.
vickirk mentionend the most important aspect here: Without having tests and a good code coverage you aren't able to apply any changes without risking a failing system afterwards.
The book "Working Effectively with Legacy Code" from Robert C Martin explains it nicely: All code, which is not covered by tests, is legacy code. You could draw the conclusion, that before you apply any functional change you need to ensure a sufficient test coverage.
Tagging parts in the source code seems like a bad idea, since these tags need to be additionally maintained, which usually never really happens in projects. :)

What about JDepend?

Cross-class-capable extendable static analysis tool for java?

I'm trying to write rules for detecting some errors in annotated multi-threaded java programs. As a toy example, I'd like to detect if any method annotated with #ThreadSafe calls a method without such an annotation, without synchronization. I'm looking for a tool that would allow me to write such a test.
I've looked at source analyzers, like CheckStyle and PMD, and they don't really have cross-class analysis capabilities. Bytecode analysers, like FindBugs and JLint seem rather difficult to extend.
I'd settle for a solution to something even simpler, but posing the same difficulty: writing a custom rule that checks whether each overriden method is annotated with #Override.

Have you tried FindBugs? It actually supports a set of annotations for thread safety (the same as those used in Java Concurrency in Practice). Also, you can write your own custom rules. I'm not sure whether you can do cross-class analysis, but I believe so.
Peter Ventjeer has a concurrency checking tool (that uses ASM) to detect stuff like this. I'm not sure if he's released it publicly but he might able to help you.
And I believe Coverity's static/dynamic analysis tools for thread safety do checking like this.

You can do cross-class analysis in PMD (though I've never used it for this specific purpose). I think it's possible using this visitor pattern that they document, though I'll leave the specifics to you.

A simple tool to checkup on annotations is apt (http://java.sun.com/j2se/1.5.0/docs/guide/apt/ also part of Java 6 api in javax.annotation.processing) however this only has type information (ie I couldn't find a quick way to get at the inheritance hierarchy using the javax.lang.model api, however if you can load the class you can get that information using reflection).

Try javap + regexes (eg. Perl)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.