How to determine which classes are used by a Java program?

Is there any tool that lists which classes are actually used by an app, and when, or, even better, automatically trims JAR libraries so that they only provide classes that are both referenced and used?

Bear in mind that, as a consequence of the halting problem, you can't say definitively that a particular class is or isn't used, at least not in any moderately complex application. That's because classes aren't just bound at compile time but can also be:
loaded based on XML config (e.g. Spring);
loaded from properties files (e.g. a JDBC driver name);
added dynamically with annotations;
loaded as a result of external input (e.g. user input, data from a database or a remote procedure call);
etc.
So just looking at source code isn't enough. That being said, any reasonable IDE will provide you with dependency analysis tools. IntelliJ certainly does.
What you really need is runtime instrumentation of what your application is doing, but even that isn't a guarantee. After all, a particular code path might come up once in 10 million runs due to a weird combination of inputs, so you can't be sure that you're covered.
Tools like this do have some value though. You might want to look at something like Emma. Profilers like YourKit can give you a code dump that you can analyse too (although that won't pick up transient objects terribly well).
Personally I find little value beyond what the IDE will tell you: removing unused JARs. Going more granular than that is just asking for trouble for little to no gain.
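To make the point about config-driven loading concrete, here is a minimal sketch. The class name `DynamicLoadExample` and the property key `impl` are made up for illustration; only `Class.forName` and the reflection calls are standard. Because the class name exists only as a string, a purely static reference search has nothing to find.

```java
import java.util.Properties;

public class DynamicLoadExample {
    /** Instantiates the class named under the hypothetical property key "impl". */
    public static Object instantiate(Properties config) {
        String className = config.getProperty("impl");
        try {
            // The class is named only by a string, so no compile-time reference exists.
            Class<?> type = Class.forName(className);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("impl", "java.util.ArrayList");
        System.out.println(instantiate(config).getClass().getName());
    }
}
```

A shrinker that only follows references from the source would see no use of `java.util.ArrayList` here, which is why such tools usually need explicit "keep" rules for classes named in configuration.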

Yes, you want ProGuard. It's a completely free Java code shrinker and obfuscator. It's easy to configure, fast and effective.

You might try JarJar http://code.google.com/p/jarjar/
It trims the jar dependencies.

For most cases, you can do it quite easily using just javac.
Delete your existing class files. Call javac with the names of your entry classes. It will compile the classes they need, but no more. Job done.

Related

How to find a list of methods used only within tests [duplicate]

What tools do you use to find unused/dead code in large java projects? Our product has been in development for some years, and it is getting very hard to manually detect code that is no longer in use. We do however try to delete as much unused code as possible.
Suggestions for general strategies/techniques (other than specific tools) are also appreciated.
Edit: Note that we already use code coverage tools (Clover, IntelliJ), but these are of little help. Dead code still has unit tests, and shows up as covered. I guess an ideal tool would identify clusters of code which have very little other code depending on them, allowing for focused manual inspection.

An Eclipse plugin that works reasonably well is Unused Code Detector.
It processes an entire project, or a specific file and shows various unused/dead code methods, as well as suggesting visibility changes (i.e. a public method that could be protected or private).

CodePro was recently released by Google with the Eclipse project. It is free and highly effective. The plugin has a 'Find Dead Code' feature with one/many entry point(s). Works pretty well.

I would instrument the running system to keep logs of code usage, and then start inspecting code that is not used for months or years.
For example if you are interested in unused classes, all classes could be instrumented to log when instances are created. And then a small script could compare these logs against the complete list of classes to find unused classes.
Of course, if you go down to the method level you should keep performance in mind. For example, the methods could log only their first use. I don't know how this is best done in Java. We have done this in Smalltalk, which is a dynamic language and thus allows for code modification at runtime. We instrument all methods with a logging call and uninstall the logging code after a method has been logged for the first time, so after some time no more performance penalties occur. Maybe a similar thing can be done in Java with static boolean flags...
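One possible way to approximate the "log only the first use" idea in Java is sketched below. It is not the Smalltalk technique itself: the call stays in place and merely becomes a cheap set lookup after the first hit. The class `UsageRecorder` and the example class names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class UsageRecorder {
    // Names seen so far; a concurrent set so the call is safe from any thread.
    private static final Set<String> USED = ConcurrentHashMap.newKeySet();

    /** Call from a constructor or method entry; a log line is written only on first use. */
    public static void markUsed(String name) {
        if (USED.add(name)) {
            System.out.println("first use: " + name);
        }
    }

    /** Returns the names from allNames that were never marked, in sorted order. */
    public static Set<String> unused(Set<String> allNames) {
        Set<String> result = new TreeSet<>(allNames);
        result.removeAll(USED);
        return result;
    }

    public static void main(String[] args) {
        markUsed("com.example.Invoice");
        markUsed("com.example.Invoice"); // already recorded, so nothing is logged
        System.out.println(unused(Set.of("com.example.Invoice", "com.example.LegacyReport")));
    }
}
```

The `unused` method plays the role of the "small script" that compares the usage log against the complete list of classes.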

I'm surprised ProGuard hasn't been mentioned here. It's one of the most mature products around.
ProGuard is a free Java class file shrinker, optimizer, obfuscator,
and preverifier. It detects and removes unused classes, fields,
methods, and attributes. It optimizes bytecode and removes unused
instructions. It renames the remaining classes, fields, and methods
using short meaningless names. Finally, it preverifies the processed
code for Java 6 or for Java Micro Edition.
Some uses of ProGuard are:
Creating more compact code, for smaller code archives, faster transfer across networks, faster loading, and smaller memory footprints.
Making programs and libraries harder to reverse-engineer.
Listing dead code, so it can be removed from the source code.
Retargeting and preverifying existing class files for Java 6 or higher, to take full advantage of their faster class loading.
Here is an example of listing dead code: https://www.guardsquare.com/en/products/proguard/manual/examples#deadcode

One thing I've been known to do in Eclipse, on a single class, is change all of its methods to private and then see what complaints I get. For methods that are used, this will provoke errors, and I return them to the lowest access level I can. For methods that are unused, this will provoke warnings about unused methods, and those can then be deleted. And as a bonus, you often find some public methods that can and should be made private.
But it's very manual.

Use a test coverage tool to instrument your codebase, then run the application itself, not the tests.
Emma and EclEmma will give you nice reports of what percentage of each class is run for any given run of the code.

We've started to use FindBugs to help identify some of the funk in our codebase's target-rich environment for refactorings. I would also consider Structure 101 to identify spots in your codebase's architecture that are too complicated, so you know where the real swamps are.

In theory, you can't deterministically find unused code. There's a mathematical proof of this (well, this is a special case of a more general theorem). If you're curious, look up the Halting Problem.
This can manifest itself in Java code in many ways:
Loading classes based on user input, config files, database entries, etc;
Loading external code;
Passing object trees to third party libraries;
etc.
That being said, I use IntelliJ IDEA as my IDE of choice and it has extensive analysis tools for finding dependencies between modules, unused methods, unused members, unused classes, etc. It's quite intelligent too: a private method that isn't called is tagged unused, but a public method requires more extensive analysis.

In Eclipse, go to Window > Preferences > Java > Compiler > Errors/Warnings
and change all of them to errors. Fix all the errors. This is the simplest way. The beauty is that this will allow you to clean up the code as you write.

IntelliJ has code analysis tools for detecting code which is unused. You should try making as many fields/methods/classes non-public as possible, and that will show up more unused methods/fields/classes.
I would also try to locate duplicate code as a way of reducing code volume.
My last suggestion is to try to find open source code which, if used, would make your code simpler.

The Structure101 slice perspective will give a list (and dependency graph) of any "orphans" or "orphan groups" of classes or packages that have no dependencies to or from the "main" cluster.

DCD is not a plugin for some IDE but can be run from ant or standalone. It looks like a static tool and it can do what PMD and FindBugs can't. I will try it.
P.S. As mentioned in a comment below, the project now lives on GitHub.

There are tools which profile code and provide code coverage data. This lets you see (as code is run) how much of it is being called. You can get any of these tools to find out how much orphan code you have.

FindBugs is excellent for this sort of thing.
PMD (Project Mess Detector) is another tool that can be used.
However, neither can find public static methods that are unused in a workspace. If anyone knows of such a tool then please let me know.

Use coverage tools, such as EMMA. But it's not a static tool (i.e. it requires actually running the application through regression testing, and through all possible error cases, which is, well, impossible :) )
Still, EMMA is very useful.

Code coverage tools, such as Emma, Cobertura, and Clover, will instrument your code and record which parts of it get invoked by running a suite of tests. This is very useful, and should be an integral part of your development process. It will help you identify how well your test suite covers your code.
However, this is not the same as identifying real dead code. It only identifies code that is covered (or not covered) by tests. This can give you false positives (if your tests do not cover all scenarios) as well as false negatives (if your tests access code that is actually never used in a real world scenario).
I imagine the best way to really identify dead code would be to instrument your code with a coverage tool in a live running environment and to analyse code coverage over an extended period of time.
If you are running in a load-balanced redundant environment (and if not, why not?) then I suppose it would make sense to instrument only one instance of your application and to configure your load balancer such that a random, but small, portion of your users run on your instrumented instance. If you do this over an extended period of time (to make sure that you have covered all real-world usage scenarios, such as seasonal variations), you should be able to see exactly which areas of your code are accessed under real-world usage and which parts are really never accessed and hence dead code.
I have never personally seen this done, and do not know how the aforementioned tools can be used to instrument and analyse code that is not being invoked through a test suite - but I am sure they can be.
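As one hedged illustration of observing a live JVM without a test suite, the standard `java.lang.instrument` API lets a Java agent ask which classes have been loaded. This is much coarser than line coverage and is not how the coverage tools named above work; the class name `LoadedClassAgent` and the idea of passing a package prefix as the agent argument are assumptions of this sketch.

```java
import java.lang.instrument.Instrumentation;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LoadedClassAgent {
    /** Names of the given classes that start with prefix, sorted. */
    public static List<String> namesWithPrefix(Class<?>[] classes, String prefix) {
        List<String> names = new ArrayList<>();
        for (Class<?> c : classes) {
            if (c.getName().startsWith(prefix)) {
                names.add(c.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Called by the JVM before main() when started with -javaagent:agent.jar=some.prefix */
    public static void premain(String prefix, Instrumentation inst) {
        String wanted = (prefix == null) ? "" : prefix;
        // At JVM shutdown, print every class still loaded under the requested prefix.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            for (String name : namesWithPrefix(inst.getAllLoadedClasses(), wanted)) {
                System.err.println("loaded: " + name);
            }
        }));
    }
}
```

To use it, the class would be packaged in a jar whose manifest contains `Premain-Class: LoadedClassAgent`. Comparing the printed names against the full class list over a long period gives candidates for inspection, not proof that a class is dead.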

There is a Java project, Dead Code Detector (DCD). It doesn't seem to work well on source code, but on a .jar file it's really good. Plus you can filter by class and by method.

For NetBeans, here is a plugin: NetBeans dead code detector.
It would be better if it could link to and highlight the unused code. You can vote and comment here: Bug 181458 - Find unused public classes, methods, fields

Eclipse can show/highlight code that can't be reached. JUnit can show you code coverage, but you'd need some tests and have to decide if the relevant test is missing or the code is really unused.

I found Clover coverage tool which instruments code and highlights the code that is used and that is unused. Unlike Google CodePro Analytics, it also works for WebApplications (as per my experience and I may be incorrect about Google CodePro).
The only drawback that I noticed is that it does not take Java interfaces into account.

I use Doxygen to develop a method call map to locate methods that are never called. On the graph you will find islands of method clusters without callers. This doesn't work for libraries since you always need to start from some main entry point.
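The "islands without callers" idea can be sketched as a reachability search over a call graph. The graph below is hand-written toy data with made-up method names; in practice it would come from a tool such as Doxygen or a bytecode analyser, and `UnreachableMethods` is a hypothetical helper, not part of any of those tools.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class UnreachableMethods {
    /** Returns every method in the graph that cannot be reached from any entry point. */
    public static Set<String> unreachable(Map<String, List<String>> calls, Set<String> entryPoints) {
        Set<String> seen = new HashSet<>(entryPoints);
        Deque<String> queue = new ArrayDeque<>(entryPoints);
        // Breadth-first walk along caller -> callee edges.
        while (!queue.isEmpty()) {
            String caller = queue.remove();
            for (String callee : calls.getOrDefault(caller, List.of())) {
                if (seen.add(callee)) {
                    queue.add(callee);
                }
            }
        }
        // Everything mentioned in the graph, as caller or callee, minus what was reached.
        Set<String> all = new TreeSet<>(calls.keySet());
        calls.values().forEach(all::addAll);
        all.removeAll(seen);
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = Map.of(
                "main", List.of("parse", "run"),
                "run", List.of("log"),
                "oldExport", List.of("oldFormat"));
        System.out.println(unreachable(calls, Set.of("main")));
    }
}
```

With this toy graph the island `oldExport -> oldFormat` is reported, because nothing reachable from `main` calls into it. Calls made through reflection would not appear as edges, so the result is a list of candidates only.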

Statically checking a Java app for link errors

I have a scenario where I have code written against version 1 of a library but I want to ship version 2 of the library instead. The code has shipped and is therefore not changeable. I'm concerned that it might try to access classes or members of the library that existed in v1 but have been removed in v2.
I figured it would be possible to write a tool to do a simple check to see if the code will link against the newer version of the library. I appreciate that the code may still be very broken even if the code links. I am thinking about this from the other side - if the code won't link then I can be sure there is a problem.
As far as I can see, I need to run through the bytecode checking for references, method calls and field accesses to library classes then use reflection to check whether the class/member exists.
I have a three-fold question:
(1) Does such a tool exist already?
(2) I have a niggling feeling it is much more complicated than I imagine and that I have missed something major - is that the case?
(3) Do you know of a handy library that would allow me to inspect the bytecode such that I can find the method calls, references etc.?
Thanks!

I think that Clirr - a binary compatibility checker - can help here:
Clirr is a tool that checks Java libraries for binary and source compatibility with older releases. Basically you give it two sets of jar files and Clirr dumps out a list of changes in the public api. The Clirr Ant task can be configured to break the build if it detects incompatible api changes. In a continuous integration process Clirr can automatically prevent accidental introduction of binary or source compatibility problems.

Changing the library in your IDE will result in all possible compile-time errors.
You don't need anything else, unless your code uses another library, which in turn uses the updated library.

Be especially wary of Spring configuration files. Class names are configured as text and don't show up as missing until runtime.

If you have access to the source code, you could just compile the source against the new library. If it doesn't compile, you definitely have a problem. If it compiles, you may still have a problem if the program uses reflection, some kind of IoC stuff like Spring, etc.
If you have unit tests, then you may have a better chance of catching any linking errors.
If you only have a .class file of the program, then I don't know of any tools that would help besides decompiling the class file to source and compiling the source again against the new library, but that doesn't sound too healthy.

The checks you mentioned are done by the JVM/Java class loader, see e.g. Linking of Classes and Interfaces.
So "attempting to link" can be simply achieved by trying to run the application. Of course you could hoist the checks to run them yourself on your collection of .class/.jar files. I guess a bunch of 3rd party byte code manipulators like BCEL will also do similar checks for you.
I notice that you mention reflection in the tags. If you load classes/invoke methods through reflection, there's no way to analyse this in general.
Good luck!
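The "check with reflection whether the class or member exists" step from the question could look roughly like the sketch below. The names to check are hard-coded placeholders here; a real tool would first extract them from the shipped bytecode with a library such as BCEL or ASM. `LinkCheck` is a hypothetical helper, and it only covers public methods, so it is a starting point rather than a full link check.

```java
public class LinkCheck {
    /** True if the named class can be found by this class's loader (without initializing it). */
    public static boolean classExists(String className) {
        try {
            Class.forName(className, false, LinkCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if the class declares or inherits a public method with this name and parameter types. */
    public static boolean publicMethodExists(String className, String method, Class<?>... params) {
        try {
            Class<?> type = Class.forName(className, false, LinkCheck.class.getClassLoader());
            type.getMethod(method, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Run with version 2 of the library on the classpath; JDK names are used as stand-ins.
        System.out.println(classExists("java.lang.String"));
        System.out.println(publicMethodExists("java.lang.String", "length"));
        System.out.println(publicMethodExists("java.lang.String", "noSuchMethod"));
    }
}
```

Note that a matching name and parameter list is not the whole story: a changed return type or a method that became static would also break linking, which this sketch does not detect.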

How to automate a build of a Java class and all the classes it depends on?

I guess this is kind of a follow-on to question 1522329.
That question talked about getting a list of all classes used at runtime via the java -verbose:class option.
What I'm interested in is automating the build of a JAR file which contains my class(es), and all other classes they rely on. Typically, this would be where I am using code from some third party open source product's "client logic" but they haven't provided a clean set of client API objects. Their complete set of code goes server-side, but I only need the necessary client bits.
This would seem a common issue but I haven't seen anything (e.g. in Eclipse) which helps with this. Am I missing something?
Of course I can still do it manually by: biting the bullet and including all the third-party code in a massive JAR (offending my purist sensibilities) / source walkthrough / trial and error / -verbose:class type stuff (but the latter wouldn't work where, say, my code runs as part of a J2EE servlet, and thus I only want to see this for a given Tomcat webapp and, ideally, only for classes related to my classes therein).

I would recommend using a build system such as Ant or Maven. Maven is designed with Java in mind, and is what I use pretty much exclusively. You can even have Maven assemble (using the assembly plugin) all of the dependent classes into one large jar file, so you don't have to worry about dependencies.
http://maven.apache.org/
Edit:
Regarding the servlet, you can also define which dependencies you want packaged up with your jar, and if you are making a stand alone application you can have the jar tool make an executable jar.
note: yes, I am a bit of a Maven advocate, as it has made the project I work on much easier. No I do not work on the project personally. :)

Take a look at ProGuard.
ProGuard is a free Java class file shrinker, optimizer, obfuscator, and preverifier. It detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions. It renames the remaining classes, fields, and methods using short meaningless names. Finally, it preverifies the processed code for Java 6 or for Java Micro Edition.

What you want is to include not only the classes you rely on, but also the classes that those classes rely on. And so on, and so forth.
So that's not really a build problem, but more a dependency one. To answer your question, you can either solve this with Maven (apparently) or Ant + Ivy.
I work with Ivy and I sometimes build an "ueber-jar" using the zipgroupfileset functionality of the Ant Jar task. Not very elegant, some would say, but it's done in 10 seconds :-)

What settings affect the layout of compiled java .class files? How can you tell if two compiled classes are equal?

I have an app that was compiled with the built-in Eclipse "Compile" task. Then I decided to move the build procedure into Ant's javac, and the result ended up being smaller files.
Later I discovered that adjusting the debuglevel to "vars,lines,source" I could embed the same debug information that Eclipse did, and in a lot of cases files stayed exactly the same size but the internal layout was different. And as a consequence, I couldn't use md5sum signatures to determine if they were exactly the same version.
Besides debug information, what can be the reason that 2 supposedly equal files get a different internal layout or size?
And how can you compare compiled .class files?

There is no required order for things such as the constant pool entries (essentially all of the symbol info) or the attributes for each field/method/class. Different compilers are free to write them out in whatever order they want.
You can compare compiled classes, but you would need to dig into the class file structure and parse it. There are libraries out there for doing that, like BCEL or ASM, but I am not 100% sure they will help you with what you want to do.
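As a small taste of what "parsing the class file structure" means, the sketch below reads only the fixed-size start of a class file: the magic number, the minor and major version, and the constant pool count. It does not normalize or compare anything; `ClassFileHeader` is a hypothetical helper, and a real comparison would go on to parse the constant pool and attributes, for example with BCEL or ASM.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ClassFileHeader {
    public final int minor;
    public final int major;
    public final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses magic, minor_version, major_version and constant_pool_count from class file bytes. */
    public static ClassFileHeader parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            return new ClassFileHeader(minor, major, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String toString() {
        return "version " + major + "." + minor + ", constant_pool_count " + constantPoolCount;
    }
}
```

Feeding it the bytes of two builds of the same class (for example via `java.nio.file.Files.readAllBytes`) is a quick first check: a different major version points at a different target setting, and a different constant pool count shows the two compilers emitted different symbol tables even if the file sizes match.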

The ASM Eclipse plugin has a bytecode comparer in it. You select two classes, right click, and do Compare With / Each Other Bytecode.

An important thing to note is that Eclipse does not use javac. Eclipse has its own compiler, the JDT, and so differences in the resulting .class files do not surprise me. I wouldn't expect them to be byte-for-byte identical, because they are different compilers.
Due to their differences, there exists code that compiles with javac but not JDT, and vice versa. Typically I have seen the differences between the two become apparent in cases of heavy use of generics.

Most importantly, the stack slots for local variables can be arranged arbitrarily without changing the semantics of the code. So basically, you cannot compare compiled class files without parsing and normalizing them - quite a lot of effort.
Why do you want to do that anyway?

as Michale B said, it can be arbitrary.
I work on systems that are using file sizes as security. If the .class files change in size, the class won't be given certain permissions.
Normally that would be easy to get around, but we have fairly complete control over the environment, so it's actually pretty functional.
Anyway, any time the classes that are watched are recompiled, it seems, we have to recalculate the size.
Another thing: a special key number is generated when the file is compiled. I don't know much about this, but it often prevents classes from working together. I believe the procedure is: compile class A and save it (call it A1), compile class A again (A2), compile class B against A2, then try to run B against A1. I believe that in this case it will fail at runtime.
If you could learn more about that key number, it might give you the info you are after.

For the comparison you can decompile your class files and play with the sources generated. See this.

Is Eclipse doing some instrumentation to assist with running in the debugger?

Ultimately the configurations being used are probably making the difference. Assuming they are using the same versions of Java, there are a host of options that are available for the compile configuration (JDK compliance, class file compatibility and a host of debugging information options).

How can I override a class using a separate jar?

A customer requires a preview of a new feature of our product. They asked to have that feature sent to them in a jar file (like a patch). There's no problem with including the new classes in said jar file. However, an existing class was modified, which is needed to integrate the new feature. They just want to add this new jar file without having to update the core classes of our product. So, the question is: is it possible to override an already existing class using a separate jar? If so, how?
Thanks in advance.

There's a chance it'll work if you put the new jar earlier in the classpath than the original jar. It's worth trying, although it still sounds like a recipe for disaster - or at least, really hard to debug issues if somehow both classes are loaded.
EDIT: I had planned to write this bit earlier, but got interrupted by the end of a train journey...
I would go back to the customer and explain that while what they're asking is possible, it may cause unexpected problems. Updating the jar file is a much safer fix, with much less risk. The phrases "unexpected problems" and "risk" are likely to ring alarm bells with the customer, so hopefully they'll let you do the right thing.

Yes and no, it depends on your environment.
If you use, for example, OSGi and have your versions under control, it's just a matter of installing a new bundle with the exported package at a higher version (assuming your version ranges are lenient enough).
If you use plain old Java with no fancy custom class loading, you should be good to go putting it earlier on your class path (as others already mentioned).
If you do have custom class loading, you'll need to make sure that all the classes that your 'patched' class needs, and indeed the entire transitive dependency hull, is visible from the class loader which is loading the patched version, which might mean you need to ship the entire application, worst case.
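When relying on classpath order, it helps to verify which jar actually supplied a class at runtime. The sketch below uses the standard `ProtectionDomain` and `CodeSource` API; `WhichJar` is a hypothetical helper name, and the placeholder text it returns for classes without a code source is an arbitrary choice.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Describes where a class was loaded from, or a placeholder if the JVM does not report it. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source, e.g. a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // For an application class this is typically the jar or directory it came from.
        System.out.println(locationOf(WhichJar.class));
        System.out.println(locationOf(String.class));
    }
}
```

Logging `locationOf(PatchedClass.class)` once at startup, with the real patched class substituted, would show whether the patch jar or the original jar won.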

All of the answers that stipulate putting the updated classes before the ones they are replacing in the classpath are correct, only provided the original JAR is not sealed or signed.

Yes, it may be possible, by putting it earlier on the classpath than your original jar. However, relying on the ordering of your classpath is not always going to lead to happiness. I'm not sure if it is even documented in the Java Language Spec; if not, then it's going to break for different JVMs and even different versions of the same JVM.
Instead, consider quoting a realistic time frame to integrate the new feature into the current codebase. This is perhaps not the answer you're looking for.

Probably more than you need for this specific case, but in general, if you just want to tweak or augment an existing class, you can also use AspectJ with load-time weaving.
