Java snippet based AST-Manipulation before/during compilation - java

The question is whether the functionality I describe below already exists, or whether I need to make an attempt at creating it myself. I am aware that I am probably looking at a lot of work if it does not exist yet, and I am also sure that others have already tried. I am nevertheless grateful for comments such as "project A tried this, but..." or "dude D already failed because...". If somebody has an overall more elegant solution, that would of course be welcome as well.
I want to change the way I develop (private) Java code by introducing a multiplexing layer. What I mean by that is that I want to be able to create library-like parameterizable AST-snippets, which I want to insert into my code via some sort of placeholders (such as annotations). I am aware of project https://projectlombok.org/ and have found that, while I find it useful for small applications, it does not generally suit my requirements, as it does not seem possible to insert own snippets without forking the entire project and making major modifications. Also lombok only ever modifies a single file at a time, while I am looking for a solution that will need to 'know' multiple files at a time.
I imagine a structure like this:
Source S: (Parameterizable) AST-snippets that can be included via some sort of reference in Source A.
Source A: Regular Java code, in which I can reference snippets from Source S. This code will not be compiled directly, as it lacks the referenced snippets and would thus produce a lot of compile-time errors.
Source T: Target Source, which is an AST-equivalent copy of Source A, except that all references of AST-Snippets have been replaced by their respective Snippet from Source S. It needs to be mappable to the original Source A as well as the resolved snippets from Source S, where applicable, as most development will happen there.
I see several challenges with this concept, not the least of which are debuggability, source-mapping and compatibility with other frameworks/APIs. Also, it seems a challenge to work around the one-file-at-a-time limitation, memory wise.
The advantage over lombok would be flexibility, since lombok only provides a fixed set of snippets for specific purposes, whereas this would enable devs to write own snippets, or make modifications to getters, setters etc. Also, lombok 'quirks' into the compilation step, and does not output the 'fused' source, afaik.
I want to target at least javac and eclipse's ecj compilers.
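To make the Source S / Source A / Source T idea concrete, here is a minimal sketch. It works on text lines rather than on a real AST, and everything in it is an assumption made for illustration: the `//@snippet NAME ARG` placeholder syntax, the `${arg}` parameter, and the class name `SnippetExpander`. What it does show is the source-mapping requirement: every line of T remembers which line of A it came from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, text-level stand-in for the AST multiplexing layer described above.
final class SnippetExpander {

    /** Lines of Source T plus, for each T line, the 1-based line of Source A it came from. */
    static final class Expansion {
        final List<String> targetLines = new ArrayList<>();
        final List<Integer> sourceLineOf = new ArrayList<>();
    }

    // Assumed placeholder syntax: a line of the form  //@snippet NAME ARG
    private static final String MARKER = "//@snippet ";

    static Expansion expand(List<String> sourceA, Map<String, List<String>> snippets) {
        Expansion out = new Expansion();
        for (int i = 0; i < sourceA.size(); i++) {
            String line = sourceA.get(i);
            String trimmed = line.trim();
            if (trimmed.startsWith(MARKER)) {
                String[] parts = trimmed.substring(MARKER.length()).trim().split("\\s+", 2);
                List<String> body = snippets.get(parts[0]);
                if (body == null) {
                    throw new IllegalArgumentException("Unknown snippet: " + parts[0]);
                }
                String arg = parts.length > 1 ? parts[1] : "";
                // Every expanded line maps back to the placeholder line in Source A.
                for (String snippetLine : body) {
                    out.targetLines.add(snippetLine.replace("${arg}", arg));
                    out.sourceLineOf.add(i + 1);
                }
            } else {
                // Regular code is copied unchanged.
                out.targetLines.add(line);
                out.sourceLineOf.add(i + 1);
            }
        }
        return out;
    }
}
```

A real implementation would have to do the same on syntax trees and keep a second mapping into Source S, which is where most of the debugging and tooling difficulty would sit.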

Related

Reverse Engineer Java *.class file to change data type of variable

I have a problem with an old application which runs on a Java Tomcat server and the source code for the application is not fully available, but the .class files are obviously all running on the tomcat server.
Can I somehow manipulate the bytecode of a .class file (used by the JVM) so that I can change a variable's data type (because this is what has to be done)? Or even reverse engineer it to its old .java source code?
I have used decompilers and javap command up to now. Can I somehow copy the whole Tomcat application and:
decompile it
do my changes
recompile it?
Well, if you decompile it to make changes and recompile, then you're not going to need to change the byte code directly.
If you change the type, you'll have to change the type of any methods (like getters and setters) that use the variable. Then you'll need to change the calls of any methods in all classes that CALL those methods, and the types of their variables that hold these values, etc. The good news is that, if you manage to decompile it successfully, your IDE will tell you where all those places are, assuming the new type is incompatible with the old type.
I would evaluate this as "theoretically possible", but problematic. With the little information you've given us, there's no way to know the size of the job AFTER you successfully decompile the entire application.
I have a wild and crazy idea; I haven't done this, but as long as we're talking about things that are theoretically possible...
IF you manage to decompile all the code and get a system that you can recompile and run (and I strongly recommend you do that before you make any changes), if you are able to identify where the int is that you want to replace with a long, and then all the direct and indirect references to it, hopefully (because it's just this file size limit that you mention elsewhere) you end up with only a handful of classes.
The decompile should tell you their names. Create new classes with the exact same names, containing (of course) all their decompiled code. Change the methods that you need to change.
Now put those classes in a jar that is searched before the jar containing the application. You're limiting the number of classes for which you're providing new .class files to just those. This makes it easier to see exactly what has been changed, for future programmers, if it doesn't do anything else. It's possible because of the way Java handles its runtime; the step that's equivalent to 'linking' in a traditional compiled non-virtual-machine language happens when the class is loaded, instead of at compile time.
I did that. Not exactly that, but something very similar. Instead of decompiling and recompiling, which is very long, and tedious, I directly edited the byte-code of the class file.
pros
You do not need to compile anything at all, you just edit a file
no SDK, no IDE, etc is necessary, just a java-byte code editor
for small changes you can get away with single method modification
cons
very-very error-prone even if you know what you are doing
no way to track changes as you do with git
will probably require modifying all dependent classes
you should have some knowledge of what compiled code looks like and how it behaves before even attempting such a thing.
you will most likely break a law or two since this will not be for "educational" purposes
you will be marked as "the hacker" and every odd job will be forwarded to you
PS: I had to edit the licensing class of a product to allow more users. The company that wrote it had ceased to exist, so buying was not an option. We switched to a new product anyway; it was just temporary.

How to find a list of methods used only within tests [duplicate]

What tools do you use to find unused/dead code in large java projects? Our product has been in development for some years, and it is getting very hard to manually detect code that is no longer in use. We do however try to delete as much unused code as possible.
Suggestions for general strategies/techniques (other than specific tools) are also appreciated.
Edit: Note that we already use code coverage tools (Clover, IntelliJ), but these are of little help. Dead code still has unit tests, and shows up as covered. I guess an ideal tool would identify clusters of code which have very little other code depending on them, allowing for focused manual inspection.
An Eclipse plugin that works reasonably well is Unused Code Detector.
It processes an entire project, or a specific file and shows various unused/dead code methods, as well as suggesting visibility changes (i.e. a public method that could be protected or private).
CodePro was recently released by Google with the Eclipse project. It is free and highly effective. The plugin has a 'Find Dead Code' feature with one/many entry point(s). Works pretty well.
I would instrument the running system to keep logs of code usage, and then start inspecting code that is not used for months or years.
For example if you are interested in unused classes, all classes could be instrumented to log when instances are created. And then a small script could compare these logs against the complete list of classes to find unused classes.
Of course, if you go to the method level you should keep performance in mind. For example, the methods could log only their first use. I don't know how this is best done in Java. We have done this in Smalltalk, which is a dynamic language and thus allows for code modification at runtime. We instrument all methods with a logging call and uninstall the logging code after a method has been logged for the first time, so after some time no more performance penalties occur. Maybe a similar thing can be done in Java with static boolean flags...
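In Java, a rough equivalent of the "log only the first use" idea could look like the sketch below. It swaps the static boolean flags for a concurrent set, and the class and method names (`UsageLog`, `markUsed`, `neverUsed`) are made up for illustration. Unlike the Smalltalk approach, the logging call is never uninstalled, so each call still pays for one set insertion.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: instrumented methods call markUsed("Class.method") on entry.
final class UsageLog {
    // Thread-safe set of method ids that have been seen at least once.
    private static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

    private UsageLog() {}

    /** Records a use; returns true only the first time a given id is seen. */
    static boolean markUsed(String methodId) {
        return SEEN.add(methodId);
    }

    /** Returns the entries of allMethods that were never marked, in sorted order. */
    static Set<String> neverUsed(Set<String> allMethods) {
        Set<String> unused = new TreeSet<>(allMethods);
        unused.removeAll(SEEN);
        return unused;
    }
}
```

The return value of `markUsed` can be used to write a log line only on first use, and `neverUsed` plays the role of the small script that compares the logs against the complete list.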
I'm surprised ProGuard hasn't been mentioned here. It's one of the most mature products around.
ProGuard is a free Java class file shrinker, optimizer, obfuscator,
and preverifier. It detects and removes unused classes, fields,
methods, and attributes. It optimizes bytecode and removes unused
instructions. It renames the remaining classes, fields, and methods
using short meaningless names. Finally, it preverifies the processed
code for Java 6 or for Java Micro Edition.
Some uses of ProGuard are:
Creating more compact code, for smaller code archives, faster transfer across networks, faster loading, and smaller memory footprints.
Making programs and libraries harder to reverse-engineer.
Listing dead code, so it can be removed from the source code.
Retargeting and preverifying existing class files for Java 6 or higher, to take full advantage of their faster class loading.
Here is an example for listing dead code: https://www.guardsquare.com/en/products/proguard/manual/examples#deadcode
One thing I've been known to do in Eclipse, on a single class, is change all of its methods to private and then see what complaints I get. For methods that are used, this will provoke errors, and I return them to the lowest access level I can. For methods that are unused, this will provoke warnings about unused methods, and those can then be deleted. And as a bonus, you often find some public methods that can and should be made private.
But it's very manual.
Use a test coverage tool to instrument your codebase, then run the application itself, not the tests.
Emma and Eclemma will give you nice reports of what percentage of what classes are run for any given run of the code.
We've started to use Find Bugs to help identify some of the funk in our codebase's target-rich environment for refactorings. I would also consider Structure 101 to identify spots in your codebase's architecture that are too complicated, so you know where the real swamps are.
In theory, you can't deterministically find unused code. There's a mathematical proof of this (well, this is a special case of a more general theorem). If you're curious, look up the Halting Problem.
This can manifest itself in Java code in many ways:
Loading classes based on user input, config files, database entries, etc;
Loading external code;
Passing object trees to third party libraries;
etc.
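As a small illustration of the first item, here is a sketch in which the class to instantiate is only known at runtime. `DynamicLoader` is a made-up name; the reflection calls are standard JDK ones. A static analyzer looking at this code has no way to tell which classes will be requested.

```java
// Hypothetical helper: the class name may come from user input, a config file or a database.
final class DynamicLoader {
    private DynamicLoader() {}

    /** Instantiates the named class via its public no-arg constructor. */
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Wrapped so callers do not have to declare checked exceptions.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

Any class reachable only through a call like `DynamicLoader.instantiate(config.get("handler"))` looks unused to a purely static tool.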
That being said, I use IntelliJ IDEA as my IDE of choice and it has extensive analysis tools for finding dependencies between modules, unused methods, unused members, unused classes, etc. It's quite intelligent too: a private method that isn't called is tagged as unused, but a public method requires more extensive analysis.
In Eclipse, go to Window > Preferences > Java > Compiler > Errors/Warnings
and change all of them to errors. Fix all the errors. This is the simplest way. The beauty is that this will allow you to clean up the code as you write.
IntelliJ has code analysis tools for detecting code which is unused. You should try making as many fields/methods/classes as non-public as possible and that will show up more unused methods/fields/classes
I would also try to locate duplicate code as a way of reducing code volume.
My last suggestion is try to find open source code which if used would make your code simpler.
The Structure101 slice perspective will give a list (and dependency graph) of any "orphans" or "orphan groups" of classes or packages that have no dependencies to or from the "main" cluster.
DCD is not a plugin for some IDE but can be run from ant or standalone. It looks like a static tool and it can do what PMD and FindBugs can't. I will try it.
P.S. As mentioned in a comment below, the Project lives now in GitHub.
There are tools which profile code and provide code coverage data. This lets you see (as code is run) how much of it is being called. You can get any of these tools to find out how much orphan code you have.
FindBugs is excellent for this sort of thing.
PMD (Project Mess Detector) is another tool that can be used.
However, neither can find public static methods that are unused in a workspace. If anyone knows of such a tool then please let me know.
Use coverage tools, such as EMMA. But it's not a static tool (i.e. it requires actually running the application through regression testing, and through all possible error cases, which is, well, impossible :) )
Still, EMMA is very useful.
Code coverage tools, such as Emma, Cobertura, and Clover, will instrument your code and record which parts of it gets invoked by running a suite of tests. This is very useful, and should be an integral part of your development process. It will help you identify how well your test suite covers your code.
However, this is not the same as identifying real dead code. It only identifies code that is covered (or not covered) by tests. This can give you false positives (if your tests do not cover all scenarios) as well as false negatives (if your tests access code that is actually never used in a real world scenario).
I imagine the best way to really identify dead code would be to instrument your code with a coverage tool in a live running environment and to analyse code coverage over an extended period of time.
If you are running in a load-balanced redundant environment (and if not, why not?) then I suppose it would make sense to instrument only one instance of your application and to configure your load balancer such that a random, but small, portion of your users run on your instrumented instance. If you do this over an extended period of time (to make sure that you have covered all real-world usage scenarios, such as seasonal variations), you should be able to see exactly which areas of your code are accessed under real-world usage and which parts are really never accessed and hence dead code.
I have never personally seen this done, and do not know how the aforementioned tools can be used to instrument and analyse code that is not being invoked through a test suite - but I am sure they can be.
There is a Java project - Dead Code Detector (DCD). For source code it doesn't seem to work well, but for .jar file - it's really good. Plus you can filter by class and by method.
For NetBeans, here is a plugin: NetBeans dead code detector.
It would be better if it could link to and highlight the unused code. You can vote and comment here: Bug 181458 - Find unused public classes, methods, fields
Eclipse can show/highlight code that can't be reached. JUnit can show you code coverage, but you'd need some tests and have to decide if the relevant test is missing or the code is really unused.
I found Clover coverage tool which instruments code and highlights the code that is used and that is unused. Unlike Google CodePro Analytics, it also works for WebApplications (as per my experience and I may be incorrect about Google CodePro).
The only drawback that I noticed is that it does not take Java interfaces into account.
I use Doxygen to develop a method call map to locate methods that are never called. On the graph you will find islands of method clusters without callers. This doesn't work for libraries, since you always need to start from some main entry point.

Statically checking a Java app for link errors

I have a scenario where I have code written against version 1 of a library but I want to ship version 2 of the library instead. The code has shipped and is therefore not changeable. I'm concerned that it might try to access classes or members of the library that existed in v1 but have been removed in v2.
I figured it would be possible to write a tool to do a simple check to see if the code will link against the newer version of the library. I appreciate that the code may still be very broken even if the code links. I am thinking about this from the other side - if the code won't link then I can be sure there is a problem.
As far as I can see, I need to run through the bytecode checking for references, method calls and field accesses to library classes then use reflection to check whether the class/member exists.
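The reflection half of that check could be sketched roughly as follows, assuming the v2 library is on the classpath of the checking tool. `LinkCheck` and its method names are made up, and the bytecode scan that would produce the class and member names is not shown. Note that `getMethod` and `getField` only see public members and ignore the return type, whereas the JVM resolves methods by full descriptor, so this is weaker than real linking.

```java
// Hypothetical helper for checking that referenced classes and members still exist.
final class LinkCheck {
    private LinkCheck() {}

    private static Class<?> load(String className) throws ClassNotFoundException {
        // initialize=false: we only want to know whether the class exists,
        // not run its static initializers.
        return Class.forName(className, false, LinkCheck.class.getClassLoader());
    }

    static boolean classExists(String className) {
        try {
            load(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if the class has a public method with this name and these parameter types. */
    static boolean methodExists(String className, String methodName, Class<?>... parameterTypes) {
        try {
            load(className).getMethod(methodName, parameterTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** True if the class has a public field with this name. */
    static boolean fieldExists(String className, String fieldName) {
        try {
            load(className).getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }
}
```

For example, `LinkCheck.methodExists("java.lang.String", "charAt", int.class)` checks for `String.charAt(int)`.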
I have a three-fold question:
(1) Does such a tool exist already?
(2) I have a niggling feeling it is much more complicated than I imagine and that I have missed something major - is that the case?
(3) Do you know of a handy library that would allow me to inspect the bytecode such that I can find the method calls, references etc.?
Thanks!
I think that Clirr - a binary compatibility checker - can help here:
Clirr is a tool that checks Java libraries for binary and source compatibility with older releases. Basically you give it two sets of jar files and Clirr dumps out a list of changes in the public api. The Clirr Ant task can be configured to break the build if it detects incompatible api changes. In a continuous integration process Clirr can automatically prevent accidental introduction of binary or source compatibility problems.
Changing the library in your IDE will result in all possible compile-time errors.
You don't need anything else, unless your code uses another library, which in turn uses the updated library.
Be especially wary of Spring configuration files. Class names are configured as text and don't show up as missing until runtime.
If you have access to the source code, you could just compile source against the new library. If it doesn't compile, you have definitely a problem. If it compiles you may still have a problem if the program uses reflection, some kind of IoC stuff like Spring etc.
If you have unit tests, then you may have a better chance of catching any linking errors.
If you only have a .class file of the program, then I don't know of any tools that would help besides decompiling the class file to source and compiling the source again against the new library, but that doesn't sound too healthy.
The checks you mentioned are done by the JVM/Java class loader, see e.g. Linking of Classes and Interfaces.
So "attempting to link" can be simply achieved by trying to run the application. Of course you could hoist the checks to run them yourself on your collection of .class/.jar files. I guess a bunch of 3rd party byte code manipulators like BCEL will also do similar checks for you.
I notice that you mention reflection in the tags. If you load classes/invoke methods through reflection, there's no way to analyse this in general.
Good luck!

Debugging java obfuscated code

We are going to obfuscate our project but don't want to lose the ability of remote debugging and hotswapping.
Is it possible? Which tools can handle this? I'd be happy with simple obfuscation - just renaming classes/methods/variables.
[Edited] We're using IntelliJ IDEA but weren't able to find any plugin for this task.
We have the same kind of needs (simple obfuscation, need to debug later)
and we use ProGuard. It's a Java app, which can be integrated in an Ant task.
It can do a lot of things, but it's also fully tuneable. So you can keep your obfuscation simple. One of the options is to generate a "Symbol Correspondence Table", which allows you to retrieve the non-obfuscated code from the obfuscated one. (It keeps track that the variable xyz in the class qksdnqd is in fact myCuteVarName in the class MeaningfulClassName.)
Edit: Obfuscation can be tricky. Some examples:
You can't change the name of your main method.
Do you use a classloader? Can it still retrieve the class after the obfuscation?
What about your ORM Mapping? Your Spring Context? (if any)
Edit2:
You can also see:
Do you obfuscate your commercial Java code?
See SD Java Obfuscator. It strips comments and whitespace, and renames all members/methods/class names that aren't public.
It also provides you with a map of how the code was obfuscated, e.g., for each symbol FOO obfuscated as XYZ, a map FOO->XYZ. This means that if you get a backtrace mentioning XYZ, you can easily determine the original symbol FOO. Of course, since only you (the person doing the obfuscation) have this map, only you can do this.
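Applying such a map to a backtrace is mechanical. The sketch below assumes the map has already been loaded into memory, keyed by obfuscated name; it does not parse any particular tool's map file format, and `SymbolMap` is a made-up name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites obfuscated identifiers in a backtrace line.
final class SymbolMap {
    // Matches whole Java-style identifiers, so XYZ inside XYZW is not touched.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    private SymbolMap() {}

    static String deobfuscate(String traceLine, Map<String, String> obfuscatedToOriginal) {
        Matcher m = IDENTIFIER.matcher(traceLine);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Identifiers without a map entry are kept as they are.
            String original = obfuscatedToOriginal.getOrDefault(m.group(), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(original));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With a map containing XYZ -> FOO, the line `at XYZ.run(Unknown Source)` becomes `at FOO.run(Unknown Source)`.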

How to manage multiple versions of same class file for different SDK targets?

This is for an Android application but I'm broadening the question to Java as I don't know how this is usually implemented.
Assuming you have a project that targets a specific SDK version. A new release of the SDK is backward incompatible and requires changing three lines in one class.
How is this managed in Java without duplicating any code(or by duplicating the least amount)?
I don't want to create two projects for only 3 lines that are different.
What I'm trying to achieve in the end is a single executable that'll work for both versions. In C/C++, you'd have a #define based on the version. How do I achieve the same thing in Java?
Edit: after reading the comments about the #define, I realized there were two issues I was merging into one:
So the first issue is: how do I not duplicate code? What construct is the equivalent of a #define in C?
The second one is: is it possible to bundle everything in the same executable? (This is less of a concern than the first one.)
It depends heavily on the incompatibility. If it is simply behavior, you can check the java.version system property and branch the code accordingly (for three lines, something as simple as an if statement).
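A rough sketch of that check on a desktop or server JVM is shown below; `RuntimeVersion` is a made-up name. It handles both the old "1.8.0_292" style and the newer "11.0.2" style of version string. On Android, `android.os.Build.VERSION.SDK_INT` is the usual value to branch on instead.

```java
// Hypothetical helper for branching on the running Java version.
final class RuntimeVersion {
    private RuntimeVersion() {}

    /** Extracts the major version: "1.8.0_292" gives 8, "11.0.2" gives 11, "17-ea" gives 17. */
    static int major(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the scheme was 1.x, so the second component is the major version.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Major version of the JVM this code is running on. */
    static int current() {
        return major(System.getProperty("java.version"));
    }
}
```

The three differing lines would then sit behind something like `if (RuntimeVersion.current() >= 11) { ... } else { ... }`, where the threshold is whatever version introduced the behavior change.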
If, however, it is a lack of a class or something similar that will throw an error when the class is loaded or when the code gets closer to execution (not necessarily something you can avoid reasonably by checking before calling), then the solution gets a lot harder. The notion of having a separate version is the cleanest from a code point of view, but it does mean you have to distribute two versions.
Another solution is reflection. Don't reference the class directly, call it via reflection (test for the methods or classes to determine what environment you are currently running in and execute the methods). This is probably the "official" approach in that reflection exists to deal with classes that you don't have or don't know you will have at compile time. It is just being applied to libraries within the JDK. It gets very ugly very fast, however. For three lines of code, it's ok, but doing anything extensive is going to get bad.
The last thing I can think of is to write common denominator code - that is code that gets the job done in both, finding another way to do it that doesn't trigger the problematic class or method.
I would isolate the code that needs to be different in a separate class (or multiple classes if necessary), and include / exclude them when building the project for the different versions.
So I would have something like src/java/org/myproj/Foo.java, which is the common stuff, and then oldversion/java/org/myproj/Bar.java and newversion/java/org/myproj/Bar.java, which are the different implementations of the class that uses the changed API.
Then I either compile "src/java and oldversion/java" or "src/java and newversion/java".
Possibly a similar situation, I had a method which wasn't available in the previous version of the JDK but if it was there I wanted to call it, I didn't want to force people to use the more recent version though. I used reflection to look for the method, if it was there I called it, if it wasn't I didn't.
Pretty hacky but might give you what you want.
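The "look for the method, call it if it is there" approach might look like this. It is a minimal sketch limited to public no-argument methods, and `OptionalCall` is a made-up name.

```java
import java.lang.reflect.Method;

// Hypothetical helper: calls a method only if the running library version has it.
final class OptionalCall {
    private OptionalCall() {}

    /** Invokes a public no-arg method if the target's class has it; otherwise returns fallback. */
    static Object callIfPresent(Object target, String methodName, Object fallback) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target);
        } catch (NoSuchMethodException e) {
            // The method does not exist in this version: take the fallback path.
            return fallback;
        } catch (ReflectiveOperationException e) {
            // The method exists but could not be called, or it threw.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the method is only named in a string, the calling class still loads and links on a version where the method is missing.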
Addressing Java in general, I see two primary approaches.
1). Refactor the specific code to its own library. Have different versions of that library. Effectively your app is creating an abstraction above the different SDKs. Heavyweight for 3 lines of code, but perhaps quite reasonable for larger scale problems.
2). Injection using annotation. Write your own annotation processor to manage the appropriate injection. More work, but maybe more fun.
Separate changing code in different classes with the same interface. Place classes in the same jar. Use factory design pattern to instantiate one or another class depending on SDK version.
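A sketch of that layout follows. All names (`PlatformApi`, `LegacyApi`, `ModernApi`, `PlatformApiFactory`) and the version threshold are placeholders; in real code the two implementations would contain the three differing lines.

```java
// Common interface the rest of the application codes against.
interface PlatformApi {
    String describe();
}

// Implementation that uses only the old SDK's API.
final class LegacyApi implements PlatformApi {
    @Override
    public String describe() {
        return "legacy";
    }
}

// Implementation that uses the new SDK's API.
final class ModernApi implements PlatformApi {
    @Override
    public String describe() {
        return "modern";
    }
}

final class PlatformApiFactory {
    // Placeholder: first SDK version that has the changed API.
    static final int FIRST_MODERN_VERSION = 21;

    private PlatformApiFactory() {}

    /** Picks the implementation matching the SDK version detected at runtime. */
    static PlatformApi forVersion(int sdkVersion) {
        return sdkVersion >= FIRST_MODERN_VERSION ? new ModernApi() : new LegacyApi();
    }
}
```

Since classes are generally loaded lazily, the implementation that references the newer API is normally not loaded on the older SDK as long as the factory never instantiates it there, though that is worth verifying on the actual target runtime.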
