How to prune a Java program


Let me start with what I want to do, then raise some questions I have.
I want to develop a general Java program which is a superset of a number of programs (let's call them program variants). In particular, the general program has methods which are only used by one or more program variants (but not all). Given a particular configuration, I want to remove unnecessary methods and just keep the smallest set of methods for one program variant.
For example, I have a general program as below:
public class GeneralProgram {
    // this method is common for all variants
    public void method1() { }
    // this method is specific to variant 1
    public void method2() { }
    // this method is specific to variant 2
    public void method3() { }
}
Then after pruning the program based on configuration for variant 1, the result is
public class GeneralProgram {
    // this method is common for all variants
    public void method1() { }
    // this method is specific to variant 1
    public void method2() { }
}
It doesn't matter if the resulting class name is the same as the original one or not. I just want to prune the content of the class.
So, here are my questions:
Do you have any idea how to do this other than with low-level text processing?
I know that I can use AspectJ to enable/disable specific methods at runtime, but what I really want is to perform this task before deploying the program. Is there any technique in Java for this purpose?

It seems to me that the right solution here is to use object-oriented design and layer your program:
base.jar contains:
package foo.base;
public class GeneralProgram {
    public void method1() { }
}
var1.jar contains:
package foo.var1;
import foo.base.GeneralProgram;
public class GeneralProgramVar1 extends GeneralProgram {
    public void method2() { }
}
var2.jar contains:
package foo.var2;
import foo.base.GeneralProgram;
public class GeneralProgramVar2 extends GeneralProgram {
    public void method3() { }
}
Some deployments will have both base.jar and var1.jar, others will have base.jar and var2.jar. You'll have to mess with the classpaths a bit to resolve the dependencies.
If you can separate your variants well enough that there are truly unused methods, you can use a shrinker such as ProGuard to remove the unused methods from the classes. You might find, however, that the effort required to reap the benefits of ProGuard is about the same as the effort of adopting the structure I recommend above.
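The removal step in shrinkers like ProGuard rests on a reachability analysis from the configured entry points. A minimal sketch of that idea, assuming the call graph has already been extracted into a map from each method name to the methods it calls (real tools build this graph from the bytecode; `CallGraphPruner` and the method names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CallGraphPruner {
    /** Returns every method reachable from the entry points (breadth-first walk). */
    static Set<String> reachable(Map<String, List<String>> callGraph, List<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String method = work.poll();
            if (seen.add(method)) {
                // Queue the callees; a method missing from the map is treated as a leaf.
                work.addAll(callGraph.getOrDefault(method, List.of()));
            }
        }
        return seen;
    }

    /** Returns the declared methods that are never reachable, i.e. the ones a shrinker could strip. */
    static Set<String> prunable(Map<String, List<String>> callGraph, List<String> entryPoints) {
        Set<String> unused = new TreeSet<>(callGraph.keySet());
        unused.removeAll(reachable(callGraph, entryPoints));
        return unused;
    }
}
```

For variant 1 of the question's `GeneralProgram`, a graph in which a hypothetical entry point `main1` calls `method1` and `method2` makes `prunable` report only `method3` as removable.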

Mark Elliot's answer gives you a "right way" to do this.
There are a number of reasons why your way is not a good idea in general, and for Java applications in particular:
Java does not support this. Specifically, it does not support conditional compilation.
While source code preprocessors are sometimes used, mainstream Java tool chains don't support them. (Same for (hypothetical?) tools that operate at the bytecode level ... though that's not what you seem to be talking about.)
With conditional-compilation variants, it is easier for a change made in one variant to break another. (By contrast, a good O-O design will isolate variant-specific code to particular classes where they can't affect the behaviour of other variants.)
A codebase with rampant conditional compilation is much harder to understand.
Conditional compilation variants make testing more complicated. You basically have to treat each variant as a separate application that has to be tested separately. This makes writing tests more complicated, and running tests more expensive. (And testing the variants IS important because of the fragility of code bases that rely on conditional compilation; see previous.)
Test coverage analysis is harder / more work with variants because of tool issues; see previous.
In a comment the OP writes:
So it is not effective if I deploy unnecessary resources (e.g. methods, classes specific to other variants) for a particular variant.
What do you mean by "not effective"?
In most cases it simply does not matter that a code base includes functionality that is not used in certain use-cases or on certain platforms. Java applications use lots of memory, and code size is generally not the major cause of this. In short, in most cases it is "effective" to deploy code that won't be used: it does the job and the overheads don't really matter.
If you have one of those unusual applications where JAR file size or code memory usage is really significant (and not just a hypothetical issue), you still don't need to resort to conditional compilation or bytecode hacking.
If JAR file size is the critical issue, then there are tools that will strip out classes and methods that the tool determines will not be used, e.g. by working out what is reachable when the application is started from a specified main method.
If memory usage is the critical issue, you can structure your code so that it uses dynamic loading to load variant, platform or even use-case specific classes.
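A minimal sketch of the dynamic-loading approach, assuming the base code defines an interface and each variant ships its own implementation class, selected by fully qualified class name at startup (for example from a configuration file or a system property). `Feature`, `Variant1Feature`, `Variant2Feature` and `FeatureLoader` are hypothetical names. Only the requested class is loaded, so a deployment can leave the other variants' JARs out entirely; `java.util.ServiceLoader` is the standard library's built-in mechanism for the same kind of lookup.

```java
// Shared contract that lives in the base JAR.
interface Feature {
    String describe();
}

// Variant-specific implementations; in practice each would sit in its own JAR.
class Variant1Feature implements Feature {
    public String describe() { return "variant 1"; }
}

class Variant2Feature implements Feature {
    public String describe() { return "variant 2"; }
}

final class FeatureLoader {
    /** Loads and instantiates the named implementation class at run time. */
    static Feature load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        // asSubclass fails fast with ClassCastException if the class does not implement Feature.
        return cls.asSubclass(Feature.class).getDeclaredConstructor().newInstance();
    }
}
```

A launcher would call `FeatureLoader.load(name)` with the class name from its configuration; if that variant's JAR is missing, `Class.forName` throws `ClassNotFoundException` rather than the program silently running another variant.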

Related

Conditionally Remove Java Methods at Compile-Time

I am trying to achieve something similar to the C# preprocessor. I am aware that Java does NOT have the same preprocessor capabilities, and am aware that there are ways to achieve similar results using design patterns such as Factory. However, I am still interested in finding a solution to this question.
Currently, what I do is create a class that contains several static final boolean attributes, such as the following example:
public class Preprocessor
{
public static final boolean FULLACCESS = false;
}
I then use this in the following manner:
public ClassName getClassName()
{
if(Preprocessor.FULLACCESS)
{
return this;
}
else
{
return this.DeepCopy();
}
}
So far so good, this solves my problem (the example above is trivial, but I do use this in other instances where it is helpful). My question is, would there be a way to place the conditional around an entire method, so that the method itself would be unavailable given the correct "Preprocessor" variables? For example, I would like to be able to make a specific constructor available only for packages that are given "Full Access", as follows:
public ClassName()
{
// do things
}
if(FULLACCESS)
{
public ClassName(ClassName thing)
{
// copy contents from thing to the object being created
}
}
Again, I am aware of the limitations (or design decisions) of Java as a language, and am aware that in most circumstances this is unnecessary. As a matter of fact, I have considered simply creating these "extra" methods and placing the entire code of them within a conditional, while throwing an Exception if the conditional is not active, but that is a very crude solution that does not seem helpful to my programmers when I make these libraries available to them.
Thank you very much in advance for any help.
Edit:
To complement the question, the reason why I am attempting to do this is that by using exceptions as a solution, the IDE would display methods as "available" when they are actually not. However, again, it might just be a case of my being ignorant of Java.
The reasons for my wanting to do this are primarily so that I may have more than one public interface available, say, one restrictive where control is tighter within the methods, and one more permissive where direct alteration of attributes is allowed. However, I do also want to be able to actively remove portions of code from the .class, for instance, in a Product Line development approach where certain variants are not available.
Edit 2:
Furthermore, it is important to note that I will be generating the documentation conditionally as well. Therefore, each compiled version of the packages would have its own documentation, containing only that which is actually available.

Well, you can make it happen. A word of caution, though...
I can only think of one time when I thought this kind of approach was the best way, and it turned out I was wrong. The case of changing a class's public interface especially looks like a red flag to me. Throwing an exception when the access level isn't high enough to invoke the method might be more code-friendly.
But anyway, when I thought I wanted a preprocessor, what I did was write one. I created a custom annotation to place on conditionally-available methods, grabbed a Java parser, and wrote a little program that used the parser to find and remove methods that have the annotation. Then I added that (conditionally) to the build process.
Because it turned out to be useless to me, I discarded mine; and I've never seen anyone else do it and publish it; so as far as I know you'd have to roll your own.
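A minimal sketch of that annotation-plus-parser approach, using the JDK's own Compiler Tree API (`com.sun.source.*` in the `jdk.compiler` module) in place of a third-party Java parser. It parses a source string, finds methods and constructors annotated with a hypothetical source-only marker `@FullAccess`, and cuts their text (annotation included) out of the source. It needs a full JDK, since `ToolProvider.getSystemJavaCompiler()` returns `null` without one. It also leaves surrounding whitespace in place, and a Javadoc comment above a removed method would stay behind and attach to the next member, so a real tool for the conditional documentation in Edit 2 would have to remove that comment too.

```java
import com.sun.source.tree.AnnotationTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical marker; SOURCE retention because only the stripping tool reads it.
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@interface FullAccess { }

final class AnnotatedMethodStripper {
    /** Returns the source with every method/constructor annotated @annotationName removed. */
    static String strip(String source, String annotationName) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a JRE
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<long[]> ranges = new ArrayList<>(); // [start, end) offsets, in source order
        for (CompilationUnitTree unit : task.parse()) { // parse only: no symbol resolution
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethod(MethodTree method, Void unused) {
                    for (AnnotationTree a : method.getModifiers().getAnnotations()) {
                        // Match the annotation name as written in the source, e.g. "FullAccess".
                        if (a.getAnnotationType().toString().equals(annotationName)) {
                            ranges.add(new long[] {
                                    positions.getStartPosition(unit, method), // starts at the annotation
                                    positions.getEndPosition(unit, method)});
                            return null; // no need to scan a body that is being removed
                        }
                    }
                    return super.visitMethod(method, unused);
                }
            }.scan(unit, null);
        }
        StringBuilder out = new StringBuilder(source);
        for (int i = ranges.size() - 1; i >= 0; i--) { // back to front keeps earlier offsets valid
            out.delete((int) ranges.get(i)[0], (int) ranges.get(i)[1]);
        }
        return out.toString();
    }
}
```

A build script would run this over each source file before compiling the restricted variant, so the removed members are absent from that variant's class files.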

This answer is based partially on the comments you have left on the question and on Mark's answer.
I would suggest that you do this using Java interfaces which expose just the API that you desire. When you need a less restrictive API contract, extend an interface or create a separate implementation of an existing interface to get what you need.
public interface A
{
void f();
}
A above is your general API. Now you want to have some special extra methods to test A or to debug it or manipulate it or whatever...
public interface B extends A
{
void specialAccess();
}
Also, since Java 8, interfaces support default method implementations, which might be useful to you depending on how you implement your API. They take the following form...
import java.util.List;
public interface A
{
    List<Object> getList();
    // this is still only an interface, but you have a default impl. here
    default void add(Object o)
    {
        getList().add(o);
    }
}
You can read more about default methods in Oracle's Java Tutorials.
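A small usage sketch of the default method above (written with `List<Object>` rather than the raw `List`): the hypothetical class `ListBackedA` only implements `getList()` and inherits `add` from the interface.

```java
import java.util.ArrayList;
import java.util.List;

interface A {
    List<Object> getList();

    // Inherited by every implementation that does not override it.
    default void add(Object o) {
        getList().add(o);
    }
}

// Hypothetical implementation: it only has to provide the backing list.
class ListBackedA implements A {
    private final List<Object> items = new ArrayList<>();

    @Override
    public List<Object> getList() {
        return items;
    }
}
```

Calling `new ListBackedA().add("x")` runs the interface's default body, which appends to the list returned by `getList()`.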
In your API, your general distribution of it could include A and omit B entirely, and omit any implementations that offer the special access; then you can include B and special implementations for the special access version of the API you mentioned. This would allow plain old Java objects, nothing different to the code other than an extra interface and maybe an extra implementation of it. The custom part would just be in your packaging of the library. If you want to hand someone a "non-special" low-access version, hand them a jar that does not include B and does not include any possible BImplementation, possibly by having a separate build script.
I use NetBeans for my Java work, and I like to let it use the default build scripts that it autogenerates. So if I were doing this in NetBeans, I would probably create two projects, one for the base API and one for the special-access API, and I would make the special-access one dependent on the base project. That would leave me with two jars instead of one, but I would be fine with that; if two jars bothered me enough, I would go through the extra step mentioned above of making a build script for the special-access version.
Some examples straight from Java
Swing has examples of this kind of pattern. Notice that GUI components have a void paint(Graphics g). A Graphics gives you a certain set of functionality. Generally, that g is actually a Graphics2D, so you can treat it as such if you so desire.
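import java.awt.Graphics;
import java.awt.Graphics2D;
public void paint(Graphics g)
{
    // Swing normally passes a Graphics2D here, so the cast succeeds in practice
    Graphics2D g2d = Graphics2D.class.cast(g);
}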
Another example is with Swing component models. If you use a JList or a JComboBox to display a list of objects in a GUI, you probably do not use the default model it comes with if you want to change that list over time. Instead, you create a new model with added functionality and inject it.
JList<String> list = new JList<>();
DefaultListModel<String> model = new DefaultListModel<>();
list.setModel(model);
Now your JList model has extra functionality that is not normally apparent, including the ability to add and remove items easily.
Not only is extra functionality added this way, but the original author of ListModel did not even need to know that this functionality could exist.
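A short sketch of that extra functionality, using the generic forms that `JList` and `DefaultListModel` have had since Java 7; the names `ListModelDemo`, `attachModel` and `edit` are hypothetical. All mutation goes through the model, and an attached `JList` picks up the changes through the model's list-data events.

```java
import javax.swing.DefaultListModel;
import javax.swing.JList;

final class ListModelDemo {
    /** Installs a mutable model on the list and returns it for later edits. */
    static DefaultListModel<String> attachModel(JList<String> list) {
        DefaultListModel<String> model = new DefaultListModel<>();
        list.setModel(model); // the JList now displays whatever the model holds
        return model;
    }

    /** Adds and removes items through the model; no JList method is involved. */
    static void edit(DefaultListModel<String> model) {
        model.addElement("first");
        model.addElement("second");
        model.add(0, "zeroth");        // insert at an index
        model.removeElement("second"); // remove by value
    }
}
```

After `edit` runs on an empty model, it holds `zeroth` followed by `first`.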

The only way in Java to achieve that is to use a preprocessor. For instance, the PostgreSQL JDBC driver team uses a Java comment preprocessor for such manipulations; here is an example from their Driver.java:
//#if mvn.project.property.postgresql.jdbc.spec >= "JDBC4.1"
@Override
public java.util.logging.Logger getParentLogger() {
return PARENT_LOGGER;
}
//#endif
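To illustrate what such a tool does, here is a toy line-based comment preprocessor. It is hypothetical and far simpler than the one pgjdbc uses: it understands only `//#if FLAG` and `//#endif` with plain flag names (no expressions, no `//#else`), and it drops the lines of every block whose flag is not enabled. Because the directives are ordinary comments, the unprocessed file is still valid Java syntax.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.StringJoiner;

final class CommentPreprocessor {
    /** Keeps lines whose enclosing //#if flags are all enabled; directive lines are dropped. */
    static String process(String source, Set<String> enabledFlags) {
        StringJoiner out = new StringJoiner("\n");
        Deque<Boolean> active = new ArrayDeque<>(); // one entry per currently open //#if
        for (String line : source.split("\n", -1)) {
            String trimmed = line.trim();
            if (trimmed.startsWith("//#if ")) {
                String flag = trimmed.substring("//#if ".length()).trim();
                active.push(enabledFlags.contains(flag));
            } else if (trimmed.equals("//#endif")) {
                active.pop(); // an unmatched //#endif throws NoSuchElementException
            } else if (!active.contains(Boolean.FALSE)) {
                out.add(line); // emitted only if every enclosing block is enabled
            }
        }
        return out.toString();
    }
}
```

Running it once per variant with a different flag set yields one source tree per variant, which is then compiled normally.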

With Gradle you can manage your sources, and I think preprocessor macros are no longer needed. Right now the src directory has main/java with all sources, but if you need specific methods that do (or don't do) specific things in, e.g., debug and release builds, create debug/java and release/java in src and put YourClass there (per-build-type source sets like these come from the Android Gradle plugin). Note that by doing this you'll have to have YourClass in both debug/java and release/java, but not in main/java.

Programmatic code modification (e.g. variable extraction) in Java

I know it's possible to do nice stuff with Reflection, such as invoking methods, or altering the values of fields. Is it possible to do heavier code modification, though, at runtime and programmatically?
For instance, if I have a method:
public void foo(){
this.bar = 100;
}
Can I write a program that modifies the innards of this method, notices that it assigns a constant to a field, and turns it into the following:
public int baz = 100;
public void foo(){
this.bar = baz;
}
Perhaps Java isn't really the language to do this kind of thing in - if not, I'm open to suggestions for languages that would allow me to basically reparse or inspect code in this way, and be able to alter it so precisely. I might be pipe dreaming here though, so please tell me if this is the case also.

Just adding a suggestion from a friend - Apache Commons' BCEL looks excellent:
http://commons.apache.org/bcel/manual.html
The Byte Code Engineering Library (Apache Commons BCEL™) is intended to give users a convenient way to analyze, create, and manipulate (binary) Java class files (those ending with .class). Classes are represented by objects which contain all the symbolic information of the given class: methods, fields and byte code instructions, in particular.
Such objects can be read from an existing file, be transformed by a program (e.g. a class loader at run-time) and written to a file again. An even more interesting application is the creation of classes from scratch at run-time. The Byte Code Engineering Library (BCEL) may be also useful if you want to learn about the Java Virtual Machine (JVM) and the format of Java .class files.

You are looking for software that does bytecode manipulation. There are several frameworks for this, but the two best known currently are:
ASM
Javassist
When performing bytecode modifications at runtime in Java classes keep in mind the following:
If you change a class's bytecode after the class has been loaded by a classloader, you'll have to find a way to reload its class definition (either through classloading tricks or by using hotswap functionality).
If you change a class's interface (for example, add new methods or fields), you will only be able to reach them through reflection.

It's probably fair to say that Java wasn't designed with this purpose in mind, but it can potentially be done. How and when depends a little on the ultimate aim of the exercise. A couple of options:
At the source code level, you can use the Java Compiler API to compile arbitrary code into a class file (which you can then load).
At the bytecode level, you can write an agent that installs a ClassFileTransformer to arbitrarily alter a class "on the fly" as it is loaded. In practice, if you do this, you will probably also use a library such as BCEL (Byte Code Engineering Library) to make manipulating the class easier.
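A minimal sketch of the source-level option, assuming a full JDK (on a bare JRE `ToolProvider.getSystemJavaCompiler()` returns `null`) and Java 11+ for `Files.writeString`. It writes the source to a temporary directory, compiles it with `javax.tools.JavaCompiler`, loads the result through a `URLClassLoader`, and calls a static method reflectively, because code compiled earlier cannot refer to the new class by name. `RuntimeCompiler` and the generated class are hypothetical.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class RuntimeCompiler {
    /** Compiles one public class from source, loads it, and invokes one of its static no-arg methods. */
    static Object compileAndCall(String className, String source, String methodName) throws Exception {
        Path dir = Files.createTempDirectory("gen"); // not deleted afterwards in this sketch
        Path file = dir.resolve(className + ".java");
        Files.writeString(file, source);

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a bare JRE
        // run(in, out, err, args...) returns 0 on success; with no -d, .class files go next to the source.
        int status = compiler.run(null, null, null, file.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed with status " + status);
        }
        try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
            Class<?> cls = loader.loadClass(className);
            Method m = cls.getMethod(methodName);
            return m.invoke(null); // static method, so there is no receiver
        }
    }
}
```

For example, `compileAndCall("Generated", "public class Generated { public static int baz() { return 100; } }", "baz")` returns `100`, boxed as an `Integer`.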

You want to investigate program transformation systems (PTS), which provide general facilities for parsing and transforming languages at the source level. PTS provide rewrite rules that say in effect, "if you see this pattern, replace it by that pattern" using the surface syntax of the target language. This is done using full parsers so the rewrite rule really operates on language syntax and not text; such rewrite rules obviously won't attempt to modify code-like text in comments, unlike tools based on regexps.
Our DMS Software Reengineering Toolkit is one of these. It provides not only the usual parsing, AST building and prettyprinting (reproducing compilable source code complete with comments), but also supports symbol tables and control and data flow analysis. These are needed for almost any interesting transformations. DMS also has front ends for a variety of dialects of Java as well as many other languages.
Bytecode transformers exist because they are much easier to build; it is pretty easy to "parse" bytecode. Of course, you can't make permanent source changes with a bytecode transformer, so it is a lot less useful.

You mean like this?
// eval(...) stands for the scripting extension's evaluator; it is not part of Java
String script1 = "println(\"OK!\");";
eval(script1);
script1 += "println(\"... well, maybe NOT OK after all\");";
eval(script1);
Output:
OK!
OK!
... well, maybe NOT OK after all
Otherwise, use a scripting extension to Java. Groovy and similar languages would probably allow you to do what you want. I've written a scripting extension myself that integrates with Java almost seamlessly through reflection; contact me if you're interested in the details.

How do you organize class source code in Java?

By now my average class contains about 500 lines of code and about 50 methods.
My IDE is Eclipse, where I turned on “Save Actions” so that methods are sorted in alphabetical order, first public methods and then private methods.
To find any specific method in the code I use “Quick Outline”. If needed, “Open Call Hierarchy” shows the sequence in which methods are called.
This approach gives following advantages:
I can start typing a new method without thinking about where to place it in the code, because after saving, Eclipse will move it to the appropriate place automatically.
I always find public methods in the upper part of the code (I don’t have to search the whole class for them).
However there are some disadvantages:
When refactoring a large method into smaller ones, I’m not very satisfied that the new private methods end up in different parts of the code, which makes the code’s concept a little hard to follow. To avoid that, I name them in some weird way to keep them near each other, for example: showPageFirst(), showPageSecond() instead of showFirstPage(), showSecondPage().
Maybe there are some better approaches?

Organize your code for its audiences. For example, a class in a library might have these audiences:
An API client who wants more detail on how a public method works.
A maintainer who wants to find the relevant method to make a small change.
A more serious maintainer who wants to do a significant refactoring or add functionality.
For clients perusing the source code, you want to introduce core concepts. First we have a class doc comment that includes a glossary of important terms and usage examples. Then we have the code related to one term, then those related to another, then those related to a third.
For maintainers, any pair of methods that are likely to have to change together should be close by. A public method, its private helpers, and any constants used only by it should show up together.
Both of these groups of users are aided by grouping class members into logical sections which are separately documented.
For example, a collection class might have several mostly orthogonal concerns that can't easily be broken out into separate classes but which can be broken into separate sections.
Mutators
Accessors
Iteration
Serializing and toString
Equality, comparability, hashing
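As an illustration, here is a hypothetical minimal collection (`ItemList`) laid out in those sections, each marked by a comment banner, with the private helper kept next to the only method that uses it:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

/** A hypothetical minimal collection, organized by concern rather than alphabetically. */
final class ItemList<T> implements Iterable<T> {
    private final List<T> items = new ArrayList<>();

    // ===== Mutators =====
    void add(T item) {
        items.add(checkedItem(item));
    }

    // Private helper kept right next to its only caller.
    private T checkedItem(T item) {
        return Objects.requireNonNull(item, "ItemList does not accept null");
    }

    void clear() {
        items.clear();
    }

    // ===== Accessors =====
    int size() {
        return items.size();
    }

    T get(int index) {
        return items.get(index);
    }

    // ===== Iteration =====
    @Override
    public Iterator<T> iterator() {
        return items.iterator();
    }

    // ===== Serializing and toString =====
    @Override
    public String toString() {
        return "ItemList" + items;
    }

    // ===== Equality, comparability, hashing =====
    @Override
    public boolean equals(Object o) {
        return o instanceof ItemList && items.equals(((ItemList<?>) o).items);
    }

    @Override
    public int hashCode() {
        return items.hashCode();
    }
}
```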

Well, naming your methods so that they'll be easier to spot in your IDE is really not good. Their name should reflect what they do, nothing more.
As an answer to your question, probably the best thing to do is to split your class into multiple classes, isolating in each of them a group of methods that have something in common. For example, if you have
public void largeMethodThatDoesSomething() {
//do A
//do B
//do C
}
which then you've refactored such that:
public void largeMethodThatDoesSomething() {
doA();
doB();
doC();
}
private void doA() { }
private void doB() { }
private void doC() { }
you can create a class called SomethingDoer, place all four of these methods in it, and then use an instance of that class in your original class.
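A sketch of that extraction. The step bodies are hypothetical (they just record which step ran) so the delegation can be checked; `SomethingDoer` owns the helpers, and the original class keeps its public method and delegates to an instance of it:

```java
import java.util.ArrayList;
import java.util.List;

// The extracted class: the large method's steps now live together in one place.
final class SomethingDoer {
    private final List<String> log = new ArrayList<>();

    void doSomething() {
        doA();
        doB();
        doC();
    }

    private void doA() { log.add("A"); }
    private void doB() { log.add("B"); }
    private void doC() { log.add("C"); }

    List<String> log() { return log; }
}

// The original class keeps its public API but no longer contains the steps itself.
final class OriginalClass {
    private final SomethingDoer doer = new SomethingDoer();

    public void largeMethodThatDoesSomething() {
        doer.doSomething();
    }

    List<String> stepsRun() { return doer.log(); }
}
```

The helpers can now be named for what they do, since they no longer compete for space in the original class's sorted member list.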

Don't worry about physically ordering your methods inside the class; if you can't see a method, just press Ctrl-O (Quick Outline in Eclipse), start typing its name, and you will jump straight to it.
Having self-describing method names results in more maintainable code than artificially naming them to keep them in alphabetical order.
Hint: learn your shortcut keys and you will improve your productivity

Organizing the way you described sounds better than 99% of the Java code I have seen so far. However, on the other side, please make sure your classes don't grow too much and methods are not huge.
Classes should usually be less than 1000 lines and methods less than 150.

Best choice? Edit bytecode (asm) or edit java file before compiling

Goal
Detect where comparisons between, and copies of, variables are made
Inject code near the line where the operation happened
The purpose of the injected code: every time the class is run, increment a counter
General purpose: count the number of comparisons and copies made after execution with certain parameters
2 options
Note: I always have a .java file to begin with
1) Edit java file
Find comparisons with regex and inject pieces of code near the line
And then compile the class (My application uses JavaCompiler)
2) Use ASM bytecode engineering
Also detect where the events I want to track occur and inject pieces of code into the bytecode
And then use the (already compiled but modified) class
My Question
What is the best/cleanest way? Is there a better way to do this?

If you go the Java source route, you don't want to use regexes; you want a real Java parser. So that may influence your decision. Mind you, the Oracle JDK includes one, as part of the internal classes that implement the Java compiler, so you don't actually have to write one yourself if you don't want to. But decoding the Oracle AST is not a 5-minute task either. And, of course, using that is not portable, if that's important.
If you go the ASM route, the bytecode will initially be easier to analyze, since the semantics are a lot simpler. Whether that simpler analysis outweighs your unfamiliarity with bytecode, in terms of net time to a solution, is hard to say. In the end, in terms of generated code, neither is "better".
There is an apparent simplicity in just looking at generated Java source code and "knowing" that What You See Is What You Get, versus doing primitive dumps of class files for debugging and so on, but all that apparent simplicity is there because of your existing comfort with the Java language. Once you spend some time dredging through bytecode, that too will become comfortable. It's just a question of whether it's worth the time to you to get there in the first place.
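A minimal sketch of the "real parser" route, using the JDK's Compiler Tree API (`com.sun.source.tree` and `com.sun.source.util`, exported by the `jdk.compiler` module) rather than regexes. Unlike the `com.sun.tools.javac` internals, this API is a documented part of the JDK, though it still requires a full JDK at run time. It only locates and counts comparison operators and plain assignments in the source; the numbers are occurrences in the text, not how often they execute, and injecting the counter calls would be a separate step. `OperationCounter` is a hypothetical name.

```java
import com.sun.source.tree.AssignmentTree;
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class OperationCounter {
    private static final Set<Tree.Kind> COMPARISONS = EnumSet.of(
            Tree.Kind.LESS_THAN, Tree.Kind.LESS_THAN_EQUAL,
            Tree.Kind.GREATER_THAN, Tree.Kind.GREATER_THAN_EQUAL,
            Tree.Kind.EQUAL_TO, Tree.Kind.NOT_EQUAL_TO);

    /** Returns {comparisons, assignments} found syntactically in the given source. */
    static int[] count(String source) throws IOException {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        int[] counts = new int[2];
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitBinary(BinaryTree node, Void p) {
                    if (COMPARISONS.contains(node.getKind())) counts[0]++; // &&, +, etc. are skipped
                    return super.visitBinary(node, p);
                }

                @Override
                public Void visitAssignment(AssignmentTree node, Void p) {
                    // A "copy" such as x = y. Compound ops (+=) and initializers (int x = y;) are other tree kinds.
                    counts[1]++;
                    return super.visitAssignment(node, p);
                }
            }.scan(unit, null);
        }
        return counts;
    }
}
```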

Generally, it all depends on how comfortable you are with each option and how critical performance is. The bytecode manipulation will be much faster and somewhat simpler, but you'll have to understand how bytecode works and how to use the ASM framework.
Intercepting variable access is probably one of the simplest use cases for ASM. You could find a few more complex scenarios in this AOSD'07 paper.
Here is simplified code for intercepting variable access:
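ClassReader cr = ...;
ClassWriter cw = ...;
cr.accept(new ClassVisitor(Opcodes.ASM9, cw) { // accept() takes a ClassVisitor, not a MethodVisitor
    @Override
    public MethodVisitor visitMethod(int access, String name, String descriptor,
                                     String signature, String[] exceptions) {
        MethodVisitor mv = super.visitMethod(access, name, descriptor, signature, exceptions);
        return new MethodVisitor(Opcodes.ASM9, mv) {
            @Override
            public void visitVarInsn(int opcode, int varIndex) {
                if (opcode == Opcodes.ALOAD) { // loading an Object local variable
                    // ... insert method call
                }
                super.visitVarInsn(opcode, varIndex); // keep the original instruction
            }
        };
    }
}, 0);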

If it were me, I'd probably use the ASM option.
If you need a tutorial on ASM, I stumbled upon a user-written one that may help.

Java Metaprogramming

I'm working on my first real project with Java. I'm beginning to get comfortable with the language, although I have more experience with dynamic languages.
I have a class that behaves similarly to the following:
class Single
{
public void doActionA() {}
public void doActionB() {}
public void doActionC() {}
}
And then I have a SingleList class that acts as a collection of these classes (specifically, it's for a 2D Sprite library, and the "actions" are all sorts of transformations: rotate, shear, scale, etc). I want to be able to do the following:
class SingleList
{
public void doActionA() {
for (Single s : _innerList) {
s.doActionA();
}
}
// ... etc ...
}
Is there any way to simply defer a method (or a known list of methods) to each member of the inner list? Any way without having to specifically list each method, then loop through each inner member and apply it manually?
To make things a bit harder, the methods are of varying arity, but are all of return type "void".

Unfortunately Java does not readily support class creation at runtime, which is what you need: the SingleList needs to be automatically updated with the necessary stub methods to match the Single class.
I can think of the following approaches to this issue:
Use Java reflection:
Pros:
It's readily available in the Java language and you can easily find documentation and examples.
Cons:
The SingleList class would not be compatible with the Single class interface any more.
The Java compiler and any IDEs are typically unable to help with methods called via reflection - errors that would be caught by the compiler are typically transformed into runtime exceptions.
Depending on your use case, you might also see a noticeable performance degradation.
Use a build system along with some sort of source code generator to automatically create the SingleList.java file.
Pros:
Once you set it up you will not have to deal with it any more.
Cons:
Setting this up has a degree of difficulty.
You would have to separately ensure that the SingleList class loaded in any JVM - or your IDE, for that matter - actually matches the loaded Single class.
Tackle this issue manually - creating an interface (e.g. SingleInterface) or a base abstract class for use by both classes should help, since any decent IDE will point out unimplemented methods. Proper class architecture would minimize the duplicated code and your IDE might be able to help with generating the boilerplate parts.
Pros:
There is no setup curve to get over.
Your IDE will always see the right set of classes.
The class architecture is usually improved afterwards.
Cons:
Everything is manual.
Use a bytecode generation library such as Javassist or BCEL to dynamically generate/modify the SingleList class on-the-fly.
Pros:
This method is extremely powerful and can save a lot of time in the long term.
Cons:
Using bytecode generation libraries is typically not trivial and not for the faint-hearted.
Depending on how you write your code, you may also have issues with your IDE and its handling of the dynamic classes.
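One further option, not in the list above, is `java.lang.reflect.Proxy`. It is reflection-based, but if `Single` implements an interface (the `SingleInterface` from the manual approach), a proxy for that interface can forward every call to each member of the inner list. The result still has the interface's type, calls on it are checked by the compiler, and methods added to the interface later need no extra forwarding code; the per-call reflection cost remains. A minimal sketch, assuming that interface; `SingleInterface`, its methods, and `SingleLists.broadcast` are hypothetical names:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical interface implemented by Single; arities differ, all methods return void.
interface SingleInterface {
    void rotate(double degrees);
    void scale(double sx, double sy);
}

final class SingleLists {
    /** Returns a SingleInterface that replays every call on each member, in list order. */
    static SingleInterface broadcast(List<? extends SingleInterface> members) {
        return (SingleInterface) Proxy.newProxyInstance(
                SingleInterface.class.getClassLoader(),
                new Class<?>[] {SingleInterface.class},
                (proxy, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        // equals/hashCode/toString describe the proxy itself, not the members.
                        switch (method.getName()) {
                            case "equals":
                                return proxy == args[0];
                            case "hashCode":
                                return System.identityHashCode(proxy);
                            default:
                                return "broadcast" + members;
                        }
                    }
                    for (SingleInterface s : members) {
                        try {
                            method.invoke(s, args); // args may have any length
                        } catch (InvocationTargetException e) {
                            throw e.getCause(); // surface the member's own exception
                        }
                    }
                    return null; // every interface method is void
                });
    }
}
```

With that in place, `SingleLists.broadcast(_innerList).rotate(45)` rotates every sprite in the list.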
