Does GWT remove unused classes during compilation? - java

Suppose I have a project structure that looks roughly like this:
{module-package}.webapp
module.gwt.xml
{module-package}.webapp.client
Client.java
UsedByClient.java
NotUsedByClient.java
And the module.gwt.xml file has:
<source path='client'/>
<entry-point class='{module-package}.webapp.client.Client'/>
When I compile this project using GWT, how much of the Java code will be compiled into JavaScript?
Is NotUsedByClient.java included, even though the entry point doesn't reference it?
Is UsedByClient.java fully or partially included? E.g. if it has method m() which isn't called by Client, will m be compiled or not?
The motivation: unfortunately, I'm working with a legacy codebase that has server-side code living alongside client-side code in the same package, and it would be some work to separate them. The server-side code isn't used by the client, but I'm concerned that GWT might compile it to JavaScript, where someone might notice it and try to reverse engineer it.

All of the above and more happen:
unreferenced classes are removed
unreferenced methods and fields are removed
constants may be inlined
various operations on constants (like !, ==, +, &&, etc) may be simplified (based on some field always being null, or true, etc)
un-overridden methods may be made final...
...and final methods may be made static in certain situations (leading to smaller callsites, and no "this" reference inside that method)...
and small, frequently called static methods may be inlined
And this process repeats, with even more optimizations that I skipped, to further assist in removing code, both big and small. At the end, all classes, methods, fields, and local variables are renamed in a way that further reduces output size, and methods are reordered in the output by length, letting gzip compress your content more efficiently on the way to the client.
So while some aspects of your code could be reverse engineered (just like any machine code could be reverse engineered), code which isn't referenced won't be available, and code which is may not even be readable.
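For illustration (a sketch added here, not something from the compiler docs), the question's scenario in code, with com.example.webapp.client standing in for {module-package}.webapp.client:

package com.example.webapp.client;

import com.google.gwt.core.client.EntryPoint;

public class Client implements EntryPoint {
    @Override
    public void onModuleLoad() {
        new UsedByClient().used();   // reachable from the entry point: kept (and possibly inlined)
    }
}

class UsedByClient {
    void used() { }                  // compiled to JavaScript
    void m() { }                     // never called from reachable code: expected to be pruned
}

class NotUsedByClient {
    // never referenced from the entry point: the whole class is expected to be pruned
}

With this layout, only Client and UsedByClient.used() would be expected to survive pruning into the generated JavaScript.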

I somehow managed to stumble upon a 'deep dive' video presentation on the compiler by one of the GWT engineers which has an explanation: https://youtu.be/n-P4RWbXAT8?t=865
Key points:
One of the compiler optimizations is called Pruner and it will "Traverse all reachable code from entrypoint, delete everything else (uses ControlFlowAnalyzer)"
It is actually an essential optimization because without it, all GWT apps would need to include gwt-user.jar in its entirety, which would greatly increase app sizes.
So it seems the GWT compiler does indeed remove unused code.

Related

Tracking method implementation changes in class bytecode

I have the bytecode of some abstract project (let's call it The Project), specifically the bytecode of each of its classes, inside some Kotlin code, with each class's bytecode stored as a ByteArray. The task is to tell which specific methods in each class have been modified from build to build of The Project. In other words, there are two ByteArrays of the same class of The Project, but they belong to different versions of it, and I need to compare them accurately. A simple example: let's assume we have a trivial class:
class Rst {
    fun getjson(): String {
        abc("""ss""");
        return "jsonValid"
    }
    public fun abc(s: String) {
        println(s)
    }
}
Its bytecode is stored in oldByteCode. Now some changes happened to the class:
class Rst {
    fun getjson(): String {
        abc("""ss""");
        return "someOtherValue"
    }
    public fun newMethod(s: String) {
        println("it's not abc anymore!")
    }
}
Its bytecode is stored in newByteCode.
That's the main goal: compare oldByteCode to newByteCode.
Here we have the following changes:
the getjson() method has been changed;
the abc() method has been removed;
newMethod() has been created.
So a method counts as changed if its signature remains the same but its body differs; if the signature changes, it's already a different method.
Now back to the actual problem. I have to know every method's exact status from its bytecode. What I have at the moment is the JaCoCo analyzer, which parses class bytecode into "bundles". In these bundles I have a hierarchy of packages, classes, and methods, but only with their signatures, so I can't tell whether a method's body has any changes. I can only track signature differences.
Are there any tools or libraries to split a class's bytecode into its methods' bytecode? With those I could, for example, calculate hashes and compare them. Maybe the ASM library can help with that?
Any ideas are welcome.
TL;DR: your approach of just comparing bytecode or even hashes won't lead to a reliable solution; in fact, there is no reasonable-effort solution to this kind of problem at all.
I don't know how much of this applies to the Kotlin compiler, but as elaborated in Is the creation of Java class files deterministic?, Java compilers are not required to produce identical bytecode even if the same version is used to compile exactly the same source code. While they may have an implementation that tries to be as deterministic as possible, things change when looking at different versions or alternative implementations, as explained in Do different Java Compilers (where the vendor is different) produce different bytecode.
Even if we assume that the Kotlin compiler is outstandingly deterministic, even across versions, it can't ignore the JVM's evolution. E.g. the removal of the jsr/ret instructions could not be ignored by any compiler, even one trying to be conservative. But it's rather likely that it will incorporate other improvements as well, even when not being forced¹.
So in short, even when the entire source code did not change, it’s not a safe bet to assume that the compiled form has to stay the same. Even with an explicitly deterministic compiler we would have to be prepared for changes when recompiling with newer versions.
Even worse, if one method changes, it may have an impact on the compiled form of others, as instructions refer to items of a constant pool whenever constants or linkage information are needed and these indices may change, depending on how the other methods use the constant pool. There’s also an optimized form for certain instructions when accessing one of the first 255 pool indices, so changes in the numbering may require changing the form of the instruction. This in turn may have an impact on other instructions, e.g. switch instructions have padding bytes, depending on their byte code position.
On the other hand, a simple change of a constant value used in only one method may have no impact on the method's bytecode at all, if the new constant happens to end up at the same place in the pool as the old constant.
So, to determine whether the code of two methods actually does the same thing, there is no way around parsing the instructions and understanding their meaning to some degree. Comparing just bytes or hashes won't work.
¹ to name some non-mandatory changes, the compilation of class literals changed, likewise string concatenation changed from using StringBuffer to use StringBuilder and changed again to use StringConcatFactory, the use of getClass() for intrinsic null checks changed to requireNonNull(…), etc. A compiler for a different language doesn’t have to follow, but no-one wants to be left behind…
There are also bugs to fix, like obsolete instructions, which no compiler would keep just to stay deterministic.
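Going back to the original ask (splitting a class into per-method pieces): a hedged sketch using ASM's tree and util APIs. It prints each method's instructions symbolically, so the hash at least doesn't depend on raw constant-pool indices, but per the answer above it still can't tell you whether two differently compiled bodies mean the same thing. The class and method names are made up.

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.tree.ClassNode;
import org.objectweb.asm.tree.MethodNode;
import org.objectweb.asm.util.Textifier;
import org.objectweb.asm.util.TraceMethodVisitor;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class MethodHashes {
    // Maps "name + descriptor" to a SHA-256 hash of the method's symbolic instruction listing.
    public static Map<String, String> of(byte[] classBytes) throws Exception {
        ClassNode cn = new ClassNode();
        new ClassReader(classBytes).accept(cn, ClassReader.SKIP_DEBUG);
        Map<String, String> hashes = new HashMap<>();
        for (MethodNode mn : cn.methods) {
            Textifier text = new Textifier();
            mn.accept(new TraceMethodVisitor(text));   // record instructions as readable text
            StringWriter sw = new StringWriter();
            text.print(new PrintWriter(sw, true));
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(sw.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            hashes.put(mn.name + mn.desc, hex.toString());
        }
        return hashes;
    }
}

Comparing the maps produced for oldByteCode and newByteCode would flag added, removed, and textually changed methods, subject to all the caveats above.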

Adding immutable programming rules to the Java language within a program

I'm writing a program in Java. I find that reading and debugging code is easiest when the paradigm techniques are consistent, allowing me to very quickly assume where and what a problem is.
Doing this has, as you might guess, made my programming much faster, and so I want to find a way to enforce these rules.
For example, let's say I have a method that makes changes to the state of an object and returns a value. If the method is called outside of the class, I don't ever want to see it resolve inside parameter parentheses, like this:
somefunction(param1, param2, object.change_and_return());
Instead, I want it to be done like this:
int relevant_variable_name = object.change_and_return();
somefunction(param1, param2, relevant_variable_name);
Another example is that I want to create a base class that includes certain print methods, and I want all classes that are user defined to be derived from that base class, much in the way Java has done.
Within my objects, is there a way I can force myself (and anyone else) to adhere to these rules? I.e., if you try to run code that breaks the rules, it will terminate and return a custom error report. Also, if you write code that breaks the rules, the IDE (I use Eclipse) will recognize it as an error, underline it, and bring up the appropriate Javadoc?
For the check and underline violations part:
You can use PMD, it is a static code analyzer.
It has a default ruleset, and you can write custom rules matching what you need.
However, your checks seem to be quite complex to express in "PMD language".
PMD is available in Eclipse Marketplace.
For the "crash if not conformant" part:
I see no easy way to do it.
Hard/complex ways could be:
Write a rule within PMD, run the analysis at compile time, parse the report (still at compile time) and return an error if your rule is violated.
Write a Java agent doing the rule check and make it crash the VM if the rule is violated (not sure this is really feasible; agents are meant for instrumentation).
Use reflection anywhere in your code to load classes, analyze the loaded classes against your rules, and crash the VM if a rule is violated (seriously, don't do this: the code would be ugly and the rules easily bypassable). A rough sketch of this idea follows below.
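For what it's worth, here is a hedged sketch of that last (discouraged) option, applied to the base-class rule from the question. The checker class, the base class, and the checked class names are all hypothetical.

public final class RuleChecker {
    // Fails fast at startup if any of the named classes does not extend the required base class.
    public static void requireBase(Class<?> base, String... classNames) {
        for (String name : classNames) {
            try {
                Class<?> cls = Class.forName(name);
                if (!base.isAssignableFrom(cls)) {
                    throw new IllegalStateException(name + " does not extend " + base.getName());
                }
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("Unknown class: " + name, e);
            }
        }
    }
}

Called once at startup, e.g. RuleChecker.requireBase(PrintableBase.class, "com.example.app.Customer", "com.example.app.Invoice"), it terminates with a custom error report as asked for, but nothing stops anyone from simply not calling it.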

What happens if I manually change the bytecode before running it?

I am a little bit curious about what happens if I manually change something in the bytecode before execution. For instance, suppose I assign an int variable to a byte variable without casting, or remove a semicolon from somewhere in the program, or do anything else that would normally lead to a compile-time error. As I understand it, all compile-time errors are checked by the compiler before the .class file is produced. So what happens when, after successfully compiling a program, I change the bytecode manually? Is there any mechanism to handle this? If not, how does the program behave when executed?
EDIT: As Hot Licks, Darksonn, and manouti have already given correct, satisfying answers, I'll just summarize here for readers looking for an answer to this type of question:
Every Java virtual machine has a class-file verifier, which ensures that loaded class files have a proper internal structure. If the class-file verifier discovers a problem with a class file, it throws an exception. Because a class file is just a sequence of binary data, a virtual machine can't know whether a particular class file was generated by a well-meaning Java compiler or by shady crackers bent on compromising the integrity of the virtual machine. As a consequence, all JVM implementations have a class-file verifier that can be invoked on untrusted classes, to make sure the classes are safe to use.
Refer to this for more details.
You certainly can use a hex editor (eg, the free "HDD Hex Editor Neo") or some other tool to modify the bytes of a Java .class file. But obviously, you must do so in a way that maintains the file's "integrity" (tables all in correct format, etc). Furthermore (and much trickier), any modification you make must pass muster by the JVM's "verifier", which essentially rechecks everything that javac verified while compiling the program.
The verification process occurs during class loading and is quite complex. Basically, a data flow analysis is done on each procedure to assure that only the correct data types can "reach" a point where the data type is assumed. Eg, you can't change a load operation to load a reference to a HashMap onto the "stack" when the eventual user of the loaded reference will be assuming it's a String. (But enumerating all the checks the verifier does would be a major task in itself. I can't remember half of them, even though I wrote the verifier for the IBM iSeries JVM.)
(If you're asking if one can "jailbreak" a Java .class file to introduce code that does unauthorized things, the answer is no.)
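To make the "just a sequence of binary data" point concrete, here is a small sketch (added for illustration, not from the answer above) that reads the fixed class-file header; the file name is hypothetical.

import java.io.DataInputStream;
import java.io.FileInputStream;

public class ClassFileHeader {
    public static void main(String[] args) throws Exception {
        try (DataInputStream in = new DataInputStream(new FileInputStream("Client.class"))) {
            int magic = in.readInt();           // 0xCAFEBABE for every valid class file
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort(); // e.g. 52 = Java 8, 61 = Java 17
            System.out.printf("magic=%08X, version=%d.%d%n", magic, major, minor);
        }
    }
}

Anything after that header can be edited with a hex editor, but as described above it still has to pass the verifier when loaded.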
You will most likely get a java.lang.VerifyError:
Thrown when the "verifier" detects that a class file, though well formed, contains some sort of internal inconsistency or security problem.
You can certainly do this, and there are even tools to make it easier, like http://set.ee/jbe/. The Java runtime will run your modified bytecode just as it would run the bytecode emitted by the compiler. What you're describing is a Java-specific case of a binary patch.
The semicolon example wouldn't be an issue, since semicolons are only for the convenience of the compiler and don't appear in the bytecode.
Either the bytecode executes normally and performs the instructions given or the jvm rejects them.
I played around with programming directly in Java bytecode some time ago using Jasmin, and I noticed some things.
If the bytecode you edit it into makes sense, it will of course run as expected. However, there are some bytecode patterns that are rejected with a VerifyError.
For the specific example of out-of-bounds access, you can compile code with out-of-bounds accesses just fine. They will get you an ArrayIndexOutOfBoundsException at runtime.
int[] arr = new int[20];
for (int i = 0; i < 100; i++) {
    arr[i] = i;
}
However you can construct bytecode that is more fundamentally flawed than that. To give an example I'll explain some things first.
Java bytecode works with a stack, and instructions work on the top elements of the stack.
The stack naturally has different sizes at different places in the program, but sometimes you might use a goto in the bytecode that causes the stack to look different depending on how you reached that point.
The stack might contain object, int; then you store the object in an object array and the int in an int array. Then you go on, and from somewhere else in that bytecode you use a goto, but now your stack contains int, object, which would result in an int being passed to an object array and vice versa.
This is just one example of things that could happen which make your bytecode fundamentally flawed. The JVM detects these kinds of flaws when the class is loaded at runtime, and then emits a VerifyError if something doesn't work.
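As an illustration of the same idea without Jasmin, here is a hedged sketch that uses the ASM library to assemble a method whose operand stack doesn't match its return type; loading and calling it is expected to fail with a VerifyError. The class and method names are made up.

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class VerifyErrorDemo {
    // Tiny loader so we can hand raw bytes to defineClass().
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        cw.visit(Opcodes.V1_8, Opcodes.ACC_PUBLIC, "Broken", null, "java/lang/Object", null);

        // public static Object bad(): pushes an int but returns it as a reference.
        MethodVisitor mv = cw.visitMethod(Opcodes.ACC_PUBLIC | Opcodes.ACC_STATIC,
                "bad", "()Ljava/lang/Object;", null, null);
        mv.visitCode();
        mv.visitInsn(Opcodes.ICONST_0);  // stack: int
        mv.visitInsn(Opcodes.ARETURN);   // ARETURN expects a reference on the stack
        mv.visitMaxs(0, 0);              // recomputed because of COMPUTE_MAXS
        mv.visitEnd();
        cw.visitEnd();

        Class<?> broken = new ByteLoader().define("Broken", cw.toByteArray());
        // Verification happens when the class is linked, so this call
        // should throw java.lang.VerifyError rather than ever running bad().
        broken.getMethod("bad").invoke(null);
    }
}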

Java super-tuning, a few questions

Before I ask my question can I please ask not to get a lecture about optimising for no reason.
Consider the following questions purely academic.
I've been thinking about the efficiency of accesses between root (ie often used and often accessing each other) classes in Java, but this applies to most OO languages/compilers. The fastest way (I'm guessing) that you could access something in Java would be a static final reference. Theoretically, since that reference is available during loading, a good JIT compiler would remove the need to do any reference lookup to access the variable and point any accesses to that variable straight to a constant address. Perhaps for security reasons it doesn't work that way anyway, but bear with me...
Say I've decided that there are some order of operations problems or some arguments to pass at startup that means I can't have a static final reference, even if I were to go to the trouble of having each class construct the other as is recommended to get Java classes to have static final references to each other. Another reason I might not want to do this would be... oh, say, just for example, that I was providing platform specific implementations of some of these classes. ;-)
Now I'm left with two obvious choices. I can have my classes know about each other with a static reference (on some system hub class), which is set after constructing all classes (during which I mandate that they cannot access each other yet, thus doing away with order of operations problems at least during construction). On the other hand, the classes could have instance final references to each other, were I now to decide that sorting out the order of operations was important or could be made the responsibility of the person passing the args - or more to the point, providing platform specific implementations of these classes we want to have referencing each other.
A static variable means you don't have to look up the location of the variable wrt to the class it belongs to, saving you one operation. A final variable means you don't have to look up the value at all but it does have to belong to your class, so you save 'one operation'. OK I know I'm really handwaving now!
Then something else occurred to me: I could have static final stub classes, kind of like a wacky interface where each call was relegated to an 'impl' which can just extend the stub. The performance hit then would be the double function call required to run the functions and possibly I guess you can't declare your methods final anymore. I hypothesised that perhaps those could be inlined if they were appropriately declared, then gave up as I realised I would then have to think about whether or not the references to the 'impl's could be made static, or final, or...
So which of the three would turn out fastest? :-)
Any other thoughts on lowering frequent-access overheads or even other ways of hinting performance to the JIT compiler?
UPDATE: After running several hours of test of various things and reading http://www.ibm.com/developerworks/java/library/j-jtp02225.html I've found that most things you would normally look at when tuning e.g. C++ go out the window completely with the JIT compiler. I've seen it run 30 seconds of calculations once, twice, and on the third (and subsequent) runs decide "Hey, you aren't reading the result of that calculation, so I'm not running it!".
FWIW you can test data structures and I was able to develop an arraylist implementation that was more performant for my needs using a microbenchmark. The access patterns must have been random enough to keep the compiler guessing, but it still worked out how to better implement a generic-ified growing array with my simpler and more tuned code.
As far as the test here was concerned, I simply could not get a benchmark result! My simple test of calling a function and reading a variable from a final vs non-final object reference revealed more about the JIT than the JVM's access patterns. Unbelievably, calling the same function on the same object at different places in the method changes the time taken by a factor of FOUR!
As the guy in the IBM article says, the only way to test an optimisation is in-situ.
Thanks to everyone who pointed me along the way.
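For later readers: harnesses like JMH now exist precisely because of the effects described in the update (warmup, and the JIT deleting calculations whose results are never read). Here is a hedged sketch, with made-up field names, of how the static final vs. instance final comparison could be set up; it needs the JMH dependency and is run through the JMH runner or build plugin.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class ReferenceAccessBenchmark {
    static final Helper STATIC_FINAL = new Helper();
    final Helper instanceFinal = new Helper();

    static class Helper {
        int value = 42;
        int get() { return value; }
    }

    @Benchmark
    public void staticFinalAccess(Blackhole bh) {
        // Blackhole consumes the result so the JIT cannot prove it unused and drop the work.
        bh.consume(STATIC_FINAL.get());
    }

    @Benchmark
    public void instanceFinalAccess(Blackhole bh) {
        bh.consume(instanceFinal.get());
    }
}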
It's worth noting that static fields are stored in a special per-class object which contains the static fields for that class. Using static fields instead of object fields is unlikely to be any faster.
See the update, I answered my own question by doing some benchmarking, and found that there are far greater gains in unexpected areas and that performance for simple operations like referencing members is comparable on most modern systems where performance is limited more by memory bandwidth than CPU cycles.
Assuming you found a way to reliably profile your application, keep in mind that it will all go out the window should you switch to another jdk impl (IBM to Sun to OpenJDK etc), or even upgrade version on your existing JVM.
The reason you are having trouble, and would likely have different results with different JVM impls, lies in the Java spec: it explicitly states that it does not define optimizations and leaves it to each implementation to optimize (or not) in any way, so long as execution behavior is unchanged by the optimization.

Any Java counterpart for `/usr/bin/strip`?

Is there any tool that can remove debug info from Java .class files, just like /usr/bin/strip can from C/C++ object files on Linux?
EDIT: I liked both Thilo's and Peter Mmm's answers: Peter's was short and to the point, exposing my ignorance of what ships with the JDK; Thilo's ProGuard suggestion is something I'll definitely be checking out anyway for all those extra features it appears to provide. Thank you Thilo and Peter!
ProGuard (which the Android SDK, for example, ships with to reduce code size) can do all kinds of manipulation to shrink JAR files:
Evaluate constant expressions.
Remove unnecessary field accesses and method calls.
Remove unnecessary branches.
Remove unnecessary comparisons and instanceof tests.
Remove unused code blocks.
Merge identical code blocks.
Reduce variable allocation.
Remove write-only fields and unused method parameters.
Inline constant fields, method parameters, and return values.
Inline methods that are short or only called once.
Simplify tail recursion calls.
Merge classes and interfaces.
Make methods private, static, and final when possible.
Make classes static and final when possible.
Replace interfaces that have single implementations.
Perform over 200 peephole optimizations, like replacing ...*2 by ...<<1.
Optionally remove logging code.
They do not mention removing debug info in that list, but I guess they can also do that.
Update: Yes, indeed:
By default, compiled bytecode still contains a lot of debugging information: source file names, line numbers, field names, method names, argument names, variable names, etc. This information makes it straightforward to decompile the bytecode and reverse-engineer entire programs. Sometimes, this is not desirable. Obfuscators such as ProGuard can remove the debugging information and replace all names by meaningless character sequences, making it much harder to reverse-engineer the code. It further compacts the code as a bonus. The program remains functionally equivalent, except for the class names, method names, and line numbers given in exception stack traces.
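For completeness, a hedged sketch of a minimal ProGuard configuration; the jar names and the main class are hypothetical, and the library jar path is the classic pre-Java-9 form from the ProGuard examples.

-injars      app.jar
-outjars     app-stripped.jar
-libraryjars <java.home>/lib/rt.jar

# Keep the entry point so shrinking doesn't remove the application itself.
-keep public class com.example.Main {
    public static void main(java.lang.String[]);
}

# Debug attributes such as SourceFile, LineNumberTable and LocalVariableTable are
# dropped by the obfuscation step unless explicitly preserved, e.g.:
# -keepattributes SourceFile,LineNumberTable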
