I am trying to obfuscate a spring web application using ProGuard. I want to keep class and method names, especially the ones used as spring beans.
But ProGuard renames local variables to local[class name], for example if I have a User object it renames the local variable to localUser. It also renames method parameters to param[Class name], for example if I have a User parameter the variable name in obfuscated method becomes paramUser. So the obfuscated code becomes pretty readable.
I want to prevent ProGuard using local and param prefixes and class names. For example I want it to use x1 instead of localUser. I checked configuration options but I could not find how to do that.
ProGuard manual > Troubleshooting > Unexpected observations after processing > Variable names not being obfuscated
If the names of the local variables and parameters in your obfuscated
code don't look obfuscated, because they suspiciously resemble the
names of their types, it's probably because the decompiler that you
are using is coming up with those names. ProGuard's obfuscation step
does remove the original names entirely, unless you explicitly keep
the LocalVariableTable or LocalVariableTypeTable attributes.
The variable x1 isn't giving away any more information than paramUser, given that the viewed code would be:
public void foo(User x1)
{
...
}
Unless your methods are really long, it wouldn't be hard for anyone reading the method to remember that it's a parameter of type User, which is all that paramUser is saying. Yes, there's a bit of a difference in readability but I wouldn't say it's worth worrying about, personally - if someone's investing enough time to decompile your code to start with, a very small difference like that would be unlikely to deter them. If the class names were obfuscated as well, that makes a bigger difference IMO.
The naming scheme you are describing looks like the names that JD regenerates when the LocalVariableTable is missing from the class file, for example because the Java compiler did not emit it (see javac -g:vars). To me, this is not a ProGuard bug.
To make the obfuscation of your application more effective:
try to replace "protected" with "private" wherever possible: ProGuard will replace the class, method and field names with short names,
try to use anonymous classes in your code,
and try to split your algorithms across a large number of classes to make the execution flow harder to follow.
Suppose I have a project structure that looks roughly like this:
{module-package}.webapp
module.gwt.xml
{module-package}.webapp.client
Client.java
UsedByClient.java
NotUsedByClient.java
And the module.gwt.xml file has:
<source path='client'/>
<entry-point class='{module-package}.webapp.client.Client'/>
When I compile this project using GWT, how much of the Java code will be compiled into Javascript?
Is NotUsedByClient.java included, even though the entry point doesn't reference it?
Is UsedByClient.java fully or partially included? E.g. if it has method m() which isn't called by Client, will m be compiled or not?
The motivation is that unfortunately I'm working with a legacy codebase that has server-side code living alongside client-side code in the same package and it would be some work to separate them. The server-side code isn't used by the client, but I'm concerned that GWT might compile it to Javascript where someone might notice it and try to reverse engineer it.
All of the above and more happen:
unreferenced classes are removed
unreferenced methods and fields are removed
constants may be inlined
various operations on constants (like !, ==, +, &&, etc) may be simplified (based on some field always being null, or true, etc)
un-overridden methods may be made final...
...and final methods may be made static in certain situations (leading to smaller callsites, and no "this" reference inside that method)...
and small, frequently called static methods may be inlined
And this process repeats, with even more optimizations that I skipped, to further assist in removing code, both big and small. At the end, all classes, methods, fields, and local variables are renamed in a way to further reduce output size, including reordering methods in the output so that they are ordered by length, letting gzip more efficiently compress your content on the way to the client.
So while some aspects of your code could be reverse engineered (just like any machine code could be reverse engineered), code which isn't referenced won't be available, and code which is may not even be readable.
I somehow managed to stumble upon a 'deep dive' video presentation on the compiler by one of the GWT engineers which has an explanation: https://youtu.be/n-P4RWbXAT8?t=865
Key points:
One of the compiler optimizations is called Pruner and it will "Traverse all reachable code from entrypoint, delete everything else (uses ControlFlowAnalyzer)"
It is actually an essential optimization because without it, all GWT apps would need to include gwt-user.jar in its entirety, which would greatly increase app sizes.
So it seems the GWT compiler does indeed remove unused code.
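The "traverse all reachable code from the entry point, delete everything else" idea can be illustrated with a small sketch. This is not GWT's actual Pruner or ControlFlowAnalyzer; the class name, the call-graph representation and the method names in main are hypothetical, chosen only to mirror the project layout in the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReachabilitySketch {

    // Returns every method reachable from entryPoint in a (hypothetical) call graph
    // that maps a method name to the names of the methods it calls.
    static Set<String> reachableFrom(String entryPoint, Map<String, List<String>> callGraph) {
        Set<String> reachable = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(entryPoint);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!reachable.add(current)) {
                continue; // already visited
            }
            for (String callee : callGraph.getOrDefault(current, List.of())) {
                pending.push(callee);
            }
        }
        return reachable;
    }

    public static void main(String[] args) {
        // Assumed example graph: the entry point calls only UsedByClient.used.
        Map<String, List<String>> callGraph = Map.of(
                "Client.onModuleLoad", List.of("UsedByClient.used"),
                "UsedByClient.used", List.of(),
                "UsedByClient.m", List.of(),
                "NotUsedByClient.serverOnly", List.of("UsedByClient.m"));
        // Anything missing from this set would be pruned.
        System.out.println(reachableFrom("Client.onModuleLoad", callGraph));
    }
}
```

In this toy graph neither UsedByClient.m nor anything in NotUsedByClient is reachable from the entry point, so a pruner working this way would drop them.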
I am currently taking a project management class and the professor gave this assignment to compare two .java files methods and fields in all cases programmatically. I don't think it's actually possible to do but maybe I am wrong!
The assignment spec is as follows (it's extremely ambiguous, I know):
In this assignment, you are required to write a comparison tool for two
versions of a Java source file.
Your program takes as input two .java files representing those two versions
and reports the following atomic changes:
1. AM: Add a new method
2. DM: Delete a method
3. CM: Change the body of a method (note: you need to handle the case where a method is
relocated within the body of its class)
4. AF: Add a field
5. DF: Delete a field
6. CFI: Change the definition of an instance field initializer (including (i) adding an initialization to a
field, (ii) deleting an initialization of a field, (iii) making changes to the initialized value of a field,
and (iv) making changes to a field modifier, e.g., private to public)
So that's what I am working with and my approach was to use reflection as it allows you to do everything but detect differences in the method body.
I had considered the idea that you could create a parser but that seemed ridiculous, especially for a 3 credit undergrad class in project management. Tools like BeyondCompare don't list what methods or fields changed, just lines that are different so don't meet the requirements.
I turned in this assignment and pretty much the entire class failed it with the reason as "our code would not work for java files with external dependencies that are not given or with java files in different projects" - which is completely correct but also I'm thinking, impossible to do.
I am trying to nail down a concrete answer as to why this is not actually possible to do or learn something new about why this is possible so any insight would be great.
What you got wrong here is that you started by examining the .class files (using reflection). Some of the information listed above is not even available at that stage (generics, inlined functions). What you need to do is parse the .java files as text. That is the only way to actually solve the problem. A very high-level solution could be writing a program that:
reads the files
constructs a specific object for each .java file containing all the information that needs to be compared (names of the functions, names of the instance variables, etc.)
compares the constructed objects to provide the requested results (example: copy functionsFromB and call removeAll(functionsFromA) on the copy; whatever remains are the added functions)
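The comparison step could be sketched roughly as follows, assuming the parsing step has already produced, for each file, a map from method signature to method body text. The class name, the map representation and the sample signatures are assumptions made for illustration; parsing itself is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MemberDiffSketch {

    // Compares two maps of "method signature -> body text" and reports
    // AM (added), CM (changed body) and DM (deleted) entries.
    static List<String> diffMethods(Map<String, String> oldMethods, Map<String, String> newMethods) {
        List<String> changes = new ArrayList<>();
        // TreeMap copies give a stable, sorted report order.
        for (Map.Entry<String, String> entry : new TreeMap<>(newMethods).entrySet()) {
            String signature = entry.getKey();
            if (!oldMethods.containsKey(signature)) {
                changes.add("AM " + signature);
            } else if (!oldMethods.get(signature).equals(entry.getValue())) {
                changes.add("CM " + signature);
            }
        }
        for (String signature : new TreeMap<>(oldMethods).keySet()) {
            if (!newMethods.containsKey(signature)) {
                changes.add("DM " + signature);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("a()", "return 1;", "b()", "return 2;");
        Map<String, String> after = Map.of("a()", "return 3;", "c()", "return 2;");
        System.out.println(diffMethods(before, after));
    }
}
```

Because methods are looked up by signature rather than by position, a method that is merely relocated within its class is not reported as changed. The same approach would apply to fields for AF, DF and CFI.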
Note: if this is an assignment, you should not be using solutions provided by anybody else, you need to do it on your own. Likely you will not get a single point if you use a library written by somebody else.
I have a requirement to scan a directory of Java (POJO) files, go through each of them, find the variables defined in those POJOs, and check whether each one has a correctly named getter and setter. For example, if empName is the variable name, then it should have a getter named getEmpName() and not getempName().
This is because our J2EE application, which was built a long time ago, started failing because of invalid getter and setter names, which the front-end technologies do not recognize.
I have written a basic program that can determine this. My exact problem is how to identify a variable in a line. My logic assumes that the third word in a line containing the private keyword is the variable name. I just want to know whether this approach is right or whether I need to try something different, as the requirement seems very generic.
Trying to scan the source files yourself will be painful and involve a lot of edge cases etc.
For example, the qualifiers on variables can be in any order, and there can be several of them. Array brackets can be before or after the variable name, variables may or may not be initialized, etc. Some may be commented out or in an inner class.
Your best approach will be to use reflection and scan the objects using that.
Reflection is what allows running Java code to find out about itself. You can write a small program and add the code to scan as libraries for that program. The program can then scan through the classes in those Jars, and for each one use reflection to query the list of methods and variables within.
http://docs.oracle.com/javase/tutorial/reflect/
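A minimal sketch of such a reflection-based check might look like this. The Employee class is a hypothetical POJO; the check assumes the simple "get" + capitalized field name convention from the question and ignores the "is" prefix for booleans, setters and inherited members.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

public class GetterCheckSketch {

    // Hypothetical POJO: age has a correctly named getter, empName does not.
    static class Employee {
        private String empName;
        private int age;

        public String getempName() { return empName; } // wrong capitalization
        public int getAge() { return age; }
    }

    // Returns the names of instance fields that lack a getter named
    // "get" + field name with its first letter upper-cased.
    static Set<String> fieldsWithoutGetter(Class<?> type) {
        Set<String> methodNames = new TreeSet<>();
        for (Method method : type.getDeclaredMethods()) {
            methodNames.add(method.getName());
        }
        Set<String> missing = new TreeSet<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // skip compiler-generated and static fields
            }
            String name = field.getName();
            String expected = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            if (!methodNames.contains(expected)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(fieldsWithoutGetter(Employee.class));
    }
}
```

To scan a whole directory, the compiled classes would have to be on the classpath and loaded one by one, for example with Class.forName, before being passed to a check like this.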
You are forgetting that variables can have more qualifiers than just the visibility qualifier:
private transient volatile int someVariable;
is valid syntax. It is a private variable which is not serialized and which is shared between threads.
It is also possible to have no visibility-qualifier, which results in a package-private variable (can be accessed by classes in the same package but not from classes in other packages).
int otherVariable;
What you can rely on is that the variable name itself is always followed by zero or more whitespace characters and then a = or a ;. Unless it is an array, but exposing arrays with simple getters and setters is usually not a good idea.
Method names are always followed by zero or more whitespace characters and a (.
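The "name followed by optional whitespace and then = or ;" rule can be sketched as a regular expression. This is only an illustration under the assumption that the input is a single-line field declaration; the class and method names are made up, and comments, multi-line declarations and several variables declared on one line are not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldNameSketch {

    // An identifier, optional whitespace, then '=' or ';'.
    private static final Pattern NAME_BEFORE_TERMINATOR =
            Pattern.compile("([A-Za-z_$][A-Za-z0-9_$]*)\\s*[=;]");

    // Returns the first identifier that is directly followed by '=' or ';', if any.
    static Optional<String> fieldName(String declarationLine) {
        Matcher matcher = NAME_BEFORE_TERMINATOR.matcher(declarationLine);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fieldName("private transient volatile int someVariable;"));
        System.out.println(fieldName("int otherVariable = 5;"));
        // Brackets after the name break the rule, as noted above.
        System.out.println(fieldName("int values[];"));
    }
}
```

The number and order of qualifiers does not matter here, because the pattern looks at what follows the name rather than at its position in the line.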
Most IDEs (Eclipse, NetBeans and IntelliJ IDEA) have plugins for quality tools (Checkstyle, PMD and FindBugs).
You can also use external tools such as Sonar and FishEye.
Check this link to get started with PMD.
We are using Sonar to review our codebase. There are a few violations for Unused private method, Unused private field and Unused local variable.
As per my understanding private methods and private fields can be accessed outside of the class only through reflection and Java Native Interface. We are not using JNI in our code base, but using reflection in some places.
So what we are planning is to do a complete workspace search for these methods and fields, and if they are not used anywhere, even through reflection, we will comment them out. The chances of private methods and fields being accessed through reflection are very low; this is just to be on the safe side.
Unused local variables can’t be accessed outside of the method. So we can comment out these.
Do you have any other suggestions about this?
I love reflection myself, but to put it in a few words: it can be a nightmare. Keep Java reflection to a very controllable (that is, stateless, no global/external variable usage) and minimal scope.
What to look for?
To find private fields and methods turned public, look for Field#setAccessible() and Method#setAccessible(), such as the examples below:
Field privateNameField = Person.class.getDeclaredField("name");
privateNameField.setAccessible(true);
Method privatePersonMethod = Person.class.getDeclaredMethod("personMeth");
privatePersonMethod.setAccessible(true);
So, setAccessible() will get you some smoke, but getDeclaredField() and getDeclaredMethod() are really where the fields are accessed (what really makes the fire).
Pay special attention to the values passed to them, especially if they are variables (they probably will be), as they are what determines which field is accessed.
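To see why variable arguments matter, here is a small sketch in which the field name is only assembled at runtime, so the literal "name" never appears next to getDeclaredField() in the source. Person and readPrivateField are hypothetical names used for illustration.

```java
import java.lang.reflect.Field;

public class ReflectiveAccessSketch {

    // Hypothetical class whose private field looks unused to a static analysis tool.
    static class Person {
        private String name = "Alice";
    }

    // Reads a private field by name; reflection failures are wrapped in an unchecked exception.
    static Object readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        // The field name is built from parts, so searching for getDeclaredField("name") finds nothing.
        String prefix = args.length > 0 ? args[0] : "na";
        System.out.println(readPrivateField(new Person(), prefix + "me"));
    }
}
```

A search for getDeclaredField and getDeclaredMethod call sites, followed by tracing where their arguments come from, would catch a case like this; a search for the quoted field name alone would not.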
Do a plain text search
Also, doing a plain text search for the field/method name on the whole project folder is very useful. I'd say, if you are not sure, don't delete before doing a full text search.
If many other projects depend on the one you are trying to change, and you weren't (or don't know) the person who planted those (bombs), I'd let it go and only change it if really, really needed. The best course of action would be to deal with them one by one, whenever you need to change the code around them.
Ah, and, if you have them, running tests with code coverage can also help you big time in spotting unused code.
Calling an otherwise unused method via reflection is just weird. And an unused field could only be written to via reflection and read back via reflection. Weird too.
Reflection is more in use as a generic bean copying tool.
So a radical clean-up should be absolutely unproblematic. The time would be better spent looking into the usages of java.lang.reflect and checking whether the reflection code is legitimate. That is more intelligent work than looking for usages of private fields in strings.
And yes, remove it from the source code, which speeds up reading by seconds.
(Of course I understood that this is a question of the type: did I overlook something?)
Is there any tool that can remove debug info from Java .class files, just like /usr/bin/strip can from C/C++ object files on Linux?
EDIT: I liked both Thilo's and Peter Mmm's answers: Peter's was short and to the point exposing my ignorance of what ships with JDK; Thilo's ProGuard suggestion is something I'll definitely be checking out anyway for all those extra features it appears to provide. Thank you Thilo and Peter!
ProGuard (which the Android SDK for example ships with to reduce code size), can do all kinds of manipulation to shrink JAR files:
Evaluate constant expressions.
Remove unnecessary field accesses and method calls.
Remove unnecessary branches.
Remove unnecessary comparisons and instanceof tests.
Remove unused code blocks.
Merge identical code blocks.
Reduce variable allocation.
Remove write-only fields and unused method parameters.
Inline constant fields, method parameters, and return values.
Inline methods that are short or only called once.
Simplify tail recursion calls.
Merge classes and interfaces.
Make methods private, static, and final when possible.
Make classes static and final when possible.
Replace interfaces that have single implementations.
Perform over 200 peephole optimizations, like replacing ...*2 by ...<<1.
Optionally remove logging code.
They do not mention removing debug info in that list, but I guess they can also do that.
Update: Yes, indeed:
By default, compiled bytecode still contains a lot of debugging information: source file names, line numbers, field names, method names, argument names, variable names, etc. This information makes it straightforward to decompile the bytecode and reverse-engineer entire programs. Sometimes, this is not desirable. Obfuscators such as ProGuard can remove the debugging information and replace all names by meaningless character sequences, making it much harder to reverse-engineer the code. It further compacts the code as a bonus. The program remains functionally equivalent, except for the class names, method names, and line numbers given in exception stack traces.