How to decompile obfuscated java programs avoiding class/package name collisions


I want to decompile a Java program and recompile the derived (obfuscated) source. I unpacked the .jar archive and got a directory structure like this:
com/
com/foo/A/
com/foo/A/A.class
com/foo/A/B.class
com/foo/B/A.class
...
com/foo/A.class
com/foo/B.class
org/foo/Bar.class
...
The problem is that there are name collisions between packages and classes, which makes it impossible to recompile the decompiled class files.
A decompiled class will look like this:
package org.foo;
import com.foo.A; // <-- name collision error
class Bar {
...
}
Is there any way to resolve those naming issues, without renaming the class files?
EDIT:
This is not a decompiler problem, but rather the question of how it is possible to have a working .jar file with classes that violate naming conventions.
EDIT2:
Okay, I guess such naming is possible at the bytecode level, so a smarter decompiler (one that automatically renames the classes and fixes their references) could solve this problem.

Do you really need to unpack the entire jar and recompile everything? Instead of recompiling the entire decompiled source by itself, use the original jar as the classpath, and extract and recompile only those classes that you need to modify. Then, when you need to package up your recompiled code, just copy the original jar and use jar -uf to replace the modified class files in place:
jar -uf ./lib/copy_of_original_jar_file.jar -C ./bin com/foo/A.class com/foo/B.class [...]
...and ./lib/copy_of_original_jar_file.jar becomes your new library.
One thing is for sure, and that is that the original jar must work properly with a Java classloader in order for the program to run. It should work just as well for compiling your one-off .class files.
You should experience far fewer naming collision issues by using the original jar, because you keep the same classpath scanning order that the running application would use. Not only that, but Java decompilers aren't perfect. By excluding most of the decompiled code from recompilation, you avoid most of the issues that decompilers have with things like exception handler overlaps, special characters in obfuscated symbols, variable scoping issues, etc.
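If you would rather do the "replace the class in a copy of the jar" step from Java instead of calling jar -uf, here is a minimal sketch using the JDK's built-in zip file system. The class name JarEntryReplacer and the demo method are made up for illustration, and the "class files" in the demo are placeholder bytes rather than real bytecode:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarEntryReplacer {

    /**
     * Replaces (or adds) one entry in an existing jar and leaves all other
     * entries alone. entryName uses forward slashes, e.g. "com/foo/A.class".
     */
    public static void replaceEntry(Path jar, String entryName, Path newFile) {
        // The "jar:" scheme mounts the archive as a writable file system.
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zip = FileSystems.newFileSystem(uri, Map.of())) {
            Path target = zip.getPath("/" + entryName);
            Files.createDirectories(target.getParent());
            Files.copy(newFile, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Round trip on a throwaway jar; returns the two entries' contents afterwards. */
    static String demo() {
        try {
            Path jar = Files.createTempFile("copy_of_original_jar_file", ".jar");
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
                out.putNextEntry(new JarEntry("com/foo/A.class"));
                out.write("old".getBytes(StandardCharsets.UTF_8)); // placeholder bytes
                out.closeEntry();
                out.putNextEntry(new JarEntry("com/foo/B.class"));
                out.write("untouched".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            Path recompiled = Files.createTempFile("A", ".class");
            Files.write(recompiled, "new".getBytes(StandardCharsets.UTF_8));

            replaceEntry(jar, "com/foo/A.class", recompiled);

            try (JarFile jf = new JarFile(jar.toFile())) {
                String a = new String(
                        jf.getInputStream(jf.getEntry("com/foo/A.class")).readAllBytes(),
                        StandardCharsets.UTF_8);
                String b = new String(
                        jf.getInputStream(jf.getEntry("com/foo/B.class")).readAllBytes(),
                        StandardCharsets.UTF_8);
                return a + "," + b;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "new,untouched"
    }
}
```

Only the entry you pass in is rewritten, so every class you did not recompile stays byte-for-byte what the obfuscator produced.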

Java's import mechanism provides a shorthand for naming things, but you obviously cannot use it when there are collisions. You can always use the fully qualified name in your code, e.g.
package org.foo;
class Bar {
private com.foo.Bar aDifferentBar;
...
}
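To make the fully-qualified-name idea concrete, here is a small runnable sketch that uses two JDK classes sharing the simple name Date (the class name QualifiedNames is made up). Note the assumption: this covers two classes with the same simple name; it may not be enough when a class and a package share the same full name, as in the question.

```java
public class QualifiedNames {

    // No imports at all: each Date is written out in full, so the compiler
    // never has to decide what the simple name "Date" refers to.
    static final java.util.Date UTIL_DATE = new java.util.Date(0L);
    static final java.sql.Date SQL_DATE = new java.sql.Date(0L);

    static String describe() {
        return UTIL_DATE.getClass().getName() + " / " + SQL_DATE.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "java.util.Date / java.sql.Date"
    }
}
```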
EDIT:
I suppose there could be class files that comply with the JVM spec but which cannot be produced from a Java program that complies with the JLS. If so, then you'll definitely need a smarter decompiler.

You cannot import packages in Java, so why should this be a name collision? Which error message do you get from the compiler?
If there were a name collision in the obfuscated code, the code would not run. So the decompiled code should be collision-free.

Related

Files with the .SCL.lombok extension

When the lombok jar file is opened in Intellij, all files other than the annotations end with .SCL.lombok (e.g. HandleAccessors.SCL.lombok). I was just wondering what the reason for this was and how it's handled.

The reason for it
Lombok has a public API - the stuff you're supposed to interact with. That'd be, for example, the @lombok.Getter annotation. Those are just class files in that jar; the aim is simple: add that jar to your classpath and your IDE autocomplete dialogs and the like will automatically start suggesting these, as designed.
But lombok also has lots of classes that just 'make it tick'; these aren't meant for public consumption. Things like lombok.eclipse.HandleGetter, which is the implementation for handling the @Getter annotation inside the eclipse agent. There is no point or purpose to referring to this class anywhere, in any project - it's an internal lombok thing. If we just stuck that class file into the jar, and you typed Handle and hit your IDE's autocomplete shortcut key, you'd still get the suggestion.
Similarly, we ship a few dependencies straight into lombok.jar - it's a 'shaded jar' (a jar with all deps included), though we don't have many, keeping lombok.jar a nice small size. Still, ASM (a bytecode manipulation library) is in it, and that is fairly popular.
The standard shading solution offered by most shading tools is to prefix something to the name. ASM's org.objectweb.asm.AnnotationVisitor class would become org.projectlombok.shading.org.objectweb.asm.AnnotationVisitor. Point is, your IDE doesn't know that, and if you ALSO use asm in your project (where you also use lombok), and you want AnnotationVisitor, so you type AnnV and hit cmd+space or whatnot, your IDE suggests both. That's ugly and we'd like to avoid this.
Hence, we built our own shader, and it works by not having class files in the first place. This way, IDEs and other automated tools don't even know that our ASM classes or our implementation details exist. The only files that such tools (such as your IDE) see are the types you're meant to see: lombok.Builder, lombok.extern.slf4j.Slf4j, lombok.experimental.UtilityClass, etcetera.
How does it work
Java's classloader architecture is abstracted: You can make your own. The primitive offered by a class loader is simply this: "Convert this byte array containing bytecode (i.e. the contents of a class file) into a Class<?> definition", and the primitives that you're supposed to implement when you write your own classloader are twofold:
Here is a resource key, such as "/com/foo/load.png". Please provide me an InputStream with this data.
Here is a fully qualified class name, such as "com.foo.MyApp". Please provide me with a Class<?> instance representing it.
Out of the box, java ships with a default classloader. This default classloader answers these questions by checking your CLASSPATH - which can be provided in various ways (via the jar manifest's Class-Path entry, or via the -cp argument to the JVM executable, or the CLASSPATH environment variable), and scanning each entry on the classpath for the resource requested, capable of reading the file system as well as opening jar files.
But that's just a classloader. One implementation of the general principle that's baked into java. You can write your own. You can write a classloader that generates resources on the fly, or that loads them from a network.
Or, as lombok does, that loads them by opening its own jar and looking for .SCL.lombok files.
Thus, lombok works like this: the 'entrypoint' (normally the class containing public static void main; in lombok's case it's the annotation processor entrypoint for javac mode and agentmain for eclipse) cannot itself be an .SCL.lombok file, because our classloader isn't available yet; we need to bootstrap it first. So we 'hide' the entrypoint from you using some fancy trickery: agentmain does not need to be in a public class. Annotation processors do have to be in a public class, but ours is a public class nested inside a package-private class, so just about every IDE treats it as 'invisible' and won't show it, while javac's annotation runner accepts it.
From there, we register a classloader that is capable of loading classes by way of reading in an .SCL.lombok file, and this lets us hide everything else we want to hide.
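To illustrate the general mechanism (this is not lombok's actual implementation), here is a minimal sketch of a classloader that reads bytecode from files with a non-.class suffix. The names SuffixClassLoaderDemo, SuffixClassLoader, Hidden and the .SCL.demo suffix are all made up for this example, and it assumes the demo class's own .class file is reachable as a resource, as it is when run from a normal classpath:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SuffixClassLoaderDemo {

    // Hypothetical suffix for this sketch; lombok's real files end in .SCL.lombok.
    static final String SUFFIX = ".SCL.demo";

    /** A tiny class whose bytecode gets stored under the non-.class suffix. */
    public static class Hidden {
        @Override
        public String toString() {
            return "hello from a renamed class file";
        }
    }

    /** Answers "give me class X" by reading dir/X + SUFFIX instead of X.class. */
    static class SuffixClassLoader extends ClassLoader {
        private final Path dir;

        SuffixClassLoader(Path dir) {
            super(null); // bootstrap parent only, so Hidden is not found by delegation
            this.dir = dir;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            Path file = dir.resolve(name.replace('.', '/') + SUFFIX);
            try {
                byte[] bytecode = Files.readAllBytes(file);
                // The primitive a class loader offers: bytes in, Class out.
                return defineClass(name, bytecode, 0, bytecode.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Copies Hidden's bytecode to a renamed file, then loads it back from there. */
    static String demo() {
        String name = Hidden.class.getName();
        String resource = name.replace('.', '/');
        try {
            Path dir = Files.createTempDirectory("scl-demo");
            Path renamed = dir.resolve(resource + SUFFIX);
            Files.createDirectories(renamed.getParent());
            try (InputStream in =
                    Hidden.class.getClassLoader().getResourceAsStream(resource + ".class")) {
                if (in == null) {
                    throw new IllegalStateException("class file not found as a resource");
                }
                Files.copy(in, renamed);
            }
            SuffixClassLoader loader = new SuffixClassLoader(dir);
            Class<?> loaded = loader.loadClass(name);
            Object instance = loaded.getDeclaredConstructor().newInstance();
            boolean viaOurLoader = loaded.getClassLoader() == loader;
            return viaOurLoader + ": " + instance;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Tools that scan a directory or jar for .class files would not see the renamed file at all, which is the effect described above.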
I want to develop lombok and this is getting in the way!
No need; just clone our repo, run ant eclipse or ant intellij, and off you go. There is no way to extend lombok without first forking it; we'd like lombok to be able to be extensible without it, but that would be far more complicated than simply not doing the .SCL.lombok thing. Eclipse runs on top of equinox, a runtime modularization system, and making that work properly requires all sorts of stuff that would make 'just toss some extra handlers on the classpath' not a feasible route to extending lombok in the first place.

Is Java's import keyword for source files or binary files?

I know I can include a class or collection of classes in my Java project using the import statement.
For example, import java.util.* imports (i.e. makes available for use in my Java program) all the classes in the java.util package.
My question is, do the classes in an imported package need to be compiled? Or can packages also include uncompiled Java files? If it can be either, when can we use class files and when can we use Java files?

Import just means "make the imported classes available by their simple names" - you can remove imports entirely if you use fully-qualified names everywhere. It's definitely not like #include in C for example.
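A quick way to convince yourself of that is a file with no import statements at all (the class name NoImports is made up for this sketch); it compiles and runs exactly as the imported version would:

```java
// Deliberately no import statements in this file.
public class NoImports {

    static java.util.List<String> greetings() {
        java.util.List<String> out = new java.util.ArrayList<>();
        out.add("Hello");
        return java.util.Collections.unmodifiableList(out);
    }

    public static void main(String[] args) {
        System.out.println(greetings()); // prints "[Hello]"
    }
}
```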
When you compile, if you try to refer to uncompiled code it will be compiled at that point, assuming the compiler can guess where to find the source code. The result never refers to uncompiled code, because the compiler needs to know what each type exposes.
As a complete example, construct the following file structure:
// src/foo/A.java
package foo;
import bar.*;
public class A {
public static void main(String[] args) {
B.sayHello();
}
}
// src/bar/B.java
package bar;
public class B {
public static void sayHello() {
System.out.println("Hello");
}
}
Then in the src directory, run:
javac foo/A.java
That will automatically compile bar/B.java - but wouldn't compile any other code that isn't referenced (potentially transitively).
I would strongly recommend against using this "compile on demand" behaviour anyway though - if you compile class A that depends on class B, it will compile B the first time, but after that if you change B and recompile A, the compiler won't recompile B. I would organize your code into appropriate projects, and always recompile a complete project at a time, adding a project's output directory to the classpath for a project that depends on it, rather than allowing compiling one project to recompile bits of another on demand.
(Note that this isn't talking about the incremental compilation that many IDEs support... that's a rather different matter, and is fine assuming it's been implemented properly.)

Unlike #include in C, import in Java does not "copy" anything. Packages in Java are simply a way of avoiding ambiguity. javax.swing.Timer and java.util.Timer are different Timers. When you say import javax.swing.Timer, you are telling the compiler that you mean javax.swing.Timer, not any other Timer.
Everything that you can import comes from the JDK or some other library you're using, or was created by you. The classes in the first two categories are already compiled (.class). The classes you created are compiled as well, when you run javac. You can't refer to any uncompiled classes: since they are uncompiled, the compiler does not know they exist.
The reason your IDE knows your packages and classes before you compile your code is that IDEs are smart: they compile your code before you even notice.

As the Java documentation says:
You might have to set your CLASSPATH so that the compiler and the JVM can find the .class files for your types.
link: https://docs.oracle.com/javase/tutorial/java/package/summary-package.html

Do I always have to type package name in Java?

Today I started learning Java.
I saw that a package declaration automatically gets included in the .java file.
I was wondering whether it always needs to be included?

Consider specifying a common package for all the types within the same project.
In Java it is common to start a project with a specific package setting. A package creates a namespace to disambiguate the types that it includes, so that they play nicely with other projects that may or may not be on the same classpath. Normally, the package is bound to a URL of the project.
Think of Java packages like C++ namespaces.
A huge project/product written in Java can depend on lots and lots of projects, each described in a different package.
Organizations like Apache have lots of projects, organized under a common package pattern: org.apache.<<name_of_the_project>>.
Consider starting your project with a package named com.user3552670, or something based on your personal site, so that people who use your project can relate it to its creator.

Yes and no.
It's used to specify the package of the class, read more here.
You could create a class without a package, but your code will look bad.
Packages exist to avoid conflicts, for example between your code and the standard Java packages.
If packages didn't exist, you couldn't create a class named ArrayList, because one already exists in Java.
Some IDEs enforce that if your .java file is in the com/a/b/c folder, its package must be com.a.b.c (if I remember correctly, IntelliJ IDEA does that).

Yes and no.
It must be there, but the IDE takes care of it (I don't use Netbeans, but I'd bet that it can do it, too). When moving files between packages, it has to be updated, but again, the IDE does it all.

Use existing Java classes from Jython REPL?

I have a massive, unfamiliar Java codebase that I need to use in one of my projects, and unfortunately it's one of those situations where almost nothing is documented, and the very few things that are documented are of the "setFoo(Foo foo) - sets the foo." variety. So the documentation generated with javadoc is not as helpful as it could be.
I'm more of a Lisp and Python guy myself, so my first thought was that I could learn a lot by interactively playing with some of the relevant classes. Enter the Jython REPL. The problem is that I can't figure out how to set the...the whatever (classpath?) to use them. Assume that I have two directories containing the subdirectories containing the .java files: ~/project/foo/src/ and ~/project/bar/src/.
Thanks in advance.

It sounds like you first need to compile those Java classes (you've referenced src directories in your question).
Once you have classes compiled, you can reference them via the classpath.
e.g.
>>> import sys
>>> sys.path.append(r'C:\temp\sample.jar')
>>> from org.my.package import MyClass
More info in this document

Do Eclipse's Refactoring Tools Violate The Java Language Specification?

In Eclipse 3.5, say I have a package structure like this:
tom.package1
tom.package1.packageA
tom.package1.packageB
If I right-click on the tom.package1 package and go to Refactor->Rename, an option "Rename subpackages" appears as a checkbox. If I select it, and then rename tom.package1 to tom.red, my package structure ends up like this:
tom.red
tom.red.packageA
tom.red.packageB
Yet I hear that Java's packages are not hierarchical. The Java Tutorials back that up (see the section on Apparent Hierarchies of Packages). It certainly seems like Eclipse is treating packages as hierarchical in this case.
I was curious why access specifiers couldn't allow/restrict access to "sub-packages" in a previous question because I KNEW I had seen "sub-packages" referenced somewhere before.
So are Eclipse's refactoring tools intentionally misleading impressionable young minds by furthering the "sub-package" myth? Or am I misinterpreting something here?

Eclipse can't possibly violate the JLS in this case, because it has nothing to do with compiling or running Java source or bytecode.
The refactoring tools behave as they do because that behaviour is useful to developers. The behaviour is useful to developers because, for many intents and purposes, we do treat packages as hierarchical (a.b.c has some kind of relationship with a.b, even if that relationship is not consistent from project to project). That doesn't mean Java treats them as hierarchical intrinsically.
One example where people treat packages as very hierarchical is in configuring a logging framework such as log4j. Again, it's not intrinsic to log4j, but that's how people use it in practice.

Java packages are not hierarchical in the sense that importing everything from package A does not import everything from package A.B.
However, Java packages do correspond directly to the directory structure on the file system, and directories are hierarchical. So Eclipse is doing the correct thing - it is renaming the directory, which automatically changes the name of the parent directory of the renamed directory's children (to state the very obvious).
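The first point is easy to check in code. In this small sketch (the class name FlatPackages is made up), the wildcard import of java.util does not cover java.util.concurrent, which needs its own import; deleting the second import makes ConcurrentHashMap fail to resolve:

```java
import java.util.*;                            // on-demand import of java.util only
import java.util.concurrent.ConcurrentHashMap; // the "sub-package" needs its own import

public class FlatPackages {

    static Map<String, Integer> counts() {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        counts.put("a", 1);
        return counts;
    }

    static String packageOf(Class<?> type) {
        return type.getPackageName();
    }

    public static void main(String[] args) {
        System.out.println(packageOf(Map.class));           // prints "java.util"
        System.out.println(packageOf(counts().getClass())); // prints "java.util.concurrent"
    }
}
```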

Even Java itself has the concept of a subpackage:
http://java.sun.com/j2se/1.5.0/docs/tooldocs/windows/java.html
java -ea[:<package name>"..." | :<class name> ]
Enable assertions. Assertions are disabled by default.
With no arguments, -enableassertions or -ea enables assertions. With one argument ending in "...", the switch enables assertions in the specified package and any subpackages. If the argument is simply "...", the switch enables assertions in the unnamed package in the current working directory. With one argument not ending in "...", the switch enables assertions in the specified class.
If a single command line contains multiple instances of these switches, they are processed in order before loading any classes. So, for example, to run a program with assertions enabled only in package com.wombat.fruitbat (and any subpackages), the following command could be used:
java -ea:com.wombat.fruitbat... <Main Class>

Java's packages are not hierarchical, but Eclipse stores packages on your system's file structure.
tom.package1.packageA is represented on a Windows file system as tom/package1/packageA.
When you ask Eclipse to refactor a package name, you're asking Eclipse to change the name of the file system directory structure.
You can have packages in Eclipse like:
tom.package1.packageA
tom.package2.packageB
tom.package3.packageC
You'll just have different 2nd level file system directories.
