Altering value in a class file


With ClassEditor I'm able to change values of constants but is it possible to alter the code where a value is being set?
Here's an example code that appears on the file:
this.varr[this.sval] = 11;
How can I edit the file so that instead of setting 11 as the value, I can set 33?

If you want to get into reverse engineering, look into BCEL (https://commons.apache.org/proper/commons-bcel/) or ASM (http://asm.ow2.org/).
Here is an introduction to bytecode; there is a lot more online: http://www.javaworld.com/article/2077233/core-java/bytecode-basics.html
Basically, the compiled code stores a value in memory: you need to get the bytecode, find the instruction that supplies that value, and make it store a different value instead.
JBE can be used for simple bytecode editing: http://set.ee/jbe/
Sometimes code is protected. For example, Android Java code is often obfuscated with ProGuard, and some people use ZKM (http://www.zelix.com/klassmaster/features.html).
There are tools known as deobfuscators that try to reverse these types of protection. They are generally pretty good but commonly fail on while loops; you will have to find one yourself if you need it.
Basically, obfuscators shuffle values from their normal position on the stack to somewhere else, multiple times, using goto instructions in the bytecode, and they almost always disguise constant int values with bit shifts.
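For the exact statement in the question, a constant as small as 11 is normally emitted as a `bipush 11` instruction right before the `iastore` that writes the array element. Here is a minimal sketch of the patch, assuming that instruction pair and working on raw bytes. The class name `BipushPatcher` and its `patch` method are made up for illustration. A naive scan like this could also hit a coincidental match outside the method body, which is why BCEL, ASM or JBE are the safer route.

```java
public class BipushPatcher {
    private static final int BIPUSH = 0x10;  // push a one-byte signed constant
    private static final int IASTORE = 0x4F; // store an int into an int[]

    /**
     * Rewrites every "bipush oldValue; iastore" pair in place and
     * returns how many pairs were patched.
     * Assumption: both values fit in a signed byte (-128..127).
     */
    public static int patch(byte[] code, int oldValue, int newValue) {
        if (oldValue < -128 || oldValue > 127 || newValue < -128 || newValue > 127) {
            throw new IllegalArgumentException("bipush only holds -128..127");
        }
        int patched = 0;
        for (int i = 0; i + 2 < code.length; i++) {
            boolean isBipush = (code[i] & 0xFF) == BIPUSH;
            boolean isOldValue = code[i + 1] == (byte) oldValue;
            boolean isStore = (code[i + 2] & 0xFF) == IASTORE;
            if (isBipush && isOldValue && isStore) {
                code[i + 1] = (byte) newValue; // overwrite only the operand byte
                patched++;
            }
        }
        return patched;
    }
}
```

Because 33 also fits in one signed byte, the instruction keeps its length and no offsets in the method need adjusting.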

Related

Is it possible to compare two .java files methods and fields in all cases?

I am currently taking a project management class and the professor gave this assignment to compare two .java files methods and fields in all cases programmatically. I don't think it's actually possible to do but maybe I am wrong!
The assignment spec is as follows (it's extremely ambiguous, I know):
In this assignment, you are required to write a comparison tool for two
versions of a Java source file.
Your program takes as input two .java files representing those two versions
and reports the following atomic changes:
1. AM: Add a new method
2. DM: Delete a method
3. CM: Change the body of a method (note: you need to handle the case where a method is
relocated within the body of its class)
4. AF: Add a field
5. DF: Delete a field
6. CFI: Change the definition of an instance field initializer (including (i) adding an initialization to a
field, (ii) deleting an initialization of a field, (iii) making changes to the initialized value of a field,
and (iv) making changes to a field modifier, e.g., private to public)
So that's what I am working with and my approach was to use reflection as it allows you to do everything but detect differences in the method body.
I had considered the idea that you could create a parser but that seemed ridiculous, especially for a 3 credit undergrad class in project management. Tools like BeyondCompare don't list what methods or fields changed, just lines that are different so don't meet the requirements.
I turned in this assignment and pretty much the entire class failed it with the reason as "our code would not work for java files with external dependencies that are not given or with java files in different projects" - which is completely correct but also I'm thinking, impossible to do.
I am trying to nail down a concrete answer as to why this is not actually possible to do or learn something new about why this is possible so any insight would be great.

What you got wrong here is that you started by examining the .class files (using reflection). Some of the information listed above is not even available at that stage (generics, inlined functions). What you need to do is parse the .java files as text; that is the only way to actually solve the problem. A very high-level solution could be a program that:
reads the files
constructs, for each .java file, an object containing all the information that needs to be compared (names of the functions, names of the instance variables, etc.)
compares the constructed objects (example: addedFunctions = the functions in B that are not in A) to provide the requested results
Note: if this is an assignment, you should not be using solutions provided by anybody else, you need to do it on your own. Likely you will not get a single point if you use a library written by somebody else.
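The comparison step could look roughly like the sketch below. It assumes parsing has already produced, for each version, a map from a member's signature to its source text; the `MemberDiff` class and its method names are hypothetical. Note that `removeAll` modifies the set it is called on and returns a boolean, so the sketch works on a copy.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MemberDiff {
    /** Signatures present in the newer version only (AM / AF). */
    public static Set<String> added(Map<String, String> older, Map<String, String> newer) {
        Set<String> result = new TreeSet<>(newer.keySet()); // copy, then subtract
        result.removeAll(older.keySet());
        return result;
    }

    /** Signatures present in the older version only (DM / DF). */
    public static Set<String> deleted(Map<String, String> older, Map<String, String> newer) {
        return added(newer, older);
    }

    /** Signatures present in both versions whose text differs (CM / CFI). */
    public static Set<String> changed(Map<String, String> older, Map<String, String> newer) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : older.entrySet()) {
            String newText = newer.get(entry.getKey());
            if (newText != null && !newText.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Keying on the signature rather than on the position in the file is what makes a relocated method compare as unchanged.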

What happen if I manually changed the bytecode before running it?

I am a little curious about what happens if I manually change something in the bytecode before execution. For instance, suppose I assign an int variable to a byte variable without a cast, or remove a semicolon somewhere in the program, or do anything else that would cause a compile-time error. As far as I know, all compile-time errors are checked by the compiler before it produces the .class file. So what happens if I change the bytecode manually after the program has compiled successfully? Is there a mechanism to handle this? If not, how does the program behave when executed?
EDIT:
Hot Licks, Darksonn and manouti have already given correct and satisfying answers. I'll summarize for readers looking for an answer to this type of question:
Every Java virtual machine has a class-file verifier, which ensures that loaded class files have a proper internal structure. If the class-file verifier discovers a problem with a class file, it throws an exception. Because a class file is just a sequence of binary data, a virtual machine can't know whether a particular class file was generated by a well-meaning Java compiler or by shady crackers bent on compromising the integrity of the virtual machine. As a consequence, all JVM implementations have a class-file verifier that can be invoked on untrusted classes, to make sure the classes are safe to use.
Refer to this for more details.

You certainly can use a hex editor (eg, the free "HDD Hex Editor Neo") or some other tool to modify the bytes of a Java .class file. But obviously, you must do so in a way that maintains the file's "integrity" (tables all in correct format, etc). Furthermore (and much trickier), any modification you make must pass muster by the JVM's "verifier", which essentially rechecks everything that javac verified while compiling the program.
The verification process occurs during class loading and is quite complex. Basically, a data flow analysis is done on each procedure to assure that only the correct data types can "reach" a point where the data type is assumed. Eg, you can't change a load operation to load a reference to a HashMap onto the "stack" when the eventual user of the loaded reference will be assuming it's a String. (But enumerating all the checks the verifier does would be a major task in itself. I can't remember half of them, even though I wrote the verifier for the IBM iSeries JVM.)
(If you're asking if one can "jailbreak" a Java .class file to introduce code that does unauthorized things, the answer is no.)
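The first hurdle, keeping the file structurally intact, is easy to see for yourself. The sketch below hands arbitrary bytes to a class loader and reports what happens; `CorruptClassDemo`, `BytesLoader` and `tryLoad` are names invented for this illustration. It only exercises the structural checks, not the data-flow verification described above.

```java
public class CorruptClassDemo {
    /** A throwaway loader that exposes the protected defineClass method. */
    static class BytesLoader extends ClassLoader {
        Class<?> define(byte[] bytes) {
            // A null name lets the JVM take the class name from the bytes.
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    /** Tries to define a class from raw bytes and names the outcome. */
    public static String tryLoad(byte[] bytes) {
        try {
            new BytesLoader().define(bytes);
            return "loaded";
        } catch (ClassFormatError e) {
            return "ClassFormatError"; // the bytes are not a well-formed class file
        } catch (LinkageError e) {
            return e.getClass().getSimpleName(); // e.g. a VerifyError subclass
        }
    }
}
```

Feeding it a real .class file with a few bytes flipped in a hex editor is a quick way to see which edits survive loading.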

You will most likely get a java.lang.VerifyError:
Thrown when the "verifier" detects that a class file, though well formed, contains some sort of internal inconsistency or security problem.

You can certainly do this, and there are even tools to make it easier, like http://set.ee/jbe/. The Java runtime will run your modified bytecode just as it would run the bytecode emitted by the compiler. What you're describing is a Java-specific case of a binary patch.
The semicolon example wouldn't be an issue, since semicolons are only for the convenience of the compiler and don't appear in the bytecode.

Either the bytecode executes normally and performs the instructions given, or the JVM rejects it.
I played around with programming directly in Java bytecode some time ago using Jasmin, and I noticed some things.
If the bytecode you edited still makes sense, it will of course run as expected. However, there are some bytecode patterns that are rejected with a VerifyError.
For the specific example of out-of-bounds access, you can compile code that goes out of bounds just fine. It will get you an ArrayIndexOutOfBoundsException at runtime.
int[] arr = new int[20];
for (int i = 0; i < 100; i++) {
arr[i] = i;
}
However, you can construct bytecode that is more fundamentally flawed than that. To give an example, I'll explain some things first.
Java bytecode works with a stack, and instructions work with the top elements of the stack.
The stack naturally has different sizes at different places in the program, but sometimes you might use a goto in the bytecode to cause the stack to look different depending on how you reached that point.
The stack might contain object, int; then you store the object in an object array and the int in an int array. Then from somewhere else in that bytecode you use a goto to reach the same instructions, but now your stack contains int, object, which would result in an int being stored in an object array and vice versa.
This is just one example of something that makes your bytecode fundamentally flawed. The JVM detects these kinds of flaws when the class is loaded at runtime, and throws a VerifyError if something doesn't check out.

Best choice? Edit bytecode (asm) or edit java file before compiling

Goal
Detecting where comparisons between and copies of variables are made
Inject code near the line where the operation has happened
The purpose of the code: every time the class is run, increase a counter
General purpose: count the number of comparisons and copies made after execution with certain parameters
2 options
Note: I always have a .java file to begin with
1) Edit java file
Find comparisons with regex and inject pieces of code near the line
And then compile the class (My application uses JavaCompiler)
2) Use ASM bytecode engineering
Also detect the events I want to track and inject pieces into the bytecode
And then use the (already compiled but modified) class
My Question
What is the best/cleanest way? Is there a better way to do this?

If you go for the Java route, you don't want to use regexes -- you want a real java parser. So that may influence your decision. Mind, the Oracle JVM includes one, as part of their internal private classes that implement the java compiler, so you don't actually have to write one yourself if you don't want to. But decoding the Oracle AST is not a 5 minute task either. And, of course, using that is not portable if that's important.
If you go the ASM route, the bytecode will initially be easier to analyze, since the semantics are a lot simpler. Whether the simplicity of analyses outweighs the unfamiliarity is unknown in terms of net time to your solution. In the end, in terms of generated code, neither is "better".
There is an apparent simplicity in just looking at generated Java source code and "knowing" that What You See Is What You Get, versus doing primitive dumps of class files for debugging and so on, but all that apparent simplicity is there because you are already comfortable with the Java language. Once you spend some time dredging through bytecode, that too will become comfortable. It's just a question of whether it's worth the time to you to get there in the first place.
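To give an idea of what the "real parser" route looks like, here is a small sketch that counts comparison operators using the JDK's Compiler Tree API (`com.sun.source`, in the `jdk.compiler` module), which is the supported entry point to javac's parser. It needs a JDK rather than a bare JRE, and the `ComparisonCounter` and `StringSource` names are invented here. It only counts; injecting code would be a further step.

```java
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ComparisonCounter {
    /** Presents a String to the compiler as if it were a .java file. */
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source (no compilation) and counts ==, !=, <, >, <=, >=. */
    public static int countComparisons(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, source)));
        int[] count = {0};
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitBinary(BinaryTree node, Void unused) {
                        switch (node.getKind()) {
                            case EQUAL_TO: case NOT_EQUAL_TO:
                            case LESS_THAN: case GREATER_THAN:
                            case LESS_THAN_EQUAL: case GREATER_THAN_EQUAL:
                                count[0]++;
                                break;
                            default:
                                break;
                        }
                        return super.visitBinary(node, unused); // keep walking operands
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }
}
```

Unlike a regex, the scanner is not fooled by a `<` inside a string literal, a comment or a generic type, because those never become `BinaryTree` nodes.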

Generally it all depends on how comfortable you are with either option and how critical the performance aspect is. The bytecode manipulation will be much faster and somewhat simpler, but you'll have to understand how bytecode works and how to use the ASM framework.
Intercepting variable access is probably one of the simplest use cases for ASM. You could find a few more complex scenarios in this AOSD'07 paper.
Here is simplified code for intercepting variable access:
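ClassReader cr = ...;
ClassWriter cw = ...;
// ClassReader.accept takes a ClassVisitor, so the MethodVisitor is created per method
cr.accept(new ClassVisitor(Opcodes.ASM9, cw) {
    @Override
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
        MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
        return new MethodVisitor(Opcodes.ASM9, mv) {
            @Override
            public void visitVarInsn(int opcode, int var) {
                if (opcode == Opcodes.ALOAD) { // loading Object var
                    ... insert method call
                }
                super.visitVarInsn(opcode, var); // keep the original instruction
            }
        };
    }
}, 0);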

If it were me, I'd probably use the ASM option.
If you need a tutorial on ASM, I stumbled upon a user-written tutorial.

Why are decompiled java programs not always directly compilable and what are the parts that are not?

So I am trying to make slight valid legal changes to a compiled java program. I am decompiling it using JD-GUI for Mac. For the most part the decompiled code is error free but there are some strange things like undeclared variables, multiple identical variable declarations and just some strange statements which are not readily compilable.
Some of the strange statements in the decompiled code are really puzzling. I have been having trouble with one switch statement in particular:
switch ($SWITCH_TABLE$PackageName$ClassName$InnerEnumName()[getPlatform().ordinal()])
Where PackageName.ClassName is the class this statement is in, and InnerEnumName is an inner enum within ClassName.
Also note that getPlatform() is a method in ClassName which returns an enum of type InnerEnumName
The weird part is when I just stripped this class of the problematic statements, compiled it, and inserted it back into the program, it started to work but had a few strange bugs.
For example when I changed the switch statement to
switch (getPlatform().ordinal())
it started hitting case 3 (the third case and the case for value 3) when it is supposed to hit case 4 (once again the fourth case as well as the case for value 4)

Decompiling is always going to be imperfect. The decompiler must take the bytecodes and reverse-engineer the original source, figuring out where loops are, what the loop controls are, etc. I would never expect it to be flawless for non-trivial programs.
In the case of the $ names, these are names generated internally in the process of "faking" inner classes (since the JVM doesn't actually support inner classes). The decompiler is apparently doing an imperfect job of figuring out what the inner classes are and appropriately naming them and the objects the compiler created to fake things out. Someone familiar with the bytecode format could probably sort things out fairly quickly, but, like the rest, it's non-trivial.
(In this particular case it appears that the compiler, for some reason, created a mapping table from inner enum values to some other values, and when you "stripped" the statement you lost that mapping.)
[I'll add that one big problem that decompilers have is that javac is such a moving target. In particular things like inner class implementations are being constantly tweaked, so what worked one week may fail the next, with the next +.001 version of the compiler.]

JD-GUI (JD?) has issues it seems. Try to find a better decompiler? Too bad jad's ancient - it used to be good.

At the risk of resurrecting an ancient question - by stripping out the array indirection on the ordinal, the meaning of the original switch is changed.
I wrote this up here :
http://www.benf.org/other/cfr/switch-on-enum.html
The salient bit is:
The first enum -> integer function that springs to mind is .ordinal(). However - there are a couple of problems with this:
The enum class we're switching on isn't fixed - we can't take a copy of the target ordinals for the case statements - someone might change the enum's definition!
Someone might even remove the field we use as a case label.
So we need a lookup function which isn't dependent on the ordinal of the enum value being fixed (i.e. resolves it at run time), and can cope with a field of the enum being removed.
Hence the array you stripped out - it's a runtime map between ordinals in the enum statement, and the location in your switch statement.
What's really interesting here is that it means Javac creates an extra inner class per switch-on-enum - Fun!
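A hand-written approximation of that lookup makes the bug in the question easier to see. This is only a sketch of the idea, not the exact code any compiler emits: the `Platform` enum, the `SWITCH_TABLE` array and both methods are invented for illustration, and the guards real compilers add for removed constants are left out.

```java
public class SwitchMapDemo {
    enum Platform { WINDOWS, MAC, LINUX, SOLARIS }

    // Maps an enum ordinal to the number of a case label in the switch.
    // Entries left at 0 mean "no case for this constant".
    static final int[] SWITCH_TABLE = new int[Platform.values().length];
    static {
        // Case labels are numbered from 1, independent of the ordinals.
        SWITCH_TABLE[Platform.LINUX.ordinal()] = 1;
        SWITCH_TABLE[Platform.WINDOWS.ordinal()] = 2;
    }

    /** What the original switch does: ordinal -> table -> case number. */
    static String viaTable(Platform p) {
        switch (SWITCH_TABLE[p.ordinal()]) {
            case 1: return "linux";
            case 2: return "windows";
            default: return "other";
        }
    }

    /** The "stripped" version: same case numbers, but fed raw ordinals. */
    static String viaRawOrdinal(Platform p) {
        switch (p.ordinal()) {
            case 1: return "linux";
            case 2: return "windows";
            default: return "other";
        }
    }
}
```

With the table removed, `LINUX` (ordinal 2) lands in the branch meant for `WINDOWS`, the same kind of shift the question describes between case 3 and case 4.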

Give instructions to the Java parser/lexer

Is there any way to give instructions directly to the parser and lexer from the Java code level? If not, how could one go about doing this at all?
The issue is that I want to have the parser evaluate a variable, back up, then assign the value of that variable as an Object name. Like this:
String s = "text";
SomeClass (s) = new SomeClass();
parser reads--> ok, s evaluates to be "text"...
parser backtracks, while holding "text" in memory and assigns "text" as the name of the new instance of SomeClass, such that one can now do this:
text.callSomeMethod();
I need to do this because I have to instantiate an arbitrary number of objects of SomeClass. Each one has to have a unique name, and it would be ideal to do something like this:
while (someArbitrarySet.hasNext()) {
String s = "token" + Math.random();
SomeClass (s) = new SomeClass();
(s).callSomeMethod();
}
I hope this makes sense...

What you're asking for is what some languages call MACROS. They're also sometimes known as preprocessor definitions, or simply "defines".
A decision was made not to have includes, macros and the like in Java, because the designers concluded that they introduce additional code-maintenance concerns and lead to code that is not in the style they wanted.
However, just because it's not built into the compiler doesn't mean you couldn't add it to your build script.
As part of your build, you copy all files to a src-comp directory, and as you do, replace your tokens as they're defined.
I don't recommend doing it, but that doesn't mean it isn't possible.

What you describe (creating new named variables at runtime) is possible in interpreted languages like JavaScript, Lua, Bash, but not with a compiled language like Java. When the loop is executed, there is no source code there to manipulate, and all named variables have to be defined before.
Apart from this, your variables don't need a "unique" name, if you are using them sequentially (one after another), you could just as well write your loop as this:
while (someArbitrarySet.hasNext()) {
SomeClass sC = new SomeClass();
sC.callSomeMethod();
}
If you really need your objects at the same time, put them in some sort of data structure. The simplest would be an array, you could use a Collection (like an ArrayList) or a Map (like CajunLuke wrote), if you want to find them again by key.
In fact, an array (in Java) is nothing else than a collection of variables (all of the same type), which you can index by an int.
(And the scripting languages which allow creating new variables on runtime implement this also with some kind of map String → (anything), where this map is either method/script-local or belonging to some surrounding object.)
You wrote in a comment to the question (better add those things to the question itself, it has an "edit" button):
Without getting into too many details, I'm writing an application that runs within a larger program. Normally, the objects would get garbage-collected after I was done with them, but the larger program maintains them, thus the need for a unique name for each. If I don't give each a unique name, the old object will get overwritten, but it is still needed in the context of the greater program.
So, you want to retain the objects to avoid garbage collection? Use an array (or List or anything else).
The thing is, if you want your larger program to be able to use these objects, you somehow have to give them to this larger program anyway. And then this program would have to retain references to these objects, thereby avoiding garbage collection. So it looks you want to solve a problem which does not exist by means which do not exist :-)

Not really an answer to the question you asked, but a possible solution to your problem: using a map.
Map<String, SomeClass> variables = new HashMap<>();
while (someArbitrarySet.hasNext()) {
String s = "token" + Math.random();
variables.put(s, new SomeClass());
variables.get(s).callSomeMethod();
}
That way, you can use the "variable name" as the keys into the map, and you can get by without messing with the lexer/parser.
I really hope there is a way to do specifically what you state in Java - it would be really cool.

No. That's not possible.
Even if you could, I can't think of a way to invoke them, because no compiled code could successfully reference them.
So the options are the one described by CajunLuke, or creating your own Java parser, probably using the ANTLR sample Java grammar, and hooking what you need in there.
Consider the map solution.

This is answered in How do you use Java 1.6 Annotation Processing to perform compile time weaving? .
In short, there is an annotation processing tool that lets you hook into compilation. It does not change Java's syntax, but it can be used to build DSLs expressed as Java annotations.
Under JDK 1.5 you had to use apt instead of javac, but under 1.6 processors are selected with the -processor flag to javac. From javac -help:
-processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
-processorpath <path> Specify where to find annotation processors
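For a feel of what a processor looks like, here is a minimal sketch that just lists elements annotated with `@Deprecated`. The `DeprecatedLister` class and its `run` helper are made up for this example. To stay self-contained it registers the processor programmatically through `javax.tools` instead of the `-processor` flag, and it passes `-proc:only` so that no class files are written.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class DeprecatedLister extends AbstractProcessor {
    final List<String> found = new ArrayList<>();

    @Override
    public Set<String> getSupportedAnnotationTypes() {
        return Set.of("java.lang.Deprecated");
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Called once per processing round; record every annotated element.
        for (Element element : roundEnv.getElementsAnnotatedWith(Deprecated.class)) {
            found.add(element.getSimpleName().toString());
        }
        return false; // do not claim the annotation; other processors may see it
    }

    /** Runs the processor over an in-memory source file and returns the names found. */
    public static List<String> run(String className, String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        DeprecatedLister processor = new DeprecatedLister();
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, List.of("-proc:only"), null, List.of(file));
        task.setProcessors(List.of(processor));
        task.call();
        return processor.found;
    }
}
```

A processor sees declarations (classes, methods, fields), not the statements inside method bodies, so it cannot turn a string into a variable name either.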
