I know that (with BCEL or ASM, for instance) it is possible to access the local variables of a method. But I need something more. What I would like is:
to get the type of such a variable (or a way to convert from the signature)
to know (distinguish) when this variable is used (either has a value assigned to it, or is passed as a parameter)
when this variable is used as parameter, to know which method call it was passed to
to break "method-chains" in their respective method calls and get their return value so I can manipulate them
The basic idea is that I would like to "instrument" methods a bit in the same way a debugger does (though limited to the first frame depth...).
Any pointer appreciated.
If more information is needed, feel free to ask.
This is only possible using a bytecode-level API. cglib does not expose such an API, so you have to choose between ASM, BCEL and Javassist. I would recommend ASM, which has the best documentation.
What you would need to do:
Parse the signature (descriptor) of the method; ASM offers utilities for that. Each type is given by its internal name. You would need to map these types to their local variable indices.
Find every instruction that uses the variable at that index.
This is, however, quite a difficult task. In order to predict what the code does, you would have to emulate the method's execution. The JVM is a stack machine, so arguments can be placed on the operand stack as the result of an arbitrary chain of instructions. Therefore, you would effectively have to interpret every bytecode instruction that you find. You will, more or less, need to write your own simplistic interpreter, which is quite a task.
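To give a feel for what such an interpreter involves, here is a minimal sketch over a made-up three-instruction set (LOAD, STORE, INVOKE). It is not real JVM bytecode and it does not use ASM; every name in it is hypothetical, and it assumes that every call returns a value. It simulates the operand stack symbolically, remembering where each stack entry came from, so it can report which local ends up in which call and where each call result goes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ToyStackTracker {
    enum Op { LOAD, STORE, INVOKE }

    /** One toy instruction: LOAD/STORE use 'slot', INVOKE uses 'name' and 'argCount'. */
    static final class Insn {
        final Op op;
        final int slot;
        final String name;
        final int argCount;

        private Insn(Op op, int slot, String name, int argCount) {
            this.op = op;
            this.slot = slot;
            this.name = name;
            this.argCount = argCount;
        }

        static Insn load(int slot) { return new Insn(Op.LOAD, slot, null, 0); }
        static Insn store(int slot) { return new Insn(Op.STORE, slot, null, 0); }
        static Insn invoke(String name, int argCount) { return new Insn(Op.INVOKE, 0, name, argCount); }
    }

    /** Walks the instructions once, tracking the origin of every operand stack entry. */
    public static List<String> trace(List<Insn> code) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> events = new ArrayList<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case LOAD: {
                    stack.push("local" + insn.slot);
                    break;
                }
                case STORE: {
                    events.add("local" + insn.slot + " <- " + stack.pop());
                    break;
                }
                case INVOKE: {
                    // Arguments were pushed left to right, so they pop off right to left.
                    String[] args = new String[insn.argCount];
                    for (int i = args.length - 1; i >= 0; i--) {
                        args[i] = stack.pop();
                    }
                    events.add(insn.name + "(" + String.join(", ", args) + ")");
                    // Assumption of this sketch: every call leaves a result on the stack.
                    stack.push("result of " + insn.name);
                    break;
                }
            }
        }
        return events;
    }

    /** Toy encoding of: local2 = foo(local0, bar(local1)); */
    public static List<Insn> sample() {
        return Arrays.asList(
                Insn.load(0), Insn.load(1), Insn.invoke("bar", 1),
                Insn.invoke("foo", 2), Insn.store(2));
    }

    public static void main(String[] args) {
        for (String event : trace(sample())) {
            System.out.println(event);
        }
    }
}
```

Real bytecode adds branches, two-slot long/double values, exception handlers and a couple of hundred opcodes, which is where the work is. ASM's tree analysis package (Analyzer plus an Interpreter implementation) is the place to look before writing all of that by hand.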
I wish to create a custom remote execution client for my app. The client may look something like this:
interface Client {
<T> T computeRemotely(Function<List<MyBigObject>, T> consumer);
}
and might be used like this:
Client client = new Client();
Integer remoteResult = client.computeRemotely(list -> {
Integer result = -1;
// do some computational work here.
return result;
});
This means I somehow need to take the lambda from the client, send it to the server, run the function (passing in a real List<MyBigObject>) and send the result back.
It's worth noting that a restriction on using my client library is that you cannot use anything outside the JDK in that lambda and expect it to work (as the classes may not be on the classpath on the server)...but I would like them to be able to use any of the JDK classes to bring in their own data to the calculation.
Now I can't just serialize the Function<List<MyBigObject>, T> lambda, because it serializes like an inner class of whatever class the lambda exists in, which will NOT be on the classpath on the server.
So I have been looking at ASM to see whether that could work. Given that I have never done byte code manipulation before, I just wanted to check that what I am saying sounds right:
I can use ASM to read the class that the lambda sits in.
Using a Method Visitor, get the method bytes, send them to the server
Use ASM to create an instance from the bytes and execute it.
Given that the lambda is like an anonymous inner class, I am guessing I will have to do some sort of method renaming in there too..
Is this roughly correct or have I got completely the wrong end of the stick?
Note that lambdas can access all effectively final values from their context. Thus, you'd need to either forbid accessing external values (which would severely limit the usefulness of your solution) or identify them and send representations of those values (which runs into the problem you mentioned; their implementation may be outside the server classpath).
So even if you send the method representation (for which you would not even need ASM; you could get the Resource directly from the classloader), it won't work for the general case.
Edit: Given your comment, it could work. You'd need to synthesize a class with
The context attributes as final fields
A constructor with arguments for all the fields (you will pass the deserialized values there on construction)
The lambda method — see this question for details
There is no point in analyzing the runtime generated classes of lambda expressions. They have a very simple structure which they reveal when being serialized. During Serialization they will get replaced by a SerializedLambda instance which contains all information you could ever gather, most notably:
The implemented interface
The target method that will be invoked
The captured values
The crucial point is the target method. For lambda expressions (unlike method references) the target method is a synthetic method that resides within the class which contains the lambda expression. That method is private, by the way, which is why an attempt to replicate the class invoking that method is doomed to fail; special JVM interaction is required to make it possible to create such a class. But the important point is that you need the target method on the remote side to execute it; the lambda-specific runtime class is irrelevant. If you have the target method, you can create the lambda instance without third-party libraries.
Of course, you can use ASM to analyze the target method’s code to transfer it (and all dependencies) to the remote side, but note that this is no different from transferring arbitrary Function implementations; the fact that there is a layer of a runtime generated class created via lambda expression does not help you in any way.
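For illustration, here is a small sketch of how the SerializedLambda form can be looked at without any third-party library. It relies on the fact that a serializable lambda carries a private writeReplace() method returning its SerializedLambda; calling it reflectively is a well-known trick rather than a supported API, so treat it as an assumption. The SerializableFunction interface and the class name are made up for the example.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

public class LambdaInspector {
    /** Hypothetical helper type: a Function whose lambdas are serializable. */
    public interface SerializableFunction<A, B> extends Function<A, B>, Serializable {}

    /** Returns the SerializedLambda that serialization would substitute for the lambda. */
    public static SerializedLambda describe(Serializable lambda) {
        try {
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            return (SerializedLambda) writeReplace.invoke(lambda);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("not a serializable lambda: " + lambda, e);
        }
    }

    /** A lambda that captures one value from its context. */
    public static SerializedLambda sample() {
        int offset = 5; // effectively final, so it becomes a captured argument
        SerializableFunction<Integer, Integer> addOffset = x -> x + offset;
        return describe(addOffset);
    }

    public static void main(String[] args) {
        SerializedLambda sl = sample();
        System.out.println("interface: " + sl.getFunctionalInterfaceClass());
        System.out.println("target:    " + sl.getImplClass() + "." + sl.getImplMethodName());
        for (int i = 0; i < sl.getCapturedArgCount(); i++) {
            System.out.println("captured:  " + sl.getCapturedArg(i));
        }
    }
}
```

The printed target is the synthetic method inside the declaring class. That method, plus whatever it depends on, is what would have to exist on the remote side.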
Original Question: Given a method I would like to determine if an object returned is created within the execution of that method. What sort of static analysis can or should I use?
Reworked Questions: Given a method I would like to determine if an object created in that method may be returned by that method. So, if I go through and add all instantiations of the return type within that method to a set, is there an analysis that will tell me, for each member of the set, if it may or may not be returned. Additionally, would it be possible to not limit the set to a single method but, all methods called by the original method to account for delegation?
This is not specific to any invocation.
It looks like method escape analysis may be the answer.
Thanks everyone for your suggestions.
Your question seems to be a simple "reaching" analysis ("does a new value reach a return statement?") if you are interested in any invocation and only in values created by a method-local new. If you need to know whether any invocation can return a new value from any subcomputation, you need to compute the possible call graph and determine whether any called function can return a new value, or pass a new value from a called function to its parent.
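To make the intra-method case concrete, here is a rough sketch on a made-up three-statement IR (x = new, x = y, return x), not on real Java source or bytecode. It is flow-insensitive: it ignores statement order and branches, and just propagates allocation sites through copies until nothing changes. That over-approximates, which is acceptable for a "may be returned" question. All class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReachingNew {
    enum Kind { NEW, COPY, RETURN }

    /** Toy statement: NEW sets 'target', COPY sets 'target' from 'source', RETURN reads 'source'. */
    static final class Stmt {
        final Kind kind;
        final String target;
        final String source;

        private Stmt(Kind kind, String target, String source) {
            this.kind = kind;
            this.target = target;
            this.source = source;
        }

        static Stmt alloc(String target) { return new Stmt(Kind.NEW, target, null); }
        static Stmt copy(String target, String source) { return new Stmt(Kind.COPY, target, source); }
        static Stmt ret(String source) { return new Stmt(Kind.RETURN, null, source); }
    }

    /** Returns the indices of the NEW statements whose object may be returned. */
    public static Set<Integer> mayBeReturned(List<Stmt> body) {
        // Variable name -> allocation sites (statement indices) it may refer to.
        Map<String, Set<Integer>> sites = new HashMap<>();
        boolean changed = true;
        while (changed) { // iterate to a fixed point
            changed = false;
            for (int i = 0; i < body.size(); i++) {
                Stmt s = body.get(i);
                if (s.kind == Kind.NEW) {
                    changed |= sites.computeIfAbsent(s.target, k -> new TreeSet<>()).add(i);
                } else if (s.kind == Kind.COPY) {
                    Set<Integer> from = new ArrayList<>(
                            sites.getOrDefault(s.source, Collections.<Integer>emptySet()))
                            .isEmpty() ? Collections.<Integer>emptySet() : sites.get(s.source);
                    changed |= sites.computeIfAbsent(s.target, k -> new TreeSet<>()).addAll(from);
                }
            }
        }
        Set<Integer> result = new TreeSet<>();
        for (Stmt s : body) {
            if (s.kind == Kind.RETURN) {
                result.addAll(sites.getOrDefault(s.source, Collections.<Integer>emptySet()));
            }
        }
        return result;
    }

    /** 0: a = new; 1: b = new; 2: c = a; 3: return c */
    public static List<Stmt> sample() {
        return Arrays.asList(Stmt.alloc("a"), Stmt.alloc("b"), Stmt.copy("c", "a"), Stmt.ret("c"));
    }

    public static void main(String[] args) {
        System.out.println("allocation sites that may be returned: " + mayBeReturned(sample()));
    }
}
```

Handling delegation means doing the same thing across the call graph: a call's result stands for whatever the callee may return. That is the part a framework would do for you.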
There are a number of Java static analysis frameworks.
SOOT is a byte-code based analysis framework. You could probably implement your static query using this.
The DMS Software Reengineering Toolkit is a generic engine for building custom analyzers and transformation tools. It has a full Java front end, and computes various useful base analyses (def/use chains, call graph) on source code. It can process class files but presently only to get type information.
If you wanted a dynamic analysis, either by itself or as a way to tighten up the static analysis, DMS can be used to instrument the source code in arbitrary ways by inserting code to track allocations.
I'm not sure if this would work for your circumstances, but one simple approach would be to populate a newly added 'instantiatedTime' field in the constructor of the object and compare that with the time the method call was made. This assumes you have access to the source for the object in question.
Are you sure static analysis is the right tool for the job? Static analysis can give you a result in some cases but not in all.
When running the JVM under a debugger, it assigns objects increasing object IDs, which you can fetch via System.identityHashCode(Object o). You can use this fact to build a test case that creates an object (the checkpoint), and then calls the method. If the returned object has an id greater than the checkpoint id, then you know the object was created in the method.
Disclaimer: this is observed behaviour under a debugger, under Windows XP.
I have a feeling that this is impossible to do without a specially modified JVM. Here are some approaches ... and why they won't work in general.
The Static Analysis approach will work in simple cases. However, something like this is likely to stump any current generation static analysis tool:
// Bad design alert ... don't try this at home!
public class LazySingletonStringFactory {
private String s;
public String create(String initial) {
if (s == null) {
s = new String(initial);
}
return s;
}
}
For a static analyser to figure out if a given call to LazySingletonStringFactory.create(...) returns a newly created String it must figure out that it has not been called previously. The Halting Problem tells us that this is theoretically impossible in some cases, and in practice this is beyond the "state of the art".
The IdentityHashCode approach may work in a single-threaded application that completes without the garbage collector running. However, if the GC runs you will get incorrect answers. And if you have multiple threads, then (depending on the JVM) you may find that objects are allocated in different "spaces" resulting in object "id" creation sequence that is no longer monotonic across all threads.
The Code Instrumentation approach works if you can modify the code of the Classes you are concerned about, either direct source-code changes, annotation-based code injection or by some kind of bytecode processing. However, in general you cannot do these things for all classes.
(I'm not aware of any other approaches that are materially different to the above three ... but feel free to suggest them as a comment.)
Not sure of a reliable way to do this statically.
You could use:
AspectJ or a similar AOP library could be used to instrument classes and increment a counter on object creation
a custom classloader (or JVM agent, but classloader is easier) could be used similarly
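As a rough idea of what the injected code would do, here is the counter written by hand in plain Java. This is not AspectJ or agent code; it only shows the bookkeeping such instrumentation would add to a constructor. The class names, the checkpoint idea and the fresh/reused methods are all made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CreationStamp {
    /** Global creation counter; an instrumented constructor would bump it. */
    private static final AtomicLong COUNTER = new AtomicLong();

    /** Stand-in for an instrumented class: each instance records its creation serial. */
    static class Tracked {
        final long serial = COUNTER.incrementAndGet();
    }

    /** Created once, long before any call we want to check. */
    private static final Tracked CACHED = new Tracked();

    /** Take this before calling the method under test. */
    public static long checkpoint() {
        return COUNTER.get();
    }

    /** True if the object was created after the given checkpoint was taken. */
    public static boolean createdAfter(Tracked t, long checkpoint) {
        return t.serial > checkpoint;
    }

    /** Method under test that really creates its result. */
    public static Tracked fresh() {
        return new Tracked();
    }

    /** Method under test that hands out an existing object. */
    public static Tracked reused() {
        return CACHED;
    }

    public static void main(String[] args) {
        long before = checkpoint();
        System.out.println("fresh():  " + createdAfter(fresh(), before));
        System.out.println("reused(): " + createdAfter(reused(), before));
    }
}
```

A serial number avoids the clock-resolution problems of a timestamp. Note that with several threads it only tells you the object was created after the checkpoint, not that this particular call created it.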
I have a parser written in Bigloo Scheme (a functional language) which I need to compile into a Java class. The whole of the parser is written as a single function. Unfortunately this is causing the JVM compiler to throw a "Method too large" warning and later give a "far label in localvar" error. Is there any possible way I can circumvent this error? I read somewhere about a DontCompileHugeMethods option; does it work? Splitting the function doesn't seem to be a viable option to me :( !!
Is there any possible way I can circumvent this error?
Well, the root cause of this compiler error is that there are hard limits in the class file format. In this case, the problem is that the bytecode of a single method can be at most 65535 bytes long. (See the JVM spec.)
The only workaround is to split the method.
Split the method into groups of related operations, or move utility code out into separate methods.
Well, the case is a bit different here: the method only consists of a single function call. Now this function has a huge parameter list (the whole of the parser, actually!). So I have no clue how to split this!
The way to split up such a beast could be:
define data holder objects for your parameters (put sets of parameters in objects according to the ontology of your data model),
build those data holder objects in their own context
pass the parameter objects to the function
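In Java terms, the three steps above might look roughly like the sketch below. Everything in it (the Keywords and Operators holders, the build methods, the classify function) is invented for illustration; what Bigloo actually generates will look different. The point is only that each holder is built in its own small method, so no single method has to contain the bytecode for constructing everything.

```java
import java.util.Arrays;
import java.util.List;

public class SplitParser {
    /** Data holder for one group of related parameters. */
    static final class Keywords {
        final List<String> words;
        Keywords(List<String> words) { this.words = words; }
    }

    /** Data holder for another group of related parameters. */
    static final class Operators {
        final List<String> symbols;
        Operators(List<String> symbols) { this.symbols = symbols; }
    }

    // Each holder is built in its own method, so its construction code
    // counts against that method's size limit, not the caller's.
    public static Keywords buildKeywords() {
        return new Keywords(Arrays.asList("define", "lambda", "let"));
    }

    public static Operators buildOperators() {
        return new Operators(Arrays.asList("+", "-", "*"));
    }

    /** The former huge-parameter-list function now takes the holder objects. */
    public static String classify(String token, Keywords keywords, Operators operators) {
        if (keywords.words.contains(token)) {
            return "keyword";
        }
        if (operators.symbols.contains(token)) {
            return "operator";
        }
        return "identifier";
    }

    public static void main(String[] args) {
        Keywords keywords = buildKeywords();
        Operators operators = buildOperators();
        for (String token : Arrays.asList("lambda", "+", "foo")) {
            System.out.println(token + " -> " + classify(token, keywords, operators));
        }
    }
}
```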
Quick and Dirty: Assign all your parameters to class variables of the same name (you must rename your parameters) at the beginning of your function and start chopping up your function in pieces and put those pieces in functions. This should guarantee that your function will basically operate with the same semantics.
But, this will not lead to pretty code!
I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).
I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.
I've looked through a bunch of .Net .exe's and Java .classes with a hex editor and it appears that the referenced classes are spelled out in plaintext, but I am not sure if it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by the original name?
Extending what overslacked said.
EDIT: For some reason I thought you asked about methods, not types.
Types
Like finding methods, this doesn't cover access through the Reflection API.
You have to locate the following in a Reflector plugin to identify referenced types and perform a transitive closure:
Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types
If you parse the IL yourself, all you have to process from the main assembly is the TypeRef and TypeSpec metadata, which is pretty easy (of course, I'm speaking as someone who already parses the entire byte code). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
Methods
If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D
Keep in mind that you may have method dependencies you don't see on the first pass that neither method mentioned will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj and for each method M within the type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see if the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T or M is in the type's interface map if the target is an interface method. This is not perfect, but it is about the best you can do for static analysis without a theorem prover. It is a superset of the actual methods that will be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).
For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".
In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).
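If you go the "own code" route for Java, a first approximation needs nothing but the JDK. The sketch below reads a class file's constant pool and collects the CONSTANT_Class entries, which is where the plaintext class names seen in a hex editor come from. The class and method names are made up. Note what it leaves out: types that only appear inside field or method descriptors are stored as plain Utf8 strings and are not picked up here, and array types show up in descriptor form (such as [Ljava/lang/Object;).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

public class ClassRefs {
    /** Returns the internal names held by CONSTANT_Class entries of a class file. */
    public static Set<String> referencedClasses(InputStream classFile) {
        try (DataInputStream in = new DataInputStream(classFile)) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort();
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) { // constant pool indices start at 1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                     // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break; // Class
                    case 8: case 16: case 19: case 20:                         // String, MethodType, Module, Package
                        in.readUnsignedShort(); break;
                    case 15:                                                   // MethodHandle
                        in.readUnsignedByte(); in.readUnsignedShort(); break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readInt(); break;                                   // 4-byte entries
                    case 5: case 6:                                            // Long, Double take two slots
                        in.readLong(); i++; break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int nameIndex : classNameIndex) {
                if (nameIndex != 0) {
                    names.add(utf8[nameIndex]);
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience: looks the class file up as a resource, e.g. "java/util/ArrayList.class". */
    public static Set<String> referencedClassesOf(String resourcePath) {
        InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath);
        if (in == null) {
            throw new IllegalArgumentException("resource not found: " + resourcePath);
        }
        return referencedClasses(in);
    }

    public static void main(String[] args) {
        for (String name : referencedClassesOf("java/util/ArrayList.class")) {
            System.out.println(name);
        }
    }
}
```

Checking whether an application subclasses or references a given library class then comes down to running this over every .class file of the application and looking for the library's internal name. For anything beyond that, ASM is less work than extending the parser.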
I am trying to record the arguments passed to a method before it is called using bytecode instrumentation.
Currently, while instrumenting using Java code, I have to first pop all the args into locals, then push them again twice (once for my recording method, in which case all primitive types have to be converted to their boxed types, and once for the actual method call).
What I would ideally like to do is just duplicate the part of the stack holding the args pushed for the method call. However, the JVM's dup instruction only allows duplicating the topmost value of the stack.
Is it possible using JNI to somehow duplicate the entire stack in one go?
No. The stack effectively goes away when the method is compiled. The JVM has no way of compiling native code. So even if you did try to directly manipulate the stack, it would change format (and use registers) on the fly.
You can reasonably easily work with the top four slots of the stack (dup2 duplicates the top two slots, and dup2_x2 inserts that copy beneath the next two), but any further and you'll probably need to use local variables.
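To see how far the dup family reaches, here is a small simulation of the stack effects of dup2 and dup2_x2 on single-slot values, following the forms given in the JVM specification. It models the operand stack as a plain list with the top at the end; nothing here touches a real JVM stack, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DupDemo {
    /** dup2: ..., v2, v1 becomes ..., v2, v1, v2, v1 */
    public static void dup2(List<String> stack) {
        int n = stack.size();
        stack.add(stack.get(n - 2));
        stack.add(stack.get(n - 1)); // index n - 1 still holds v1
    }

    /** dup2_x2: ..., v4, v3, v2, v1 becomes ..., v2, v1, v4, v3, v2, v1 */
    public static void dup2X2(List<String> stack) {
        int n = stack.size();
        String v2 = stack.get(n - 2);
        String v1 = stack.get(n - 1);
        // Insert the copy of (v2, v1) beneath v4, keeping their order.
        stack.add(n - 4, v1);
        stack.add(n - 4, v2);
    }

    public static void main(String[] args) {
        List<String> twoArgs = new ArrayList<>(Arrays.asList("a", "b"));
        dup2(twoArgs);
        System.out.println("dup2:    " + twoArgs);

        List<String> fourArgs = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        dup2X2(fourArgs);
        System.out.println("dup2_x2: " + fourArgs);
    }
}
```

So two single-slot arguments can be copied in one instruction, while four already need shuffling, and long or double arguments use up two slots each. Past that, spilling to locals is the practical route.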