I wish to create a custom remote execution client for my app. The client may look something like this:
interface Client {
    <T> T computeRemotely(Function<List<MyBigObject>, T> function);
}
and might be used like this:
Client client = new Client();
Integer remoteResult = client.computeRemotely(list -> {
    Integer result = -1;
    // do some computational work here.
    return result;
});
This means I somehow need to take the lambda from the client, send it to the server, run the function (passing in a real List<MyBigObject>), and send the result back.
It's worth noting that a restriction on using my client library is that you cannot use anything outside the JDK in that lambda and expect it to work (as those classes may not be on the classpath on the server)... but I would like users to be able to use any of the JDK classes to bring their own data into the calculation.
Now I can't just serialize the Function<List<MyBigObject>, T> lambda, because it serializes like an inner class of whatever class the lambda is defined in, which will NOT be on the classpath on the server.
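(For reference, the lambda is only serializable at all if its target type is serializable. A minimal sketch of what the client method could require instead of a plain Function; RemoteFunction is a made-up name, and MyBigObject is the domain type from above:)

import java.io.Serializable;
import java.util.List;
import java.util.function.Function;

// Hypothetical target type: because it extends Serializable, any lambda the
// caller writes against it gets the compiler-generated writeReplace() and can
// be serialized at all -- but it still references the capturing class, as
// described above.
interface RemoteFunction<T> extends Function<List<MyBigObject>, T>, Serializable {
}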
So I have been looking at ASM to see whether that could work. Given that I have never done byte code manipulation before, I just wanted to check that what I am saying sounds right:
I can use ASM to read the class that the lambda sits in.
Using a Method Visitor, get the method bytes, send them to the server
Use ASM to create an instance from the bytes and execute it.
Given that the lambda is like an anonymous inner class, I am guessing I will have to do some sort of method renaming in there too...
Is this roughly correct or have I got completely the wrong end of the stick?
Note that lambdas can access all immutable values from their context. Thus, you'd need to either forbid accessing external values (which would severely limit the usefulness of your solution) or identify them and send representations of those values (which runs into the problem you mentioned; their implementation may be outside the server classpath).
So even if you send the method representation (for which you would not even need ASM; you could get the Resource directly from the classloader), it won't work for the general case.
Edit: Given your comment, it could work. You'd need to synthesize a class with the following (a rough sketch of the result follows the list):
The context attributes as final fields
A constructor with arguments for all the fields (you will pass the deserialized values in there on construction)
The lambda method — see this question for details
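Very roughly, and with made-up names (capturedThreshold stands in for whatever the lambda captured, MyBigObject is the domain type from the question), the synthesized class would have this shape:

import java.util.List;
import java.util.function.Function;

// Shape of the class to synthesize on the server (names are illustrative only).
public class SynthesizedLambda implements Function<List<MyBigObject>, Integer> {

    // the captured context value, as a final field
    private final int capturedThreshold;

    // constructor receiving the deserialized captured values
    public SynthesizedLambda(int capturedThreshold) {
        this.capturedThreshold = capturedThreshold;
    }

    @Override
    public Integer apply(List<MyBigObject> list) {
        // body taken from the original lambda's synthetic method
        return capturedThreshold;
    }
}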
There is no point in analyzing the runtime-generated classes of lambda expressions. They have a very simple structure, which they reveal when being serialized. During serialization they will get replaced by a SerializedLambda instance which contains all the information you could ever gather, most notably:
The implemented interface
The target method that will be invoked
The captured values
The crucial point is the target method. For lambda expressions (unlike method references) the target method is a synthetic method that resides within the class which contains the lambda expression. That method is private, by the way, which is why an attempt to replicate the class that invokes it is doomed to fail; special JVM interaction is required to make it possible to create such a class. But the important point is that you need the target method on the remote side to execute it; the lambda-specific runtime class is irrelevant. If you have the target method, you can create the lambda instance without third-party libraries.
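For illustration, a minimal sketch of how to get at the SerializedLambda of a serializable lambda, by reflectively calling its compiler-generated writeReplace() method (assuming everything runs on the plain classpath, where this reflective access is allowed):

import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

public class LambdaInspector {
    // Extracts the SerializedLambda of a serializable lambda by invoking the
    // synthetic writeReplace() method the compiler puts on the lambda class.
    static SerializedLambda extract(Object lambda) throws ReflectiveOperationException {
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        return (SerializedLambda) writeReplace.invoke(lambda);
    }

    public static void main(String[] args) throws Exception {
        // the intersection cast is what makes this lambda serializable at all
        Function<String, Integer> f = (Function<String, Integer> & Serializable) String::length;
        SerializedLambda sl = extract(f);
        System.out.println(sl.getImplClass() + "::" + sl.getImplMethodName());
        System.out.println("captured args: " + sl.getCapturedArgCount());
    }
}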
Of course, you can use ASM to analyze the target method’s code to transfer it (and all dependencies) to the remote side, but note that this is no different from transferring arbitrary Function implementations; the fact that there is a layer of a runtime generated class created via lambda expression does not help you in any way.
Related
Java (1.8+) has an @FunctionalInterface annotation which (basically) suggests that you can pass a method reference in place of an interface implementation to another method call. The useful one I was playing with today is:
DateTimeFormatter.parse(CharSequence, TemporalQuery<T>)
It's nice as it lets you tell the formatter what kind of result to hand back to you. The javadoc even gives you a nice example:
The query is typically a method reference to a from(TemporalAccessor) method. For example:
LocalDateTime dt = parser.parse(str, LocalDateTime::from);
Once I got my head around what @FunctionalInterface is and means, I started to wonder how a consumer of an API is to figure out what they can actually use in its place. The example above tells you what you can use, and if you trace through the java.time package, you can find other method references you can use. However, any contributor to the API needs to read through the entire javadoc to make sure they don't break any implicit contracts mentioned in other places (of course they should, especially for the JDK, but that's not the purpose of javadoc!)
So... if a contributor to this API were to change the signature of LocalDateTime::from, then there's no compile-time check to say that this method no longer conforms to the FunctionalInterface 'TemporalQuery'. This would obviously break any consumers of the API, and they could change their code to use an explicit lambda instead. I do understand that it does not need to exist, but if an annotation similar to the optional '@Override' annotation were available, then it would provide some compile-time checks as well as the possibility of introspecting/reflecting to discover available method references.
e.g.
@ConformsTo(TemporalQuery.class)
public static LocalDateTime from(TemporalAccessor temporal)
It would also then be possible to find, through introspection, any other method references that can be used for a FunctionalInterface.
So, to be clear, I understand that this is not necessary, but do think it seems to be an oversight not to include it as an optional Annotation. Is there any particular reason this could/should not exist?
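For what it's worth, the only compile-time check that exists today lives at the use site; a consumer can pin the conformance down with an assignment like this (but nothing on LocalDateTime itself declares it):

import java.time.LocalDateTime;
import java.time.temporal.TemporalQuery;

class ConformanceCheck {
    // Compiles only while LocalDateTime::from still matches
    // TemporalQuery<LocalDateTime>; the check lives with the consumer,
    // not with the method being referenced.
    static final TemporalQuery<LocalDateTime> QUERY = LocalDateTime::from;
}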
The problems that arise from changing the signature or return type of a method, e.g. LocalDateTime::from, aren't limited to functional interfaces. Even before Java 8, changing those things risked breaking existing code that relied on them. That's why designing an API is always a challenge, because changes to existing code can mean a lot of work.
Additionally, assuming the functional interface and the matching methods are part of different libraries, would you really want them to be closely coupled, i.e. both needing to change when one changes? What if they are maintained by different organizations (let's say different open source projects or companies) - how should they coordinate?
As an example, take Comparator.comparing(Function<? super T, ? extends U> keyExtractor). That basically accepts a reference to any method that takes no parameter and returns something comparable. There are so many libraries that already provide such methods; would you want them all to have to add @ConformsTo?
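For instance (Person here is just a made-up domain class), any existing getter qualifies without declaring anything:

import java.util.Comparator;

class ComparingExample {
    // Hypothetical domain class; note that nothing on it has to declare
    // that getName() "conforms to" the key-extractor function.
    static class Person {
        private final String name;
        Person(String name) { this.name = name; }
        String getName() { return name; }
    }

    static final Comparator<Person> BY_NAME = Comparator.comparing(Person::getName);
}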
That said, a @ConformsTo would at best be incomplete and might even be misleading/outdated.
Edit:
Let's tackle both annotations from the view of the compiler.
@FunctionalInterface tells the compiler that it should complain when you define more than one abstract method or use it on something other than an interface.
That means that the requirements/contract definition ("this interface is a functional interface") and the implementation (the interface itself) are contained in the same file and thus have to be changed together anyway.
@ConformsTo could tell the compiler to check the requirements of the functional interface (or even interfaces) and see if that method satisfies them.
So far so good, but the problem arises when the interface changes: it would couple the method and the interface, which could be part of different and otherwise totally unrelated libraries. And even if they were part of the same library, you could run into problems when the method itself isn't recompiled - the compiler might miss the incompatibility and thus defeat the purpose of the annotation (if it were only meant for humans, then a simple comment would be sufficient as well).
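To make the first point concrete, this is all that @FunctionalInterface asks the compiler to verify, entirely within one source file:

@FunctionalInterface
interface MyQuery<T> {
    T query(String input);

    // Un-commenting a second abstract method makes this file fail to compile,
    // because the @FunctionalInterface contract is checked right here:
    // T other(String input);

    // default and static methods are still allowed:
    default T queryTrimmed(String input) {
        return query(input.trim());
    }
}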
I know it is possible (using BCEL or ASM, for instance) to somehow access the local variables of a method... but I need something more. What I would like is:
to get the type of such a variable (or a way to convert from the signature)
to know (distinguish) when this variable is used (either has its value assigned, or is passed as a parameter)
when this variable is used as parameter, to know which method call it was passed to
to break "method-chains" in their respective method calls and get their return value so I can manipulate them
The basic idea is that I would like to "instrument" methods a bit in the same way a debugger does (though limited to the first frame depth...).
Any pointer appreciated.
If more information is needed, feel free to ask.
This is only possible using a byte-code-level API. cglib does not expose such an API, so you have to choose between ASM, BCEL and Javassist; I would recommend ASM, which has the best documentation.
What you would need to do:
Parse the signature of the method; ASM offers utilities for that. You get each type by its internal name, and you need to map these parameters to their local-variable indices.
Find any use of the variable at that index.
This is, however, a quite difficult task. In order to predict your code, you would have to emulate the method invocation. The JVM is a stack machine; arguments can be placed on the operand stack as the result of an arbitrary chain of instructions. Therefore, you would effectively have to interpret any byte code instruction that you find. You will, more or less, need to write your own simplistic interpreter, which is quite a task.
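As a starting point, a minimal sketch of the visitor side with ASM (it only reports variable loads/stores and method calls per method; connecting a specific load to a specific call is the stack simulation described above, for which ASM's org.objectweb.asm.tree.analysis package exists, and it is not shown here):

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class VarUsagePrinter {
    // Prints every load/store of local variable slots and every method call
    // found in each method of the given class file.
    public static void dump(byte[] classBytes) {
        new ClassReader(classBytes).accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                System.out.println("method " + name + desc);
                return new MethodVisitor(Opcodes.ASM9) {
                    @Override
                    public void visitVarInsn(int opcode, int var) {
                        // ISTORE..ASTORE are the store opcodes (RET ignored for simplicity)
                        boolean store = opcode >= Opcodes.ISTORE && opcode <= Opcodes.ASTORE;
                        System.out.println("  " + (store ? "store" : "load") + " of local slot " + var);
                    }
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String name,
                                                String desc, boolean isInterface) {
                        System.out.println("  calls " + owner + "." + name + desc);
                    }
                };
            }
        }, 0);
    }
}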
I'm investigating ways to ensure a Java class only calls a limited set of allowed methods from other classes. The use case I have receives the class via standard Java serialization.
The approach I want to try is to simply list the methods it calls and only run the code if everything it calls passes a short white list.
The question I have : how do I list the methods used in that class?
This is not a perfect solution, but you could use it if you can't find something better. You can use javap; if you're on Linux, run on the command line (or launch a process using Runtime.exec()): javap -verbose /path/to/my/classfile.class | grep invoke and you'll have the binary signatures that the class "calls" from other classes. Yes, I know it's not what you wanted, but you could use it as a last resort.
If you need a more "javaish" solution, you could have a look at a java library called "asm": http://asm.ow2.org/
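With ASM, the equivalent of the javap-and-grep approach is a small visitor that records every invoke instruction, something along these lines (the method names here are my own):

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

import java.util.Set;
import java.util.TreeSet;

public class CalledMethodCollector {
    // Returns every "owner.name(descriptor)" the class invokes.
    public static Set<String> calledMethods(byte[] classBytes) {
        Set<String> calls = new TreeSet<>();
        new ClassReader(classBytes).accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                return new MethodVisitor(Opcodes.ASM9) {
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String name,
                                                String desc, boolean isInterface) {
                        calls.add(owner + "." + name + desc);
                    }
                    // note: lambdas and method references go through invokedynamic;
                    // override visitInvokeDynamicInsn as well if you need those.
                };
            }
        }, ClassReader.SKIP_DEBUG);
        return calls;
    }
}

The resulting set can then be checked against the white list before the class is ever instantiated.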
You could pass a dynamic proxy object to the caller, which inside checks the methods against your white list and throws exception when the call is not allowed.
Dynamic proxies basically allows you to insert piece of code between the caller's method invocation and the actual invocation of the called method.
I'd really think through whether you really need this, though. Dynamic proxies are useful, but they can also be very confusing and annoying to debug.
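A minimal sketch of the idea with java.lang.reflect.Proxy (the whitelist here is just method names; a real check would probably compare full signatures, and remember that equals/hashCode/toString also pass through the handler):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Set;

public class WhitelistProxy {
    // Returns a proxy for the given interface that forwards only whitelisted
    // methods to the real target and rejects everything else.
    @SuppressWarnings("unchecked")
    public static <T> T restrict(T target, Class<T> iface, Set<String> allowedMethods) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!allowedMethods.contains(method.getName())) {
                throw new SecurityException("call to " + method.getName() + " is not allowed");
            }
            return method.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }
}

The untrusted code then only ever sees the proxy, so a call to anything off the list fails immediately instead of reaching the real object.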
Original Question: Given a method I would like to determine if an object returned is created within the execution of that method. What sort of static analysis can or should I use?
Reworked Question: Given a method, I would like to determine if an object created in that method may be returned by that method. So, if I go through and add all instantiations of the return type within that method to a set, is there an analysis that will tell me, for each member of the set, whether it may or may not be returned? Additionally, would it be possible not to limit the set to a single method but to include all methods called by the original method, to account for delegation?
This is not specific to any invocation.
It looks like method escape analysis may be the answer.
Thanks everyone for your suggestions.
Your question seems to call for either a simple "reaching" analysis ("does a new value reach a return statement?") - if you are interested in any invocation and only in cases where a method-local new creates the value. If you need to know whether any invocation can return a new value from any subcomputation, you need to compute the possible call graph and determine whether any called function can return a new value, or pass a new value from a called function to its parent.
There are a number of Java static analysis frameworks.
SOOT is a byte-code based analysis framework. You could probably implement your static query using this.
The DMS Software Reengineering Toolkit is a generic engine for building custom analyzers and transformation tools. It has a full Java front end, and computes various useful base analyses (def/use chains, call graph) on source code. It can process class files but presently only to get type information.
If you wanted a dynamic analysis, either by itself or as a way to tighten up the static analysis, DMS can be used to instrument the source code in arbitrary ways by inserting code to track allocations.
I'm not sure if this would work for your circumstances, but one simple approach would be to populate a newly added 'instantiatedTime' field in the constructor of the object and compare that with the time the method call was made. This assumes you have access to the source for the object in question.
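Roughly like this (TrackedObject stands in for whatever class you control):

public class TrackedObject {
    // populated when the instance is created (nanoTime avoids the coarse
    // resolution of currentTimeMillis)
    private final long instantiatedTime = System.nanoTime();

    public long getInstantiatedTime() {
        return instantiatedTime;
    }
}

// At the call site:
//   long before = System.nanoTime();
//   TrackedObject result = someMethod();
//   boolean createdInside = result.getInstantiatedTime() >= before;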
Are you sure static analysis is the right tool for the job? Static analysis can give you a result in some cases but not in all.
When running the JVM under a debugger, it assigns objects increasing object IDs, which you can fetch via System.identityHashCode(Object o). You can use this fact to build a test case that creates an object (the checkpoint) and then calls the method. If the returned object has an id greater than the checkpoint id, then you know the object was created in the method.
Disclaimer: this is observed behaviour under a debugger, under Windows XP.
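In code, the described check would look roughly like this (only meaningful under the debugger behaviour described above):

public class CreationCheck {
    public static void main(String[] args) {
        // checkpoint object created immediately before the call
        Object checkpoint = new Object();
        int checkpointId = System.identityHashCode(checkpoint);

        Object result = methodUnderTest();

        // a larger id would mean the object was allocated after the
        // checkpoint, i.e. inside the method
        boolean createdInside = System.identityHashCode(result) > checkpointId;
        System.out.println("created inside method? " + createdInside);
    }

    static Object methodUnderTest() {
        return new Object();   // stand-in for the method being analysed
    }
}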
I have a feeling that this is impossible to do without a specially modified JVM. Here are some approaches ... and why they won't work in general.
The Static Analysis approach will work in simple cases. However, something like this is likely to stump any current generation static analysis tool:
// Bad design alert ... don't try this at home!
public class LazySingletonStringFactory {
    private String s;

    public String create(String initial) {
        if (s == null) {
            s = new String(initial);
        }
        return s;
    }
}
For a static analyser to figure out if a given call to LazySingletonStringFactory.create(...) returns a newly created String it must figure out that it has not been called previously. The Halting Problem tells us that this is theoretically impossible in some cases, and in practice this is beyond the "state of the art".
The IdentityHashCode approach may work in a single-threaded application that completes without the garbage collector running. However, if the GC runs, you will get incorrect answers. And if you have multiple threads, then (depending on the JVM) you may find that objects are allocated in different "spaces", resulting in an object "id" creation sequence that is no longer monotonic across all threads.
The Code Instrumentation approach works if you can modify the code of the classes you are concerned about, either by direct source-code changes, annotation-based code injection, or some kind of bytecode processing. However, in general you cannot do these things for all classes.
(I'm not aware of any other approaches that are materially different to the above three ... but feel free to suggest them as a comment.)
Not sure of a reliable way to do this statically.
You could use:
AspectJ or a similar AOP library could be used to instrument classes and increment a counter on object creation
a custom classloader (or JVM agent, but classloader is easier) could be used similarly
I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).
I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.
I've looked through a bunch of .NET .exes and Java .class files with a hex editor, and it appears that the referenced classes are spelled out in plaintext, but I am not sure it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by their original names?
Extending what overslacked said.
EDIT: For some reason I thought you asked about methods, not types.
Types
Like finding methods, this doesn't cover access through the Reflection API.
To identify referenced types and perform a transitive closure in a Reflector plugin, you have to locate the following:
Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types
If you parse the IL yourself, all you have to do for the main assembly is process the TypeRef and TypeSpec metadata, which is pretty easy (of course, I'm speaking from having parsed the entire byte code here). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
Methods
If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D
Keep in mind that you may have method dependencies you don't see on the first pass that neither method mentioned will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj and for each method M within the type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see if the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T or M is in the type's interface map if the target is an interface method. This is not perfect, but it is about the best you can do for static analysis without a theorem prover. It is a superset of the actual methods that will be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).
For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".
In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).
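If you go the ASM route, a rough sketch of collecting the referenced class names from a single class file might look like this (it only sees static references; as noted, reflective and service-loader dependencies stay invisible, and array-typed owners are left unhandled for brevity):

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.FieldVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.Type;

import java.util.Set;
import java.util.TreeSet;

public class ReferencedClasses {
    // Collects class names referenced from the super type, interfaces, field
    // and method descriptors, and the instructions of one class file.
    public static Set<String> of(byte[] classBytes) {
        Set<String> names = new TreeSet<>();
        new ClassReader(classBytes).accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public void visit(int version, int access, String name, String signature,
                              String superName, String[] interfaces) {
                if (superName != null) names.add(superName.replace('/', '.'));
                if (interfaces != null) {
                    for (String i : interfaces) names.add(i.replace('/', '.'));
                }
            }
            @Override
            public FieldVisitor visitField(int access, String name, String desc,
                                           String signature, Object value) {
                addType(names, Type.getType(desc));
                return null;
            }
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                for (Type t : Type.getArgumentTypes(desc)) addType(names, t);
                addType(names, Type.getReturnType(desc));
                return new MethodVisitor(Opcodes.ASM9) {
                    @Override
                    public void visitTypeInsn(int opcode, String type) {
                        names.add(type.replace('/', '.'));
                    }
                    @Override
                    public void visitFieldInsn(int opcode, String owner, String name, String desc) {
                        names.add(owner.replace('/', '.'));
                    }
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String name,
                                                String desc, boolean isInterface) {
                        names.add(owner.replace('/', '.'));
                    }
                };
            }
        }, ClassReader.SKIP_DEBUG);
        return names;
    }

    private static void addType(Set<String> names, Type type) {
        Type t = type.getSort() == Type.ARRAY ? type.getElementType() : type;
        if (t.getSort() == Type.OBJECT) names.add(t.getClassName());
    }
}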