I am trying to write an instrumentation module for Java programs. One particular instrumentation I am looking to add collects all the objects in a method's argument list and does some processing on them.
Currently, to get the list of object arguments, I pop all the method args from stack, and then push them in one by one, adding my instrumentation call in between. While this mostly works, I see some
java.lang.VerifyError, [1] (****) Incompatible argument to function
type errors in large programs. Does popping and then pushing an object back to stack change its type somehow? Alternatively, is there a better solution for duplicating 'N' arguments from the stack without popping?
Where are you popping your arguments to? You need to store them in the local variable array, I assume? It is perfectly possible that you overwrite variables that are already stored there but are accessed later. In that case, you might have changed the types of the stored variables, which yields an error during verification.
Verification is a deterministic process: simply compare the bytecode of a failing method to the verifier's complaint and make sure that the types match.
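One way to avoid clobbering live locals is to spill the arguments into slots at or above the method's current max_locals value. The following is only a sketch of that slot arithmetic using plain JDK classes; `SpillSlots` and `allocate` are hypothetical names, and the actual bytecode rewriting (with ASM, BCEL or similar) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks fresh local-variable slots for spilled call arguments.
final class SpillSlots {
    // Returns the slot index for each argument, starting at maxLocals so that
    // no slot already used by the instrumented method is overwritten.
    static List<Integer> allocate(int maxLocals, List<Class<?>> argTypes) {
        List<Integer> slots = new ArrayList<>();
        int next = maxLocals;
        for (Class<?> type : argTypes) {
            slots.add(next);
            // long and double occupy two consecutive slots; everything else takes one.
            next += (type == long.class || type == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // A method that already uses 3 local slots, calling foo(String, long, int).
        System.out.println(allocate(3, List.of(String.class, long.class, int.class)));
        // prints [3, 4, 6]
    }
}
```

After spilling, the method's max_locals has to be raised to cover the new slots, otherwise verification fails for a different reason.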
What I'm doing
I'm using reflection in my code to decouple all my classes, and to do so I need to be able to dynamically create instances of objects. I've done this by text matching parameter names to input data. To do a text match, however, I need access to the formal parameter names rather than the synthetic arg0, arg1 ... that I know get created if isNamePresent returns false.
What I've done
I researched how to get the formal names (google searching things like: "when I compile my java classes with the parameter option enabled, does that make reflection work forever? or only one time when the classes are run?" to no useful results). I've also tried searches similar to that here and seen info related to javac with one of the questions being "Drawbacks of javac -parameters flag" as an example. While these addressed parts of my question they really didn't answer the meat of what I need. I've found that in java 8 you can just do "javac -parameters " and you will be fine. Note that I had to use the directory of the jdk as the starting point (my command line input looks exactly like this):
C:\Program Files\Java\jdk-10.0.1\bin>javac -parameters C:\Users\abbotts1\IdeaProjects\project\src\Sales_Rep_Data\*.java
and so far that works without any errors, and my project has compiled bytecode files in it now, so I know it's doing something. Just what exactly (or whether it's a result of the above) is a mystery to me, because there is no timestamp or anything I can find for these files that points to which command I tried that made them (I've been trying these commands for a while).
Detailed description of question scope
My question is this: is this command line input the only way to get formal parameter names? If it is, am I doing it right (correct input syntax)? If I'm doing it right, how can I make it so that when I debug my code and run param.getName() it actually returns the formal name? So far I've run the above command on the command line and tried to debug this line in my Java code:
Boolean check = param.isNamePresent();
where param is just the parameter coming from a for-each loop over the constructor of the class I am getting through reflection. The point is, every time I run it this Boolean returns false in the debugger and the names are synthetic (arg0, arg1, etc.). I want it to return true (and actually use the formal names) so I can debug the rest of my code.
If this isn't the only way to achieve the stated goal of getting formal parameter names then where can I find a better way? I've seen some framework stuff and heard of Eclipse being used to do this, however I don't want to get too deep into new software just to accomplish one thing AND I am working so I don't have administrator privileges (which is why I needed to specify the jdk in cmd directly rather than just set the PATH variables the usual way). This would make it a hassle to have to download something like Eclipse.
Update
I've researched using annotations to get the parameter names, since I have no good idea why the compiled class won't actually store the parameter names. This strategy was suggested in an initial answer (since deleted), and I took it upon myself to go learn some basic annotations. They have worked to a point, but right now I'm getting a wrong arguments error where I shouldn't. I've checked the debugger and the number of arguments passed in is the same as the number needed, so it must be a type error with the wrapping/unwrapping (according to the javadoc for the newInstance(Object[]) method). I want to be able to initialize null parameters, and I think that's the source of my problem (i.e. null type errors or something, but not shown as an NPE). Other potential sources include the fact that I'm passing in an Object[] and typing it more strictly in the class (i.e. newInstance(Object[] array) is creating an instance of a class that has String parameters and various other subclasses of Object, including array lists). Since asking about that error here would constitute an XY problem, I won't ask but just describe it for clarification on the original question. My original question still stands even as this workaround is being worked on, because I'd still love to know why compiling this class with the -parameters flag didn't store the parameter names. I'm 99% sure the class path is correct since I copy-pasted it from the directory. This sounds silly, but do I have to actually run the class using the below line?
C:\Program Files\Java\jdk-10.0.1\bin>java C:\Users\abbotts1\IdeaProjects\project\src\Sales_Rep_Data\Data_Parser.java
I was under the impression that compiling it with the parameters flag was all you needed and then the formal parameter names would be available.
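For reference, javac -parameters writes the names into the class file itself (in a MethodParameters attribute), so nothing special is needed at run time. What matters is that the JVM loads the class files that were compiled with the flag. One possible explanation, and this is only an assumption about the setup described here, is that the IDE rebuilds the classes into its own output directory without -parameters and runs those instead. A small check like the sketch below, where `SalesRep` and `ParamNameCheck` are made-up names for illustration, shows which situation applies:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Parameter;

// Hypothetical example class; the fields are for illustration only.
final class SalesRep {
    final String name;
    final int region;

    SalesRep(String name, int region) {
        this.name = name;
        this.region = region;
    }
}

final class ParamNameCheck {
    // Lists every constructor parameter and whether its real name was compiled in.
    static String describe(Class<?> type) {
        StringBuilder out = new StringBuilder();
        for (Constructor<?> ctor : type.getDeclaredConstructors()) {
            for (Parameter p : ctor.getParameters()) {
                out.append(p.getName())
                   .append(p.isNamePresent() ? " (named)" : " (synthetic)")
                   .append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(SalesRep.class));
    }
}
```

Compiled with -parameters, this reports name and region as named; without the flag it reports arg0 and arg1 as synthetic. If the IDE is doing the compiling, the flag has to be added to the IDE's compiler options as well.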
I know (at least using either BCEL, or ASM, for instance), it is possible to somehow access local variables of a method... but, I need something more, what I would like is:
to get the type of such a variable (or a way to convert from the signature)
to know (distinguish) when this variable is used (either sees it value affected, or is passed as parameter)
when this variable is used as parameter, to know which method call it was passed to
to break "method-chains" in their respective method calls and get their return value so I can manipulate them
The basic idea is that I would like to "instrument" methods a bit in the same way a debugger does (though limited to the first frame depth...).
Any pointer appreciated.
If more information need, feel free to ask.
This is only possible using a bytecode-level API. cglib does not expose such an API, so you have to choose between ASM, BCEL and Javassist; I would recommend ASM, which has the best documentation.
What you would need to do:
Parse the signature of the method, ASM offers utilities for that. You would get any type by its internal name. You would need to map these names to their index.
Find any use of the variable that is used from that index.
This is, however, quite a difficult task. In order to predict your code, you would have to emulate the method invocation. The JVM is a stack machine, and arguments can be placed on the operand stack as a result of an arbitrary chain of instructions. Therefore, you would effectively have to interpret any bytecode instruction that you find. You will, more or less, need to write your own simplistic interpreter, which is quite a task.
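The first step (mapping parameter types to local variable indexes) can be sketched without any third-party library. This example uses the JDK's `MethodType` to parse the descriptor instead of ASM's utilities, which means the parameter classes must be loadable; `ParamSlots` is a hypothetical name, and the hard part (tracking how values flow through the operand stack) is not covered.

```java
import java.lang.invoke.MethodType;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps local-variable slots to parameter types at method entry.
final class ParamSlots {
    static Map<Integer, Class<?>> slots(String descriptor, boolean isStatic) {
        MethodType type = MethodType.fromMethodDescriptorString(
                descriptor, ParamSlots.class.getClassLoader());
        Map<Integer, Class<?>> result = new LinkedHashMap<>();
        int slot = isStatic ? 0 : 1; // slot 0 holds `this` in instance methods
        for (Class<?> p : type.parameterList()) {
            result.put(slot, p);
            // long and double occupy two slots, so the next parameter skips one index.
            slot += (p == long.class || p == double.class) ? 2 : 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Instance method void m(String, long, int): String in slot 1, long in 3, int in 4.
        System.out.println(slots("(Ljava/lang/String;JI)V", false));
    }
}
```

Any load or store instruction that uses one of these indexes touches the corresponding parameter, as long as the method has not reused the slot for another variable.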
I've recently been looking at The Java Virtual Machine Specification (JVMS) to try to better understand what makes my programs work, but I've found a section that I'm not quite getting...
Section 4.7.4 describes the StackMapTable Attribute, and in that section the document goes into details about stack map frames. The issue is that it's a little wordy and I learn best by example; not by reading.
I understand that the first stack map frame is derived from the method descriptor, but I don't understand how (which is supposedly explained here.) Also, I don't entirely understand what the stack map frames do. I would assume they're similar to blocks in Java, but it appears as though you can't have stack map frames inside each other.
Anyway, I have two specific questions:
What do the stack map frames do?
How is the first stack map frame created?
and one general question:
Can someone provide an explanation less wordy and easier to understand than the one given in the JVMS?
Java requires all classes that are loaded to be verified, in order to maintain the security of the sandbox and ensure that the code is safe to optimize. Note that this is done on the bytecode level, so the verification does not verify invariants of the Java language, it merely verifies that the bytecode makes sense according to the rules for bytecode.
Among other things, bytecode verification makes sure that instructions are well formed, that all the jumps are to valid instructions within the method, and that all instructions operate on values of the correct type. The last one is where the stack map comes in.
The thing is that bytecode by itself contains no explicit type information. Types are determined implicitly through dataflow analysis. For example, an iconst instruction creates an integer value. If you store it in slot 1, that slot now has an int. If control flow merges from code which stores a float there instead, the slot is now considered to have invalid type, meaning that you can't do anything more with that value until overwriting it.
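The merge rule in that example can be modeled in a few lines. This is a deliberately simplified sketch: it only knows int, float and an "unusable" type (called TOP here), whereas the real verifier also handles references, long/double and uninitialized objects. `TypeMerge` and `VType` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class TypeMerge {
    // Simplified verification types; TOP means "unusable until overwritten".
    enum VType { INT, FLOAT, TOP }

    // Two paths agree only if they carry the same type; otherwise the slot becomes TOP.
    static VType merge(VType a, VType b) {
        return a == b ? a : VType.TOP;
    }

    // Merges the local-variable types arriving from two predecessors, slot by slot.
    static List<VType> mergeLocals(List<VType> left, List<VType> right) {
        List<VType> out = new ArrayList<>();
        int n = Math.min(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            out.add(merge(left.get(i), right.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        // One branch stored an int in slot 1, the other a float.
        System.out.println(mergeLocals(
                List.of(VType.INT, VType.INT),
                List.of(VType.INT, VType.FLOAT)));
        // prints [INT, TOP]
    }
}
```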
Historically, the bytecode verifier inferred all the types using these dataflow rules. Unfortunately, it is impossible to infer all the types in a single linear pass through the bytecode because a backwards jump might invalidate already inferred types. The classic verifier solved this by iterating through the code until everything stopped changing, potentially requiring multiple passes.
However, verification makes class loading slow in Java. Oracle decided to solve this issue by adding a new, faster verifier that can verify bytecode in a single pass. To do this, they required all new classes starting in Java 7 (with Java 6 in a transitional state) to carry metadata about their types, so that the bytecode can be verified in a single pass. Since the bytecode format itself can't be changed, this type information is stored separately in an attribute called StackMapTable.
Simply storing the type for every single value at every single point in the code would obviously take up a lot of space and be very wasteful. In order to make the metadata smaller and more efficient, they decided to have it only list the types at positions which are targets of jumps. If you think about it, this is the only time you need the extra information to do a single pass verification. In between jump targets, all control flow is linear, so you can infer the types at in between positions using the old inference rules.
Each position where types are explicitly listed is known as a stack map frame. The StackMapTable attribute contains a list of frames in order, though they are usually expressed as a difference from the previous frame in order to reduce data size. If there are no frames in the method, which occurs when control flow never joins (i.e. the CFG is a tree), then the StackMapTable attribute can be omitted entirely.
So this is the basic idea of how StackMapTable works and why it was added. The last question is how the implicit initial frame is created. The answer, of course, is that at the beginning of the method, the operand stack is empty and the local variable slots have the types given by the types of the method parameters, which are determined from the method descriptor.
If you're used to Java, there are a few minor differences in how method parameter types work at the bytecode level. First off, instance (non-static) methods have an implicit this as first parameter. Second, boolean, byte, char, and short do not exist at the bytecode level. Instead, they are all implemented as ints behind the scenes.
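As a rough illustration of how the initial frame follows from the descriptor, the sketch below lists the local variable types at method entry. It is a simplification under stated assumptions: `InitialFrame` is a made-up name, types are printed as plain strings rather than real verification_type_info entries, the descriptor is parsed with the JDK's `MethodType` (so parameter classes must be loadable), and constructors, where this starts out as an uninitialized object, are not modeled.

```java
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

final class InitialFrame {
    // Returns the local-variable types at method entry; the operand stack is empty.
    static List<String> locals(String ownerInternalName, String descriptor, boolean isStatic) {
        MethodType type = MethodType.fromMethodDescriptorString(
                descriptor, InitialFrame.class.getClassLoader());
        List<String> locals = new ArrayList<>();
        if (!isStatic) {
            locals.add(ownerInternalName); // implicit `this`
        }
        for (Class<?> p : type.parameterList()) {
            if (p == boolean.class || p == byte.class || p == char.class
                    || p == short.class || p == int.class) {
                locals.add("int"); // the small integral types are all int to the verifier
            } else if (p.isPrimitive()) {
                locals.add(p.getName()); // long, float or double
            } else {
                locals.add(p.getName().replace('.', '/')); // internal class name
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        // Instance method void m(boolean, String, long) declared in com.example.Foo.
        System.out.println(locals("com/example/Foo", "(ZLjava/lang/String;J)V", false));
        // prints [com/example/Foo, int, java/lang/String, long]
    }
}
```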
In trying to use weka from clojure, I'm trying to convert this howto guide from the weka wiki to clojure using the java interop features of clojure.
This has worked well so far, except in one case, where the clojure reflection mechanism can't seem to find the right method to invoke - I have:
(def c-model (doto (NaiveBayes.) (.buildClassifier is-training-set)))
Later this will be invoked by the .evaluateModel method of the Evaluation class:
(.evaluateModel e-test c-model is-testing-set)
where e-test is of type weka.classifiers.Evaluation and, according to their api documentation the method takes two parameters of types Classifier and Instances
What I get from clojure though is IllegalArgumentException No matching method found: evaluateModel for class weka.classifiers.Evaluation clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:53) - I guess that this is because c-model is actually of type NaiveBayes, although it should also be a Classifier - which it is, according to instance?.
I tried casting with cast to no avail, and from what I understand this is more of a type assertion (and passes without problems, of course) than a real cast in clojure. Is there another way of explicitly telling clojure which types to cast to in java interop method calls? (Note that the original guide I linked above also uses an explicit cast from NaiveBayes to Classifier)
Full code here: http://paste.lisp.org/display/129250
The linked javadoc contradicts your claim that there is a method taking a Classifier and an Instances - what there is, is a method taking a Classifier, an Instances, and a variable number of Objects. As discussed in a number of SO questions (the only one of which I can find at the moment is Why Is String Formatting Causing a Casting Exception?), Clojure does not provide implicit support for varargs, which are basically fictions created by the javac compiler. At the JVM level, it is simply an additional required parameter of type Object[]. If you pass a third parameter, an empty object-array, into your method, it will work fine.
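The same effect is visible from plain Java reflection, which also sees only the compiled signature. In this sketch, `VarargsDemo.evaluate` is a hypothetical stand-in for a method shaped like the one in the javadoc (fixed parameters followed by Object...); it is not the weka API.

```java
import java.lang.reflect.Method;

final class VarargsDemo {
    // Stand-in for a method declared with a trailing Object... parameter.
    public static String evaluate(String model, Object... options) {
        return model + ":" + options.length;
    }

    // Looks the method up by its real signature, which ends in Object[].
    static Method lookup() {
        try {
            return VarargsDemo.class.getDeclaredMethod("evaluate", String.class, Object[].class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    static String callReflectively() {
        try {
            // The array must be supplied explicitly; the cast keeps it a single argument.
            return (String) lookup().invoke(null, "nb", (Object) new Object[0]);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup().isVarArgs()); // true: varargs is only a flag on the method
        System.out.println(callReflectively());   // nb:0
    }
}
```

Looking up evaluate with only a String parameter would fail with NoSuchMethodException, which is the reflective equivalent of the "No matching method found" error above.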
IllegalArgumentException No matching method found happens any time the arguments don't match the class. They can fail to match because no method exists with that name and number of arguments, or because no method exists with that name in the called class. So also check the number and type of the arguments.
I basically always resort to repl-utils/show in these cases
I am trying to record the arguments passed to a method before it is called using bytecode instrumentation.
Currently, while instrumenting using Java code, I have to first pop all the args into locals, then push them again twice (once for my method which will record them, in which case all primitive types have to be converted to their boxed types, and once for the actual method call).
What I would ideally like to do is just duplicate the entire stack for the number of args pushed for the method call. However, the JVM bytecode's dup instruction only allows duplicating the topmost value of the stack.
Is it possible using JNI to somehow duplicate the entire stack in one go?
No. The stack effectively goes away when the method is compiled. The JVM has no way of compiling native code. So even if you did try to directly manipulate the stack, it would change format (and use registers) on the fly.
You can reasonably easily duplicate the top two slots of the stack (using dup2) and insert the copies up to four slots down (using dup2_x2), but any further and you'll probably need to use local variables.
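The reach of these instructions can be seen by modeling the operand stack as a list. This is only a simulation for single-slot (category 1) values, with `StackSim` as a made-up name; it does not generate or execute bytecode.

```java
import java.util.ArrayList;
import java.util.List;

// Operand stack model: the last list element is the top of the stack.
final class StackSim {
    // dup2: ..., v2, v1 -> ..., v2, v1, v2, v1
    static List<String> dup2(List<String> stack) {
        List<String> out = new ArrayList<>(stack);
        int n = stack.size();
        out.add(stack.get(n - 2));
        out.add(stack.get(n - 1));
        return out;
    }

    // dup2_x2: ..., v4, v3, v2, v1 -> ..., v2, v1, v4, v3, v2, v1
    static List<String> dup2x2(List<String> stack) {
        List<String> out = new ArrayList<>(stack);
        int n = stack.size();
        // Insert copies of the top two values beneath the top four.
        out.add(n - 4, stack.get(n - 2));
        out.add(n - 3, stack.get(n - 1));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dup2(List.of("a", "b")));             // [a, b, a, b]
        System.out.println(dup2x2(List.of("a", "b", "c", "d"))); // [c, d, a, b, c, d]
    }
}
```

Two single-slot arguments can be copied in place with dup2, but no instruction reaches deeper than four slots, which is why longer argument lists end up going through local variables.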