I am reading the book "The Well-Grounded Java Developer, 2nd edition", and in chapter 17 (Modern internals) there is a description of how the reflection internals work. First the duality of Entry vs Entry.class is discussed (the first picture), which shows how the array of Methods held by the Class matches the ordering (indexing) of the methods in the klassOop for the actual type. Following that, in the second screenshot, we are shown a DelegatingMethodAccessorImpl whose role is to delegate the actual invocation to the native or custom method accessor.
When and how is the bytecode for the method to be executed retrieved from the klassOop during the MethodAccessor's invocation? I guess the Method.slot field is used, but from the available code I don't see exactly how; I suspect it's some native call which may read it.
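For context, here is a minimal sketch of that delegation chain, paraphrased (not verbatim) from the JDK's internal reflection classes:

class DelegatingMethodAccessorImpl extends MethodAccessorImpl {
    // Initially points at the native accessor; after enough invocations it
    // may be swapped for a faster, bytecode-generated accessor.
    private MethodAccessorImpl delegate;

    public Object invoke(Object obj, Object[] args) throws InvocationTargetException {
        return delegate.invoke(obj, args);
    }
}

The native accessor bottoms out in a native method that receives the Method object itself:

private static native Object invoke0(Method m, Object obj, Object[] args);

so whatever lookup resolves the target against the klassOop happens on the VM side of that call.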
I know it is possible (using either BCEL or ASM, for instance) to somehow access the local variables of a method... but I need something more. What I would like is:
to get the type of such a variable (or a way to derive it from the signature)
to know (distinguish) when this variable is used (either has its value assigned, or is passed as a parameter)
when this variable is used as a parameter, to know which method call it was passed to
to break "method chains" into their respective method calls and get their return values so I can manipulate them
The basic idea is that I would like to "instrument" methods a bit in the same way a debugger does (though limited to the first frame depth...).
Any pointer appreciated.
If more information is needed, feel free to ask.
This is only possible using a bytecode-level API. cglib does not expose such an API, so you have to choose between ASM, BCEL, and Javassist; I would recommend ASM, which has the best documentation.
What you would need to do:
Parse the descriptor of the method; ASM offers utilities for that, giving you each parameter type by its internal name. You would then need to map these types to their local-variable indices (a sketch follows after this list).
Find any use of the variable at that index.
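A minimal sketch of that mapping using ASM's Type utilities (the method and variable names are my own, not ASM's):

import java.util.LinkedHashMap;
import java.util.Map;
import org.objectweb.asm.Type;

// Maps each parameter of a method descriptor to the local-variable slot it occupies.
static Map<Integer, Type> parameterSlots(String descriptor, boolean isStatic) {
    Map<Integer, Type> slots = new LinkedHashMap<>();
    int slot = isStatic ? 0 : 1;   // slot 0 holds 'this' in instance methods
    for (Type arg : Type.getArgumentTypes(descriptor)) {
        slots.put(slot, arg);
        slot += arg.getSize();     // long and double occupy two slots
    }
    return slots;
}

For example, parameterSlots("(JLjava/lang/String;)V", false) maps slot 1 to long and slot 3 to String (slot 0 being this).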
This is, however, a quite difficult task. In order to predict what your code does, you would have to emulate the method invocation. The JVM is a stack machine, and arguments can be placed on the operand stack as the result of an arbitrary chain of instructions. Therefore, you would effectively have to interpret every bytecode instruction that you find. You will, more or less, need to write your own simplistic interpreter, which is quite a task.
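That said, ASM's tree API already ships with such an abstract interpreter, which may spare you from writing one yourself. A rough sketch (methodNode and ownerInternalName are assumed to come from reading the class with ClassReader/ClassNode):

import org.objectweb.asm.tree.MethodNode;
import org.objectweb.asm.tree.analysis.Analyzer;
import org.objectweb.asm.tree.analysis.AnalyzerException;
import org.objectweb.asm.tree.analysis.Frame;
import org.objectweb.asm.tree.analysis.SourceInterpreter;
import org.objectweb.asm.tree.analysis.SourceValue;

// SourceInterpreter tracks, for every local and stack slot, which
// instructions produced its value -- enough to connect a local-variable
// load to the method call that later consumes it.
static Frame<SourceValue>[] dataflow(String ownerInternalName, MethodNode methodNode)
        throws AnalyzerException {
    // frames[i] describes the state just before instruction i of the method.
    return new Analyzer<>(new SourceInterpreter()).analyze(ownerInternalName, methodNode);
}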
I am reading about dynamic dispatch, as I have an exam tomorrow.
In C++ we have conforming subclasses, so from the static type of the identifier we know which index to access in the virtual method table of the runtime object.
From what I am reading, Java has conformance for subclasses as well, but instead of including the known index of a method in the virtual method table in the compiled code, it only includes a symbolic reference to the method, that needs to be resolved.
What is the point of this if the static type does not refer to an interface? It could be much faster to do it the C++ way.
The Java platform defines linkage as a step taken at runtime. Virtual method tables aren't even involved in the JVM specification; they are just a typical way to implement linkage.
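You can see the symbolic reference with javap -c: the operand of an invoke instruction is a constant-pool index, not a table slot (index and names below are illustrative):

invokevirtual #7    // Method Animal.speak:()V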
Note, however, that after the symbolic reference is resolved into a direct reference, there is nothing stopping the runtime from using very fast code paths for method invocation sites. That includes special-case optimizations such as monomorphic call sites, which have a hardwired direct pointer to the method code and are thus faster than vtable lookups. Monomorphic sites then become an easy target for method inlining, which opens a whole new field of applicable optimizations. Another option is an n-polymorphic site, accommodating up to n different target types in an inline cache.
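As a minimal illustration (my own example):

import java.util.ArrayList;
import java.util.List;

List<String> names = new ArrayList<>();
for (int i = 0; i < 1_000_000; i++) {
    // The static type is the List interface, but only ArrayList is ever
    // observed as the receiver here, so the JIT can treat the call site as
    // monomorphic, bind it directly, and potentially inline ArrayList.add.
    names.add("x");
}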
As opposed to C++, all these optimizing decisions happen at runtime, subject to the specific conditions at work: the exact set of loaded classes, profiling data for each individual call site, etc. This gives managed-runtime platforms such as Java advantages of their own.
I've recently been looking at The Java Virtual Machine Specification (JVMS) to try to better understand what makes my programs work, but I've found a section that I'm not quite getting...
Section 4.7.4 describes the StackMapTable Attribute, and in that section the document goes into details about stack map frames. The issue is that it's a little wordy and I learn best by example; not by reading.
I understand that the first stack map frame is derived from the method descriptor, but I don't understand how (which is supposedly explained here.) Also, I don't entirely understand what the stack map frames do. I would assume they're similar to blocks in Java, but it appears as though you can't have stack map frames inside each other.
Anyway, I have two specific questions:
What do the stack map frames do?
How is the first stack map frame created?
and one general question:
Can someone provide an explanation less wordy and easier to understand than the one given in the JVMS?
Java requires all classes that are loaded to be verified, in order to maintain the security of the sandbox and ensure that the code is safe to optimize. Note that this is done on the bytecode level, so the verification does not verify invariants of the Java language, it merely verifies that the bytecode makes sense according to the rules for bytecode.
Among other things, bytecode verification makes sure that instructions are well formed, that all the jumps are to valid instructions within the method, and that all instructions operate on values of the correct type. The last one is where the stack map comes in.
The thing is that bytecode by itself contains no explicit type information. Types are determined implicitly through dataflow analysis. For example, an iconst instruction creates an integer value. If you store it in slot 1, that slot now has an int. If control flow merges from code which stores a float there instead, the slot is now considered to have invalid type, meaning that you can't do anything more with that value until overwriting it.
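For instance (a javap-style listing; offsets are illustrative):

 0: iload_0
 1: ifeq      9
 4: iconst_1
 5: istore_1        // slot 1 now holds an int
 6: goto      11
 9: fconst_0
10: fstore_1        // slot 1 now holds a float
11: ...             // merge point: slot 1 has conflicting types and is unusable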
Historically, the bytecode verifier inferred all the types using these dataflow rules. Unfortunately, it is impossible to infer all the types in a single linear pass through the bytecode because a backwards jump might invalidate already inferred types. The classic verifier solved this by iterating through the code until everything stopped changing, potentially requiring multiple passes.
However, verification makes class loading slow in Java. Oracle decided to solve this issue by adding a new, faster verifier that can verify bytecode in a single pass. To do this, they required all new classes starting in Java 7 (with Java 6 in a transitional state) to carry metadata about their types, so that the bytecode can be verified in a single pass. Since the bytecode format itself can't be changed, this type information is stored separately in an attribute called StackMapTable.
Simply storing the type of every single value at every single point in the code would obviously take up a lot of space and be very wasteful. In order to make the metadata smaller and more efficient, they decided to list the types only at positions which are targets of jumps. If you think about it, this is the only time you need the extra information to do a single-pass verification. Between jump targets, all control flow is linear, so you can infer the types at the in-between positions using the old inference rules.
Each position where types are explicitly listed is known as a stack map frame. The StackMapTable attribute contains a list of frames in order, though they are usually expressed as a difference from the previous frame in order to reduce data size. If there are no frames in the method, which occurs when control flow never joins (i.e. the CFG is a tree), then the StackMapTable attribute can be omitted entirely.
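For the little int/float example above, javap -v would show something along these lines (illustrative, not verbatim output):

StackMapTable: number_of_entries = 2
  frame_type = 9 /* same */       // at offset 9: same locals as on entry, empty stack
  frame_type = 252 /* append */   // at offset 11, expressed as a delta from the previous frame
    offset_delta = 1
    locals = [ top ]              // slot 1: the conflicting int/float merge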
So this is the basic idea of how StackMapTable works and why it was added. The last question is how the implicit initial frame is created. The answer of course is that at the beginning of the method, the operand stack is empty and the local variable slots have the types given by the types of the method parameters, which are determined from the method descriptor.
If you're used to Java, there are a few minor differences to how method parameter types work at the bytecode level. First off, virtual methods have an implicit this as first parameter. Second, boolean, byte, char, and short do not exist at the bytecode level. Instead, they are all implemented as ints behind the scenes.
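Putting that together, two illustrative initial frames (my own examples):

// int max(int a, int b) in class Example, an instance method, descriptor (II)I:
//   locals = [ Example (this), int, int ],  stack = []
// static double f(long x, boolean b), descriptor (JZ)D:
//   locals = [ long, int ],  stack = []
//   (the long covers slots 0-1; the boolean arrives as an int in slot 2)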
In trying to use weka from clojure, I'm trying to convert this howto guide from the weka wiki to clojure using the java interop features of clojure.
This has worked well so far, except in one case, where the clojure reflection mechanism can't seem to find the right method to invoke - I have:
(def c-model (doto (NaiveBayes.) (.buildClassifier is-training-set)))
Later this is passed to the .evaluateModel method of the Evaluation class:
(.evaluateModel e-test c-model is-testing-set)
where e-test is of type weka.classifiers.Evaluation and, according to their API documentation, the method takes two parameters of types Classifier and Instances.
What I get from Clojure, though, is IllegalArgumentException No matching method found: evaluateModel for class weka.classifiers.Evaluation clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:53). I guess this is because c-model is actually of type NaiveBayes, although it should also be a Classifier - which it is, according to instance?.
I tried casting with cast to no avail, and from what I understand this is more of a type assertion (and passes without problems, of course) than a real cast in clojure. Is there another way of explicitly telling clojure which types to cast to in java interop method calls? (Note that the original guide I linked above also uses an explicit cast from NaiveBayes to Classifier)
Full code here: http://paste.lisp.org/display/129250
The linked javadoc contradicts your claim that there is a method taking a Classifier and an Instances - what there is, is a method taking a Classifier, an Instances, and a variable number of Objects. As discussed in a number of SO questions (the only one of which I can find at the moment is Why Is String Formatting Causing a Casting Exception?), Clojure does not provide implicit support for varargs, which are basically fictions created by the javac compiler. At the JVM level, it is simply an additional required parameter of type Object[]. If you pass a third parameter, an empty object-array, into your method, it will work fine.
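Concretely, using Clojure's object-array to build the empty Object[]:

(.evaluateModel e-test c-model is-testing-set (object-array 0))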
IllegalArgumentException No matching method found happens any time the arguments don't match the class. They can fail to match because no method exists with that name and number of arguments, or because no method exists with that name in the called class. So also check the number and type of the arguments.
I basically always resort to repl-utils/show in these cases
I have a class named ActivityLog. This class holds a list of ActivityRecords. I want to return a list of ActivityRecords filtered by these criteria: Environment and Condition. Should the method name include the "criteria"? See example:
activityLog.allRecords();
activityLog.allRecordsBy(Environment environment);
activityLog.allRecordsBy(Condition condition);
activityLog.allRecordsBy(Condition condition, Environment environment);
or
activityLog.allRecordsByEnvironment(Environment environment);
activityLog.allRecordsByCondition(Condition condition);
I think the first is probably better because you read the method name and understand from the parameter what it does, but I may be wrong. Which is best, or are there even better alternatives?
I could have named the methods records(), recordsBy(), etc. too, but I want consistency throughout my API, where method names for lists of objects always start with all, so you get help from, for example, IntelliSense.
I like putting the criteria in the actual method name. So I would use:
activityLog.allRecordsByEnvironment(Environment environment);
To me, proper method naming expresses a small summary of what the method does. Since the parameters are included in the method signature, I would not consider them part of the actual name; therefore, not placing the criteria in the name gives the user of an API incomplete information about the method's functionality. (IMO)
I applaud your effort to practice self-documenting code - great practice.
I like the overloaded variant (your first example) because it communicates that the methods are all related and provide largely the same functionality - that is, you are returning records, filtered by some criteria. You will see examples of this in many open source libraries and even in the SDK itself.
I'd treat this the same as static factory methods, which are essentially named constructors: not only do the parameters say what the method does, the name itself does too. So I'd choose the second option.
@Bob, about names being too long: even with two parameters in the name it would still be fine by me. In any case, you should avoid methods with more than three parameters; following that rule keeps method names from growing enormously long.
I would take the first one.
If these methods do the same thing or provide the same functionality, then they should have the same name. But be aware of Effective Java Items 41 and 42: you have to ensure that at least one corresponding parameter of the overloaded methods has a radically different type.
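For what it's worth, the overloads from the question pass that test, since Environment and Condition are unrelated types. A sketch with stub bodies (assuming the records come back as a List, using the question's types):

import java.util.List;

class ActivityLog {
    // Overload resolution is unambiguous because Environment and Condition
    // are radically different (unrelated) types.
    List<ActivityRecord> allRecords() { return List.of(); }                                      // stub
    List<ActivityRecord> allRecordsBy(Environment environment) { return List.of(); }             // stub
    List<ActivityRecord> allRecordsBy(Condition condition) { return List.of(); }                 // stub
    List<ActivityRecord> allRecordsBy(Condition condition, Environment environment) { return List.of(); } // stub
}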
The second approach becomes ugly very fast with every parameter added. I often see this in Broker classes at work; there are people writing methods like findByFirstnameAndLastnameAndBirthdayOrderByUgliness(blablub). No comment.
Methods in OOP represent behavior, so I would name all of them getRecords() and make them overloaded.
In my opinion, specifying the criteria in the method name is like naming classes in a hierarchy like this:
Car -> BMW_Car -> X5_BMW_Car