I've recently been looking at the Java Virtual Machine Specification (JVMS) to try to better understand what makes my programs work, but I've found a section that I'm not quite getting...
Section 4.7.4 describes the StackMapTable Attribute, and in that section the document goes into details about stack map frames. The issue is that it's a little wordy and I learn best by example; not by reading.
I understand that the first stack map frame is derived from the method descriptor, but I don't understand how (which is supposedly explained here.) Also, I don't entirely understand what the stack map frames do. I would assume they're similar to blocks in Java, but it appears as though you can't have stack map frames inside each other.
Anyway, I have two specific questions:
What do the stack map frames do?
How is the first stack map frame created?
and one general question:
Can someone provide an explanation less wordy and easier to understand than the one given in the JVMS?
Java requires all classes that are loaded to be verified, in order to maintain the security of the sandbox and ensure that the code is safe to optimize. Note that this is done on the bytecode level, so the verification does not verify invariants of the Java language, it merely verifies that the bytecode makes sense according to the rules for bytecode.
Among other things, bytecode verification makes sure that instructions are well formed, that all the jumps are to valid instructions within the method, and that all instructions operate on values of the correct type. The last one is where the stack map comes in.
The thing is that bytecode by itself contains no explicit type information. Types are determined implicitly through dataflow analysis. For example, an iconst instruction creates an integer value. If you store it in slot 1, that slot now has an int. If control flow merges from code which stores a float there instead, the slot is now considered to have invalid type, meaning that you can't do anything more with that value until overwriting it.
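As a rough illustration of that merge rule, here is a minimal sketch. The VType enum and the merge method are made-up names for this example, not part of any JVM API, and a real verifier is more precise (for instance, it merges two different reference types to a common supertype rather than lumping all references together).

```java
class TypeMergeSketch {
    // Simplified verification types; TOP means "unusable until overwritten".
    enum VType { INT, FLOAT, REFERENCE, TOP }

    // Merge the types that two control-flow paths leave in the same local slot.
    static VType merge(VType a, VType b) {
        return a == b ? a : VType.TOP;
    }

    public static void main(String[] args) {
        System.out.println(merge(VType.INT, VType.INT));   // prints INT
        System.out.println(merge(VType.INT, VType.FLOAT)); // prints TOP
    }
}
```

Once a slot holds TOP, any instruction that tries to read it fails verification; storing a new value into the slot makes it usable again.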
Historically, the bytecode verifier inferred all the types using these dataflow rules. Unfortunately, it is impossible to infer all the types in a single linear pass through the bytecode because a backwards jump might invalidate already inferred types. The classic verifier solved this by iterating through the code until everything stopped changing, potentially requiring multiple passes.
However, verification makes class loading slow in Java. Oracle decided to solve this issue by adding a new, faster verifier that can verify bytecode in a single pass. To do this, they required all new classes starting in Java 7 (with Java 6 in a transitional state) to carry metadata about their types, so that the bytecode can be verified in a single pass. Since the bytecode format itself can't be changed, this type information is stored separately in an attribute called StackMapTable.
Simply storing the type for every single value at every single point in the code would obviously take up a lot of space and be very wasteful. In order to make the metadata smaller and more efficient, they decided to have it only list the types at positions which are targets of jumps. If you think about it, this is the only time you need the extra information to do a single pass verification. In between jump targets, all control flow is linear, so you can infer the types at in between positions using the old inference rules.
Each position where types are explicitly listed is known as a stack map frame. The StackMapTable attribute contains a list of frames in order, though they are usually expressed as a difference from the previous frame in order to reduce data size. If there are no frames in the method, which occurs when control flow never joins (i.e. the CFG is a tree), then the StackMapTable attribute can be omitted entirely.
So this is the basic idea of how StackMapTable works and why it was added. The last question is how the implicit initial frame is created. The answer of course is that at the beginning of the method, the operand stack is empty and the local variable slots have the types given by the types of the method parameters, which are determined from the method descriptor.
If you're used to Java, there are a few minor differences to how method parameter types work at the bytecode level. First off, instance (non-static) methods have an implicit this as the first parameter. Second, boolean, byte, char, and short do not exist at the bytecode level. Instead, they are all implemented as ints behind the scenes.
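To make the derivation of the initial frame concrete, here is a small sketch that turns a method descriptor into the list of local variable types at method entry. The class and method names (InitialFrameSketch, initialLocals, toVerificationType) and the plain-string type names are invented for this example; it assumes a well-formed descriptor and does not model the special uninitializedThis type used inside constructors.

```java
import java.util.ArrayList;
import java.util.List;

class InitialFrameSketch {
    // Returns the types of the local variables at method entry, in order.
    // `owner` is the internal name of the declaring class (hypothetical input).
    static List<String> initialLocals(String owner, String descriptor, boolean isStatic) {
        List<String> locals = new ArrayList<>();
        if (!isStatic) {
            locals.add(owner); // implicit `this`
        }
        int i = 1; // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++; // array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i); // class names end at ';'
            }
            i++; // step past the base type character or the ';'
            locals.add(toVerificationType(descriptor.substring(start, i)));
        }
        return locals;
    }

    static String toVerificationType(String fieldType) {
        switch (fieldType.charAt(0)) {
            case 'Z': case 'B': case 'C': case 'S': case 'I':
                return "int"; // the small integral types are all ints
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            case 'L': return fieldType.substring(1, fieldType.length() - 1);
            default:  return fieldType; // arrays keep their descriptor, e.g. "[B"
        }
    }

    public static void main(String[] args) {
        // Prints [Foo, int, long, java/lang/String, [B, int]
        System.out.println(initialLocals("Foo", "(IJLjava/lang/String;[BZ)V", false));
    }
}
```

The list has one entry per value rather than per slot: a long or double appears once here even though it occupies two local variable slots.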
I am trying to get a compile-time-safe field reference in Java, done not with reflection and Strings, but by directly referencing the field. Something like
MyClass::myField
I have tried the usual reflection way, but you need to reference the fields as strings, and this is error prone in case of a rename, and will not throw a compile time error
EDIT: just want to clarify that my end goal is to get the field NAME for entity purposes, such as reference the entity field in a query, and not the value
Unfortunately, you might as well wish for a unicorn. The notion of 'a field reference', in the sense that you are asking for, simply isn't part of java-the-language.
That MyClass::myThing syntax works only for methods. There's simply no such thing for fields. It's unfortunate.
It's very difficult to give objective reasons for the design decisions of any language; it either requires spelunking through the designers' collective heads, which requires magic or science fiction, or asking them to spill the beans, which they're probably not going to do in a stack overflow question. Sometimes (as with more recent java features, such as this one), design is debated in public. Specifically, you can search the openjdk lambda-dev mailing list, where no doubt this question was covered. You'll need to go through, and I'm not exaggerating, tens of thousands of posts, but the good news is, it's searchable.
But, I can guess / dig through my own memory as I spent some time discussing Project Lambda as it was designed:
Direct field access isn't common in the java ecosystem. The language allows direct field access but few java programs are written that way, so why make a language feature that would only be immediately useful and familiar to an exotic bunch.
The infrastructure required is also rather significant - a method lambda isn't allowed to be written in java unless you use it in a context that makes it possible for the compiler to 'treat' the lambda as a type - specifically, a @FunctionalInterface - any interface that contains exactly 1 abstract method (other than methods that already exist in j.l.Object itself). In other words, this is fine:
Function<String, String> f = String::toLowerCase;
But this is not:
Object o = String::toLowerCase;
So, let's imagine for a moment that field refs did exist. What does that mean? What is the 'type' of the expression MyClass::myField? Perhaps a new concept: An interface with 2 methods; one of them takes no arguments and returns a T, the other wants a T and returns nothing (to match the act of reading the field, and writing it), but where it's also acceptable if it's a FunctionalInterface that is either one of those, perhaps? That sounds complicated.
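To see why that is awkward, here is a sketch of what such a type might look like. FieldRef, Person, and the of helper are all hypothetical names invented for this example; nothing like FieldRef exists in the JDK. Because the interface has two abstract methods, it is not a functional interface, so the closest you can get today is to assemble it from two method references.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

class FieldRefSketch {
    // Hypothetical "field reference" type: one method reads, one writes.
    // Two abstract methods, so it cannot be the target of a lambda or method ref.
    interface FieldRef<O, T> {
        T get(O owner);
        void set(O owner, T value);
    }

    static class Person {
        private String name;
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    // Builds a FieldRef out of a getter and a setter method reference.
    static <O, T> FieldRef<O, T> of(Function<O, T> getter, BiConsumer<O, T> setter) {
        return new FieldRef<O, T>() {
            public T get(O owner) { return getter.apply(owner); }
            public void set(O owner, T value) { setter.accept(owner, value); }
        };
    }

    public static void main(String[] args) {
        FieldRef<Person, String> nameRef = of(Person::getName, Person::setName);
        Person p = new Person();
        nameRef.set(p, "Ada");
        System.out.println(nameRef.get(p)); // prints Ada
    }
}
```

Note that this still gives you access to the value, not the field's name, so it does not solve the query-building use case from the question.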
The general mindset of the java design team right now (and has been for a while) is not to overcomplicate matters: Do not add features unless you have a good reason. After all, if it turns out that the community really clamours for field refs, they can be added. But, if on the other hand, they were added but nobody uses them, they can't be removed (and thus you've now permanently made the language more complicated and reduced room for future language features for a thing nobody uses and which most style guides tell you to actively avoid).
That's, I'm pretty sure, why they don't exist.
I am a little curious about what happens if I manually change something in the bytecode before execution. For instance, suppose I assign an int variable to a byte variable without casting, or remove a semicolon somewhere in the program, or do anything else that leads to a compile-time error. As far as I know, all compile-time errors are checked by the compiler before it produces the .class file. So what happens if I change the bytecode manually after successfully compiling a program? Is there any mechanism to handle this? If not, how does the program behave when executed?
EDIT :-
Hot Licks, Darksonn and manouti have already given correct, satisfying answers. I'll just summarize for readers looking for an answer to this type of question:
Every Java virtual machine has a class-file verifier, which ensures that loaded class files have a proper internal structure. If the class-file verifier discovers a problem with a class file, it throws an exception. Because a class file is just a sequence of binary data, a virtual machine can't know whether a particular class file was generated by a well-meaning Java compiler or by shady crackers bent on compromising the integrity of the virtual machine. As a consequence, all JVM implementations have a class-file verifier that can be invoked on untrusted classes, to make sure the classes are safe to use.
Refer this for more details.
You certainly can use a hex editor (eg, the free "HDD Hex Editor Neo") or some other tool to modify the bytes of a Java .class file. But obviously, you must do so in a way that maintains the file's "integrity" (tables all in correct format, etc). Furthermore (and much trickier), any modification you make must pass muster by the JVM's "verifier", which essentially rechecks everything that javac verified while compiling the program.
The verification process occurs during class loading and is quite complex. Basically, a data flow analysis is done on each procedure to assure that only the correct data types can "reach" a point where the data type is assumed. Eg, you can't change a load operation to load a reference to a HashMap onto the "stack" when the eventual user of the loaded reference will be assuming it's a String. (But enumerating all the checks the verifier does would be a major task in itself. I can't remember half of them, even though I wrote the verifier for the IBM iSeries JVM.)
(If you're asking if one can "jailbreak" a Java .class file to introduce code that does unauthorized things, the answer is no.)
You will most likely get a java.lang.VerifyError:
Thrown when the "verifier" detects that a class file, though well formed, contains some sort of internal inconsistency or security problem.
You can certainly do this, and there are even tools to make it easier, like http://set.ee/jbe/. The Java runtime will run your modified bytecode just as it would run the bytecode emitted by the compiler. What you're describing is a Java-specific case of a binary patch.
The semicolon example wouldn't be an issue, since semicolons are only for the convenience of the compiler and don't appear in the bytecode.
Either the bytecode executes normally and performs the instructions given or the jvm rejects them.
I played around with programming directly in java bytecode some time ago using jasmin, and I noticed some things.
If the bytecode you edited it into makes sense, it will of course run as expected. However, there are some bytecode patterns that are rejected with a VerifyError.
For the specific example of out-of-bounds access, you can compile code with an out-of-bounds access just fine. It will get you an ArrayIndexOutOfBoundsException at runtime.
int[] arr = new int[20];
for (int i = 0; i < 100; i++) {
    arr[i] = i;
}
However you can construct bytecode that is more fundamentally flawed than that. To give an example I'll explain some things first.
Java bytecode works with a stack, and instructions work with the top elements on the stack.
The stack naturally has different sizes at different places in the program, but sometimes you might use a goto in the bytecode to cause the stack to look different depending on how you got there.
The stack might contain object, int; then you store the object in an object array and the int in an int array. Then, from somewhere else in that bytecode, you use a goto, but now your stack contains int, object, which would result in an int being passed to an object array and vice versa.
This is just one example of things that could happen which make your bytecode fundamentally flawed. The JVM detects these kinds of flaws when the class is loaded at runtime, and then throws a VerifyError if something doesn't work.
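The stack-shape problem can be modelled with a few lines of Java. This is only a toy: StackJoinSketch and firstMismatch are invented names, types are plain strings, and the real verifier works on proper verification types rather than string equality.

```java
import java.util.Arrays;
import java.util.List;

class StackJoinSketch {
    // Compares the operand stacks that two paths bring to the same instruction.
    // Returns -1 if they agree, otherwise the index of the first disagreement
    // (index 0 is the bottom of the stack). A depth difference counts as one.
    static int firstMismatch(List<String> pathA, List<String> pathB) {
        int common = Math.min(pathA.size(), pathB.size());
        for (int i = 0; i < common; i++) {
            if (!pathA.get(i).equals(pathB.get(i))) {
                return i;
            }
        }
        return pathA.size() == pathB.size() ? -1 : common;
    }

    public static void main(String[] args) {
        List<String> fallThrough = Arrays.asList("object", "int");
        List<String> viaGoto = Arrays.asList("int", "object");
        System.out.println(firstMismatch(fallThrough, viaGoto));     // prints 0
        System.out.println(firstMismatch(fallThrough, fallThrough)); // prints -1
    }
}
```

Any result other than -1 corresponds to the situation described above, where the class would be rejected at load time.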
Could someone list major tasks that the bytecode verifier has to perform to guarantee correctness of the program? Is there a standard, minimal set of responsibilities defined in JVM specification? I was also wondering whether verifications spans across other phases such as loading and initializing.
This is specified in the JVM Specification: Chapter 4.10. Verification of class Files.
The bulk of the page describes the various aspects of type safety. To check that the program is type-safe the verifier needs to figure out what types of operands reside in the operand stack at each program point, and make sure that they match the type expected by the respective instruction.
Other things it verifies include, but are not limited to, the following:
Branches must be within the bounds of the code array for the method.
The targets of all control-flow instructions are each the start of an instruction. In the case of a wide instruction, the wide opcode is considered the start of the instruction, and the opcode giving the operation modified by that wide instruction is not considered to start an instruction. Branches into the middle of an instruction are disallowed.
No instruction can access or modify a local variable at an index greater than or equal to the number of local variables that its method indicates it allocates.
All references to the constant pool must be to an entry of the appropriate type. (For example, the instruction getfield must reference a field.)
The code does not end in the middle of an instruction.
Execution cannot fall off the end of the code.
For each exception handler, the starting and ending point of code protected by the handler must be at the beginning of an instruction or, in the case of the ending point, immediately past the end of the code. The starting point must be before the ending point. The exception handler code must start at a valid instruction, and it must not start at an opcode being modified by the wide instruction.
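Two of the branch rules in this list (targets must lie inside the code array and must be the start of an instruction) can be sketched as a single predicate. BranchTargetSketch, isValidTarget, and the example offsets are made up for illustration; a real verifier derives the set of instruction starts by decoding the code array.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BranchTargetSketch {
    // A branch target is acceptable only if it lies inside the code array
    // and coincides with the first byte of some instruction.
    static boolean isValidTarget(int target, int codeLength, Set<Integer> instructionStarts) {
        return target >= 0 && target < codeLength && instructionStarts.contains(target);
    }

    public static void main(String[] args) {
        // Hypothetical 6-byte method: instructions start at offsets 0, 1, 4 and 5.
        Set<Integer> starts = new HashSet<>(Arrays.asList(0, 1, 4, 5));
        System.out.println(isValidTarget(4, 6, starts)); // prints true
        System.out.println(isValidTarget(2, 6, starts)); // prints false (middle of an instruction)
        System.out.println(isValidTarget(9, 6, starts)); // prints false (outside the code array)
    }
}
```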
As a final step, the verifier also performs a data-flow analysis, which makes sure that no instruction references any uninitialized local variables.
Alternatively, you might like to have a look at the Java Language Environment white paper by James Gosling.
The bytecode verifier traverses the bytecodes, constructs the type
state information, and verifies the types of the parameters to all the
bytecode instructions.
The illustration shows the flow of data and control from Java language
source code through the Java compiler, to the class loader and
bytecode verifier and hence on to the Java virtual machine, which
contains the interpreter and runtime system. The important issue is
that the Java class loader and the bytecode verifier make no
assumptions about the primary source of the bytecode stream--the code
may have come from the local system, or it may have travelled halfway
around the planet. The bytecode verifier acts as a sort of gatekeeper:
it ensures that code passed to the Java interpreter is in a fit state
to be executed and can run without fear of breaking the Java
interpreter. Imported code is not allowed to execute by any means
until after it has passed the verifier's tests. Once the verifier is
done, a number of important properties are known:
There are no operand stack overflows or underflows
The types of the parameters of all bytecode instructions are known to always be correct
Object field accesses are known to be legal--private, public, or protected
While all this checking appears excruciatingly detailed, by the time
the bytecode verifier has done its work, the Java interpreter can
proceed, knowing that the code will run securely. Knowing these
properties makes the Java interpreter much faster, because it doesn't
have to check anything. There are no operand type checks and no stack
overflow checks. The interpreter can thus function at full speed
without compromising reliability.
It does the following:
There are no operand stack overflows or underflows
The types of the parameters of all bytecode instructions are known to always be correct
Object field accesses are known to be legal--private, public, or protected
Reference:
http://java.sun.com/docs/white/langenv/Security.doc3.html
I heard about the placement new operator in C++. I am confused about what it is, although I can see where it can be used from a question on Stack Overflow. I am also confused about whether we have this in Java or not.
So my question is very precise: What is placement new operator and do we have something like it in java?
Note please, don't be confused with other questions on stackoverflow: they are not duplicate of this question.
The following article explains the meaning of placement new in C++: http://www.glenmccl.com/nd_cmp.htm
The term itself relates to overloading operator new. Since Java does not allow operator overloading at all, and in particular does not allow overloading new, placement new is irrelevant for Java.
But you have several alternatives.
Using factory or builder pattern
Using wrapper/decorator pattern (probably together with factory) that allows changing some class functionality by wrapping its methods.
Aspect oriented programming. It works almost like decorator pattern but can be implemented using byte code modification.
Class loader interception
The term "placement new" itself is somewhat ambiguous. The term is used
in two different ways in the C++ standard, and thus by the C++
community.
The first meaning refers to any overloaded operator new function
which has more than one parameter. The additional parameters can be
used for just about anything—there are two examples in the
standard itself: operator new(size_t, void*) and operator new(size_t,
std::nothrow_t const&).
The second meaning refers to the specific overload operator new(size_t,
void*), which is used in fact to explicitly call the constructor of an
object on memory obtained from elsewhere: to separate allocation
from initialization. (It will be used in classes like std::vector,
for example, where capacity() may be greater than size().)
In Java, memory management is integrated into the language, and is not
part of the library, so there can be no equivalents.
Placement new allows you to specify custom allocators that take extra parameters.
There is also a predefined placement allocator that takes as extra parameter a pointer and that just returns as result of allocation that pointer, basically allowing your code to create objects at the address you specify.
You can, however, define other types of allocators that take other parameters; for example, our debug allocator takes as extra parameters the filename and the line on which the allocation is performed. Storing this extra information with the allocated object allows us to track back to the source code where a certain object instance was created, for example one that got leaked, overwritten, or used after deallocation.
AFAIK Java works at a higher conceptual level and has no pointer concept (only the null pointer exception ;-) ). Memory is just a black magic box and the programmer never uses the idea of a memory address.
I only knew Java 1.1 and back then decided not to invest time in that commercial product, so maybe the logical level of Java has lowered enough today to reach the random access memory concept.
I am trying to record the arguments passed to a method before it is called using bytecode instrumentation.
Currently, while instrumenting using Java code, I have to first pop all the args into locals, then push them again twice (once for my method which will record them, in which case all primitive types have to be converted to their boxed types, and once for the actual method call).
What I would ideally like to do is just duplicate the entire stack for the number of args pushed for the method call. However, the JVM bytecode's dup instruction only allows duplicating the topmost value of the stack.
Is it possible using JNI to somehow duplicate the entire stack in one go?
No. The stack effectively goes away when the method is compiled. The JVM has no way of compiling native code. So even if you did try to directly manipulate the stack, it would change format (and use registers) on the fly.
You can reasonably easily duplicate the top two slots of the stack (using dup2), or insert that copy beneath the next two slots (using dup2_x2), but any further and you'll probably need to use local variables.
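For reference, here is a toy model of what dup2 and dup2_x2 do to the operand stack, following their descriptions in the JVM specification. StackDupSketch and its string-labelled stack are invented for this example, and every value is assumed to occupy one slot (no long or double).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StackDupSketch {
    // Toy operand stack: the end of the list is the top of the stack.

    // dup2: copy the top two slots and push the copy on top.
    static List<String> dup2(List<String> stack) {
        List<String> out = new ArrayList<>(stack);
        int n = stack.size();
        out.addAll(stack.subList(n - 2, n));
        return out;
    }

    // dup2_x2: copy the top two slots and insert the copy below the top four.
    static List<String> dup2_x2(List<String> stack) {
        List<String> out = new ArrayList<>(stack);
        int n = stack.size();
        out.addAll(n - 4, stack.subList(n - 2, n));
        return out;
    }

    public static void main(String[] args) {
        List<String> stack = Arrays.asList("a", "b", "c", "d");
        System.out.println(dup2(stack));    // prints [a, b, c, d, c, d]
        System.out.println(dup2_x2(stack)); // prints [c, d, a, b, c, d]
    }
}
```

Neither instruction copies more than two slots at once, which is why methods with more arguments end up spilling them to local variables.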