How does the Java linker work? - java

I want to know how the Java linker works. Specifically, in which order does it combine classes, interfaces, packages, methods, etc. into a JVM-executable format? I have found some information here, but it says little about linking order.

There is no such thing as a Java "linker". There is, however, the concept of a classloader which - given an array of java byte codes from "somewhere" - can create an internal representation of a Class which can then be used with new etc.
In this scenario interfaces are just special classes. Methods and fields are available when the class has been loaded.

First of all: methods are always part of a class. Interfaces are basically just special classes, and packages are just a part of the fully qualified name of a class with some impact on visibility and the physical organization of class files.
So the question comes down to: how does a JVM link class files? The JVM spec you linked to says:
The Java programming language allows an implementation flexibility as to when linking activities (and, because of recursion, loading) take place, provided that the semantics of the language are respected, that a class or interface is completely verified and prepared before it is initialized, and that errors detected during linkage are thrown at a point in the program where some action is taken by the program that might require linkage to the class or interface involved in the error.
For example, an implementation may choose to resolve each symbolic reference in a class or interface individually, only when it is used (lazy or late resolution), or to resolve them all at once, for example, while the class is being verified (static resolution). This means that the resolution process may continue, in some implementations, after a class or interface has been initialized.
Thus, the question can only be answered for a specific JVM implementation.
Furthermore, it should never make a difference in the behaviour of Java programs, except possibly for the exact point where linking errors result in runtime Error instances being thrown.

Java doesn't do linking the way C does. The principal unit is the class definition. A lot of the matching of a class reference to its definition happens at runtime. So you could compile a class against one version of a library, but provide another version at runtime. If the relevant signatures match, everything will be OK. There's some inlining of constants at compile time, but that's about it.
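The compile-time inlining of constants mentioned above can be observed indirectly, because reading an inlined constant does not trigger initialization of the class that declares it. The following is a minimal sketch; the names ConstantInliningDemo, Config, LIMIT and BOXED are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantInliningDemo {
    // Records what happens, in order.
    static final List<String> EVENTS = new ArrayList<>();

    static class Config {
        // Constant variable: javac copies the value 42 into every class that uses it.
        static final int LIMIT = 42;
        // Not a constant expression: must be read from Config at run time.
        static final Integer BOXED = Integer.valueOf(7);

        static {
            EVENTS.add("Config initialized");
        }
    }

    static List<String> run() {
        EVENTS.add("LIMIT=" + Config.LIMIT); // inlined, Config is not initialized here
        EVENTS.add("BOXED=" + Config.BOXED); // first real use, Config is initialized here
        return EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The entry for LIMIT is recorded before "Config initialized", and the entry for BOXED after it. This is also why swapping in a library version with a changed constant has no effect on callers until they are recompiled.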

As noted previously, the Java compiler doesn't have a linker. However, the JVM has a linking phase, which is performed after class loading. The JVM spec defines it best:
Linking a class or interface involves verifying and preparing that
class or interface, its direct superclass, its direct superinterfaces,
and its element type (if it is an array type), if necessary.
Resolution of symbolic references in the class or interface is an
optional part of linking.
This specification allows an implementation flexibility as to when
linking activities (and, because of recursion, loading) take place,
provided that all of the following properties are maintained:
A class or interface is completely loaded before it is linked.
A class or interface is completely verified and prepared before it is
initialized.
Errors detected during linkage are thrown at a point in the program
where some action is taken by the program that might, directly or
indirectly, require linkage to the class or interface involved in the
error.
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4

Linking is one of the three activities performed by ClassLoaders. It includes verification, preparation, and (optionally) resolution.
Verification: It ensures the correctness of the .class file, i.e. it checks whether the file is properly formatted and was generated by a valid compiler. If verification fails, a java.lang.VerifyError is thrown at run time.
Preparation: The JVM allocates memory for class variables and initializes that memory to default values.
Resolution: The process of replacing symbolic references from the type with direct references. It is done by searching the method area to locate the referenced entity.
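The effect of the preparation step can be made visible with a small sketch: a static field read before its own initializer has run still holds the default value assigned during preparation. The class and field names below are invented for the example.

```java
public class PreparationDemo {
    // Static initializers run in textual order, so peek() executes
    // before the assignment to 'late' has happened.
    static int early = peek();
    static int late = 5;

    static int peek() {
        return late; // still the default value 0 written during preparation
    }

    public static void main(String[] args) {
        System.out.println("early = " + early + ", late = " + late);
    }
}
```

Here early ends up as 0 and late as 5: the memory for late existed and was zeroed before initialization assigned 5 to it.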

Related

Does Java Class Linkage Resolution step OR Initialisation lead to loading of other resolved classes?

I was going through the JVM specification and the JLS, reading about the class loading mechanism in Java.
Here is what I understand:
Loading: when the main class is requested, the JVM checks whether the binary representation of the class has already been loaded; if not, the class loader loads the class file from disk.
Linking: verification, preparation and resolution.
Initialisation.
What I find confounding is this: if a class that has not yet been loaded is referenced during the resolution or initialisation steps, what happens? Does the resolution or initialisation step pause while the class is loaded by its parent class loader?
Or are loading, linking and initialisation deferred until the method or code using that reference is actually executed at runtime?
JVMS §5.4. Linking states:
Linking a class or interface involves verifying and preparing that class or interface, its direct superclass, its direct superinterfaces, and its element type (if it is an array type), if necessary. Resolution of symbolic references in the class or interface is an optional part of linking.
So when not talking about the direct supertypes of a class, the resolution is optional and may be deferred.
The same section also contains
For example, a Java Virtual Machine implementation may choose to resolve each symbolic reference in a class or interface individually when it is used ("lazy" or "late" resolution), or to resolve them all at once when the class is being verified ("eager" or "static" resolution). This means that the resolution process may continue, in some implementations, after a class or interface has been initialized.
So the process does not always strictly follow the graphic you’ve shown in the question. Instead, the resolution can be seen as an ongoing process.
In practice, in case of the HotSpot JVM, some classes have to get resolved immediately, like the superclasses. Other classes are resolved when verifying code of a method, which happens right before the first execution of a method for this JVM.
This does not affect all classes referenced by a method's code but depends on the actual type use, e.g. HotSpot's verifier will resolve types for checking the validity of assignments against the actual type hierarchy, but skip this step if a type is assigned to itself or to java.lang.Object, i.e. where the assignment is always valid. So some types may get resolved only at their first actual use, e.g. when they are instantiated via new or a static method declared by the type is invoked. But this depends on subtle aspects of the code. See also When is a Java Class loaded? or Does the JVM throw if an unused class is absent?
There might be types referenced only in reflective data like annotations or debug attributes which never get resolved during one run, but may be in another.
But since this implies that the resolution of a type is deferred to the point when it is actually needed, it also implies that right at this point, the operation will stop and wait for the completion of this process for the prerequisite classes. So, e.g. loading a class always implies resolving its direct superclass, loading it if not already loaded, which in turn implies resolving of the superclass of the superclass and so on. So it won’t return before the complete super class hierarchy has been resolved.
The JVMS also states in Section 5.3
If the Java Virtual Machine ever attempts to load a class C during verification
(§5.4.1) or resolution (§5.4.3) (but not initialization (§5.5)), and the class loader
that is used to initiate loading of C throws an instance of ClassNotFoundException ,
then the Java Virtual Machine must throw an instance of NoClassDefFoundError
whose cause is the instance of ClassNotFoundException .
(A subtlety here is that recursive class loading to load superclasses is performed
as part of resolution (§5.3.5, step 3). Therefore, a ClassNotFoundException that
results from a class loader failing to load a superclass must be wrapped in a
NoClassDefFoundError .)
So there is indeed a recursion going on in the resolution phase of the classloading.
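The separation between loading and initialization, and the fact that the superclass is dealt with first, can be sketched with the three-argument Class.forName, which lets the caller load a class without initializing it. The names SuperclassFirstDemo, Parent and Child are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperclassFirstDemo {
    // Records what happens, in order.
    static final List<String> EVENTS = new ArrayList<>();

    static class Parent {
        static { EVENTS.add("Parent initialized"); }
    }

    static class Child extends Parent {
        static { EVENTS.add("Child initialized"); }
    }

    static List<String> run() {
        ClassLoader loader = SuperclassFirstDemo.class.getClassLoader();
        String name = SuperclassFirstDemo.class.getName() + "$Child"; // binary name of the nested class
        try {
            // Load Child (and therefore Parent) without initializing either.
            Class<?> child = Class.forName(name, false, loader);
            EVENTS.add("loaded " + child.getSimpleName()
                    + ", superclass " + child.getSuperclass().getSimpleName());
            // Now request initialization: the superclass goes first.
            Class.forName(name, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "loaded" entry appears before either static initializer runs, and "Parent initialized" precedes "Child initialized".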

How does Java restrict multiple inheritance?

While reading about multiple inheritance and the diamond problem in Java, I realized that it is not supported. But I wonder how Java actually restricts multiple inheritance.
Is there a class that checks whether the programmer is passing more than one class name after the extends keyword, or is there some other way of detecting this?
I read a few articles, but none explains how exactly Java prevents multiple inheritance, except for the one common answer that it throws an error: classname is inheriting multiple classes.
But I wonder how Java actually restricts multiple inheritance.
It is disallowed at the syntax level. The syntax for a class declaration allows one class name after the extends keyword. And the names in the implements list must be interface names not class names. See Section 8.1 Class Declarations of the JLS. The compiler checks both of these things. Java source code that attempts to declare multiple super-classes will not compile.
At the implementation level, the format for a ".class" file only allows one class to be listed as the super_class; see the ClassFile structure in Section 4.1 of the JVM spec. The identifiers in the interfaces must all refer to interfaces. The various classfile constraints specified in the JVM spec are enforced by the JVM's native classloader.
If you want to see how these restrictions are enforced, you can download an OpenJDK source tree and read the code for yourself. (I don't see the point though. All you really need to know is that the restrictions are strictly enforced and there is no practical way to get around that enforcement.)
If you try to extend more than one class, the compiler will actually complain, and state error: '{' expected. If you are interested in what part of the JDK actually does this, I suggest taking a look at the OpenJDK sources. Source code for the javac parser can be found here.
As a side note, Java disallows multiple inheritance of state, which is what you are referring to. You can still achieve multiple inheritance of behavior through implementing multiple interfaces, though.
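A short sketch of multiple inheritance of behavior through interfaces, including how the diamond-style ambiguity has to be resolved explicitly by the implementing class. The types Walker, Swimmer and Duck are invented for illustration.

```java
public class DiamondDemo {
    interface Walker {
        default String move() { return "walk"; }
    }

    interface Swimmer {
        default String move() { return "swim"; }
    }

    // Without this override the class would not compile, because
    // both interfaces supply a default move().
    static class Duck implements Walker, Swimmer {
        @Override
        public String move() {
            return Walker.super.move() + "+" + Swimmer.super.move();
        }
    }

    public static void main(String[] args) {
        System.out.println(new Duck().move());
        // Exactly one superclass, but any number of superinterfaces.
        System.out.println(Duck.class.getSuperclass().getSimpleName());
        System.out.println(Duck.class.getInterfaces().length);
    }
}
```

Reflection mirrors the class file layout described above: getSuperclass() returns a single Class, while getInterfaces() returns an array.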

Inheritance of final class from the Java internals perspective

When a class is declared final, we cannot inherit from it. My question is why, from the Java internals perspective.
I assume that the same principle applies to methods and instances as well.
Is it somehow related to the class loader as well? Who is actually stopping me from overriding it?
There's nothing related to the JVM or internals (not really sure what exactly you mean by that); it's a compile issue, simply because you're breaking the rules.
If I think myself as a Java compiler, after parsing the tokens in your code I'm just going to look around for logical errors (semantic analysis) e.g. a circular inheritance scheme. The moment I see someone's attempt at extending a final class, I'm gonna go bazooka. That's it. No need to wake up the big bosses, the JVM or any other internals because the program cannot be correctly compiled in the first place.
If you want to know how the compiler works the way it does internally, think that while the compiler parses your code, it creates and fills some structures internal to itself for the purpose of error-checking and bytecode-translation. Also imagine in a simplified scenario that the final keyword attached to a class just sets a field in one of these structures attached to your class. After syntactic analysis, the compiler goes on with "logical" (semantic) analysis and checks (among other things) if some lunatic tries extending a final class. Even a brute search in an inheritance graph can pull that off. If a class is final and still has children, halt and notify the lunatic. The issue won't get more internal than the compiler.
It is nothing to do with Java internals.
The purpose of declaring a class to be final is to prevent it from being subclassed.
My question was what is happening "underground" when declaring final ...
Well ... when a class is declared as final a flag is set in the class file to say this. If you then attempt to load a class that purports to be a subclass of a final class, the classloader will throw a VerifyError. The checks are done in the ClassLoader.defineClass(...) methods ... which are also final, so that normal programs can't interfere with them.
This aspect of classfile verification needs to be watertight for Java security reasons. If it wasn't then you could probably cause mayhem in a Java security sandbox by tricking trusted code into using (say) a mutable subtype of String.
The Java compiler also checks that you don't extend a final class, but you could subvert that by (for example) creating ".class" files by hand. Hence the need for load-time checks ...
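The flag in question (ACC_FINAL in the class file's access flags) is visible from Java code through reflection. A minimal sketch, with FinalFlagDemo and isFinalClass as made-up names:

```java
import java.lang.reflect.Modifier;

public class FinalFlagDemo {
    // Reports whether the class was declared final.
    static boolean isFinalClass(Class<?> type) {
        return Modifier.isFinal(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("String final? " + isFinalClass(String.class));
        System.out.println("Object final? " + isFinalClass(Object.class));
    }
}
```

This only reads the flag; the enforcement itself happens inside the JVM when a would-be subclass is loaded and linked.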
Who is actually stopping me from overriding it?
Actually, it is the classloader. See above.
Let's look at it from first principles. When you declare a variable as final, you do so because you don't want its value to be changed afterwards, right?
Assuming you agree with that, the same principle applies to classes.
Look at it this way: why would you ever want to inherit from a class? Probably because you want access to its properties and behaviors (methods), right? Once you have inherited these, you can modify the accessible behavior to suit your precise need without having to re-implement everything else. This is the value and power of inheritance.
Hence, declaring a class as final implies that you don't want anyone to modify any behavior of the class. You are stating that whoever wants to use your class should use it as is.
Therefore, any attempt to modify a final class is illogical and should be considered an error.
E.g.
Imagine if someone were able to inherit from your final Authentication class and modify the actual authentication behavior (method). This would be a security breach, as it would defeat your reasons for making the class final.
Hence, it is a design practice.
I hope that makes some sense.

JVM tries to load a class that isn't called

I am writing code that adds functions to a 'mod' if it exists on the classpath (indicated by pixelmonPresent).
PixelHammerTool extends ItemHammer, and ItemHammer only exists if pixelmon is present.
The issue I'm having is that if I do this in the class (same package):
if(Basemod.pixelmonPresent) {
    rubyHammer = new PixelHammerTool(Basemod.RUBY, "pixelutilitys:RubyHammer", "rubyHammer");
}
it causes a class-not-found error on PixelHammerTool.
Why is this class being loaded when the condition of the if statement is false, and what can I do about it?
The why is simple and straightforward: because when a class is loaded, all the classes referenced by it are loaded too. (In fact they are loaded first.)
Avoiding it isn't too complicated either, although the code won't look nice: you need to load the class with reflection, using Class.forName(), find the constructor you want from the array returned by Class.getConstructors() and then create an instance using Constructor.newInstance().
Note that this solution is fine if it only happens a few times in your code. If you find yourself doing this a lot, you should probably look for a dependency injection framework that will do the heavy lifting for you.
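A minimal sketch of the reflective approach, assuming the optional class has a public constructor taking a single String. The helper name tryCreate and the class name com.example.missing.ItemHammer are hypothetical, and the sketch uses Class.getConstructor(...) to look up one constructor directly rather than searching the array from getConstructors().

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

public class OptionalDependency {
    // Tries to instantiate className via its (String) constructor.
    // Returns an empty Optional if the class or anything it needs is missing.
    static Optional<Object> tryCreate(String className, String arg) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> ctor = type.getConstructor(String.class);
            return Optional.of(ctor.newInstance(arg));
        } catch (ReflectiveOperationException | LinkageError e) {
            // ClassNotFoundException, NoSuchMethodException, NoClassDefFoundError, ...
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A class that certainly exists, standing in for the optional one.
        System.out.println(tryCreate("java.lang.StringBuilder", "ruby").isPresent());
        // A hypothetical class that is not on the classpath.
        System.out.println(tryCreate("com.example.missing.ItemHammer", "ruby").isPresent());
    }
}
```

Because the optional class is named only in a string, the calling class carries no symbolic reference to it, so nothing about it can be loaded or verified until tryCreate actually runs.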
Under the Linking section in the specs, we see this:
For example, a Java Virtual Machine implementation may choose to resolve each symbolic reference in a class or interface individually when it is used ("lazy" or "late" resolution), or to resolve them all at once when the class is being verified ("eager" or "static" resolution). This means that the resolution process may continue, in some implementations, after a class or interface has been initialized. Whichever strategy is followed, any error detected during resolution must be thrown at a point in the program that (directly or indirectly) uses a symbolic reference to the class or interface.
So the point at which a symbolic reference has to be resolved is implementation-dependent. The behavior you're seeing is consistent with the "eager" resolution mentioned: when you reference PixelHammerTool in your code, even if it's for a runtime path that will never be hit, the class loader tries to link in its definition, which does not exist.
This strategy causes the JVM to start slower but execute faster at runtime, which is generally the strategy taken in all the implementations I'm familiar with. Indeed, the default class loader is given the name "bootstrap class loader" because it has this behavior - load classes at JVM bootstrap time.
You can either instantiate the class via reflection, as biziclop suggested (the easier route), which forces linking at runtime, or find or create a class loader that instantiates classes lazily.

JVM verification - when is it performed?

I would like to know in exactly which situations the verifier in the JVM kicks in and checks a class. I know one such instance is when the class is loaded, but sometimes a class is loaded and only verified later on. That's why I want to know precisely when that happens.
The spec (§4.10) says the following:
A Java virtual machine implementation verifies that each class file
satisfies the necessary constraints at linking time (§5.4).
§5.4 defines what exactly "linking time" means:
Linking a class or interface involves verifying and preparing that
class or interface, its direct superclass, its direct superinterfaces,
and its element type (if it is an array type), if necessary.
Resolution of symbolic references in the class or interface is an
optional part of linking.
This specification allows an implementation flexibility as to when
linking activities (and, because of recursion, loading) take place,
provided that all of the following properties are maintained:
A class or interface is completely loaded before it is linked.
A class or interface is completely verified and prepared before it is initialized.
Errors detected during linkage are thrown at a point in the program where some action is taken by the program that might, directly
or indirectly, require linkage to the class or interface involved in
the error.
For example, a Java virtual machine implementation may choose to
resolve each symbolic reference in a class or interface individually
when it is used ("lazy" or "late" resolution), or to resolve them all
at once when the class is being verified ("eager" or "static"
resolution). This means that the resolution process may continue, in
some implementations, after a class or interface has been initialized.
Whichever strategy is followed, any error detected during resolution
must be thrown at a point in the program that (directly or indirectly)
uses a symbolic reference to the class or interface.
Note that, as a matter of fact, at least HotSpot does lazy resolution as described (and I'd be extremely surprised if JRockit and co. did otherwise).
Source:
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.10
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4
