How does the java compiler find classes without header files? - java

When we refer to a class className in a JAR, how does the compiler know whether it is defined when there are no header files (as in C/C++)?

Java works with classloaders. Classes are needed for compilation, since the compiler performs static type checking to ensure that you are using the correct signature of every method.
After compilation, though, they are not linked as they would be by a C/C++ toolchain, so basically every .class file is standalone. Of course, this means that you have to provide the compiled classes used by your program when you execute it. So it's a little different from how C and C++ prepare executables: there is no separate link step at build time, because references between classes are resolved by the JVM at run time.
The classloader loads them dynamically, adding them to the set of classes known to the running JVM.
Actually, the JVM uses several classloaders with different permissions and properties, and you can also invoke one explicitly to ask for a class to be loaded. Loading is typically lazy: the compiled .class code is loaded only when it is first needed (and this loading process can throw a ClassNotFoundException if the requested class is not on the classpath).
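The explicit, on-demand loading described above can be sketched as follows. This is only an illustration: the class name LazyLoadingDemo and the name com.example.DoesNotExist are made up for the example, and Class.forName is just one of several ways to ask a classloader for a class.

```java
/** Sketch: asking for a class by name at run time instead of referencing it statically. */
class LazyLoadingDemo {

    /** Returns true if a class with this fully qualified name can be loaded. */
    static boolean isLoadable(String className) {
        try {
            // Class.forName asks the caller's classloader to locate and load the class.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // No classloader in the chain could find the name.
            return false;
        }
    }

    public static void main(String[] args) {
        // A class that ships with the JDK.
        System.out.println(isLoadable("java.util.ArrayList"));
        // A hypothetical name that is assumed not to be on the classpath.
        System.out.println(isLoadable("com.example.DoesNotExist"));
    }
}
```

Nothing is loaded for a name until isLoadable is called with it, which is the lazy behaviour the answer refers to.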

When you run the Java compiler or your application itself, you can specify a classpath which lists all the jars and directories you're loading classes from. A jar just contains a bunch of class files; these files have enough metadata in them that no extra header files are necessary.

The classes in the jar file contain all the required information (class names, method signatures etc) so header files are not needed.
When you compile multiple classes, javac also compiles any source files they depend on that it can find on the source path (by default the same as the classpath), so the system still works.
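To see that a compiled class really carries its own "header" information, you can read the signatures back through reflection. This is a small sketch; SignatureDump and publicSignatures are names invented for the example, and java.lang.String is used only because it is always available.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Sketch: reading method signatures out of an already compiled class. */
class SignatureDump {

    /** Collects the signatures of all public methods with the given name. */
    static List<String> publicSignatures(Class<?> type, String methodName) {
        List<String> signatures = new ArrayList<>();
        for (Method method : type.getMethods()) {
            if (method.getName().equals(methodName)) {
                // Method.toString() renders modifiers, return type, owner and parameter types.
                signatures.add(method.toString());
            }
        }
        return signatures;
    }

    public static void main(String[] args) {
        // prints "public char java.lang.String.charAt(int)"
        publicSignatures(String.class, "charAt").forEach(System.out::println);
    }
}
```

The compiler reads the same kind of metadata directly from the .class files on the classpath, which is why no separate declaration files are needed.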

It looks at the classpath and tries to load the class from there to get its definition.

Java files are compiled into class files which are java bytecode. These class files reside in a file structure where the top level is pointed to by the classpath variable. Compiling in C/C++ creates object files which can be linked into executable binaries. Java only compiles into bytecode files which are pulled in by the JVM at runtime. The following provide more explanation.
http://en.wikipedia.org/wiki/Java_bytecode
http://en.wikipedia.org/wiki/Java_compiler
http://en.wikipedia.org/wiki/Java_Virtual_Machine

Related

JVM language interoperability

Recently I've been writing a compiler for a JVM programming language and I've realised a problem.
I would like to access a Java method from my programming language and also allow a Java method to access a method in my language. The problem is that I need to know the Java method's signature to call it in the bytecode I generate, and vice versa.
I've been trying to think of any methods for how Scala does this. Here are my thoughts.
Scala accesses the .java files on the class path and parses them, extracting the method signatures from there.
.java files are compiled to .class files. The Java ASM library is then used to access the .class files and get the method signatures. The problem with this method is that the .java files must be compiled first.
.java files are loaded dynamically using reflection. The problem with this is I believe that the JVM doesn't allow for loading classes that are outside of the compilers class path.
Looking into Scala it works well with other JVM languages but I can't find information on exactly how it does it.
How does Scala get method signatures of other JVM language methods?
I think you are confusing class path and source path: there are no .java or .scala files on the class path, there are .class files (possibly inside .jars). So for dependencies (on the class path), you don't need to do anything special. They can have their own dependencies on your language, including previous versions of your project, but they are already compiled by definition.
Now, for mixed projects, where you have Java and your language on the source path, scalac does parse Java with its own parser (i.e. your option 1).
The problem with option 3 is not that "the JVM doesn't allow for loading classes that are outside of the compilers class path", but that reflection also only works on classes, not on source files.

How does Java link lib/rt.jar to your app at runtime?

The Java standard/system libraries (java.*, javax.*, etc.) are stored in lib/rt.jar inside each JRE distribution.
Say I have an application that I have compiled and jarred into myapp.jar. This JAR only contains my app's class files, and merely references system classes like System, File, Runtime, Thread, String, Boolean, etc.
So when I run my app, say via java -jar myapp.jar, the JVM is obviously doing some last-minute linking (or something) where it executes the bytecode of my class files (inside myapp.jar) and then "jumps" into lib/rt.jar to run bytecode located there. I would imagine the process is the same if myapp.jar depends on other JARs provided on the runtime classpath.
My question is: what is this "linking" process called, and how does it essentially work?
That rt.jar is part of the bootstrap classpath, a parent of the usual classpath you already know and that you configure when you use the -cp option (you can actually change the bootstrap classpath too using the -Xbootclasspath option to load, for example, a custom Java runtime).
See Oracle documentation for a detailed description of how classes are searched/loaded from the system defined classpaths hierarchy.
Now, the additional questions you seemed to have:
How is the archive actually found?
It's simply hardcoded. If the java binary is located in <common_root>/bin/java, rt.jar will be searched in <common_root>/lib/rt.jar.
How is the "linking" performed?
On the JVM there is no separate link step before execution; classes are loaded dynamically by a hierarchy of ClassLoader objects, the software components that actually do the class file loading/parsing. When you try to load a class, the request starts at the application-facing default classloader (or a child classloader if you have defined one). By default each classloader first delegates the request to its parent, all the way up to the bootstrap classloader, and searches its own locations only if the parent could not find the class.
If the class is found, the .class file is loaded and parsed, and internal structures representing the class and its data are created. Once the class is loaded, a new instance can be created.
If instead no classloader in the chain could load your class, a user-visible ClassNotFoundException is thrown.
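The classloader hierarchy can be inspected from inside a program. The following is a minimal sketch; LoaderChain and chainOf are names made up for the example, and the concrete loader class names it prints differ between JVM versions.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: walking from a class's loader up to the bootstrap loader. */
class LoaderChain {

    /** Lists the loader of the given class followed by each parent loader. */
    static List<String> chainOf(Class<?> type) {
        List<String> names = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // The bootstrap loader has no ClassLoader object; it is represented by null.
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) {
        // An application class: several loaders, ending at the bootstrap loader.
        System.out.println(chainOf(LoaderChain.class));
        // A core class such as String is loaded by the bootstrap loader itself.
        System.out.println(chainOf(String.class));
    }
}
```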

Compile and execute java source file in memory

Say I have a String containing the content of a .java file. Are there any APIs out there that would allow me to compile this source file into a virtual .class file (i.e. generate and store the content in memory, NOT creating an actual physical .class file on disk)? And could this "virtual" .class then be loaded and executed in the JVM?
Edit 1: The only reason I want to do this is because sometimes, my application might not have the write permission.
Use the JavaCompiler for this. I think the trick will be to define a custom JavaFileManager.
Java does have a compilation API to compile files dynamically, but I'm not aware of a built-in option that avoids persisting the class files to disk. You can always use a ClassLoader to load those classes dynamically and then use them. You might be able to keep the classes in memory by overriding the JavaFileManager's getJavaFileForOutput method (the counterpart of getFileForOutput that javac uses for class files), whose documentation notes:
Optionally, this file manager might consider the sibling as a hint for
where to place the output. The exact semantics of this hint is
unspecified. The JDK compiler, javac, for example, will place class
files in the same directories as originating source files unless a
class file output directory is provided. To facilitate this behavior,
javac might provide the originating source file as sibling when
calling this method.
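A rough sketch of the custom file manager idea is shown below. It assumes the program runs on a JDK (so that ToolProvider.getSystemJavaCompiler() is not null); InMemoryCompiler, StringSource, ByteCode and the Hello class in main are names invented for the example, and error reporting is left at the compiler's default (messages go to standard error).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Sketch: compile a source String and load the result without touching the disk. */
class InMemoryCompiler {

    /** A source "file" whose content lives in a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** A class "file" whose bytes are collected in a byte array. */
    static final class ByteCode extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteCode(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    static Class<?> compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("No system compiler: a JDK is required");
        Map<String, ByteCode> output = new HashMap<>();
        // Forward everything to the standard manager except where class files are written.
        JavaFileManager manager = new ForwardingJavaFileManager<JavaFileManager>(
                compiler.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                                                       JavaFileObject.Kind kind, FileObject sibling) {
                ByteCode file = new ByteCode(name);
                output.put(name, file);
                return file;
            }
        };
        Boolean ok = compiler.getTask(null, manager, null, null, null,
                Collections.singletonList(new StringSource(className, source))).call();
        if (!ok) throw new IllegalArgumentException("Compilation failed for " + className);
        // A classloader that defines classes from the collected byte arrays.
        ClassLoader loader = new ClassLoader(InMemoryCompiler.class.getClassLoader()) {
            @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteCode file = output.get(name);
                if (file == null) throw new ClassNotFoundException(name);
                byte[] b = file.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        try {
            return loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Calls a public static no-argument method on the compiled class. */
    static Object invokeStatic(Class<?> type, String method) {
        try {
            return type.getMethod(method).invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "public class Hello { public static String greet() { return \"hi from memory\"; } }";
        System.out.println(invokeStatic(compile("Hello", source), "greet"));
    }
}
```

Because both the source and the class bytes stay in memory, this approach should also work when the application has no write permission, which was the motivation in the question.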
Another option is to use an Interpreter like BeanShell that will run the java code for you. It executes script like code and can work in repl mode.

Java packages and compilation (why, not how)

I'm working on some Java code in Eclipse. The code is contained in a single class called Adder, which in Eclipse is in the package org.processing. The first thing in the class file is the line
package org.processing;
Q1) What exactly is this line doing? Why is it there, and what is its role?
The code runs fine in Eclipse. However, if I go to the src/org/processing/ folder in the workspace, compile with javac Adder.java, and then try to run using java Adder, I get the following error
java.lang.NoClassDefFoundError: Adder (wrong name: org/processing/Adder)
On the other hand, if I compile from src using
javac org/processing/Adder.java
then I can run it from src using java org.processing.Adder, but STILL not from within the processing directory.
Q2) Does this mean that compilation is always relative to directory structure?
Finally, if I remove the package org.processing line from the start of the .java file, I can compile and run from within the .class file's directory.
Q3) Why is all this the way it is? I can fully understand enforcing a directory structure for code development, but once you're in bytecode this seems a bit over the top, because now I can (apparently) only run the bytecode from one directory (src) using java org.processing.Adder. Now, I'm sure I'm missing the point here, so if someone could point out what it is, that would be great.
The compiler has to be able to find related source code files when compiling. This is why the package and directory structure must agree for source code. Similarly, the JVM must be able to find referenced .class files. So the same directory structure is required at runtime. It's no more complex than that.
Q1) The issue here is that once you went into the folders that represent your package hierarchy, you made that the working directory. The JVM then looks inside org/processing for the path org/processing/Adder (essentially looking from the root for org/processing/org/processing/Adder). You need to run it from the root with the fully qualified name. The purpose of packages is (A) to organize related classes into groups, and (B) along with A, to control visibility: classes in package foo.bar can't see package-private classes (those declared without public) in other packages, since those act as internal classes that only their own package can use.
Q2) Yes
Q3) The paths are used as a basic structure for the JVM to know where exactly the class files (each containing their bytecode) are. If you change where you call it from, you're basically trying to change the location where the JVM looks for the class files, but their true location hasn't changed.
The short answer - Packages help keep your project structure well-organized, allow you to reuse names (try having two classes named Account), and are a general convention for very large projects. They're nothing more than folder structures, but why they're used can burn beginners pretty badly. Funnily enough, with a project less than 5 classes, you probably won't need it.
What exactly is this line doing? Why is it there, and what is its role?
The line
package org.processing;
tells Java that this class belongs to the package org.processing, so its file is expected in a folder org/processing relative to a source or classpath root. This allows you to have a class whose fully qualified name is org.processing.Processor here, and in another folder - let's say org/account/processing - a class whose fully qualified name is org.account.processing.Processor. Yes, both use the same simple name, but they won't collide - they're in different packages. If you do decide to use them in the same class, you would have to be explicit about which one you want, through either import statements or the fully qualified name.
Does this mean that compilation is always relative to directory structure?
Yes. Java has a concept known as the classpath. Anything on the classpath can be compiled against and run, and by default the current directory is the classpath for both compilation and execution. To compile sources that live elsewhere, pass their location on the command line:
javac -sourcepath /path/to/source MainClass.java
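...and this would compile MainClass.java together with any classes it references whose sources are found on the source path. Unless you pass -d to choose an output directory, each class file is written next to its source file, in the folder structure that matches your package statements.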
To run them, as you've already established, you would need to include the compiled source in your classpath, and then execute via the fully qualified object name:
java -cp /path/to/source org.main.MainClass
Why is all this the way it is?
Like I said before, this is mostly useful for very large projects, or projects that involve a lot of other classes and demand structure/organization, such as Android. It does a few things:
It keeps source organized in an easy-to-locate structure. You don't have objects scattered all over the place.
It keeps the scope of your objects clear. If I had a package named org.music.db, then it's pretty clear that I'm messing with objects that deal with the database and persistence. If I had a package named org.music.gui, then it's clear that this package deals with the presentation side. This can help when you want to create a new feature, or update/refactor an existing one; you can remember what it does, but you can't recall its name exactly.
It allows you to have objects with the same name. There is more than one type of Map out there, and if you're using projects that pull that in, you'd want to be able to specify which Map you get - again, accomplished through either imports or the fully qualified object name.
For Q1: The package declaration allows you to guarantee that your class will never be mistaken for another class with the same name. This is why most programmers put their company's name in the package; it's unlikely that there will be a conflict.
For Q2: There is a one-to-one correspondence between the package structure and the directory structure. The short of it is that directories and packages must be the same, excepting the package is usually rooted under a folder called src.
For Q3: Once it's compiled, the class files will probably be in the appropriate folders in a jar file. Your ant or maven tasks will build the jar file so you won't really have to bother with it beyond getting the ant task set up the first time.
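The one-to-one mapping between package names and directories discussed in these answers is mechanical enough to write down. This is only a sketch of the naming rule; ClassFileLocator and its methods are invented for the example and are not part of the JDK.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: where a class file is expected, given its fully qualified name. */
class ClassFileLocator {

    /** Turns a fully qualified class name into a path relative to a classpath root. */
    static String relativeClassFile(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    /** Resolves the expected class file location under one classpath root. */
    static Path resolve(Path classpathRoot, String fullyQualifiedName) {
        return classpathRoot.resolve(relativeClassFile(fullyQualifiedName));
    }

    public static void main(String[] args) {
        // prints "org/processing/Adder.class"
        System.out.println(relativeClassFile("org.processing.Adder"));
        // With src as the root, this is the file that java org.processing.Adder needs.
        System.out.println(resolve(Paths.get("src"), "org.processing.Adder"));
    }
}
```

Running java Adder from inside src/org/processing fails for exactly this reason: the class file found there declares the name org.processing.Adder, which does not match the requested name Adder.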

How JVM starts looking for classes?

I was curious about which locations the JVM searches when executing a program. I'm mainly interested in where, and in what sequence, the JVM looks for class files: does it look into the Java libs, the extension libs, the classpath, or a directory such as the current directory from which java is invoked? I'm more interested in JVM behaviour and not how class loader load class, which I know has a parent delegation mechanism up to the root.
If a class is executed from a directory where the compiled class is kept on the file system and also in a JAR file in the same directory, would the JVM load both or just one, and which one?
Say you have a thread unsafe Vector and if we compare it performance to ArrayList, which one would be better and why?
How classes are found.
Answer is here:
http://docs.oracle.com/javase/1.5.0/docs/tooldocs/findingclasses.html
Answer for point 2:
Order of finding classes is as follows:
1. classes or packages in current directory.
2. classes found from CLASSPATH environment variable. [overrides 1]
3. classes found from -classpath command line option. [overrides 1,2]
4. classes found from jar archives specified via -jar command line option [overrides 1,2,3]
So if you use -jar option while running, classes come from jarfile.
Only one class is loaded though.
Without using any additional classloader:
Search order for a JVM:
Runtime classes (basically, rt.jar in $JRE_HOME/lib)
Extension classes (some JARs in $JRE_HOME/lib/ext)
Classpath, in order. There are four possibilities for specifying classpath:
If -jar was specified, then that JAR is in the classpath. Whatever classpath is declared as classpath in META-INF/MANIFEST.MF is also considered.
Else, if -cp was specified, that is the classpath.
Else, if $CLASSPATH is set, that is the classpath.
Else, the current directory from which java has been launched is the classpath.
So, if I specify -cp src/A.jar:src/B.jar, then A.jar will be searched first, then B.jar
The JVM loads only the class that is found first, according to the order in which the directories/JARs are declared in the classpath. This is important if you use -cp or $CLASSPATH.
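The "first match wins" rule can be imitated in a few lines. This is a simplified sketch rather than the JVM's actual lookup code: FirstMatchOnClasspath and firstEntryWith are names made up for the example, and it ignores wildcard entries and the Class-Path attribute of JAR manifests.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.jar.JarFile;
import java.util.regex.Pattern;

/** Sketch: find the first classpath entry that contains a given class. */
class FirstMatchOnClasspath {

    static Optional<Path> firstEntryWith(String classpath, String className) {
        String resource = className.replace('.', '/') + ".class";
        // Entries are examined strictly in the order in which they were declared.
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            // An empty entry stands for the current directory.
            Path path = Paths.get(entry.isEmpty() ? "." : entry);
            if (Files.isDirectory(path)) {
                if (Files.isRegularFile(path.resolve(resource))) return Optional.of(path);
            } else if (Files.isRegularFile(path)) {
                try (JarFile jar = new JarFile(path.toFile())) {
                    if (jar.getEntry(resource) != null) return Optional.of(path);
                } catch (IOException e) {
                    // Not a readable archive: skip it and keep searching.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        String name = args.length > 0 ? args[0] : "FirstMatchOnClasspath";
        System.out.println(firstEntryWith(classpath, name)
                .map(Path::toString)
                .orElse("not found on the user classpath"));
    }
}
```

With -cp src/A.jar:src/B.jar and a class present in both archives, this sketch returns src/A.jar, matching the order described above.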
In single thread scenarios and with recent JVMs, Vector and ArrayList should have similar performance (ArrayList should perform slightly better as it is not synchronized, but locking is fast currently when there is no contention, so the difference should be small). Anyway, Vector is obsolete: don't use it in new code.
I believe Java uses the current directory as the class path by default, and otherwise the class path given by the "-cp" VM argument. You can put any combination of folders of classes (the root of the package hierarchy, e.g. /project/bin for a class stored at /project/bin/com/putable/MyClass.class) and JAR files (e.g. /project/lib/MyJar.jar) on the class path; individual .class files cannot be listed as entries. Locations are separated by either a colon (Unix-based OSes) or semicolon (Windows-based OSes). So anything on the classpath is fair game for Java to look at when obtaining class definitions. With respect to sequence, classes are loaded lazily. So they only get loaded when your application first requires them. If your application doesn't require a certain class during its run, then that class will NEVER get loaded.
If you don't put anything on the class path, I think Java will load from the class file and not the JAR, because the default class path is just the current directory. If you specify one or the other on the classpath, then that's where Java will look. If you put both on the classpath, the entry listed first wins, since entries are searched in order.
Depends on what you want to do. Vectors are actually always thread safe, per the Java API, so if you don't require concurrent access, the ArrayList will be faster. Vectors and ArrayLists are both backed by arrays, but they increase capacity at different rates (Vector capacity doubles whenever the end is reached and more space is needed, but ArrayList increases by 50%). Depending on how often you have to grow or shrink, the answer will vary. Check out this link for more info:
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
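The doubling behaviour of Vector mentioned above can be observed directly, because Vector exposes its capacity. This is a small sketch (VectorGrowth and capacityAfter are names invented for the example); ArrayList has no public capacity accessor, so only the Vector side is shown.

```java
import java.util.Vector;

/** Sketch: observing how a Vector's backing array grows. */
class VectorGrowth {

    /** Creates a Vector with the given initial capacity, adds elements, reports the capacity. */
    static int capacityAfter(int initialCapacity, int elements) {
        // No capacityIncrement is given, so the Vector doubles when it runs out of room.
        Vector<Integer> vector = new Vector<>(initialCapacity);
        for (int i = 0; i < elements; i++) {
            vector.add(i);
        }
        return vector.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(4, 4)); // prints 4: still fits
        System.out.println(capacityAfter(4, 5)); // prints 8: capacity doubled
    }
}
```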
I'm more interested in JVM behaviour and not how class loader load class
Sorry, but this is nonsensical.
Because the answer is that the JVM creates a class loader and lets this class loader load the classes.
So, in order to understand the "JVM behaviour" you need to understand the class loader behaviour.
But maybe your question was: how does the JVM create the system class loader?
The accepted answer is already correct, but there is a more detailed and updated official description in How Classes are Found.
Some caveats:
A class file has a subpath name that reflects the class's fully-qualified name. For example, if the class com.mypackage.MyClass is stored under /myclasses, then /myclasses must be in the user class path and the full path to the class file must be /myclasses/com/mypackage/MyClass.class. If the class is stored in an archive named myclasses.jar, then myclasses.jar must be in the user class path, and the class file must be stored in the archive as com/mypackage/MyClass.class.
And the priorities in How the Java Launcher Finds User Classes
The default value, ".", meaning that user class files are all the class files in the current directory (or under it, if in a package).
The value of the CLASSPATH environment variable, which overrides the default value.
The value of the -cp or -classpath command line option, which overrides both the default value and the CLASSPATH value.
The JAR archive specified by the -jar option, which overrides all other values. If this option is used, all user classes must come from the specified archive.
