After hours of research I haven't found a concrete answer to my question, and I'm going mad:
The steps from editing to execution:
1. (Compilation step) After writing the source code, I compile the program. In this step it is converted into bytecode. A .class file (the bytecode) is generated.
2. (Execution step) Now I execute the program.
(Interpretation step) When I do this, the JVM interprets the bytecode into machine code. So I understand that the machine code is only generated after execution!??
Now the steps are: code-->bytecode-->execution-->machinecode
All these steps are hardware- and software-independent.
Am I right?
This is called JIT (just in time compilation), so that when I execute the program the bytecode is compiled into machinecode, and only then.
So why is this step called interpretation?
I'm thanking you in advance for your answers!
In short because JVM doesn't have to have JIT. It can interpret the bytecode instead of compiling it. Of course an interpret-only JVM would be slow, but the JIT part is just an extra feature to improve performance, not a required property of a Java Virtual Machine. The -Xint command line parameter can be used to run a java program in interpret-only mode.
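To see which mode your own JVM is running in, a small check like the one below can help. It is only a sketch: the class and method names are made up for illustration, and the exact text of the java.vm.info property is HotSpot behaviour that other JVMs may not share.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, named for this example only.
class ExecutionModeCheck {

    // True if the interpret-only flag appears among the JVM's startup arguments.
    static boolean requestsInterpretOnly(List<String> jvmArgs) {
        return jvmArgs.contains("-Xint");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the program).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Interpret-only requested: " + requestsInterpretOnly(jvmArgs));

        // Assumption: on HotSpot this usually mentions "mixed mode" or "interpreted mode".
        System.out.println("java.vm.info: " + System.getProperty("java.vm.info"));
    }
}
```

Running it once as `java ExecutionModeCheck` and once as `java -Xint ExecutionModeCheck` should show the difference.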
The reason it's compiled to bytecode and not machine code is to get the platform independence. Bytecode is platform independent, so the same code can run on any platform (as long as there's the JVM to interpret it). If it were compiled into machine code, it would be operating system and processor architecture dependent.
(Interpretation step) When I do this, the JVM interprets the bytecode into machine code. So I understand that the machine code is only generated after execution!??
Not exactly, and no. A JVM operating strictly as a bytecode interpreter does not transform bytecode into machine code and then execute that. The machine code executed by such a JVM is the pre-existing machine code of the JVM itself. The bytecode provides some of the data on which to operate and directs which parts of the JVM's machine code are executed.
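A toy interpreter makes this concrete. The sketch below is not how any real JVM is written, and its four opcodes are invented for illustration (they are not real JVM bytecodes), but it shows the principle: the program is just an array of numbers, and the only code that ever executes is the interpreter's own switch statement.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack-machine interpreter. The opcodes are invented for this example.
class ToyInterpreter {
    static final int PUSH = 0; // push the following int onto the stack
    static final int ADD = 1;  // pop two values, push their sum
    static final int MUL = 2;  // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // index of the next instruction
        while (true) {
            int opcode = code[pc++];
            // The "bytecode" is only data: it selects which branch of
            // already-compiled interpreter code runs next.
            switch (opcode) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

No machine code is generated for the program at any point; the toy program never exists in any form other than the int array.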
Now the steps are: code-->bytecode-->execution-->machinecode
All these steps are hardware- and software-independent. Am I right?
No, not at all. The particulars of the Java code --> bytecode transformation are somewhat dependent on which Java compiler (software) you use. The Java virtual machine you use must be specific to the hardware on which it runs, and it is itself a piece of software. Moreover, the operating environment is influenced by a lot of other software.
Java hardware independence, such as it is, means that a Java program (bytecode) will behave consistently on any hardware, but the details of how that consistent behavior is provided on any given machine are all kinds of hardware- and software-dependent.
This is called JIT (just in time compilation), so that when I execute the program the bytecode is compiled into machinecode, and only then. But why is this step called interpretation?
JIT is something else, and a JVM that performs JIT (as in fact most do) is not strictly an interpreter. Most such JVMs run some bytecode in an interpretative manner as described above, but compile some bytecode to native (machine) code, and run that machine code directly when subsequently needed. The latter manner of execution generally isn't called "interpreting".
I am trying to wrap my head around the Java Virtual Machine and why it uses bytecode. I know it has been asked so many times, but somehow I couldn't finally make the correct assumption, so I researched many things and decided to explain how I think it works and if it's correct.
I understand that in C++, the compiler compiles the source code for a specific target (architecture + operating system). So, a C++ program compiled for (x86 + Windows) won't run on any other architecture or operating system.
My assumptions
When Java compiler compiles the source code into bytecode, It doesn't do the compilation depending on architecture or operating system. The source code will always be compiled to the same bytecode if it's compiled on Windows or Mac. Let's say we compiled and now, send the bytecode to another computer (x86 + windows). In order for that computer to run this bytecode, It needs JVM. Now, JVM knows what architecture + operating system it's running on. (x86 + windows). So, JVM will compile bytecode to x86 + Windows and it will produce machine code which can be run by the actual computer now.
So, even though we use Java Virtual Machine, we still run the actual machine code on our operating system and not on the virtual machine. Virtual Machine just helps us to transform bytecode into machine code.
This means that when using Java, the only thing we have to worry about is installing JVM and that's it.
I just always thought that the Virtual Machine is just a computer itself where it would run the code in its own isolated place, but in case of JVM, i don't think that's correct, because I think machine code JVM produces still has to be run on the actual operating system we have.
Do you think my assumptions are correct?
When Java compiler compiles the source code into bytecode, It doesn't do the compilation depending on architecture or operating system. The source code will always be compiled to the same bytecode if it's compiled on Windows or Mac.
All correct.
Let's say we compiled and now, send the bytecode to another computer (x86 + windows). In order for that computer to run this bytecode, It needs JVM. Now, JVM knows what architecture + operating system it's running on. (x86 + windows). So, JVM will compile bytecode to x86 + windows and it will produce machine code which can be run by the actual computer now.
This is mostly correct, but there are a couple of things that are a bit "off".
First of all, to execute the bytecodes on any computer you need a JVM. That includes the computer on which you compiled the bytecodes.
(It is theoretically possible that a computer could be designed and implemented to execute the JVM bytecode instruction set as its native instruction set. But I don't know if anyone has ever seriously contemplated doing this. It would be pointless. Performance would not be comparable with hardware that you can buy for a couple of hundred dollars. The JVM bytecode instruction set is designed to be compact and simple, and relatively easy to JIT compile, not to be executed efficiently in hardware.)
Secondly a typical JVM actually operates in two modes:
It starts out executing the bytecodes in software using an interpreter.
After a bit, it selectively compiles bytecodes of heavily used methods to the platform's native instruction set and executes the native code. The compilation is done using a JIT compiler.
Note that the JIT compiler is platform specific.
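If you want to observe the second mode from inside a program, the standard java.lang.management API exposes the JIT compiler through CompilationMXBean. The following is a rough sketch: the class name, method names and loop sizes are arbitrary choices for this example, and the reported times will differ from machine to machine.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe class, named for this example only.
class JitActivityProbe {

    // A small method that becomes "heavily used" when called in a loop.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Accumulated JIT compilation time in milliseconds, or -1 if this JVM
    // has no JIT compiler or does not support monitoring it.
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();
        long checksum = 0;
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        long after = totalCompilationMillis();
        System.out.println("checksum: " + checksum);
        System.out.println("JIT time before: " + before + " ms, after: " + after + " ms");
    }
}
```

On a typical HotSpot JVM the "after" value tends to be larger than the "before" value, because the hot loop gets compiled along the way.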
So, even though, we use Java Virtual Machine, we still run the actual machine code on our operating system and not on the virtual machine.
That is correct.
[The Java] Virtual Machine just helps us to transform bytecode into machine code.
The JVM actually does a lot more. Things like:
Garbage collection
Bytecode loading and verification
Implementing reflection
Providing native code methods for bridging between Java classes and operating system functionality
Implementing infrastructure for monitoring, profiling, debugging and so on.
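A few of these services can be observed from ordinary Java code. The sketch below touches reflection, class loading and the garbage collector's monitoring interface; the class and method names are made up for this example, while the APIs themselves (java.lang.reflect, java.lang.management) are standard.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

// Hypothetical demo class, named for this example only.
class JvmServicesDemo {

    // Reflection: look up a public no-argument method by name at runtime and call it.
    static Object callByName(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // Calls "hello".length() without naming the method at compile time.
        System.out.println(callByName("hello", "length")); // prints 5

        // Class loading: every class knows which loader brought it in.
        System.out.println("Loaded by: " + JvmServicesDemo.class.getClassLoader());

        // Monitoring: the collectors this JVM is using and how often they have run.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " collections: " + gc.getCollectionCount());
        }
    }
}
```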
This means that when using Java, the only thing we have to worry about is installing JVM and that's it.
Yes. But in a modern JDK there are other alternatives; e.g. jlink will generate a custom runtime image that bundles a cut-down JRE with your application, so that you don't need to install a JRE separately. And GraalVM supports ahead-of-time (AOT) compilation.
I just always thought that the Virtual Machine is just a computer itself where it would run the code in its own isolated place, but in case of JVM, i don't think that's correct, because I think machine code JVM produces still has to be run on the actual operating system we have.
Ah yes.
The term "virtual machine" has multiple meanings:
A Java Virtual Machine "executes" Java bytecode ... in the sense above.
A Linux or Windows virtual machine is where the user's application and the guest operating system are running under the control of a "hypervisor" operating system. The applications and guest OS use the native hardware to execute instructions, but they don't have full control of the hardware.
And there are potentially other shades of meaning.
If you conflate JVMs with other kinds of virtual machine, you can get yourself in knots. Don't. They are different enough that conflating the concepts is not going to help you understand.
When compiling java code, java-bytecode is created.
This bytecode is platform independent and can be run by the JVM. This means that other programming languages like Kotlin can also generate this java-bytecode and target the JVM.
So you are correct about the java-bytecode being the same for every platform.
Where you aren't entirely correct, however, is the idea that the JVM simply converts the java-bytecode to native code up front.
The JVM executes the java-bytecode instead and gives the instructions to the operating system. While executing, the JVM compiles bunches of java-bytecode to native code, which the operating system can then execute. This happens while the code is executed so a class that is never loaded will also never be executed and will never end up as machine code.
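The "only when needed" behaviour is easy to demonstrate at the level of class initialization. In the sketch below (all names are invented for the example) the nested Helper class records the moment it is initialized, which happens on first use rather than at program start.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, named for this example only.
class LazyInitDemo {
    static final List<String> events = new ArrayList<>();

    static class Helper {
        // Runs once, when Helper is first actively used.
        static {
            events.add("Helper initialized");
        }

        static int twice(int x) {
            return 2 * x;
        }
    }

    static int useHelper(int x) {
        return Helper.twice(x);
    }

    public static void main(String[] args) {
        events.add("main started");
        System.out.println(events); // prints [main started]
        useHelper(21);
        System.out.println(events); // prints [main started, Helper initialized]
    }
}
```

If useHelper were never called, Helper would never be initialized, and its methods would never be interpreted or compiled.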
Because compilation happens this late, features such as runtime instrumentation can modify the java-bytecode of a class file before it is compiled to native machine code, and reflection can inspect and invoke classes that are only known at runtime.
The JVM basically sits between the operating system and the java-bytecode. In some sense just like a normal VM.
The article below has a nice visualization of the JVM.
https://www.guru99.com/java-virtual-machine-jvm.html
The java byte code is a form of intermediate code.
That code can be interpreted: a JVM is such an interpreter, emulating (simulating) every single bytecode instruction. This is slow but easily portable.
That code can also be compiled to native code: normally a JVM contains a just-in-time compiler for code that is actually executed. This native code can be optimized for the machine it is currently running on, it does not need to be loaded from a file, and code inlining can be done.
So the JVM is, in principle, a turbo-charged interpreter of Java bytecode, portable to many platforms.
Context:
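[Other languages] Microsoft's C# is likewise compiled to an intermediate code (CIL), which the .NET runtime typically JIT-compiles rather than interprets.
[Other intermediate codes] LLVM IR is an intermediate code from the C side, and it is compilable.
[Other JVMs] There is more than one JVM implementation, each using different techniques.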
I am a beginner in java programming course and so far this is what I have understood about the whole java program being compiled and executed. Stating in brief:-
1) Source code (.java) file is converted into bytecode(.class) (which is an intermediate code) by the java compiler.
2) This bytecode(.class) file is platform independent so wooosh....I can copy it and take it to a different platform machine which has JVM.
3) When I run the bytecode, the JVM, which is a part of the JRE, first verifies the bytecode, calls out JIT which at runtime makes the optimizations since it has access to dynamic runtime information.
4) And finally JVM interprets the intermediate code into a series of machine instructions for the processor to execute. (A processor can't execute the bytecode directly since it is not in native code)
Is my understanding correct? Anything that needs to be added or corrected?
Taking each of your points in turn:
1) This is correct. Java source is compiled by javac (although other tools could do the same thing) and class files are generated.
2) Again, correct. Class files contain platform-neutral bytecodes. These are loosely an instruction set for a 'virtual' machine (i.e. the JVM). This is how Java implements the "write once, run anywhere" idea it's had since it was launched.
3) Partially correct. When the JVM needs to load a class it runs a four-phase verification on the bytecodes of that class to ensure that the format of the bytecodes is legal in terms of the JVM. This is to prevent bytecode sequences being generated that could potentially subvert the JVM (i.e. virus-like behaviour). The JVM does not, however, run the JIT at this point. When bytecodes are executed they start in interpreted mode. Each bytecode is converted on the fly to the required native instructions and OS system calls.
4) This is sort of wrong when combined with point 3.
Here's the process explained briefly:
As the JVM interprets the bytecodes of the application it also profiles which groups of bytecodes are being run frequently. If you have a loop that repeatedly calls a method the JVM will notice this and identify that this is a hotspot in your code (hence the name of the Oracle JVM). Once a method has been called enough times (which is tunable), the JVM will call the Just In Time (JIT) compiler to generate native instructions for that method. When the method is called again the native code is used, eliminating the need for interpreting and thus improving the speed of the application. This profiling phase is what leads to the 'warm-up' behaviour of a Java application where relevant sections of the code are gradually compiled into native instructions.
For OpenJDK based JVMs there are two JIT compilers, C1 and C2 (sometimes called client and server). The C1 JIT will warm-up more quickly but have a lower optimum level of performance. C2 warms-up more slowly but applies a greater level of optimisation to the code, giving a higher overall performance level.
The JVM can also throw away compiled code, either because it hasn't been used for a long time (like in a cache) or an assumption that the JIT made (called a speculative optimisation) turns out to be wrong. This is called a deopt and results in the JVM going back to interpreted mode, reprofiling the code and potentially recompiling it with the JIT.
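The bookkeeping described above can be modelled in a few lines. This is a deliberately simplified toy, not HotSpot's actual logic: the class name, the single invocation counter and the threshold value are assumptions made for illustration, and nothing is really compiled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of "interpret, count, compile when hot, deoptimize". Names are hypothetical.
class ToyTieredRuntime {
    private final int compileThreshold;
    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Set<String> compiledMethods = new HashSet<>();

    ToyTieredRuntime(int compileThreshold) {
        this.compileThreshold = compileThreshold;
    }

    // Returns how this call of the method was executed.
    String invoke(String method) {
        if (compiledMethods.contains(method)) {
            return "compiled";
        }
        // Still interpreting: profile the call and compile once it is hot enough,
        // so that later calls use the compiled version.
        int count = invocationCounts.merge(method, 1, Integer::sum);
        if (count >= compileThreshold) {
            compiledMethods.add(method);
        }
        return "interpreted";
    }

    // A speculative assumption failed: discard the compiled code and start profiling again.
    void deoptimize(String method) {
        compiledMethods.remove(method);
        invocationCounts.remove(method);
    }

    public static void main(String[] args) {
        ToyTieredRuntime runtime = new ToyTieredRuntime(3);
        for (int call = 1; call <= 5; call++) {
            System.out.println("call " + call + ": " + runtime.invoke("hotMethod"));
        }
        runtime.deoptimize("hotMethod");
        System.out.println("after deopt: " + runtime.invoke("hotMethod"));
    }
}
```

With a threshold of 3, the first three calls are interpreted, calls four and five use the compiled version, and the call after the deopt is interpreted again.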
First and foremost, Java is only a programming language. That means you could (theoretically) run a compiler to generate a native binary instead of this bytecode. (See: Compiling a java program into an executable )
The other thing I should mention is Java processors, which are able to execute Java bytecode directly, because it's their native instruction set (See: https://en.wikipedia.org/wiki/Java_processor )
I am trying to understand how .class files work in java and what's their purpose. I found some information online, but I get unsatisfying explanations.
As soon as we run the compiler we get the .class file, which is bytecode. Is this machine readable or not? And if not, this is why we need the interpreter for the program to run successfully?
Also, since the .class file is the equivalent of our .java program, can somebody run a Java program straight away just by running the .class file on the VM, or would they need to have the .java file as well?
The JVM is by definition a virtual machine, that is a software machine that simulates what a real machine does. Like real machines it has an instruction set (the bytecodes), a virtual computer architecture and an execution model. It is capable of running code written with this virtual instruction set, pretty much like a real machine can run machine code.
So, the class files contain the instructions in the virtual instruction set, and the JVM is capable of running them. For that matter, a virtual machine can either interpret the code itself or compile it for the hardware architecture it is currently running on. Some do both, some do just one of them (e.g. the .NET runtime compiles a method once, the first time it is called).
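Every class file starts with the same small header regardless of the platform it was compiled on: the magic number 0xCAFEBABE followed by a minor and a major format version. The sketch below reads that header; the class and method names are made up for the example, and the hand-written byte array stands in for a real file (major version 52 corresponds to Java 8).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, named for this example only.
class ClassFileHeader {

    // Reads the first 8 bytes of a class file: u4 magic, u2 minor_version, u2 major_version.
    static String describe(InputStream in) {
        try (DataInputStream data = new DataInputStream(in)) {
            int magic = data.readInt();
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            return String.format("magic=0x%08X major=%d minor=%d", magic, major, minor);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-written header of a Java 8 class file.
        byte[] java8Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(describe(new ByteArrayInputStream(java8Header)));
        // prints magic=0xCAFEBABE major=52 minor=0

        // This class's own file, if the class loader can find it as a resource.
        InputStream own = ClassFileHeader.class.getResourceAsStream("ClassFileHeader.class");
        if (own != null) {
            System.out.println(describe(own));
        }
    }
}
```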
For instance, the Java HotSpot initially interprets bytecodes, and progressively compiles the code into machine code. This is called adaptive optimization. Some virtual machines always compile to machine code directly.
So, you can see there are two different "compiling concepts". One consists of the transformation of Java code to JVM bytecodes (from .java to .class). The second happens when the program runs, where the bytecodes may either be interpreted or compiled to actual machine code. That compilation is done by the just-in-time compiler, within the JVM.
So, as you can see, a computer cannot run a Java program directly because the program is not written in a language that the computer understands. It is written in lingua-franca that all JVM implementations can understand. And there are implementations of that JVM for many operating systems and hardware architectures. These JVMs translate the programs in this lingua-franca (bytecodes) for any particular hardware (machine code). That's the beauty of the virtual machine.
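One way to convince yourself of this: the small program below can be compiled once and its .class file copied to different machines, where it should report a different JVM, operating system and architecture each time. The class and method names are invented for the example; the system properties it reads are standard ones.

```java
// Hypothetical demo class, named for this example only.
class PlatformReport {

    // Describes the platform-specific JVM that is running this platform-neutral bytecode.
    static String describe() {
        return System.getProperty("java.vm.name")
                + " on " + System.getProperty("os.name")
                + " / " + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```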
The .class file is machine-readable. The machine that reads it is the Java Virtual Machine, which interprets it and compiles it to native code (executable by your computer).
You don't need the .java files to run Java code. The .class files are all you need.
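This can be tried out end to end. The following sketch compiles a tiny class into a temporary directory, deletes the source file, and then loads and runs the class from the .class file alone. It assumes it is run on a JDK (a plain JRE has no compiler), and the names RunFromClassFileOnly and Answer are made up for the example.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo class, named for this example only.
class RunFromClassFileOnly {

    static int compileDeleteSourceAndRun() {
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No compiler available; a JDK is needed");
            }

            // Step 1: write and compile Answer.java into a temporary directory.
            Path dir = Files.createTempDirectory("classfile-demo");
            Path source = dir.resolve("Answer.java");
            Files.writeString(source,
                    "public class Answer { public static int value() { return 42; } }");
            int exitCode = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed");
            }

            // Step 2: remove the source; only Answer.class remains.
            Files.delete(source);

            // Step 3: load and run the class from the bytecode alone.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> answer = loader.loadClass("Answer");
                return (Integer) answer.getMethod("value").invoke(null);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileDeleteSourceAndRun()); // prints 42
    }
}
```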
It's machine readable, but does not execute on the bare hardware. It's run through the Java Virtual Machine, which is an interpreter with a very high performance just-in-time compiler. There are good reasons to have the interpreter use only the class file's bytecode. Briefly they are:
Easier to build the interpreter, since the bytecode is much closer to instructions that can be turned into native machine code by the JIT.
Easier to resolve dependencies, since the Java compiler has already resolved the syntactic sugar of import statements into fully qualified class names.
Java bytecode (.class file) is not directly executable.
It's an intermediate language that is interpreted by the underlying Java Virtual Machine. Of course some optimizations can happen (i.e. Just-in-time compilation).
To run a Java program you only need the bytecode files; the .java files contain the source code.
Compiler vs. Interpreter:
Compiler | Interpreter
Takes an entire program as input | Takes single instructions as input
Intermediate object code is generated | No intermediate object code is generated
Conditional control statements are executed faster | Conditional control statements are executed slower
Memory requirement: more (since object code is generated) | Memory requirement: less
Program need not be compiled every time | The higher-level program is converted into a lower-level program every time
Errors are displayed after the entire program is checked | Errors are displayed for every instruction interpreted (if any)
Example: C compiler | Example: BASIC
I have a very basic question about JVM: is it a compiler or an interpreter?
If it is an interpreter, then what about JIT compiler that exist inside the JVM?
If neither, then what exactly is the JVM? (I don't want the basic definition of the JVM as converting bytecode to machine-specific code, etc.)
First, let's have a clear idea of the following terms:
Javac is Java Compiler -- Compiles your Java code into Bytecode
JVM is Java Virtual Machine -- Runs/ Interprets/ translates Bytecode into Native Machine Code
JIT is Just In Time Compiler -- Compiles the given bytecode instruction sequence to machine code at runtime before executing it natively. Its main purpose is to do heavy optimizations in performance.
So now, Let's find answers to your questions:
JVM: is it a compiler or an interpreter?
An interpreter
What about JIT compiler that exist inside the JVM?
If you read this reply completely, you probably know it now.
What exactly is the JVM?
JVM is a virtual platform that resides in your RAM
Its class loader component loads the .class file into RAM
The Byte code Verifier component in JVM checks if there are any access restriction violations in your code. (This is one of the principal reasons why java is secure)
Next, the Execution Engine component converts the Bytecode into executable machine code
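The class loader component is the easiest of these to observe from a program. In the sketch below (class and method names are invented for the example), core classes such as String report a null loader, which is how Java represents the built-in bootstrap loader, while your own classes report an application-level loader whose exact name depends on the JVM.

```java
// Hypothetical demo class, named for this example only.
class ClassLoaderReport {

    // Names the loader that loaded the given class; null means the bootstrap loader.
    static String loaderName(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String loaded by: " + loaderName(String.class)); // prints bootstrap
        System.out.println("This class loaded by: " + loaderName(ClassLoaderReport.class));
    }
}
```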
It is a little of both, but neither in the traditional sense.
Modern JVMs take bytecode and compile it into native code when needed (some on first use, others only once the code has become hot). "JIT" in this context stands for "just in time." It acts as an interpreter from the outside, but behind the scenes it is compiling into machine code.
The JVM should not be confused with the Java compiler, which compiles source code into bytecode. So it is not useful to consider it "a compiler" but rather to know that in the background it does do some compilation.
Like @delnan already stated in the comment section, it's neither.
JVM is an abstract machine running Java bytecode.
JVM has several implementations:
HotSpot (interpreter + JIT compiler)
Dalvik (interpreter + JIT compiler)
ART (AOT compiler + JIT compiler)
GCJ (AOT compiler)
JamVM (interpreter)
...and many others.
Most of the other answers, when talking about the JVM, refer either to HotSpot or to some mixture of the above approaches to implementing the JVM.
It is both. It starts by interpreting bytecode and can (should it decide it is worth it) then compile that bytecode to native machine code.
It's both. It can interpret bytecode, and compile it to native code.
Javac is a compiler but not a traditional compiler.
A compiler typically converts source code to Machine level language for execution and that is done in a single shot i.e. entire code is taken and converted to machine level language at ONCE. (more on this below).
javac, by contrast, converts it to bytecode instead of machine level language.
The JVM's execution engine combines an interpreter with a JIT compiler. A typical compiler will convert all the code at once from source code to machine level language. Instead, the JVM's interpreter goes through the bytecode generated by javac one instruction at a time (instruction-by-instruction execution is a feature of interpreters) and executes it. The JVM has multiple implementations, HotSpot being one of the major ones. HotSpot's JIT optimizes the execution by converting chunks of code which are executed repeatedly into machine level language at once (like a compiler, as mentioned above) so that they can be executed faster than by interpreting each instruction one by one.
So, the answer is not Black and White with respect to the typical definitions of Compiler and Interpreter.
This is my understanding after reading several online answers, blogs, etc. If somebody has suggestions to improve this understanding, please feel free to suggest.
Java has both a compiler and an interpreter. The compiler (javac) compiles the code and generates bytecode. After that, the JVM's interpreter converts the bytecode to machine-understandable code.
Example: Write and compile a program and it runs on Windows. Take the .class file to another OS (Unix) and it will run because of the interpreter that converts the bytecode to machine-understandable code.
My knowledge of Java isn't great, so I want to ask how the language works. By which I mean not just the "Language" but the Virtual Machine as well.
Here is my understanding.
Java compiler turns code into Java Byte-Code. in the form of a .java file
when the file is run, the JVM reads (just in time) the byte-code and turns it into machine code.
Computer reads the machine code and the program appears to run like a compiled program (to the user).
Is this hopelessly wrong?
There are already many answers, but I'm missing one important point:
"2. when the file is run, the JVM reads (just in time) the byte-code and turns it into machine code."
This is not quite correct.
The JVM starts by interpreting the code
It looks at the most time consuming parts, the hot spots
It analyses the traces, i.e., the typical execution flow
It generates machine code optimized for the hot spots and the traces
The less time-consuming parts of the code may stay interpreted. If the situation changes (e.g., by loading a new class), some compiled code may turn out to be no longer optimal or even incorrect. It then gets thrown away, the JVM reverts to interpreting for a while, and later re-compiles it.
A Java Virtual Machine (JVM) is the software, which interprets compiled Java byte code and runs the java program. Java Virtual Machine language conceptually represents the instruction set of a stack-oriented, capability architecture.
The Java Virtual Machine does not have any information regarding the programming language. The JVM knows only the binary bytecode format. A programmer can generate bytecode that adheres to this format from any programming language. Every Java program runs within the boundaries defined by the Java Virtual Machine.
Java code running inside the JVM cannot go beyond the security constraints defined by the Java Virtual Machine. Java applications are considered secure applications on the internet because of this.
http://en.wikipedia.org/wiki/Java_bytecode
http://en.wikipedia.org/wiki/Java_virtual_machine
Your understanding is correct. I'd like to add the below
The HotSpot compiler also adaptively compiles Java bytecodes into optimized machine instructions
Almost:
the Java compiler creates .class files not .java files, which contain the byte code. .java files contain the source code.
the JVM (Java virtual machine) is like a (virtual) computer on its own. It interprets the byte code. The OS only runs the JVM.
A JIT (just in time) compiler can compile part of the code to machine code for performance reasons, in which case the JVM delegates the execution of that code to the OS (I guess).
To be precise,
When you create a java class, the extension would be .java
During compilation, the compiler converts the code (.java file) to .class (byte code).
When the code is run, the JVM converts the byte code (.class file) to machine code that can be executed by the processor. This is what makes Java platform independent and the JVM platform dependent.