Why do we use the Java virtual machine? [duplicate]

Why do we use the Java virtual machine? [duplicate] - java

This question already has answers here:
Why a virtual machine is needed execute a java program.? [duplicate]
(3 answers)
Closed 2 years ago.
I am trying to wrap my head around the Java Virtual Machine and why it uses bytecode. I know it has been asked so many times, but somehow I couldn't finally make the correct assumption, so I researched many things and decided to explain how I think it works and if it's correct.
I understand that in C++, compiler compiles the source code on the specific (architecture + operating system). So, compiled version of C++ for (x86 + Windows) won't run on any other architecture or operating system.
My assumptions
When Java compiler compiles the source code into bytecode, It doesn't do the compilation depending on architecture or operating system. The source code will always be compiled to the same bytecode if it's compiled on Windows or Mac. Let's say we compiled and now, send the bytecode to another computer (x86 + windows). In order for that computer to run this bytecode, It needs JVM. Now, JVM knows what architecture + operating system it's running on. (x86 + windows). So, JVM will compile bytecode to x86 + Windows and it will produce machine code which can be run by the actual computer now.
So, even though we use Java Virtual Machine, we still run the actual machine code on our operating system and not on the virtual machine. Virtual Machine just helps us to transform bytecode into machine code.
This means that when using Java, the only thing we have to worry about is installing JVM and that's it.
I just always thought that the Virtual Machine is just a computer itself where it would run the code in its own isolated place, but in case of JVM, i don't think that's correct, because I think machine code JVM produces still has to be run on the actual operating system we have.
Do you think my assumptions are correct?

When Java compiler compiles the source code into bytecode, It doesn't do the compilation depending on architecture or operating system. The source code will always be compiled to the same bytecode if it's compiled on Windows or Mac.
All correct.
Let's say we compiled and now, send the bytecode to another computer (x86 + windows). In order for that computer to run this bytecode, It needs JVM. Now, JVM knows what architecture + operating system it's running on. (x86 + windows). So, JVM will compile bytecode to x86 + windows and it will produce machine code which can be run by the actual computer now.
This is mostly correct, but there are a couple things that are a bit "off"
First of all, to execute the bytecodes on any computer you need a JVM. That includes the computer on which you compiled the bytecodes.
(It is theoretically possible that a computer could be designed and implemented to execute the JVM bytecode instruction set as its native instruction set. But I don't know if anyone has ever seriously contemplated doing this. It would be pointless. Performance would not be comparable with hardware that you can by for a couple of hundred dollars. The JVM bytecode instruction set is designed to be compact and simple, and it is relatively easy to JIT compile. Not to be executed efficiently.)
Secondly a typical JVM actually operates in two modes:
It starts out executing the bytecodes in software using an interpreter.
After a bit, it selectively compiles bytecodes of heavily used methods to the platform's native instruction set and executes the native code. The compilation is done using a JIT compiler.
Note that the JIT compiler is platform specific.
So, even though, we use Java Virtual Machine, we still run the actual machine code on our operating system and not on the virtual machine.
That is correct.
[The Java] Virtual Machine just helps us to transform bytecode into machine code.
The JVM actually does a lot more. Things like:
Garbage collection
Bytecode loading and verification
Implementing reflection
Providing native code methods for bridging between Java classes and operating system functionality
Implementing infrastructure for monitoring, profiling, debugging and so on.
This means that when using Java, the only thing we have to worry about is installing JVM and that's it.
Yes. But in a modern JDK there are other alternatives; e.g. jlink will generate an executable that has a cut-down JRE embedded in it so that you don't need to install a JRE. And GraalVM supports ahead of time (AOT) compilation.
I just always thought that the Virtual Machine is just a computer itself where it would run the code in its own isolated place, but in case of JVM, i don't think that's correct, because I think machine code JVM produces still has to be run on the actual operating system we have.
Ah yes.
The term "virtual machine" has multiple meanings:
A Java Virtual Machine "executes" Java bytecode ... in the sense above.
A Linux or Windows virtual machine is where the user's application and the guest operating system are running under the control of a "hypervisor" operating system. The applications and guest OS use the native hardware to execute instructions, but they don't have full control of the hardware.
And there are potentially other shades of meaning.
If you conflate JVMs with other kinds of virtual machine, you can get yourself in knots. Don't. They are different enough that conflating the concepts in not going to help you understand.

When compiling java code, java-bytecode is created.
This bytecode is platform independent and can be run by the JVM. This means that other programming languages like Kotlin can also generate this java-bytecode and target the JVM.
So you are correct about the java-bytecode being the same for every platform.
Where you aren't entirely correct however is, that the JVM converts the java-bytecode to native bytecode.
The JVM executes the java-bytecode instead and gives the instructions to the operating system. While executing, the JVM compiles bunches of java-bytecode to native code, which the operating system can then execute. This happens while the code is executed so a class that is never loaded will also never be executed and will never end up as machine code.
Due to this we can use features like Reflection where some of the java-bytecode of a class files can be modified at runtime before being compiled to native machine code.
The JVM basically sits between the operating system and the java-bytecode. In some sense just like a normal VM.
The article below has a nice visualization of the JVM.
https://www.guru99.com/java-virtual-machine-jvm.html

The java byte code is a form of intermediate code.
That code can be interpreted, a JVM is such an interpreter emulating (simulating) every single byte code instruction. This is slow but easily portable.
That code can be compiled to native code, normally a JVM contains a just-in-time compiler for code that is indeed executed. This native code can be optimized for the current machine it is running on, and the native code does not need to be loaded from file, and code inlining can be done.
So the JVM principly is a turbo-charged interpreter, of java byte code, portable to many platforms.
Context:
[Other languages] Microsoft's C# language is more of a compiler.
[Other intermediate codes] LLVM IR is an intermediate code from the C side, compilable.
[Other JVMs] There are more than one JVM, with different techniques.

Related

How java solved portability?

Java compiler converts Java code to bytecode and then JVM converts bytecode to machine instructions. As far as I have understood, JVM are built for different platforms (processor + OS). Then how can we say that Java is platform independent? Ultimately, we require a JVM which is platform dependent?

As you have mentioned yourself, JVM is platform-dependent. That's it - JVM is platform-dependent, but not JAVA. After all, the JVM needs to be run in someway inside the native machine, so it must have to be specific to that platform.
JAVA is portable in the sense that the compiled code is portable. For example, if you compare with C, both the C and Java source codes are portable, that means both of them provide source code portability. Once you have a source code written in a Windows PC, you can transfer that exact code in another Linux machine and both Java and C code will compile and run fine in both machines.
But, what about object code portability?
We know, when we compile a C code, it produces the machine readable object code. So, if you compile a C source code from one machine, then that object code may not be run from another machine if they are not compatible. But, in case of JAVA, if you compile a Java source code to bytecode from one machine, that bytecode can be run in any machine that runs the JVM.
Another interesting fact, by successful compilation of a Java source code, we are also producing a byte code for some unknown future CPU which doesn't even exist as JVM acts as a kind of virtual CPU.

The end-user writes code in java and that is platform independent.
The JVM engineer works on creating the JVM and the JRE and within it the compiler and the interpreter for different platforms. Therefore that absolves the end-user to worry about porting their codebase to different platforms. Write once and then run it on all platforms (as long as there is a JVM available for that platform).
So from the perspective of the end-user (Java programmer) the java code is indeed platform independent since it can run anywhere without any changes. Even though the JVM is ported to different platforms and is indeed platform dependent.
Difference with C/C++:
As it is with the JVM, there exists a C/C++ compiler for each platform responsible for translating source to machine/processor instructions. These machine instructions are understood by the processor, for instance: intel etc. Therefore a C/C++ compiler is made available for each platform it supports. However, you might need to link to specific OS/platform specific libraries in your program. That will make your program platform specific and will need to be rewritten for another platform by linking to that platform's version of that library. That becomes C/C++ programmer responsibility. Also each C/C++ compiler has its own vagaries specific to a platform unlike a JVM which provides a consistent view across platforms detailed below.
In case of Java, you are removed from tapping into platform specificities (unless the JVM or Java libraries do not provide it but they usually cover most of the basics). The java code taps into JVM and Java libraries and the JVM is responsible for translating that to bytecode or to machine instructions (to improve performance) at runtime. Unless you use JNI in your java source, you do not have to worry about portability. For instance: within your Java program, you can access the native environment, the host memory, some OS characteristics etc., and your code is still portable. The JVM provides the infrastructure in that the native calls to extract that information specific to the platform. However there might be some platform specific functionality that your java program requires that is not provided by the JVM or Java libraries and therefore you need to JNI and your portability suffers as a result.
Note: C/C++/Java all have source code portability. But what the JVM provides is binary portability. The compiled or interpreted java code runs on an abstraction called the JVM (Virtual Machine). The java programmer writes to that interface instead of the platforms that the JVM runs on. There is a clear separation here.

Steps of programm execution

After hours of research I haven't found a concrete answer for my question and I'm going maddd!:
The steps from editing to execution:
1 . (Compilation step) After writing the source code, i compile the program. In this step it is converted into bytecode. A java.class file (the bytecode) is generated.
2 .(Execution step) Now i execute the program.
(Interpretation step) When I do this, the JVM interprets the bytecode into machine code. So I understand that the machine code is only generated after execution!??
Now the steps are: code-->bytecode-->execution-->machinecode
All these steps are hardware- and software-independent.
Am i right?
This is called JIT (just in time compilation), so that when I execute the program the bytecode is compiled into machinecode, and only then.
So why is this step called interpretation?
I'm thanking you in advance for your answers!

In short because JVM doesn't have to have JIT. It can interpret the bytecode instead of compiling it. Of course an interpret-only JVM would be slow, but the JIT part is just an extra feature to improve performance, not a required property of a Java Virtual Machine. The -Xint command line parameter can be used to run a java program in interpret-only mode.
The reason it's compiled to bytecode and not machine code is to get the platform independence. Bytecode is platform independent, so the same code can run on any platform (as long as there's the JVM to interpret it). If it were compiled into machine code, it would be operating system and processor architecture dependent.

(Interpretating step) When i do this, the JVM interprets the bytecode into machine code. So i understand that the machine code is only generated after execution!??
Not exactly, and no. A JVM operating strictly as a bytecode interpreter does not transform bytecode into machine code and then execute that. The machine code executed by such a JVM is (comprised by) the pre-existing machine code of the JVM itself. The byte code is used to provide some of the data on which to operate and to direct which of the JVM's machine code is executed.
Now the steps are: code-->bytecode-->execution-->machinecode
All these steps are hardware- and software-independent. Am i right?
No, not at all. The particulars of the Java code --> bytecode transformation are somewhat dependent on which Java compiler (software) you use. The Java virtual machine you use must be specific to the hardware on which it runs, and it is itself a piece of software. Moreover, the operating environment is influenced by a lot of other software.
Java hardware independence, such as it is, means that a Java program (bytecode) will behave consistently on any hardware, but the details of how that consistent behavior is provided on any given machine are all kinds of hardware- and software-dependent.
This is called JIT(just in time compilation), so that when I execute the program the bytecode is compiled into machinecode, and only then. But why is this step called interpretating?
JIT is something else, and a JVM that performs JIT (as in fact most do) is not strictly an interpreter. Most such JVMs run some bytecode in an interpretative manner as described above, but compile some bytecode to native (machine) code, and run that machine code directly when subsequently needed. The latter manner of execution generally isn't called "interpreting".

How does Java bytecode deal with multiple platforms?

For example, let's say you have a java program that simply opens a window. This will obviously result in different assembly code for different operating systems (on windows it will eventually have to call CreateWindowEx). So how does the Java bytecode (or any other similar language) represent something platform specific like this?

The JVM is OS-dependent, byte code isn't.
This means that bytecode is a sort of "generic" language that JVM will interpret end execute according to the system it's running on.
UPDATE
As Chris Jester-Young says, my answer is not strictly correct:
The bytecode for a 100% pure Java program is indeed platform-independent. The underlying Java platform classes that such programs invoke are, of course, not.
Most of the time, the JVM will also JIT compile, not just interpret. (You can enable pure-interpretation mode, but it'll be slow!)

Serenity said this:
Typically, the compiled code is the exact set of instructions the CPU requires to "execute" the program. In Java, the compiled code is an exact set of instructions for a "virtual CPU" which is required to work the same on every physical machine.
So, in a sense, the designers of the Java language decided that the language and the compiled code was going to be platform independent, but since the code eventually has to run on a physical platform, they opted to put all the platform dependent code in the JVM.
This requirement for a JVM is in contrast to your Turbo C example. With Turbo C, the compiler will produce platform dependent code, and there is no need for a JVM work-alike because the compiled Turbo C program can be executed by the CPU directly.
With Java, the CPU executes the JVM, which is platform dependent. This running JVM then executes the Java bytecode which is platform independent, provided that you have a JVM availble for it to execute upon. You might say that writing Java code, you don't program for the code to be executed on the physical machine, you write the code to be executed on the Java Virtual Machine.
The only way that all this Java bytecode works on all Java virtual machines is that a rather strict standard has been written for how Java virtual machines work. This means that no matter what physical platform you are using, the part where the Java bytecode interfaces with the JVM is guaranteed to work only one way. Since all the JVMs work exactly the same, the same code works exactly the same everywhere without recompiling. If you can't pass the tests to make sure it's the same, you're not allowed to call your virtual machine a "Java virtual machine".
Of course, there are ways that you can break the portability of a Java program. You could write a program that looks for files only found on one operating system (cmd.exe for example). You could use JNI, which effectively allows you to put compiled C or C++ code into a class. You could use conventions that only work for a certain operating system (like assuming ":" separates directories). But you are guaranteed to never have to recompile your program for a different machine unless you're doing something really special (like JNI).
Source: How is Java platform-independent when it needs a JVM to run?

In the Java programming language, all source code is first written in plain text files ending with the .java extension. Those source files are then compiled into .class files by the javac compiler. A .class file does not contain code that is native to your processor; it instead contains bytecodes — the machine language of the Java Virtual Machine1 (Java VM). The java launcher tool then runs your application with an instance of the Java Virtual Machine.
please read this doc by oracle
.class files(i.e byte code) are same but not the JVM ,It differs with OS.

There are different Java Virtual Machine binaries for each operating system. They can run the same Java bytecode (classes, jars), but the implementations is sometimes different.
For example, all the window handling code is implemented differently on Windows, Unix and Mac. Each of them will call the OS native calls to open a window or to draw something. You don't have to care about it, because the Virtual Machine automatically does it.

Does javac perform any bytecode level optimizations depending on the underlying operating system?

My understanding is that the Java bytecode produced by invoking javac is independent of the underlying operating system, but the HotSpot compiler will perform platform-specific JIT optimizations and compilations as the program is running.
However, I compiled code on Windows under a 32 bit JDK and executed it on Solaris under a 32 bit JVM (neither OS is a 64 bit operating system). The Solaris x86 box, to the best of my knowledge (working to confirm the specs on it) should outperform the Windows box in all regards (number of cores, amount of RAM, hard disk latency, processor speed, and so on). However, the same code is running measurably faster on Windows (a single data point would be a 7.5 second operation on Windows taking over 10 seconds on Solaris) on a consistent basis. My next test would be to compile on Solaris and note performance differences, but that just doesn't make sense to me, and I couldn't find any Oracle documentation that would explain what I'm seeing.
Given the same version (major, minor, release, etc.) of the JVM on two different operating systems, would invoking javac on the same source files result in different optimizations within the Java bytecode (the .class files produced)? Is there any documentation that explains this behavior?

No. javac does not do any optimizations on different platforms.
See the oracle "tools" page (where javac and other tools are described):
Each of the development tools comes in a Microsoft Windows version (Windows) and a Solaris or Linux version. There is virtually no difference in features between versions. However, there are minor differences in configuration and usage to accommodate the special requirements of each operating system. (For example, the way you specify directory separators depends on the OS.)
(Maybe the Solaris JVM is slower than the windows JVM?)

The compilation output should not depend on OS on which javac was called.
If you want to verify it try:
me#windows# javac Main.java
me#windows# javap Main.class > Main.win.txt
me#linux# javac Main.java
me#linux# javap Main.class > Main.lin.txt
diff Main.win.txt Main.lin.txt

I decided to google it anyway. ;)
http://java.sun.com/docs/white/platform/javaplatform.doc1.html
The Java Platform is a new software platform for delivering and running highly interactive, dynamic, and secure applets and applications on networked computer systems. But what sets the Java Platform apart is that it sits on top of these other platforms, and executes bytecodes, which are not specific to any physical machine, but are machine instructions for a virtual machine. A program written in the Java Language compiles to a bytecode file that can run wherever the Java Platform is present, on any underlying operating system. In other words, the same exact file can run on any operating system that is running the Java Platform. This portability is possible because at the core of the Java Platform is the Java Virtual Machine.
Written April 30, 1996.
A common mistake, esp if you have developed for C/C++, is to assume that the compiler optimises the code. It does one and only one optimisation which is to evaluate compiler time known constants.
It is certainly true that the compiler is no where near as powerful as you might imagine because it just validates the code and produces byte-code which matches your code as closely as possible.
This is because the byte-code is for an idealised virtual machine which in theory doesn't need any optimisations. Hopefully when you think about it that way it makes sense that the compiler does do anything much, it doesn't know how the code will actually be used.
Instead all the optimisation is performed by the JIT in the JVM. This is entirely platform dependant and can produce 32-bit or 64-bit code and use the exact instruction of the processor running the code. It will also optimise the code based on how it is actually used, something a static compiler cannot do. It means the code can be re-compiled more than once based on different usage patterns. ;)

To my understanding javac only consideres the -target argument to decide what bytecode to emit, hence there is no platform specific in the byte code generation.
All the optimization is done by the JVM, not the compiler, when interpreting the byte codes. This is specific to the individual platform.
Also I've read somewhere that the Solaris JVM is the reference implementation, and then it is ported to Windows. Hence the Windows version is more optimzied than the Solaris one.

Does javac perform any bytecode level optimizations depending on the
underlying operating system?
No.
Determining why the performance characteristics of your program are different on two platforms requires profiling them under the same workload, and careful analysis of method execution times and memory allocation/gc behaivor. Does your program do any I/O?

To extend on dacwe's part "Maybe the Solaris JVM is slower than the windows JVM?"
There are configuration options (e.g., whether to use the client or server vm [link], and probably others as well), whose defaults differ depending on the OS. So that might be a reason why the Solaris VM is slower here.

How Java Language Works

My knowledge of Java isn't great, so I want to ask how the language works. By which I mean not just the "Language" but the Virtual Machine as well.
Here is my understanding.
Java compiler turns code into Java Byte-Code. in the form of a .java file
when the file is run, the JVM reads (just in time) the byte-code and turns it into machine code.
Computer reads the machine code and the program appears to run like a compiled program (to the user).
Is this hopelessly wrong?

There are already many answers, but I'm missing one important point:
"2. when the file is run, the JVM reads (just in time) the byte-code and turns it into machine code."
This is not quite correct.
The JVM starts by interpreting the code
It looks at the most time consuming parts, the hot spots
It analysis the traces, i.e., the typical execution flow
It generates machine code optimized for the hot spots and the traces
The less time-consuming parts of code may stay interpreted. If the situation changes (e.g., by loading a new class), some compiled code may show to be not optimal anymore or even incorrect, and it gets thrown away and the JVM reverts to interpreting for a while, then it re-compiles it again.

A Java Virtual Machine (JVM) is the software, which interprets compiled Java byte code and runs the java program. Java Virtual Machine language conceptually represents the instruction set of a stack-oriented, capability architecture.
Java Virtual Machine does not have any information regarding the programming languages. JVM knows only binary byte code format. Programmer can generate the bytecode that adheres to this format in any of the programming languages. Every java program runs within the boundaries defined by the Java Virtual Machine.
The code of java runs inside the JVM cannot go beyond the security constraints defined by Java Virtual Machine. Java applications are considered as secure applications on internet due to this software.

http://en.wikipedia.org/wiki/Java_bytecode
http://en.wikipedia.org/wiki/Java_virtual_machine

Your understanding is correct. I'd like to add the below
The HotSpot compiler also adaptively compiles Java bytecodes into optimized machine instructions

Almost:
the Java compiler creates .class files not .java files, which contain the byte code. .java files contain the source code.
the JVM (Java virtual machine) is like a (virtual) computer on its own. It interpretes the byte code. The OS only runs the JVM.
A JIT (just in time) compiler can compile part of the code to machine code for performance reasons, in which case the JVM delegates the execution of that code to the OS (I guess).

To be precise,
When you create a java class, the extension would be .java
During compilation, the compiler converts the code (.java file) to
.class (byte code).
When the code is run, JVM converts the byte code (.class file) to
Machine code that can be interpreted by the OS. By doing so, it makes
Java as platform independent and JVM as platform dependent.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.