I'm confused about a certain topic:
When you compile Java or Python, you get bytecode which will run on the respective VMs. In a previous question I had asked why, when you open a .pyc or .class file in a text editor, it appears as gibberish and not like readable bytecode (LOAD, STORE operations etc).
Now, the answer I got at the time was based around the argument "That's like opening an .exe file and expecting to see x86 assembly", and they made the analogy that the bytecode I've seen is the "assembly" version of the real bytecode, which is not readable.
This would be okay and make sense if not for one thing. You can't compare an exe file to a bytecode file. An exe file is ALREADY compiled to machine code. A bytecode file is NOT. A bytecode file is fed to a VM which then interprets it (usually with JIT).
That means that whoever wrote the JVM, for instance (which is just a piece of software itself), would need to write a bytecode interpreter. And I really doubt they wrote an interpreter to handle the following:
Java .class file:
I could be wrong and maybe they DID write an interpreter to handle this form of bytecode for some odd reason, but it doesn't seem likely. However, if the JVM handles the "assembly" version of the bytecode, then that would mean the cycle is
.java -> .class (unreadable) -> .class (readable right as it enters the JVM)
There's an almost meaningless step in between.
I'm just really confused at this point.
They did write an interpreter for this form of bytecode. They read it as bytes, of course, not ASCII characters, which makes it more usable. For example, each instruction's opcode takes only one byte, rather than the five bytes it would take to spell out "store" in ASCII.
The goal was to have something compact in memory usage, but not actually compiled to machine code that would be specific to only one device. Java bytecode is more or less its own form of machine code.
If you would like to read it, however, use the javap command (with the -c option) to disassemble it to a more readable form.
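To make the "one byte per instruction" point concrete, here is a minimal sketch of a disassembler. The opcode values in the table are real JVM opcodes, but the class name TinyDisassembler and its methods are made up for illustration, and it only handles a handful of instructions that take no operands. A real tool like javap also has to parse the rest of the class file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TinyDisassembler {
    // A few real JVM opcodes; the full instruction set is much larger.
    private static final Map<Integer, String> MNEMONICS = Map.of(
            0x04, "iconst_1",
            0x05, "iconst_2",
            0x3c, "istore_1",
            0x3d, "istore_2",
            0x1b, "iload_1",
            0x1c, "iload_2",
            0x60, "iadd",
            0xac, "ireturn");

    // Translates each byte into its mnemonic (operand-less instructions only).
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        for (byte b : code) {
            int opcode = b & 0xff; // Java bytes are signed; mask to 0..255
            out.add(MNEMONICS.getOrDefault(opcode,
                    String.format("<unknown 0x%02x>", opcode)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Eight bytes in the file, eight instructions for the JVM.
        byte[] code = {0x04, 0x3c, 0x05, 0x3d, 0x1b, 0x1c, 0x60, (byte) 0xac};
        disassemble(code).forEach(System.out::println);
    }
}
```

Opened in a text editor, those eight bytes would look like gibberish, because the editor tries to display them as characters instead of looking them up in a table like this one.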
Bytecode is the "machine code" for a virtual machine. As such, it has much the same goals and restrictions as "real" machine code - compact, efficient decoding, etc.
The fact that bytecode is executed by a virtual machine rather than by a "real" machine is not particularly relevant.
A compilation process takes source code to be translated into machine language.
So far, I have mentioned .java files (source code) and .exe files (0s and 1s).
So, What is the purpose of .class files that seem to be in the middle of the entire process?
We already have executable files coming from java source code and they don't depend on these .class files.
I know .class files are also created after compilation, but their content (which I don't know) differs from .exe files because they can only run with a java compiler and not as an independent application like an executable file will do.
Thank you!
Java was created to be platform independent. .class files contain JVM (Java Virtual Machine) code instead of normal machine code as found in .exes. The JVM is idealized, with features no real processor actually has, but it is less complex than the language of Java itself and is an abstraction layer above the platform specific hardware.
When a Java program is run, the JVM interprets the Java bytecode so that it can actually run on the CPU. The initial javac compilation relieves the JVM from having to compile the entire program on execution and hampering performance, while the abstraction lets the code run anywhere as long as a JVM is installed.
This is in contrast to .exe files, which contain platform specific machine code directly. Windows itself provides abstraction that keeps things from breaking, but because the code is still platform specific it won't work on non-Windows systems and there are separate versions for 32- and 64-bit systems (unless, of course, emulators are used).
The bytecode interpretation the JVM does is still very expensive compared to .exe files loading machine code directly. Therefore, if some part of the code is called enough, the JIT (Just-In-Time compiler) will kick in and compile the JVM code directly to platform specific machine code and store that in memory to increase performance in these active areas. If that area of code is used enough, the JIT does optimization on that code too, and because it can run profiling on running code it can optimize things with assumptions .exe compilers can't make. This is why many large Java programs have a warmup time before they start working more efficiently.
I am trying to understand how .class files work in java and what's their purpose. I found some information online, but I get unsatisfying explanations.
As soon as we run the compiler we get the .class file, which is bytecode. Is this machine readable or not? And if not, is this why we need the interpreter for the program to run successfully?
Also, since the .class file is the equivalent of our .java program, why can't somebody run a Java program straight away by just running the .class file in the VM? Or would they need to have the .java file as well?
The JVM is by definition a virtual machine, that is a software machine that simulates what a real machine does. Like real machines it has an instruction set (the bytecodes), a virtual computer architecture and an execution model. It is capable of running code written with this virtual instruction set, pretty much like a real machine can run machine code.
So, the class files contain instructions in the virtual instruction set, and the JVM is capable of running them. For that matter, a virtual machine can either interpret the code itself or compile it for the hardware architecture it is currently running on. Some do both, some do just one of them (e.g. the .NET runtime compiles a method once, the first time it is called).
For instance, the Java HotSpot initially interprets bytecodes, and progressively compiles the code into machine code. This is called adaptive optimization. Some virtual machines always compile to machine code directly.
So, you can see there are two different "compiling concepts". One consists in the transformation of Java code to JVM bytecodes (From .java to .class). And a second compilation phase happens when the program runs, where the bytecodes may either be interpreted or compiled to actual machine code. This is done by the just-in-time compiler, within the JVM.
So, as you can see, a computer cannot run a Java program directly because the program is not written in a language that the computer understands. It is written in a lingua franca that all JVM implementations can understand. And there are implementations of the JVM for many operating systems and hardware architectures. These JVMs translate programs in this lingua franca (bytecodes) for any particular hardware (machine code). That's the beauty of the virtual machine.
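The idea of a software machine with its own instruction set can be sketched in a few lines. The following is a toy stack machine, not the JVM: the opcodes PUSH, ADD, MUL and HALT and the class name ToyStackMachine are invented for this example, and it only shows the interpreting half of the story (no compilation to machine code).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyStackMachine {
    // Hypothetical opcodes for this sketch only; these are not JVM opcodes.
    static final byte PUSH = 1; // followed by one operand byte
    static final byte ADD = 2;
    static final byte MUL = 3;
    static final byte HALT = 4;

    // Fetch-decode-execute loop over the program's bytes.
    static int run(byte[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next byte to read
        while (true) {
            byte opcode = program[pc++];
            switch (opcode) {
                case PUSH:
                    stack.push((int) program[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4
        byte[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

The same byte array gives the same result on any platform that has an implementation of run, which is the portability argument in miniature.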
The .class file is machine-readable. The machine that reads it is the Java Virtual Machine, which interprets it and compiles it to native code (executable by your computer).
You don't need the .java files to run Java code. The .class files are all you need.
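To illustrate that a .class file is an ordinary binary format, here is a small sketch that reads the fixed header every class file starts with: the magic number 0xCAFEBABE, then a two-byte minor version and a two-byte major version. The class name ClassFileHeader and the method majorVersion are invented for this example, and the sample bytes in main are written by hand rather than read from a real file.

```java
import java.nio.ByteBuffer;

public class ClassFileHeader {
    // Returns the major version from the first 8 bytes of a class file.
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {             // u4 magic
            throw new IllegalArgumentException("Not a class file");
        }
        buf.getShort();                                // u2 minor_version (ignored)
        return buf.getShort() & 0xffff;                // u2 major_version
    }

    public static void main(String[] args) {
        // Hand-written header: magic, minor version 0, major version 52.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(majorVersion(header)); // prints 52 (the version used by Java 8)
    }
}
```

To try it on a real file, the bytes could come from java.nio.file.Files.readAllBytes on the path of any compiled class.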
It's machine readable, but does not execute on the bare hardware. It's run through the Java Virtual Machine, which is an interpreter with a very high performance just-in-time compiler. There are good reasons to have the interpreter only use the class file's bytecode. Briefly they are:
Easier to build the interpreter since the bytecode is much closer to instructions that can be turned into native machine code by the JIT.
Easier to resolve dependencies since the Java compiler does some syntactic sugar on them through the import statement.
Java bytecode (.class file) is not directly executable.
It's an intermediate language that is interpreted by the underlying Java Virtual Machine. Of course some optimizations can happen (e.g. just-in-time compilation).
To run a Java program you only need the bytecode files; the .java files contain the source code.
Compiler vs. Interpreter:
Compiler | Interpreter
Takes an entire program as input | Takes single instructions as input
Intermediate object code is generated | No intermediate object code is generated
Conditional control statements are executed faster | Conditional control statements are executed slower
Memory requirement: more (since object code is generated) | Memory requirement: less
Program need not be compiled every time | Every time, the higher level program is converted into a lower level program
Errors are displayed after the entire program is checked | Errors are displayed for every instruction interpreted (if any)
Example: C compiler | Example: BASIC
I read the following articles:
http://searchcio-midmarket.techtarget.com/definition/just-in-time-compiler
http://javarevisited.blogspot.in/2011/12/jre-jvm-jdk-jit-in-java-programming.html
I am now really interested in knowing what will happen when I run a class. JIT compiles the byte code again and then ???
Will this compiled code be converted into an .exe by the JVM?
Like the others said: JIT does not mean the code is compiled to a binary executable (.exe). However, an interesting application that you may consider is Excelsior JET.
I haven't read too much about it and haven't used it, so I don't know exactly how it works... yet. But according to its webpage, it's an AOT (Ahead-Of-Time) compiler. This means that it will compile your .class files to a system-dependent binary file.
You should give it a try and see how it performs. According to the website, you get a free license if your project is non-commercial in nature.
Java Compiler compiles plain-text Java code into JVM bytecode. http://en.wikipedia.org/wiki/Java_bytecode
The JVM has a HotSpot optimizer that evaluates the code for "hot spots" (basically, code that will be used the most) and pays special attention to those spots when using the CPU cache. It may also flag those spots for the JVM to recompile to native machine code, and this is called JIT compilation.
JVM is essentially a virtual machine that runs a JVM bytecode interpreter.
There is never a direct .exe. It is a Windows/C/C++ thing, mostly.
No, the code is NOT "compiled" into an "exe".
The program is stored in memory as bytecode, but the code segment currently running is compiled to physical machine code beforehand so that it runs faster.
I'll go out on a limb and say that JIT is a type of interpreter, designed to improve the speed of commonly used branches of code (at least that was my interpretation 10 years ago).
JIT compilers represent a hybrid approach, with translation occurring continuously, as with interpreters, but with caching of translated code to minimize performance degradation. It also offers other advantages over statically compiled code at development time, such as handling of late-bound data types and the ability to enforce security guarantees.
My knowledge of Java isn't great, so I want to ask how the language works. By which I mean not just the "Language" but the Virtual Machine as well.
Here is my understanding.
1. The Java compiler turns code into Java byte-code, in the form of a .java file.
2. When the file is run, the JVM reads (just in time) the byte-code and turns it into machine code.
3. The computer reads the machine code and the program appears to run like a compiled program (to the user).
Is this hopelessly wrong?
There are already many answers, but I'm missing one important point:
"2. when the file is run, the JVM reads (just in time) the byte-code and turns it into machine code."
This is not quite correct.
The JVM starts by interpreting the code
It looks at the most time consuming parts, the hot spots
It analyzes the traces, i.e., the typical execution flow
It generates machine code optimized for the hot spots and the traces
The less time-consuming parts of the code may stay interpreted. If the situation changes (e.g., by loading a new class), some compiled code may turn out to be no longer optimal, or even incorrect. That code gets thrown away, the JVM reverts to interpreting it for a while, and then compiles it again.
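The interpret, detect, compile and throw-away cycle described above can be sketched as a toy. Everything here is hypothetical: the class HotSpotSketch, the THRESHOLD of 3 calls, and the two lambdas standing in for the "interpreted" and "compiled" versions of a method. A real JVM uses much more elaborate profiling and generates actual machine code.

```java
import java.util.function.IntUnaryOperator;

public class HotSpotSketch {
    // Hypothetical threshold; real JVMs use larger, tunable values.
    static final int THRESHOLD = 3;

    private final IntUnaryOperator interpreted; // stands in for the slow path
    private final IntUnaryOperator compiled;    // stands in for generated machine code
    private int calls = 0;
    private boolean useCompiled = false;

    HotSpotSketch(IntUnaryOperator interpreted, IntUnaryOperator compiled) {
        this.interpreted = interpreted;
        this.compiled = compiled;
    }

    int invoke(int arg) {
        if (useCompiled) {
            return compiled.applyAsInt(arg);
        }
        if (++calls >= THRESHOLD) {
            useCompiled = true; // the method became "hot": switch for later calls
        }
        return interpreted.applyAsInt(arg);
    }

    // Models throwing compiled code away, e.g. after a new class is loaded.
    void invalidate() {
        useCompiled = false;
        calls = 0;
    }

    boolean isCompiled() {
        return useCompiled;
    }

    public static void main(String[] args) {
        HotSpotSketch square = new HotSpotSketch(x -> x * x, x -> x * x);
        for (int i = 0; i < 5; i++) {
            int result = square.invoke(i);
            System.out.println(result + " compiled=" + square.isCompiled());
        }
        square.invalidate(); // back to "interpreting" until it gets hot again
        System.out.println("after invalidate: compiled=" + square.isCompiled());
    }
}
```

Both paths must compute the same result; the only observable difference is supposed to be speed, which is why the switch can happen silently while the program runs.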
A Java Virtual Machine (JVM) is the software, which interprets compiled Java byte code and runs the java program. Java Virtual Machine language conceptually represents the instruction set of a stack-oriented, capability architecture.
The Java Virtual Machine does not have any information about the programming language. The JVM knows only the binary bytecode format. A programmer can generate bytecode that adheres to this format from any programming language. Every Java program runs within the boundaries defined by the Java Virtual Machine.
Java code that runs inside the JVM cannot go beyond the security constraints defined by the Java Virtual Machine. Java applications are considered secure applications on the internet because of this.
http://en.wikipedia.org/wiki/Java_bytecode
http://en.wikipedia.org/wiki/Java_virtual_machine
Your understanding is correct. I'd like to add the following:
The HotSpot compiler also adaptively compiles Java bytecodes into optimized machine instructions.
Almost:
The Java compiler creates .class files, which contain the byte code, not .java files. The .java files contain the source code.
The JVM (Java virtual machine) is like a (virtual) computer on its own. It interprets the byte code. The OS only runs the JVM.
A JIT (just in time) compiler can compile part of the code to machine code for performance reasons, in which case the JVM delegates the execution of that code to the OS (I guess).
To be precise,
When you create a Java class, the file extension is .java.
During compilation, the compiler converts the code (.java file) to .class (byte code).
When the code is run, the JVM converts the byte code (.class file) to machine code that the underlying platform can execute. This makes Java platform independent and the JVM platform dependent.
I've been learning compiler theory and assembly and have managed to create a compiler that generates x86 assembly code.
How can I take this assembly code and turn it into a .exe? Is there some magical API or tool I have to interact with? Or is it simpler than I think?
I'm not really sure what's in a .exe, or how much abstraction lies between assembly code and the .exe itself.
My 'compiler' was written in Java, but I'd like to know how to do this in C++ as well.
Note that if I take the generated assembly, it compiles to a .exe just fine for example with vc++.
Edit: To be more precise, I already know how to compile assembly code using a compiler. What I want is to have my program basically output a .exe.
It looks like you need to spawn an assembler process and a linker process. On UNIX, it's as simple as invoking the fork() function, which would create a new process, and the exec() function, specifying the assembler and the linker executable names as the function's parameter, with suitable arguments to those executables, which would be the names of your generated assembly and object files. That's all you'd need to do on a UNIX system.
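Since the compiler in the question is written in Java, the same idea can be sketched with ProcessBuilder instead of fork() and exec(). This is only a sketch under assumptions: it supposes that the nasm assembler and the ld linker are installed and on the PATH, that the generated assembly is in NASM syntax, and that a 64-bit ELF target is wanted. The file names out.asm, out.o and out, and the class name BuildExecutable, are placeholders.

```java
import java.io.IOException;
import java.util.List;

public class BuildExecutable {
    // Assumed tool and flags: NASM producing a 64-bit ELF object file.
    static List<String> assembleCommand(String asmFile, String objFile) {
        return List.of("nasm", "-f", "elf64", asmFile, "-o", objFile);
    }

    // Assumed tool: the system linker, turning the object file into an executable.
    static List<String> linkCommand(String objFile, String exeFile) {
        return List.of("ld", objFile, "-o", exeFile);
    }

    // Spawns the external process and waits for it to finish.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // show the tool's own messages on our console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " failed with exit code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        run(assembleCommand("out.asm", "out.o"));
        run(linkCommand("out.o", "out"));
    }
}
```

On Windows the structure would be the same, with the command lists swapped for whichever assembler and linker are available there.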
Normally you use an assembler and a linker to create an exe file. There is no magic involved. The different parts are assembled, a header and other boilerplate is added so the OS knows where the bootstrap code is located for the program and to organize the memory.
The VC++ compiler does that under the hood. You might consider playing a bit on Linux, as you can better see the machinery working on Unix platforms. It is fundamentally the same, but it's just difficult to look through the UI on Windows.
In principle, each line of assembly corresponds to one machine instruction, which is just a few bytes in the exe file. You can find the encodings in the specs for the processor. So you can write your own assembler if you know the codes and the exe format (header, relocation info, etc.).
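As a small illustration of "a few bytes per instruction", the sketch below encodes two x86 instructions by hand: mov eax, imm32 (opcode 0xB8 followed by the 32-bit value in little-endian order) and ret (0xC3). The class TinyAssembler and its method names are invented for this example. It produces only the raw instruction bytes; the exe header, sections and relocation info that the OS needs are not covered here.

```java
import java.io.ByteArrayOutputStream;

public class TinyAssembler {
    // Encodes "mov eax, imm32": opcode 0xB8, then the value in little-endian order.
    static byte[] movEaxImm32(int value) {
        return new byte[] {
            (byte) 0xB8,
            (byte) value,
            (byte) (value >>> 8),
            (byte) (value >>> 16),
            (byte) (value >>> 24)
        };
    }

    // Encodes "ret" (near return): the single byte 0xC3.
    static byte[] ret() {
        return new byte[] {(byte) 0xC3};
    }

    // Joins the encoded instructions into one code buffer.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // mov eax, 42
        // ret
        byte[] code = concat(movEaxImm32(42), ret());
        StringBuilder hex = new StringBuilder();
        for (byte b : code) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim()); // prints b8 2a 00 00 00 c3
    }
}
```

A full assembler is mostly a much larger version of this lookup, plus label resolution and the code that writes the executable file format around the instruction bytes.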
How Do Assemblers Map x86 Instruction Mnemonics to Binary Machine Instructions?