Compilation vs translation, "compiling" Java to bytecode?

Compilation vs translation, "compiling" Java to bytecode? - java

My understanding is like this, definitions:
Translation - having code in some language, generating code in some other language.
Compilaton - translation to a machine code.
Machine code - direct instructions for CPU.
Now, from docs.oracle.com:
javac - Java programming language compiler
Compiler...? I think it is Java translator, because it is generating a code, that is not a machine code. Bytecode needs interpreter (JVM) to run, so it's definetely not a machine code.
From Wikipedia:
Java applications are typically compiled to bytecode
Similarly. According to definitions, I would say that Java is traslated to bytecode. There are many more examples on the Internet, I think there is confusion about that or I'm just missing something.
Could you please clarify this?
What is the difference between translation and compilation?

It's all a matter of definitions, and there's no single accepted definition for what "compilation" means. In your eyes, compilation is transforming a source code in some language to native; so a transformation process which doesn't generate machine code shouldn't be called "compilation". In my eyes (and apparently, the javac documentation writers' eyes as well), it should.
There are actually a lot of different terms: translation, compilation, decompilation, assembly, disassembly, and more.
Personally, I'd say it makes sense to group all of these terms under "compilation", because all these processes have a lot in common:
They transform code in one formal language to code in another formal language.
They try to preserve the semantics of the input code as much as possible.
They all have a very similar design to each other, with a front-end, a back-end, and a possible optimizer in the middle (learn more about compiler structure here). I've seen the entrails of both javac and native compilers and they are relatively similar.
In addition, your definition of "produces native code" is problematic:
What about compilers that can generate assembly but don't bother transforming that to machine code, leaving this to an external program (commonly called "assembler")? Would you deny them this definition of "compilers" because of that last, insignificant-in-comparison step?
How do you even classify "machine code"? What if tomorrow a processor which can run Java Bytecode natively is created?
But these are just my opinions. I think that out there, the most accepted definitions are that:
Compilation is transforming code in a higher-level language to a lower-level one. Examples: Java to Java Bytecode, or C to x86 machine code.
Decompilation is transforming a code in a lower-level language to a higher-level one - in effect, the opposite of compilation. Examples: Java Bytecode to Java.
Translation or source-to-source compilation is transforming a code in some language to another language of comparable "level". Examples: ARM to x86, or C to Java. When the two languages are actually different versions of the same language (e.g. Javascript 6 to Javascript 5), the term transpiler is also used.
Assembly is transforming code in some assembly language to machine code.
Disassembly is either a synonym to decompilation or the opposite of assembly, depending on the context.
Under these definitions, javac could definitely be considered as a compiler. But again, it's all in the definitions: from a technical standpoint, many of these actions have a lot in common.

The result of javac is machine code. The fact that the machine is virtual and not physical is not relevant (otherwise, you could argue that compiling code in x86 was translation if you were a Mac user, since the x86 code is not Mac machine code).

"A compiler is a computer program (or set of programs) that transforms
source code written in a programming language (the source language)
into another computer language (the target language, often having a
binary form known as object code)."
http://en.wikipedia.org/wiki/Compiler
So no a compilation doesn't mean that the output is in machine code.
For example early C++ compilation would generate a C program, that then needed to be compiled again into machine code. Of course any good compiler would hide these separate steps from the user, but they are still there.
Nowday I know at least of the NesC compiler that does the same procedure.
A machine that runs JVM bytecode can be built, actually some chapters of Structured Computer Organization, from A. Tanenbaum describe how to do that.
http://www.amazon.com/Structured-Computer-Organization-5th-Edition/dp/0131485210

Compiler...? I think it is Java translator, because it is generating a code, that is not a machine code. Bytecode needs interpreter (JVM) to run, so it's definetely not a machine code.
The JVM is the Java Virtual Machine, which is a machine, and its machine code is called Java byte code. (Each "byte" in byte code is a JVM machine instruction.)
You can learn more by reading the JVM specification. Start here: https://docs.oracle.com/javase/specs/
Also, the JVM is not an interpreter; it is a definition of a machine. Some implementations of the JVM include just-in-time (JIT) compilers, ahead-of-time (AOT) compilers, adaptive compilers (e.g. Hotspot), and yes, even interpreters (although I haven't personally seen a Java interpreter in 20 years now).
What most people starting out in compilers don't understand is that there are no compilers anymore that generate "machine code". They all generate some intermediate forms (as defined by a particular OS, for example) that are then loaded and munged by the OS.
For the most part today, no compiled program is capable of being executed without a large, bloated OS to munge it and slice it and glue it into pieces that the OS owns and manages.
Not even C compiles code into actual machine executables nowadays without coloring outside the lines. (Write an actual, working boot loader from scratch, and get back to me.)

Related

would conversion from java code to jvm byte code considered compiling or transpiling?

Compiling is a process to conversion from one level of abstraction to lower level. Meanwhile transpiling is a process of conversion from one level of abstraction to another at same level like converting java code to Kotlin/python. That is my understanding of the two process. Could someone please explain it in terms of java code and jvm byte code. And is my inference correct?

Higher/Lower levels of abstraction from human language to machine language
A compiler translates from a higher level language to a lower level language. By higher/lower we mean how abstracted from machine language. So this would include Java language to bytecode. Bytecode is closer to machine language, and further away from human language.
A transpiler converts between languages of comparable levels of abstraction. Converting from EcmaScript 6 to EcmaScript 5 for compatibility with older web browsers would be one example. Converting from Java language to Kotlin would be another, or Swift to Kotlin.
See Wikipedia: https://en.wikipedia.org/wiki/Source-to-source_compiler
Intermediate representation
In particular, bytecode compiled from Java language and bitcode compiled via LLVM (from Swift, Rust, etc) are known as Intermediate Representation (IR). An IR is designed for further processing, optimizing, and translating on its way to becoming machine language.

Would conversion from java code to jvm byte code considered compiling or transpiling?
It is compiling, according to the definitions that you give in your question. The bytecode instruction set is at a lower level of abstraction than Java source code.
Having said that, this distinction between compiling and transpiling is a bit nebulous, since there is no clear definition of a "level of abstraction".
For example, one could argue that since C is sometimes called "high level assembly language", C++ is at a higher level of abstraction than C. So does that make C++ -> C conversion compilation? Or are they at the same level of abstraction, which makes C++ -> C conversion transpilation? This is not totally academic because the first implementation of the C++ language did translate C++ to C source code ... IIRC.

It works as explained below:
Firstly java source code is converted into Bytecode file by the translator named “Compiler”. The byte code file gets name with .class extension and javac (java compiler) is the tool to compile the .java file.
Then,
java is a tool use to invoke Java Interpreter “JVM”. Now, the work of JVM starts. When JVM invoke,
a subprogram in JVM called Class loader (or system class loader) starts and load the bytecode into OS memory( or RAM).
another subprogram Bytecode Verifier verify and ensure that the code do not violate the security rules. That’s why the java program is much secured and virus free.
Then last subprogram Execution Engine finally converts bytecodes into machine code. The name of that engine are in use today is JIT Just In Compiler.
You can read about the same here : https://www.quora.com/How-does-the-Java-interpreter-JVM-convert-bytecode-into-machine-code

Why does Java code need to be compiled but JavaScript code does not

How come code written in Java needs to be compiled in byte-code that is interpreted by the JVM, but code written in a language like JavaScript does not need to be compiled and can run directly in a browser?
Is there an easy way to understand this?
What is the fundamental difference between the way these two languages are written, that may help to understand this behavior?
I am not a CS student, so please excuse the naivete of the question.

Historically, JavaScript was an interpreted language. Which means an interpreter accepts the source code and executes it all in one step. The advantage here is simplicity and flexibility, but interpreters are very slow. Compilers convert the high level language into a lower level language that either the native processor or a VM (in this case, the Java VM) can execute directly. This is much faster.
JavaScript in modern browsers is now compiled on the fly. So when the script is loaded, the first thing the JavaScript engine does is compile it into a bytecode and then execute it. The reason the entire compilation step is missing from the end user's perspective is because browser developers have (thankfully) maintained the requirement that JavaScript is not explicitly compiled.
Java was from the getgo a language that always had an explicit compile step. But in many cases that's not true anymore. IDE's like IntelliJ or Eclipse can compile Java on the fly and in many cases remove the explicit compilation step.

JavaScript and Java are not the same thing. They might share a similar name, but I refer you to the JS guru - Douglas Crockford to help clear up the fact that they are really not related at all.
The reality is that there is nothing stopping Java being an interpreted language, and there's equally nothing stopping JavaScript being a compiled language (Chrome's javascript engine does do compilation to improve speed, and does a very good job of it).
In the context of the browser, Java runs in the same way as Flash or Silverlight - a plugin is required and the browser acts as a host to that plugin; which hosts a Java runtime environment.
Javascript was designed to be a scripting language for the browser, and that's why the browser can understand it natively. How the browser actually achieves the running of that code, however, is entirely up to the browser. That is - it can operate purely at a script level, assuming zero-knowledge of the next line of code and running a purely software-based stack; or it can perform some JIT to get the code closer to the hardware and (hopefully) improve speed.

Any language can be compiled and interpreted. In both cases, a piece of software has to read the source code, split it up, parse it, etc. to check certain requirements and then assign a meaning to every part of the program. The only difference is that the compiler then proceeds to generate code with (almost) the same meaning in another language (JVM bytecode, or JavaScript, or machine code, or something entirely else) while the interpreter carries out the meaning of the program immediately.
Now, in practice it's both simpler and more complicated. It's simpler in many languages lend themselves better to one of the two - Java is statically-typed and there is relatively little dynamic about the meaning of a program, so you can compile it and thus do some work which would otherwise need to be done at runtime. JavaScript is dynamically-typed and you can't decide a lot of things (such as whether + is addition or concatenation) until runtime, so compilation does not afford you much performance. However, a mix of compiler and interpreter (compile to simplified intermediate representation, then interpret and/or compile that) is increasingly popular among dynamic language implementations. And then there's the fact that modern JavaScript implementations do compile, and in fact V8 never interprets anything.

Because of the being complexity of compiling level between Java and Javascript, there are some limitations in Javascript. Since bytecode is executed on JVM platform which is written for specific OS and Hardware bytecode execution has more advantages to access system resources. Even C code can be embedded into Java bytecode aswell. On the other hand Since Javascript is only run on browser there is less to do with it.
there are two main part at Java platform. Java programming Language and JVM. It makes each part focus only their own area. That is why JVM doesnot dedal with Java programming syntax. It is similar to that when running C code linking does not deal with C code but assembly.
Bytecode in JVM platform is likely an assembly in C.
Eventually all representations are converted into binary representation and then the electirical signals somehow. It proves that we need programming levels.

Confused from Wiki: C# and Java are interpreted?

On the EN Wiki I read that both C# and Java are interpreted languages, however at least for C# I think it is not true.
Many interpreted languages are first compiled to some form of virtual
machine code, which is then either interpreted or compiled at runtime
to native code.
From my understanding, it is compiled into CIL and when run, using JIT its compiled to target platform. I have also read that JIT is an interpreter, is that really so?
Or are they called interpreted as they are using intermediate code? I do not understand it.
Thanks

JIT is a form of compilation to native (machine) code. Typically (but not as a necessity), implementations of either the CLI and JVM are compiled in two steps:
the language compiler compiles code to something intermediate (IL/bytecode)
the JIT compiles that to native/machine code at runtime
However, interpreters for both do exist. Micro Framework operates as an IL interpreter, for example. Equally, tools like (looking .NET here) NGEN and "AOT" (mono) allow compilation to native/machine code at the start.

They are considered JIT languages which is different from interpreting. JIT simply compiles to native code when needed during execution. The common strategy is to compile into an intermediate representation (bytecode) beforehand which makes the JIT faster.
However, there is nothing that prevents them from being interpreted, or even statically compiled. Languages are simply languages - how they are executed is irrelevant from a language perspective.

On the EN Wiki I read that both C# and Java are interpreted languages
Can you pls provide the link?
May be the interpreted word means different here. It perhaps means that these languages are first interpreted to convert source code into platform-independent code.(VM Specific)
are they called interpreted as they are using intermediate code
I too think so.
I have also read that JIT is an interpreter
JIT is a compiler. See this

Is something "interpreter" or not depends on context of discussion.
From purely abstract view interpreter can be defined as any intermediate program present in runtime which dynamically translates program code written in one language to a target code of hardware/software of other language. Think about runing java bytecode on x86 hardware, or running Python on CLR VM what exactly IronPython is. In this view every virtual machine is an interpreter of some kind. As it is program present in runtime it clearly differs from static compilers or hardware implemented VM-s.
Now there are many different ways to achieve this functionality where accent is on "dynamically" and "present in runtime".
In discussions where implementation of VM matters, people make clear distinction between "classical" interpreter and JIT-ed one. Classical interpreter is something which for every instruction of hosted program emits routine of target code. This design is simple to build, but hard to optimize. JIT-ed design reads bunch of instruction of original code, and then translates all those instructions to a one native compiled routine. So it "interprets" faster. It is like micro static compiler within VM. There are many different ways to accomplish behavior labeled as JIT, and then there are other approaches like tracing compilers.
Modern VM's like CLR, HotSpot and J9 JVM's are even more complex than to be tagged with simple labels as JIT or Interpreter. They can be at a same time static compilers (AOT execution), classical interpreters and JIT-ed VMs.
For example CLR can compile code Ahead-Of-Time (static compiler), and store native code as bunch of more or less excutable files on disk to be used for faster future startups of hosted program. I believe "ngen" is AOT process used in windows for this functionality. If AOT is not used CLR behaves as JIT VM.
J9 and HotSpot are able to switch in runtime between purely interpreted execution or JIT-ed on depending of code analysis and current load. So it's is quite gray area. J9 even has AOT functionality similar to CLR.
Some other VMs like Maxine JVM or PyPy are socalled "metacircular" VM. This means they are (mostly) implemented in a same language they host (Maxine is JVM written in Java). In order to provided good code they usually have some JIT like behavior implemented in host language which is than bootstrapped and optimized by a very low, close to machine, interpreter.
So actual definition of interpreter varies on context of discussion. When labels like JIT are used then there is clear accent of discussion to an implementation details of VM being discussed.

Is the JVM a compiler or an interpreter?

I have a very basic question about JVM: is it a compiler or an interpreter?
If it is an interpreter, then what about JIT compiler that exist inside the JVM?
If neither, then what exactly is the JVM? (I dont want the basic definition of jVM of converting byte code to machine specific code etc.)

First, let's have a clear idea of the following terms:
Javac is Java Compiler -- Compiles your Java code into Bytecode
JVM is Java Virtual Machine -- Runs/ Interprets/ translates Bytecode into Native Machine Code
JIT is Just In Time Compiler -- Compiles the given bytecode instruction sequence to machine code at runtime before executing it natively. Its main purpose is to do heavy optimizations in performance.
So now, Let's find answers to your questions:
JVM: is it a compiler or an interpreter?
An interpreter
What about JIT compiler that exist inside the JVM?
If you read this reply completely, you probably know it now.
What exactly is the JVM?
JVM is a virtual platform that resides on your RAM
Its component, Class loader loads the .class file into the RAM
The Byte code Verifier component in JVM checks if there are any access restriction violations in your code. (This is one of the principal reasons why java is secure)
Next, the Execution Engine component converts the Bytecode into executable machine code

It is a little of both, but neither in the traditional sense.
Modern JVMs take bytecode and compile it into native code when first needed. "JIT" in this context stands for "just in time." It acts as an interpreter from the outside, but really behind the scenes it is compiling into machine code.
The JVM should not be confused with the Java compiler, which compiles source code into bytecode. So it is not useful to consider it "a compiler" but rather to know that in the background it does do some compilation.

Like #delnan already stated in the comment section, it's neither.
JVM is an abstract machine running Java bytecode.
JVM has several implementations:
HotSpot (interpreter + JIT compiler)
Dalvik (interpreter + JIT compiler)
ART (AOT compiler + JIT compiler)
GCJ (AOT compiler)
JamVM (interpreter)
...and many others.
Most of the others answers when talking about JVM refer either to HotSpot or
some mixture of the above approaches to implementing the JVM.

It is both. It starts by interpreting bytecode and can (should it decide it is worth it) then compile that bytecode to native machine code.

It's both. It can interpret bytecode, and compile it to native code.

Javac is a compiler but not a traditional compiler.
A compiler typically converts source code to Machine level language for execution and that is done in a single shot i.e. entire code is taken and converted to machine level language at ONCE. (more on this below).
While, JavaC converts it to Bytecode instead of machine level language.
JIT is a Java compiler but also acts as an interpreter. A typical compiler will convert all the code at once from source code to machine level language. Instead, JIT goes line by line (line by line execution is a feature of Interpreters) and converts bytecode generated by JavaC  into machine level language and executes it. JVM which has JIT in it has multiple implementations. Hotspot being one of the major ones for Java programming. Hotspot implementation makes JIT optimize the execution by converting chunks of code which are repetitive into Machine level language at once (like a compiler as mentioned above) so that they can be executed faster instead of converting each line of code 1 by 1.
So, the answer is not Black and White with respect to the typical definitions of Compiler and Interpreter.
This is my understanding after reading several online answers, blogs, etc. If somebody has suggestions to improve this understanding, please feel free to suggest.

JVM have both compiler and interpreter. Because the compiler compiles the code and generates bytecode. After that the interpreter converts bytecode to machine understandable code.
Example: Write and compile a program and it runs on Windows. Take the .class file to another OS (Unix) and it will run because of interpreter that converts the bytecode to machine understandable code.

Is Java a Compiled or an Interpreted programming language ?

In the past I have used C++ as a programming language. I know that the code written in C++ goes through a compilation process until it becomes object code "machine code".
I would like to know how Java works in that respect. How is the user written Java code run by the computer?

Java implementations typically use a two-step compilation process. Java source code is compiled down to bytecode by the Java compiler. The bytecode is executed by a Java Virtual Machine (JVM). Modern JVMs use a technique called Just-in-Time (JIT) compilation to compile the bytecode to native instructions understood by hardware CPU on the fly at runtime.
Some implementations of JVM may choose to interpret the bytecode instead of JIT compiling it to machine code, and running it directly. While this is still considered an "interpreter," It's quite different from interpreters that read and execute the high level source code (i.e. in this case, Java source code is not interpreted directly, the bytecode, output of Java compiler, is.)
It is technically possible to compile Java down to native code ahead-of-time and run the resulting binary. It is also possible to interpret the Java code directly.
To summarize, depending on the execution environment, bytecode can be:
compiled ahead of time and executed as native code (similar to most C++ compilers)
compiled just-in-time and executed
interpreted
directly executed by a supported processor (bytecode is the native instruction set of some CPUs)

Code written in Java is:
First compiled to bytecode by a program called javac as shown in the left section of the image above;
Then, as shown in the right section of the above image, another program called java starts the Java runtime environment and it may compile and/or interpret the bytecode by using the Java Interpreter/JIT Compiler.
When does java interpret the bytecode and when does it compile it? The application code is initially interpreted, but the JVM monitors which sequences of bytecode are frequently executed and translates them to machine code for direct execution on the hardware. For bytecode which is executed only a few times, this saves the compilation time and reduces the initial latency; for frequently executed bytecode, JIT compilation is used to run at high speed, after an initial phase of slow interpretation. Additionally, since a program spends most time executing a minority of its code, the reduced compilation time is significant. Finally, during the initial code interpretation, execution statistics can be collected before compilation, which helps to perform better optimization.

The terms "interpreted language" or "compiled language" don't make sense, because any programming language can be interpreted and/or compiled.
As for the existing implementations of Java, most involve a compilation step to bytecode, so they involve compilation. The runtime also can load bytecode dynamically, so some form of a bytecode interpreter is always needed.
That interpreter may or may not in turn use compilation to native code internally.
These days partial just-in-time compilation is used for many languages which were once considered "interpreted", for example JavaScript.

Java is compiled to bytecode, which then goes into the Java VM, which interprets it.

Java is a compiled programming language, but rather than compile straight to executable machine code, it compiles to an intermediate binary form called JVM byte code. The byte code is then compiled and/or interpreted to run the program.

Kind of both. Firstly java compiled(some would prefer to say "translated") to bytecode, which then either compiled, or interpreted depending on mood of JIT.

Java does both compilation and interpretation,
In Java, programs are not compiled into executable files; they are compiled into bytecode (as discussed earlier), which the JVM (Java Virtual Machine) then interprets / executes at runtime. Java source code is compiled into bytecode when we use the javac compiler. The bytecode gets saved on the disk with the file extension .class.
When the program is to be run, the bytecode is converted the bytecode may be converted, using the just-in-time (JIT) compiler. The result is machine code which is then fed to the memory and is executed.
Javac is the Java Compiler which Compiles Java code into Bytecode. JVM is Java Virtual Machine which Runs/ Interprets/ translates Bytecode into Native Machine Code. In Java though it is considered as an interpreted language, It may use JIT (Just-in-Time) compilation when the bytecode is in the JVM. The JIT compiler reads the bytecodes in many sections (or in full, rarely) and compiles them dynamically into machine code so the program can run faster, and then cached and reused later without needing to be recompiled. So JIT compilation combines the speed of compiled code with the flexibility of interpretation.
An interpreted language is a type of programming language for which most of its implementations execute instructions directly and freely, without previously compiling a program into machine-language instructions. The interpreter executes the program directly, translating each statement into a sequence of one or more subroutines already compiled into machine code.
A compiled language is a programming language whose implementations are typically compilers (translators that generate machine code from source code), and not interpreters (step-by-step executors of source code, where no pre-runtime translation takes place)
In modern programming language implementations like in Java, it is increasingly popular for a platform to provide both options.

Java is a byte-compiled language targeting a platform called the Java Virtual Machine which is stack-based and has some very fast implementations on many platforms.

Quotation from: https://blogs.oracle.com/ask-arun/entry/run_your_java_applications_faster
Application developers can develop the application code on any of the various OS that are available in the market today. Java language is agnostic at this stage to the OS. The brilliant source code written by the Java Application developer now gets compiled to Java Byte code which in the Java terminology is referred to as Client Side compilation. This compilation to Java Byte code is what enables Java developers to ‘write once’. Java Byte code can run on any compatible OS and server, hence making the source code agnostic of OS/Server. Post Java Byte code creation, the interaction between the Java application and the underlying OS/Server is more intimate. The journey continues - The enterprise applications framework executes these Java Byte codes in a run time environment which is known as Java Virtual Machine (JVM) or Java Runtime Environment (JRE). The JVM has close ties to the underlying OS and Hardware because it leverages resources offered by the OS and the Server. Java Byte code is now compiled to a machine language executable code which is platform specific. This is referred to as Server side compilation.
So I would say Java is definitely a compiled language.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.