How exactly does java compilation take place? - java

Confused by java compilation process
OK, I know this: we write Java source code; the compiler, which is platform independent, translates it into bytecode; then the JVM, which is platform dependent, translates it into machine code.
So, from the start, we write Java source code. The compiler javac.exe is a .exe file. What exactly is this .exe file? Isn't the Java compiler written in Java? Then how come there is a .exe file which executes it? If the compiler code is written in Java, then how come the compiler code is executed at the compilation stage, since it's the job of the JVM to execute Java code? How can a language itself compile its own language code? It all seems like a chicken and egg problem to me.
Now what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, what is it?
Can anybody explain, clearly and in detail, how my Java source code gets converted into machine code?

OK i know this: We write java source code, the compiler which is platform independent translates it into bytecode,
Actually, the compiler is started through a native executable (hence javac.exe). And true, it transforms source files into bytecode. The bytecode is platform independent because it targets the Java Virtual Machine.
then the jvm which is platform dependent translates it into machine code.
Not always. Sun's JVM comes in two variants: client and server. Both can compile to native code, but neither necessarily has to.
So from start, we write java source code. The compiler javac.exe is a .exe file. What exactly is this .exe file? Isn't the java compiler written in java, then how come there is .exe file which executes it?
This exe file is a native wrapper around the compiler's Java bytecode. It exists for convenience, to avoid complicated batch scripts: it starts a JVM and runs the compiler inside it.
If the compiler code is written is java, then how come compiler code is executed at the compilation stage, since its the job of the jvm to execute java code.
That's exactly what wrapping code does.
How can a language itself compile its own language code? It all seems like chicken and egg problem to me.
True, it is confusing at first glance. It is not unique to Java, though: the Ada compiler is also written in Ada itself. It may look like a "chicken and egg problem", but in truth it is only a bootstrapping problem.
Now what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, what is it?
It's not an abstract syntax tree. The AST is used only by the parser and the compiler at compile time to represent the code in memory. A .class file is like assembled code, but for the JVM. The JVM, in turn, is an abstract machine that runs its own specialized machine language, one targeted only at the virtual machine. At its simplest, a .class file has a structure similar to an ordinary object file: first a header, then the constant pool (a table of constants and of the names and signatures of the classes, fields, and methods the code refers to), then the field declarations, and lastly the methods with their bytecode.
If you are really curious, you can dig into a class file using the "javap" utility. Here is sample (obfuscated) output of invoking javap -c Main:
0: new #2; //class SomeObject
3: dup
4: invokespecial #3; //Method SomeObject."<init>":()V
7: astore_1
8: aload_1
9: invokevirtual #4; //Method SomeObject.doSomething:()V
12: return
So you should already have an idea of what it really is.
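To make the binary layout a little more concrete, here is a minimal sketch that reads only the first few fixed-size fields of a class file: the magic number, the version, and the constant pool count. The class name ClassFileHeader and its helper are made up for this illustration, and the sketch assumes the class is a top-level class whose bytes are reachable as a resource (true for java.lang.Object on common JDKs).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the JDK.
final class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads the leading header fields of the class file that defines a top-level class. */
    static ClassFileHeader read(Class<?> type) {
        // Resolved relative to the class's own package, e.g. java/lang/Object.class.
        String resource = type.getSimpleName() + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalArgumentException("Class file not found for " + type);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();             // u4 magic
            int minor = in.readUnsignedShort();   // u2 minor_version
            int major = in.readUnsignedShort();   // u2 major_version
            int cpCount = in.readUnsignedShort(); // u2 constant_pool_count
            return new ClassFileHeader(magic, minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader header = read(Object.class);
        System.out.printf("magic=0x%08X version=%d.%d constant pool entries=%d%n",
                header.magic, header.majorVersion, header.minorVersion, header.constantPoolCount);
    }
}
```

Everything after these fields (the constant pool entries, fields, methods, and attributes) is variable-length, which is why a tool such as javap is the practical way to read the rest.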
can anybody tell me clear and detailed way about how my java source code gets converted in machine code.
I think it should be clearer by now, but here's a short summary:
You invoke javac, pointing it at your source file. The scanner and parser inside javac read your file and build an AST out of it. All syntax errors come from this stage.
javac hasn't finished its job yet. Once it has the AST, the true compilation can begin. It uses the visitor pattern to traverse the AST and resolves external dependencies to add meaning (semantics) to the code. The finished product is saved as a .class file containing bytecode.
Now it's time to run the thing. You invoke java with the name of the class (without the .class extension). The JVM starts again, this time to interpret your code. The JVM may or may not compile your bytecode into native machine code. Sun's HotSpot JVM uses just-in-time compilation to do so when needed: the running code is constantly profiled by the JVM and compiled to native code once certain thresholds are met. Most commonly, the hot (frequently executed) code is the first to be compiled natively.
Edit: Without the javac wrapper, one would have to invoke the compiler using something similar to this:
%JDK_HOME%/bin/java.exe -cp myclasspath com.sun.tools.javac.Main fileToCompile
As you can see, it calls Sun's private API, so it is bound to the Sun JDK implementation. Build systems would become dependent on it. If one switched to any other JDK (the wiki lists 5 other than Sun's), the command above would have to be updated to reflect the change (since it's unlikely that the compiler would reside in the com.sun.tools.javac package). Other compilers could even be written in native code.
So the standard way is to ship a javac wrapper with the JDK.
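For completeness, here is a sketch of reaching the same compiler from Java code through the standard javax.tools API (available since Java 6) instead of a vendor-specific package. The class name CompilerInProcess is invented for this example, and it assumes the program runs on a full JDK rather than a bare JRE.

```java
import java.io.ByteArrayOutputStream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo class; only the javax.tools calls are real JDK API.
final class CompilerInProcess {

    /** Asks the in-process compiler for its version banner and returns what it printed. */
    static String compilerVersion() {
        // Returns null when no compiler is available, e.g. on a JRE-only installation.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; is this a JDK?");
        }
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        // The trailing arguments are the same ones you would pass to javac on the command line.
        int exitCode = compiler.run(null, output, output, "-version");
        if (exitCode != 0) {
            throw new IllegalStateException("javac exited with code " + exitCode);
        }
        return output.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(compilerVersion());
    }
}
```

The compiler here runs inside the already running JVM, which illustrates that javac is an ordinary Java program once a JVM is available to host it.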

Isn't the java compiler written in java, then how come there is .exe file which executes it?
Where do you get this information from? The javac executable could be written in any programming language, it is irrelevant, all that is important is that it is an executable which turns .java files into .class files.
For details on the binary specification of a .class file you might find these chapters in the Java Language Specification useful (although possibly a bit technical):
Virtual Machine Startup
Loading of Classes and Interfaces
You can also take a look at the Virtual Machine Specification which covers:
The class file format
The Java Virtual Machine instruction set
Compiling for the Java Virtual Machine

The compiler javac.exe is a .exe file. What exactly is this .exe file? Isn't the java compiler written in java, then how come there is .exe file which executes it?
The Java compiler (at least the one that comes with the Sun/Oracle JDK) is indeed written in Java. javac.exe is just a launcher that processes the command line arguments, some of which are passed on to the JVM that runs the compiler, and others to the compiler itself.
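As a rough illustration of the argument handling described above, the sketch below separates options meant for the JVM from options meant for the compiler. It relies on the documented javac convention that -J<flag> passes <flag> to the runtime. The class LauncherArgs is hypothetical and is not the real launcher's source, which is native code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of what a launcher does with its command line.
final class LauncherArgs {
    final List<String> jvmOptions = new ArrayList<>();
    final List<String> compilerArgs = new ArrayList<>();

    /** Splits a javac-style command line into JVM options and compiler arguments. */
    static LauncherArgs split(String... commandLine) {
        LauncherArgs result = new LauncherArgs();
        for (String arg : commandLine) {
            if (arg.startsWith("-J") && arg.length() > 2) {
                // Strip the -J prefix; the remainder is handed to the JVM.
                result.jvmOptions.add(arg.substring(2));
            } else {
                // Everything else is for the compiler itself.
                result.compilerArgs.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LauncherArgs parsed = split("-J-Xmx512m", "-d", "out", "Main.java");
        System.out.println("JVM options:   " + parsed.jvmOptions);
        System.out.println("Compiler args: " + parsed.compilerArgs);
    }
}
```

A real launcher would then start a JVM with the first list and call the compiler's entry point with the second.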
If the compiler code is written in java, then how come compiler code is executed at the compilation stage, since its the job of the jvm to execute java code. How can a language itself compile its own language code? It all seems like chicken and egg problem to me.
Many (if not most) compilers are written in the language they compile. Obviously, at some early stage the compiler itself had to be compiled by something else, but after that "bootstrapping", any new version of the compiler can be compiled by an older version.
Now what exactly does the .class file contain? Is it a abstract syntax tree in text form, is it tabular information, what is it?
The details of the class file format are described in the Java Virtual Machine specification.

Well, javac and the JVM are typically native binaries. They're written in C or whatever. It's certainly possible to write them in Java; you just need a native version first. This is called "bootstrapping".
Fun fact: Most compilers that compile to native code are written in their own language. However, they all had to have a native version written in another language first (usually C). The first C compiler, by comparison, was written in Assembler. I presume that the first assembler was written in machine code. (Or, using butterflies ;)
.class files are bytecode generated by javac. They're not textual; they're binary code similar to machine code (but with a different instruction set and architecture).
The JVM, at run time, has two options: it can either interpret the bytecode (pretending to be a CPU itself), or it can JIT (just-in-time) compile it into native machine code. The latter is faster, of course, but more complex.
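One way to observe the second option is to ask the running JVM about its JIT compiler through the standard java.lang.management API. The sketch below is only an illustration: the class name JitPeek and the loop sizes are arbitrary choices, and the reported timings will differ between JVMs and runs.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class; the management beans are standard JDK API.
final class JitPeek {

    /** A small, frequently called method: a typical candidate for JIT compilation. */
    static long sumTo(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    /** Name of the JIT compiler, or a placeholder if the JVM only interprets. */
    static String jitName() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        return jit == null ? "none (interpreter only)" : jit.getName();
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        boolean timed = jit != null && jit.isCompilationTimeMonitoringSupported();
        long before = timed ? jit.getTotalCompilationTime() : -1;

        long checksum = 0;
        for (int round = 0; round < 20_000; round++) {
            checksum += sumTo(1_000); // make the method "hot"
        }

        System.out.println("JIT compiler: " + jitName());
        System.out.println("checksum: " + checksum);
        if (timed) {
            long after = jit.getTotalCompilationTime();
            System.out.println("ms spent compiling during the loop: " + (after - before));
        }
    }
}
```

On HotSpot, running the same program with the -Xint option forces pure interpretation, which makes the difference between the two modes easy to feel.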

The .class file contains bytecode which is sort of like very high-level Assembly. The compiler could very well be written in Java, but the JVM would have to be compiled to native code to avoid the chicken/egg problem. I believe it is written in C, as are the lower levels of the standard libraries. When the JVM runs, it performs just-in-time compilation to turn that bytecode into native instructions.

Short Explanation
Write the code in a text editor and save it in a format the compiler understands, that is, with the ".java" file extension. javac (the Java compiler) converts it into a ".class" file (bytecode). The JVM executes the .class file on the operating system it runs on.
Long Explanation
Always remember that Java is not a language the operating system recognizes natively. Java code is translated for the operating system by a translator called the Java Virtual Machine (JVM). The JVM can't understand the code that you write in an editor; it needs compiled code. This is where a compiler comes into the picture.
The compiler works on stored files, not on whatever is currently in your editor. We can't just write code in a text editor and compile it; we need to save it before compiling.
How will javac (the Java compiler) recognize the saved text as the one to be compiled? - There is a separate file format that the compiler recognizes, i.e. .java. Save the file with the .java extension and the compiler will recognize it and compile it when asked.
What happens while compiling? - The compiler is a second translator (not a technical term) involved in the process. It translates the language the user understands (Java) into the language the JVM understands (bytecode, in .class format).
What happens after compiling? - The compiler produces a .class file that the JVM understands. The program is then executed, i.e. the .class file is executed by the JVM on the operating system.
Facts you should know
1) Java is not multi-platform, it is platform independent.
2) The JVM is developed in C/C++, and Java code runs on top of it. This is one of the reasons people call Java a slower language than C/C++.
3) Java bytecode (.class) is a kind of "assembly language" for the JVM, the only language the JVM understands. Any code that compiles to .class files, or any tool that generates bytecode, can be run on the JVM.

Windows doesn't know how to invoke Java programs before installing a Java runtime, and Sun chose to have native commands which collect arguments and then invoke the JVM instead of binding the jar-suffix to the Java engine.

The compiler was originally written in C with bits of C++ and I assume that it still is (why do you think the compiler is written in Java as well?). javac.exe is just the C/C++ code that is the compiler.
As a side point, you could write the compiler in Java, but you're right, you have to avoid the chicken and egg problem. To do this you would typically write one or more bootstrapping tools in something like C to be able to compile the compiler.
The .class file contains the bytecodes, the output of the javac compilation process, and these are the instructions that tell the JVM what to do. At runtime these bytecodes are translated to native CPU instructions (machine code) so they can execute on the specific hardware under the JVM.
To complicate this a little, the JVM also optimises and caches machine code produced from the bytecodes to avoid repeatedly translating them. This is known as JIT compilation and occurs as the program is running and bytecodes are being interpreted.

.java file
compiler(JAVA BUILD)
.class(bytecode)
JVM(system software usually build with 'C')
OPERATING PLATFORM
PROCESSOR

Related

How to make sure my *jar or *class will run on MacOS?

Sorry if this is off-topic. I am a java beginner and know java is supposed to be cross-platform consistent. But I wonder if the fact that my jar file or *class file executes on Ubuntu guarantees it will run fine on MacOS?
I basically do the following to create *class and *jar. Two java classes, MAIN.java depends on SIDE.java and both java files include package classes; header (javac creates a folder named classes and puts MAIN.class and SIDE.class in there):
javac -d . SIDE.java MAIN.java
jar cvfe MAIN.jar classes/MAIN classes/*.class
I tried running both:
java classes.MAIN -read number.logs
and
java -jar MAIN.jar -read number.logs
and they both run fine on Ubuntu. Is this good enough and it will run on MacOS? (I don't have MacOS, is there a simulator that I could use in this case to check things?)
Yes, this is good enough, unless you rely on platform-specific features such as environment variables or OS-dependent values like
System.getProperty("os.name").
This cross platform portability is ensured by JVM. You can read more about it here.
Java is cross-platform for a reason. Your java code is translated into a java bytecode (your .class files) and the JVM (Java Virtual Machine) is a machine which is running your bytecode.
This JVM comes with the Java installation on your operating system. Thus if you can install the JRE (which includes the JVM) you can run your code.
There is one exception: Java sometimes relies on OS conventions for rendering graphics (the AWT classes), so a program that uses those classes can look different on another platform but will work the same.
How, for example, an operating system is saving files, is handled by the JVM and shouldn't concern you as a high-level developer unless you are running into restrictions from the OS-Side. Those restrictions are, depending on the case, handled by Exceptions. If you handle exceptions in your code, there is little to nothing that will not work on one os if it worked on another.
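As a small illustration of staying portable, the sketch below builds a file path through java.nio.file.Paths rather than hard-coding a separator, and only reads os.name for display. The class name PortablePaths and the directory name logs are assumptions made for this example; number.logs is the file name from the question.

```java
import java.nio.file.Paths;

// Hypothetical demo class for illustration.
final class PortablePaths {

    /** Joins a directory and a file name using the current platform's separator. */
    static String logFile(String directory, String fileName) {
        // Paths.get inserts "/" on Linux and macOS and "\" on Windows.
        return Paths.get(directory, fileName).toString();
    }

    public static void main(String[] args) {
        // Fine for diagnostics; branching on this value is where portability tends to break.
        System.out.println("Running on: " + System.getProperty("os.name"));
        System.out.println("Log file:   " + logFile("logs", "number.logs"));
    }
}
```

Code written this way behaves the same on Ubuntu and macOS without any platform checks.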

How do I create a Unix load card for a java program

In Unix (or Linux), if I want to run a shell script, I can start the file with #!/bin/sh. With awk, I start the executable file with #!/usr/bin/awk -f and it treats the rest of the file as the program.
How do I do that with a Java program? I tried copying the simple.class to simple, putting #!/export/appl/Mail/java/bin/java at the top and making the file executable, but I get:
69> ./simple
Error: Could not find or load main class ..simple
I know this can be done with an executable shell script, or a C program that execs the java interpreter. Every other interpreter on Unix can be called with a #! load card, surely there's a way to do it with Java.
The most usual way is to have a wrapper for the Java program: a shell script that executes "java -jar yourJar.jar" or equivalent. You then bundle the shell script, and the equivalent Windows bat file, with your product.
Another option is to have a native launcher. For example you can see the Eclipse project which has gone that way. You download Eclipse and you have a native executable to run. The native executable will launch your Java program.
One more option is to compile Java into native code. For example you can use this commercial tool called Excelsior JET ( http://www.excelsior-usa.com/jet.html ).
The Java class file format doesn't allow text before the header, that's why the Java runtime no longer accepts the .class file after your modification.
On Linux you can use binfmt_misc to support additional executable formats, including .class files. It's basically a way to tell the Linux kernel how to detect executable formats and how to execute them.
This Arch Linux Wiki article explains in more detail how to get this effect.
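The point about text before the header can be shown with a few lines of Java. A class file must begin with the four bytes CA FE BA BE, so anything prepended to it, a #! line included, pushes the magic number out of place. The class MagicCheck is a made-up helper, and the interpreter path in main is the one from the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; a real JVM performs many more checks.
final class MagicCheck {

    /** True if the data starts with the class file magic number 0xCAFEBABE. */
    static boolean startsWithClassMagic(byte[] data) {
        return data.length >= 4
                && (data[0] & 0xFF) == 0xCA
                && (data[1] & 0xFF) == 0xFE
                && (data[2] & 0xFF) == 0xBA
                && (data[3] & 0xFF) == 0xBE;
    }

    /** Returns a copy of the data with the given text placed in front of it. */
    static byte[] prepend(String text, byte[] data) {
        byte[] prefix = text.getBytes(StandardCharsets.US_ASCII);
        byte[] combined = new byte[prefix.length + data.length];
        System.arraycopy(prefix, 0, combined, 0, prefix.length);
        System.arraycopy(data, 0, combined, prefix.length, data.length);
        return combined;
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] broken = prepend("#!/export/appl/Mail/java/bin/java\n", start);
        System.out.println("original starts with magic: " + startsWithClassMagic(start));
        System.out.println("shebang version starts with magic: " + startsWithClassMagic(broken));
    }
}
```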
You cannot do it with a Java program. Firstly, the Java program needs to be compiled before execution. Secondly, even if compilation wasn't required, the hash sign is not a comment in Java, so that would be a syntax error.
I've never heard the term "load card". What you have is an "interpreter directive" designated by a shebang. This merely designates which interpreter the shell should invoke on a given script.
As for why C programs can be run directly in the shell, executables recognized by the operating system are passed to the loader. A Java class isn't an executable, at least to the OS anyway. So the shell must know which interpreter to pass control to.
But as you've noticed, the shebang doesn't work. The reason is that the class file is in a specific binary format that the JVM expects. Editing this file will break convention and lead to an error. Therefore, there is no way to do what you've asked.
However, you can create a "shortcut" to the program you want to run by creating an alias or perhaps writing a one-line Shell script to wrap the java command you need. This is the common practice as I understand it.
The other answers explain why you can't do what you are trying to do. However, if your shell is zsh, you can create a suffix alias. For example, to execute .jar files using java:
alias -s .jar="/usr/bin/java -jar"
Then, to execute blarg.jar, you just type ./blarg.jar. Of course, you must chmod +x your .jar file first.
Apart from the wrapper script and binfmt_misc solutions suggested by others, I'd like to suggest a potential solution which doesn't directly answer your question but maybe it solves your actual problem.
Since Scala does have an interpreter that can run code without you having to compile it first, and this code can reference any Java code, if your goal can be summed up as "using Java as a shell scripting language", you could use a .scala file as your starter script (which can include the shebang to be run with scala) from which you can call all your Java classes. This isn't any simpler than having a bash-based starter script, but it's a good starting point for gradually moving to scripting in Scala instead of Java, in which case you can get rid of the need to compile to a .class file in the first place.
The reason this doesn't work is that Java isn't really an interpreted language, it's partially compiled, partially interpreted.
The .java source code that you'd put the hashbang directive in needs to be compiled to a .class file before the java interpreter can run it. Comments are stripped out by the compiler, so there's no way to push a comment from the .java into the .class file. The .class file is "compiled" output in a specific format, so adding a hashbang directive to the top of it would break the format.
Java isn't really meant to be a scripting language - but some JVM languages are. For example Groovy supports hashbang and so does Clojure.

Java and windows? [duplicate]

Possible Duplicates:
Compiling a java program into an exe
How can I convert my java program to an .exe file ?
Hi everyone. There are many programs, such as IntelliJ IDEA or jEdit, that are written in Java.
How have they compiled the code into an exe file?
How can I compile my Java code to an exe file and run it on Windows?
And why does javac give us a class file? What is it?
how about searching stack overflow:
How do I create an .exe for a Java program?
Also, as the answer points out, launch4j is the program you want to use. I have used it and it works well.
mkoryak took care of how to create an .exe for a Java program, but let me address "What is a Class file"?
In general, you don't have to compile Java down to native code in the form of an exe (or anything else). You compile .java files into .class files. These files can't be run by an OS directly - rather, they are run by the Java Virtual Machine (JVM). The JVM is essentially a layer between your Java program and the underlying OS. The JVM will, on your behalf, interpret your .class files and pass the instructions contained in them to the underlying OS (Windows, OS X, Linux, etc).
So what does this do for you?
For one, it makes Java a little slower than other languages that are compiled right down to native machine code. This was initially a big knock against Java but, as JVM interpreters have gotten better and people have migrated to slower platforms (i.e. Web Apps), this minor latency issue has become less and less of a problem.
On the flip side, however, you can write your Java code for a single OS - the JVM. You don't need to write one version of your app that targets Windows and another that targets OS X, and another, and another... Rather, you write one version of your application and you target that single OS (the JVM). The fact that there are versions of the JVM for every major OS out there means that, as soon as you write a Java program, you can run it on any platform (in theory, anyway).
That's one of the really cool things about Java, and also the meaning behind the often-heard tagline: "Write once, run anywhere."
So that's what the .class files are all about.
In Java, you write .java files and use a Java compiler (such as javac) to create .class files. These .class files are then run by a Java runtime (such as java), which executes your program on the JVM.
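The whole round trip can be sketched in one small program: write a .java file, compile it to a .class file, then let the running JVM load and execute that class. The class names CompileAndRun and Greeter are invented for this example, the temporary directory is an arbitrary choice, and the sketch assumes a JDK (not just a JRE) is in use.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo class; Greeter is a throwaway class generated at run time.
final class CompileAndRun {

    /** Compiles a tiny class from source text, loads the resulting bytecode, and calls it. */
    static String compileAndGreet() {
        try {
            // Step 1: the .java file (source text on disk).
            Path dir = Files.createTempDirectory("compile-and-run");
            Path source = dir.resolve("Greeter.java");
            String code = "public class Greeter {\n"
                    + "    public static String greet() { return \"hello from bytecode\"; }\n"
                    + "}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // Step 2: javac turns it into Greeter.class (bytecode).
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler found; is this a JDK?");
            }
            int exitCode = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed with exit code " + exitCode);
            }

            // Step 3: the JVM loads the .class file and executes its bytecode.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> greeter = loader.loadClass("Greeter");
                return (String) greeter.getMethod("greet").invoke(null);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndGreet());
    }
}
```

No exe file is involved at any point, which is the reason the same .class file can be carried to any machine that has a JVM.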
I hope that helps and, maybe, you don't need to compile your Java program down to an exe, at all.

Byte code to Java source code

Is it possible to convert a .class file to .java file?
How can this be done?
What about the correctness of the code extracted from this option?
It is possible. You need a Java Decompiler to do this.
You'll find that it mostly does a surprisingly good job. What you'll get is a valid .java file which will compile to the .class file, but this .java file won't necessarily be the same as the original source code. Things like looping constructs might come out differently, and anything that exists only at compile time, such as some generic type information and source-only annotations, won't be re-created.
You might have a problem if the code has been obfuscated. This is a process which alters the class files to make them hard to decompile. For example, class and variable names are changed to all be similar so you'll end up with code like aa.a(ab) instead of employee.setName(name) and it's very hard to work out what's going on.
I remember using JAD to do this, but I don't think it is actively maintained, so it may not work with newer versions of Java. A Google search for java decompiler will give you plenty of options.
This is possible using one of the available Java decompilers. Since you are working from bytecode which may have been optimised by the compiler (inlining constants, restructuring control flow, etc.), what you get out may not be exactly the same as the code that was originally compiled, but it will be functionally equivalent.
Adding to the previous answers: recently, a new wave of decompilers has been coming, namely Procyon, CFR, JD, Fernflower
Here's a list of modern decompilers as of March, 2015:
Procyon
CFR
JD
Fernflower
You can test the above-mentioned decompilers online, with no installation required, and make your own educated choice.
Java decompilers in the cloud: http://www.javadecompilers.com/
It is always possible. Search for "java disassembler".
But source code comments and temporary variables will not be available.
If you decompile a class and the code looks overly complex, with variable and method names like a, b, c..., that means the project is obfuscated.
Not exactly a decompiler, but the JDK contains javap, a disassembler:
javap -c org.example.MyClass
Depending on your usecase, it might still be interesting to know or use.
Note that results of class file decompilation depend on the included information within a class file. If I remember correctly, included debug information (see -g flag of javac) is important, especially for naming of variables and the like.
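The effect of the -g flag can be made visible by compiling the same source twice and comparing the size of the resulting class files. This is only a sketch: the class names DebugInfoDemo and Sample are invented, a temporary directory is used for output, and a JDK is assumed. The -g and -g:none options themselves are standard javac flags.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo class for illustration.
final class DebugInfoDemo {

    /** Compiles a fixed sample class with the given debug flag and returns the class file size. */
    static long classFileSize(String debugFlag) {
        try {
            Path dir = Files.createTempDirectory("debug-info");
            Path source = dir.resolve("Sample.java");
            String code = "public class Sample {\n"
                    + "    public static int twice(int value) {\n"
                    + "        int doubled = value * 2;\n"
                    + "        return doubled;\n"
                    + "    }\n"
                    + "}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler found; is this a JDK?");
            }
            int exitCode = compiler.run(null, null, null,
                    debugFlag, "-d", dir.toString(), source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed with exit code " + exitCode);
            }
            return Files.size(dir.resolve("Sample.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // -g keeps source file, line number, and local variable names; -g:none drops them all.
        System.out.println("with -g:      " + classFileSize("-g") + " bytes");
        System.out.println("with -g:none: " + classFileSize("-g:none") + " bytes");
    }
}
```

The extra bytes in the -g build are the tables a decompiler uses to recover names such as value and doubled; without them it has to invent its own.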
DJ is an easy-to-use Java decompiler. Just open any .class file and it will show you its Java code.
Also, you can use the JadClipse plugin for Eclipse to decompile a .class file into .java directly.
What about the correctness of the code extracted from this option?
In any case, the code generated by a Java decompiler will not be the same as what was written in the original Java class, because the decompiler just decodes the bytecode back into Java code. The only thing you can be sure of is that the program's output will be the same as that of the original Java code.

How can I open Java .class files in a human-readable way?

I'm trying to figure out what a Java applet's class file is doing under the hood. Opening it up with Notepad or Textpad just shows a bunch of gobbledy-gook.
Is there any way to wrangle it back into a somewhat-readable format so I can try to figure out what it's doing?
Environment == Windows w/ VS 2008 installed.
jd-gui is the best decompiler at the moment. It can handle newer features in Java, as compared to the getting-dusty JAD.
If you don't mind reading bytecode, javap should work fine. It's part of the standard JDK installation.
Usage: javap <options> <classes>...
where options include:
  -c                        Disassemble the code
  -classpath <pathlist>     Specify where to find user class files
  -extdirs <dirs>           Override location of installed extensions
  -help                     Print this usage message
  -J<flag>                  Pass <flag> directly to the runtime system
  -l                        Print line number and local variable tables
  -public                   Show only public classes and members
  -protected                Show protected/public classes and members
  -package                  Show package/protected/public classes and members (default)
  -private                  Show all classes and members
  -s                        Print internal type signatures
  -bootclasspath <pathlist> Override location of class files loaded by the bootstrap class loader
  -verbose                  Print stack size, number of locals and args for methods
                            If verifying, print reasons for failure
As pointed out by @MichaelMyers, use
javap -c <name of java class file>
to get the JVM assembly code. You may also redirect the output to a text file for better visibility.
javap -c <name of java class file> > decompiled.txt
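The same disassembly can also be captured from inside a Java program. The sketch below assumes Java 9 or later, where the JDK exposes javap through java.util.spi.ToolProvider; on older versions the command line above is the way to go. The class name JavapFromJava is made up for this example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

// Hypothetical demo class; requires a JDK on Java 9 or later.
final class JavapFromJava {

    /** Runs "javap -c" on the named class and returns the text it printed. */
    static String disassemble(String className) {
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap is not available in this runtime"));
        StringWriter text = new StringWriter();
        PrintWriter writer = new PrintWriter(text);
        // Same arguments as on the command line; output and errors go to one writer.
        int exitCode = javap.run(writer, writer, "-c", className);
        writer.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap failed:\n" + text);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(disassemble("java.lang.Object"));
    }
}
```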
You want a Java decompiler. You can also use the command-line tool javap, which disassembles a class file into readable bytecode. Also, Java Decompiler HOW-TO describes how you can decompile a class file.
You can also use the online Java decompilers that are available, e.g. http://www.javadecompilers.com
Using Jad to decompile it is probably your best option. Unless the code has been obfuscated, it will produce an okay result.
What you are looking for is a Java decompiler. I recommend JAD http://www.kpdus.com/jad.html It's free for non-commercial use and gets the job done.
Note: this isn't going to make the code exactly the same as what was written. i.e. you're going to lose comments and possibly variable names, so it's going to be a little bit harder than just reading normal source code. If the developer is really secretive they will have obfuscated their code as well, making it even harder to read.
cpuguru, if your applet has been compiled with javac 1.3 (or less), your best option is to use Jad.
Unfortunately, the last JDK supported by JAD 1.5.8 (Apr 14, 2001) is JDK 1.3.
If your applet has been compiled with a more recent compiler, you could try JD-GUI : this decompiler is under development, nevertheless, it generates correct Java sources, most of time, for classes compiled with the JDKs 1.4, 1.5 or 1.6.
DarenW, thank you for your post. JD-GUI is not the best decompiler yet ... but I'm working on :)
jd-gui "http://code.google.com/p/innlab/downloads/detail?name=jd-gui-0.3.3.windows.zip&can=2&q=" is the best and user friendly option for decompiling .class file....
That's compiled code, you'll need to use a decompiler like JAD: http://www.kpdus.com/jad.html
You need to use a decompiler. Others have suggested JAD; there are other options, but JAD is the best.
I'll echo the comments that you may lose a bit compared to the original source code. It is going to look especially funny if the code used generics, due to erasure.
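Erasure is easy to see from running code, which also explains why a decompiler cannot always bring the type arguments back. In this minimal sketch (the class name ErasureDemo is invented), two lists with different type arguments turn out to share one runtime class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration.
final class ErasureDemo {

    /** True because the type arguments are gone at run time; both objects are plain ArrayLists. */
    static boolean sameRuntimeClass() {
        List<String> names = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return names.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println("Same runtime class: " + sameRuntimeClass());
    }
}
```

In decompiled code this typically shows up as raw types plus explicit casts where the original source relied on generics.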
JAD and/or JADclipse Eclipse plugin, for sure.
If the class file you want to look into is open source, you should not decompile it, but instead attach the source files directly in your IDE. That way, you can view the code of a library class as if it were your own.
As suggested you can use JAD to decompile it and view the files. To make it easier to read you can use the JADclipse plugin for eclipse to integrate JAD directly to eclipse or use DJ Java Decompiler which is much easier to use than command line JAD
JAD is an excellent option if you want readable Java code as a result. If you really want to dig into the internals of the .class file format though, you're going to want javap. It's bundled with the JDK and allows you to "decompile" the hexadecimal bytecode into readable ASCII. The language it produces is still bytecode (not anything like Java), but it's fairly readable and extremely instructive.
Also, if you really want to, you can open up any .class file in a hex editor and read the bytecode directly. The information is the same as what javap shows, only without the readable formatting.
There is no need to decompile Applet.class. The source code of the public Java API classes comes with the JDK (if you choose to install it), and it is more readable than decompiled bytecode. You can find it compressed in src.zip (located in your JDK installation folder).
CFR - another java decompiler is a great decompiler for modern Java, written in Java 6.
