Is it possible to transform LLVM bytecode into Java bytecode?

Is it possible to transform LLVM bytecode into Java bytecode? - java

I have heard that google app engine can run any programming language that can be transformed to Java bytecode via it's JVM. I wondered if it would be possible to convert LLVM bytecode to Java bytecode as it would be interesting to run languages that LLVM supports in the Google App Engine JVM.

It does now appear possible to convert LLVM IR bytecode to Java bytecode, using the LLJVM interpreter.
There is an interesting Disqus comment (21/03/11) from Grzegorz of kraytracing.com which explains, along with code, how he has modified LLJVM's Java class output routine to emit non-monolithic Java classes which agree in number with the input C/C++ modules. He suggests that his technique seems to avoid the excessively long 'compound' Java Constructor method argument signatures usually generated by LLJVM, and he provides links to his modifications and examples.
Although LLJVM doesn't look like it's been in active development for a couple of years now, its still hosted on Github and some documentation can still be found at its former repository at GoogleCode:
LLJVM # Github
LLJVM documentation # GoogleCode
I also came across the 'Proteuscc' project which also utilises LLVM to output Java Byte code (it suggests that this is specifically for C/C++, although I assume the project could be modified or fed LLVM Intermediate Representation (IR)). From http://proteuscc.sourceforge.net:
The general process of producing a Java executable with Proteus then
can be summarised as below.
Generate human readable representation of the LLVM intermediate
representation (ll file)
Pass this ll file as an argument to the
proteus compilation system
The above will produce a Java jar file
which can be executed or used as a library
I've extended a bash script to compile the latest versions of LLVM and Clang on Ubuntu, it can found be as a Github Gist,here.
[UPDATE 31/03/14] - LLJVM has seemed to have been dead for somewhile, however Howard Chu (https://github.com/hyc) looks to have made LLJVM compatible with the latest version of LLVM (3.3). See Howard's LLJVM-LLVM3.3 branch at Github, here

I doubt you can, at least not without significant effort and run-time abstractions (e.g. building half a Von Neumann machine to execute certain opcodes). LLVM bitcode allows the full range of low-level unsafe "do what you want but we won't clean up the mess" features, from direct, raw, constructor-free memory allocation up to completely unchecked casts - real casts, not conversions -you can take i32 and bitcast it to to a %stuff * if you wish. Also, JVMs are heavily geared towards objects and methods, while the LLVM guys are lucky they have function pointers and structs.
On the other hand, it seems that C can be compiled to Java bytecode and LLVM bitcode can be compiled to Javascript (although many features, e.g. dynamic loading and stdlib functions, are lacking), so it should be possible, given enough effort.

Late to the discussion: Sulong executes LLVM IR on the JVM. It creates executable nodes (which are Java objects) from the LLVM IR instead of converting the LLVM IR to Java bytecode. These executable nodes form an AST interpreter. You can check out the project at https://github.com/graalvm/sulong or read a paper about it at http://dl.acm.org/citation.cfm?id=2998416. Disclaimer: I'm working on this project.

Read this: http://vmkit.llvm.org/. I am not sure that it will help you but it seems to be relevant.
Note: This project is not more maintained.

Related

Is there a machine independent compiler?

I know java generates bytecode but the JVM needs to interpret it everytime during runtime.
Does a compiler exist that generates machine independent code, lets say for C.
Then at a target machine this is permanently converted to its local machine code once rather than converting for each run?
Does this solve why many developers develop for windows but no linux?

Not really, but some stuff comes close.
C is regarded as low level as possible while being portable by some. (This, of course, excludes all APIs). The GHC Haskell compiler uses internally a very c-like language in that regard c--, that might be very close to the machine in depended code you are looking for.
Most modern compilers do have such intermediate Code, for example LLVM. There is even a assembler like (so even more low leven than C) for that. But note that LLVM intermediate code is not portable, as for example the pointer size has to be known at compile time. (all the sizeofs in C will fixed at this time)
But there is a IMO more simple solution: Compile the code for any platform, and if you are on a different platform you a dynamic recompiler like QEMU. That still does negatively impact performance.

It's certainly possible, and interpreters exist for C and C++. However, projects using these languages will often use platform-specific code (like the Windows APIs) which stops them from being portable. Interpreted languages generally supply platform-independent core libraries.
Modern compilers – like Clang, LLVM and GCC – all compile your source code to an intermediate language. This means that the same code-level optimizations can be applied to any language that the compiler can convert, and it also enables tools like Emscripten which can effectively compile C to JavaScript! I believe it was used for the recent JavaScript Unreal Engine demo.

A Java example: Android 4.4 introduced a new experimental runtime virtual machine, ART (Android Runtime).
ART straddles an interesting mid-ground between compiled and interpreted code, called ahead-of-time (AOT) compilation. Currently with Android apps, they are interpreted at runtime (using the JIT), every time you open them up. This is slow. (iOS apps, by comparison, are compiled native code, which is much faster.) With ART enabled, each Android app is compiled to native code when you install it. Then, when it’s time to run the app, it performs with all the alacrity of a native app.
Source

Compilation vs translation, "compiling" Java to bytecode?

My understanding is like this, definitions:
Translation - having code in some language, generating code in some other language.
Compilaton - translation to a machine code.
Machine code - direct instructions for CPU.
Now, from docs.oracle.com:
javac - Java programming language compiler
Compiler...? I think it is Java translator, because it is generating a code, that is not a machine code. Bytecode needs interpreter (JVM) to run, so it's definetely not a machine code.
From Wikipedia:
Java applications are typically compiled to bytecode
Similarly. According to definitions, I would say that Java is traslated to bytecode. There are many more examples on the Internet, I think there is confusion about that or I'm just missing something.
Could you please clarify this?
What is the difference between translation and compilation?

It's all a matter of definitions, and there's no single accepted definition for what "compilation" means. In your eyes, compilation is transforming a source code in some language to native; so a transformation process which doesn't generate machine code shouldn't be called "compilation". In my eyes (and apparently, the javac documentation writers' eyes as well), it should.
There are actually a lot of different terms: translation, compilation, decompilation, assembly, disassembly, and more.
Personally, I'd say it makes sense to group all of these terms under "compilation", because all these processes have a lot in common:
They transform code in one formal language to code in another formal language.
They try to preserve the semantics of the input code as much as possible.
They all have a very similar design to each other, with a front-end, a back-end, and a possible optimizer in the middle (learn more about compiler structure here). I've seen the entrails of both javac and native compilers and they are relatively similar.
In addition, your definition of "produces native code" is problematic:
What about compilers that can generate assembly but don't bother transforming that to machine code, leaving this to an external program (commonly called "assembler")? Would you deny them this definition of "compilers" because of that last, insignificant-in-comparison step?
How do you even classify "machine code"? What if tomorrow a processor which can run Java Bytecode natively is created?
But these are just my opinions. I think that out there, the most accepted definitions are that:
Compilation is transforming code in a higher-level language to a lower-level one. Examples: Java to Java Bytecode, or C to x86 machine code.
Decompilation is transforming a code in a lower-level language to a higher-level one - in effect, the opposite of compilation. Examples: Java Bytecode to Java.
Translation or source-to-source compilation is transforming a code in some language to another language of comparable "level". Examples: ARM to x86, or C to Java. When the two languages are actually different versions of the same language (e.g. Javascript 6 to Javascript 5), the term transpiler is also used.
Assembly is transforming code in some assembly language to machine code.
Disassembly is either a synonym to decompilation or the opposite of assembly, depending on the context.
Under these definitions, javac could definitely be considered as a compiler. But again, it's all in the definitions: from a technical standpoint, many of these actions have a lot in common.

The result of javac is machine code. The fact that the machine is virtual and not physical is not relevant (otherwise, you could argue that compiling code in x86 was translation if you were a Mac user, since the x86 code is not Mac machine code).

"A compiler is a computer program (or set of programs) that transforms
source code written in a programming language (the source language)
into another computer language (the target language, often having a
binary form known as object code)."
http://en.wikipedia.org/wiki/Compiler
So no a compilation doesn't mean that the output is in machine code.
For example early C++ compilation would generate a C program, that then needed to be compiled again into machine code. Of course any good compiler would hide these separate steps from the user, but they are still there.
Nowday I know at least of the NesC compiler that does the same procedure.
A machine that runs JVM bytecode can be built, actually some chapters of Structured Computer Organization, from A. Tanenbaum describe how to do that.
http://www.amazon.com/Structured-Computer-Organization-5th-Edition/dp/0131485210

Compiler...? I think it is Java translator, because it is generating a code, that is not a machine code. Bytecode needs interpreter (JVM) to run, so it's definetely not a machine code.
The JVM is the Java Virtual Machine, which is a machine, and its machine code is called Java byte code. (Each "byte" in byte code is a JVM machine instruction.)
You can learn more by reading the JVM specification. Start here: https://docs.oracle.com/javase/specs/
Also, the JVM is not an interpreter; it is a definition of a machine. Some implementations of the JVM include just-in-time (JIT) compilers, ahead-of-time (AOT) compilers, adaptive compilers (e.g. Hotspot), and yes, even interpreters (although I haven't personally seen a Java interpreter in 20 years now).
What most people starting out in compilers don't understand is that there are no compilers anymore that generate "machine code". They all generate some intermediate forms (as defined by a particular OS, for example) that are then loaded and munged by the OS.
For the most part today, no compiled program is capable of being executed without a large, bloated OS to munge it and slice it and glue it into pieces that the OS owns and manages.
Not even C compiles code into actual machine executables nowadays without coloring outside the lines. (Write an actual, working boot loader from scratch, and get back to me.)

Why doesn't java have a non-bytecode compiler? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Why isn't more Java software compiled natively?
I know that Java is byte code compiled, but when using the JIT, it will compile the 'hotspots' to native code. Why is there not an option to compile a program to native code?

There is an option to compile Java source code files in binary code it is called GCJ and come from Free Software Foundation.
It is not officially supported by Sun/Oracle but i've used the compiler and it do a great job ;)

Java, as a language, can be implemented, like any language, in many ways. This include full interpretation, bytecode compilation, and native compilation. See Java language specification
The VM specification defines how bytecode should be loaded and executed. It defines the compiled class format, and the execution semantics, e.g. threading issue, and what is called the "memory model". See Java VM specification
While orignally meant to go together, the language and the VM are distinct specifications. You can compile another language to Java bytecode and run it on top of the JVM. And you can implement the Java language in different way, notably without a VM, as long as you follow the expected semantics.
Of course, both are still related, and certain aspects of Java might be hard to support without bytecode interpretation at all, e.g. custom ClassLoader that return bytecode (see ClassLoader.defineClass). I guess the bytecode would need to be JIT'ed immediatly into native code, or maybe are not supported at all.

Which platform native code should it compile to?
Windows, Mac, Linux?
What if the developer works on a different platform than the application is going to run on?
What if the application platform changes, either in the server room or on the desktop?
I don't see the benefit, the JVM's nowadays seem to be to be fast enough for very general purpose needs.

There are several products out there to compile java programs to native code, however they are imperfect, and not at all like the JIT compiler. Some differences include:
Write Once Run Everywhere - it will only work on the target you compile it for.
Dynamic code - you cannot load jars or other Java code at runtime, which is often a feature of application servers, GUI builders and the like.
Runtime profiling - a lot of JIT compiler action involves understanding what the code is doing at runtime, not what it could potentially do under a static analysis, meaning that JIT can outperform a natively compiled application in the right circumstances.
Cannot support all Java features. Things like reflection aren't going to be very meaningful in a compiled program.
Large footprint - when it is compiled to native code, all of the libraries the JVM gives you have to be bundled into the package, causing a very large footprint. It is a tricky problem to figure out what can be left out.
So it is possible, for a certain subset of applications, to compile to native code, but as VMs have gotten faster and faster, and issue #5 above has not really been improved (although project Jigsaw should help with that), it is not a very compelling option for real world applications.

Because it is enough to have byte-code compiled.
If you would compile your own code - you had also compile all libraries.
And it is real problem from two point of view:
1. licensing - most of the code wouldn't be changed
2. you had 'recompile' megatons of code :-)

This was a decision made by Sun to not allow this because they wanted to position Java as being inherently multi-platform. As such, they wanted to ensure that any Java application compiled would run on any platform with a JVM. This prevents there from being Java binaries available on-line which don't run on certain hardware or operating systems.

Is it possible to translate x86 32 bit assembly code into equivalent JVM byte-code and execute it?

Is it possible to translate x86 32 bit assembly code into equivalent JVM byte-code and execute it?
I have a Fortran library in .so form. I want to perform an assembly dump on it using GDB and then using a translator of some sort turn it into valid JVM bytecode.
Is this even possible?
For simplicity sake, let's assume I don't care about platform independence anymore. Both assembly and bytecode will run on the same machine.

Possible is nearly everything but I don't think you will find a tool that does this for you - therefore you would have to do it manually which could take you weeks or months depending on the size of the library.
Of course this may rise legal problems if the compiled library is a commercial one or copyright protected.
A better approach seems to me to develop a small Java Native Interface (JNI) wrapper in C/C++ and link the library to it. Then you will be able to call library functions from Java.
If you can get the Fortran source code you could try a JVM-Fortran compiler like Fortran-to-Java. Then you would get native JVM byte code.

Running/Interpreting C on top of the JVM?

Is there a way to run plain c code on top of the JVM?
Not connect via JNI, running, like you can run ruby code via JRuby, or javascript via Rhino.
If there is no current solution, what would you recommend I should do?
Obviously I want to use as many partials solutions as I can to make it happen.
ANTLR seems like a good place to start, having a full "ANSI C" grammar implementation...
should I build a "toy" VM over the JVM using ANTLR generated code?

Updated 2012-01-26: According to this page on the company's site the product has been bought out and is no longer available.
Yes.
Here's a commercial C compiler that produces JVM bytecode.

There are two other possibilities, both open-source:
JPC emulates an entire x86 pc within the JVM, and is capable of running both DOS and Linux.
NestedVM provides binary translation for Java Bytecode. This is done by having GCC compile to a MIPS binary which is then translated to a Java class file. Hence any application written in C, C++, Fortran, or any other language supported by GCC can be run in 100% pure Java with no source changes.

It seems that LLJVM can also meet your requirement.
LLJVM: Source code is first compiled to LLVM intermediate representation (IR) by a frontend such as llvm-gcc or clang. LLVM IR is then translated to Jasmin assembly code, linked against other Java classes, and then assembled to JVM bytecode.

As of 2016 there is a young but promising option called gcc-bridge. Its intend is to leverage the JVM's implementation of R. The goal is to use R-libraries written in C or Fortran. But gcc-bridge can be used independently as a regular maven plugin. Also see the gcc-brigde-example.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.