How to compile code generated by a Java or C++ App - java

I've been learning compiler theory and assembly and have managed to create a compiler that generates x86 assembly code.
How can I take this assembly code and turn it into a .exe? Is there some magical API or tool I have to interact with? Or is it simpler than I think?
I'm not really sure what's in a .exe, or how much abstraction lies between assembly code and the .exe itself.
My 'compiler' was written in Java, but I'd like to know how to do this in C++ as well.
Note that the generated assembly builds into a .exe just fine with, for example, VC++.
Edit: To be more precise, I already know how to turn assembly code into an executable using existing tools. What I want is for my program itself to output a .exe.

It looks like you need to spawn an assembler process and a linker process. On UNIX, that means calling fork() to create a new process and then one of the exec() functions in the child, passing the assembler or linker executable name along with suitable arguments, which would be the names of your generated assembly and object files. That's all you'd need to do on a UNIX system.
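As a rough illustration of spawning the two processes, here is a minimal Python sketch. It assumes the GNU binutils tools as and ld are installed and that the generated assembly is in a syntax they accept; neither assumption comes from the question, and the file names are hypothetical. The subprocess module takes care of creating the child processes.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(asm_path, exe_path):
    """Return the assembler and linker command lines as argument lists."""
    obj_path = str(Path(asm_path).with_suffix(".o"))
    assemble = ["as", asm_path, "-o", obj_path]   # assembly -> object file
    link = ["ld", obj_path, "-o", exe_path]       # object file -> executable
    return [assemble, link]


def assemble_and_link(asm_path, exe_path):
    """Run each tool in turn; raises CalledProcessError if a step fails."""
    for command in build_commands(asm_path, exe_path):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} is not installed")
        subprocess.run(command, check=True)


# Hypothetical file names, shown only to illustrate the command lines.
for command in build_commands("out.s", "out"):
    print(" ".join(command))
```

Calling assemble_and_link("out.s", "out") would actually run the tools; the loop at the bottom only prints the command lines.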

Normally you use an assembler and a linker to create an exe file. There is no magic involved. The different parts are assembled, then a header and other boilerplate are added so that the OS knows where the program's bootstrap code is located and how to organize the memory.
The VC++ compiler does that under the hood. You might consider playing a bit on Linux, as you can see the machinery at work more easily on Unix platforms. It is fundamentally the same on Windows; it's just harder to see through the UI.

In principle, each line of assembly corresponds to one machine instruction, which is just a few bytes in the exe file. You can find the encodings in the specs for the processor. So you can write your own assembler if you know the opcodes and the exe format (header, relocation info, etc.).
See also: How Do Assemblers Map x86 Instruction Mnemonics to Binary Machine Instructions?
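To make the "few bytes per instruction" point concrete, here is a small Python sketch that hand-encodes two 32-bit x86 instructions. The helper names are made up for this example. It covers only these two encodings, and the result is raw machine code, not a runnable .exe: the executable header and other boilerplate would still have to be written around it.

```python
import struct


def mov_eax_imm32(value):
    """Encode `mov eax, imm32`: opcode 0xB8, then the value as 4 little-endian bytes."""
    return b"\xB8" + struct.pack("<I", value & 0xFFFFFFFF)


def ret():
    """Encode a near `ret`: the single byte 0xC3."""
    return b"\xC3"


# Machine code for:  mov eax, 42 ; ret
code = mov_eax_imm32(42) + ret()
print(code.hex(" "))  # b8 2a 00 00 00 c3
```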

Related

Why is Bytecode not human-readable?

I'm confused about a certain topic:
When you compile Java or Python, you get bytecode which will run on the respective VMs. In a previous question I had asked why, when you open a .pyc or .class file in a text editor, it appears as gibberish and not like readable bytecode (LOAD, STORE operations etc).
The answer I got at the time was built around the argument "That's like saying if you opened an .exe file and expected to see x86 assembly", with the analogy that the bytecode I've seen is the "assembly" version of the real bytecode, which is not readable.
This would be okay and make sense if not for one thing. You can't compare an exe file to a bytecode file. An exe file is ALREADY compiled to machine code. A bytecode file is NOT. A bytecode file is fed to a VM which then interprets it (usually with JIT).
That means that whoever wrote the JVM for instance, (which is just a piece of software itself), would need to write a bytecode-interpreter. And I really doubt they wrote an interpreter to handle the following:
Java .class file:
I could be wrong and maybe they DID write an interpreter to handle this form of bytecode for some odd reason, but it doesn't seem likely. However, if the JVM handles the "assembly" version of the bytecode, then that would mean the cycle is
.java -> .class (unreadable) -> .class (readable right as it enters the JVM)
There's an almost meaningless step in between.
I'm just really confused at this point.
They did write an interpreter for this form of bytecode. It reads the file as bytes, of course, not as ASCII characters, which makes it more usable. For example, each opcode takes only one byte, rather than the five bytes needed to spell out "store".
The goal was to have something compact in memory, but not actually compiled to machine code that would be specific to only one device. Java bytecode is more or less its own form of machine code.
If you would like to read it, however, use the javap command (for example javap -c) to disassemble it into a more readable form.
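To show what "reading it as bytes" looks like, here is a toy interpreter in Python. It is a sketch for illustration only, not how any real JVM is implemented. The opcode values are the real JVM ones for these five instructions, but everything else (the run function, the absence of 32-bit wraparound, no class file parsing) is a simplification.

```python
ICONST_2 = 0x05   # push the int constant 2
BIPUSH = 0x10     # push the next byte as a signed int
IADD = 0x60       # pop two ints, push their sum
IMUL = 0x68       # pop two ints, push their product
IRETURN = 0xAC    # pop an int and return it


def run(code):
    """Interpret a byte string using a tiny subset of JVM opcodes."""
    stack = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        pc += 1
        if opcode == ICONST_2:
            stack.append(2)
        elif opcode == BIPUSH:
            operand = code[pc]
            pc += 1
            # The operand byte is signed: values above 127 are negative.
            stack.append(operand - 256 if operand > 127 else operand)
        elif opcode == IADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == IMUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    raise ValueError("code ended without ireturn")


# bipush 40; iconst_2; iadd; ireturn
program = bytes([BIPUSH, 40, ICONST_2, IADD, IRETURN])
print(program.hex(" "))  # 10 28 05 60 ac
print(run(program))      # 42
```

The five-byte program is exactly the kind of content that looks like gibberish in a text editor, yet it is trivial for a loop like this to decode.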
Bytecode is the "machine code" for a virtual machine. As such, it has much the same goals and restrictions as "real" machine code - compact, efficient decoding, etc.
The fact that bytecode is executed by a virtual machine rather than by a "real" machine is not particularly relevant.
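The same distinction can be seen in Python, which the question also mentions. The sketch below prints the raw bytes of a function's bytecode and then the mnemonics that the standard dis module reconstructs from them, much as javap does for a .class file. The exact bytes and instruction names vary between Python versions, so no output is shown.

```python
import dis


def add(a, b):
    return a + b


raw = add.__code__.co_code      # the bytecode as the VM sees it: plain bytes
print(raw.hex(" "))             # numeric opcodes, not mnemonics

# The disassembler maps each opcode back to a readable name.
names = [ins.opname for ins in dis.get_instructions(add)]
print(names)
```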

Am I able to compile a C++ source code via a string?

I understand that Java source code can be compiled from a string using JavaCompiler. With a long String containing my Java code, I can test whether my code is compilable.
Source: http://docs.oracle.com/javase/6/docs/api/javax/tools/JavaCompiler.html
An Example: http://www.java2s.com/Code/Java/JDK-6/CompileaJavafilewithJavaCompiler.htm
My question is: I have a long string that contains compilable C++ code. Am I able to do something similar using some form of Java library, or is it just impossible?
Thanks
Edit 1: As requested, the String can be user-generated (typed in a GUI - JTextArea) OR read from a .cpp file.
If you are using Visual Studio, you can use a pre-build event to call the C++ compiler and compile a file. If the compilation fails, your Java project will not build. The idea here is that you make an event happen before the build. You can make that event whatever you want, for example checking whether a file compiles.
Here is a tutorial: https://dillieodigital.wordpress.com/2012/11/27/quick-tip-aborting-builds-in-visual-studio-based-on-file-contents/
In the part where he enters a script to run, that's where you would put the call to your favorite C++ compiler. He is not checking the same type of file, but the principle is the same: he is checking a file.
If you are running Windows, which you would be if you are using Visual Studio, the link below will be helpful.
Compiling a Native C++ Program on the Command Line:
https://msdn.microsoft.com/en-us/library/ms235639.aspx
So basically, you make the Visual Studio project run a pre-build event, which is a command-line call to the C++ compiler that checks your file before you build your Java project.
Hope that helps.
If you link to the LLVM library, there are facilities for this.
But beware that LLVM does not provide a stable API, so it is difficult to construct examples that continue to work. Even using the C API (which still requires updating the SONAME), I have had breakage with every single LLVM release.
"My question is: I have a long string that contains a compilable C++ code."
The C++11 standard does not mention any function able to do that (compile C++ code held in a string). And I know of no library able to do that (except perhaps libclang, but I don't know whether it is able to compile a string).
In practice, a C++ compiler needs to perform a lot of optimizations (if you want the code to run reasonably fast), so it will spend significant time (relative to computer speed, e.g. several tenths of a second even for a small C++ source file) compiling your generated C++ code. Heavily templated C++ code may take a long time to compile (even an infinite amount in pathological cases, since C++ templates are accidentally Turing complete).
So practically speaking, you gain nothing by avoiding a C++ source file. Some compilers (e.g. GCC on Linux with g++ -x c++ /dev/stdin) are able to compile C++ code from their standard input, so you could use popen (on POSIX systems) to feed them.
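A minimal sketch of feeding a source string to the compiler over standard input, written in Python for brevity (a Java program could do the same with ProcessBuilder). It assumes g++ is on the PATH. Two details are additions not taken from the answer: the input name "-" is used to mean standard input instead of /dev/stdin, and -fsyntax-only makes GCC check the code without producing an output file, which matches the "is it compilable?" use case. The function names are hypothetical.

```python
import shutil
import subprocess


def syntax_check_command(compiler="g++"):
    """Command line that reads C++ from stdin and only checks it, writing no output file."""
    return [compiler, "-x", "c++", "-fsyntax-only", "-"]


def is_compilable(source, compiler="g++"):
    """Return (ok, diagnostics) for a C++ source string; needs the compiler on PATH."""
    if shutil.which(compiler) is None:
        raise FileNotFoundError(f"{compiler} is not installed")
    result = subprocess.run(
        syntax_check_command(compiler),
        input=source,            # the string is written to the compiler's stdin
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


print(" ".join(syntax_check_command()))
```

For example, is_compilable("int main() { return 0; }") would return a pair whose first element is True when the compiler accepts the code. This checks syntax and semantics only; link errors are not detected.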
Just write your C++ code into some temporary C++ source file (perhaps in a tmpfs file system, if you want to avoid disk I/O), or perhaps a pipe(7) or fifo(7), and fork a compilation. On Linux and POSIX systems, you could compile that code (e.g. with g++ -Wall -fPIC -O -shared /tmp/temporary1234.cc -o /tmp/temporary1234.so) to a "plugin" or shared object that your main program could later dlopen.
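The temporary-file-plus-plugin route can be sketched as follows, again in Python, where ctypes.CDLL plays the role of dlopen. This assumes a Linux-like system with g++ on the PATH. The file names, the SOURCE string, and the function names are hypothetical, and the exported function is declared extern "C" so that its name is not mangled.

```python
import ctypes
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical generated code; extern "C" keeps the symbol name unmangled.
SOURCE = 'extern "C" int answer() { return 42; }\n'


def plugin_command(src_path, lib_path, compiler="g++"):
    """Same flags as in the text: build a position-independent shared object."""
    return [compiler, "-Wall", "-fPIC", "-O", "-shared", str(src_path), "-o", str(lib_path)]


def load_plugin(source, compiler="g++"):
    """Write the source to a temp file, compile it, and load the shared object."""
    if shutil.which(compiler) is None:
        raise FileNotFoundError(f"{compiler} is not installed")
    workdir = Path(tempfile.mkdtemp())
    src_path = workdir / "plugin.cc"
    lib_path = workdir / "plugin.so"
    src_path.write_text(source)
    subprocess.run(plugin_command(src_path, lib_path, compiler), check=True)
    return ctypes.CDLL(str(lib_path))   # the Python counterpart of dlopen


print(" ".join(plugin_command("/tmp/temporary1234.cc", "/tmp/temporary1234.so")))
```

With a working compiler, load_plugin(SOURCE).answer() would call into the freshly compiled code.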
If you are generating the C++ code, you could consider using (instead of generating C++ source then compiling it), some Just-In-Time compilation library like gccjit, LLVM, libjit, lightning, asmjit etc... Then you'll generate some AST-like internal representation (specific to the JIT library!) of the code.

Does the Java interpreter convert the byte-code files to an executable file?

I had this question in a software course:
True/False: The Java interpreter converts files from a byte-code format to executable files.
I think the statement is false. In class, they said the interpreter "executes" the byte-code files, on the system using the JVM (I didn't listen too much but I think I got it fairly correctly), but as I understood, it doesn't actually convert it to executable files (which presumably are .exe files), just runs it on the system directly.
"True/False: The Java interpreter converts files from a byte-code format to executable files".
The answer is false [1].
The Java interpreter is one of the two components of the JVM that is responsible for executing Java code. It does it by "emulating" the execution of the Java Virtual Machine instructions (bytecodes); i.e. by pretending to be a "real" instance of the virtual machine.
The other JVM component that is involved is the Just In Time (JIT) compiler. This identifies Java methods that have been interpreted for a significant amount of time, and does an on-the-fly compilation to native code. This native code is then executed instead of interpreting the bytecodes.
But the JIT compiler does not write the compiled native code to the file system. Instead it writes it directly into a memory segment ready to be executed.
Java's interpret / JIT compile approach is more complicated than compiling everything ahead of time, but it has a couple of advantages:
It means that it is not necessary to compile bytecodes to native code before the application can be run, which removes a significant impediment to portability.
It allows the JVM to gather runtime statistics on how the application is functioning, which can give hints as to the best way to optimize the native code. The result is faster execution for long-running applications.
The downside is that JIT compilation is one of the factors that tends to make Java applications slow to start (compared with C / C++ for example).
[1] ... for mainstream Java (tm) compilers. Android isn't Java (tm) [2]. Note that the first version of Java was interpreter only. I have also seen Java (not tm) implementations where the native code compilers were either ahead-of-time or eager ... or a combination of both.
[2] You are only permitted by Oracle to describe your "java-like" implementation as Java (tm) if it passes the Java compliance tests. Android wouldn't.
The Java compiler converts the source code to bytecode. This bytecode is then interpreted (or just-in-time compiled and then executed) by the JVM. The bytecode is a kind of intermediate language that has no platform dependence. The virtual machine is then the layer that provides system-specific functionality.
It is also possible to compile Java code to native code; one project that aims to do this is GCJ.
To answer your question: no, a normal Java compiler does not emit an executable binary, but a set of classes that can be executed using a JVM. You can read more about this on Wikipedia.
False for regular JVMs. No executable files are created. The conversion from bytecode to native code for that platform takes place on the fly during execution. If the program is stopped, the compiled code is gone (was in memory only).
The new Android runtime, ART, does compile the bytecode into executables ahead of time, to get better startup and runtime behavior. So ART creates files.
ART straddles an interesting mid-ground between compiled and interpreted code, called ahead-of-time (AOT) compilation. Currently with Android apps, they are interpreted at runtime (using the JIT), every time you open them up. This is slow. (iOS apps, by comparison, are compiled native code, which is much faster.) With ART enabled, each Android app is compiled to native code when you install it. Then, when it’s time to run the app, it performs with all the alacrity of a native app. http://www.extremetech.com/computing/170677-android-art-google-finally-moves-to-replace-dalvik-to-boost-performance-and-battery-life
The answer is false.
Reason:
The JIT (just-in-time) compiler and the Java interpreter do the same job in different ways, but the JIT wins on performance. The interpreter executes the bytecode instruction by instruction, while the JIT translates the bytecode into machine-dependent native code in memory. Neither of them writes an executable file.

Is there a similar thing to a JAR file in C++?

The question may sound a little vague, but I wasn't sure how else to phrase it. I was wondering if you could make a C++ file that is similar to a JAR file (so that it runs independently of Eclipse/cmd). I was also wondering if there is something similar to a Frame/JFrame in C++. Not a problem here, I am merely curious.
NOTE: I am a C++ noob, but I have been programming in Java for over a year.
Your Java code is compiled to bytecode and packaged into a JAR; the bytecode is translated to machine code at run time. You can run this file on any platform that has a JVM.
C++ is translated to machine code at compile time. You can't move a compiled executable between platforms. For each platform you need to compile the source files again.
Short answer: No.
Longer answer: No, because C++ is not platform independent. Even if you use only standard library functions like the standard containers, the executable you create cannot run on other systems, sometimes not even on other versions of the same platform (Linux is known for this).
Yes, of course. C++ is a platform-independent programming language in the sense that you can compile the same source on every platform, as long as you don't use platform-specific features. But that does not mean that your executable is cross-platform, like a Java JAR.
When you compile it, you create a native executable.
On Windows, it compiles to an .exe. On Linux / OS X (Unix), it compiles to a file with no extension.
So, it depends on what you really want. (Since your question is a bit vague)
And if you are searching for a single cross-platform solution: the answer is no.
If you are searching for a way to make your application start without terminal: the answer is yes. (But I don't know how, since I never used Windows)

How to use MATLAB code in mapper (Hadoop)?

I have MATLAB code that processes images. I want to create a Hadoop mapper that uses that code. I came across the following solutions, but I am not sure which one is best (it is very difficult for me to install the MATLAB Compiler Runtime on each slave node in Hadoop):
1. Manually convert the MATLAB code into OpenCV in C++ and call its exe/dll (supplying the appropriate parameters) from the mapper. Not sure, since the cluster has Linux installed on every node instead of Windows.
2. Use Hadoop Streaming. But Hadoop Streaming requires an executable as the mapper, and the MATLAB executable also requires the MATLAB Compiler Runtime, which is very difficult to install on every slave node.
3. Convert it automatically into C/C++ code and create its exe automatically (not sure whether this is right, because either the exe will require the MATLAB runtime to run or there can be compiler issues in the conversion that are very difficult to fix).
4. Use MATLAB Java Builder. But the jar file thus created will need the runtime too.
Any suggestions?
Thanks in advance.
As you are probably already suspecting, this is going to be inherently difficult to do because of the runtime requirement for MATLAB. I had a similar experience (having to distribute the runtime libraries) when attempting to run MATLAB code over Condor.
As far as the options you are listing are concerned, option #1 will work best. Also, you will probably not be able to avoid working with Linux.
However, if you don't want to lose the convenience provided by higher level software (such as MATLAB, Octave, Scilab and others) you could try Hadoop streaming in combination with Octave executable scripts.
Hadoop streaming does not care about the nature of the executable, i.e. whether it is an executable script or an executable file, according to this: http://hadoop.apache.org/common/docs/r0.15.2/streaming.html
All it requires is that it is given an "executable" that can a) read from stdin and b) send output to stdout.
GNU Octave programs can be turned into executable scripts (in Linux) with the ability to read from stdin and send the output to stdout (http://www.gnu.org/software/octave/doc/interpreter/Executable-Octave-Programs.html).
As a simple example consider this:
Create a file (for example "al.oct") with the following contents (please note that in my installation I had to use "#!/etc/alternatives/octave -qf" as the first line):
#!/bin/octave -qf
Q = fread(stdin); # Standard Octave / MATLAB code from here on
disp(Q);
Now from the command prompt issue the following command:
chmod +x al.oct
al.oct is now an executable. You can execute it with "./al.oct". To see where stdin and stdout fit in (so that you can use it with Hadoop), you can try this:
>>cat al.oct|./al.oct|sort
In other words, "cat" the file al.oct, pipe its output to the executable script al.oct, and then pipe the output of al.oct to the sort utility (this is just an example; we could "cat" any file, but since we know that al.oct is a simple text file we just use that).
It could be of course that Octave does not support everything your MATLAB code is trying to call, but this could be an alternative way to using Hadoop Streaming without losing the convenience / power of higher level code.
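The stdin/stdout contract described above is the same for any executable. As a language-neutral illustration, here is a minimal mapper sketch in Python rather than Octave. The map_lines function and the placeholder "work" it does (emitting each record with its length) are hypothetical; the tab-separated key/value output follows Hadoop streaming's default convention.

```python
import sys


def map_lines(lines):
    """Yield one tab-separated "key<TAB>value" record per non-empty input line."""
    for line in lines:
        record = line.strip()
        if record:
            # Placeholder work: key = the record, value = its length.
            yield f"{record}\t{len(record)}"


# Demo with an in-memory list; a real mapper would pass sys.stdin instead.
for output in map_lines(["a.png\n", "\n", "bb.png\n"]):
    sys.stdout.write(output + "\n")
```

Replacing the list in the final loop with sys.stdin and adding a "#!/usr/bin/env python3" first line would turn this into an executable script of the kind Hadoop streaming accepts.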
Doesn't the nature of the algorithm to be converted matter? If the MATLAB/Octave code is tightly coupled, spreading it out over a map-reduce job may yield horrible behavior.
With respect to your first option: MATLAB Coder now supports many image processing functions (partly via system objects) and can automatically generate C code for your algorithm, which is basically platform independent and needs no runtime environment. In my experience this code is roughly 2 to 3 times slower than "hand-coded" OpenCV (this depends strongly on your algorithm and CPU).
The main drawback is that you need a MATLAB Coder license ($$$).
Most of the answers here seem to predate MATLAB R2014b.
In R2014b, MATLAB allows mapreduce from within MATLAB and integration with Hadoop.
I cannot be certain about your specific use case but you may want to check:
http://www.mathworks.com/help/matlab/mapreduce.html
http://www.mathworks.com/discovery/matlab-mapreduce-hadoop.html
