I have MATLAB code that processes images. I want to create a Hadoop mapper that uses that code. I came across the following solutions but I'm not sure which one is best (installing the MATLAB Compiler Runtime on each slave node in Hadoop is very difficult for me):
1. Manually convert the MATLAB code to C++ with OpenCV and call the resulting exe/dll (supplying the appropriate parameters) from the mapper. I'm not sure about this one, since the cluster runs Linux on every node rather than Windows.
2. Use Hadoop Streaming. But Hadoop Streaming requires an executable as the mapper, and a compiled MATLAB executable also requires the MATLAB Compiler Runtime, which is very difficult to install on every slave node.
3. Convert the code automatically to C/C++ and build an executable from it automatically. I'm not sure this is right either, because either the executable will still require the MATLAB runtime, or the conversion may introduce compiler issues that are very difficult to fix.
4. Use MATLAB Java Builder. But the jar file it creates will need the runtime too.
Any suggestions?
Thanks in advance.
As you are probably already suspecting, this is going to be inherently difficult to do because of the runtime requirement for MATLAB. I had a similar experience (having to distribute the runtime libraries) when attempting to run MATLAB code over Condor.
As far as the options you listed are concerned, option #1 will work best. Also, you will probably not be able to avoid working with Linux.
However, if you don't want to lose the convenience provided by higher level software (such as MATLAB, Octave, Scilab and others) you could try Hadoop streaming in combination with Octave executable scripts.
Hadoop Streaming does not care about the nature of the executable, whether it is an executable script or a compiled binary (according to http://hadoop.apache.org/common/docs/r0.15.2/streaming.html).
All it requires is an "executable" that can a) read from stdin and b) send output to stdout.
GNU Octave programs can be turned into executable scripts (in Linux) with the ability to read from stdin and send the output to stdout (http://www.gnu.org/software/octave/doc/interpreter/Executable-Octave-Programs.html).
As a simple example consider this:
Create a file (for example "al.oct") with the following contents:
#!/bin/octave -qf
Q = fread(stdin); # Standard Octave / MATLAB code from here on
disp(Q);
(Please note: in my installation I had to use "#!/etc/alternatives/octave -qf" as the first line instead.)
Now from the command prompt issue the following command:
chmod +x al.oct
al.oct is now an executable, and you can run it with "./al.oct". To see where stdin and stdout fit in (so that you can use it with Hadoop), you can try this:
cat al.oct | ./al.oct | sort
In other words: "cat" the file al.oct, pipe its output to the executable script al.oct, and then pipe the output of al.oct to the sort utility. (This is just an example; we could "cat" any file, but since we know al.oct is a simple text file we use it here.)
Of course, Octave may not support everything your MATLAB code calls, but this could be an alternative way of using Hadoop Streaming without losing the convenience and power of higher-level code.
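The same stdin/stdout contract can be met from Java, the language used elsewhere in this thread. Below is a minimal sketch of a streaming-style mapper; the class name StreamingMapper and the per-line work (emitting each line's length) are hypothetical placeholders for the real processing. The tab separator follows Hadoop Streaming's default convention that the text up to the first tab is the key and the rest is the value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class StreamingMapper {
    // Hypothetical per-record work: emit "<record>\t<length>".
    // Streaming treats everything before the first tab as the key.
    static String mapLine(String line) {
        return line + "\t" + line.length();
    }

    public static void main(String[] args) throws IOException {
        // Read records from stdin, one per line, until EOF.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(System.out);
        String line;
        while ((line = in.readLine()) != null) {
            out.println(mapLine(line)); // key/value pairs go to stdout
        }
        out.flush();
    }
}
```

Like al.oct, it can be tried locally with a pipe (e.g. `cat input.txt | java StreamingMapper | sort`) before being handed to Hadoop Streaming through its -mapper option.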
Doesn't the nature of the algorithm being converted matter? If the MATLAB/Octave code is tightly coupled, spreading it out over a map-reduce framework may yield horrible behavior.
Regarding your first option: MATLAB Coder now supports many image processing functions (partly via System objects) for automatically generating C code from your algorithm. That code is essentially platform independent and needs no runtime environment. In my experience it is about a factor of 2 to 3 slower than hand-coded OpenCV (this depends strongly on your algorithm and CPU).
The main drawback is that you need a MATLAB Coder license ($$$).
Most of the answers here seem to predate MATLAB R2014b.
In R2014b, MATLAB supports mapreduce from within MATLAB, along with integration with Hadoop.
I cannot be certain about your specific use case but you may want to check:
http://www.mathworks.com/help/matlab/mapreduce.html
http://www.mathworks.com/discovery/matlab-mapreduce-hadoop.html
Related
Suppose I want to search for some text in a file. I want to know when we should use system utilities/programs like grep, and when we should use Java APIs, such as reading a line and then searching for the text in that line, or using the Java Scanner class.
I want to understand the trade-offs between the two approaches. For example, if we use grep, will there be communication overhead between the JVM and the grep process? Is creating a new OS process for grep an overhead?
Does grep perform better than a normal Java file search?
Please help...
Yes, there will be an overhead. Starting an external process and communicating with it is costly. And moreover, many systems don't have a grep command. If you want to make your Java code portable, don't rely on OS-specific commands.
Another problem is that OS commands will be able to search (for example) in files, but not in your in-memory data structures.
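For comparison, a pure-Java search needs no external process at all. The sketch below (class and method names are illustrative) takes any Reader, so the same code searches a file or an in-memory string; it does a plain substring match, roughly like `grep -F`, not regular expressions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class JavaGrep {
    // Returns every line read from `source` that contains `needle`
    // (plain substring match). Closes the reader when done.
    static List<String> grep(Reader source, String needle) {
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    matches.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        // In-memory data: something an external grep could not search directly.
        String text = "alpha\nbeta\ngamma alpha\n";
        System.out.println(grep(new StringReader(text), "alpha")); // prints [alpha, gamma alpha]
    }
}
```

For a file, pass a reader such as `Files.newBufferedReader(path)` instead of the StringReader.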
You're basically trading off system independence for the perceived gain of the tool; in some cases this can't be avoided.
Not every system will have the tools you want installed, in the locations you think they should be or the version you need.
Even if you can deploy the tools with your application, you will need to provide an implementation for each of your targeted platforms.
Sure, it's easy to say "it will never be run on X", but "never" can come around real quick ;)
There is also the added overhead of executing the external applications and managing their I/O. While not difficult, it's much more complex than a well-written Java API.
As I said, sometimes, you simply don't have a choice (I have some media inspection tools that I use on Windows and Mac that I'm not about to try and implement in Java, not because it can't be, but because it's complex and time consuming and somebody has already done it (with a native program)).
You need to weigh the benefits of the external command against the issues of using it. You should also investigate whether an API has already been developed that might solve the problem at hand.
IMHO
I need to perform some operations on files: rename, delete, etc.
What is better: using cmd commands or using java.io.File methods?
thanks.
Normally it's not a good idea to depend on OS-specific things in a platform-independent environment, not to mention the speed, which would be much slower with the local commands.
I would stick with the Java implementations, if it's possible with them.
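As a sketch of the pure-Java route, the standard java.nio.file.Files class (the newer companion to java.io.File) covers renaming and deleting without spawning any process. The class name FileOps, the helper names, and the file names in the demo are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FileOps {
    // Renames a file within its own directory, replacing any existing target.
    static Path rename(Path source, String newName) {
        try {
            return Files.move(source, source.resolveSibling(newName),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes a file if present; returns true if something was deleted.
    static boolean delete(Path file) {
        try {
            return Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo: create a temp file, rename it, delete it, and clean up.
    static boolean roundTrip() {
        try {
            Path dir = Files.createTempDirectory("fileops");
            Path original = Files.createFile(dir.resolve("old.txt"));
            Path renamed = rename(original, "new.txt");
            boolean ok = Files.exists(renamed) && !Files.exists(original);
            ok &= delete(renamed);
            Files.delete(dir);
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // expected: true on a writable temp dir
    }
}
```

The same calls behave identically on Windows and Unix, which is the portability argument made above.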
After lots of comments I understand the real question, so my answer is:
Of course it is better to use Java features, because:
If you want your program to be portable, you can't use the command line, because it wouldn't work on Unix or other systems except Windows. It also won't work on some other versions of Windows.
To use command-line features from your code, you must create a new process and execute these commands, which is very slow.
Instead of using a .bat file, how can I write a Java program that compiles and executes a list of Java programs?
I strongly recommend using an existing build tool like Ant or Maven[1] for this. These tools have existed for years and have been widely used and tested; they are the way to go. Just don't reinvent the wheel.
[1] Just in case you wanted to know: internally, these tools use the old and undocumented com.sun.tools.javac.Main class from tools.jar to invoke javac programmatically.
On Runtime.exec
Though perhaps not the most ideal solution, you can execute a shell command as a separate Process using Runtime.getRuntime().exec(someCommand). There are also overloads that take parameters as a String[].
This is not an easy solution. Managing a concurrent Process, preventing deadlocks, etc. is not trivial.
Related questions
Is java Runtime.exec(String[]) platform independent?
What is the purpose of Process class in Java?
How can I compile and deploy a java class at runtime?
Compiling a class using Java code using process
Java: Executing a Java application in a separate process
Running a program from within Java code..
On draining Process streams
Generally you can't just waitFor() a Process to terminate; you must also drain its I/O streams to prevent deadlock.
From the API:
Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.
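One common way to honor this is to merge stderr into stdout with ProcessBuilder.redirectErrorStream(true), so that only one pipe has to be drained, and to read it to EOF before calling waitFor(). A minimal sketch (class and helper names are illustrative; the demo runs the current JVM's own launcher with -version only because it is guaranteed to exist, and that command prints to stderr):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class RunAndDrain {
    // Path to the java launcher of the running JVM (used only for the demo).
    static String javaLauncher() {
        return Paths.get(System.getProperty("java.home"), "bin", "java").toString();
    }

    // Runs a command with stderr merged into stdout, drains the single pipe
    // to EOF *before* waiting, and returns everything the process printed.
    static String run(List<String> command) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true); // stderr -> same pipe as stdout
            Process p = pb.start();
            p.getOutputStream().close(); // we send no input
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (InputStream in = p.getInputStream()) {
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
            }
            p.waitFor(); // safe now: the pipe has been fully drained
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(javaLauncher(), "-version")));
    }
}
```

Passing the command as a list of arguments also sidesteps the whitespace-splitting that the single-String exec overload performs.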
Related questions
Draining Standard Error in Java
On the Java 6 Compiler API
One option for compiling Java source code from within Java is to use the Java 6 Compiler API. This requires a JDK to be installed (not just a JRE).
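A minimal sketch of that API: ToolProvider.getSystemJavaCompiler() returns the in-process compiler (or null when no compiler is available, e.g. on a bare JRE), and its run method takes javac-style arguments and returns 0 on success. The Hello.java source and the temp directory are hypothetical, for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileAtRuntime {
    // Writes a tiny (hypothetical) Hello.java to a temp dir, compiles it
    // in-process, and reports whether javac succeeded and produced Hello.class.
    static boolean compileHello() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // No compiler available: running on a JRE rather than a JDK.
            throw new IllegalStateException("no system Java compiler: a JDK is required");
        }
        try {
            Path dir = Files.createTempDirectory("javac-demo");
            Path src = dir.resolve("Hello.java");
            Files.write(src, "public class Hello { }".getBytes(StandardCharsets.UTF_8));
            // run(in, out, err, args...): null streams default to System.in/out/err.
            int rc = compiler.run(null, null, null, src.toString());
            // Without -d, javac writes each class file next to its source file.
            return rc == 0 && Files.exists(dir.resolve("Hello.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compiled: " + compileHello());
    }
}
```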
See also
interface JavaCompiler from package javax.tools
external tutorial article
Related questions
Null Pointer Exception while using Java Compiler API
The java.lang.Runtime class has methods that let you execute arbitrary external commands. So it could look something like this:
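import java.util.List;
List<String> commandsToExecute = ...; // your list of commands
for (String cmd : commandsToExecute) {
    // exec(String) splits cmd on whitespace and runs it directly, without a shell
    Process p = Runtime.getRuntime().exec(cmd);
    // Drain p.getInputStream() and p.getErrorStream() here, or waitFor() may block
    p.waitFor(); // If you need to run them all sequentially.
}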
There are several other versions of the Runtime.exec() method which are all described in the documentation.
Another issue with using Runtime.getRuntime().exec(someCommand) is that you need to read both the output and error streams of the spawned process, otherwise it may hang.
There is a limited amount of buffer space available for both streams, and once they fill up, the process will wait for you to read from them and be unable to continue. The two streams must be read in their own separate threads so that one does not deadlock the other.
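When stdout and stderr must be kept separate, the approach described above can be sketched like this: one thread per stream copies it into a buffer while the main thread waits for the process. Class and helper names are illustrative; the demo again uses the running JVM's java -version, which writes to stderr, purely because that command is always available.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TwoStreamDrain {
    static String javaLauncher() {
        return Paths.get(System.getProperty("java.home"), "bin", "java").toString();
    }

    // Copies a stream into `sink` on its own thread until EOF.
    static Thread drain(InputStream in, ByteArrayOutputStream sink) {
        Thread t = new Thread(() -> {
            byte[] chunk = new byte[8192];
            try (InputStream src = in) {
                int n;
                while ((n = src.read(chunk)) != -1) {
                    sink.write(chunk, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    // Runs the command, draining stdout and stderr concurrently so neither
    // pipe can fill up and block the child. Returns {stdout, stderr}.
    static String[] run(List<String> command) {
        try {
            Process p = new ProcessBuilder(command).start();
            p.getOutputStream().close(); // we send no input
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ByteArrayOutputStream err = new ByteArrayOutputStream();
            Thread tOut = drain(p.getInputStream(), out);
            Thread tErr = drain(p.getErrorStream(), err);
            p.waitFor();
            tOut.join(); // join() makes the buffered bytes visible here
            tErr.join();
            return new String[] {
                new String(out.toByteArray(), StandardCharsets.UTF_8),
                new String(err.toByteArray(), StandardCharsets.UTF_8)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = run(Arrays.asList(javaLauncher(), "-version"));
        System.out.println("stderr: " + result[1]);
    }
}
```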
You can use Ant. Instead of running Ant from Eclipse or wherever, you can also run it from the command line. This means you can write a Java program that executes commands, and therefore one that executes Ant with parameters.
These parameters can be derived from variables from the list of applications you want to build.
It doesn't directly answer the question, but some libraries can help with using the Runtime.exec() method (consuming I/O streams, etc.) to invoke javac. For example, there is one named "Shell" (described in a French article; the library can be downloaded at the end of it).
I've been learning compiler theory and assembly and have managed to create a compiler that generates x86 assembly code.
How can I take this assembly code and turn it into a .exe? Is there some magical API or tool I have to interact with? Or is it simpler than I think?
I'm not really sure what's in a .exe, or how much abstraction lies between assembly code and the .exe itself.
My 'compiler' was written in Java, but I'd like to know how to do this in C++ as well.
Note that if I take the generated assembly, it compiles to a .exe just fine with, for example, VC++.
Edit: To be more precise, I already know how to compile assembly code using a compiler. What I want is for my program itself to output a .exe.
It looks like you need to spawn an assembler process and a linker process. On UNIX, it's as simple as invoking the fork() function, which would create a new process, and the exec() function, specifying the assembler and the linker executable names as the function's parameter, with suitable arguments to those executables, which would be the names of your generated assembly and object files. That's all you'd need to do on a UNIX system.
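Since the asker's compiler is written in Java, the Java counterpart of fork()/exec() is ProcessBuilder. The sketch below assumes the GNU binutils assembler `as` and linker `ld` are on the PATH; note that `as` expects AT&T syntax by default, so Intel-syntax output would need a different assembler (e.g. NASM, or MASM's ml and link on Windows). The file names out.s, out.o and out are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class AssembleAndLink {
    // Assumed toolchain: GNU binutils 'as' and 'ld' on the PATH.
    static List<String> assembleCommand(String asmFile, String objFile) {
        return Arrays.asList("as", "-o", objFile, asmFile);
    }

    static List<String> linkCommand(String objFile, String exeFile) {
        return Arrays.asList("ld", "-o", exeFile, objFile);
    }

    // Runs one step; inheritIO() sends the tool's output straight to our
    // console, so there are no pipes to drain. Fails on a non-zero exit code.
    static void runStep(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int rc = p.waitFor();
        if (rc != 0) {
            throw new IOException(cmd.get(0) + " exited with " + rc);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file names produced by your compiler.
        runStep(assembleCommand("out.s", "out.o"));
        runStep(linkCommand("out.o", "out"));
    }
}
```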
Normally you use an assembler and a linker to create an exe file. There is no magic involved. The different parts are assembled, a header and other boilerplate is added so the OS knows where the bootstrap code is located for the program and to organize the memory.
The VC++ compiler does that under the hood. You might consider experimenting a bit on Linux, as you can see the machinery at work more easily on Unix platforms. It is fundamentally the same, but it's just harder to see through the UI on Windows.
In principle, each assembly line corresponds to a machine instruction, which is just a few bytes in the exe file. You can find the encodings in the specs for the processor. So you can write your own assembler if you know the codes and the exe format (header, relocation info, etc.).
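To illustrate the "each line is a few bytes" point, here is a toy encoder for four 32-bit x86 instruction forms (nop = 90, ret = C3, mov eax, imm32 = B8 followed by the immediate in little-endian order, int imm8 = CD followed by the byte). It is only a sketch: the class name is illustrative, every other instruction is rejected, and a real .exe would still need the headers and relocation info mentioned above.

```java
import java.io.ByteArrayOutputStream;

public class TinyEncoder {
    // Encodes a handful of 32-bit x86 instructions into machine-code bytes:
    //   nop          -> 90
    //   ret          -> C3
    //   mov eax, imm -> B8 followed by imm as 4 little-endian bytes
    //   int imm8     -> CD imm8
    static byte[] encode(String insn) {
        String s = insn.trim().toLowerCase();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (s.equals("nop")) {
            out.write(0x90);
        } else if (s.equals("ret")) {
            out.write(0xC3);
        } else if (s.startsWith("mov eax,")) {
            int imm = Integer.decode(s.substring("mov eax,".length()).trim());
            out.write(0xB8);
            for (int i = 0; i < 4; i++) {
                out.write((imm >>> (8 * i)) & 0xFF); // little-endian
            }
        } else if (s.startsWith("int ")) {
            int imm = Integer.decode(s.substring(4).trim());
            out.write(0xCD);
            out.write(imm & 0xFF);
        } else {
            throw new IllegalArgumentException("unsupported: " + insn);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode("mov eax, 1")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints: B8 01 00 00 00
    }
}
```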
How Do Assemblers Map x86 Instruction Mnemonics to Binary Machine Instructions?
What options / methods / software are available to convert a JAR file to a managed .NET assembly?
Please provide all commercial and non-commercial methods in the answer.
Solutions that require Java to be installed on the host machine are excluded.
I could be wrong, but I'm pretty sure that's impossible. The java byte code is different to the code produced to run on the CLR.
Snarky answer: Get the source code, and port it.
EDIT: A little poking comes up with http://sourceforge.net/projects/ikvm/, a Java Virtual Machine implementation for .NET. Not quite what you asked for, but it's probably going to be the best you can do.
Confronted with this situation last year, I wrote a small wrapper (in Java) that read the inputs from a temp file, invoked the jar, and placed the output in another temp file. The .NET project would create the input file, call the JVM to start the wrapper, wait for it to finish, and read the output file. Quick and dirty, at least in my case.
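A minimal sketch of such a file-based wrapper, assuming line-oriented input: the .NET side would write the input file, launch something like `java -cp wrapper.jar FileBridge in.txt out.txt`, wait for exit, and read the output file. The class name FileBridge, the argument order, and the process method (standing in for the real call into the jar, here just upper-casing) are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class FileBridge {
    // Placeholder for the call into the wrapped jar's API (hypothetical).
    static String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // Reads every input line, processes it, and writes the results in order.
    static void bridge(Path input, Path output) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        List<String> results = lines.stream()
                .map(FileBridge::process)
                .collect(Collectors.toList());
        Files.write(output, results, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input temp file, args[1] = output temp file
        bridge(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The caller can treat a non-zero JVM exit code (e.g. from an uncaught exception) as failure before reading the output file.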