I'm new to Java, and was told to use the Java Native Interface to run some code I wrote in C.
Now, this might be a stupid question, but what's the point of JNI? Can't I simply run my program as a separate process from a Java UI program and parse its stdout?
Also, I've read that using JNI might cause security issues. Do these issues depend directly on the quality of the invoked code? Or is this something deeper?
Thanks.
what's the point of the JNI ?
It enables you to mix C and Java code within the same process.
Can't I simply execute my process from a Java UI program and get its stdout to parse ?
A lot of things that can be achieved by using JNI can also be achieved by using inter-process communication (IPC). However, you'd have to ship all the input data to the other process, and then ship all the results back. This can be pretty expensive, which makes IPC impractical for many situations where JNI can be used (e.g. wrapping existing C libraries).
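For comparison, the process-based approach from the question might look roughly like the sketch below. The class name `ExternalToolRunner` is made up for illustration, and the JVM's own `java -version` stands in for the C program, since the real executable is not shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ExternalToolRunner {

    /** Runs the command and returns everything it printed, one entry per line. */
    static List<String> run(List<String> command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout so an unread stderr pipe cannot stall the child
                    .start();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("exit code " + exitCode + ", output: " + lines);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for the process", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the C program: the java launcher of the running JVM.
        String javaBinary = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        run(List.of(javaBinary, "-version")).forEach(System.out::println);
    }
}
```

Every call pays for a process start, and all inputs and results have to be passed as arguments, pipes or files, which is the cost described above.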
Also, I've read that the use of JNI might cause security issues. Do these issues directly depend on the quality of the invoked code ? Or is this something deeper ?
The point here is that the JVM does a lot of work to ensure that, whatever Java code is thrown at it, things like buffer overruns and stack-smashing attacks can't occur. For example, it performs bounds checking on all array accesses (which C doesn't).
On the other hand, JNI code is a black box to the JVM. If there's a problem with the C code (e.g. a buffer overrun), all bets are off.
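The bounds checking mentioned above is easy to see in a small experiment. This is only an illustrative sketch (the class and method names are made up); the equivalent out-of-range write in C would be undefined behaviour and could silently overwrite neighbouring memory.

```java
class BoundsCheckDemo {

    /** Attempts a write at the given index; returns true if the JVM rejected it. */
    static boolean writeIsRejected(int[] buffer, int index) {
        try {
            buffer[index] = 42;
            return false; // the write was inside the array
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;  // the JVM refused to write outside the array
        }
    }

    public static void main(String[] args) {
        int[] buffer = new int[4];
        System.out.println(writeIsRejected(buffer, 3)); // last valid index: prints false
        System.out.println(writeIsRejected(buffer, 4)); // one past the end: prints true
    }
}
```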
Can't I simply execute my process from a Java UI program and get its stdout to parse ?
Do you think it's always appropriate to start a new process every time you want to execute any native code? Do you really want to be transferring potentially large amounts of data between processes? (Imagine a native image transformation.)
Also, I've read that the use of JNI might cause security issues. Do these issues directly depend on the quality of the invoked code?
Yes. Basically native code has less security sandboxing than Java running in a JVM. If the code has security bugs (e.g. buffer overflows) then clearly that will affect the security of your overall app.
I should say that it's relatively rare for Java developers to need to worry about JNI - I've certainly only touched it a couple of times in my career. You may also want to look at SWIG if the need arises.
Can't I simply execute my process from a Java UI program and get its stdout to parse ?
That would depend on what you are calling.
Note that JNI does not let you call programs; it only lets you call library code.
In addition to that, spawning new processes is relatively expensive and managing multiple processes is complicated.
Related
For example, there could be a bit of Java bytecode mixed together with some C. The JVM would execute the Java bytecode and turn execution over to the OS when a C part is hit. Is this technically possible, and is it done in practice?
Generally, you can write C code that creates a JVM to execute the provided bytecode (for example via execve), and then either run the two sides in separate threads with some IPC between them, use JNA/JNI to exchange data, or start operations and wait for their completion.
I have come across some projects that use this approach (for example parts of the Android system, Cloudera Impala and some others), but the code there is overcomplicated and hard to trace. It surely took a lot of effort to make it work properly. Sometimes it's better either to run two processes built on different technologies, connected by good IPC with data serialization (Thrift, protobuf), or to use only one of the technologies.
If you still need to run both, I'd prefer to build the system in Java and call native functions through JNI rather than the other way around.
I'm working on a web application, but I need to call certain functions in a proprietary C++ library. As I understand it, native methods are not thread-safe, so an access violation in native code can crash the application server's JVM (Tomcat). This native API is a very small part of the overall web application's functionality; I would say only 5% of users will ever access it. No matter how thoroughly the application is tested (I don't have access to the native source code), there is a risk that a bug in the native library brings down the whole application server, logging out users and potentially causing downtime.
So the question - which strategy is better?
1) Should I wrap the native library in a separate process so that the main web server is not affected by a bug in native code? I could probably use UNIX sockets to communicate with this separate process from my web server (avoiding the overhead of TCP sockets). If a crash happens, I fix the problem as quickly as possible and accept downtime for 5% of users.
Or
2) Bite the bullet and continue to use JNI in the servlet container (with a risk of potential downtime for everyone).
Regards,
Rohit
It depends:
Take into account that a function not being thread-safe does not necessarily mean it will crash when called from multiple threads. It might simply return completely wrong results.
If your application cannot cope with that somehow, then you have no other option: you need to serialize access to the native code.
If you are sure that the only side effect of calling the non-thread-safe function is that it can crash, then you need to make sure that the crash does not result in other kinds of errors, such as inconsistent data in your application's back end (database corruption, etc.). (You may use transactions to prevent this.)
If your application is able to overcome all of the above, then a 3rd piece of information is still needed:
You need to study how much downtime/crash your users tolerate. If they tolerate the possible down-times, then go ahead and do not care about the crashes, you can safely "bite the bullet", because it won't harm your users or your application.
In all other cases you have to serialize access to the native functions.
Wrapping them into a process might be a good idea, but you have to make sure that the function(s) can be run ONLY in one thread at a time. So probably you need to implement some mechanism to make the other threads/servlets wait until one of them finished calling the function(s).
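One way to serialize access inside a single JVM is to route every call through a lock. The sketch below is an assumption about how that could look, not the asker's code: `SerializedNativeCall` is a made-up name, and a plain lambda stands in for the proprietary native method. Note that this only prevents concurrent entry; it does not protect the JVM if the native code itself crashes.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;
import java.util.stream.IntStream;

class SerializedNativeCall<I, O> {

    // Fair lock: waiting servlet threads are served roughly in arrival order.
    private final ReentrantLock lock = new ReentrantLock(true);
    private final Function<I, O> nativeCall;

    SerializedNativeCall(Function<I, O> nativeCall) {
        this.nativeCall = nativeCall;
    }

    /** Blocks until no other thread is inside the wrapped call, then runs it. */
    O call(I input) {
        lock.lock();
        try {
            return nativeCall.apply(input);
        } finally {
            lock.unlock(); // always release, even if the call throws
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real native method.
        SerializedNativeCall<Integer, Integer> guarded = new SerializedNativeCall<>(x -> x * 2);
        int sum = IntStream.range(0, 100).parallel().map(guarded::call).sum();
        System.out.println(sum); // prints 9900
    }
}
```

If the library is moved into a separate process instead, the same idea applies on the server side of that process: handle one request at a time.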
I am curious about what automatic methods may be used to determine if a Java app running on a Windows PC is malware. (I don't really even know what exploits are available to such an app. Is there someplace I can learn about the risks?) If I have the source code, are there specific packages or classes that could be used more harmfully than others? Perhaps they could suggest malware?
Update: Thanks for the replies. I was interested in knowing if this would be possible, and it basically sounds totally infeasible. Good to know.
If it's not even possible to automatically determine whether a program terminates, I don't think you'll get much leverage in automatically determining whether an app does "naughty stuff".
Part of the problem of course is defining what constitutes malware, but the majority is simply that deducing proofs about the behaviour of other programs is surprisingly difficult/impossible. You may have some luck spotting particular patterns, but on the whole you can't be confident (and I suspect it's provably impossible) that you've caught all possible attack vectors.
And in the general sphere, catching 95% of vectors isn't really worthwhile when the attackers simply concentrate on the remaining 5%.
Well, there's always the fundamental philosophical question: what is malware? It's code that was intended to do damage, or at least code that doesn't do what it claims to. How do you plan to judge intent based on the libraries it uses?
Having said that, if you at least roughly know what the program is supposed to do, you can indeed find suspicious packages, things the program wouldn't normally need to access, like network connections when the program is meant to run as a desktop app. But then the network connection could just be part of an autoupdate feature. (Is autoupdate itself malware? Sometimes it feels like it is.)
Another indicator is if a program that ostensibly doesn't need any special privileges, refuses to run in a sandbox. And the biggest threat is if it tries to load a native library when it shouldn't need one.
But all these only make sense if you know what the code is supposed to do. An antivirus package might use very similar techniques to viruses, the only difference is what's on the label.
Here is a general outline for how you can bound the possible actions your Java application can take. Basically, you are testing whether the Java application is 'inert' (can't take harmful actions) and thus probably not malware.
This won't necessarily tell you whether it is malware or not, as others have pointed out. The app could still do annoying things like pop up windows. Perhaps the best indication is whether the application is digitally signed by an author you trust; if not, be afraid.
You can disassemble the class files to determine which Java APIs the application uses; you are looking for points where the java app uses the OS. Since java uses a virtual machine, there are well defined points where a java application could take potentially harmful actions -- these are the 'gateways' to various OS calls (for example opening a socket or reading a file).
It's difficult to enumerate all the APIs; different functions which execute the same OS action should require the same Permission, but Java's docs don't provide an exhaustive list.
Does the Java app use any native libraries? If so, it's a big red flag.
The JVM does not offer the ability to run arbitrary code or use native system APIs; in particular, it does not offer the ability to modify the registry (a typical action of PC malware). The only way a Java application can do this is via native libraries. Typically there is no need for a normal application written in Java to use native code (unless it needs to use devices).
Check for System.loadLibrary() or System.load() or Runtime.loadLibrary() or Runtime.load(). This is how the VM loads native libraries.
Does it use the network or file system?
Look for use of java.io, java.net.
Does it make system calls (via Runtime.exec())
You can check for the use of java.lang.Runtime.exec() or ProcessBuilder.start().
Does it try to control the keyboard / mouse?
You could also run the application in a restricted policy JVM (the instructions/tools for doing this are not as simple as they should be) and see what fails (see Oracle's security tutorial) -- note that disassembly is the only way to be sure, just because the app doesn't do anything harmful once, doesn't mean it won't in the future.
This definitely is not easy, and I was surprised to find how many places one needs to look at (for example several java functions load native libraries, not just one).
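As a very crude first pass over the checklist above, one can search a compiled class file for the names of the sensitive APIs, since class and method names are stored as plain text in the constant pool (with '/' instead of '.'). The sketch below is illustrative only: the class name and the marker list are my own choices (java/awt/Robot is included as the usual keyboard/mouse API), System.load() cannot be matched this way because "load" alone is too generic, and code that uses reflection or builds names at runtime will evade it. A real disassembler such as javap gives a more reliable picture.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ClassFileScanner {

    // Names in the internal form used inside class files.
    static final List<String> MARKERS = List.of(
            "loadLibrary",               // System.loadLibrary / Runtime.loadLibrary
            "java/lang/Runtime",         // Runtime.exec, Runtime.load
            "java/lang/ProcessBuilder",  // spawning processes
            "java/net/",                 // network access
            "java/io/File",              // file system access (File, FileReader, ...)
            "java/awt/Robot");           // synthetic keyboard / mouse input

    /** Returns the markers that occur anywhere in the raw class file bytes. */
    static List<String> findMarkers(byte[] classBytes) {
        // ISO-8859-1 maps every byte to one char, so ASCII names survive intact.
        String text = new String(classBytes, StandardCharsets.ISO_8859_1);
        List<String> hits = new ArrayList<>();
        for (String marker : MARKERS) {
            if (text.contains(marker)) {
                hits.add(marker);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ClassFileScanner.java path/to/Some.class
        System.out.println(findMarkers(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```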
Some modern network cards support Direct Memory Access for improved performance. How can I utilize this feature from Java?
Does the JVM provide this automatically, or do I need to do an allocateDirect on the ByteBuffers that I am using to talk to that NIC?
Does anyone have documentation that discusses this?
It is the operating system's task to use the DMA feature of the network card. The JVM does not really care how the OS does it, and simply uses the operating system's functions for talking to "network interfaces".
You cannot do this from inside Java in the typical desktop/server JVMs, as this is operating system territory, which requires you to reach out into C code. Have a look at JNI or JNA to see how to do this. Please note that this may make your application brittle if you do not get it exactly right.
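On the allocateDirect part of the question: as far as I know, a direct ByteBuffer only changes where the buffer memory lives (outside the Java heap), which can save the JVM a copy when it hands data to native I/O calls. It does not give Java any control over the NIC's DMA. A minimal sketch, with a made-up class name:

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {

    /** Allocates a buffer outside the Java heap, suitable for passing to native I/O. */
    static ByteBuffer newIoBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    public static void main(String[] args) {
        ByteBuffer direct = newIoBuffer(64 * 1024);
        ByteBuffer heap = ByteBuffer.allocate(64 * 1024);
        System.out.println(direct.isDirect()); // prints true
        System.out.println(heap.isDirect());   // prints false
    }
}
```

Whether this makes a measurable difference depends on the workload, so it is worth benchmarking before committing to it.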
Yeah - ankon's answer is right. Java operates in a sandbox - a virtual machine (hence the "VM" in JVM; Sun actually built ONE physical version -- it's on display somewhere).
Java was never designed (intentionally) to reach outside the sandbox, unlike ActiveX, which can go just about anywhere on a PC.
Just think of all the bad things ActiveX has done over the years via a browser. You wouldn't want that to happen with Java, would you?
Although...
you might be able to instantiate an object in Java that does have access to the hardware (like one of those ActiveX controls, or some DLL, for example - which you'd have to write, too).
The problem I see is the throughput. With 100 Mbit/s or 1000 Mbit/s cards, would a JVM (remember, this is a VM running on an OS, so you're a couple of layers removed from the hardware) have the speed to handle what's coming in under load? Would you want a Java program holding up data in your NIC while it tinkered with it (think of the impact on the rest of the system)?
At this point, you're probably better off writing the hard-working guts of your solution in C. And, if you still need Java to play with that data, put it in a place where Java can get to it.
If you're not getting the network throughput you need in java, then you're going to need to write a C wrapper in order to access it.
Have you benchmarked your code to find where your performance issues really are? If you let us know that we can likely help you out without resorting to JNI.
I need to invoke Tesseract OCR (it's an open source library in C++ that does Optical Character Recognition) from a Java application server. Right now it's easy enough to run the executable using Runtime.exec(). The basic logic would be:
1) Save the image that is currently held in memory to a file (a .tif).
2) Pass the image file name to the tesseract command line program.
3) Read the output text file from Java using FileReader.
How much improvement in terms of performance am I likely to get by writing a JNI wrapper for Tesseract? Unfortunately there is not an open source JNI wrapper that works in Linux. I would have to do it myself and am wondering about whether the benefit is worth the development cost.
It's hard to say whether it would be worth it. If you assume that if done in-process via JNI, the OCR code can directly access the image data without having to write it to a file, then it would certainly eliminate any disk I/O constraints there.
I'd recommend going with the simpler approach and only undertaking the JNI option if performance is not acceptable. At least then you'll be able to do some benchmarking and estimate the performance gains you might be able to realize.
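The simpler, process-based approach from the question could be sketched as follows. This assumes the `tesseract` executable is on the PATH and uses its basic `tesseract <image> <outputbase>` form, which writes the text to `<outputbase>.txt`; the class name, temp file names and error handling are illustrative choices, not part of the original setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class TesseractCli {

    /** Builds "tesseract <image> <outputBase>"; tesseract appends ".txt" to the output base. */
    static List<String> buildCommand(Path image, Path outputBase) {
        return List.of("tesseract", image.toString(), outputBase.toString());
    }

    /** Writes the in-memory image to disk, runs tesseract on it and returns the recognized text. */
    static String recognize(byte[] tiffBytes) throws IOException, InterruptedException {
        Path workDir = Files.createTempDirectory("ocr");
        Path image = workDir.resolve("input.tif");
        Path outputBase = workDir.resolve("result");
        Path textFile = workDir.resolve("result.txt");
        try {
            Files.write(image, tiffBytes);
            Process process = new ProcessBuilder(buildCommand(image, outputBase))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // console output is not needed here
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }
            return Files.readString(textFile);
        } finally {
            // Best-effort cleanup of the temporary files.
            Files.deleteIfExists(image);
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(workDir);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TesseractCli.java some-image.tif
        System.out.println(recognize(Files.readAllBytes(Path.of(args[0]))));
    }
}
```

Timing `recognize` on representative images gives the baseline against which a JNI or JNA wrapper would have to prove itself.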
If you do pursue your own wrapper, I recommend you check out JNA. It will allow you to call most "native" libraries writing only Java code, and will give you more help than does raw JNI to do it safely. JNA is available for most platforms.
I agree with tweakt. Do not use JNI unless there are performance reasons to do so. Your application's stability could also be in danger if you use JNI calls, since memory leaks or even crashes are possible in your JNI layer or in the OCR library itself. This cannot happen to your application if you use the command line interface: all memory is released when the program exits, and any abnormal termination can be detected in the calling code.