Limit resource utilization of JNA calls without changing dll - java

How can you prevent a JNA method-call from exceeding thresholds for CPU utilization, thread-counts, and memory limits?
Background:
I'm working on a safety-critical application, and one of the non-safety-critical features requires the use of a library written in C. The DLLs have been given to me as a black box, and there's no chance that I'll get access to the source code beyond the Java interface files. Is there a way to limit the CPU usage, thread count, and memory used by the JNA code?

See ulimit and sysctl, which are applicable to your overall JVM process (or any other process, for that matter).
It's not readily possible to segment parts of your JVM which are making native accesses via JNA from those that aren't, though.
You should run some profiling while you exercise your shared library to figure out what resources it does use, so you can focus on setting limits around those (lsof or strace would be used on Linux; I'm not sure of the equivalent on Windows).

On most operating systems, you must call your C code from either a new thread or a new process in order to limit it separately. I would recommend calling it from a new process, as you can then sandbox it more easily and more thoroughly. Typically, on a Unix-like system one switches to a new user that is set aside for the service and has user resource limits on it. However, on Linux one can use user namespaces and cgroups for more dynamic and flexible sandboxing. On Microsoft Windows one typically uses Job objects for resource sandboxing, but permission-based sandboxing is more complicated (a lot of Windows is easily sandboxable with access controls, but the GUI and window messaging parts make things complicated and annoying).
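As a rough illustration of the "new process" approach, the sketch below launches a separate JVM that would host the JNA calls and kills it if it runs too long. The class name `NativeWorkerLauncher` and the worker main class `com.example.NativeWorker` are hypothetical names chosen for this example. Note that `-Xmx` only caps the Java heap of the child; memory allocated natively by the DLL, as well as CPU and thread counts, would still need OS-level limits (ulimit, cgroups, or Job objects) applied to the child process.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class NativeWorkerLauncher {

    /** Builds the command line for a child JVM that would host the JNA calls. */
    static List<String> buildCommand(String mainClass, int maxHeapMb) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeapMb + "m"); // caps the Java heap only, not native allocations
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    /** Runs the child; returns its exit code, or -1 if it was killed after the timeout. */
    static int runWithTimeout(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            child.destroyForcibly(); // the parent JVM is unaffected by the kill
            child.waitFor();
            return -1;
        }
        return child.exitValue();
    }

    public static void main(String[] args) {
        // "com.example.NativeWorker" is a placeholder for the class that calls the DLL.
        List<String> cmd = buildCommand("com.example.NativeWorker", 64);
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the native code in its own process also means that a crash inside the DLL takes down only the worker, not the safety-critical JVM.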

Related

Limiting memory usage of a Java/Rhino/Nashorn object

I'm extending a server application written in Java to allow user-defined callbacks (written in Javascript) to be run in response to requests. I've done some reading, and while it seems possible to disable Java classes in Nashorn, there is nothing stopping a user from creating Javascript code that allocates an enormous array without using any Java APIs. I'm wondering if there is any way to restrict this, either proactively or reactively.
The solution I came up with is to have a process pool of JVMs with small max heap sizes, which are responsible for running the user-defined code. There will be a worker pool manager to spawn new processes when needed. This way, the main process, as well as other user-defined code, will not be affected by a single malicious user. While this solution will likely work, it seems heavy-handed. Is there no better solution for preventing malicious users from using too much memory?
I'm not particularly set on Javascript, so if there exists any other scripting language that can be run within a JVM and also has support for memory usage limits, I would be open to using it instead of Nashorn. Unfortunately, it seems like Jython, JRuby, and LuaJava all don't have what I'm looking for. Thanks in advance.

Executing Java byte-code in a very restricted part of a running JVM

Is there a way to run some Java bytecode in a specially restricted part of a running JVM? I'm thinking of access to very little RAM (a few tens of kilobytes, perhaps) and no access to the external world whatsoever (apart from that RAM).
The goal would be to execute some user-provided bytecode in this safe environment in such a way that the host can never crash or leak information as a result of executing rogue bytecode.
You can run untrusted bytecode within a security sandbox, and set up the sandbox so that there is no possibility of communicating with the outside world. This is what a browser-resident JVM does when you run an untrusted applet ... except that you need the sandbox restrictions to be tighter. (An applet sandbox doesn't block ALL network connections.)
Reference: How do I create a Java sandbox?
However, it is NOT POSSIBLE to entirely control what the rogue code does. For example, if it decides to go into an infinite loop or allocate a huge data structure, the trusted part of your JVM has no bomb-proof way of stopping it. And if there is a security flaw in the JVM, class libraries or your sandbox, then there's a chance that the rogue code could exploit it.
Note that none of this involves restricting the code to a particular area of RAM. You can't do that in Java.
You could use Java PathFinder (JPF) for this type of exercise. JPF is a model-checking tool that takes source code or bytecode and executes it in its own virtual machine; you can define various properties (deadlock freedom, infinite loops, etc.) to check for.
JPF operates as a standalone tool, so it would be hard to integrate into your application, but perhaps you could call it externally and then just query for the results.

How to make full use of multiple processors?

I am doing web crawling in Java on a server with 32 virtual processors. How can I make full use of these processors? I've seen some suggestions about multi-threaded programming, but I wonder how that could ensure all processors are taken advantage of, since we can do multi-threaded programming on a single-processor machine as well.
There is no simple answer to this ... except the way to ensure all processors are used is to use multi-threading the right way. (Note: that is a circular answer!)
Basically, the way to get effective use of multiple processors is to:
ensure that there is work that can be done in parallel, and
reduce / eliminate contention points that force one thread to wait while another thread does something.
This is difficult enough when you are doing simple computation. For a web crawler, you've got the additional problems that the threads will be competing for network and (possibly) remote server bandwidth, and they will typically be attempting to put their results into a shared data structure or database.
That's about all that can be said at this level of generality ...
And as @veer correctly points out, you can't "ensure" it.
... but using a load of threads will surely be quicker wall-time-wise because all the miserable network latency will happen in parallel ...
Actually, if you go overboard, a load of threads can reduce throughput because of contention. Just throwing lots of threads at the problem is rarely a good idea.
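To make the two points above concrete (independent work items, few contention points), here is a minimal sketch using a bounded thread pool. `BoundedCrawlSketch` and `process` are hypothetical names; `process` merely stands in for fetching and parsing a page. Each task returns its own result instead of writing to a shared structure, and the results are merged on the calling thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedCrawlSketch {

    /** Hypothetical stand-in for "fetch and parse one page". */
    static int process(String url) {
        return url.length();
    }

    /** Runs process() for every URL on a pool with a fixed number of threads. */
    static Map<String, Integer> crawl(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<Integer> task = () -> process(url); // no shared mutable state
                pending.add(pool.submit(task));
            }
            // Merge on the calling thread, so no lock is needed for the result map.
            Map<String, Integer> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), pending.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(crawl(Arrays.asList("http://a.example", "http://b.example"), threads));
    }
}
```

Sizing the pool to `availableProcessors()` is only a starting assumption; for a crawler that spends most of its time waiting on the network, a different pool size may work better, and measuring is the only way to find out.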
A computer or a program is only as fast as the slowest link in its processing chain. Just increasing the CPU capacity is not going to ensure a drastic performance peak. Leaving aside other issues like your cache-size, RAM, etc., there are two basic kinds of approach to your question about how to take advantage of all your processors:
[1] Using a JIT (just-in-time) compiler technology such as Java/.NET. I don't know much about Java, but the .NET jitter is definitely designed to take advantage of all the available processors on the machine. In fact, this very feature makes a jitter stand out against static language compilers like C/C++: because the jitter "knows" that it is sitting on 32 processors, it is in a much better position to take advantage of them than a program statically compiled on any other machine (provided you have written robust multi-threaded code for it!).
[2] Programming in C/C++. This is the classic approach. If you compile your code on the same machine with 32 CPUs, and take proper care in your program such as memory-management, handling pointers, etc. the C/C++ program will be the most optimal and will perform better than its CLR/JVM counterpart (as it runs without the extra overhead of a garbage-collector or a VM).
But keep in mind that writing robust code is much easier in .NET/Java than C/C++. So, if you are not a "hard-core" programmer, I would suggest going with the former approach. Also remember to handle your multiple threads with care, such as locking variables when multiple threads try to change the same variables. However, excessive locking might make your code hang, if a variable behaves unexpectedly.
Processor management is implemented natively by the virtual machine you are using, i.e., the JVM. If you are using the Java HotSpot VM, you can have a look at Java HotSpot VM Options to tune it for your machine. If you are using a third-party VM, then your provider may help you with tuning it for your requirements.
Application-level performance, in practice, depends on your own design.
If you would like to monitor your threads and memory usage to optimize your application, you can use any VM monitoring tools available to date. The Java virtual machine (JVM) has built-in instrumentation that enables you to monitor and manage it using JMX.
For details you can check Platform Monitoring and management using JMX. For third party VMs you have to contact the vendor I guess.
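As a small illustration of the built-in instrumentation, the platform MXBeans in `java.lang.management` can be read from inside the same JVM without any remote JMX setup. The class name `JvmSnapshot` is made up for this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

final class JvmSnapshot {

    /** Returns a one-line summary of live threads, heap use and visible processors. */
    static String describe() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return String.format("threads=%d heapUsedBytes=%d processors=%d",
                threads.getThreadCount(), heap.getUsed(), os.getAvailableProcessors());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The same beans are what external JMX clients read, so a snapshot like this can be logged periodically while you experiment with different thread counts.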

Performance Wise, Python VS JAVA For File Based Processing

I need to create a daemon that will monitor a certain directory and process every file that's written to that path.
My choice is either Java or Python.
Do you have any experience using both technologies? Which is the better one?
EDIT 1: the files that will be processed are simple text files (one line with tab-separated fields).
I just need to move each one into a buffer and send it on to my PHP file.
EDIT 2: It's for a FreeBSD server.
Performance-wise, for an I/O- and syscall-bound task such as the one you're describing, it's going to be a wash, most likely, depending a bit on the platform. Java tends to have better CPU usage (partly because a JVM can effectively use multiple cores on a multicore CPU on different threads, with CPython having problems with that; partly because of strong JIT abilities), but typically pays for them with higher RAM footprints (no big deal if you have 64GB of RAM lying around and not much else to do on the machine, say, but often an issue in other circumstances).
If you specify the platform (Linux vs Windows vs ...), we might be able to offer more help.
Edit: with the processing required being as light as the OP mentioned in the question's edit, there's really nothing in it either way for the CPU-load part of the task. Unfortunately I don't know what FreeBSD offers for "directory watching" (like Linux's inotify, etc.).
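For the Java side of the comparison, one possible shape for such a daemon is the standard `java.nio.file.WatchService`. The sketch below is illustrative only: `DirectoryWatcherSketch`, `parseFields` and the handler are hypothetical names, and how the service is backed on a given platform is an implementation detail (the Javadoc notes that an implementation without a native notification facility may fall back to polling). A create event can also arrive before the writer has finished the file, which a real daemon would have to handle.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.function.Consumer;

final class DirectoryWatcherSketch {

    /** Splits one tab-separated line into its fields, keeping empty trailing fields. */
    static String[] parseFields(String line) {
        return line.split("\t", -1);
    }

    /** Blocks and calls onNewFile for each file created in dir until the key becomes invalid. */
    static void watch(Path dir, Consumer<Path> onNewFile)
            throws IOException, InterruptedException {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take(); // waits for the next batch of events
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a real daemon might rescan the directory
                    }
                    onNewFile.accept(dir.resolve((Path) event.context()));
                }
                if (!key.reset()) {
                    return; // the directory is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The directory to watch is taken from the first command-line argument.
        watch(Paths.get(args[0]), path -> System.out.println("new file: " + path));
    }
}
```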

Direct memory access to the network card in java

Some modern network cards support Direct Memory Access for improved performance. How can I utilize this feature from Java?
Does the JVM provide this automatically, or do I need to do an allocateDirect on the ByteBuffers that I am using to talk to that NIC?
Does anyone have documentation that discusses this?
It is the operating system's task to use the DMA feature of the network card. The JVM does not really care how the OS does it, and simply uses the operating system's functions for talking to "network interfaces".
You cannot do this from inside Java in the typical desktop/server JVMs, as this is operating system territory, which requires you to reach out into C code. Have a look at JNI or JNA to see how to do this. Please note that this may make your application brittle if you do not get this exactly right.
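On the `allocateDirect` part of the question: a direct `ByteBuffer` lives outside the Java heap, and the `ByteBuffer` documentation says the JVM will make a best effort to perform native I/O directly on it, which can save an intermediate copy. It does not give Java any control over whether or how the NIC uses DMA; that remains the OS's business. A minimal sketch, with `DirectBufferSketch` as a made-up name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DirectBufferSketch {

    /** Copies the text into a direct (off-heap) buffer that is ready to be read or written out. */
    static ByteBuffer directCopyOf(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip(); // switch from filling the buffer to draining it
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = directCopyOf("ping");
        System.out.println("direct=" + buffer.isDirect() + " remaining=" + buffer.remaining());
    }
}
```

Such a buffer could then be handed to a `SocketChannel.write` call; whether that is measurably faster than a heap buffer depends on the workload and is worth benchmarking.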
Yeah, ankon's answer is right. Java operates in a sandbox: a virtual machine (hence the "VM" in JVM; Sun actually built ONE physical version, and it's on display somewhere).
Java was never designed (intentionally) to reach outside the sandbox, unlike ActiveX, which can go just about anywhere on a PC.
Just think of all the bad things ActiveX has done over the years via a browser. You wouldn't want that to happen with Java, would you?
Although...
you might be able to instantiate an object in Java that does have access to the hardware (like one of those ActiveX controls, or some DLL, for example - which you'd have to write, too).
The problem I see is the throughput. With 100 Mbit or 1000 Mbit cards, would a JVM (remember, this is a VM running on an OS, so you're a couple of layers removed from the hardware) have the speed to handle what's coming in under load? Would you want a Java program holding up data in your NIC while it tinkered with it (think of the impact on the rest of the system)?
At this point, you're probably better off writing the hard-working guts of your solution in C. And, if you still need Java to play with that data, put it in a place where Java can get to it.
If you're not getting the network throughput you need in Java, then you're going to need to write a C wrapper in order to access it.
Have you benchmarked your code to find where your performance issues really are? If you let us know, we can likely help you out without resorting to JNI.
