Limiting memory usage of a Java/Rhino/Nashorn object

I'm extending a server application written in Java to allow user-defined callbacks (written in Javascript) to be run in response to requests. I've done some reading, and while it seems possible to disable Java classes in Nashorn, there is nothing stopping a user from creating Javascript code that allocates an enormous array without using any Java APIs. I'm wondering if there is any way to restrict this, either proactively or reactively.
The solution I came up with is to have a process pool of JVMs with small max heap sizes, which are responsible for running the user-defined code. There will be a worker pool manager to spawn new processes when needed. This way, the main process, as well as other user-defined code, will not be affected by a single malicious user. While this solution will likely work, it seems heavy-handed. Is there no better solution for preventing malicious users from using too much memory?
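For illustration, a minimal sketch of the launcher side of that approach, assuming a hypothetical ScriptWorker main class on the classpath; each worker JVM gets a small max heap, so a runaway allocation kills only the child process:

    import java.io.File;
    import java.io.IOException;

    public class WorkerLauncher {
        // Launch a worker JVM with a capped heap; an OutOfMemoryError in
        // the user-defined code then takes down only this child process.
        public static Process launchWorker() throws IOException {
            String javaBin = new File(System.getProperty("java.home"), "bin/java").getPath();
            ProcessBuilder pb = new ProcessBuilder(
                    javaBin,
                    "-Xmx32m",                  // cap the worker's heap
                    "-XX:MaxMetaspaceSize=32m", // cap class metadata as well
                    "-cp", System.getProperty("java.class.path"),
                    "ScriptWorker");            // hypothetical worker entry point
            pb.redirectErrorStream(true);
            return pb.start();
        }
    }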
I'm not particularly set on Javascript, so if there exists any other scripting language that can be run within a JVM and also has support for memory usage limits, I would be open to using it instead of Nashorn. Unfortunately, it seems like Jython, JRuby, and LuaJava all don't have what I'm looking for. Thanks in advance.

Related

Use JNI library in Servlet container

I'm working on a web application, but I need to call certain proprietary C++ library functions. As I understand it, the native methods are not thread-safe, so an access violation in native code could crash the application server JVM (Tomcat). This native API is a very small part of the overall web application functionality; I would say only 5% of users will ever access it. No matter how thoroughly the application is tested (I don't have access to the native source code), there is a risk that a bug in the native library could bring down the whole application server, logging out users and potentially causing downtime.
So the question - which strategy is better?
1) Should I wrap the native library in a separate process so that the main web server is not impacted by a bug in the native code? I could use UNIX sockets to communicate with this separate process from my web server (avoiding the overhead of a TCP socket). If it crashes, fix the problem as quickly as possible and accept downtime for the 5% of users.
Or
2) Bite the bullet and continue to use JNI in the servlet container (with the risk of potential downtime for everyone)?
Regards,
Rohit
It depends:
Take into account that if a function is not thread-safe, that does not necessarily mean it will crash when called from multiple threads. It might simply return completely wrong results.
If your application cannot overcome that somehow, then you have no other option: you need to serialize access to the native code.
If you are sure that the only side-effect of calling the non-thread-safe function is that it can crash, then you need to make sure that the crash does not result in other types of errors, like inconsistent data in your application's back-end (database corruption, etc.). (You may use transactions to prevent this.)
If your application is able to overcome all of the above, then a 3rd piece of information is still needed:
You need to study how much downtime and how many crashes your users will tolerate. If they tolerate the possible downtime, then go ahead and do not worry about the crashes; you can safely "bite the bullet", because it won't harm your users or your application.
In all other cases you have to serialize access to the native functions.
Wrapping them in a separate process might be a good idea, but you have to make sure that the function(s) can be run in ONLY one thread at a time. So you probably need to implement some mechanism to make the other threads/servlets wait until one of them has finished calling the function(s).
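A minimal sketch of one way to do that, using a single-threaded executor as the gate (nativeCall stands in for the actual JNI binding): every servlet thread submits to the same queue, so at most one call is ever inside the native code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class NativeGate {
        // A single-threaded executor guarantees the native function is
        // never entered by two threads at once.
        private static final ExecutorService GATE = Executors.newSingleThreadExecutor();

        public static String callNative(String input) throws Exception {
            Future<String> result = GATE.submit(() -> nativeCall(input));
            return result.get(); // each caller blocks until its turn completes
        }

        private static native String nativeCall(String input); // placeholder JNI binding
    }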

Limit resource utilization of JNA calls without changing dll

How can you prevent a JNA method call from exceeding thresholds for CPU utilization, thread count, and memory use?
Background:
I'm working on a safety-critical application, and one of the non-safety-critical features requires the use of a library written in C. The DLLs have been given to me as a black box, and there's no chance I'll get access to the source code beyond the Java interface files. Is there a way to limit the CPU usage, thread count, and memory used by the JNA code?
See ulimit and sysctl, which apply to your overall JVM process (or any other process, for that matter).
It's not readily possible, though, to segment the parts of your JVM that make native accesses via JNA from those that don't.
You should run some profiling while you exercise the shared library to figure out what resources it actually uses, so you can focus on setting limits around those (lsof or strace would do the job on Linux; I'm not sure of the equivalent on Windows).
For most operating systems you must call your C code from either a new thread or a new process. I would recommend calling it from a new process, as you can then sandbox it more easily and more deeply. Typically on a Unix-like system one switches to a new user set aside for the service, with user resource limits applied to it. On Linux, however, one can use user namespaces and cgroups for more dynamic and flexible sandboxing. On Microsoft Windows one typically uses Job objects for resource sandboxing, but permission-based sandboxing is more complicated (much of Windows is easily sandboxable with access controls, but the GUI and window-messaging parts make things complicated and annoying).
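On a Unix-like system, a hedged sketch of that separate-process route: wrap the helper in a shell command that applies ulimit before exec, so the limits bind the child process only (./native-helper is a hypothetical executable that loads the C library):

    import java.io.IOException;

    public class LimitedHelper {
        // Apply per-process resource limits via the shell, then exec the
        // helper so the limits apply to it and nothing else.
        public static Process start() throws IOException {
            String cmd = "ulimit -v 262144; ulimit -u 64; exec ./native-helper";
            // -v 262144: ~256 MB of virtual memory; -u 64: at most 64 processes/threads
            return new ProcessBuilder("sh", "-c", cmd)
                    .redirectErrorStream(true)
                    .start();
        }
    }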

How to make full use of multiple processors?

I am doing web crawling on a server with 32 virtual processors using Java. How can I make full use of these processors? I've seen some suggestions on multi-threaded programming, but I wonder how that could ensure all processors are taken advantage of, since we can do multi-threaded programming on a single-processor machine as well.
There is no simple answer to this ... except the way to ensure all processors are used is to use multi-threading the right way. (Note: that is a circular answer!)
Basically, the way to get effective use of multiple processors is to:
ensure that there is work that can be done in parallel, and
reduce / eliminate contention points that force one thread to wait while another thread does something.
This is difficult enough when you are doing simple computation. For a web crawler, you have the additional problems that the threads will be competing for network and (possibly) remote-server bandwidth, and that they will typically be attempting to put their results into a shared data structure or database.
That's about all that can be said at this level of generality ...
And as @veer correctly points out, you can't "ensure" it.
... but using a load of threads will surely be quicker wall-time-wise because all the miserable network latency will happen in parallel ...
Actually, if you go overboard, a load of threads can reduce throughput because of contention. Just throwing lots of threads at the problem is rarely a good idea.
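For instance, a sketch of the usual middle ground: a bounded, fixed-size pool, sized here from the processor count as a starting point to tune rather than a rule, so downloads overlap without thousands of threads contending (fetch is a placeholder for the actual HTTP work):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class CrawlerPool {
        public static void crawl(List<String> urls) throws InterruptedException {
            // I/O-bound tasks tolerate more threads than cores; 4x is only
            // a starting point to measure and tune.
            int threads = Runtime.getRuntime().availableProcessors() * 4;
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (String url : urls) {
                pool.submit(() -> fetch(url)); // network latency overlaps across tasks
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS); // wait for outstanding fetches
        }

        private static void fetch(String url) {
            // placeholder for the actual HTTP fetch and parse
        }
    }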
A computer or a program is only as fast as the slowest link in its processing chain. Just increasing CPU capacity is not going to guarantee a dramatic performance gain. Leaving aside other issues like your cache size, RAM, etc., there are two basic approaches to your question of how to take advantage of all your processors:
[1] Use JIT (just-in-time) compiler/interpreter technology such as Java/.NET. I don't know much about Java, but the .NET JIT is definitely designed to take advantage of all the available processors on the machine. In fact, this very feature makes a JIT stand out against static-language compilers like C/C++: because the JIT "knows" it is sitting on 32 processors, it is in a much better position to take advantage of them than a program statically compiled on some other machine (provided you have written robust multi-threaded code for it!).
[2] Program in C/C++. This is the classic approach. If you compile your code on the same machine with 32 CPUs, and take proper care in your program with memory management, pointer handling, etc., the C/C++ program can be highly optimized and may perform better than its CLR/JVM counterpart (as it runs without the extra overhead of a garbage collector or a VM).
But keep in mind that writing robust code is much easier in .NET/Java than in C/C++. So, if you are not a "hard-core" programmer, I would suggest going with the former approach. Also remember to handle your multiple threads with care, such as locking variables when multiple threads try to change the same ones; a short sketch follows below. However, be aware that excessive locking can also make your code hang or deadlock.
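The sketch mentioned above; note that java.util.concurrent.atomic often avoids explicit locks for simple shared state entirely:

    import java.util.concurrent.atomic.AtomicInteger;

    public class Counters {
        private int plainCount;                                        // needs a lock
        private final AtomicInteger atomicCount = new AtomicInteger(); // lock-free

        public synchronized void incrementLocked() {
            plainCount++; // safe only because every access is synchronized
        }

        public void incrementAtomic() {
            atomicCount.incrementAndGet(); // safe without any explicit lock
        }
    }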
Processor management is handled natively by the virtual machine you are using, i.e., the JVM. You can have a look at Java HotSpot VM Options to tune your setup if you are using the Java HotSpot VM. If you are using a third-party VM, your vendor may help you tune it for your requirements.
In practice, application performance comes down to your own design.
If you would like to monitor your threads and memory usage to optimize your application, you can use any of the VM monitoring tools available today. The Java Virtual Machine (JVM) has built-in instrumentation that enables you to monitor and manage it using JMX.
For details, see Platform Monitoring and Management Using JMX. For third-party VMs you will have to contact the vendor, I guess.
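As a small illustration of that built-in instrumentation, the java.lang.management beans report processor, thread, and heap figures from inside the process, no agent required:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.ThreadMXBean;

    public class VmStats {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            System.out.println("Processors:   " + Runtime.getRuntime().availableProcessors());
            System.out.println("Live threads: " + threads.getThreadCount());
            System.out.println("Heap used:    " + memory.getHeapMemoryUsage().getUsed());
        }
    }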

Direct memory access to the network card in java

Some modern network cards support Direct Memory Access for improved performance. How can I utilize this feature from Java?
Does the JVM provide this automatically, or do I need to do an allocateDirect on the ByteBuffers that I am using to talk to that NIC?
Does anyone have documentation that discusses this?
It is the operating system's task to use the DMA feature of the network card. The JVM does not really care how the OS does it; it simply uses the operating system's functions for talking to "network interfaces".
You cannot do this from inside Java in the typical desktop/server JVMs, as this is operating-system territory that requires you to reach into C code. Have a look at JNI or JNA to see how to do this. Please note that this may make your application brittle if you do not get it exactly right.
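On the allocateDirect part of the question: a direct buffer at least lets the OS fill memory outside the Java heap, saving one copy, which is as close as plain NIO gets to the hardware. A minimal sketch (example.com stands in for a real endpoint):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    public class DirectRead {
        public static void main(String[] args) throws Exception {
            // A direct buffer lives outside the Java heap, so the OS can
            // fill it without an extra copy into a Java byte[].
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("example.com", 80))) {
                ch.write(ByteBuffer.wrap("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n".getBytes()));
                while (ch.read(buf) != -1) {
                    buf.clear(); // discard; a real program would process the bytes first
                }
            }
        }
    }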
Yeah - ankon's answer is right. Java operates in a sandbox: a virtual machine (hence the "VM" in JVM; Sun actually built ONE physical version -- it's on display somewhere).
Java was never designed (intentionally) to reach outside the sandbox, unlike ActiveX, which can go just about anywhere on a PC.
Just think of all the bad things ActiveX has done over the years via a browser. You wouldn't want that to happen with Java, would you?
Although...
you might be able to instantiate an object in Java that does have access to the hardware (like one of those ActiveX controls, or some DLL, for example - which you'd have to write, too).
The problem I see is throughput. With 100Mbit or 1000Mbit cards, would a JVM (remember, this is a VM running on an OS, so you're a couple of layers removed from the hardware) have the speed to handle what's coming in under load? Would you want a Java program holding up data in your NIC while it tinkered with it (think of the impact on the rest of the system)?
At this point, you're probably better off writing the hard-working guts of your solution in C. And, if you still need Java to play with that data, put it in a place where Java can get to it.
If you're not getting the network throughput you need in Java, then you're going to need to write a C wrapper to access it.
Have you benchmarked your code to find where your performance issues really are? If you let us know, we can likely help you out without resorting to JNI.

Excessive memory allocation in Java Sandbox security

Under the Java security model it is possible to block most dangerous actions from untrusted classes, but the last time I checked (a few years ago now) it was still possible for untrusted code to perform a denial-of-service attack by continually allocating memory until the JVM dies with an OutOfMemoryError. Looking now, I can't see any improvement in the situation.
I have a requirement to run untrusted code from third parties inside a Java application, and I'd like to know if it is possible to somehow restrict the heap/stack space that a class or thread can allocate under the Java security model, thus preventing memory-allocation-based DoS attacks. I know about -Xss, but as I understand it that restricts all threads, most of which need no restriction.
I have also considered creating a container for the untrusted code that would run in its own JVM and communicate with the main app through sockets, or doing some static analysis on the untrusted code. However, both of these sound like more effort than I had hoped for, although if someone knows of a trick or an open-source library for this, I'm interested.
So, is there a way to restrict the amount of memory that a thread can allocate to itself, or some other way of preventing memory-allocation denial-of-service attacks in Java?
There is currently no way to do this with standard APIs in Java.
More people have been interested in this, and there is a JSR underway for it called the Resource Consumption Management API (JSR 284), which may be something to look into.
You will need to run the untrusted code in a separate process. Even then there may still be ways to mount a DoS; for instance, on old versions of Windows you could easily use up all GDI resources (I haven't tried recently, now that we have Swing).
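A hedged sketch of the child side of that separate process, assuming the untrusted class implements Runnable and the parent launches this JVM with a small -Xmx; a memory-allocation DoS then kills only this process:

    public class SandboxWorker {
        // Entry point for the sandbox JVM; the parent starts it with a
        // small -Xmx, so an OutOfMemoryError dies here, not in the main app.
        public static void main(String[] args) throws Exception {
            // Assumes the untrusted class name arrives as the first argument
            // and that the class implements Runnable.
            Class<?> untrusted = Class.forName(args[0]);
            Runnable task = (Runnable) untrusted.getDeclaredConstructor().newInstance();
            task.run();
        }
    }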
