I am designing an architecture for a simple cloud based app that compiles and runs java programs.
I am planning to expose services via SOAP through which a client can provide source to the server.
The server would respond with log messages, such as compilation failures or success messages: basically the console output.
Questions
From an architecture point of view what are the security considerations that I should be aware of and what would be the right way to do it?
What other validations should I put in place before compiling the code, and what else needs to be taken care of when implementing this in the cloud?
Are there any open source APIs that might already take care of above things?
I have come across javax.tools.JavaCompiler and other utilities that should do the job.
Code compilation isn't too dangerous because the only program that is running is the compiler. What you should worry about is the actual running of their compiled Java program.
If you haven't already, look into SecurityManager. Basically, you'll want to make your own class extending SecurityManager and then override its check methods (checkExit, checkExec and so on). If the program shouldn't be able to do something, you throw a SecurityException. That's pretty much the easiest way to sandbox something in Java. Once you do that, you simply call System.setSecurityManager(mySecurityManager);
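A minimal sketch of that idea, under assumptions: the class name SandboxSecurityManager and the particular operations it denies are illustrative only, and a real sandbox would have to cover many more checks. Also note that SecurityManager has been deprecated for removal since Java 17 (JEP 411), and newer JDKs restrict or disable installing one, so this applies mainly to older runtimes.

```java
import java.security.Permission;

// Illustrative sandbox policy; the class name and the denied operations are assumptions.
@SuppressWarnings("removal") // SecurityManager is deprecated for removal since Java 17
class SandboxSecurityManager extends SecurityManager {

    @Override
    public void checkPermission(Permission perm) {
        // Never let sandboxed code swap this manager for a weaker one.
        if (perm instanceof RuntimePermission && "setSecurityManager".equals(perm.getName())) {
            throw new SecurityException("Replacing the security manager is not allowed");
        }
        // Anything not vetoed by a specific check below is allowed in this sketch.
    }

    @Override
    public void checkExit(int status) {
        throw new SecurityException("System.exit(" + status + ") is not allowed");
    }

    @Override
    public void checkExec(String cmd) {
        throw new SecurityException("Starting processes is not allowed: " + cmd);
    }

    @Override
    public void checkConnect(String host, int port) {
        throw new SecurityException("Network access is not allowed: " + host + ":" + port);
    }

    @Override
    public void checkWrite(String file) {
        throw new SecurityException("Writing files is not allowed: " + file);
    }
}
```

On a runtime that still supports it, you would install this with System.setSecurityManager(new SandboxSecurityManager()) before loading the submitted class. Keep in mind that the checks then apply to every piece of code in that JVM, including your own server code.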
Related
I'm interested in using JMX to monitor/configure a simple Java Client/Server application. For example, we would capture any network exceptions that occur in a Java program.
Can MBeans be extended in this way? Or are they limited to more concrete get & set functions?
So far, I've looked in Notifications and Monitor MBeans
Thanks
Well, I would say that it's definitely doable. I used JMX with custom MBeans in an Apache Wicket application a while ago. An MBean is just a wrapper around some logic in your server application, so you can take the data directly from your application.
If you want an example of how this is done in a working application, you might want to check out this:
https://github.com/apache/wicket/blob/master/wicket-jmx/src/main/java/org/apache/wicket/jmx/wrapper/MarkupSettings.java
The class basically holds a reference to the application and asks for data directly from the server app.
When the server starts up, it registers all the MBeans through an initializer class:
https://github.com/apache/wicket/blob/master/wicket-jmx/src/main/java/org/apache/wicket/jmx/Initializer.java
Then, every time you look in your MBean server, you will see up-to-date information coming directly from the app.
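To make the wrapper idea concrete, here is a minimal sketch of a standard MBean that counts network exceptions. It is not taken from Wicket: the names NetworkMonitoring, NetworkErrors and the object name used in the usage note are hypothetical, and it assumes your application code calls record() from its own catch blocks.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

// Hypothetical names throughout; only the javax.management calls are real API.
class NetworkMonitoring {

    /** Management interface. A standard MBean pairs class X with a public interface XMBean. */
    public interface NetworkErrorsMBean {
        long getErrorCount();   // exposed as attribute "ErrorCount"
        String getLastError();  // exposed as attribute "LastError"
        void reset();           // exposed as an operation
    }

    public static class NetworkErrors implements NetworkErrorsMBean {
        private final AtomicLong count = new AtomicLong();
        private volatile String lastError = "";

        /** Called by application code whenever it catches a network exception. */
        public void record(Exception e) {
            count.incrementAndGet();
            lastError = e.toString();
        }

        @Override public long getErrorCount() { return count.get(); }
        @Override public String getLastError() { return lastError; }
        @Override public void reset() { count.set(0); lastError = ""; }
    }

    /** Registers a fresh bean with the platform MBean server, e.g. at server start-up. */
    static NetworkErrors register(String objectName) {
        try {
            NetworkErrors bean = new NetworkErrors();
            ManagementFactory.getPlatformMBeanServer().registerMBean(bean, new ObjectName(objectName));
            return bean;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register MBean " + objectName, e);
        }
    }

    /** Reads an attribute the way a management client would, through the MBean server. */
    static Object readAttribute(String objectName, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Could not read " + attribute, e);
        }
    }
}
```

After NetworkMonitoring.register("example.app:type=NetworkErrors"), the bean should show up in a JMX console such as JConsole, and its ErrorCount attribute reflects whatever the application has recorded so far. So MBeans are not limited to plain getters and setters: the reset() method is exposed as an operation.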
There are some caveats though. One caveat is that Java in general doesn't provide any good abstraction to capture all Exceptions of a given type coming from any source of the application. You can register your catch-all exception handler but as far as I can remember it doesn't work perfectly.
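For reference, this is roughly what such a catch-all handler looks like, as a small sketch with hypothetical names (CatchAllHandler, seen, runAndWait). It also shows one likely reason it falls short: the handler only sees exceptions that escape a thread, never ones that some code has already caught.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper; only Thread.setDefaultUncaughtExceptionHandler is real API.
class CatchAllHandler {
    /** Exceptions that escaped a thread without being caught anywhere. */
    static final List<Throwable> seen = new CopyOnWriteArrayList<>();

    /** Installs a JVM-wide handler for exceptions that no code caught. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> seen.add(error));
    }

    /** Runs the task on its own thread and waits for that thread to finish. */
    static void runAndWait(Runnable task) {
        Thread worker = new Thread(task, "worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If a library catches an IOException and merely logs it, the handler never hears about it, which is why I ended up with the aspect-based approach.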
When I had to do something like this, I used AspectJ to register a catch-all place to handle exceptions. I used compile-time weaving to reduce the performance impact, but I am not sure how much it affects overall performance (if it affects it at all).
¯\_(ツ)_/¯
The other caveat is that JMX connections are usually difficult to set up in an enterprise environment. If you have to log in through two hops just to reach the production servers because there are firewalls everywhere, then your monitoring connection will definitely fail, and you need to keep buying beer for your sysadmin and convince your manager that this does not impose any security risk. :)
There is one thing though. You say
to monitor/configure a simple Java Client/Server application
You want to configure / monitor the clients as well? I've never done that. I am not sure that's even possible.
Though this is a question with a broader scope, I want to write an online code test for my company, where people can be given questions to answer by writing code in Java/PHP/C etc., and the code is compiled and run online. I have seen this happening on sites like Codecademy, Udacity etc. I just want to understand the architecture behind it. I have searched a lot along similar lines on Google but couldn't find a concrete answer. After reading bits and pieces here and there, I understood that the code is sent to a compiler on the server and then the results are sent back. I am not sure how exactly that happens. Can somebody point me to a starting point?
What you can basically have, according to an MVC pattern applied to a web architecture, is something like this:
A web application client-side, which allows the user to insert some code,
possibly leveraging Javascript for early syntactic check
A server endpoint,
receiving the inserted code as input from the client
The sequence of operations could be:
Server-side, the input is transformed into the appropriate structure for the target programming language, e.g. a Java class or a C module.
Possibly, more context is defined (e.g. a classpath).
Then, if the language is compiled, the compiler is invoked (e.g. javac or gcc). This can happen in several ways, e.g. exec in C or Runtime.getRuntime().exec in Java. Otherwise the code can be deployed on a server or some simulators can be run and passed the code.
Subsequently, the code is executed and its output is intercepted (e.g. by redirecting the console output to a file or just leveraging the target language infrastructure, like in this example). The execution can happen through the command line (e.g. java) or via other tools (e.g. curl for running deployed PHP code as if it were a client browser accessing it).
Last step for the server is to send back the intercepted output to the client in a readable format, e.g. HTML. As an alternative, if you used Java, you could go for Applet, which doesn't change the basic architecture.
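The execution step above can be sketched as follows. This is only an illustration under assumptions: ProcessRunner and its run method are hypothetical names, the timeout is an arbitrary choice, and the console output is captured by redirecting it to a temporary file as described.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a command, captures stdout and stderr, enforces a time limit.
class ProcessRunner {

    static String run(List<String> command, long timeoutSeconds) {
        File log = null;
        try {
            log = File.createTempFile("run-output", ".log");
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)   // merge stderr into stdout
                    .redirectOutput(log)         // console output goes to the file
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();       // kill runaway programs
                process.waitFor();
                return "TIMEOUT after " + timeoutSeconds + "s\n" + read(log);
            }
            return read(log);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the process", e);
        } finally {
            if (log != null) {
                log.delete();
            }
        }
    }

    private static String read(File file) throws IOException {
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }
}
```

For example, ProcessRunner.run(Arrays.asList("java", "-cp", classesDir, "Main"), 10) would run a previously compiled class (classesDir being wherever the compiler wrote its output) and return whatever it printed, ready to be formatted and sent back to the client.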
However, more generally, the point is that compilers and interpreters are base software. They are not intended for general users, who can easily live with the operating system only. Therefore, "online compiling", to the best of my knowledge, is something different from "posting code, letting it execute on a server, and visualizing the answer". Online compiling would mean distributing the responsibility of compiling across the network, which does make sense, but, in my opinion, it is not meant to be used for demonstrative purposes (like the one you are mentioning).
I used domjudge for my company and customized it for my need.
Its PHP code is very well written. It is very modular and simple to adapt to your requirements.
I am curious about what automatic methods may be used to determine if a Java app running on a Windows PC is malware. (I don't really even know what exploits are available to such an app. Is there someplace I can learn about the risks?) If I have the source code, are there specific packages or classes that could be used more harmfully than others? Perhaps they could suggest malware?
Update: Thanks for the replies. I was interested in knowing if this would be possible, and it basically sounds totally infeasible. Good to know.
If it's not even possible to automatically determine whether a program terminates, I don't think you'll get much leverage in automatically determining whether an app does "naughty stuff".
Part of the problem of course is defining what constitutes malware, but the majority is simply that deducing proofs about the behaviour of other programs is surprisingly difficult/impossible. You may have some luck spotting particular patterns, but on the whole you can't be confident (and I suspect it's provably impossible) that you've caught all possible attack vectors.
And in the general sphere, catching 95% of vectors isn't really worthwhile when the attackers simply concentrate on the remaining 5%.
Well, there's always the fundamental philosophical question: what is a malware? It's code that was intended to do damage, or at least code that doesn't do what it claims to. How do you plan to judge intent based on libraries it uses?
Having said that, if you at least roughly know what the program is supposed to do, you can indeed find suspicious packages, things the program wouldn't normally need to access. Like network connections when the program is meant to run as a desktop app. But then the network connection could just be part of an autoupdate feature. (Is autoupdate itself a malware? Sometimes it feels like it is.)
Another indicator is if a program that ostensibly doesn't need any special privileges, refuses to run in a sandbox. And the biggest threat is if it tries to load a native library when it shouldn't need one.
But all these only make sense if you know what the code is supposed to do. An antivirus package might use very similar techniques to viruses, the only difference is what's on the label.
Here is a general outline for how you can bound the possible actions your Java application can take. Basically you are testing to see if the Java application is 'inert' (can't take harmful actions) and thus is probably not malware.
This won't necessarily tell you whether it is malware or not, as others have pointed out. The app could still do annoying things like pop-up windows. Perhaps the best indication is to see if the application is digitally signed by an author you trust; if not -- be afraid.
You can disassemble the class files to determine which Java APIs the application uses; you are looking for points where the java app uses the OS. Since java uses a virtual machine, there are well defined points where a java application could take potentially harmful actions -- these are the 'gateways' to various OS calls (for example opening a socket or reading a file).
It's difficult to enumerate all the APIs; different functions which execute the same OS action should require the same Permission. But Java's docs don't provide an exhaustive list.
Does the Java app use any native libraries -- if so, it's a big red flag.
The JVM does not offer the ability to run arbitrary code, or use native system APIs; in particular it does not offer the ability to modify the registry (a typical action of PC malware). The only way a Java application can do this is via native libraries. Typically there is no need for a normal application written in Java to use native code (unless it needs to use devices).
Check for System.loadLibrary() or System.load() or Runtime.loadLibrary() or Runtime.load(). This is how the VM loads native libraries.
Does it use the network or file system?
Look for use of java.io, java.net.
Does it make system calls (via Runtime.exec())
You can check for the use of java.lang.Runtime.exec() or ProcessBuilder.start().
Does it try to control the keyboard / mouse?
You could also run the application in a restricted policy JVM (the instructions/tools for doing this are not as simple as they should be) and see what fails (see Oracle's security tutorial) -- note that disassembly is the only way to be sure, just because the app doesn't do anything harmful once, doesn't mean it won't in the future.
This definitely is not easy, and I was surprised to find how many places one needs to look at (for example several java functions load native libraries, not just one).
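As a very rough illustration of the checks listed above, here is a sketch that looks for suspicious API names in the raw bytes of a class file, relying on the fact that referenced class and method names are stored as strings in the constant pool. Everything in it is an assumption on my part: the class name ApiUsageScanner, the marker list (java/awt/Robot for keyboard and mouse control is my own addition), and the plain substring search, which reflection or obfuscation can defeat. A proper tool would parse the class file, for example with javap or a bytecode library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scanner: crude substring search over class-file bytes.
class ApiUsageScanner {

    // Names as they appear in a class file's constant pool (slash-separated).
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("loadLibrary", "native library loading");
        MARKERS.put("java/lang/Runtime", "Runtime (exec, load, loadLibrary)");
        MARKERS.put("java/lang/ProcessBuilder", "process creation");
        MARKERS.put("java/net/", "network access");
        MARKERS.put("java/io/File", "file system access");
        MARKERS.put("java/awt/Robot", "keyboard/mouse control");
    }

    /** Returns a description for every marker found in the given class-file bytes. */
    static List<String> scan(byte[] classBytes) {
        // ISO-8859-1 maps each byte to one char, so ASCII names survive unchanged.
        String text = new String(classBytes, StandardCharsets.ISO_8859_1);
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, String> marker : MARKERS.entrySet()) {
            if (text.contains(marker.getKey())) {
                findings.add(marker.getValue());
            }
        }
        return findings;
    }
}
```

You would feed it something like Files.readAllBytes(Paths.get("SomeClass.class")) for each class extracted from the jar. An empty result does not prove the class is harmless; a non-empty one just tells you where to look more closely.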
can I call Java from Node.js via JNI? Are there any examples?
You should try the node-java npm module which is a well-written wrapper over JNI.
node-java doesn't appear to (yet) have broad adoption, but playing with it, I've been impressed with how straightforward and robust it has been.
It's as simple as:
var java = require("java"); // the node-java module, installed with: npm install java
var list = java.newInstanceSync("java.util.ArrayList");
list.addSync("item1");
list.addSync("item2");
console.log(list.getSync(1)); // prints "item2"
You can do just about anything with your embedded JVM - create objects, call methods, access fields, etc.
There is a slight impedance mismatch between Node and Java, so if you are going to interact with something complicated, I'd recommend writing most of your interactions in Java and exposing a simpler interface across the Node/Java barrier. It just makes for easier debugging that way.
--- Dave
p.s., RealWorldUseCase(tm): I worked at a place that had a pretty complex (and spaghetti-coded) protocol between multiple browser clients and a Java-based service. I wrote a pretty sweet test-harness which used jsdom to host N simulated browsers and used node-java as a wrapper around the Java service code. It was trivial to shim out the transport interfaces, both in JS for the clients, and in Java for the service, so whenever any of these things sends a message, I capture that and stick it in a queue for probabilistic delivery to the intended target (ie, I virtualized the network). In this way, I could run a full-on simulation of multiple clients interacting with and through a Java service, and run the whole thing inside a single process without any wire communication. And then I could do fun stuff like deliberately reorder message deliveries to make sure the code was resilient to timing bugs. And when a bug was discovered, I had the message orderings logged and could reproduce them to repro the bug. Oh, and the whole thing set up and ran a pretty complex scenario with a few thousand lines of logging and finished in under 1 second per run. 2-weeks well spent. Fun stuff.
RealWorld Use Case #2: selenium-inproc - a module that wraps the SeleniumRC JAR file providing a node interface to browser automation testing w/ Selenium without having to run yet another localhost service.
That looks tricky. Node.JS runs on the Google Chrome JavaScript engine V8. What you will have to do is to create a V8 C++ binding (v8 c++ Crash Course shows an example) that starts a JVM and does all the JNI handling.
I think you might be better off letting a Java server and Node.js communicate via the network (someone wrote an example of using RabbitMQ for message-based communication between Java and Node.js). Here, JSON would be a great data exchange format (in Node you can simply JSON.parse() what the Java server sends, which is safer than eval() even if you trust the server).
I think what you are looking for is a native extension to use as a bridge. Although I don't have an example of what you are saying, I do have an example on how to call a C++ extension using Node.JS
https://github.com/jrgleason/NodeJSArduinoLEDController
I am not aware of all the details of Node.js, but I am assuming that by JNI you actually mean the Java Native Interface. One can only use JNI from Java, so IMHO it does not make sense to access Java from JNI if you are not already in Java.
It would appear that this is the wrong approach, and you need to search the Node.js doco for their integration chapter...
I wonder if it is possible at all. But even if it is possible, I imagine it is hard to implement, and I am certain that nobody has done that yet.
How about using a named pipe to communicate between the processes (Java and Node.js)?
I'm looking to build a web service that can compile some entered code (probably C/Java) and can run some tests on it. What kind of design should I follow? What compiler can I place on my server to do the job? Recommendations? Pros? Cons?
Kattis uses GCC and the Sun java compiler to compile C/C++/Java. What platforms you intend to support will of course determine what compilers you can use. I think it'll be easier for you if you just go with multiple compilers instead of trying to find one that can compile every language you want to support.
One of the biggest problems will probably be to prevent the submitted code from taking over your host. Java contains built in support for limiting what classes a program can use, but I'm not sure how one would prevent things like forking and creating sockets in C/C++.
You will probably want something like the Go Playground
For Java, see the JavaCompiler.
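A minimal sketch of driving javax.tools.JavaCompiler from a string of source code, collecting the compiler messages so they can be returned to the caller. The names InMemoryCompile, StringSource and Result are hypothetical, and writing the class files to a temporary directory is just one possible choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical wrapper around the standard javax.tools API.
class InMemoryCompile {

    /** A source "file" whose content lives in a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Outcome of one compilation: success flag plus the compiler's messages. */
    static final class Result {
        final boolean success;
        final List<String> messages;

        Result(boolean success, List<String> messages) {
            this.success = success;
            this.messages = messages;
        }
    }

    static Result compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        try {
            Path outDir = Files.createTempDirectory("compiled"); // class files end up here
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            JavaCompiler.CompilationTask task = compiler.getTask(
                    null, null, diagnostics,
                    Arrays.asList("-d", outDir.toString()), null,
                    Collections.singletonList(new StringSource(className, code)));
            boolean success = task.call();
            List<String> messages = new ArrayList<>();
            for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
                messages.add(d.getKind() + " at line " + d.getLineNumber() + ": "
                        + d.getMessage(Locale.ENGLISH));
            }
            return new Result(success, messages);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling InMemoryCompile.compile("Hello", source) gives you a success flag and the diagnostics as text, which is essentially the "console output" a web service would hand back.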
I provide a little tool called the SSCCE Text Based Compiler that can do this on the client side, and as the docs note, it requires a Java SDK, not just a JRE.
Pros:
Server-side compilation & running of code sounds funky!
Cons:
A long time ago I also provided a tool to compile code (but not run it) on one of my domains. It turned out that particular types of code could tie the Sun compiler up in knots that would require more than 30 minutes to compile less than 100 lines of code! Denial of Service attack, anyone? Since I did not have the time to implement a solution, I withdrew the tool.
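One possible mitigation for that kind of problem, which I never implemented myself, is to put a time limit on each compilation. A rough sketch with hypothetical names (TimeLimited, call) follows. The big assumption is that the work reacts to thread interruption; a compiler stuck in a tight loop may not, in which case running it in a separate process that can be killed is the more robust route.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a task and gives up waiting after a deadline.
class TimeLimited {

    /** Returns the task's result, or an empty Optional if it did not finish in time. */
    static <T> Optional<T> call(Callable<T> work, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "time-limited-work");
            thread.setDaemon(true); // a stuck task must not keep the JVM alive
            return thread;
        });
        Future<T> future = executor.submit(work);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; only helps if the task honours it
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A compile request would then be wrapped as TimeLimited.call(() -> compileSomehow(source), 10_000), where compileSomehow stands for whatever compile call you use, and an empty result is reported to the user as "compilation took too long".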
For running the code, you will almost certainly need to implement a comprehensive SecurityManager.
The simplest thing to get going is a web container (like Tomcat or Jetty) where the users are allowed to upload their own JSP-pages.
These are automatically compiled by the web container, and executed, when requested.