How to evaluate user expressions in a sandbox


I want my app to evaluate an expression supplied by an untrusted user, which I'll be reading from a JSON file, such as:
value = "(getTime() == 60) AND isFoo('bar')"
I've found many threads about this here on Stack Overflow. They usually recommend Java's own ScriptEngine class, which can run JavaScript, or an existing library such as JEXL, MVEL, or any other from this list:
http://java-source.net/open-source/expression-languages
But they all seem to rely on a trusted user (e.g. a configuration file you write yourself and want to do some scripting in). In my case, I want the expression evaluation to run in a secure sandbox, so that the user cannot do something as simple as:
value = "while(true)" // or
value = "new java.io.File(\"R:/t.txt\").delete()" // this works on MVEL
and lock up my app or access resources they should not reach.
1) Can any of those existing libraries be easily configured to run in a sandbox? By 'easily', I mean a high-level configuration API that would be faster for me to use than writing my own expression evaluator. After doing a little research of my own, both JEXL and MVEL seem to be out.
2) Or is there an existing expression language that is so simple that it cannot be exploited by an untrusted user? All the ones I found are very complex and implement things like loops, import statements, etc. All I need is to parse math, logic operators, and my own defined variables and methods. Anything beyond that is outside my scope.
3) If the only solution is to write my own expression evaluator, where can I find some guidance on how to write a consistent security model? I'm new to this and have no idea what the common code injection tricks are, which is why I wanted to avoid having to write this on my own.

I would recommend embedding Rhino, which lets the user write JavaScript. It fits your criteria in (2) perfectly, being a Java library that enables you to run JavaScript (or call Java from JavaScript).
You set up a context, and the user only has access to what you put in the context or make accessible from it. The JavaScript expressions can be as simple as the simplest case you show above, or as complex as they need to be. Embedding Rhino and exposing a limited set of objects was a great way to enable all sorts of user scripting in a past project. That was some years ago, and Rhino is quite mature now.
You also have the advantage that, if your problem requires it, you may well be able to set it up so that the same expressions run happily on the client or the server side.
More information on embedding Rhino to accomplish what you need is at http://www.mozilla.org/rhino/tutorial.html#runScript
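For the narrower route raised in points (2) and (3) of the question, an evaluator that understands only literals, ==, AND, OR, parentheses and calls to functions you register yourself can be sketched in plain Java. This is a minimal sketch under stated assumptions, not a hardened implementation: the class name SafeExpr, the 500-character input limit and the sample functions getTime and isFoo are all made up for illustration. Because this grammar has no loops, no member access and no way to name a Java class, inputs such as while(true) or new java.io.File(...) fail at parse time instead of being executed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Tiny evaluator: integer and 'string' literals, ==, AND, OR, parentheses, whitelisted calls. */
final class SafeExpr {
    private static final int MAX_LENGTH = 500; // assumed limit, bounds parsing work and recursion depth
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(\\d+|'[^']*'|[A-Za-z_]\\w*|==|[(),])");

    private final List<String> tokens = new ArrayList<>();
    private final Map<String, Function<List<Object>, Object>> functions;
    private int pos = 0;

    private SafeExpr(String src, Map<String, Function<List<Object>, Object>> functions) {
        if (src.length() > MAX_LENGTH) throw new IllegalArgumentException("Expression too long");
        this.functions = functions;
        Matcher m = TOKEN.matcher(src);
        int end = 0;
        while (m.lookingAt()) {            // consume one token at a time from the current offset
            tokens.add(m.group(1));
            end = m.end();
            m.region(end, src.length());
        }
        if (!src.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("Unexpected input at offset " + end);
        }
    }

    /** Parses and evaluates src; only functions present in the map can be called. */
    static Object eval(String src, Map<String, Function<List<Object>, Object>> functions) {
        SafeExpr p = new SafeExpr(src, functions);
        Object result = p.or();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("Unexpected token: " + p.peek());
        return result;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private boolean accept(String t) {
        if (t.equals(peek())) { pos++; return true; }
        return false;
    }

    private void expect(String t) {
        if (!accept(t)) throw new IllegalArgumentException("Expected " + t + " but found " + peek());
    }

    // or := and ("OR" and)*   (both sides are always evaluated, no short-circuit)
    private Object or() {
        Object left = and();
        while (accept("OR")) { Object right = and(); left = truth(left) | truth(right); }
        return left;
    }

    // and := cmp ("AND" cmp)*
    private Object and() {
        Object left = cmp();
        while (accept("AND")) { Object right = cmp(); left = truth(left) & truth(right); }
        return left;
    }

    // cmp := primary ("==" primary)?
    private Object cmp() {
        Object left = primary();
        if (accept("==")) return Objects.equals(left, primary());
        return left;
    }

    // primary := "(" or ")" | INTEGER | 'string' | name "(" (or ("," or)*)? ")"
    private Object primary() {
        String t = peek();
        if (t == null) throw new IllegalArgumentException("Unexpected end of expression");
        pos++;
        if (t.equals("(")) { Object v = or(); expect(")"); return v; }
        if (Character.isDigit(t.charAt(0))) return Long.valueOf(t);
        if (t.charAt(0) == '\'') return t.substring(1, t.length() - 1);
        Function<List<Object>, Object> f = functions.get(t); // identifiers must be registered functions
        if (f == null) throw new IllegalArgumentException("Unknown function: " + t);
        expect("(");
        List<Object> args = new ArrayList<>();
        if (!accept(")")) {
            do { args.add(or()); } while (accept(","));
            expect(")");
        }
        return f.apply(args);
    }

    private static boolean truth(Object v) {
        if (v instanceof Boolean) return (Boolean) v;
        throw new IllegalArgumentException("Expected a boolean but got " + v);
    }

    public static void main(String[] args) {
        Map<String, Function<List<Object>, Object>> fns = new HashMap<>();
        fns.put("getTime", a -> 60L);                          // hypothetical host function
        fns.put("isFoo", a -> "bar".equals(a.get(0)));         // hypothetical host function
        System.out.println(eval("(getTime() == 60) AND isFoo('bar')", fns)); // prints true
    }
}
```

The design choice here is that safety comes from what the language cannot express rather than from blocking dangerous calls after the fact. The registered functions are still your responsibility: anything they do runs with your application's privileges.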

Related

When do new things get added to the Java API

I have been wondering about this for a while and I can't really find a clear answer. You see, the standard Java API is really big and includes a lot of different libraries and classes, from GUI design to sending data over the Internet to basic things like printing a String to the console.
It also includes things like reading MIDI and generating secure random Strings, which seem really specific. But at the same time there doesn't seem to be a standard JSON library, even though JSON is a universal way of sending data between systems.
So what I want to know is: When does something get added to the Java API? What does something need in order to be considered for addition to the API?

There is a "framework" that governs how new features get into Java and later show up as new language elements or libraries.
Enter ... the Java Community Process!
Meaning: this is a forum where people make suggestions, which then get discussed and at some point are either "added to Java somehow" or rejected.
And for starters: the JSON-P project for a JSON processing API was/is driven by the JCP; see JSR 374.
Finally: you are correct that not everything that shows up in the "standard library" should be there, whereas other important parts take way too long before people can agree on a proposal. And of course, there is also a long history of evolution.
So: if you could restart Java from scratch, you would organize things differently (and to a certain degree, that is what Java 9 is trying to enable with the new module concept).

How to implement Java ESAPI for preventing XSS?

I've read a lot of posts saying that ESAPI for Java can be used to prevent XSS by using Validator & Encoder. By the way, I am using Eclipse. I'm not using Maven or Spring.
My questions are:
1) How do I implement Java ESAPI to prevent XSS?
2) Are there other configurations needed aside from adding the ESAPI jar to the Build Path?
Thanks in advance for your answers.

Preventing XSS has some trickery to it. Validator lets you define which input characters to accept or reject. But there's also the concept of differing contexts, and that's where the Encoder class comes in. There will be instances in your career where you'll be forced to accept characters as input that can be used to attack a browser.
The basic ESAPI implementation is like this: reject input characters that are not on your whitelists, and use the Encoder implementations according to the output contexts. The tricky part comes in when deciding "Do I encode for HTML first, or for JavaScript first?" Either choice can have an impact on your application, so the decision has to be based on your application's needs. I've had applications that required users to input valid JavaScript, for example, and in those contexts you need to be very careful.
Answering your part 2: Yes. ESAPI, as you probably know by now, requires two properties files to be defined: validation.properties and ESAPI.properties. You can compile them into the jar file yourself (which would require you to learn Maven, so probably not...) or specify their locations to Java at runtime, using the standard -Dmy.property syntax. The loading exceptions actually guide you to the right path.
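To make the validate-then-encode split concrete, here is a minimal plain-Java sketch of the two steps. It deliberately does not use ESAPI's own Validator and Encoder classes, which do considerably more (canonicalization, several output contexts). The class name XssSketch, the whitelist pattern and the choice of the HTML body as the only context are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class XssSketch {
    // Assumed whitelist for a "name" field: letters, digits, space and a little punctuation.
    // The apostrophe is allowed on purpose (think O'Brien), which is why encoding is still needed.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9 .,'-]{1,50}");

    /** Input side: accept only values made up entirely of whitelisted characters. */
    static boolean isValidName(String input) {
        return input != null && SAFE_NAME.matcher(input).matches();
    }

    /** Output side: escape the characters that are significant in an HTML body context. */
    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("O'Brien"));           // prints true
        System.out.println(isValidName("<script>"));          // prints false
        System.out.println(encodeForHtml("<b>O'Brien</b>"));  // prints &lt;b&gt;O&#x27;Brien&lt;/b&gt;
    }
}
```

A value written into a JavaScript string or a URL would need a different encoder than the one above, which is the "which context first" decision the answer describes.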

JRuby DSL encapsulation, exclude standard library

I'm trying to make a Java program that allows users to do some limited scripting with a Ruby DSL that I've written. The script the user writes is saved to a Proc object in JRuby. The problem is that the user can still access methods that are standard to Ruby, such as File.new, create classes, or otherwise interfere with the internal logic of the program or the computer.
Is there a way to limit the user's script to only the constraints of the DSL, using JRuby or Ruby or even Java? Or at least to remove the user's access to certain classes?

Since you're running under JRuby, you can use a Java security policy (policy file documentation) to prevent users from being able to do things like file or network I/O. Of course, this will keep your code from having those capabilities, too! You can whitelist code by jar URI or by jar signature, so one tactic is to create a "hull" of trusted code that strongly validates its input, package it in its own jar, trust it, and use it exclusively for your own code. Doing this right gets complicated fast (have an extensive test suite!), but it can be done.
To have explicit control over the namespace available to your DSL, you can use BasicObject. It doesn't mix in Kernel or any of the other things available in the standard Ruby namespace. This doesn't give you security, though, because users can still use ::File directly or include ::Kernel to get it all back!

Beginner's guide to writing grammar

The application I am working on takes in a lot of data through file imports and updates database columns accordingly. I need to come up with a custom rule engine that validates all the input values and transforms the data accordingly. For example:
One of the fields in our application is Product Name. One of the rules we need to implement is to convert the product name from lower case to upper case if the input value from the file is in lower case. Similarly, there are many text/mathematical transformations that need to be done. For these reasons, we need to come up with a custom rule engine where we define the rules for each attribute, parse them, and then apply them.
I do know that ANTLR is one of the parser generators available for Java. I am seeking advice on the following questions:
1> General information on how a parser generator works and best practices for implementing a grammar.
2> Since I need to design this rule engine completely, right from the UI to the database design, can anyone point me to a sample rule engine that I can refer to? I am using GWT for the UI, Java for the core logic and Oracle for the database.
3> Are there any other parser generators available for Java?
4> Though I do want to follow the path of defining my own grammar and using a parser generator to build this rule engine, is there any other approach I should consider?

You might want to consider just using JBoss Rules (formerly Drools), which is a Java-based rules engine. Alternatively, a scripting engine may be another way to implement your rules (e.g. Mozilla Rhino, which is JavaScript in Java).
Writing your own in this situation seems like overkill, but it may allow you to provide better security guarantees if end users are going to be creating the rules / scripts.
EDIT to address questions in comments:
I suggest using an existing rules engine (such as JBoss Rules/Drools) instead of writing your own parser and grammar (for the rule component). Take a look here for instance: Drools.
For specialized logic that rules may need to use (db access or computation libraries) you should write a single Java API used by your rules (so that rules are not deeply accessing your other code since that can lead to bugs if/when you refactor). This advice applies regardless of which rules engine you use (your own or an existing one).
I assume that you already have the data format of your data input files solved and that you are only looking for a solution to the rule format and rule parsing.
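The advice about a single Java API for rules can be sketched as follows, using the upper-casing of Product Name from the question as the example rule. This is only an illustration of the shape, independent of any rules engine; the names RuleApi and AttributeRules and the attribute key productName are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical facade: the only application code that rules are allowed to call. */
final class RuleApi {
    private RuleApi() { }

    static String toUpper(String value) { return value.toUpperCase(Locale.ROOT); }

    static String trim(String value) { return value.trim(); }
}

/** Hypothetical registry that maps each attribute to the chain of rules defined for it. */
final class AttributeRules {
    private final Map<String, UnaryOperator<String>> rules = new LinkedHashMap<>();

    /** Adds a rule for an attribute; rules for the same attribute run in registration order. */
    AttributeRules register(String attribute, UnaryOperator<String> rule) {
        UnaryOperator<String> previous = rules.get(attribute);
        if (previous == null) {
            rules.put(attribute, rule);
        } else {
            rules.put(attribute, value -> rule.apply(previous.apply(value)));
        }
        return this;
    }

    /** Applies the attribute's rules, or returns the value unchanged if none are registered. */
    String apply(String attribute, String value) {
        UnaryOperator<String> rule = rules.get(attribute);
        return rule == null ? value : rule.apply(value);
    }

    public static void main(String[] args) {
        AttributeRules rules = new AttributeRules()
                .register("productName", RuleApi::trim)
                .register("productName", RuleApi::toUpper);
        System.out.println(rules.apply("productName", " widget ")); // prints WIDGET
    }
}
```

Whichever engine ends up evaluating the rules, keeping the callable surface in one class like RuleApi means a refactoring elsewhere in the application only has to keep that one class stable.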

There is JavaCC, which is a parser generator, and there is Groovy for evaluating rules. Whether you use a script engine depends on the grammar. If the rules can't be expressed in JavaScript, Java, Python, etc., and you want to write them in a new language, then you have to use a parser generator. But you can always do anything you want inside methods that you create and then call them from the rules. The rules will then be evaluated by the script engine.

Coding a parser for a domain specific language in Java

We want to design a simple domain-specific language for writing test scripts to automatically test an XML-based interface of one of our applications. A sample test would be:
Get an input XML file from network shared folder or subversion repository
Import the XML file using the interface
Check if the import result message was successful
Export the XML corresponding to the object that was just imported using the interface and check if it is correct.
If the domain-specific language can be declarative and its statements look as close as possible to my sentences in the sample above, that would be awesome, because people won't necessarily have to be programmers to understand/write/maintain the tests. Something like:
newObject = GET FILE "http://svn/repos/template1.xml"
responseMessage = IMPORT newObject
newObjectID = GET PROPERTY '/object/id/' FROM responseMessage
(..)
But then I'm not sure how to implement a simple parser for that language in Java. Back in school, 10 years ago, I coded a language parser for the C language using Lex and Yacc. Maybe an approach would be to use some equivalent for Java?
Or, I could give up the idea of having a declarative language and choose an XML-based language instead, which would possibly be easier to create a parser for? What approach would you recommend?

You could try JavaCC or Antlr for creating a parser for your domain specific language. If the editors of that file are not programmers, I would prefer this approach over XML.

Take a look at Xtext - it will take a grammar definition and generate a parser as well as a fully featured Eclipse editor plugin with syntax highlighting and checking.

ANTLR should suffice
ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages. ANTLR provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting.

Look at the ANTLR library. You'll have to describe your language in an EBNF grammar and then use ANTLR to generate Java classes from that grammar.

Have a look at how Cucumber defines its test cases: http://cukes.info/ (it can run in JRuby).

"Or, I could give up the idea of having a declarative language and choose an XML-based language instead, which would possibly be easier to create a parser for? What approach would you recommend?"
This could be easily done using XML to describe your test scenarios.
<GETFILE object="newObject" file="http://svn/repos/template1.xml"/>
Since your example syntax is quite simple, it should also be possible to simply use StringTokenizer to tokenize and parse this kind of script.
If you want to introduce more complex expressions or control structures, you are probably better off choosing ANTLR.
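As a rough illustration of the StringTokenizer route, the sketch below splits one script line of the form name = COMMAND args... into its parts. The class name ScriptLine and its fields are hypothetical, and the approach assumes that quoted arguments contain no whitespace, which holds for the sample lines in the question but is exactly the kind of limit that would push you towards ANTLR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** One parsed line of the test script, e.g. newObject = GET FILE "http://..." */
final class ScriptLine {
    final String target;      // variable that receives the result
    final String command;     // first keyword after '=', such as GET or IMPORT
    final List<String> args;  // remaining tokens with surrounding quotes removed

    private ScriptLine(String target, String command, List<String> args) {
        this.target = target;
        this.command = command;
        this.args = args;
    }

    /** Splits a line on whitespace and checks the basic "name = COMMAND ..." shape. */
    static ScriptLine parse(String line) {
        StringTokenizer st = new StringTokenizer(line);
        if (st.countTokens() < 3) {
            throw new IllegalArgumentException("Expected: name = COMMAND args... but got: " + line);
        }
        String target = st.nextToken();
        if (!"=".equals(st.nextToken())) {
            throw new IllegalArgumentException("Missing '=' in: " + line);
        }
        String command = st.nextToken();
        List<String> args = new ArrayList<>();
        while (st.hasMoreTokens()) {
            args.add(stripQuotes(st.nextToken()));
        }
        return new ScriptLine(target, command, args);
    }

    /** Removes one pair of matching single or double quotes, if present. */
    private static String stripQuotes(String token) {
        boolean quoted = token.length() >= 2
                && (token.charAt(0) == '"' || token.charAt(0) == '\'')
                && token.charAt(token.length() - 1) == token.charAt(0);
        return quoted ? token.substring(1, token.length() - 1) : token;
    }

    public static void main(String[] args) {
        ScriptLine s = parse("newObject = GET FILE \"http://svn/repos/template1.xml\"");
        System.out.println(s.target + " <- " + s.command + " " + s.args);
    }
}
```

An interpreter would then dispatch on command and look up or store variables by target; that part is left out here.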

I realize this thread is 3 years old but still feel prompted to offer my take on it. The questioner asked if Java could be used for a DSL that looks as close as possible to
Get an input XML file from network shared folder or subversion repository
Import the XML file using the interface
Check if the import result message was successful
Export the XML corresponding to the object that was just imported using the interface and check if it is correct.
The answer is yes it can be done, and has been done for similar needs. Many years ago I built a Java DSL framework that - with simple customization - could allow the following syntax to be used for compilable, runnable code:
file InputFile
message Message
get InputFile from http://<....>
import Message from InputFile
if validate Message export Message
else
begin
! Signal an error
end
In the above, the keywords file, message, get, import, validate and export are all custom keywords, each one requiring two simple classes of less than a page of code to implement their compiler and runtime functions. As each piece of functionality is completed it is dropped into the framework, where it is immediately available to do its job.
Note that this is just one possible form; the exact syntax can be freely chosen by the implementor. The system is effectively a DIY high-level assembly language, using pre-written Java classes to perform all the functional blocks, both for compiling and for the runtime. The framework defines where these bits of functionality have to be placed, and provides the necessary abstract classes and interfaces to be implemented.
The system meets the primary need of clarity, where non-programmers can easily see what's happening. Changes can be made quickly and run immediately as compilation is almost instantaneous.
Complete (open) source code is available on request. There's a generic Java version and also one for Android.
