How to defend against xss when saving data and when displaying it - java

Let's say I have a simple CRUD application with a form to add a new object and edit an existing one. From a security point of view I want to defend against cross-site scripting. First I would validate the submitted data on the server. After that, I would also escape the values displayed in the view, because more than one application may write to my database (for example, some developer might insert unvalidated data into the DB by mistake in the future). So I will have this JSP:
<%@ taglib prefix="esapi" uri="http://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API" %>
<form ...>
<input name="myField" value="<esapi:encodeForHTMLAttribute>${myField}</esapi:encodeForHTMLAttribute>" />
</form>
<esapi:encodeForHTMLAttribute> does almost the same thing as <c:out>, it HTML escapes sensitive characters like < > " etc
Now, if I load an object that somehow was saved in the database with myField=abc<def, the input will correctly display the value abc<def, while the value in the underlying HTML will be abc&lt;def.
The problem is that when the user submits this form without changing the values, the server receives the value abc&lt;def instead of what is visible in the page, abc<def. So this is not correct. How should I implement the protection in this case?

The problem is when the user submits this form without changing the values, the server receives the value abc&lt;def instead of what is visible in the page abc<def
Easy. In this case HTML decode the value, and then validate.
Though as noted in a few comments, you should see how we operate with the OWASP ESAPI-Java project. By default we always canonicalize the data which means we run a series of decoders to detect multiple/mixed encoding as well as to create a string safe to validate against with regex.
For the part that really guarantees you protection however, you normally want to have raw text stored on the server--not anything that contains HTML characters, so you may wish to store the unescaped string, if only that you can safely encode it when you send it back to the user.
Encoding is the best protection for XSS, and I would in fact recommend it BEFORE input validation if for some reason you had to choose.
I say "may" because in general I think it's a bad practice to store altered data. It can make troubleshooting a chore. This can be even more complicated if you're using a technology like TinyMCE, a rich-text editor in the browser. It also renders HTML, so it's like dealing with a browser within a browser.
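A minimal sketch of the store-raw, encode-on-output approach described above. HtmlAttributeEscaper and its escape method are hypothetical names, hand-rolled with the JDK only for illustration; in a real application the ESAPI or OWASP Java Encoder routines would normally do this job. Note also that browsers generally decode character references in attribute values when parsing the page, so a field rendered this way is usually submitted as the raw text again.

```java
// Hypothetical helper for illustration only; prefer a maintained encoder library in production.
final class HtmlAttributeEscaper {
    private HtmlAttributeEscaper() {}

    /** Escapes the characters that could break out of a quoted HTML attribute value. */
    static String escape(String raw) {
        if (raw == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = "abc<def"; // raw value exactly as kept in the database
        // prints: <input name="myField" value="abc&lt;def" />
        System.out.println("<input name=\"myField\" value=\"" + escape(stored) + "\" />");
    }
}
```

The database keeps abc<def untouched; only the markup sent to the browser carries the encoded form.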

Related

Add XSS validations/ sanitize script tags in Java

I have an API written in Java where I am doing POST call but the malicious content is being sent to the API. Ideally, to prevent XSS attack, the API should not accept such data or at least sanitize it before storing/responding to it.
{"first_name":"<script>alert(document.cookie);</script>","last_name":"<script>alert(document.cookie);</script>"}
I want to add XSS validations/ sanitize script tags in Java to prevent the content from XSS attack. Can anyone suggest the best way to prevent XSS attack in Java? Is there a way to encode and decode the HTML tags shown above?
After going through the different documentation, I found that owasp-java-encoder can be used to encode HTML content and this function can be used to encode HTML Content Context.
<%= Encode.forHtmlContent(UNTRUSTED) %>
I am looking for something which allows me to encode the HTML content while storing data and decode it while displaying it.
As Stephen P mentions, you should generally be encoding data on output. You want to do this on output to ensure you're using the correct encoding for the output, and to prevent double encoding. The OWASP encoders are a good choice for this. See the OWASP XSS Prevention Cheat Sheet for details on when to use various encoders.
You want to validate/sanitize on input as much as possible, using white list validation if possible. But for free form text you won't, and shouldn't, do XSS encoding at this point.
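To illustrate why encoding at storage time leads to double encoding, here is a small sketch. The encodeForHtml method is a hypothetical stand-in written with plain string replacement; it is not the OWASP encoder itself, which you would use in real code.

```java
// Hypothetical demo class; encodeForHtml is a simplified stand-in for a real HTML encoder.
final class DoubleEncodingDemo {
    private DoubleEncodingDemo() {}

    static String encodeForHtml(String s) {
        // '&' must be replaced first so the entities added afterwards are not re-encoded.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        String input = "<script>alert(document.cookie);</script>";

        // Encoded once, at output time: the browser shows the text the user typed.
        String encodedOnce = encodeForHtml(input);

        // Encoded at storage time and again at output time: the user sees "&lt;script&gt;..." literally.
        String encodedTwice = encodeForHtml(encodedOnce);

        System.out.println(encodedOnce);
        System.out.println(encodedTwice);
    }
}
```

Storing the raw value and encoding exactly once, in the view, avoids the second case.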

How to allow specific characters with OWASP HTML Sanitizer?

I am using the OWASP Html Sanitizer to prevent XSS attacks on my web app. For many fields that should be plain text the Sanitizer is doing more than I expect.
For example:
HtmlPolicyBuilder htmlPolicyBuilder = new HtmlPolicyBuilder();
PolicyFactory stripAllTagsPolicy = htmlPolicyBuilder.toFactory();
stripAllTagsPolicy.sanitize("a+b"); // returns a&#43;b
stripAllTagsPolicy.sanitize("foo@example.com"); // returns foo&#64;example.com
When I have fields such as an email address that contain a +, such as foo+bar@gmail.com, I end up with the wrong data in the database. So two questions:
Are characters such as + - @ dangerous on their own, and do they really need to be encoded?
How do I configure the OWASP HTML Sanitizer to allow specific characters such as + - @?
Question 2 is the more important one for me to get an answer to.
You may want to use the ESAPI API to filter specific characters. However, if you would like to allow specific HTML elements or attributes, you can use allowElements and allowAttributes as follows.
// Define the policy.
Function<HtmlStreamEventReceiver, HtmlSanitizer.Policy> policy
= new HtmlPolicyBuilder()
.allowElements("a", "p")
.allowAttributes("href").onElements("a")
.toFactory();
// Sanitize your output.
HtmlSanitizer.sanitize(myHtml, policy.apply(myHtmlStreamRenderer));
I know I am answering question after 7 years, but maybe it will be useful for someone.
So, basically I agree with you guys, we should not allow specific character for security reasons (you covered this topic, thanks).
However, I was working on a legacy internal project which required escaping HTML characters except "@", for a reason I cannot tell (but it does not matter). My workaround for this was simple:
private static final PolicyFactory PLAIN_TEXT_SANITIZER_POLICY = new HtmlPolicyBuilder().toFactory();
public static String toString(Object stringValue) {
if (stringValue != null && stringValue.getClass() == String.class) {
return HTMLSanitizerUtils.PLAIN_TEXT_SANITIZER_POLICY.sanitize((String) stringValue).replace("&#64;", "@");
} else {
return null;
}
}
I know it is not clean, creates additional String, but we badly need this.
So, if you need to allow specific characters you can use this workaround. But if you need to do this your application is probably incorrectly designed.
The danger in XSS is that one user may insert HTML code in their input data that you later insert in a web page that is sent to another user.
There are in principle two strategies you can follow if you want to protect against this. You can either remove all dangerous characters from user input when they enter your system or you can html-encode the dangerous characters when you later on write them back to the browser.
Example of the first strategy:
User enters data (with html code)
Server removes all dangerous characters
Modified data is stored in database
Some time later, server reads modified data from database
Server inserts modified data in a web page to another user
Example of second strategy:
User enters data (with html code)
Unmodified data, with dangerous characters, is stored in database
Some time later, server reads unmodified data from database
Server html-encodes dangerous data and inserts them into a web page to another user
The first strategy is simpler, since you usually store data less often than you use it. However, it is also more difficult because it potentially destroys the data. It is particularly difficult if you need the data for something other than sending it back to the browser later on (like using an email address to actually send an email). It makes it more difficult to, e.g., search the database, include data in a PDF report, insert data in an email and so on.
The other strategy has the advantage of not destroying the input data, so you have a greater freedom in how you want to use the data later on. However, it may be more difficult to actually check that you html-encode all user submitted data that is sent to the browser. A solution to your particular problem would be to html-encode the email address when (or if) you ever put that email address on a web page.
The XSS problem is an example of a more general problem that arises when you mix user submitted data and control code. SQL injection is another example of the same problem. The problem is that the user submitted data is interpreted as instructions and not data. A third, less well known example is if you mix user submitted data in an email. The user submitted data may contain strings that the email server interprets as instructions. The "dangerous character" in this scenario is a line break followed by "From:".
It would be impossible to validate all input data against all possible control characters or sequences of characters that may in some way be interpreted as instructions in some potential application in the future. The only permanent solution to this is to actually sanitize all data that is potentially unsafe when you actually use that data.
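As a sketch of sanitizing at the point of use, the email case could be guarded like this. MailHeaderGuard and requireSingleLine are hypothetical names; the only rule assumed here is that a value placed into a mail header must not contain a line break, which is what the "From:" trick relies on.

```java
// Hypothetical guard, applied right before user data is placed into an email header.
final class MailHeaderGuard {
    private MailHeaderGuard() {}

    /** Returns the value unchanged, or throws if it contains a carriage return or line feed. */
    static String requireSingleLine(String value) {
        if (value == null) {
            throw new IllegalArgumentException("header value is null");
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\r' || c == '\n') {
                throw new IllegalArgumentException("line break in header value");
            }
        }
        return value;
    }
}
```

The same stored string can still be HTML-encoded for a web page or bound as a parameter in an SQL statement; each output context applies its own rule when the data is used.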
To be honest you should really be doing a whitelist against all user supplied input. If it's an email address, just use the OWASP ESAPI or something to validate the input against their Validator and email regular expressions.
If the input passes the whitelist, you should go ahead and store it in the DB. When displaying the text back to a user, you should always HTML encode it.
Your blacklist approach is not recommended by OWASP and could be bypassed by someone who is committed to attacking your users.
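A rough sketch of whitelist validation for the email field, using only java.util.regex. The EmailWhitelist class and its pattern are assumptions made for illustration: the pattern is deliberately simplified, and it is not the expression ESAPI ships with.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the pattern is a simplified assumption, not ESAPI's own email rule.
final class EmailWhitelist {
    private EmailWhitelist() {}

    // Allows letters, digits and . _ % + - before the @, then a dotted domain name.
    private static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$");

    static boolean isValid(String input) {
        return input != null && EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("foo+bar@gmail.com"));              // true
        System.out.println(isValid("<script>alert(1)</script>@x.com")); // false
    }
}
```

A value that passes is stored as typed, including the + sign, and is HTML-encoded only when it is written to a page.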
You should decode after sanitising your input:
System.out.println(StringEscapeUtils.unescapeHtml("&lt;br /&gt;foo'example.com"));

How to check if the content is plain text or not?

I have a plain text area where I accept only plain text from users. I want to make sure that users do not put any markup in the text area. I also assume that users can post in different languages. So, what is the best approach to validate the content both at the server side (using java) and at the client side (using jquery).
Any help in this regard would be appreciated.
Update: I am sorry if the question wasn't clear enough. To make it simple, this is what I want to do - I let users type text in the textarea (no rich text box here). For each double new line in the text area i want to show a paragraph in the HTML page. How do I do that correctly?
It makes little sense to validate user input on HTML content. You can just escape HTML when redisplaying this user input on the webpage. Since you mentioned that you're using Java on the server side and thus you're likely using JSP as view technology, it's good to know that you can use the JSTL <c:out> tag and fn:escapeXml() function to escape HTML before printing to output.
E.g. when redisplaying user-controlled input:
<c:out value="${somebean.sometext}" />
or when redisplaying user-submitted request parameter:
<input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />
This way, for example, <script>alert('xss')</script> will be printed to the HTML output as &lt;script&gt;alert('xss')&lt;/script&gt; and thus be displayed in HTML literally as the end user typed it.
If you really insist to validate this, you could eventually grab a HTML parser like Jsoup for this.
String text = request.getParameter("text");
if (!text.equals(Jsoup.parse(text).text())) {
// There was HTML in the text.
}
Update as per the comments you actually want to sanitize the input against a HTML whitelist to remove potential malicious tags. You can do this with Jsoup as well, see also this page.
String sanitized = Jsoup.clean(text, Whitelist.basic());
The allowed elements of Whitelist#basic() are specified in the API documentation.
If it's HTML markup you want to prevent, you could use a regular expression to throw an error if it sees a chevron (<)
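For the update in the question (one paragraph per double newline), a possible sketch is to escape the text first and only then add the p tags yourself. PlainTextParagraphs, escapeHtml and toParagraphs are hypothetical names, and the escaping is hand-rolled so the example needs nothing beyond the JDK.

```java
// Hypothetical helper: turns plain text into escaped HTML paragraphs.
final class PlainTextParagraphs {
    private PlainTextParagraphs() {}

    static String escapeHtml(String s) {
        // '&' is replaced first so the entities added afterwards are not re-encoded.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    static String toParagraphs(String text) {
        if (text == null || text.trim().isEmpty()) {
            return "";
        }
        // Normalize Windows and old Mac line endings to \n before splitting.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n').trim();
        StringBuilder html = new StringBuilder();
        // A blank line (two line breaks, optionally with spaces or tabs between) ends a paragraph.
        for (String paragraph : normalized.split("\n[ \t]*\n\\s*")) {
            html.append("<p>").append(escapeHtml(paragraph.trim())).append("</p>\n");
        }
        return html.toString();
    }
}
```

Because every paragraph is escaped before it is wrapped, any markup the user typed is displayed as text, and the only tags in the output are the ones this code adds. Single line breaks inside a paragraph are left as they are.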

Clean up user input from unwanted HTML in a Spring web application

I need to tidy user input in a web application so that I remove certain HTML tags and encode < to &lt; etc.
I've made a couple of simple util methods that strips the HTML, but I find myself adding these EVERYWHERE in my application.
Is there a smarter way to tidy the user input? E.g. in the binding process, or as a filter somehow?
I've seen JTidy that can act as a servlet filter, but I'm not sure that this is what I want because I need to clean user input, not output of my JSP's.
From JTidy's homepage:
It can be used as a tool for cleaning up malformed and faulty HTML generated by your dynamic web application.
It can Validate HTML without changing the output and generate warnings for each page so you could identify JSP or Servlet that need to be fixed.
It can save you hours of time. The more HTML you write in JSP or Servlets, the more time you will save. Don't waste time manually looking for problems, figuring out why your HTML doesn't display like it should.
In addition to JTidy validation you could submit dynamically generated pages to online HTML validators for example W3C Markup Validation Service, WAVE Accessibility Tool or WDG HTML Validator even if you are behind the firewall.
I find myself adding these EVERYWHERE in my application.
Really? It's unusual to have many user inputs that accept HTML. Most inputs should be plain text, so that when the user types < they literally get a less-than sign, not a (potentially-tidied/filtered-out) tag. This requires HTML-encoding at the output stage. Typically you'd get that from the <c:out> tag.
(Old-school JSP before JSTL, lamentably, provided no HTML-encoder, so if for some reason that's what you're working with you would have to provide your own HTML-encoding method built out of string replacements, or use one of the many third-party tools that contain one.)
For the usually-few-if-any ‘rich text’ fields that are deliberately meant to accept user-supplied HTML, you should be filtering them strongly to prevent JavaScript injection from the markup. This is a difficult job! A “couple of simple util methods that strips the HTML” are highly unlikely to do it correctly and securely.
The proper way to do this is to parse the input HTML into a DOM; walk over it checking that only known-safe element and attribute names are used; then serialise it back to well-formed [X]HTML. There are a number of tools that can do this and yes, jTidy is one. You would use the method Tidy.parseDOM on the input field value, remove unwanted items from the resulting DOM with removeChild and removeAttribute, then reserialise using pprint.
A good alternative to HTML-based rich text is to give the user a simpler form of textual markup that you can then convert to known-safe HTML tags. Like this SO text box I'm typing into now.
There's the Interceptor interface in Spring MVC, which may be used to do some common stuff on every request. Regardless of the tool you are using for tidying, you may use it to get what you need at one point. See this manual for how to use it. Just put the tidying routine into the preHandle method and walk through the data in HttpServletRequest to update it.

Customizing jsp pages

I would like to let users customize pages, let's call them A and B. So basically I want to provide a hyperlink to a JSP page with a big text box where a user should be able to enter any text or HTML (to appear on page A), with the ability to preview it and save.
I haven't really dealt with this sort of issue before and would appreciate help on how to implement it (examples and references would be very helpful too)
Thanks
Are you using any kind of web framework(Spring MVC / Struts / Tapestry / etc...)? If you are, they all have tutorials on dealing with user inputs / form submission, so take a look at that. They all differ slightly in how user input is processed so it's impossible to answer this question generically.
If you're not (e.g. this is straight JSP), take a look at this tutorial.
Basically, what you want to do is to define an HTML form on your page B with textarea where user would input custom HTML. When form is submitted, you'll get the text user entered as a request parameter and you can store it somewhere (in the database / flat file / memory / what have you). On your page A you'll need to retrieve that text and bind it to request or page scope, you can then display it using <%= %> or <jsp:getProperty> tags.
To ChssPly76's answer I'd just add that if you're going to provide text entry of html on a web page (or anywhere, really) you're going to want to provide some kind of validation and a mechanism to provide feedback if the html is bad. You might dispense with this for a raw internal tool but anything for public consumption will need it. e.g. what do you do if someone enters
<b>sometext
You can deal with this with simple rules that parse away html tags, a preview that lets people know how they're doing so far ala stackoverflow, an rtf input option, or just a validate and if the tags don't balance a big honking "Try again", but you'll want some kind of check that you won't just be putting up broken pages.
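The "validate and if the tags don't balance" option could be sketched roughly like this. TagBalanceChecker is a hypothetical name, and the regular expression is only a crude approximation of HTML tags (it ignores comments, scripts and the like), so treat it as a quick feedback check for the user rather than as a security filter.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical feedback check: are the opening and closing tags properly nested?
final class TagBalanceChecker {
    private TagBalanceChecker() {}

    // Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closing tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>");

    // A few elements that never take a closing tag (not a complete list).
    private static final Set<String> VOID_ELEMENTS =
            new HashSet<>(Arrays.asList("br", "hr", "img", "input", "meta", "link"));

    static boolean isBalanced(String html) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (VOID_ELEMENTS.contains(name) || selfClosing) {
                continue; // nothing to match up
            }
            if (closing) {
                // A closing tag must match the most recently opened tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else {
                open.push(name);
            }
        }
        return open.isEmpty(); // anything left open means the markup is unbalanced
    }
}
```

With this, isBalanced("<b>sometext") is false, so the page can show a "Try again" message instead of saving markup that would break page A.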
