How to avoid XSS attacks - Java

If a user enters javascript:alert('e'); in the comment section of my application, then on reloading the page it shows an alert. How do I validate this kind of user input in Java?

A standard industry approach to remediating XSS attacks is to whitelist-validate all inputs to your application: make sure nothing outside a predetermined set of inputs is allowed in. This is usually done with regular expressions that scan all inputs.
Additionally, to secure your application against reflected XSS attacks, encoding output on the server side should be considered as well.
Have a look at the ESAPI and AntiSamy frameworks, which provide APIs to assist in this effort:
Antisamy Project
ESAPI Security API

Every time you write user input back to a new page, the simplest approach is to escape the content.
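For instance, a minimal sketch using the OWASP Java Encoder library (assuming org.owasp.encoder is on the classpath; the class and method names here are illustrative):

import org.owasp.encoder.Encode;

public final class CommentRenderer {
    // Turns input such as <script>alert('e')</script> into inert text like
    // &lt;script&gt;alert(&#39;e&#39;)&lt;/script&gt; before it reaches the page.
    public static String renderComment(String userComment) {
        return Encode.forHtml(userComment);
    }
}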

Best Practice For XSS Attacks in Rest Api [closed]

I have read a lot about this, but couldn't really decide which way is best.
I have a web app and a Java REST application that serves customers.
What is the best way to prevent XSS attacks via parameters in the REST API and on the frontend?
Validating each parameter on both the server and client side
Filtering and controlling request params
Controlling data on the client side before putting it between tags
etc...
Thank you for your time.
As with anything, defense in depth is important, so validation and encoding should be done on any user-provided input. Encoding is very important because what counts as malicious is contextual: for example, what might be safe HTML could still be a SQL injection attack.
Parameters in a REST API may be stored, which means they are returned by subsequent requests, or they may be reflected back to the user in the response. This means you can get both stored and reflected XSS attacks. You also need to be careful about DOM-based XSS attacks. A more modern categorization that addresses the overlap between stored, reflected, and DOM XSS is Server XSS and Client XSS.
OWASP has a great Cross Site Scripting Prevention Cheat Sheet that details how to prevent cross-site scripting. I find the XSS Prevention Rules Summary and the Output Encoding Rules Summary sections very handy.
The big takeaway is that browsers parse data differently depending on the context, so it is very important that you don't just HTML-entity-encode the data everywhere. This means it is important to do two things:
Rule #0 - Only insert untrusted (user-provided) data in allowed locations. Only insert data into an HTML document in a "slot" defined by Rules #1-5.
When you insert data into one of the trusted slots, follow the encoding rules for that specific slot. Again, the rules are detailed in the previously linked Cross Site Scripting Prevention Cheat Sheet.
There is also a DOM-based XSS Prevention Cheat Sheet. Like the server-side XSS cheat sheet, it provides a set of rules to prevent DOM-based XSS.
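As a rough illustration of those rules, here is a hedged sketch using the OWASP Java Encoder; the rule numbers follow the cheat sheet's summary, and the method and variable names are my own:

import java.io.PrintWriter;
import org.owasp.encoder.Encode;

static void writeUntrusted(PrintWriter out, String untrusted) {
    // Rule #1: HTML element content
    out.println("<div>" + Encode.forHtml(untrusted) + "</div>");
    // Rule #2: HTML attribute value
    out.println("<input value=\"" + Encode.forHtmlAttribute(untrusted) + "\"/>");
    // Rule #3: JavaScript data value (inside a quoted string)
    out.println("<script>var msg = '" + Encode.forJavaScript(untrusted) + "';</script>");
    // Rule #5: URL parameter value
    out.println("<a href=\"/search?q=" + Encode.forUriComponent(untrusted) + "\">search</a>");
}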
When it comes to XSS, the only possible choice is to validate user input - any kind of user input, whether it is passed from the browser or in any other way (such as from a terminal client).
It depends on the scenario you are following.
If it is just data without HTML content, then you don't need to worry about XSS.
Otherwise, removing the < and > symbols or converting them into character-encoded entities would be enough.
You can also avoid using innerHTML to append new content to the document; use innerText instead, and even if there is XSS content in it, it won't execute.
But it gets a little more complicated when the API response returns HTML content as well, which you need to display somewhere. In such cases avoid directly inserting user input into an HTML snippet - character-encode or remove the < and > symbols and it will be just fine.
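A minimal sketch of that character-encoding advice in Java (the helper name is hypothetical, and a fuller encoder would also handle &, quotes, and attribute contexts):

static String encodeAngleBrackets(String input) {
    if (input == null) {
        return null;
    }
    // Neutralises tag syntax; deliberately minimal, per the advice above -
    // it does not cover attribute or JavaScript contexts.
    return input.replace("<", "&lt;").replace(">", "&gt;");
}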

Whitelist validation for HTTP requests

I am trying to create a servlet request filter which filters any incoming request based on a whitelist of characters.
I want to accept only those characters which match the whitelist pattern, so that no malicious code can be executed by an attacker in the form of a script or a modified URL.
Does anyone know which whitelist characters should be used for filtering an HTTP request string?
Any help would be appreciated
Thanks in Advance
Implement a pattern-matching mechanism that checks your URL pattern for whitelisted characters using a RegEx.
Or you can try:
if (inputUrl.contains(whiteList)) {
    // your code goes here
}
Or, if you need to know where it occurs, you can use indexOf:
int index = inputUrl.indexOf(whiteList);
if (index != -1) { // -1 means "not found"
    ...
}
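If you do go the servlet-filter route, a hedged sketch of such a filter might look like this (the class name and the allowed character set are illustrative only - the right set depends entirely on your application; init() and destroy() have default implementations in Servlet 4.0+):

import java.io.IOException;
import java.util.regex.Pattern;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class WhitelistFilter implements Filter {
    // Illustrative whitelist: letters, digits, and a few URL-safe symbols.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9/_.\\-]*");

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String uri = ((HttpServletRequest) req).getRequestURI();
        if (!ALLOWED.matcher(uri).matches()) {
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_BAD_REQUEST);
            return; // reject anything outside the whitelist
        }
        chain.doFilter(req, res);
    }
}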
The problem is that "malicious" is a very broad term. You should have a clear idea of what types of attacks you are trying to protect against, and then take measures to prevent them.
You cannot specify a general set of characters that need to be filtered out; you need to know the domain in which your URL input will be used. Generally it is not the URL itself that is dangerous but the URL parameters, which are provided by your users and then interpreted by your application. Depending on how your application will use this input, you need to take specific precautions. For example:
A URL param is used to determine the target of a redirect. A user can abuse this to navigate a victim to a malicious site - one which masquerades as your site but steals the user's credentials, and so on. In that case you should construct a whitelist of allowed destinations expected by your application and forbid all others. See OWASP Top Ten - Unvalidated Redirects and Forwards.
You save data from a URL param to the DB. You should prevent SQL injection by using parameterized queries (see the sketch at the end of this answer). See the OWASP SQL Injection Prevention Cheat Sheet.
URL param data will be displayed as HTML. You should sanitize the HTML with a proven sanitizer such as the OWASP HTML Sanitizer or AntiSamy to prevent cross-site scripting.
And so on...
The point is, there is no silver bullet to protect you from all malicious attack vectors, and especially not whitelisting certain characters in a servlet filter. You should know where potentially malicious data is used and process it with its specific usage in mind, because different targets have different vulnerabilities and require different measures for protection.
A good start for a high-level overview of security issues and the measures that protect against them is the OWASP Top Ten. From there I recommend the more detailed guides and resources provided by OWASP.
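As a sketch of the parameterized-query point above (the table and column names are invented for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

static ResultSet findUserByName(Connection conn, String nameParam) throws SQLException {
    // The ? placeholder binds the value as data, so input such as
    // "' OR '1'='1" cannot change the structure of the query.
    PreparedStatement ps = conn.prepareStatement(
            "SELECT id, name FROM users WHERE name = ?");
    ps.setString(1, nameParam);
    return ps.executeQuery();
}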

Do I need to enable canonicalization when using OWASP ESAPI?

We are adding ESAPI 2.x (the OWASP Java security library) to an application.
The change is easy, though quite repetitive. We are adding validation to all input parameters so we can make sure all the characters they are composed of are within a whitelist.
This is it:
Validator instance = ESAPI.validator();
Assert.assertTrue(instance.isValidInput("test", "xxx@gmail.com", "Email", 100, false));
The Email pattern is then set in the validation.properties file like:
Validator.Email=^[A-Za-z0-9._%'-]+@[A-Za-z0-9.-]+\\.[a-zA-Z]{2,4}$
Easy!
We are not encoding output given that after the input validation, data becomes trusted.
I can see that ESAPI has a flag to canonicalize the input String. I understand that canonicalization is "de-encoding", so any encoded String is transformed into plain text.
The question is: why do we need to canonicalize?
Can anybody show a sample of an attack that would be prevented by canonicalization? (in Java)
Thank you!
Here's one (of several thousand possible examples):
Take this simple XSS input:
<script>alert('XSS');</script>
// Now we URI-encode it:
%3Cscript%3Ealert(%27XSS%27)%3B%3C%2Fscript%3E
// Now we URI-encode it again:
%253Cscript%253Ealert(%2527XSS%2527)%253B%253C%252Fscript%253E
Canonicalization of input that has been encoded once will result in the original input, but in ESAPI's case the third input will throw an IntrusionException, because there is NEVER a valid use case where user input is URI-encoded more than once. In this particular example, canonicalization means "all URI data will be reduced to its actual character representation." ESAPI actually does more than just URI decoding, by the way. This is important if you wish to perform security and/or business validation using regular expressions - the primary use of regular expressions in most applications.
At a bare minimum, canonicalization gives you good assurance that sneaking malicious input into the application isn't easy: the goal is to restrict input to known-good values (a whitelist) and reject everything else.
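A hedged sketch of canonicalize-then-validate with ESAPI (the method and context names are illustrative; the behaviour on double-encoded input depends on the Encoder.AllowMultipleEncoding property, which is false by default):

import org.owasp.esapi.ESAPI;
import org.owasp.esapi.errors.IntrusionException;

static boolean isValidEmail(String rawInput) {
    try {
        // The final 'true' asks ESAPI to canonicalize before matching the
        // Validator.Email whitelist pattern, so encoded input is reduced
        // to plain characters rather than slipping past the regex.
        return ESAPI.validator().isValidInput(
                "email", rawInput, "Email", 100, false, true);
    } catch (IntrusionException e) {
        // Raised for input encoded more than once - never legitimate.
        return false;
    }
}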
Regarding your ill-advised comment here:
We are not encoding output given that after the input validation, data becomes trusted.
Here's the dirty truth: JavaScript, XML, JSON, and HTML are not "regular languages". They're nondeterministic. What this means in practical terms is that it is mathematically impossible to write a regular expression that rejects all attempts to insert HTML or JavaScript into your application. Have a look at the OWASP XSS Filter Evasion Cheat Sheet.
Does your application use jQuery? The following input is malicious:
$=''|'',_=$+!"",__=_+_,___=__+_,($)[_$=($$=(_$=""+{})[__+__+_])+_$[_]+(""+_$[-__])[_]+(""+!_)[___]+($_=(_$=""+!$)[$])+_$[_]+_$[__]+$$+$_+(""+{})[_]+_$[_]][_$]((_$=""+!_)[_]+_$[__]+_$[__+__]+(_$=""+!$)[_]+_$[$]+"("+_+")")()
So you must encode all data on output to the user, for the proper context. This means that if a piece of data is going to be passed first into a JavaScript function and then displayed as HTML, you encode for JavaScript and then for HTML. If it's output into an HTML data field (such as a default input box value), you encode it for an HTML attribute.
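A short sketch of that layered encoding using the OWASP Java Encoder (the method names are my own):

import org.owasp.encoder.Encode;

static String encodeForJsThenHtml(String userData) {
    // Encode for the innermost context first (JavaScript), then for the
    // outer context (HTML) in which the result is rendered.
    return Encode.forHtml(Encode.forJavaScript(userData));
}

static String encodeForInputValue(String userData) {
    // For an HTML data field such as a default input box value.
    return Encode.forHtmlAttribute(userData);
}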
It's actually MORE IMPORTANT to do output encoding than input filtering when protecting against XSS. (If I HAD to choose just one...)
The pattern you want to follow in web development is to treat any input coming from the outside world as malicious at all times, and to encode any time you're handing data off to a dynamic interpreter.
Canonicalization of data is also about reducing the data to its basic form. Take a different scenario in which a file path (relative or a symlink) and its associated directory permissions are involved: you need to canonicalize the path first and then validate it, otherwise somebody can explore files without permission simply by passing input that looks acceptable but resolves to a different target.

How to fix endemic XSS vulnerabilities in a Java webapp

I am working on a Java web application that is many years old.
Most of the <bean:write>s in the JSPs have filter="false" even when it isn't needed, probably because of developers blindly copying existing code. <bean:write> is the Struts tag to output a JSP variable, and when filter="false" is specified it does not do HTML escaping (so filter="false" is similar to the <c:out> attribute escapeXml="false"). This means that the application is vulnerable to XSS attacks, because some of these <bean:write filter="false">s are outputting user input.
A blanket removal of filter="false" isn't an option because in some cases the application allows the user to enter HTML using a TinyMCE text area, so we do need to output raw HTML in some cases to retain the user-entered formatting (although we should still be sanitising user-entered HTML to remove scripts).
There are thousands of filter="false"s in the code so an audit of each one to work out whether it is required would take too long.
What we are thinking of doing is making our own version of the bean:write tag, say secure:write, and doing a global find/replace of bean:write with secure:write in our JSPs. secure:write will strip scripts from the output when filter="false" is specified. After this change users would still be able to cause formatting HTML to be output where they shouldn't really be able to, but we aren't worried about that for the time being as long as the XSS vulnerabilities are fixed.
We would like to use a library to implement the script-stripping in the secure:write tag and we have been looking at https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project and https://code.google.com/p/owasp-java-html-sanitizer/. Both look like they are capable of sanitising HTML, although AntiSamy looks like it is intended to be used to sanitise HTML on the way in to the application instead of on the way out, and since data is output more often than it is input we are concerned that running all of our secure:write output through it could be slow.
I have 2 main questions:
1) Will our proposed approach work to fix the XSS vulnerabilities caused by filter="false"?
2) Can anyone recommend a library to use for HTML sanitisation when displaying content, i.e. which is fast enough to not significantly affect the page-rendering performance? Has anyone used AntiSamy or owasp-java-html-sanitizer for something similar?
1) Will our proposed approach work to fix the XSS vulnerabilities caused by filter="false"?
This definitely sounds like an improvement that will reduce your attack surface, but it's not sufficient.
Once an attacker can no longer inject <script>doEvil()</script> they will then focus on injecting javascript:doEvil() where URLs are expected, so you will need to harden places where URLs are injected as well.
If you're using an XSS scanner, I would do what you describe, then rerun your scanners, making sure that it tests for injected javascript URLs.
Once URLs are locked down, you should audit any writes into style attributes or elements and event handler attributes.
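For the URL hardening mentioned above, one hedged approach is to parse the candidate value and whitelist the scheme (the helper name is made up; adjust the policy on relative URLs to your needs):

import java.net.URI;
import java.net.URISyntaxException;

static boolean isSafeUrl(String candidate) {
    try {
        String scheme = new URI(candidate.trim()).getScheme();
        // A null scheme means a relative URL; javascript:, data:, vbscript:
        // and friends are all rejected by the whitelist below.
        return scheme == null
                || scheme.equalsIgnoreCase("http")
                || scheme.equalsIgnoreCase("https");
    } catch (URISyntaxException e) {
        return false; // unparseable URLs are rejected outright
    }
}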
2) Can anyone recommend a library to use for HTML sanitisation when displaying content, i.e. which is fast enough to not significantly affect the page-rendering performance? Has anyone used AntiSamy or owasp-java-html-sanitizer for something similar?
Shameless plug: https://code.google.com/p/owasp-java-html-sanitizer/
A fast and easy-to-configure HTML sanitizer written in Java which lets you include HTML authored by third parties in your web application while protecting against XSS.
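Usage is compact; a hedged sketch combining the library's prepackaged policies (the policy choice is illustrative):

import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

static String sanitize(String untrustedHtml) {
    // Allow basic formatting, block-level elements, and links; script
    // elements and event-handler attributes are dropped.
    PolicyFactory policy = Sanitizers.FORMATTING
            .and(Sanitizers.BLOCKS)
            .and(Sanitizers.LINKS);
    return policy.sanitize(untrustedHtml);
}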

Clean up user input from unwanted HTML in a Spring web application

I need to tidy user input in a web application so that I remove certain HTML tags and encode < to &lt; etc.
I've made a couple of simple util methods that strip the HTML, but I find myself adding these EVERYWHERE in my application.
Is there a smarter way to tidy the user input? E.g. in the binding process, or as a filter somehow?
I've seen that JTidy can act as a servlet filter, but I'm not sure that this is what I want, because I need to clean user input, not the output of my JSPs.
From JTidy's homepage:
It can be used as a tool for cleaning up malformed and faulty HTML generated by your dynamic web application.
It can Validate HTML without changing the output and generate warnings for each page so you could identify JSP or Servlet that need to be fixed.
It can save you hours of time. The more HTML you write in JSP or Servlets, the more time you will save. Don't waste time manually looking for problems, figuring out why your HTML doesn't display like it should.
In addition to JTidy validation you could submit dynamically generated pages to online HTML validators for example W3C Markup Validation Service, WAVE Accessibility Tool or WDG HTML Validator even if you are behind the firewall.
I find myself adding these EVERYWHERE in my application.
Really? It's unusual to have many user inputs that accept HTML. Most inputs should be plain text, so that when the user types < they literally get a less-than sign, not a (potentially tidied or filtered-out) tag. This requires HTML-encoding at the output stage. Typically you'd get that from the <c:out> tag.
(Old-school JSP before JSTL, lamentably, provided no HTML encoder, so if for some reason that's what you're working with you would have to provide your own HTML-encoding method built out of string replacements, or use one of the many third-party tools that contain one.)
For the usually-few-if-any ‘rich text’ fields that are deliberately meant to accept user-supplied HTML, you should be filtering them strongly to prevent JavaScript injection from the markup. This is a difficult job! A “couple of simple util methods that strips the HTML” are highly unlikely to do it correctly and securely.
The proper way to do this is to parse the input HTML into a DOM; walk over it checking that only known-safe element and attribute names are used; then serialise it back to well-formed [X]HTML. There are a number of tools that can do this and yes, jTidy is one. You would use the method Tidy.parseDOM on the input field value, remove unwanted items from the resulting DOM with removeChild and removeAttribute, then reserialise using pprint.
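A hedged sketch of that parse/walk/serialise loop with jTidy (the whitelist is deliberately tiny and illustrative; a production list needs careful review):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.tidy.Tidy;

public final class HtmlWhitelist {
    // Structural tags Tidy wraps output in, plus a few known-safe tags.
    private static final Set<String> SAFE_TAGS = Set.of(
            "html", "head", "title", "body",
            "p", "b", "i", "em", "strong", "ul", "ol", "li");

    public static String sanitize(String html) {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        Document doc = tidy.parseDOM(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), null);
        prune(doc.getDocumentElement());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        tidy.pprint(doc, out);
        return out.toString(StandardCharsets.UTF_8);
    }

    private static void prune(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before any removal
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (!SAFE_TAGS.contains(child.getNodeName().toLowerCase())) {
                    node.removeChild(child); // drop unknown elements entirely
                } else {
                    stripUnsafeAttributes((Element) child);
                    prune(child);
                }
            }
            child = next;
        }
    }

    private static void stripUnsafeAttributes(Element el) {
        NamedNodeMap attrs = el.getAttributes();
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            String name = attrs.item(i).getNodeName();
            // Event handlers (onclick, onload, ...) and inline styles can
            // both carry script, so remove them.
            if (name.toLowerCase().startsWith("on") || name.equalsIgnoreCase("style")) {
                el.removeAttribute(name);
            }
        }
    }
}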
A good alternative to HTML-based rich text is to give the user a simpler form of textual markup that you can then convert to known-safe HTML tags. Like this SO text box I'm typing into now.
There's an Interceptor interface in Spring MVC which may be used to do common work on every request. Regardless of the tool you are using for tidying, you can use it to handle everything in one place. See the Spring reference manual for how to set one up. Just put the tidying routine into the preHandle method and walk through the data in the HttpServletRequest to update it.
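A hedged sketch of such an interceptor (the class name and the rejection rule are illustrative; note that servlet request parameters cannot be modified in place, so actually rewriting values requires an HttpServletRequestWrapper in a servlet filter instead):

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.HandlerInterceptor;

public class InputTidyInterceptor implements HandlerInterceptor {
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) throws Exception {
        for (String[] values : request.getParameterMap().values()) {
            for (String value : values) {
                // Illustrative check only; real validation belongs in a
                // proper sanitizer, not a single substring test.
                if (value != null && value.toLowerCase().contains("<script")) {
                    response.sendError(HttpServletResponse.SC_BAD_REQUEST);
                    return false; // stop handling this request
                }
            }
        }
        return true;
    }
}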
