Embedded CSS with owasp-java-html-sanitizer


I am working on a project where we allow users to submit HTML/CSS, and we create a PDF from that code. We have the code working, but I would like to sanitize the incoming data to prevent attacks. There is a method to sanitize inline CSS through:
http://javadox.com/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/r223/javadoc/org/owasp/html/HtmlPolicyBuilder.html#allowStyling(org.owasp.html.CssSchema)
But is there anything that can be used to check the contents of a style tag? I realize that embedded CSS may be too difficult to check, but I couldn't find anything on this topic in my Google searches. The CssSchema seems to check every property I need; I just cannot apply it to what is between the style tags.

The answer to my question was to use AntiSamy.
http://atetric.com/atetric/javadoc/org.owasp.antisamy/antisamy/1.5.3/org/owasp/validator/css/CssScanner.html
This class gave me everything I need. It allows me to scan external, embedded and inline css. I am currently working on the inline, because I think I have to pull out each inline element individually. I did have to strip out the Style tags in order to use CssScanner with scanStyleSheet, but it worked.
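The "strip out the style tags" step can be sketched as follows. This is only a rough illustration: the class name StyleBlockExtractor and the regex are assumptions, not part of AntiSamy, and a regex is a crude way to read HTML (a real HTML parser would be more robust). The extracted text is what would then be handed to the CSS scanner; the scanner call itself is left out because its exact signature is not shown in the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AntiSamy or the OWASP sanitizer.
final class StyleBlockExtractor {
    // Matches <style ...>...</style>, case-insensitively and across line breaks.
    private static final Pattern STYLE_BLOCK = Pattern.compile(
            "<style\\b[^>]*>(.*?)</style\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private StyleBlockExtractor() {}

    /** Returns the raw CSS found inside each style element, in document order. */
    static List<String> extractCss(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = STYLE_BLOCK.matcher(html);
        while (m.find()) {
            // group(1) is the text between the opening and closing tag
            blocks.add(m.group(1).trim());
        }
        return blocks;
    }
}
```

Each returned string is bare CSS with the surrounding tags already removed, so it can be scanned on its own and the result written back between fresh style tags.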

Related

Selenium 2 : finding web element locators programmatically

I am tired of manually finding locators (id, xpath, css, linkText, etc.) for web elements in my page source. It also takes a lot of effort. To avoid that, I want to write code that works on the page source directly and generates the locator details (e.g. id="xyz", xpath="html/body/table/tr/td/a", etc.).
To achieve this, I think I can generate ID locators by using the split() function of the String object. What I don't know is how to generate xpath, css and linkText locators for all page components.

Although I'd generally recommend constructing XPath expressions on your own (as you can better exploit things to match against, like class attributes), probably the most reasonable and convenient automatic way to determine XPath expressions for Selenium is to use either Firebug's or Chrome Developer Tools' "Find XPath" feature. They both at least use @id attributes to shorten XPath expressions.
If you want to write some code yourself, eg. for embedding in other tools you use, you might want to have a look at the answers of "PHP XML - Find out the path to a known value" which solves the problem in PHP, or another one with answers for Javascript: "Javascript get XPath of a node".
If you're using any tools not working on the DOM (Selenium/Firebug/Chrome Dev' Tools/JavaScript will do), watch out for the problems I described in "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?".
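For the "write some code yourself" route, a minimal Java sketch of the same idea might look like this. It assumes the page has already been parsed into a well-formed org.w3c.dom tree (real-world HTML usually needs a tolerant parser first), and the class and method names are made up for illustration. Like the browser tools, it stops at the nearest ancestor that has an id.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: builds an XPath for an element by walking up its ancestors.
final class XPathBuilder {
    private XPathBuilder() {}

    static String xpathOf(Element element) {
        StringBuilder path = new StringBuilder();
        Node current = element;
        while (current != null && current.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) current;
            String id = e.getAttribute("id");
            if (!id.isEmpty()) {
                // An id anchors the path; ids containing a single quote are not handled here.
                return path.insert(0, "//*[@id='" + id + "']").toString();
            }
            path.insert(0, "/" + e.getTagName() + "[" + positionAmongSameTag(e) + "]");
            current = e.getParentNode();
        }
        return path.toString();
    }

    // 1-based index of the element among preceding siblings with the same tag name.
    private static int positionAmongSameTag(Element e) {
        int index = 1;
        for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(e.getNodeName())) {
                index++;
            }
        }
        return index;
    }

    // Convenience parser for well-formed markup only.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Markup is not well-formed", ex);
        }
    }
}
```

Paths produced this way are purely positional below the id, so they break as soon as the layout changes; they are a starting point to trim by hand rather than a finished locator.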

There are (at present) no "tools" that do even a marginally decent job of synthesizing reliable and concise xpath or css based locators. I've been writing selenium and HTML/CSS code for nearly the duration of the industry, and the so-called CASE-methods that purport to do this job better than humans produce laughably flawed output more often than they generate useful material. However, I would add a codicil: there is hope.
By taking careful stock of the various XPath and CSS methods (see http://www.w3schools.com/xpath/xpath_axes.asp for some general guidelines) and using only the most minimal locator strings that will pass muster in Firebug, Selenium IDE and other similar plug-ins, one can progress gradually towards a better approach. In general one should (where possible) use only one component from an object's attribute list and avoid using dynamically defined quantities. Best practices would encourage picking class, name or id only if they are "immutable".
Mutability is a tricky issue: simply dragging a cursor over an object or clicking on it may change the class or css characteristics. Sometimes this can be surmounted by using only the "fixed" portion of the offending attribute. For example, a class might initially be 'tabContent', but when the cursor is placed over the corresponding object it might change to 'tabContentMouseOver'. You get the idea. By using an xpath locator string //*[contains(@class,'tabContent')] you stand a better than even chance of hitting the desired object, irrespective of whether it is clicked, highlighted, or even disabled.
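The partial-match idea is easy to try outside the browser with the JDK's own XPath engine. The snippet below is only an illustration: the class name and the sample markup are invented, and it compares an exact class match with a contains() match on the 'tabContent' example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: counts the nodes an XPath expression selects.
final class ClassLocatorDemo {
    private ClassLocatorDemo() {}

    static int countMatches(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or markup: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<div>"
                + "<span class='tabContent'/>"
                + "<span class='tabContentMouseOver'/>"
                + "<span class='other'/>"
                + "</div>";
        // Exact match misses the hovered state; contains() covers both states.
        System.out.println(countMatches(page, "//*[@class='tabContent']"));
        System.out.println(countMatches(page, "//*[contains(@class,'tabContent')]"));
    }
}
```

Note that contains() also matches any other class that merely includes the same substring, so the fixed portion should be chosen to be as distinctive as possible.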
The next "trick" I'd encourage you to consider is using the "buddy" principle; many objects (button-like ones in particular) nowadays consist of an image with no dynamic properties wrapped by a div that manages its event-driven behavior. For such situations you might find that //div[@eventproxy='tabObject']/following::img[contains(@src,'tabImage') and text()='Contents'] or something similar will cover all the bases. Contents will vary with your situation of course.
Make vigorous use of the hierarchical axes methods ('following' is one I use quite often), but only where necessary; sometimes '//' will suffice instead.
Penalize yourself for every unnecessary character, and reward yourself for methods which are concise and can weather frequent and severe code changes. Above all, persevere.
In general I avoid using "pure" CSS locators for the following reason -- they were never intended as locators. "Cascading Style Sheets" by their very nature are designed to impact the maximum number of web objects possible, and are very rarely unique to any one piece of content. Web-coders are notorious for changing these on the fly to produce spiffy new effects or restructure content to suit customer demands; why hitch your tests to content that is known to fluctuate? Besides, everything CSS can do (and I do mean EVERYTHING) can also be done inside XPaths if you so choose.
The canard about XPaths being "slower" than CSS methods I believe has been disproved often enough that it should be taken with several tablespoons of salt. Still, if you really feel more comfortable with CSS techniques, go for it! Experience will educate you better than any blurb you find in stackoverflow ever will.

Creole 1.0 to HTML5 renderer for Java

I want to build a CMS/Wiki, and I would like to make it with HTML5.
I have been looking, so far without success, for a Wiki rendering engine that takes Creole 1.0 syntax as input and renders it as HTML5. Can anyone point me to a library for this purpose?
My second option is to write a renderer for XWiki to support HTML5. Any ideas of how to develop such thing?

The XWiki Rendering engine already supports Creole 1.0 as input syntax, and the output conforms to their recommended HTML output, including the <pre> and <tt> tags for verbatim text (one for block, one for inline). Most of this HTML will be valid HTML5 as well, except for the tt tag which has been removed.
tt was perceived as a purely stylistic tag, and semantically more meaningful tags like kbd, samp, code and var had been available for a long time. The problem is that there are too many alternatives available, so it's hard to pick just one tag to represent correctly (from a semantical POV) all the things that tt is being used for. Should we add 4 different verbatim syntaxes to Creole? Or should we just use code everywhere and ignore its semantics, making it the new tt? Or maybe use pre both for inline and block content, and change the CSS so that it's not always a block element?
Anyway, in order to implement a new html/5.0 syntax renderer, you'd probably have to copy the xhtml module, change most of the classes to just inherit their xhtml/1.0 equivalent, except for XHTMLChainingRenderer where you should alter the way beginFormat and endFormat behave. You should also make an HTML5 parser, so also extend the XHTMLParser class and add another handler for code tags (we should probably do this by default, since it's a valid XHTML tag that we're currently ignoring).

IllegalStateException when wrapping spring mvc select tag with custom tag

Basic problem
I've come across a bit of a problem while writing my own custom JSP tags to "wrap" the spring MVC form tags. I've wrapped other tags successfully but have come unstuck with the select and options tags, which throw an IllegalStateException. I've debugged the spring code and found that the options tag looks for an ancestor select tag. I'm doing this with tag files, so the spring select tag is actually in a different tag file. I guess that's why it doesn't find it.
So the question is: what can I do to get round this?
Possible solutions
I've tried looking for solutions but all I've found is other people having the same problem but no solution posted. I did ponder writing my own select and options tags without using the spring tags but I don't really want to have to replicate the binding that it gives you for free. I don't mind changing to use Java classes rather than tag files but I found previously that the output won't be evaluated as a JSP so you can't output another JSP tag.
Reasons for doing this
Having thought about this for a week since first asking the question I am now clearer on what I want to achieve.
To simplify the markup needed in my JSP's
Factoring out common code (e.g. form:errors after an input or getting a translation from spring:message)
To encapsulate look and feel (CSS goes a long way but often you need to change the markup too)
To be able to build enhanced components that extend the functionality of the spring tags (e.g. render a multi-select as a picklist or display readonly inputs as text labels)
I'll be interested to hear what people think.
Thanks

Firstly, I'm not sure what you mean by wanting control over styling. I thought you could pass in class and id attributes to Spring tags and they were copied through (? - although I might be getting confused with Grails tags, as I've been writing Grails apps lately). Edit: plus you can style Spring generated tags by referencing an outer element. E.g. surround your form elements with a div and then style the form elements like: #myDiv input { color: red; }.
From my experience (10+ years of webapp dev), it's not worth the extra effort to try to future-proof your app. When you choose a framework like Spring MVC you are getting a lot of stuff for free that you would normally have to write yourself. The cost of this free stuff is a certain amount of lock-in (as you said). Spring is pretty good when it comes to this aspect - you can use as little or as much as you want and it's usually pretty straightforward to engineer it out if need be in the future.
So my take is: use the Spring tags "as is". The likelihood of you needing to remove the Spring aspect in the future is very small. As such, it's a worthwhile risk to "put off" if/until that scenario arises. You have likely already spent as much time and code trying to engineer your future-proof solution as you would've spent removing the Spring tags, so the effort outweighs any benefit it might have provided. On top of that, you've written that code and you and/or someone else will have to maintain it now, versus letting the Spring developers maintain the code for you.
Lastly, if you really don't want to have this lock-in and want full control over styling, then write your form elements by hand.
<select name="foo_select">
<option value="">-- select a foo type --</option>
<c:forEach var="foo" items="${fooGroups}">
<option value="${foo}">${foo}</option>
</c:forEach>
</select>

I've thought about this for a good week now and this is the shortlist:
Give up and directly use the spring tags in my JSP's
Don't use the spring tags at all and replicate their logic in my own tags
Possibly write a tag class that extends or makes use of the spring tag class
Expand the scope of my tags to wrap both the select and options tags
Given the reasons for wanting to do this (which I have now clarified in the question), I've decided to go for the last option. I wasn't keen on this originally because I thought I might end up with hundreds of parameters but it's actually not too bad. The tag files are designed for wrapping common bits of markup so this is what they're for. I've also wrapped my own tag further so there is a picklist tag which outputs my custom select tag and then writes the JS needed to initialise it.
I think this is the best of the possible solutions I've come across based on what I wanted to achieve. This is what I'm going with, but I'd still be interested to hear other people's solutions if they think they have something better.

Using JSP to write a widget generator to be used in other servlets

I'm new to JSP, so here's the problem.
I want to create a block of HTML code with dynamism in it.
This block of code needs to be repeated/reused multiple times in multiple places on my site.
It's ugly to create a method with lots of
responseOut.println("<html> text with escaped characters")
So I'm wondering if JSP can be used to create reusable (callable or addressable by Class.methodname) methods.
It's easy to do this in PHP within the PHP framework.
I guess it all depends on the extent to which JSP is a precompilation method, as opposed to being run dynamically in the webserver...
I'm working in Eclipse (with GAE), so any comments and hints in this framework would also be appreciated.
Thanx
Dan

You can encapsulate parts of Java (or JSP code) in tags that can be reused in other contexts.
Have a look at this tutorial for a first introduction.

At its simplest, you can put it in a JSP fragment file and use <jsp:include> to include it. The "dynamism" can just be achieved using taglibs/EL. How exactly to do it depends on the sole functional requirement which is yet unclear. At least, HTML code does definitely not belong in a Servlet class.
However the statement "<html> text with escaped characters" makes me think that all you really need is the JSTL <c:out> tag. It will by default escape HTML entities like <, >, etc to prevent HTML code injection (and XSS holes).
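To make concrete what that escaping buys you, here is a rough Java sketch of the kind of character replacement involved. It is an approximation written for illustration (the class name is made up and this is not the JSTL source); in a JSP you would simply use <c:out> rather than calling anything like this yourself.

```java
// Hypothetical helper approximating XML/HTML escaping of the five special characters.
final class HtmlEscaper {
    private HtmlEscaper() {}

    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this in place, user input such as <script> ends up in the page as visible text instead of being interpreted as markup.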

Guide to proper escaping in Play framework

I'm trying to map out how the Play framework supports escaping.
This is a nice page spelling out the needed functionality:
https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet
So I'm trying to relate that to Play template features and fully understand what Play does and doesn't do.
HTML escaping: ${} or the escape() function
Attribute escaping: I can't find a built-in solution
JavaScript escaping: there's an escapeJavaScript() http://www.playframework.org/documentation/1.2/javaextensions
CSS escaping: I can't find a built-in solution
URL escaping: nothing special built-in, but usual Java solution e.g. Java equivalent to JavaScript's encodeURIComponent that produces identical output? - Update: there's urlEncode() at http://www.playframework.org/documentation/1.2/javaextensions
Another point of confusion is the support for index.json (i.e. using templates to build JSON instead of HTML). Does ${} magically switch to JavaScript escaping in a JSON document, or does it still escape HTML, so everything in a JSON template has to have an explicit escapeJavaScript()?
There's also an addSlashes() on http://www.playframework.org/documentation/1.2/javaextensions , but it doesn't seem quite right for any of the situations I can think of. (?)
It would be great to have a thorough guide on how to do all the flavors of escaping in Play. It looks to me like the answer is "roll your own" in several cases but maybe I'm missing what's included.

I've been looking into this so decided to write up my own answer based on what you already had, this OWASP cheat sheet and some experimentation of my own
HTML escaping:
${} or the escape() function
Attribute escaping: (common attributes)
This is handled in play so long as you wrap your attributes in double quotes (") and use ${}.
For complex attributes (href/src/etc.) see JavaScript below
Example unsafe code
<a id=${data.value} href="...">...</a>
<a id='${data.value}' href="...">...</a>
This would break with this for data.value:
% href=javascript:alert('XSS')
%' href=javascript:alert(window.location)
JavaScript escaping: (and complex attributes)
Use escapeJavaScript(). http://www.playframework.org/documentation/1.2/javaextensions
Example unsafe code
<a onmouseover="x='${data.value}'; ..." href="...">...</a>
This would break with this for data.value:
'; javascript:alert(window.location);//
CSS escaping:
Not sure as I've no need for this.
I'd imagine you'd need to create your own somehow. Hopefully there is something out there to manipulate the strings for you.
URL escaping:
use urlEncode(). http://www.playframework.org/documentation/1.2/javaextensions
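Outside of templates, the plain Java route for URL escaping can be sketched with java.net.URLEncoder. The class name below is made up, and this is not Play's implementation of urlEncode(). One known quirk is that URLEncoder produces form encoding, where a space becomes "+", so the sketch swaps that for "%20" to stay closer to what JavaScript's encodeURIComponent emits. It is not claimed to match encodeURIComponent for every character.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for encoding a single URL component (not a whole URL).
final class UrlComponentEncoder {
    private UrlComponentEncoder() {}

    static String encode(String value) {
        try {
            // A literal '+' in the input is already encoded as %2B at this point,
            // so any remaining '+' stands for a space.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }
}
```

Apply it to each parameter value separately; encoding a complete URL this way would also escape the "/", "?" and "&" characters that give the URL its structure.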

I think you are absolutely correct in your summary. Play gives you some of the solutions, but not all. However, in the two places where Play does not offer something (CSS and attribute escaping), I can't actually find a need for it.
The OWASP standard specifies that you should escape untrusted code. So, the only way you would have untrusted code in your CSS is if it is being generated dynamically. If it is being generated dynamically, then there is nothing stopping you doing so using standard Groovy templates, and therefore using ${} and escape().
As for the attribute escaping, again, the only time you are going to need this as far as I can tell, is when you are building your view in the groovy templates, so again, you can use ${} or escape().
