Creole 1.0 to HTML5 renderer for Java

I want to build a CMS/wiki, and I would like it to produce HTML5.
I have been looking for a wiki rendering engine that takes Creole 1.0 syntax as input and renders it as HTML5, so far without success. Can anyone point me to a library for this purpose?
My second option is to write a renderer for XWiki that supports HTML5. Any ideas on how to develop such a thing?

The XWiki Rendering engine already supports Creole 1.0 as an input syntax, and its output conforms to the HTML output recommended for Creole, including the <pre> and <tt> tags for verbatim text (<pre> for block, <tt> for inline). Most of this HTML is valid HTML5 as well, except for the tt tag, which has been removed from HTML5.
tt was perceived as a purely stylistic tag, and semantically more meaningful tags like kbd, samp, code and var had been available for a long time. The problem is that there are too many alternatives available, so it's hard to pick just one tag to represent correctly (from a semantical POV) all the things that tt is being used for. Should we add 4 different verbatim syntaxes to Creole? Or should we just use code everywhere and ignore its semantics, making it the new tt? Or maybe use pre both for inline and block content, and change the CSS so that it's not always a block element?
Anyway, in order to implement a new html/5.0 syntax renderer, you'd probably have to copy the xhtml module, change most of the classes to just inherit their xhtml/1.0 equivalent, except for XHTMLChainingRenderer where you should alter the way beginFormat and endFormat behave. You should also make an HTML5 parser, so also extend the XHTMLParser class and add another handler for code tags (we should probably do this by default, since it's a valid XHTML tag that we're currently ignoring).
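To make the "inherit everything, override beginFormat/endFormat" idea a bit more concrete, here is a minimal sketch. The types below (SketchFormat, SketchXhtmlRenderer, SketchHtml5Renderer) are hypothetical stand-ins written for illustration; they are not the XWiki classes, and the real renderer has more to it. The sketch also assumes you settle on code as the replacement for tt, which is only one of the options discussed above.

```java
/** Hypothetical stand-in for the engine's inline format kinds. */
enum SketchFormat { BOLD, ITALIC, MONOSPACE }

/** Hypothetical stand-in for the existing XHTML 1.0 renderer. */
class SketchXhtmlRenderer {
    private final StringBuilder out = new StringBuilder();

    /** Maps a format to the element name used in XHTML 1.0 output. */
    protected String tagFor(SketchFormat format) {
        switch (format) {
            case BOLD:   return "strong";
            case ITALIC: return "em";
            default:     return "tt"; // MONOSPACE
        }
    }

    public void beginFormat(SketchFormat format) {
        out.append('<').append(tagFor(format)).append('>');
    }

    public void endFormat(SketchFormat format) {
        out.append("</").append(tagFor(format)).append('>');
    }

    public void onText(String text) {
        out.append(text);
    }

    public String result() {
        return out.toString();
    }

    /** Convenience for trying a renderer on a single monospace run. */
    static String renderMonospace(SketchXhtmlRenderer renderer, String text) {
        renderer.beginFormat(SketchFormat.MONOSPACE);
        renderer.onText(text);
        renderer.endFormat(SketchFormat.MONOSPACE);
        return renderer.result();
    }
}

/** HTML5 variant: inherits everything, changes only the monospace element. */
class SketchHtml5Renderer extends SketchXhtmlRenderer {
    @Override
    protected String tagFor(SketchFormat format) {
        return format == SketchFormat.MONOSPACE ? "code" : super.tagFor(format);
    }
}
```

Calling SketchXhtmlRenderer.renderMonospace with a SketchHtml5Renderer wraps the text in code instead of tt, while bold and italic keep the inherited behaviour.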

Related

Embedded CSS with owasp-java-html-sanitizer

I am working on a project where we are allowing users to submit html/css and we will create a pdf out of that code. We have the code working, but I would like to sanitize the data that is coming in to prevent any attacks. There is a method to sanitize the inline css through:
http://javadox.com/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/r223/javadoc/org/owasp/html/HtmlPolicyBuilder.html#allowStyling(org.owasp.html.CssSchema)
But is there anything that can be used to check the contents of a style tag? I realize that embedded CSS may be too difficult to check, but I couldn't find anything on this topic in my Google searches. The CssSchema seems to check every property I need; I just cannot apply it to what is between the style tags.

The answer to my question was to use AntiSamy.
http://atetric.com/atetric/javadoc/org.owasp.antisamy/antisamy/1.5.3/org/owasp/validator/css/CssScanner.html
This class gave me everything I need. It allows me to scan external, embedded and inline css. I am currently working on the inline, because I think I have to pull out each inline element individually. I did have to strip out the Style tags in order to use CssScanner with scanStyleSheet, but it worked.
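For the "strip out the style tags" step, something along these lines may be enough. It is a rough sketch using only java.util.regex; the class and method names are made up for illustration, and a regex is a fragile way to read HTML, so a real HTML parser is the safer choice for untrusted input. The extracted strings would then be handed to whatever CSS scanner you use; that call is left out here rather than guessed at.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the raw CSS out of embedded style elements. */
final class StyleBlockExtractor {
    // Matches <style ...>...</style>, case-insensitively and across line breaks.
    private static final Pattern STYLE_BLOCK = Pattern.compile(
            "<style\\b[^>]*>(.*?)</style\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed content of every style element, without the tags. */
    static List<String> extractCss(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = STYLE_BLOCK.matcher(html);
        while (matcher.find()) {
            blocks.add(matcher.group(1).trim());
        }
        return blocks;
    }
}
```

Inline style attributes are not touched by this; as noted above, those have to be pulled out element by element.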

Selenium 2 : finding web element locators programmatically

I am tired of manually finding locators (id, xpath, css, linkText, etc.) for web elements in my page source; it takes a lot of effort. To avoid that, I want to write code that works on the page source directly and generates the locator details (e.g. id="xyz", xpath="html/body/table/tr/td/a", etc.).
I think I can generate the ID locator by using the split() function of the String object. What I don't know is how to generate xpath, css and linkText locators for all page components.

Although I'd generally recommend constructing XPath expressions on your own (as you can better exploit things to match against, like class attributes), probably the most reasonable and convenient automatic way to determine XPath expressions for Selenium is to use either Firebug's or Chrome Developer Tools' "Find XPath" feature. Both at least use @id attributes to shorten XPath expressions.
If you want to write some code yourself, eg. for embedding in other tools you use, you might want to have a look at the answers of "PHP XML - Find out the path to a known value" which solves the problem in PHP, or another one with answers for Javascript: "Javascript get XPath of a node".
If you're using any tools not working on the DOM (Selenium/Firebug/Chrome Dev' Tools/JavaScript will do), watch out for the problems I described in "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?".
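If you do want to roll your own in Java, the usual approach from those linked answers (walk from the node up to the root, and stop early at an id) might look roughly like this. It is a sketch under assumptions: the class and method names are invented here, it uses the JDK's XML parser and so needs well-formed markup (real pages usually need a tolerant HTML parser or the browser's DOM), and it does not handle ids that contain a single quote.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper that derives an XPath expression for a DOM element. */
final class XPathSketch {

    /** Walks up from the element; an id attribute short-circuits the path. */
    static String xpathOf(Element element) {
        StringBuilder path = new StringBuilder();
        Node node = element;
        while (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
            Element current = (Element) node;
            String id = current.getAttribute("id"); // "" when absent
            if (!id.isEmpty()) {
                return path.insert(0, "//*[@id='" + id + "']").toString();
            }
            path.insert(0, "/" + current.getTagName()
                    + "[" + indexAmongSameNameSiblings(current) + "]");
            node = current.getParentNode();
        }
        return path.toString();
    }

    /** 1-based position among preceding siblings with the same element name. */
    private static int indexAmongSameNameSiblings(Element element) {
        int index = 1;
        for (Node s = element.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(element.getNodeName())) {
                index++;
            }
        }
        return index;
    }

    /** Parses well-formed markup and returns the path of the n-th element with that tag. */
    static String xpathOfNth(String xml, String tagName, int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return xpathOf((Element) doc.getElementsByTagName(tagName).item(index));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse sample markup", e);
        }
    }
}
```

The purely positional paths this produces are exactly the brittle kind, so treat the output as a starting point to shorten by hand rather than as a finished locator.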

There are (at present) no "tools" that do even a marginally decent job of synthesizing reliable and concise xpath or css based locators. I've been writing selenium and HTML/CSS code for nearly the duration of the industry, and the so-called CASE-methods that purport to do this job better than humans produce laughably flawed output more often than they generate useful material. However, I would add a codicil: there is hope.
By taking careful stock of the various XPath and CSS methods (see http://www.w3schools.com/xpath/xpath_axes.asp for some general guidelines) and using only the most minimal locator strings that will pass muster in Firebug, Selenium IDE and other similar plug-ins, one can progress gradually towards a better approach. In general one should (where possible) use only one component from an object's attribute list and avoid using dynamically defined quantities. Best practices would encourage picking class, name or id only if they are "immutable".
Mutability is a tricky issue: simply dragging a cursor over an object or clicking on it may change the class or css characteristics. Sometimes this can be surmounted by using only the "fixed" portion of the offending attribute. For example, a class might initially be 'tabContent', but when the cursor is placed over the corresponding object it might change to 'tabContentMouseOver'. You get the idea. By using an xpath locator string //*[contains(@class,'tabContent')] you stand a better than even chance of hitting the desired object, irrespective of whether it is clicked, highlighted, or even disabled.
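The difference between an exact class match and contains() is easy to check outside the browser. Below is a small sketch using the JDK's javax.xml.xpath; the class name and the two sample snippets are invented for illustration. In Selenium the same expression string would simply be passed to By.xpath(...).

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo: counts the nodes an XPath expression selects in a snippet. */
final class ContainsClassDemo {
    static final String IDLE  = "<div><span class='tabContent'/></div>";
    static final String HOVER = "<div><span class='tabContentMouseOver'/></div>";

    static int countMatches(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or markup", e);
        }
    }

    public static void main(String[] args) {
        String exact = "//*[@class='tabContent']";
        String loose = "//*[contains(@class,'tabContent')]";
        // The exact match loses the element once the class changes on hover.
        System.out.println(countMatches(IDLE, exact) + " / " + countMatches(HOVER, exact));
        // The contains() form finds it in both states.
        System.out.println(countMatches(IDLE, loose) + " / " + countMatches(HOVER, loose));
    }
}
```

Keep in mind that contains() is a plain substring test, so it would also pick up an unrelated class such as 'tabContentHeader' if the page has one.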
The next "trick" I'd encourage you to consider is using the "buddy" principle; many objects (button-like ones in particular) nowadays consist of an image with no dynamic properties wrapped by a div that manages its event-driven behavior. For such situations you might find that //div[@eventproxy='tabObject']/following::img[contains(@src,'tabImage') and text()='Contents'] or something similar will cover all the bases. Contents will vary with your situation of course.
Make vigorous use of the hierarchical axes methods ('following' is one I use quite often), but only where necessary; sometimes '//' will suffice instead.
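One detail worth checking when choosing between 'following' and '//': in XPath 1.0 the following axis selects nodes that come after the context node in document order but excludes its descendants, so an img nested inside the div is reached with '//' (descendant), not with following::. The sketch below shows this with the JDK's XPath engine; the class name, the sample markup and the attribute values are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo comparing the following axis with the descendant shortcut. */
final class AxisDemo {
    // One img wrapped by the div, one img after the div, one unrelated img.
    static final String PAGE =
            "<root>"
          + "<div eventproxy='tabObject'><img src='tabImage1.png'/></div>"
          + "<img src='tabImage2.png'/>"
          + "<img src='other.png'/>"
          + "</root>";

    /** Returns the src values of all matched img elements, comma-separated. */
    static String matchedSrcs(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            StringBuilder srcs = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) {
                    srcs.append(',');
                }
                srcs.append(((Element) nodes.item(i)).getAttribute("src"));
            }
            return srcs.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or markup", e);
        }
    }

    public static void main(String[] args) {
        String base = "//div[@eventproxy='tabObject']";
        // Selects the img after the div, not the one inside it.
        System.out.println(matchedSrcs(PAGE, base + "/following::img[contains(@src,'tabImage')]"));
        // Selects the img wrapped by the div.
        System.out.println(matchedSrcs(PAGE, base + "//img[contains(@src,'tabImage')]"));
    }
}
```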
Penalize yourself for every unnecessary character, and reward yourself for methods which are concise and can weather frequent and severe code changes. Above all, persevere.
In general I avoid using "pure" CSS locators for the following reason -- they were never intended as locators. "Cascading Style Sheets" by their very nature are designed to impact the maximum number of web objects possible, and are very rarely unique to any one piece of content. Web-coders are notorious for changing these on the fly to produce spiffy new effects or restructure content to suit customer demands; why hitch your tests to content that is known to fluctuate? Besides, everything CSS can do (and I do mean EVERYTHING) can also be done inside XPaths if you so choose.
The canard about XPaths being "slower" than CSS methods I believe has been disproved often enough that it should be taken with several tablespoons of salt. Still, if you really feel more comfortable with CSS techniques, go for it! Experience will educate you better than any blurb you find in stackoverflow ever will.

What is the practicality of placing non-formatting elements inside javadoc?

So, I've been using javadoc for quite a while and know that it supports basic text formatting like <strong>, <em>, <ul>, <ol>, etc.
Today I was writing some javadoc and wanted to put in <input/> and <select/>, so I wrapped them in {@literal ...}.
But I noticed that javadoc wasn't complaining when I put those in without wrapping them in @literal.
To my surprise, when I looked at the method signature and read the javadoc, it had actually PARSED the fields.
So my question is: is there a practical use for putting in HTML elements like input and select? And furthermore, isn't this a security concern if somebody generates a web-based javadoc from Eclipse?

The reason for allowing HTML is to support basic formatting (font styles, lists, tables, etc). I cannot think of a reason that you would want form controls in there. Likely, preventing such usage wasn't deemed important enough to spend development cycles.
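For reference, this is roughly what the escaped form from the question looks like in source. The class and method are made up for illustration; the point is only the comment. Inside {@literal ...} the angle brackets are shown as text in the generated page, whereas the same tags written bare would be passed through as markup (depending on the javadoc version and its lint settings, bare tags may also be reported).

```java
/** Hypothetical example class; only the doc comment on the method matters. */
final class FormFieldDocs {

    /**
     * Builds the markup for a text field.
     *
     * <p>The result is an {@literal <input/>} element; use a different helper
     * if you need a {@literal <select/>}. Both tag names in this sentence are
     * rendered as plain text because they are wrapped in the literal tag.
     *
     * @param name value for the name attribute, assumed to be already escaped
     * @return the element as a string
     */
    static String textField(String name) {
        return "<input name=\"" + name + "\"/>";
    }
}
```

{@code ...} behaves the same way with respect to escaping and additionally renders the text in code font.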

Using JSP to write a widget generator to be used in other servlets

I'm new to JSP, so here's the problem.
I want to create a block of HTML code with dynamism in it.
This block of code needs to be repeated/reused multiple times in multiple places on my site.
It's ugly to create a method with lots of
responseOut.println("<html> text with escaped characters")
So I'm wondering if JSP can be used to create reusable (callable or addressable by Class.methodname) methods.
It's easy to do this in PHP within the PHP framework.
I guess it all depends on the extent to which JSP is a precompilation method, or whether it is run dynamically in the web server...
I'm working in Eclipse (with GAE), so any comments and hints in this framework would also be appreciated.
Thanx
Dan

You can encapsulate parts of Java (or JSP code) in tags that can be reused in other contexts.
Have a look at this tutorial for a first introduction.

At its simplest, you can put it in a JSP fragment file and use <jsp:include> to include it. The "dynamism" can just be achieved using taglibs/EL. How exactly to do it depends on the sole functional requirement which is yet unclear. At least, HTML code does definitely not belong in a Servlet class.
However, the statement "<html> text with escaped characters" makes me think that all you really need is the JSTL <c:out> tag. By default it escapes HTML special characters like <, >, etc. to prevent HTML code injection (and XSS holes).
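In a JSP you would not write this yourself, you would just use <c:out>. Purely to illustrate the kind of replacement involved, here is a plain-Java sketch of escaping the five characters that matter; the class name is invented, and the exact entity spellings are an assumption for the sketch rather than a statement about what any particular JSTL implementation emits.

```java
/** Hypothetical illustration of HTML escaping of the kind c:out performs. */
final class HtmlEscapeSketch {

    /** Replaces the five HTML-significant characters with character references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // User input containing markup is turned into harmless text.
        System.out.println(escape("<script>alert('x')</script>"));
    }
}
```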

Guide to proper escaping in Play framework

I'm trying to map out how the Play framework supports escaping.
This is a nice page spelling out the needed functionality:
https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet
So I'm trying to relate that to Play template features and fully understand what Play does and doesn't do.
HTML escaping: ${} or the escape() function
Attribute escaping: I can't find a built-in solution
JavaScript escaping: there's an escapeJavaScript() http://www.playframework.org/documentation/1.2/javaextensions
CSS escaping: I can't find a built-in solution
URL escaping: nothing special built in, but there is the usual Java solution, e.g. "Java equivalent to JavaScript's encodeURIComponent that produces identical output?" - Update: there's urlEncode() at http://www.playframework.org/documentation/1.2/javaextensions
Another point of confusion is the support for index.json (i.e. using templates to build JSON instead of HTML). Does ${} magically switch to JavaScript escaping in a JSON document, or does it still escape HTML, so everything in a JSON template has to have an explicit escapeJavaScript()?
There's also an addSlashes() on http://www.playframework.org/documentation/1.2/javaextensions , but it doesn't seem quite right for any of the situations I can think of. (?)
It would be great to have a thorough guide on how to do all the flavors of escaping in Play. It looks to me like the answer is "roll your own" in several cases but maybe I'm missing what's included.

I've been looking into this, so I decided to write up my own answer based on what you already had, this OWASP cheat sheet, and some experimentation of my own.
HTML escaping:
${} or the escape() function
Attribute escaping: (common attributes)
This is handled in play so long as you wrap your attributes in double quotes (") and use ${}.
For complex attributes (href/src/etc.) see JavaScript below
Example unsafe code
<a id=${data.value} href="...">...</a>
<a id='${data.value}' href="...">...</a>
These would break with the following values for data.value:
% href=javascript:alert('XSS')
%' href=javascript:alert(window.location)
JavaScript escaping: (and complex attributes)
Use escapeJavaScript(). http://www.playframework.org/documentation/1.2/javaextensions
Example unsafe code
<a onmouseover="x='${data.value}'; ..." href="...">...</a>
This would break with the following value for data.value:
'; javascript:alert(window.location);//
CSS escaping:
Not sure as I've no need for this.
I'd imagine you'd need to create your own somehow. Hopefully there is something out there to manipulate the strings for you.
URL escaping:
use urlEncode(). http://www.playframework.org/documentation/1.2/javaextensions
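Outside of templates, the plain JDK way to encode a query value is java.net.URLEncoder. The sketch below uses invented class and method names and says nothing about how Play's urlEncode() is implemented. URLEncoder does form encoding, so a space becomes "+"; the second method swaps that for "%20", which is closer to, but not identical with, what JavaScript's encodeURIComponent produces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper around the JDK's form-style URL encoder. */
final class UrlEncodeDemo {

    /** Form encoding as done by URLEncoder: space becomes '+'. */
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Same, but with spaces as %20; a literal '+' was already turned into %2B. */
    static String encodeWithPercent20(String value) {
        return encodeQueryValue(value).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println("/search?q=" + encodeQueryValue("a b&c=d"));
        System.out.println("/search?q=" + encodeWithPercent20("a b&c=d"));
    }
}
```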

I think you are absolutely correct in your summary. Play gives you some of the solutions, but not all. However, in the two places where Play does not offer something (CSS and attributes), I can't actually find a need for it.
The OWASP standard specifies that you should escape untrusted code. So, the only way you would have untrusted code in your CSS is if it is being generated dynamically. If it is being generated dynamically, then there is nothing stopping you doing so using standard Groovy templates, and therefore using ${} and escape().
As for the attribute escaping, again, the only time you are going to need this as far as I can tell, is when you are building your view in the groovy templates, so again, you can use ${} or escape().
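One caveat when relying on ${} or escape() for attributes is the quoting point raised in the earlier answer: HTML escaping only protects an attribute value that sits inside double quotes. The sketch below demonstrates this with plain string building. The escaper in it is an assumption (it handles &, <, > and double quotes but leaves single quotes alone, which is what the earlier unsafe examples imply); it is not Play's implementation, and the class and method names are invented.

```java
/** Hypothetical demo of why escaped values still need double-quoted attributes. */
final class AttributeQuotingSketch {

    /** Assumed escaper: covers & < > and double quotes, not single quotes. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String unquoted(String value) {
        return "<a id=" + escapeHtml(value) + ">";
    }

    static String singleQuoted(String value) {
        return "<a id='" + escapeHtml(value) + "'>";
    }

    static String doubleQuoted(String value) {
        return "<a id=\"" + escapeHtml(value) + "\">";
    }

    public static void main(String[] args) {
        // A space ends an unquoted value, so the href becomes a real attribute.
        System.out.println(unquoted("% href=javascript:alert(1)"));
        // The unescaped single quote closes the value early.
        System.out.println(singleQuoted("%' href=javascript:alert(1) x='"));
        // The double quote is escaped, so everything stays inside the id value.
        System.out.println(doubleQuoted("%\" href=javascript:alert(1) x=\""));
    }
}
```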
