GWT Safe HTML Framework: When to use, and why? - java

In reading JavaDocs and various GWT articles, I've occasionally run into the following Safe* classes:
SafeHtml
SafeHtmlBuilder
It looks like SafeHtml is somehow used when creating a new Widget or Composite, and helps ensure that the Widget/Composite doesn't execute any scripts on the client-side. Is this the case, or am I way off-base? Can someone provide a code example of SafeHtml being used properly in action?
If so, then what's the point of SafeHtmlBuilder? Do you use it inside of a Widget to somehow "build up" safe HTML?

The simplest way to view SafeHtml is as a String where any HTML markup has been appropriately escaped. This protects against Cross-Site Scripting (XSS) attacks as it ensures, for example, if someone enters their name in a form as <SCRIPT>alert('Fail')</SCRIPT> this is the text that gets displayed when your page is rendered rather than the JavaScript being run.
So instead of having something like:
String name = getValueOfName();
HTML widget = new HTML(name);
You should use:
String name = getValueOfName();
HTML widget = new HTML(SafeHtmlUtils.fromString(name));
SafeHtmlBuilder is like a StringBuilder except that it automatically escapes HTML markup in the Strings you add. So to extend the above example:
String name = getValueOfName();
SafeHtmlBuilder shb = new SafeHtmlBuilder();
shb.appendEscaped("Name: ").appendEscaped(name);
HTML widget = new HTML(shb.toSafeHtml());
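To see what "appropriately escaped" means outside of GWT, here is a minimal plain-Java sketch of the idea behind SafeHtmlBuilder. Text you don't control goes through an escaping step that replaces the HTML-significant characters with entity references, while markup you wrote yourself is appended verbatim. MiniSafeHtmlBuilder and its methods are hypothetical names that only mirror appendEscaped and appendHtmlConstant. This is not GWT's implementation; in GWT code, use the real classes.

```java
/** Hypothetical, simplified analogue of GWT's SafeHtmlBuilder (not the real class). */
final class MiniSafeHtmlBuilder {
    private final StringBuilder sb = new StringBuilder();

    /** Escapes the five HTML-significant characters so the text is never parsed as markup. */
    static String htmlEscape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Appends untrusted text after escaping it (mirrors appendEscaped). */
    MiniSafeHtmlBuilder appendEscaped(String text) {
        sb.append(htmlEscape(text));
        return this;
    }

    /** Appends developer-written markup verbatim; never pass user input here (mirrors appendHtmlConstant). */
    MiniSafeHtmlBuilder appendHtmlConstant(String html) {
        sb.append(html);
        return this;
    }

    /** Returns the accumulated HTML as a plain String. */
    String toHtml() {
        return sb.toString();
    }
}
```

For example, appending the constant "<b>Name:</b> " and then the escaped name <SCRIPT>alert('Fail')</SCRIPT> produces <b>Name:</b> &lt;SCRIPT&gt;alert(&#39;Fail&#39;)&lt;/SCRIPT&gt;, so the browser shows the script tag as text instead of running it.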
There is a good guide to SafeHtml in the GWT documentation that is worth a read.

SafeHtmlBuilder is to SafeHtml what StringBuilder is to String.
As for the Safe* API, use it whenever you deal with HTML (or CSS for SafeStyles, or URLs for SafeUri and UriUtils); more precisely, whenever you build HTML/CSS/URLs from parts to be fed to the browser for parsing, with no exceptions.
Actually, we were recently discussing whether to deprecate Element.setInnerHtml and other similar APIs (HasHTML) in favor of Element.setInnerSafeHtml and the like (HasSafeHtml).
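For the URL case, the core idea is to refuse dangerous schemes such as javascript: when a URL built from user input ends up in an href or src attribute. Below is a minimal plain-Java sketch of scheme whitelisting, loosely inspired by what UriUtils is for. UriSanitizer, sanitizeUri and the allowed-scheme list are assumptions for illustration, not GWT's code.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of scheme whitelisting for URLs placed in href/src attributes. */
final class UriSanitizer {
    // Assumed whitelist; adjust to what your application actually needs.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https", "mailto", "ftp");

    /** Returns the URI unchanged if it has no scheme or a whitelisted one, otherwise "#". */
    static String sanitizeUri(String uri) {
        int colon = uri.indexOf(':');
        // A ':' after the first '/', '?' or '#' is not a scheme separator (e.g. "/a?b=c:d").
        int firstDelimiter = indexOfAny(uri, "/?#");
        if (colon < 0 || (firstDelimiter >= 0 && firstDelimiter < colon)) {
            return uri; // relative URI: there is no scheme to abuse
        }
        String scheme = uri.substring(0, colon).trim().toLowerCase(Locale.ROOT);
        return ALLOWED_SCHEMES.contains(scheme) ? uri : "#";
    }

    /** Index of the first character of s that appears in chars, or -1. */
    private static int indexOfAny(String s, String chars) {
        for (int i = 0; i < s.length(); i++) {
            if (chars.indexOf(s.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }
}
```

A whitelist is used rather than a blacklist so that variants like JAVASCRIPT: or java\tscript: are rejected simply because they are not on the list. The result still has to be HTML-escaped when it is written into an attribute.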

Related

Freemarker embed image on ftl

I am trying to embed an image in a Freemarker ftl template to send as an email. I based my approach on this question, Freemarker writing images to html, and did exactly what it says, but the email is being generated like this
What may be causing this error, and how to fix it?
My template looks like this
<img alt="My image" src="${imgAsBase64}" />
The image is a chart. I get its Base64 string, which I called imageBase64Str, via a PrimeFaces JavaScript function that generates the Base64 of the chart image; I pass it to the bean and then pass the parameter to the template like this:
String encoded = imageBase64Str.split(",")[1];
byte[] decoded = Base64.decodeBase64(encoded);
String imgDataAsBase64 = new String(decoded);
String imgAsBase64 = "data:image/png;base64," + imgDataAsBase64;
emailParams.put("imgAsBase64", imgAsBase64);
String encoded = imageBase64Str.split(",")[1]; is suspicious. It looks like you are transforming the generated Base64 string in some other way. Is the image actually a PNG, or is it in another format? I think that if you remove that split and just do emailParams.put("imgAsBase64", imageBase64Str); it may work.
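What goes wrong in the question's code: splitting off the payload and decoding it are fine, but new String(decoded) turns the binary PNG bytes into text (corrupting them), and that text is then labelled as base64 although it no longer is. Here is a minimal sketch of the two correct options, assuming imageBase64Str is a data:image/png;base64,... string as produced by the chart. DataUris and its methods are hypothetical names, and java.util.Base64 is used instead of the Base64 class from the question.

```java
import java.util.Base64;

/** Hypothetical helpers for turning chart image data into a data URI for a template. */
final class DataUris {
    private static final String PNG_PREFIX = "data:image/png;base64,";

    /** The chart string is already a complete data URI, so it can go into the template unchanged. */
    static String passThrough(String imageBase64Str) {
        if (!imageBase64Str.startsWith(PNG_PREFIX)) {
            throw new IllegalArgumentException("expected a PNG data URI");
        }
        return imageBase64Str;
    }

    /** If you need the raw bytes (e.g. to attach them), keep them as byte[] ... */
    static byte[] toBytes(String imageBase64Str) {
        String payload = imageBase64Str.substring(imageBase64Str.indexOf(',') + 1);
        return Base64.getDecoder().decode(payload);
    }

    /** ... and re-encode them to base64 text for a data URI; never use new String(bytes) here. */
    static String fromBytes(byte[] png) {
        return PNG_PREFIX + Base64.getEncoder().encodeToString(png);
    }
}
```

In the bean, emailParams.put("imgAsBase64", DataUris.passThrough(imageBase64Str)); is then equivalent to the suggestion above, with a basic sanity check on the prefix.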
However, you need to consider that this solution won't work in many email clients. According to this link https://www.campaignmonitor.com/blog/email-marketing/2013/02/embedded-images-in-html-email/ Base64-embedded images are not supported by a few major email clients, web and standalone, including Gmail and Outlook. Given that these are the most common email clients, you don't want to deliver a solution that doesn't work in them, or most of your users are going to be unhappy.
IMO your best bet is to host the images on a server and use fully qualified URLs in your Freemarker template.
An alternative is to add the images as attachments and reference them in the HTML source as explained here: https://stackoverflow.com/a/36870709/2546299, but that requires changes to the way the emails are sent (you need to add the attachments), so it may not be suitable for your case.

How to change I18N locale NOT by querystring?

This is a struts2 question.
Currently, I am using i18n for internationalization in my webapp.
Some of my JSP pages have query strings that store request information.
For example,
http://myWebsite.com/myWebsite/myPage?productId=12345
When users switch languages, I rewrite the URL with JavaScript to
http://myWebsite.com/myWebsite/myPage?request_locale=zh_CN
And it loses the query string.
And my URLs come in different forms:
http://myWebsite.com/myWebsite/myPage
http://myWebsite.com/myWebsite/myPage?productId=12345#myAnchor
http://myWebsite.com/myWebsite/myPage?productId=12345&key2=value2&key3=value3#myAnchor
http://myWebsite.com/myWebsite/myPage?productId=12345&key2=value2&key3=value3&request_locale=zh_CN
http://myWebsite.com/myWebsite/myPage?productId=12345&key2=value2&key3=value3
...
When I try to handle all these differences in JavaScript, it becomes very complicated.
Is there any good way to retain the querystrings and anchors after switching locale?
When you rewrite the URL using JS, you should do it based on the current URL.
So, if the browser shows
http://myWebsite.com/myWebsite/myPage?productId=12345
you should rewrite it as
http://myWebsite.com/myWebsite/myPage?productId=12345&request_locale=zh_CN
For that, you need to get the currently displayed URL in JS. Take a look at this question for how to do it.
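The same "start from the current URL" approach, sketched in Java (for example, if you prefer to generate the language-switch links on the server instead of rewriting them in JavaScript). It keeps the other parameters and the #anchor, drops any previous request_locale, and appends the new one. LocaleUrls.withLocale is a hypothetical helper; it assumes the incoming URL is already properly encoded and needs Java 10+ for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

/** Hypothetical helper: add or replace request_locale, keeping other parameters and the #anchor. */
final class LocaleUrls {
    private static final String PARAM = "request_locale";

    static String withLocale(String url, String locale) {
        // 1. Split off the fragment (#myAnchor); it must stay at the very end.
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            fragment = url.substring(hash);
            url = url.substring(0, hash);
        }
        // 2. Split the path from the query string.
        String path = url;
        String query = "";
        int question = url.indexOf('?');
        if (question >= 0) {
            path = url.substring(0, question);
            query = url.substring(question + 1);
        }
        // 3. Keep every existing parameter except an old request_locale.
        StringJoiner params = new StringJoiner("&");
        for (String param : query.split("&")) {
            if (param.isEmpty() || param.equals(PARAM) || param.startsWith(PARAM + "=")) {
                continue;
            }
            params.add(param);
        }
        // 4. Append the new locale and reassemble the URL.
        params.add(PARAM + "=" + URLEncoder.encode(locale, StandardCharsets.UTF_8));
        return path + "?" + params + fragment;
    }
}
```

For example, withLocale("http://myWebsite.com/myWebsite/myPage?productId=12345#myAnchor", "zh_CN") returns http://myWebsite.com/myWebsite/myPage?productId=12345&request_locale=zh_CN#myAnchor.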

Are there any java libraries for validating user supplied HTML, on the server side?

I have a service which takes the user supplied rich text (can have HTML tags) and saves it into the database. That data gets used by some other application. But sometimes the user supplied data has missing HTML tags and wrong closing tags. I want to validate if the user supplied data is valid HTML or not and depending on that I want to warn the user.
Are there any java libraries to do HTML validation?
You can try JTidy, but it's too slow for simple HTML cleaning.
If you just want to process HTML, you can try NekoHTML; it's lightweight and fast.
You can try JTidy.
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer.
You can use Jsoup. Here is an example, based on the project README:
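import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist; // named Whitelist in jsoup versions before 1.14.1
...
String markup = "<body><head>...";
// The second argument must not be null: it defines which tags and attributes are allowed
boolean valid = Jsoup.isValid(markup, Safelist.basic());
The second parameter is a Safelist (Whitelist in older jsoup versions) object that defines which tags and attributes are allowed; Safelist.basic() is one of the predefined ones, and you can also build your own.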
Plus, you can easily install this library using Gradle
IMO, use Validator.nu, which implements the HTML5 spec.
There's a great thing called NekoHTML which is just a thin wrapper over the Apache Xerces parser that turns on error-recovery/correction. It doesn't validate so much as error-correct, so you can process the result as XML, i.e. run it through XPaths or XSLTs. It has worked flawlessly for me for several months on completely arbitrary HTML from 3rd-party sites.
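To illustrate what "missing HTML tags and wrong closing tags" means in the question, here is a deliberately naive, stdlib-only sketch that tracks open tags on a stack. It is a toy built on assumptions: it is regex-based, it ignores comments, scripts, and attributes containing '>', and it flags HTML's optional end tags (such as a <p> or <li> without its closing tag) even though those are valid HTML. TagBalance and check are hypothetical names; this is not a substitute for the parsers and validators above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, naive tag-balance checker; real HTML needs a real parser. */
final class TagBalance {
    // Group 1: "/" for end tags; group 2: tag name; group 3: "/" for self-closing tags.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");
    // Void elements never have a closing tag.
    private static final Set<String> VOID = Set.of("area", "base", "br", "col", "embed", "hr",
            "img", "input", "link", "meta", "source", "track", "wbr");

    /** Returns human-readable problems; an empty list means the tags balance. */
    static List<String> check(String html) {
        List<String> problems = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean isEnd = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (VOID.contains(name) || selfClosing) {
                continue;
            }
            if (!isEnd) {
                open.push(name);
            } else if (open.isEmpty()) {
                problems.add("unexpected </" + name + ">");
            } else if (open.peek().equals(name)) {
                open.pop();
            } else {
                problems.add("expected </" + open.peek() + "> but found </" + name + ">");
                // Naive recovery: if the tag is open further down, pop everything up to it.
                if (open.contains(name)) {
                    String popped;
                    do {
                        popped = open.pop();
                    } while (!popped.equals(name));
                }
            }
        }
        // Whatever is still on the stack was never closed (innermost first).
        for (String name : open) {
            problems.add("unclosed <" + name + ">");
        }
        return problems;
    }
}
```

For example, check("<div><b>text</div>") reports a single problem, expected </b> but found </div>, which is the kind of message the question wants to show the user.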

Libs for HTML sanitizing

I'm looking for an HTML sanitizer that I can call through an API to sanitise strings I get from my webapp. Are there any useful, easy-to-use libs available? Does anyone know of one or two?
I don't need something big it just must be able to find unclosed tags and close them.
https://github.com/OWASP/java-html-sanitizer is now marked ready for production use.
A fast and easy to configure HTML Sanitizer written in Java which lets you include HTML authored by third-parties in your web application while protecting against XSS.
You can use prepackaged policies
Sanitizers.FORMATTING.and(Sanitizers.LINKS)
or the tests show how you can configure your own easily:
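PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();
// untrustedHtml is a placeholder for the string you want to clean
String safeHtml = policy.sanitize(untrustedHtml);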
or write custom policies to do things like changing h1s to divs with a certain class:
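PolicyFactory headerPolicy = new HtmlPolicyBuilder()
    .allowElements("h1", "p")
    .allowElements(
        new ElementPolicy() {
            // Called for each h1: adds class="header-h1" and renames the element to div
            public String apply(String elementName, List<String> attrs) {
                attrs.add("class");
                attrs.add("header-" + elementName);
                return "div";
            }
        }, "h1")
    .toFactory();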
JTidy may help you.
The HTML Parser JSoup also supports sanitisation by policy: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
Apart from JTidy you can also take a look at:
Nekohtml
TagSoup
Getting text in HTML document
http://roberto.open-lab.com/2009/11/05/a-java-html-sanitizer-also-against-xss/

why is '<' showing as &lt;

I am outputting a string from my Java class like this:
String numQsAdded = "<div id='message1'>"+getQuestion()+"</div>";
This string is sent back to the client side via an XMLHttpRequest. In my JSP page, I have a JavaScript alert that prints out the string returned from the server. It translates '<' to &lt; and '>' to &gt;.
How can I avoid this?
I have tried changing my string to:
String numQsAdded = "&lt;div id='message1'&gt;"+getQuestion()+"&lt;/div&gt;";
but this has even worse effects: then '&' is translated to '&amp;'.
XMLHttpRequest encodes the string before sending it. You will have to unescape the string.
In the client-side JavaScript, try using:
alert(unescape(returned_string))
&lt; is how "<" is shown in HTML, and that is what is produced from XMLHttpRequest. Try using XMLRequest.
&lt; is the entity reference for "<", while &gt; is the entity reference for ">". You will need to unescape the string using the unescape() method.
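A caveat on the unescaping suggestions above: JavaScript's built-in unescape() decodes %XX-style escapes (as used in URLs), not HTML entity references, so it will not turn &lt; back into <. Decoding the entities means mapping each reference back to its character. Here is a minimal Java sketch of that mapping for the five most common references. HtmlEntities.decode is a hypothetical name; other numeric references and named ones such as &nbsp; are not handled. As the next answer points out, decoded text must not be inserted as markup.

```java
/** Hypothetical helper that decodes the five basic HTML entity references. */
final class HtmlEntities {
    static String decode(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                // &amp; must be decoded last, so "&amp;lt;" becomes "&lt;" and not "<"
                .replace("&amp;", "&");
    }
}
```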
Paul Fisher's answer is the right one. I'll take a moment to explain why. HTML-Encoding of content from the server is a security measure to protect your users from script injection attacks. If you simply unescape() what comes from the server you could be putting your users at risk, as well as your site's reputation.
Try doing what Paul said. It's not difficult and it's much more secure. Just to make it easier, here's a sample:
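// containerElement: the existing element the message should go into
// questionText: the raw text returned by the server (placeholder name)
var divStuff = document.createElement('div');
divStuff.id = 'message1';
divStuff.textContent = questionText; // set as text, so any markup in it is never parsed
containerElement.appendChild(divStuff);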
This is much more secure and gives a cleaner separation of the presentation layer in your application.
It might be better to send back a raw string with your message, and leave the client Javascript to create a div with class message1 to put it in. This will also help if you ever decide to change the layout or the style of your notices.
I don't think you can avoid that. It's how "<" is represented in HTML, and the result would be OK on your HTML page.
