How do I set a URL that includes an ampersand with Thymeleaf? - java

I have something like:
Locale defaultLocale = Locale.getDefault();
final Context ctx = new Context(defaultLocale);
String url = getHost() + "/page?someId=" + some.getId() + "&someParam=" + Boolean.TRUE;
ctx.setVariable("url", url);
final String htmlContent = templateEngine.process("theHtmlPage", ctx);
But when I look at the resulting HTML where the URL is printed, it shows &amp; instead of & in the URL.
Any suggestions?
I tried using backticks to escape the ampersand in the Java code, but it just printed those too. Looked around on SO, but didn't find much that was relevant. Also tried &
Update: Ok, this won't break the link, but Spring doesn't seem to resolve the parameter "someParam" as true without it.
Rendering tag:
<span th:utext="${url}"></span>
Output:
<span>http://localhost:8080/page?someId=1&someParam=true</span>

It's better to use Thymeleaf's dedicated link URL syntax.
If you want to construct a URL with two parameters and set it as an href attribute, you can do it like this:
<a th:href="@{page(param1 = ${param1}, param2 = ${param2})}">link</a>
The generated html will be:
<a href="page?param1=val1&amp;param2=val2">link</a>
and the browser will request:
page?param1=val1&param2=val2
=== EDIT ===
To answer the downvote of dopatraman, I've just tested (again) my answer and it works well.
In my answer, the ampersand used as a parameter separator is added automatically by Thymeleaf. Thymeleaf also encodes this added ampersand as an HTML entity when writing it into the HTML.
If param1 or param2 contains another ampersand, that ampersand must be written as an HTML entity inside the Thymeleaf template. It will, however, appear percent-encoded in the generated HTML.
Example (tested with thymeleaf 2.1.5.RELEASE):
param1 has value abc and param2 has value 12&3
Inside the Thymeleaf template every ampersand must be encoded as an HTML entity, so we have:
<a th:href="@{page(param1 = ${'abc'}, param2 = ${'12&amp;3'})}">link</a>
In the generated html, the ampersand used as a parameter separator is encoded as an html entity and the ampersand in the param2 value is percent-encoded by thymeleaf:
<a href="page?param1=abc&amp;param2=12%263">link</a>
When you click on the link, the browser decodes the HTML entity but not the percent-encoding, and the URL in the address bar will be:
page?param1=abc&param2=12%263
Checking with wireshark, we obtain from the HTTP request:
GET /page?param1=abc&param2=12%263
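The two encoding layers described above can be reproduced outside Thymeleaf. The following is a minimal sketch in plain Java, not Thymeleaf's actual implementation; the class and helper names (LinkEncodingSketch, buildQuery, escapeHtmlAttribute) are made up for illustration. Each parameter value is percent-encoded first, the pairs are joined with a raw &, and the finished URL is then HTML-escaped before being written into the attribute. Note that URLEncoder performs form encoding (a space becomes +), which may differ in detail from what Thymeleaf emits.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class LinkEncodingSketch {

    // Percent-encode each value and join the pairs with a raw '&'.
    // Uses the Charset overload of URLEncoder.encode (Java 10 or later).
    static String buildQuery(String path, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return path + "?" + query;
    }

    // Escape the characters that are special inside an HTML attribute value.
    static String escapeHtmlAttribute(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("param1", "abc");
        params.put("param2", "12&3");

        String url = buildQuery("page", params);
        // What the browser requests: page?param1=abc&param2=12%263
        System.out.println(url);
        // What is stored in the HTML: <a href="page?param1=abc&amp;param2=12%263">link</a>
        System.out.println("<a href=\"" + escapeHtmlAttribute(url) + "\">link</a>");
    }
}
```

The ampersand inside the value never reaches the HTML-escaping step as a raw &, because it has already become %26. Only the separator is turned into &amp;.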

To avoid this kind of problem, you can use the Unicode escape for the ampersand instead of the literal '&' symbol, e.g. '\u0026'.

Thymeleaf had a recent issue with encoding escapes, which has been fixed in 2.1.4.

Related

Text (URL) converted to wrong Encoding

This is a URL which I have in a text file. When my application reads the file and converts it to a string, strange characters are added.
Before:
<p><a href="http://www.w3schools.com" target="iframe_a">W3Schools.com</a></p>
After:
"%3Cp%3E%3Ca%20href%3D%22http%3A//www.w3schools.com%22%20target%3D%22iframe_a%22%3EW3Schools.com%3C/a%3E%3C/p%3E"
I'm aware that this is probably an encoding error. My question is: why does it happen, and only to URLs? How can I stop it from happening in Java and iOS?
The string you posted is not a URL, it's HTML that contains a URL.
You can't treat HTML as if it is a valid URL because it's not.
As @Duncan_C said: there's HTML code around your URL. You can use the Jsoup library to get rid of that. Once you've done that, it will encode properly.
Here's how to do that:
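import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p><a href=\"http://www.w3schools.com\" target=\"iframe_a\">W3Schools.com</a></p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first(); // first <a> element in the document
String url = link.attr("href");         // the bare URL without the surrounding HTML
System.out.println(url);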

Escape special characters of html string in java

I have a html content as a string.
String attachment = "<div style=\"color:black;font-style:normal;font-size:10pt;font-family:verdana;\"><div><span style=\"background-color: rgb(255,255,255);\">This is special "'; </span></div></div>";
If I try to add this as a multipart form data I get an exception. The reason happens to be the special characters inside the html which is " and '. So I tried escaping the entire string using
org.apache.commons.lang.StringEscapeUtils.escapeJava(attachment);
After doing this the exception disappeared and it was working fine. But the double quotes used for the attributes, like style are also escaped using this method, which is not desired.
Instead of <div> style="color:black;
it was sent as <div> style=\"color:black;
So far I have realized that I need to escape only the text inside the HTML content and not the entire string. I could extract the text content using jsoup or something else and then build the HTML again.
But is there a generic easy solution to do this?

How to handle double escaping in chrome/firefox?

Here is my Java code in the JSP:
custUrl="customer.action?custId=211&custAddressId=2341";
Now javascript code :
function submit() {
window.location = "<c:out value='<%=custUrl%>' />";
// here is generated javascript code
// window.location = "customer.action?custId=211&amp;custAddressId=2341"
}
Firefox and Chrome (IE does not double escape) are escaping the already escaped value (that's why I am getting the second parameter name as amp;custAddressId instead of custAddressId).
Is there any generic solution where i can handle double escaping in firefox/chrome?
UPDATE:-
so bottom line is i want to escape the intended characters with c:out (which is happening)
but also want to avoid the double escaping while sending the data to server which is happening
in case of some browsers
By default, special characters are escaped by <c:out>. Turn escaping off like this:
<c:out value='<%=custUrl%>' escapeXml='false' />
Ampersand & is escaped as &amp; in XML. Here amp is short for ampersand.
This isn't a Firefox/Chrome issue because final HTML generated is the same irrespective of which browser you use to access your site. IE's HTML source viewer must have chosen to display the ampersand in its unescaped form.
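To see why the parameter name arrives as amp;custAddressId, here is a small plain-Java sketch of the same mechanics. It does not use JSTL: escapeXml is a hypothetical stand-in for what <c:out> does by default, and secondParamName is an illustrative helper that splits a query string on & the way a server would. Text inside a script element is not entity-decoded by the browser, so the escaped string is used for the request as-is.

```java
class COutEscapingSketch {

    // Stand-in for <c:out>'s default escaping (assumption: the five XML special characters).
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&#034;")
                .replace("'", "&#039;");
    }

    // Name of the second query parameter when the query string is split on '&'.
    static String secondParamName(String url) {
        String query = url.substring(url.indexOf('?') + 1);
        String[] pairs = query.split("&");
        return pairs[1].substring(0, pairs[1].indexOf('='));
    }

    public static void main(String[] args) {
        String custUrl = "customer.action?custId=211&custAddressId=2341";
        String escaped = escapeXml(custUrl);

        System.out.println(secondParamName(custUrl)); // custAddressId
        System.out.println(secondParamName(escaped)); // amp;custAddressId
    }
}
```

With escaping turned off, the string written into the script is the first variant, and the server sees custAddressId as expected.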

Jsoup Whitelist: Parsing non-english character

I am trying to clean HTML text and to extract plain text from it using Jsoup. The HTML might contain non-english character.
For example the HTML text is:
String html = "<p>Á <a href='http://example.com/'><b>example</b></a> link.</p>";
Now if I use Jsoup#parse(String html):
String text = Jsoup.parse(html).text();
It is printing:
Á example link.
And if I clean the text using Jsoup#clean(String bodyHtml, Whitelist whitelist):
String text = Jsoup.clean(html, Whitelist.none());
It is printing:
&Aacute; example link.
My question is, how can I get the text
Á example link.
using Whitelist and the clean() method? I want to use Whitelist since I might need to use Whitelist#addTags(String... tags).
Any information will be very helpful to me.
Thanks.
This is not possible in the current version (1.6.1). jsoup prints Á as &Aacute; because of its entity escaping feature, and there is currently no "don't escape" mode (check Entities.EscapeMode).
You can either (1) unescape these HTML entities, or (2) extend jsoup's source code by adding a new escape mode with an empty map.

How can get html content include content of javascript?

I need to get the contents of a web page by reading it via its URL, but the contents I get do not include the data produced by JavaScript. Can anybody help me solve this problem? For example, I want to get the BibTeX content, which is loaded by JavaScript, from this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=111326695&CFTOKEN=18291914 How can I get content (2) from (1)?
From a quick observation, here is what I would do:
1/ Get the content of this web page: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=111326695&CFTOKEN=18291914
2/ Use regular expression to search for 'BibTeX' and locate the below string in the content:
<li style="list-style:disc; display:inline; margin-bottom:0px;">BibTeX</li>
3/ Use another regular expression to fish out:
exportformats.cfm?id=152611&expformat=bibtex
4/ Concatenate it to the URL (make sure you decode &amp; to &):
"http://portal.acm.org/" + "exportformats.cfm?id=152611&expformat=bibtex"
5/ Capture the content you're looking for. Ultimately http://portal.acm.org/exportformats.cfm?id=152611&expformat=bibtex gives you the content.
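Steps 2 to 4 can be sketched in Java as follows. To stay self-contained, the snippet works on a hard-coded HTML fragment instead of downloading the page. The anchor markup in that fragment, the regular expression, and the names BibtexLinkSketch and findBibtexUrl are assumptions for illustration, since the real page markup may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BibtexLinkSketch {

    // Matches an href pointing at exportformats.cfm with expformat=bibtex (assumed markup).
    private static final Pattern EXPORT_LINK =
            Pattern.compile("href=\"(exportformats\\.cfm\\?[^\"]*expformat=bibtex[^\"]*)\"");

    // Returns the absolute BibTeX export URL, or null if no matching link is found.
    static String findBibtexUrl(String baseUrl, String html) {
        Matcher m = EXPORT_LINK.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Decode the HTML entity so the query string uses a plain '&'.
        String relative = m.group(1).replace("&amp;", "&");
        return baseUrl + relative;
    }

    public static void main(String[] args) {
        // Hypothetical fragment standing in for the downloaded page content.
        String html = "<li style=\"display:inline;\">"
                + "<a href=\"exportformats.cfm?id=152611&amp;expformat=bibtex\">BibTeX</a></li>";
        System.out.println(findBibtexUrl("http://portal.acm.org/", html));
    }
}
```

The returned URL can then be fetched in the same way as the original page to complete step 5.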
