How to handle double escaping in chrome/firefox?

Here is my Java code in the JSP:
custUrl="customer.action?custId=211&custAddressId=2341";
Now javascript code :
function submit() {
window.location = "<c:out value='<%=custUrl%>' />";
// here is the generated javascript code
// window.location = "customer.action?custId=211&amp;custAddressId=2341"
}
Firefox and Chrome (IE does not double escape) are escaping the already escaped value, which is why I am getting the second parameter name as amp;custAddressId instead of custAddressId.
Is there any generic solution for handling double escaping in Firefox/Chrome?
UPDATE:
The bottom line is that I want c:out to escape the intended characters (which is happening), but I also want to avoid the double escaping that happens in some browsers when the data is sent to the server.

By default, <c:out> escapes special characters. Turn escaping off with:
<c:out value='<%=custUrl%>' escapeXml='false' />
The ampersand & is escaped as &amp; in XML; amp is short for ampersand.
This isn't a Firefox/Chrome issue, because the final HTML generated is the same irrespective of which browser you use to access your site. IE's HTML source viewer must have chosen to display the ampersand in its unescaped form.
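To see where the amp;custAddressId parameter name comes from, here is a minimal sketch in plain Java. It assumes the JavaScript function sits inside a <script> element, where entity references such as &amp; are not decoded, so the escaped text reaches the server unchanged. The class and method names (DoubleEscapeDemo, escapeXml, parseQuery) are made up for illustration; escapeXml is only a stand-in for what <c:out> does by default, not the JSTL implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DoubleEscapeDemo {

    // Hypothetical stand-in for the default escaping of <c:out>:
    // the ampersand must be replaced first so later entities are not re-escaped.
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Naive query-string parser: split on '&', then on the first '='.
    public static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = url.substring(url.indexOf('?') + 1);
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        String custUrl = "customer.action?custId=211&custAddressId=2341";
        String escaped = escapeXml(custUrl);

        // What ends up in the page when escaping is left on
        System.out.println(escaped);
        // Parameter names the server sees for the raw and the escaped URL
        System.out.println(parseQuery(custUrl).keySet()); // [custId, custAddressId]
        System.out.println(parseQuery(escaped).keySet()); // [custId, amp;custAddressId]
    }
}
```

With escapeXml='false' the raw URL is written into the script, which corresponds to the first keySet line. Note that turning escaping off is only reasonable when the value cannot contain characters that would break out of the JavaScript string.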

Related

Escape special characters of html string in java

I have a html content as a string.
String attachment = "<div style=\"color:black;font-style:normal;font-size:10pt;font-family:verdana;\"><div><span style=\"background-color: rgb(255,255,255);\">This is special "'; </span></div></div>";
If I try to add this as multipart form data, I get an exception. The cause turns out to be the special characters inside the HTML, namely " and '. So I tried escaping the entire string using
org.apache.commons.lang.StringEscapeUtils.escapeJava(attachment);
After doing this the exception disappeared and it worked fine. But the double quotes used for attributes such as style are also escaped by this method, which is not desired.
Instead of <div style="color:black;
it was sent as <div style=\"color:black;
So far I have realized that I need to escape only the text inside the HTML content and not the entire string. I could extract the text content using jsoup or something else and then build the HTML again.
But is there a generic, easy solution for this?

How do I set a URL that includes an ampersand with Thymeleaf?

I have something like:
Locale defaultLocale = Locale.getDefault();
final Context ctx = new Context(defaultLocale);
String url = getHost() + "/page?someId=" + some.getId() + "&someParam=" + Boolean.TRUE;
ctx.setVariable("url", url);
final String htmlContent = templateEngine.process("theHtmlPage", ctx);
But when I look at the resulting HTML that prints the url, it shows &amp; instead of & in the URL.
Any suggestions?
I tried using backticks to escape the ampersand in the Java code, but it just printed those too. Looked around on SO, but didn't find much that was relevant. Also tried &
Update: Ok, this won't break the link, but Spring doesn't seem to resolve the parameter "someParam" as true without it.
Rendering tag:
<span th:utext="${url}"></span>
Output:
<span>http://localhost:8080/page?someId=1&someParam=true</span>

It's better to use Thymeleaf's dedicated link URL syntax.
If you want to construct a URL with two parameters and set it on an href attribute, you can do it like this:
<a th:href="@{page(param1 = ${param1}, param2 = ${param2})}">link</a>
The generated HTML will be:
<a href="page?param1=val1&amp;param2=val2">link</a>
and the browser will request:
page?param1=val1&param2=val2
=== EDIT ===
To answer the downvote of dopatraman, I've just tested (again) my answer and it works well.
In my answer, the ampersand used as the parameter separator is added automatically by Thymeleaf, and Thymeleaf HTML-entity-encodes this added ampersand when writing it into the HTML.
If param1 or param2 contains another ampersand, that ampersand should be HTML-entity-encoded inside the Thymeleaf template, but it will appear percent-encoded in the generated HTML.
Example (tested with thymeleaf 2.1.5.RELEASE):
param1 has value abc and param2 has value 12&3
Inside the Thymeleaf template, every ampersand must be encoded as an HTML entity, so we have:
<a th:href="@{page(param1 = ${'abc'}, param2 =${'12&amp;3'})}">link</a>
In the generated HTML, the ampersand used as the parameter separator is encoded as an HTML entity, and the ampersand in the param2 value is percent-encoded by Thymeleaf:
<a href="page?param1=abc&amp;param2=12%263">link</a>
When you click on the link, the browser will decode the HTML entity encoding but not the percent-encoding, and the URL in the address bar will be:
page?param1=abc&param2=12%263
Checking with wireshark, we obtain from the HTTP request:
GET /page?param1=abc&param2=12%263
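The two encoding layers described above can be reproduced by hand. The following is only a sketch of the idea using the JDK, not Thymeleaf's own code; the names LinkEncodingDemo, encodeParam, buildUrl and toHtmlAttribute are invented for this example. Parameter values are percent-encoded first, then the whole URL is entity-encoded for use inside an HTML attribute.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class LinkEncodingDemo {

    // Layer 1: percent-encode a single parameter value, so that an '&'
    // inside the value cannot be mistaken for a parameter separator.
    public static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support
            throw new IllegalStateException(e);
        }
    }

    // Build the URL the browser should request; '&' appears only as separator.
    public static String buildUrl(String path, String param1, String param2) {
        return path + "?param1=" + encodeParam(param1)
                    + "&param2=" + encodeParam(param2);
    }

    // Layer 2: entity-encode the finished URL for an HTML attribute value.
    public static String toHtmlAttribute(String url) {
        return url.replace("&", "&amp;")
                  .replace("\"", "&quot;")
                  .replace("<", "&lt;");
    }

    public static void main(String[] args) {
        String url = buildUrl("page", "abc", "12&3");
        System.out.println(url);                  // page?param1=abc&param2=12%263
        System.out.println(toHtmlAttribute(url)); // page?param1=abc&amp;param2=12%263
    }
}
```

The browser reverses only layer 2 when it reads the attribute, which is why the request in the capture above still carries %26 in the param2 value.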

To avoid this kind of problem, you can use the Unicode escape for the '&' symbol, '\u0026', instead of the literal character.

Thymeleaf had a recent issue with encoding escapes, which has been fixed in 2.1.4.

Need to scrape an url from a web page

I need to scrape a url from a website which is located within some javascript code.
<script type="text/javascript">
(function() {
// somewhere..
$.get("http://someurl.com?q=34343&b=343434&c=343434")...
});
</script>
I know that the url starts with http://someurl.com?q= and it needs to have at least a second query parameter (&b=) inside, but the rest of the content is unknown.
I initially tried with jsoup, however it's not really suitable for that task. Manually fetching the page and then applying a regex pattern on it is also not a preferable option since the page is huge. What could I do to get the url quick and safe?

You can use this regex
/\$\.get\("(http:\/\/someurl\.com\?q=[\w.\-%#\/]*&b=[\w.\-%&=\/]*)"\)/g
This regex will search directly for this string:
$.get("http://someurl.com?q=
It will then allow any number of URL valid characters to occur as the value of q.
It will then look to match
&b=
and then again any number of valid characters, followed by the closing quotation mark and parenthesis. I tested it with
MATCH - $.get("http://someurl.com?q=34343&b=343434&c=343434")
MATCH - $.get("http://someurl.com?q=34343&b=13a43&k=343434&c2=something")
FAIL - $.get("http://someurl.com?q=34343&c=343434&b=343434")
FAIL - $.get("http://someurl.com?a=34343&b=343434=343434")
If you only want to return the first result, you can remove the global flag from the end
/\$\.get\("(http:\/\/someurl\.com\?q=[\w.\-%#\/]*&b=[\w.\-%&=\/]*)"\)/
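Since the question is tagged java, here is a rough translation of the same pattern to java.util.regex, under the assumption that the page source has already been fetched into a String. The class and method names (UrlScraper, firstUrl) are placeholders. Matcher.find() stops at the first match, so there is no global flag to remove.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlScraper {

    // Same pattern as above; forward slashes need no escaping in Java,
    // but every backslash is doubled inside the string literal.
    private static final Pattern GET_URL = Pattern.compile(
            "\\$\\.get\\(\"(http://someurl\\.com\\?q=[\\w.\\-%#/]*&b=[\\w.\\-%&=/]*)\"\\)");

    // Returns the first URL passed to $.get(...) that has q= followed by &b=.
    public static Optional<String> firstUrl(String pageSource) {
        Matcher m = GET_URL.matcher(pageSource);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String page = "(function() { $.get(\"http://someurl.com?q=34343&b=343434&c=343434\") });";
        System.out.println(firstUrl(page).orElse("no match"));
    }
}
```

Compiling the pattern once and reusing it keeps the scan to a single pass over the page, which matters if the page is as large as described.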

Jsoup having problems with special HTML symbols, &lsquo; &mdash; etc

I have some HTML (String) that I am putting through Jsoup just so I can add something to all href and src attributes, and that works fine. However, I'm noticing that for some special HTML characters, Jsoup is converting them from, say, &ldquo; to the actual character “. I output the value before and after and I see that change.
Before:
THIS &mdash; IS A &ldquo;TEST&rdquo;. 5 &gt; 4. trademark: 
After:
THIS — IS A “TEST”. 5 &gt; 4. trademark: ?
What the heck is going on? I was specifically converting those special characters to their HTML entities before any Jsoup stuff to avoid this. The quotes changed to the actual quote characters, the greater-than stayed the same, and the trademark changed into a question mark. Aaaaaaa.
FYI, my Jsoup code is doing:
Document document = Jsoup.parse(fileHtmlStr);
//some stuff
String modifiedFileHtmlStr = document.html();
Thanks for any help!

The code below produces output similar to the input markup. It changes the escape mode so that specific characters are written as entities, and sets the charset to ASCII so that the TM sign is escaped for systems that don't support Unicode.
The output:
<p>THIS — IS A “TEST&rdquor;&period; 5 > 4&period; trademark&colon; </p>
The code:
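import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

Document doc = Jsoup.parse("" +
"<p>THIS — IS A “TEST”. 5 > 4. trademark: </p>");
Document.OutputSettings settings = doc.outputSettings();
settings.prettyPrint(false); // do not re-indent or reflow the markup
settings.escapeMode(Entities.EscapeMode.extended); // use the extended set of named entities
settings.charset("ASCII"); // escape every character that ASCII cannot represent
String modifiedFileHtmlStr = doc.html();
System.out.println(modifiedFileHtmlStr);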

How do I use ColdFusion to replace text in HTML without replacing HTML tags?

I have an HTML source as a String variable, and a word in another variable that should be highlighted in that HTML source.
I need a regular expression which does not highlight tags, but only the text within the tags.
For example, I have an HTML source like
<cfset html = "<span>Text goes here, forr example it container also **span** </span>" />
<cfset wordToReplace = "span" />
<cfset html = ReReplace(html ,"[^(<#wordToReplace#\b[^>]*>)]","replaced","ALL")>
and what I want to get is
<span>Text goes here, forr example it container also **replaced** </span>
But I get an error. Any tips?

"I need a regular expression which does not highlight tags, but only the text within the tags."
You won't find one. Not one that is fully reliable against all legal/wild HTML.
The simple reason is that Regular Expressions match Regular languages, and HTML is not even remotely a Regular language.
Even if you're very careful, you run the risk of replacing stuff you didn't want to, and not replacing stuff you did want to, simply due to how complicated HTML syntax can be.
The correct way to parse HTML is using a purpose-built HTML DOM parser.
Annoyingly CF doesn't have one built in, though if your HTML is XHTML, then you can use XmlParse and XmlSearch to allow you to do an xpath search for only text (not tags) that match your text... something like //*[contains(text(), 'span')] should do (more details here).
If you've not got XHTML then you'll need to look at using a HTML DOM parser for Java - Google turns up plenty, (I've not tried any yet so can't give any specific recommendations).

What you have to do is use a lookahead to make sure that your text isn't contained within a tag. Granted, this could probably be written better, but it will get you the results you want. It will even handle the case where the tag has attributes.
<cfset html = "<span class='me'>Text goes here, forr example it container also **span** </span>" />
<cfset wordToReplace = "span" />
<cfset html = ReReplace(html ,"(?!/?<)(#wordToReplace#)(?![^.*>]*>)","replaced","ALL")>
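For comparison, the same lookahead idea can be sketched in Java, which is the engine ColdFusion runs on. This is a simplified variant rather than a translation of the expression above: it replaces the word only when the next angle bracket after it is not a closing >. The names TextOnlyReplace and replaceOutsideTags are invented for this example, and the approach remains a heuristic that can misfire on HTML with a literal > in text or attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextOnlyReplace {

    // Replace `word` only where it is NOT followed by "...>" without an
    // intervening '<' or '>', i.e. where it is not sitting inside a tag.
    public static String replaceOutsideTags(String html, String word, String replacement) {
        Pattern p = Pattern.compile(Pattern.quote(word) + "(?![^<>]*>)");
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<span class='me'>Text goes here, forr example it container also **span** </span>";
        System.out.println(replaceOutsideTags(html, "span", "replaced"));
        // <span class='me'>Text goes here, forr example it container also **replaced** </span>
    }
}
```

Pattern.quote and Matcher.quoteReplacement keep the search word and the replacement literal, so a word containing regex metacharacters or a replacement containing $ will not change the behaviour.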
