I did a hello world web application in Java on Tomcat container. I have a query string
code=askdfjlskdfslsjdflksfjl#_=_
with underscores on both sides of = in the URL. When I tried to retrieve the query string in the servlet by request.getParameter("code"), I get only askdfjlskdfslsjdflksfjl. The part after # is missing.
How is this caused and how can I solve it?
That's because the part of the URL after # is not part of the query.
Section 3.4 of RFC 3986 says:
The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
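The # is only interpreted by the browser, not the server. If you want to pass the # character to the server, you must URL-encode the parameter value (only the value, not the whole name=value pair, or the separating = gets encoded too).
Example:
"code=" + URLEncoder.encode("askdfjlskdfslsjdflksfjl#_=_", "UTF-8");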
Please read the percent encoding on Wikipedia. The # and = are reserved characters in URLs. Only unreserved characters can be used plain in URLs, all other characters are supposed to be URL-encoded. The URL-encoded value of a # is %23 and = is %3D. So this should do:
code=askdfjlskdfslsjdflksfjl%23_%3D_
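As a rough illustration of producing that encoded string in Java, here is a minimal sketch. The class and method names (QueryValueEncoder, param) are made up for this example; only java.net.URLEncoder is a real API. Note that URLEncoder performs form encoding, so a space would become + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only.
final class QueryValueEncoder {

    // Builds "name=value" with only the value percent-encoded,
    // so the "=" that separates name and value stays intact.
    static String param(String name, String value) {
        try {
            return name + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints code=askdfjlskdfslsjdflksfjl%23_%3D_
        System.out.println(param("code", "askdfjlskdfslsjdflksfjl#_=_"));
    }
}
```

On the servlet side, request.getParameter("code") should then return the decoded value including the #_=_ part, since the container decodes percent-escapes in parameters.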
If this actually originates from an HTML <a> link in some JSP like so:
<a href="servletUrl?code=askdfjlskdfslsjdflksfjl#_=_">some link</a>
then you should change it to use JSTL's <c:url>:
<c:url var="servletUrlWithParam" value="servletUrl">
<c:param name="code" value="askdfjlskdfslsjdflksfjl#_=_" />
</c:url>
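<a href="${servletUrlWithParam}">some link</a>
so that it gets generated as
<a href="servletUrl?code=askdfjlskdfslsjdflksfjl%23_%3D_">some link</a>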
Note that this is not related to Java/Servlets per se; this applies to every web application.
I have an html form:
<form>
<input type="hidden" id="hiddenField"/>
...Other form fields
</form>
In this form I want to set a hidden field with xml data.
Can anyone suggest whether it is fine to set the hidden field directly with XML data?
That is, in my JavaScript function, is it safe to set the hidden field directly with XML like $('#hiddenField').val(xml); and then get the XML in my Java servlet? Please suggest.
No, you can't keep XML there without encoding it.
You can use
var stringValue = escape(xml);
var xmlValue = unescape(stringValue);
in JavaScript.
These methods have been deprecated in newer versions, though, so you could use an equivalent from another library such as Underscore.js: http://underscorejs.org/#escape
Also, don't keep XML in a hidden field if it holds any sensitive information.
Hidden form fields are not for session tracking.
There are two mechanisms for session tracking: cookies and URL rewriting, the latter for people who don't have cookies enabled in their browsers. I could only understand sending a session id in a hidden field if you have your own session tracker and are not using the one that already comes with your servlet container (HttpSession and all), but why reinvent the wheel?
Hidden fields are for passing information between pages. Sometimes I use one for information that I clearly don't want displayed to the user.
Posting XML without JavaScript or browser plugins is impossible. You should not send it directly as a form parameter.
Use a library that would encode them while sending to server, and decode them at the server side.
Underscore.js provides such functionality. See the documentation:
escape: _.escape(string)
Escapes a string for insertion into HTML, replacing &, <, >, ", `, and ' characters.
_.escape('Curly, Larry & Moe');
=> "Curly, Larry &amp; Moe"
unescape: _.unescape(string)
The opposite of escape, replaces &amp;, &lt;, &gt;, &quot;, &#96; and &#x27; with their unescaped counterparts.
_.unescape('Curly, Larry &amp; Moe');
=> "Curly, Larry & Moe"
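If the servlet needs to reverse that escaping itself, the same replacements can be sketched in plain Java. This is only an illustration under assumptions: the class name HtmlEscapeSketch and its methods are hypothetical, the entity set mirrors the characters listed above, and the backtick is written as &#x60; here. Whether decoding is needed at all depends on what the client actually sends.

```java
// Hypothetical helper mirroring the six replacements described above.
final class HtmlEscapeSketch {

    // "&" must be replaced first, otherwise the "&" of the
    // entities produced by later steps would be escaped again.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;")
                .replace("`", "&#x60;");
    }

    // "&amp;" must be replaced last, so that an escaped literal
    // such as "&amp;lt;" comes back as "&lt;" and not as "<".
    static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#x27;", "'")
                .replace("&#x60;", "`")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String xml = "<note lang=\"en\">Curly, Larry & Moe</note>";
        String escaped = escape(xml);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(xml)); // prints true
    }
}
```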
However, keep in mind that browsers and servers limit the length of a URL, and therefore the amount of data you can send in a GET request (Internet Explorer, for example, caps URLs at roughly 2,000 characters). Hence it's a good idea to use POST instead of GET when sending encoded XML.
I'm having difficulties getting an html page to pick up a rupee symbol (₹), store it into an SQL Server 2016 database and then retrieve it for display.
Important to note here is that I need to enter the actual symbol not the html version.
The basic flow of the page is that an administrator can add a new currency to the application via a web interface. There is a text box where
they would enter the actual rupee symbol ₹ and hit submit. This then passes the parameters via an HttpServletRequest to a java back end.
The java backend just inserts/updates this value to a SQL Server 2016 table in a field nchar(10).
When the page is refreshed it runs a select against this table and displays all the valid currencies.
The problem is that when the java application retrieves from the HttpServletRequest request object the symbol ₹ becomes â?¹. I can see this in the
debugger, I appreciate that this might be due to my debugger not being able to display this so I go forward.
The java (jdbc) updates the field. I view the field using Sql Server Management Studio and it displays â?¹ in both text and grid view.
I know that SSMS can display this symbol, as I can insert it directly and it works. So it looks like the information is lost in the HTML-to-Java request.
The web page itself is legacy and built using xslt. I have added some more details below of where I'm up to.
The website runs on tomcat 8 and the pages are built using xslt, the back end is java.
In the front end I have a text field in an EditCurrency page. I enter ₹ in the symbol field and hit submit.
The relevant fragments of the XSLT page that is used to build the front end are:
<!--header indicates page is utf8-->
<xsl:param name="csrfToken"/>
<xsl:param name="currencyFormatError"/>
...
<!-- on submission the EditCurrency java class is called. method=POST indicates it should allow UTF8 request URL's as is my understanding-->
<form id="cmanager" name="cmanager" onsubmit="return(vNewCurrency())" action="../servlet/webpay.website.admin.EditCurrency" method="POST">
<input name="csrfToken" type="hidden" value="{$csrfToken}"/>
<table width="100%" cellpadding="0" cellspacing="0" border="0">
The Tomcat 8 server's server.xml is set to encoding UTF-8. I understand this allows the request/response to handle UTF-8:
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8" />
Java class EditCurrency:
//Retrieves symbol from the HttpServletRequest req
//symbol returns â?¹
String symbol = (String) getParameter(PARAM_SYMBOL);
I've also tried to set the HttpServletRequest req using the following but it does nothing:
try {
req.setCharacterEncoding("UTF-8");
} catch (UnsupportedEncodingException ex) {
java.util.logging.Logger.getLogger(EditCurrency.class.getName()).log(Level.SEVERE, null, ex);
}
Sql Server:
Value â?¹ appears in the nchar(10) field.
Display html:
â?¹ is displayed when the screen is refreshed with this updated value.
So the question is: how do I fix this?
I had considered some sort of reference table of all currencies and their display values etc but it doesn't seem correct way of doing it.
Unless every tool in the chain, including whatever you use to look at the intermediate results, is UTF-8 capable you will see garbage at some point.
The Unicode code point of the rupee symbol is U+20B9, which is UTF-8 encoded as the three bytes 0xE2 0x82 0xB9. If you attempt to display that in a tool that uses ISO-8859-1 you see
0xE2 = â
0x82 = ? (there is no printable character in ISO-8859-1 for code 0x82, so you see a question mark)
0xB9 = ¹
So it appears the symbol is correct in the database, but you are displaying it incorrectly.
To "fix" this problem you must ensure that your tools are all set to UTF-8, and that the web server is configured to include the UTF-8 declaration in the HTML it sends.
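The byte-level explanation above can be reproduced with a short Java sketch. The class and method names (RupeeBytes, hex, misread, repair) are invented for this illustration. The repair step only works while the three original bytes are still intact; if some tool has already replaced 0x82 with a literal question mark, the information is lost.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class RupeeBytes {

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates a tool that reads UTF-8 bytes as ISO-8859-1.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses misread(), provided no byte was dropped or replaced.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String rupee = "\u20B9";
        // prints E2 82 B9
        System.out.println(hex(rupee.getBytes(StandardCharsets.UTF_8)));
        // three characters after misreading: U+00E2, U+0082, U+00B9
        System.out.println(misread(rupee).length());
        // prints true
        System.out.println(repair(misread(rupee)).equals(rupee));
    }
}
```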
My java server works as follows:
http://localhost:5555/?search="java"
The above link works fine. However, if I ever want to use "#" as part of the search string, it all goes wrong. For example:
http://localhost:5555/?search="c#"
For some reason everything after "#" gets ignored. If I use the encoded version of "#" it works fine again. For example:
http://localhost:5555/?search="c%23"
The system will be used by people who don't understand URL encoding, so they would never type %23 instead of #. Is there any way around it?
Other than encoding it, there is no way around it. Moreover, the string after # is treated as the fragment (a location within the page) of the URL.
The string after # will not be passed to the server in GET parameters. Use the POST method instead.
https://developer.mozilla.org/en-US/docs/Web/API/Window.location
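To see where the URL gets split, java.net.URI can be used as a stand-in for what the browser does before sending the request. This is a small sketch with a hypothetical class name (FragmentDemo); the quotes from the example URLs are left out because java.net.URI rejects a raw " character.

```java
import java.net.URI;

// Hypothetical demo class, for illustration only.
final class FragmentDemo {

    // The query exactly as it appears in the URL (still percent-encoded).
    static String rawQuery(String url) {
        return URI.create(url).getRawQuery();
    }

    // The query after percent-decoding.
    static String decodedQuery(String url) {
        return URI.create(url).getQuery();
    }

    public static void main(String[] args) {
        // The query stops at '#': prints search=c
        System.out.println(rawQuery("http://localhost:5555/?search=c#"));
        // The encoded '#' stays inside the query: prints search=c%23
        System.out.println(rawQuery("http://localhost:5555/?search=c%23"));
        // and decodes back to the intended value: prints search=c#
        System.out.println(decodedQuery("http://localhost:5555/?search=c%23"));
    }
}
```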
The user is not supposed to edit the URL directly, so a "c#" typed straight into the address bar would simply not be processed. Instead, you could use a form:
<form action="yourcontroller" method="post">
<input type="text" name="txtSearch" />
<input type="submit" value="search"/>
</form>
With this, the browser takes care of special characters like the "#" you mentioned.
Don't forget to read the parameter in your controller:
request.getParameter("txtSearch");
It is in the browser. The server never gets a request with the hashtag (#) symbol, just up to the symbol.
A javascript workaround is probably a bad idea.
Hello I want to ask what can be the source of problem with bad encoding on the page.
This problem is very specific, because first part of page has good encoding and second part is broken.
Moreover, it appears only in some scenarios, not always.
The weirdest thing is that it starts to appear in the middle of one message, and after this message the rest of the page has badly encoded characters.
This message is included in JSP with this part of code <fmt:message key="the.text.wchich.makes.problems"/>
Problem is not related to JSP, because bad encoding appears in the middle of message.
Gratulujeme, toto číslo si môžete zarezervovať kliknutím na tlačidlo Pokračovať.
But sometimes it outputs as
Gratulujeme, toto číslo si môžete zarezervovať kliknut�m na tlaÄidlo PokraÄovaÅ¥.
or
Gratulujeme, toto číslo si mô�¾ete zarezervovaÅ¥ kliknutÃm na tlaÄidlo PokraÄovaÅ¥.
So it is probably not the fault of badly entered text in database.
We are using Liferay 6.0, jsp, spring. Localized strings are stored in Oracle 11g database.
So, how is it possible that encoding begin to break in the middle of page?
You might need to specify the encoding in your JSPs as:
<%@ page contentType="text/html; charset=UTF-8" %>
You should be able to achieve the same result via CharacterEncodingFilter with forceEncoding parameter set and mapped to * path + INCLUDE dispatch.
This is just one suggestion: try setting the locale from the themeDisplay object.
<fmt:setLocale value="<%= themeDisplay.getLocale() %>"/>
See if that helps fmt:message identify the proper locale of the message.
Note: This expects that you should have proper locale set for user or at portal level.
The reason for this "escapes" me.
JSON escapes the forward slash, so a hash {a: "a/b/c"} is serialized as {"a":"a\/b\/c"} instead of {"a":"a/b/c"}.
Why?
JSON doesn't require you to do that, it allows you to do that. It also allows you to use "\u0061" for "a", but it's not required, like Harold L points out:
The JSON spec says you CAN escape forward slash, but you don't have to.
Harold L answered Oct 16 '09 at 21:59
Allowing \/ helps when embedding JSON in a <script> tag, which doesn't allow </ inside strings, like Seb points out:
This is because HTML does not allow a string inside a <script> tag to contain </, so in case that substring's there, you should escape every forward slash.
Seb answered Oct 16 '09 at 22:00 (#1580667)
Some of Microsoft's ASP.NET Ajax/JSON APIs use this loophole to add extra information, e.g., a datetime will be sent as "\/Date(milliseconds)\/". (Yuck)
I asked the same question some time ago and had to answer it myself. Here's what I came up with:
It seems, my first thought [that it comes from its JavaScript
roots] was correct.
'\/' === '/' in JavaScript, and JSON is valid JavaScript. However,
why are the other ignored escapes (like \z) not allowed in JSON?
The key for this was reading
http://www.cs.tut.fi/~jkorpela/www/revsol.html, followed by
http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2. The feature of
the slash escape allows JSON to be embedded in HTML (as SGML) and XML.
PHP escapes forward slashes by default which is probably why this appears so commonly. I suspect it's because embedding the string "</script>" inside a <script> tag is considered unsafe.
Example:
<script>
var searchData = <?= json_encode(['searchTerm' => $_GET['search'], ...]) ?>;
// Do something else with the data...
</script>
Based on this code, an attacker could append this to the page's URL:
?search=</script> <some attack code here>
Which, if PHP's protection was not in place, would produce the following HTML:
<script>
var searchData = {"searchTerm":"</script> <some attack code here>"};
...
</script>
Even though the closing script tag is inside a string, it will cause many (most?) browsers to exit the script tag and interpret the items following as valid HTML.
With PHP's protection in place, it will appear instead like this, which will NOT break out of the script tag:
<script>
var searchData = {"searchTerm":"<\/script> <some attack code here>"};
...
</script>
This functionality can be disabled by passing in the JSON_UNESCAPED_SLASHES flag but most developers will not use this since the original result is already valid JSON.
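The same protection can be approximated in Java when a serializer leaves slashes unescaped. The following is a minimal sketch with hypothetical names (ScriptSafeJson, escapeClosingTags); it assumes the input is already valid JSON, where a / can only occur inside a string, and it covers only the </ case discussed here, not every hazard of inlining data in a page.

```java
// Hypothetical helper, for illustration only.
final class ScriptSafeJson {

    // Rewrites "</" as "<\/" so the HTML parser never sees a closing tag.
    // "\/" is a legal JSON escape for "/", so the parsed value is unchanged.
    static String escapeClosingTags(String json) {
        return json.replace("</", "<\\/");
    }

    public static void main(String[] args) {
        String json = "{\"searchTerm\":\"</script> attack\"}";
        // prints {"searchTerm":"<\/script> attack"}
        System.out.println(escapeClosingTags(json));
    }
}
```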
Yes, some JSON utility libraries do it for various good but mostly legacy reasons. But then they should also offer something like a setEscapeForwardSlashAlways method to turn this behaviour off.
In Java, org.codehaus.jettison.json.JSONObject does offer a method called
setEscapeForwardSlashAlways(boolean escapeForwardSlashAlways)
to switch this default behaviour off.