Getting junk value while sending Hindi characters in request string

I am sending दर्शन as a request parameter value from JavaScript, and when I fetch it in my Struts action using requestMap.get(key)[0], it gives me the junk characters à¤¦à¤°à¥à¤¶à¤¨.
My action implements ServletResponseAware and declares a requestMap variable, so the request parameter values are mapped directly onto the requestMap object.

The default encoding for the HTTP protocol is not UTF-8. You need to handle it separately in your server-side code.
Please refer to the following link for more.
Common problems with i18n and servlets/jsp-s
You need to add the following to your JSP:
<%@page contentType="text/html" pageEncoding="UTF-8"%>
Displaying Hindi font in jsp
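The junk in the question is what the UTF-8 bytes of दर्शन look like when something decodes them as ISO-8859-1. A minimal, container-free sketch of that effect follows; the class and method names are made up for illustration and are not part of Struts or the servlet API, and the Hindi text is written with Unicode escapes so the file compiles under any source encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, only meant to reproduce the symptom outside a container.
class MojibakeDemo {

    // Simulates a server that decodes UTF-8 request bytes as ISO-8859-1.
    static String decodeWrongly(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage. This is lossless only because ISO-8859-1 maps
    // every byte value to exactly one char.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The five code points of the Hindi word from the question.
        String hindi = "\u0926\u0930\u094D\u0936\u0928";
        String garbled = decodeWrongly(hindi);
        System.out.println(garbled);         // junk, three chars per original char
        System.out.println(repair(garbled)); // the original text again
    }
}
```

The repair step is a diagnostic aid rather than a fix; the real fix is to make the server decode the request as UTF-8 in the first place.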

Related

Java: how to decode a GET URL parameter received through BeanParam

I receive a GET request on this web service:
@GET
@Path("/nnnnnn")
public Response pfpfpfpf(@BeanParam NNNNNN n)
The class NNNNNN has:
@QueryParam("parameter")
private String parameter;
And for that parameter there is a getter and a setter.
I send a GET request with a query parameter and it is bound automatically to my NNNNNN object; everything is great.
But now I am sending Japanese strings in the query URL. I encode the parameter as UTF-8 before sending, and I have to decode it using UTF-8.
My question is: where should I call URLDecoder? I tried to call it in the getter of that parameter, but it didn't work; I kept getting something like C3%98%C2%B4%C3%98%C2 instead of the Japanese characters.

The solution that works for me is:
In the servlet, I had to do this:
request.setCharacterEncoding("UTF-8");
And then in the HTML page I had to add this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
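To see why the charset used for decoding matters, here is a small sketch that uses only java.net.URLEncoder and java.net.URLDecoder. The class name and the sample value are illustrative assumptions. A servlet container or JAX-RS runtime normally performs the decoding step itself, so this only mimics what happens when it picks the right or the wrong charset.

```java
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: percent-encodes a value as UTF-8, then decodes it
// once with the matching charset and once with a mismatching one.
class QueryParamCharsetDemo {

    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e); // UTF-8 is always supported
        }
    }

    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String japanese = "\u6628\u591C"; // two Japanese characters
        String encoded = encodeUtf8(japanese);
        System.out.println(encoded);
        System.out.println(decode(encoded, "UTF-8"));      // original text
        System.out.println(decode(encoded, "ISO-8859-1")); // six junk chars
    }
}
```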

This is a good question which has the potential to clear up many doubts about how information is processed (encoded and decoded) between systems.
Before I proceed, I must say that you should have a fair understanding of charsets, encodings, etc. You may want to read this answer for a quick heads-up.
This has to be looked at from 2 perspectives: browser and server.
Browser perspective of Encoding
Each browser renders information/text, and to render it correctly it has to know how to interpret the bits/bytes (read my answer's 3rd bullet on how the same bits can represent different characters in different encoding schemes).
Browser page encoding
Each browser has a default encoding associated with it. Check this on how to see the default encoding of a browser.
If you do not specify any encoding in your HTML page, then the browser's default encoding takes effect and the page is rendered as per those encoding rules. So, if the default encoding is ASCII and you are using Japanese or Chinese characters, or characters from a Unicode supplementary plane, then you will see garbage values.
You can tell the browser not to use its default encoding scheme but to use a specific one to render your website, using <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
And this is exactly what you did/found, and you were fine because this meta tag essentially overrode the browser's default encoding.
Another way to achieve the same effect is to leave out this meta tag and just change the browser's default encoding; you will still be fine. But this is not recommended; using the Content-Type meta tag in your JSP is recommended.
Try playing around with the browser's default encoding and the meta tag using the simple HTML below.
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
の, は, でした <br>
昨夜, 最高
</body>
</html>
Server perspective of Encoding
The server should also know how to interpret the incoming stream of data, which basically means knowing which encoding scheme to use (the server part is tricky because there are several possibilities). Read below from here
When data that has been entered into HTML forms is submitted, the form
field names and values are encoded and sent to the server in an HTTP
request message using method GET or POST, or, historically, via email.
The encoding used by default is based on a very early version of the
general URI percent-encoding rules, with a number of modifications
such as newline normalization and replacing spaces with "+" instead of
"%20". The MIME type of data encoded this way is
application/x-www-form-urlencoded, and it is currently defined (still
in a very outdated manner) in the HTML and XForms specifications. In
addition, the CGI specification contains rules for how web servers
decode data of this type and make it available to applications.
This again has 2 parts: how the server should decode the incoming request stream, and how it should encode the outgoing response stream.
There are several ways to do this depending upon the use case, for example:
There are methods like setCharacterEncoding, setContentType etc. on the HTTP request and response objects, which can be used to set the encoding.
This is exactly what you have done in your case: you have told the server to use the UTF-8 encoding scheme for decoding the request data because you are expecting non-ASCII characters. But this is not all, so please do read more below.
Set the encoding at server or JVM level, using JVM system properties like -Dfile.encoding=utf8. Read this article on how to set the server encoding.
In your case you were fetching the Japanese characters from the query string of the URL, and the query string is part of the HTTP request object, so using request.setCharacterEncoding("UTF-8"); you were able to get the desired encoding result.
But the same will not work for URL encoding, which is different from request encoding (your case). Consider the example below: in both System.out.println calls you will not see the desired encoding effect, even after using request.setCharacterEncoding("UTF-8");, because here you need URL encoding, since the URL will be something like http://localhost:7001/springapp/forms/executorTest/encodingTest/hellothere 昨夜, 最高 and in this URL there is no query string.
@RequestMapping(value="/encodingTest/{quertStringValue}", method=RequestMethod.GET)
public ModelAndView encodingTest(@PathVariable("quertStringValue") String quertStringValue, ModelMap model, HttpServletRequest request) throws UnsupportedEncodingException {
    System.out.println("############### quertStringValue " + quertStringValue);
    request.setCharacterEncoding("UTF-8");
    System.out.println("############### quertStringValue " + quertStringValue);
    return new ModelAndView("ThreadInfo", "ThreadInfo", "####### This is my encoded output " + quertStringValue);
}
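On the client side, a path segment with spaces or non-ASCII text has to be percent-encoded before it goes on the wire. The sketch below shows one way to do that with java.net.URI, under the assumption that UTF-8 is wanted; the class name, host, port and path are placeholders. It does not reproduce the Spring handler above, and whether the server decodes the path as UTF-8 remains a container setting.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: builds a URL whose path contains a space and
// non-ASCII characters, then renders it in pure ASCII form.
class PathSegmentEncodingDemo {

    static String toAsciiUrl(String rawPath) {
        try {
            // The multi-argument constructor quotes illegal characters such as spaces.
            URI uri = new URI("http", null, "localhost", 7001, rawPath, null, null);
            // toASCIIString() additionally percent-encodes non-ASCII characters as UTF-8.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAsciiUrl("/encodingTest/hello \u6628\u591C"));
    }
}
```

URLEncoder is not a good fit for path segments because it follows the form-encoding rules and turns a space into "+" rather than "%20".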
Depending upon the framework you are using, you may need additional configuration to specify a character encoding for requests or URLs, so that you can either apply your own encoding if the request does not already specify one, or enforce the encoding in any case. This is useful because current browsers typically do not set a character encoding even if one is specified in the HTML page or form.
In Spring, there is org.springframework.web.filter.CharacterEncodingFilter for configuring request encoding. Read this similar interesting question which is based on this fact.
In a nutshell
Every computer program, whether an application server, web server, browser, IDE etc., understands only bits, so it needs to know how to interpret those bits to make the expected sense out of them, because depending upon the encoding used, the same bits can represent different characters. That's where "encoding" comes into the picture: it gives each character a unique representation so that all computer programs, diverse OSes etc. know exactly how to interpret it.
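The converse also holds: the same text turns into different bytes depending on the encoding chosen. A short sketch, with an illustrative class name and sample text, that prints the bytes of two Japanese characters under three standard charsets:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one string, three byte representations.
class EncodingBytesDemo {

    // Encodes the text with the given charset and renders the bytes as upper-case hex.
    static String hexBytes(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String japanese = "\u6628\u591C";
        System.out.println(hexBytes(japanese, StandardCharsets.UTF_8));      // three bytes per character
        System.out.println(hexBytes(japanese, StandardCharsets.UTF_16BE));   // two bytes per character
        // ISO-8859-1 cannot represent these characters, so each becomes '?' (0x3F).
        System.out.println(hexBytes(japanese, StandardCharsets.ISO_8859_1));
    }
}
```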

How do I get the original page parameters from a JSP error page?

I have a JSP/servlet webapp on Tomcat and I need something like a crash report each time an unexpected error occurs.
I have an error page defined and added with the errorPage directive
<%@page errorPage="./erropage.jsp" %>
to my edit.jsp file (just an example).
The request to edit.jsp is made with a POST request (actually it is an Ajax request, but this is not so important).
I need a solution to read the original parameters (sent to the edit.jsp page) from the error page in order to build a crash report.
request.getAttribute("javax.servlet.error.request_uri")
does not help me, since it only covers the actual URL (GET parameters).
Also, building up a string from the request parameters in edit.jsp and storing that string in the session is not an option, since there are too many files in which I would need to implement this.

Actually, it seems like
request.getParameter()
in errorpage.jsp gives the parameter from the edit.jsp request!

I have a similar issue with cookies and I posted a question about it.
Now, I believe (I still have some checks to do) that the error mechanism uses a REDIRECT scheme (not FORWARD), so a new request object is created. You can verify this assumption.

Servlet response.sendRedirect encoding problems

So I'm redirecting my user using GET in a naive way:
response.sendRedirect("/path/index.jsp?type="+ e.getType()
+"&message="+ e.getMessage());
And this was working fine until I had to send messages, as actual text to be shown to users. The problem is if the message has non-ASCII characters in it. My .jsp files are encoded in UTF-8:
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
So all non-ASCII characters in 'message' get garbled. I don't want to set my JVM default encoding to UTF-8, so how do I solve this? I tried to use
response.setCharacterEncoding("UTF-8");
on the Servlet before redirecting, but it doesn't work. When I try to execute:
out.print(request.getCharacterEncoding());
on my .jsp file it prints 'null'.

The sendRedirect() method doesn't encode the query string for you. You have to do it yourself.
response.sendRedirect("/path/index.jsp?type=" + URLEncoder.encode(e.getType(), "UTF-8")
+ "&message=" + URLEncoder.encode(e.getMessage(), "UTF-8"));
You might want to refactor the boilerplate into a utility method taking a Map or so.
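Such a utility method could look roughly like the sketch below. The class and method names are hypothetical, UTF-8 is assumed as the target charset, and null keys or values are not handled (URLEncoder.encode would throw a NullPointerException for them).

```java
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: appends URL-encoded query parameters to a path.
class RedirectUrlBuilder {

    static String withQuery(String path, Map<String, String> params) {
        StringBuilder url = new StringBuilder(path);
        char separator = '?'; // first parameter starts the query string
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&'; // later parameters are chained
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("type", "error");
        params.put("message", "caf\u00E9 ok");
        System.out.println(withQuery("/path/index.jsp", params));
    }
}
```

In the servlet, the call would then shrink to something like response.sendRedirect(RedirectUrlBuilder.withQuery("/path/index.jsp", params));.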
Note that I assume that the server is configured to decode the GET request URI using UTF-8 as well. You didn't tell which one you're using, but in case of for example Tomcat it's a matter of adding the URIEncoding="UTF-8" attribute to the <Connector> element.
See also:
Unicode - How to get the characters right?
Unrelated to the concrete problem, language="java" is already the default, so just omit it. contentType="text/html; charset=UTF-8" is also already the default when using JSP with pageEncoding="UTF-8", so just omit it too. All you really need is <%@ page pageEncoding="UTF-8"%>. Note that this does effectively the same as response.setCharacterEncoding("UTF-8"), which explains why that call didn't have any effect. request.getCharacterEncoding() only concerns the POST request body, not the GET request URI, so it is irrelevant in case of GET requests.

Thank you... When I used response.sendRedirect("/path/index.jsp?type=" + URLEncoder.encode(e.getType(), "UTF-8")), my problem got fixed.
When using response.sendRedirect(), we should encode the URL with URLEncoder.encode(); only then will it be encoded correctly.
Thanks again...

How to handle non-ASCII Characters in Java while using PDPageContentStream/PDDocument

I am using PDFBox to create a PDF from my web application. The web application is built in Java and uses JSF. It takes the content from a web-based form and puts the contents into a PDF document.
Example: a user fills in an inputTextArea (JSF tag) in the form and that is converted to a PDF. I am unable to handle non-ASCII characters.
How should I handle the non-ASCII characters, or at least strip them out before putting them in the PDF? Please help me with any suggestions or point me to any resources. Thanks!

Since you're using JSF on JSP instead of Facelets (which is implicitly already using UTF-8), do the following steps to avoid the platform default charset being used (which is often ISO-8859-1, which is the wrong choice for handling of the majority of "non-ASCII" characters):
Add the following line to top of all JSPs:
<%@ page pageEncoding="UTF-8" %>
This sets the response encoding to UTF-8 and sets the charset of the HTTP response content type header to UTF-8. The latter will instruct the client (web browser) to display and submit the page with the form using UTF-8.
Create a Filter which does the following in doFilter() method:
request.setCharacterEncoding("UTF-8");
Map this on the FacesServlet like follows:
<filter-mapping>
<filter-name>nameOfYourCharacterEncodingFilter</filter-name>
<servlet-name>nameOfYourFacesServlet</servlet-name>
</filter-mapping>
This sets the request encoding of all JSF POST requests to UTF-8.
This should fix the Unicode problem in the JSF side. I have never used PDFBox, but since it's under the covers using iText which in turn should already be supporting Unicode/UTF-8, I think that part is fine. Let me know if it still doesn't after doing the above fixes.
See also:
Unicode - How to get the characters right?

Arabic characters appear like ??? after adding a Filter to JSP page

When I add a Filter to a particular JSP file, the Arabic characters in the output appear like ???, even when the page encoding has been set to UTF-8 by <%@ page pageEncoding="UTF-8"%> and <% response.setCharacterEncoding("UTF-8");%>.
The strange thing is, before I added the Filter, the output of all Arabic pages appears with correct encoding. Can someone tell how this problem is caused and how I can solve it?

The filter is either directly or indirectly committing the response and/or accessing the Writer or OutputStream of the HttpServletResponse, which means the encoding cannot be changed anymore in the JSP. Fix the code in the filter accordingly. The filter should in any case not be writing anything to the response body. That is what the JSP (for HTML) or Servlet (for other content) is for.
By the way, you don't need to call <% response.setCharacterEncoding("UTF-8");%>. The <%@page pageEncoding="UTF-8"%> already implicitly does that.
