construct URL query string: character set encode/decode - java

I'm trying to construct a URL with a query string containing non-ASCII characters (Hebrew in my case).
However, when my webapp receives the request, the extracted request parameters are gibberish.
How can I resolve this?
new URL("http://localhost:8080/SRV/page.jsp?param=" + URLEncoder.encode("heb text", "UTF-8")).openConnection();
target page:
<%
System.out.println("Receive: " + request.getParameter("param"));
%>
I'm using tomcat6, jdk6, windows7 x64
edit: this is my page declaration:
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

Take a look at the $TOMCAT_HOME/conf/server.xml file and check the encoding setting:
<Connector port="8080" ... URIEncoding="UTF-8" />
It appears that Tomcat needs this setting for UTF-8 to work for HTTP request values, such as request parameters.
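The mismatch described above can be reproduced without a server. The following is a minimal sketch, not Tomcat's actual decoding logic: the class and method names (`QueryEncodingDemo`, `encode`, `decode`) are made up for illustration, and the "server" is simply `URLDecoder` called with a chosen charset. It encodes a Hebrew word as UTF-8, as the client code in the question does, then decodes it once as UTF-8 and once as ISO-8859-1 (the charset used when `URIEncoding` is not set on older Tomcat versions).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class QueryEncodingDemo {

    // Percent-encodes a value as UTF-8, like the client code in the question.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Decodes a percent-encoded value with the given charset (stand-in for the server side).
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u05E9\u05DC\u05D5\u05DD"; // the Hebrew word "shalom"
        String wire = encode(original);
        System.out.println(wire);                                        // %D7%A9%D7%9C%D7%95%D7%9D
        System.out.println(decode(wire, "UTF-8").equals(original));      // true
        System.out.println(decode(wire, "ISO-8859-1").equals(original)); // false
    }
}
```

The bytes on the wire are the same in both cases; only the charset used to turn them back into characters differs, which is what the `URIEncoding` attribute controls for query strings.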

Related

Spring mvc checkbox selected value not readable as a string (values are in korean language)

I'm developing a REST API with Spring MVC. I have checkboxes that are created automatically. The value of a checkbox is displayed on the web page in Korean, but when I select one checkbox, it returns an unreadable string. I have added the <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%> tag to my JSP page and annotated my controller method like this: @RequestMapping(value = "/filterSubmit", method = RequestMethod.POST, produces = "application/json;charset=UTF-8"). I have also added URIEncoding="UTF-8" to Tomcat's server.xml: <Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8081" protocol="HTTP/1.1" redirectPort="8443"/>. Still, I cannot read the selected value.
As far as i know, CJK (Chinese/Korean/Japanese) doesn't use UTF-8 encoding. So maybe your problem is encoding. Try some other pageEncoding
References: https://en.m.wikipedia.org/wiki/Korean_language_and_computers

How to solve UTF-8 in java

I currently use
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
in my jsp page.
When I get data from a textbox using request.getParameter("..."); it returns data like öÉ?É?É?öİ. I see this problem when I use non-English characters. I added URIEncoding="UTF-8" to server.xml in Tomcat, but it returned the same (öÉ?É?É?öİ). How can I solve it?
Thank you
EDIT
Thanks for your answers. I tried a few things, but nothing has fixed the problem.
Here's what I've done:
I added <Connector URIEncoding="UTF-8" .../> in server.xml.
<meta ... charset=utf-8> tag is ok and I tried request.setCharacterEncoding("UTF-8");
I also tried <filter> tag in web.xml
None of these actions fixes the problem. I'm wondering if there's something else wrong with this...(remembering: I used <form method='post'>. I click submit button and when I get data using request.getParameter("..") the format of this data is not the correct format. )
You can try this code in your Servlet
if(request.getCharacterEncoding() == null) {
request.setCharacterEncoding("UTF-8");
}
Maybe the actual character encoding is not UTF-8? If the characters themselves are encoded in some other format, we can't just label them as UTF-8.
Try decoding them with various charsets and see which one gives the proper result. I think the input is being decoded as Latin-1 (ISO-8859-1). If so, use the code below:
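String param1 = request.getParameter("...");
if (param1 != null) {
    // Recover the raw bytes, then decode them explicitly as UTF-8
    // (without the second argument, the platform default charset is used)
    param1 = new String(param1.getBytes("ISO-8859-1"), "UTF-8");
}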
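As a rough illustration of why this re-decoding works, here is a standalone sketch. It assumes the container decoded UTF-8 bytes as ISO-8859-1; the class and method names (`MojibakeRepair`, `misdecode`, `repair`) are hypothetical. Because ISO-8859-1 maps every byte to exactly one character, the original bytes can be recovered losslessly and decoded again with the right charset.

```java
import java.io.UnsupportedEncodingException;

final class MojibakeRepair {

    // Simulates a container that reads UTF-8 bytes as ISO-8859-1.
    static String misdecode(String text) {
        try {
            return new String(text.getBytes("UTF-8"), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // both charsets are always supported
        }
    }

    // Recovers the raw bytes and decodes them as UTF-8.
    static String repair(String misdecoded) {
        try {
            return new String(misdecoded.getBytes("ISO-8859-1"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Gaz M\u00E9tro";
        String broken = misdecode(original);
        System.out.println(broken.length());                  // 10: the accented letter became two characters
        System.out.println(repair(broken).equals(original));  // true
    }
}
```

This only repairs the symptom. If the request encoding is set correctly before the parameters are read, the repair step is unnecessary and would in fact corrupt values that were already decoded properly.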
UTF-8 should be set at all layers of the application.
Do the following:
1) HTML Code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser Setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In Connector tag added "URIEncoding" attribute as
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - added following
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) MIME type of response should be "application/x-www-form-urlencoded"
There is another place you can check. Did you include following declaration in your JSP file?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I think the problem is that browser still sends requests using default ISO-8859-1, which is the standard charset if not declared.
You can also check the HTTP headers received from server to make sure the correct charset is sent back.
Essentially, the cleanest way to do it is to use Unicode escapes in your property files and/or code if need be (not advised).
This way you avoid all encoding issues, since your program only has to deal with ASCII; the proper representation is then handled entirely by the client side, and you do not have to worry about the default OS encoding or environment encoding.
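Assuming "Unicode" here means `\uXXXX` escapes (the form Java property files and source code accept), a small sketch of the conversion might look like this. The class and method names are invented for illustration; the JDK's `native2ascii` tool did the same job for property files in the JDK versions discussed here.

```java
final class UnicodeEscaper {

    // Replaces every non-ASCII char with its \uXXXX escape so the result is pure ASCII.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the ASCII-only form that could be pasted into a .properties file.
        System.out.println(toAsciiEscapes("Gaz M\u00E9tro"));
    }
}
```

Note that this only helps with text stored in files or source code. It does not change how request parameters are decoded.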
You can also try adding the following filter at web.xml:
<filter>
<filter-name>Character Encoding Filter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
Hope this helps.
You should try this:
String content = request.getParameter("content");
if (content != null) {
    // Recover the raw bytes, then decode them explicitly as UTF-8
    content = new String(content.getBytes("ISO-8859-1"), "UTF-8");
}

Character encoding issue with Tomcat

There is strange character encoding going on. I am using JSP (JSTL) and Struts with Tomcat 6.
I have my JSP page encoding as such:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
The issue is when I try to pass the url using encodeURI as such:
<script type="text/javascript">
$('#mailer_filter').change(function(){
var val = $(this).val();
console.log(val);
console.log(escape(val));
console.log(encodeURI(val));
location.href = 'mailList.a?' + encodeURI($(this).val());
});
</script>
the parameter on the action (java end) comes out as:
Gaz MÃ©tro
however on the front end it is displayed as:
Gaz Métro
which is the correct way. What can I do about this?
Do the following:
1) HTML Code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser Setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In Connector tag added "URIEncoding" attribute as
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - added following
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) MIME type of response should be "application/x-www-form-urlencoded"
Have you followed these steps?
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
Copied below:
Using UTF-8 as your character encoding for everything is a safe bet. This should work for pretty much every situation.
In order to completely switch to using UTF-8, you need to make the following changes:
Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.
Use a character encoding filter with the default encoding set to UTF-8
Change all your JSPs to include charset name in their contentType.
For example, use <%@page contentType="text/html; charset=UTF-8" %> for the usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" /> for the pages in XML syntax (aka JSP Documents).
Change all your servlets to set the content type for responses and to include charset name in the content type to be UTF-8.
Use response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8").
Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
Disable any valves or filters that may read request parameters before your character encoding filter or JSP page has a chance to set the encoding to UTF-8. For more information see http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.
Try setting the URIEncoding parameter of your tomcat connector (in the server.xml) to UTF-8:
E.g.:
<Connector port="8080" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true"
URIEncoding="UTF-8"/>

Dynamically include HTML in JSP

I want to include HTML pages dynamically in a JSP page. I'm fetching the HTML URL from an HTML folder and using Struts 2 to pass the value to the JSP page, but I'm unable to do this in the JSP using either jsp:include or @include tags.
For Example,
I have variable html Url like /somepath/variablehtmlname.html in my struts action property. I want to use this path to include the actual html files located at /somepath location.
<%@ include ... %> is evaluated when your JSP pages are compiled and has no access to request variables (like Struts 2 action properties). Use <c:import /> or <s:include /> instead, which include content on a per-request basis. <jsp:include /> should also work, but (as @BalusC requested) without the code, we can't tell why it doesn't.
Reusing Content in JSP Pages
I agree with the first answer (BobG). You can also simply have the JSP page redirect using the meta refresh tag, where the servlet writes the new URL location to a session variable: <meta http-equiv="refresh" content="0; URL=<%=htmlSessionLink%>" />

Why does POST not honor charset, but an AJAX request does? tomcat 6

I have a tomcat based application that needs to submit a form capable of handling utf-8 characters. When submitted via ajax, the data is returned correctly from getParameter() in utf-8. When submitting via form post, the data is returned from getParameter() in iso-8859-1.
I used fiddler, and have determined the only difference in the requests, is that charset=utf-8 is appended to the end of the Content-Type header in the ajax call (as expected, since I send the content type explicitly).
ContentType from ajax:
"application/x-www-form-urlencoded; charset=utf-8"
ContentType from form:
"application/x-www-form-urlencoded"
I have the following settings:
ajax post (outputs chars correctly):
$.ajax( {
type : "POST",
url : "blah",
async : false,
contentType: "application/x-www-form-urlencoded; charset=utf-8",
data : data,
success : function(data) {
}
});
form post (outputs chars in iso)
<form id="leadform" enctype="application/x-www-form-urlencoded; charset=utf-8" method="post" accept-charset="utf-8" action="{//app/path}">
xml declaration:
<?xml version="1.0" encoding="utf-8"?>
Doctype:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
jvm parameters:
-Dfile.encoding=UTF-8
I have also tried using request.setCharacterEncoding("UTF-8"); but it seems as if tomcat simply ignores it. I am not using the RequestDumper valve.
From what I've read, POST data encoding is mostly dependent on the page encoding where the form is. As far as I can tell, my page is correctly encoded in utf-8.
The sample JSP from this page works correctly. It simply uses setCharacterEncoding("UTF-8"); and echos the data you post. http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
So to summarize, the post request does not send the charset as being utf-8, despite the page being in utf-8, the form parameters specifying utf-8, the xml declaration or anything else. I have spent the better part of three days on this and am running out of ideas. Can anyone help me?
form post (outputs chars in iso)
<form id="leadform" enctype="application/x-www-form-urlencoded; charset=utf-8" method="post" accept-charset="utf-8" action="{//app/path}">
You don't need to specify the charset there. The browser will use the charset which is specified in the HTTP response header.
Just
<form id="leadform" method="post" action="{//app/path}">
is enough.
xml declaration:
<?xml version="1.0" encoding="utf-8"?>
Irrelevant. It's only relevant for XML parsers. Web browsers don't parse text/html as XML. This is only relevant for the server side (if you're using an XML-based view technology like Facelets or JSPX; on plain JSP this is superfluous).
Doctype:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Irrelevant. It's only relevant for HTML parsers. Besides, it doesn't specify any charset. Instead, the one in the HTTP response header will be used. If you aren't using an XML-based view technology like Facelets or JSPX, this can just as well be <!DOCTYPE html>.
meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Irrelevant. It's only relevant when the HTML page is viewed from local disk or is to be parsed locally. Instead, the one in the HTTP response header will be used.
jvm parameters:
-Dfile.encoding=UTF-8
Irrelevant. It's only relevant to Sun/Oracle(!) JVM to parse the source files.
I have also tried using request.setCharacterEncoding("UTF-8"); but it seems as if tomcat simply ignores it. I am not using the RequestDumper valve.
This will only work when the request body has not been parsed yet (i.e. you haven't called getParameter() and so on beforehand). You need to call this as early as possible. A Filter is a perfect place for this. Otherwise it will be ignored.
From what I've read, POST data encoding is mostly dependent on the page encoding where the form is. As far as I can tell, my page is correctly encoded in utf-8.
It's dependent on the HTTP response header.
All you need to do are the following three things:
Add the following to top of your JSP:
<%@page pageEncoding="UTF-8" %>
This will set the response encoding to UTF-8 and set the response header to UTF-8.
Create a Filter which does the following in doFilter() method:
if (request.getCharacterEncoding() == null) {
request.setCharacterEncoding("UTF-8");
}
chain.doFilter(request, response);
This ensures that the POST request body is processed as UTF-8.
Change the <Connector> entry in Tomcat/conf/server.xml as follows:
<Connector (...) URIEncoding="UTF-8" />
This ensures that GET query strings are processed as UTF-8.
See also:
Unicode - How to get characters right? - contains practical background information and detailed solutions for Java EE web developers.
Try this :
How do I change how POST parameters are interpreted?
POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default is used (ISO-8859-1). In many cases this is not the preferred interpretation so one can employ a javax.servlet.Filter to set request encodings. Writing such a filter is trivial. Furthermore Tomcat already comes with such an example filter.
Please take a look at:
5.x
webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
6.x
webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
For more info , refer to the below URL
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
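The decision described above (use the charset the client declared, otherwise fall back to a default) can be sketched in plain Java. This is an illustration only, not the servlet API or Tomcat's implementation: `ContentTypeCharset`, `charsetOf` and `effectiveEncoding` are hypothetical names, and `filterDefault` stands in for whatever encoding a character encoding filter would apply.

```java
import java.util.Locale;

final class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type header value, or null if none is declared.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type itself
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Picks the declared charset, else the filter's default, else ISO-8859-1.
    static String effectiveEncoding(String contentType, String filterDefault) {
        String declared = charsetOf(contentType);
        if (declared != null) {
            return declared;
        }
        return filterDefault != null ? filterDefault : "ISO-8859-1";
    }

    public static void main(String[] args) {
        // The two header values reported in the question
        System.out.println(effectiveEncoding("application/x-www-form-urlencoded; charset=utf-8", null)); // utf-8
        System.out.println(effectiveEncoding("application/x-www-form-urlencoded", null));                // ISO-8859-1
        System.out.println(effectiveEncoding("application/x-www-form-urlencoded", "UTF-8"));             // UTF-8
    }
}
```

With the two header values from the question, the AJAX request resolves to utf-8 and the plain form post falls back to ISO-8859-1 unless a filter supplies a default, which matches the behaviour reported there.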
Have you tried accept-charset="UTF-8"? As you said, the data should be encoded according to the encoding of the page itself; it seems strange that tomcat is ignoring that. What browser are you trying this out on?
Have you tried to specify useBodyEncodingForURL="true" in your conf/server.xml for HTTP connector?
I implemented a filter based on the information in this post and it is now working. However, this still doesn't explain why, even though the page was UTF-8, the charset used by Tomcat to interpret it was ISO-8859-1.
