Java: check the charset/encoding of an HTML page, like browsers do

How can I check the actual charset (encoding) of an HTML page?
For example, the page declares its charset as iso-8859-1, but the content is really written in UTF-8:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
...
here is content with utf8
...
</html>
How can I check this? Is it possible in Java to detect the charset (encoding) of an HTML page the way browsers do?
Thank you!
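A rough sketch of one way to approach this in plain Java: look for a UTF-8 byte order mark, then test whether the raw bytes are well-formed UTF-8 using a strict decoder, and only fall back to the declared charset if they are not. This is a heuristic, not the exact algorithm browsers follow, and the class name CharsetSniffer and its methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class CharsetSniffer {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns true if at least one byte lies outside the US-ASCII range. */
    public static boolean hasNonAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return true;
            }
        }
        return false;
    }

    /** Guesses the charset of the raw page bytes, falling back to the declared one. */
    public static Charset guess(byte[] bytes, Charset declared) {
        // A UTF-8 byte order mark (EF BB BF) is treated as decisive.
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // Non-ASCII bytes that form well-formed UTF-8 are very unlikely to be Latin-1 text.
        if (hasNonAscii(bytes) && isValidUtf8(bytes)) {
            return StandardCharsets.UTF_8;
        }
        // Pure ASCII or malformed UTF-8: trust the declaration.
        return declared;
    }

    public static void main(String[] args) {
        // Page bytes written as UTF-8 although the meta tag claims iso-8859-1.
        byte[] page = "<p>h\u00e9llo</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(page, StandardCharsets.ISO_8859_1)); // prints UTF-8
    }
}
```

The check only separates UTF-8 from everything else; telling single-byte encodings apart (for example iso-8859-1 versus windows-1251) would need statistical detection, which this sketch does not attempt.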

Related

Saving Chinese characters using Java HtmlEditorKit

I'm trying to save an HtmlDocument (saved with UTF-8 encoding) that contains the Chinese character 𠜎 using HtmlEditorKit, in the following way:
try (OutputStreamWriter f = new OutputStreamWriter(fileOutputStream, "UTF-8")) {
    htmlEditorKit.write(f, htmlDocument, 0, htmlDocument.getLength());
} catch (BadLocationException e) {
    logger.error("Could not save", e);
}
In the output HTML document I'm getting two 2-byte characters (&#55361;&#57102;) instead of one 4-byte character. Java can work out which symbol it is by combining the two, but HTML can't.
Any suggestions on how to save it so that the HTML page is displayed correctly?
Here is the output HTML:
<html>
<head>
<meta content="text/html" charset="utf-8">
</head>
<body>
<p>&#55361;&#57102;</p>
</body>
</html>
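The two numbers are the UTF-16 surrogate halves of a single code point. As a minimal sketch (the class and method names are made up, and this is a possible post-processing step rather than something HtmlEditorKit does itself), text can be escaped per code point instead of per char, so that the character becomes one numeric reference that HTML understands:

```java
// Hypothetical helper, not part of Swing or any library.
public class SupplementaryEntities {

    /** Escapes every non-ASCII code point as one decimal numeric character reference. */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // joins a surrogate pair into one code point
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);          // 2 for supplementary characters, else 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 55361 and 57102 are the high and low surrogates from the output above.
        int cp = Character.toCodePoint((char) 55361, (char) 57102);
        System.out.println(Integer.toHexString(cp));                 // prints 2070e
        System.out.println(escapeNonAscii("<p>\uD841\uDF0E</p>"));   // prints <p>&#132878;</p>
    }
}
```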

character encoding of the plain text document was not declared

I am getting "The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature" on hitting my REST endpoint
My JSP header:
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"
</head>
My endpoint:
@Path("/create")
@POST
@Consumes(MediaType.APPLICATION_FORM_URLENCODED)
As mentioned by @lucumt, you have a syntax error in your code: the meta tag is not closed with />. I also changed the lowercase utf-8 to uppercase:
<%@ page contentType="text/html; charset=UTF-8" language="java" %>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
</head>

Chinese character gets scrambled when going from JSP to server in Java

I have already set the following in my JSP:
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
But after sending the text as a request header,
xmlHttp.setRequestHeader("SEARCH_TEXT", srctxt);
or
passing it as a parameter in the AJAX URL,
I still get the Chinese words as scrambled letters or '????' marks.
I would appreciate some insight into this. Please help.
@Mena, after your comment I looked into encodeURIComponent. Once I encoded the Chinese string on the client and decoded it in my server-side code, the problem was resolved. Thanks. Pasting the code for reference.
Client-side code:
xmlHttp.setRequestHeader("SEARCH_TEXT", encodeURIComponent(srctxt));
Server-side code:
CommonUtils.decodedStringValue(request.getHeader("SEARCH_TEXT"));
Hope this helps.
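CommonUtils.decodedStringValue is the poster's own helper and its body is not shown. A minimal sketch of what such a helper might look like, assuming it simply percent-decodes the header value as UTF-8 with the standard java.net.URLDecoder (the class name HeaderValueCodec is made up):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical stand-in for the poster's CommonUtils helper.
public class HeaderValueCodec {

    /** Percent-decodes a value that the client produced with encodeURIComponent. */
    public static String decodeHeaderValue(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "%E4%B8%AD%E6%96%87" is the percent-encoded UTF-8 form of the two characters U+4E2D U+6587.
        String decoded = decodeHeaderValue("%E4%B8%AD%E6%96%87");
        System.out.println(decoded.equals("\u4E2D\u6587")); // prints true
    }
}
```

URLDecoder also turns a literal + into a space. That does not clash with encodeURIComponent, which sends + as %2B and a space as %20.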

JSP data to be downloaded to Excel sheet using ActiveQuery results in character problems

Downloading data using Active query from a JSP page with some parameters leads to character problems. Special characters in the German language, for example ö, ä and ß, are printed as Ã¶, Ã¤ and ÃŸ.
Debugging the JSP page in Java shows that the result returned by the JSP page is correct. So the problem seems to be caused by a conversion within Excel after the download, most probably because of an unsupported charset.
I tried converting the result string in the JSP to different charsets, but the problem persists.
Does anyone know a solution?
Thank you very much in advance!
Did you try setting the encoding of the page?
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF8" %>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
...
If you can't find a solution on the Microsoft side, I'd recommend this alternative here:
http://poi.apache.org/
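To see why the umlauts come out as two-character sequences, here is a small sketch of the usual cause: the bytes are UTF-8, but the reader decodes them with a single-byte charset. ISO-8859-1 stands in for whatever Excel actually uses, which is an assumption, and the class name MojibakeDemo is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class MojibakeDemo {

    /** Encodes text as UTF-8 and decodes the bytes with the wrong charset. */
    public static String misread(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    /** Reverses misread(), provided the wrong charset maps every byte losslessly. */
    public static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        // U+00F6 (o with diaeresis) is the two bytes C3 B6 in UTF-8.
        String garbled = misread("\u00f6", latin1);
        // Read as ISO-8859-1, C3 B6 becomes the two characters U+00C3 U+00B6.
        System.out.println(garbled.equals("\u00c3\u00b6"));            // prints true
        System.out.println(repair(garbled, latin1).equals("\u00f6"));  // prints true
    }
}
```

If this is what happens here, the data leaving the JSP is fine and the fix belongs on the reading side, for example by making the file's encoding explicit to Excel or by generating a real spreadsheet file.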

Issue in MetaKeyword,MetaDescription information in JSP using Java

<meta name="description" content="${metaDescription}" />
When the user is in the French culture and I view the page source, I see:
<meta name="description" content="Trouvez des pneus fiables et s�curitaires pour votre auto, VUS ou camionnette. Canadian Tire offre un grand choix de pneus d'hiver, toute saison et performants"/>
In place of �, it should be é.
I tried to put the equivalent UTF-8 code for é, but I got the same UTF-8 code in the page source.
Does anyone know what I've done wrong?
This normally indicates that you are looking at a UTF-8 encoded document using ASCII decoding. You might be missing the correct content type definition in your HTML file; try adding
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
to the <head> in the HTML document.
Hope that helps.
You need to set the JSP page encoding to the desired charset. Add the following to the top:
<%@page pageEncoding="UTF-8" %>
This will do two things:
It tells the server that it should treat the characters in JSP as UTF-8 by response.setCharacterEncoding("UTF-8").
It tells the browser that it should interpret the characters from the server as UTF-8 by response.setContentType("text/html;charset=UTF-8").
See also:
Unicode - How to get the characters right?
