Can't show certain UTF-8 characters in android webview

I know this problem has been documented elsewhere but the solutions don't seem to work for me. Other similar questions:
Android WebView with garbled UTF-8 characters.
Android WebView UTF-8 not showing
I'm essentially trying to show the minus-plus character (∓) in an Android WebView. I tested several other characters 'around' the minus-plus character in the Unicode table, but some of them didn't work either.
Here is the Java I'm using:
final WebView w = (WebView) findViewById(R.id.webview1);
w.getSettings().setJavaScriptEnabled(true);
w.getSettings().setDefaultTextEncodingName("utf-8");
InputStream is;
try {
    is = getAssets().open("test5.html");
    int size = is.available();
    byte[] buffer = new byte[size];
    is.read(buffer);
    is.close();
    String str = new String(buffer);
    w.loadData(str, "text/html; charset=utf-8", "utf-8");
} catch (IOException e) {
    e.printStackTrace();
}
Here is the HTML file, test5.html:
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
char: ∓ <br/>
char: ∔ <br/>
char: ∕ <br/>
char: ∖ <br/>
char: ∗ <br/>
char: ∘ <br/>
</body>
The only characters that show up are "∕" and "∗". I've also tried
w.loadDataWithBaseURL("file:///doesnotmatter", str, "text/html", "utf-8", "");
with no success. I'm not too familiar with input streams, so I don't know if there's something wrong there. Please help, it's taken me a while =\
-Teneth

When you call loadData(), pass the encoding as uppercase "UTF-8", because it is case sensitive and that's the standard representation according to the documentation.

Use HTML character codes for those characters in your HTML file:
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
char: &#8723;<br/>
char: &#8724;<br/>
char: &#8725;<br/>
char: &#8726;<br/>
char: &#8727;<br/>
char: &#8728;<br/>
</body>
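
If the page text is produced in code rather than edited by hand, the same idea of using HTML character codes can be applied programmatically. The following is a minimal sketch in plain Java, not something taken from the answer: the class and method names (NumericEntityEncoder, toNumericEntities) are hypothetical. It replaces every non-ASCII code point with a decimal numeric character reference, so the HTML handed to the WebView contains only ASCII.

```java
// Hypothetical helper for illustration; not part of Android or of the original answer.
class NumericEntityEncoder {

    /** Replaces each non-ASCII code point with a decimal reference such as &#8723;. */
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);               // plain ASCII is copied as is
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);            // skips both halves of a surrogate pair
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: char: &#8723;<br/>
        System.out.println(toNumericEntities("char: \u2213<br/>"));
    }
}
```

Iterating by code point rather than by char means characters outside the Basic Multilingual Plane come out as a single reference instead of two surrogate halves.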

Related

Decode AES encryption in web application in IBM Liberty server

I placed the AES-encrypted password in the Liberty server's jvm.options file as below:
-DencKey={aes}{aes}ANRib/ITz7RTc2YB+VXWZqINrjZ15vSBeg== ........
When I retrieve it in the Java application using System.getProperty("encKey"), I get the exact encoded value, not the decrypted one.
Do we have to decrypt it manually, or can the decrypted value be obtained through configuration?

First, you need to enable the passwordUtilities-1.0 feature:
<featureManager>
<feature>passwordUtilities-1.0</feature>
</featureManager>
Then, you can use the com.ibm.websphere.crypto.PasswordUtil API to decode the password:
String encodedPassword = System.getProperty("encKey");
String decodedPassword = PasswordUtil.decode(encodedPassword);

Your encoded string is malformed:
-DencKey={aes}{aes}ANRib/ITz7RTc2YB+VXWZqINrjZ15vSBeg==...
It should only have a single encryption preamble, like
-DencKey={aes}ANRib/ITz7RTc2YB+VXWZqINrjZ15vSBeg==...
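
A doubled preamble like the one above can be caught in code before the value is passed to a decoder. This is only a sketch in plain Java: EncodedValueCheck and its methods are hypothetical names, not part of Liberty, and collapsing the tags only repairs the textual form. Whether the remaining value then decodes depends on how it was produced.

```java
// Hypothetical helper for illustration; Liberty does not ship a class like this.
class EncodedValueCheck {
    private static final String AES_TAG = "{aes}";

    /** Returns true when the value starts with the {aes} tag more than once. */
    static boolean hasDuplicatePreamble(String value) {
        return value.startsWith(AES_TAG + AES_TAG);
    }

    /** Collapses repeated leading {aes} tags into a single one. */
    static String collapsePreamble(String value) {
        String result = value;
        while (result.startsWith(AES_TAG + AES_TAG)) {
            result = result.substring(AES_TAG.length());
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "{aes}{aes}ANRib/ITz7RTc2YB+VXWZqINrjZ15vSBeg==";
        System.out.println(hasDuplicatePreamble(raw));
        System.out.println(collapsePreamble(raw));
    }
}
```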

For people who find this while searching for how to decode the password, here are examples that decode it with the passwordUtilities feature:
https://github.com/TiloGit/liberty-pw-util/tree/master/dyn-pw
https://gitlab.com/pppoudel/public_shared/-/tree/master/WLibertyPwdUtil
Blog post about it:
https://purnapoudel.blogspot.com/2017/10/how-to-use-wlp-passwordutilities.html
It is important to have the encryption key set:
 <variable name="wlp.password.encryption.key" value="myEncKey123"/>
Code example:
<!DOCTYPE HTML>
<%@page language="java"
contentType="text/html; charset=ISO-8859-1" pageEncoding="ISO-8859-1"%>
<html>
<head>
<title>index</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
<%! String password="t3mp_pwD"; %>
<h2> This page shows how to use com.ibm.websphere.crypto.PasswordUtil to encrypt/decrypt password using xor or aes.</h2>
<hr/>
<h3> Provided plain text password is:<%= password %></h3>
Note: encryption key "replaceM3" is being used for "aes" encryption/decryption. <br/>
<br/>
<%! String xorEncodedVal=com.ibm.websphere.crypto.PasswordUtil.passwordEncode(password, "xor");
String aesEncodedVal=com.ibm.websphere.crypto.PasswordUtil.passwordEncode(password, "aes"); %>
<h3> xor encoded value is: <%= xorEncodedVal %> </h3>
<h3> aes encrypted value is: <%= aesEncodedVal %> </h3>
<h3> xor decoded value is: <% out.println(com.ibm.websphere.crypto.PasswordUtil.passwordDecode(xorEncodedVal)); %> </h3>
<h3> aes decrypted value is: <% out.println(com.ibm.websphere.crypto.PasswordUtil.passwordDecode(aesEncodedVal)); %> </h3>
</body>
</html>

Saving Chinese characters using Java HTMLEditorKit

I'm trying to save an HTMLDocument (saved with UTF-8 encoding) that contains the Chinese character 𠜎 using HTMLEditorKit in the following way:
try (OutputStreamWriter f = new OutputStreamWriter(fileOutputStream, "UTF-8")) {
    htmlEditorKit.write(f, htmlDocument, 0, htmlDocument.getLength());
} catch (BadLocationException e) {
    logger.error("Could not save", e);
}
In the output HTML document I'm getting two 2-byte character references (&#55361;&#57102;) instead of one 4-byte character. Java can work out which symbol it is by combining the two, but HTML can't.
Any suggestions on how to save it so that the HTML page is displayed correctly?
Here is the output HTML:
<html>
<head>
<meta content="text/html" charset="utf-8">
</head>
<body>
<p>&#55361;&#57102;</p>
</body>
</html>
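
The two references in that output are the UTF-16 surrogate halves of one code point, so one possible workaround is to post-process the written HTML and merge each pair into a single reference. The following is a sketch under that assumption, not a documented HTMLEditorKit option; SurrogateEntityMerger and mergeSurrogatePairs are hypothetical names. To use it, the kit would write into a StringWriter first, and the merged string would then be written to the file as UTF-8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; not part of Swing or HTMLEditorKit.
class SurrogateEntityMerger {
    // Two adjacent decimal references whose values lie roughly in the surrogate range;
    // the exact range check is done in code below.
    private static final Pattern PAIR = Pattern.compile("&#(5[56]\\d{3});&#(5[67]\\d{3});");

    /** Merges references such as &#55361;&#57102; into one reference (&#132878;). */
    static String mergeSurrogatePairs(String html) {
        Matcher m = PAIR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char high = (char) Integer.parseInt(m.group(1));
            char low = (char) Integer.parseInt(m.group(2));
            String replacement = m.group();                  // default: leave the text unchanged
            if (Character.isSurrogatePair(high, low)) {
                replacement = "&#" + Character.toCodePoint(high, low) + ";";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <p>&#132878;</p>
        System.out.println(mergeSurrogatePairs("<p>&#55361;&#57102;</p>"));
    }
}
```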

Matching multiline text using regular expression in java

My input sample is:
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 15">
<meta name=Originator content="Microsoft Word 15">
<link rel=File-List href="detailedFoot_files/filelist.xml">
What I want to do is select the whole opening html tag and replace it with something. So I am using the regular expression
<html.*>
If I use this regular expression with Pattern.DOTALL, the whole text input is replaced.
I can't figure out how to do it. Any kind of help will be appreciated.

This regex seems to capture what you're looking for.
pattern = "\<html[^>]*>?(.*)"
Sample Here

If you want to replace only the opening html tag, the following will replace it:
String replaced = Pattern.compile("<html[^>]+>", Pattern.DOTALL)
        .matcher(input).replaceFirst("my replacement for html tag");
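
For reference, here is a self-contained version of that approach that can be run as is. The class and method names are made up for illustration, and the input is a shortened form of the sample from the question. A negated character class such as [^>] also matches line breaks, so this sketch leaves out Pattern.DOTALL; the flag only changes the meaning of the dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the names are assumptions, not from the original answer.
class HtmlTagReplaceDemo {
    // [^>]* stops at the first '>', so the match cannot run past the opening tag,
    // even when the tag's attributes are spread over several lines.
    private static final Pattern OPENING_HTML_TAG = Pattern.compile("<html[^>]*>");

    static String replaceOpeningHtmlTag(String input, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement text literal
        return OPENING_HTML_TAG.matcher(input)
                .replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "<html xmlns:v=\"urn:schemas-microsoft-com:vml\"\n"
                + "xmlns=\"http://www.w3.org/TR/REC-html40\">\n"
                + "<head>\n";
        System.out.print(replaceOpeningHtmlTag(input, "<html>"));
    }
}
```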

Servlet doesn't parse uploaded file as UTF-8

I'm having trouble uploading a file and parsing it as UTF-8 strings. I use the following code:
protected void doPost(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
Part filePart = request.getPart("file");
InputStream filecontent = filePart.getInputStream();
// ...
}
And my webpage looks like this:
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<form action="UploadServlet" method="post" enctype="multipart/form-data">
<input type="file" name="file" />
<input type="submit" />
</form>
</body>
</html>
I found a great post about UTF-8 encoding in Java webapps, but unfortunately it didn't work for me. I still see random symbols in the strings in the NetBeans debugger, and when I display them on a webpage, most characters are displayed correctly but some Cyrillic letters (я, с, Н, А) get replaced by '�?'.

A file upload with an HTML form doesn't apply any character encoding. The file is transferred byte for byte, as is. See here under "multipart/form-data".
So if the original file at client side is a text file with UTF-8 character encoding, then on the server side it is also UTF-8.
Then you can use an InputStreamReader to decode the bytes as UTF-8 text:
InputStreamReader reader = new InputStreamReader(filecontent, "UTF-8");
That's it.
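
The question does not show how the bytes were turned into strings (the code is elided with "// ..."), so the following is only a guess at a likely cause. If each byte chunk is converted with new String(...) separately, a multi-byte character that straddles two chunks is corrupted, which would explain why only some Cyrillic letters break. A Reader keeps the incomplete bytes until the rest arrives. The class and method names in this sketch are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Illustrative helper; in the servlet, 'in' would be filePart.getInputStream().
class Utf8UploadReader {

    /** Decodes the whole stream through a Reader, so split characters survive. */
    static String readAsUtf8(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[4096];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    /** For contrast: decoding each byte chunk on its own breaks split characters. */
    static String readChunkByChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder text = new StringBuilder();
        byte[] chunk = new byte[chunkSize];
        int n;
        while ((n = in.read(chunk)) != -1) {
            text.append(new String(chunk, 0, n, StandardCharsets.UTF_8));
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "\u044f\u0441".getBytes(StandardCharsets.UTF_8); // "яс", two bytes each
        System.out.println(readAsUtf8(new ByteArrayInputStream(data)));
        // A 3-byte chunk cuts the second letter in half, so it is not decoded correctly.
        System.out.println(readChunkByChunk(new ByteArrayInputStream(data), 3));
    }
}
```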

javax.servlet.http.Part, which you use in the very first line of your code, has a getContentType() method that tells you the content type of the uploaded form data. Nothing you have written so far constrains the uploaded data to any particular character set, so you need to determine the character set and handle it accordingly.
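
One way to act on the value returned by getContentType() is to look for a charset parameter and fall back to a default when there is none, since the part's content type may omit it. This is a minimal sketch with hypothetical names (ContentTypeCharset, charsetOf); it does simple string splitting, not full MIME header parsing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative helper; pass it the string returned by Part.getContentType().
class ContentTypeCharset {

    /** Returns the charset named in the Content-Type value, or the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback;   // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=windows-1251", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/plain", StandardCharsets.UTF_8));
    }
}
```

The resulting Charset can be passed straight to the InputStreamReader constructor.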

Java check - charset, encoding of html page - like browsers do

How can I check the real charset/encoding of an HTML page?
For example, the declared charset of an HTML page is iso-8859-1, but the content is actually written in UTF-8:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
...
here is content with utf8
...
</html>
How can I check this? Is it possible to detect the charset/encoding of an HTML page in Java, the way browsers do?
Thank you!
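
For the specific case described here (a page declared as iso-8859-1 that actually contains UTF-8), one simple heuristic is to test whether the raw bytes are well-formed UTF-8 and to trust the declared charset only when they are not. This is a sketch of that heuristic with hypothetical names (CharsetGuess, isValidUtf8, guess). It is not the algorithm browsers implement, and it cannot tell pure ASCII content apart, which is valid in both encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative heuristic only; the names are assumptions.
class CharsetGuess {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Prefers UTF-8 when the bytes are valid UTF-8, otherwise keeps the declared charset. */
    static Charset guess(byte[] data, Charset declared) {
        return isValidUtf8(data) ? StandardCharsets.UTF_8 : declared;
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "\u044f".getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(utf8Bytes, StandardCharsets.ISO_8859_1));
        System.out.println(guess(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```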
