Converting a byte object into a String that supports umlaut characters - Java

Data is coming from an IBM mainframe interface. I need to convert the object message into a String for further processing.
public EventBody processICMessage(final Object incomingMsg) throws Exception
{
String inMsg = "";
if (incomingMsg instanceof String)
{
inMsg = (String) incomingMsg;
}
else
{
byte[] incomingMsgArr = (byte[]) incomingMsg;
inMsg = new String(incomingMsgArr, "UTF-8");
}
// ... further processing of inMsg (substring operations), then return an EventBody
}
Originally the encoding was Cp1047, which could not handle umlaut characters. The data contains German umlaut characters such as ä, c̈, p̈, etc. To support them we changed the encoding to UTF-8, but that results in blank data in the 'inMsg' String, so the subsequent substring operations throw an ArrayIndexOutOfBoundsException.
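Mainframe messages normally arrive as EBCDIC bytes, and decoding EBCDIC bytes as UTF-8 produces garbage (or U+FFFD replacement characters) rather than text. Since Cp1047 already covers the Latin-1 repertoire (including ä, ö, ü), one possible explanation for the broken umlauts is that the host uses a national EBCDIC code page instead. The sketch below assumes IBM273 (German EBCDIC); that choice is an assumption, so check with the mainframe team which CCSID is really used. `MainframeDecode` and `toText` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MainframeDecode {

    // Assumption: the host sends EBCDIC in code page IBM273 (German).
    // Replace it with the CCSID the mainframe actually uses (e.g. IBM01141 or IBM1047).
    static final Charset HOST_CHARSET = Charset.forName("IBM273");

    // Returns the message as text: Strings pass through unchanged, byte arrays are
    // decoded with the given charset, anything else is rejected.
    static String toText(Object incomingMsg, Charset charset) {
        if (incomingMsg instanceof String) {
            return (String) incomingMsg;
        }
        if (incomingMsg instanceof byte[]) {
            return new String((byte[]) incomingMsg, charset);
        }
        throw new IllegalArgumentException("Unsupported message type: "
                + (incomingMsg == null ? "null" : incomingMsg.getClass().getName()));
    }

    public static void main(String[] args) {
        // Simulate a host message: "Grüße" encoded in IBM273 (one byte per character).
        byte[] hostBytes = "Gr\u00fc\u00dfe".getBytes(HOST_CHARSET);

        String right = toText(hostBytes, HOST_CHARSET);
        String wrong = toText(hostBytes, StandardCharsets.UTF_8);

        System.out.println(right.equals("Gr\u00fc\u00dfe")); // true
        System.out.println(wrong.equals("Gr\u00fc\u00dfe")); // false
    }
}
```

Because EBCDIC code pages are single-byte, the decoded String has exactly one character per input byte, which keeps fixed-position substring logic aligned with the record layout.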

Related

How to encode Japanese characters in JavaMail

So basically I'm trying to send an email with Japanese characters, something like "𥹖𥹖𥹖", but I get "???" instead. What should I do to encode this? I have looked at a bunch of solutions, but none of them solved this.
Here's the method I've been using to do the encoding:
public String encoding(String str) throws UnsupportedEncodingException{
String Encoding = "Shift_JIS";
return this.changeCharset(str, Encoding);
}
public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
if (str != null) {
byte[] jis = str.getBytes("Shift_JIS");
return new String(jis, newCharset);
}
return null;
}
You're making this too complicated...
First, make sure you have the Japanese text in a proper Java String object, using proper Unicode characters.
Then, set the content of the body part using this method:
htmlPart.setText(japaneseString, "Shift_JIS", "html");
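One likely cause of the "???": 𥹖 (U+25E56) is a CJK Extension B character outside the Basic Multilingual Plane, and Shift_JIS (based on JIS X 0208) has no mapping for it, so the encoder substitutes '?'. A sketch that checks with `CharsetEncoder.canEncode` before choosing the charset; `chooseCharset` is a hypothetical helper, and falling back to UTF-8 is an assumption about what the recipients' mail clients can display.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MailCharsetChooser {

    // Use Shift_JIS only if every character of the text is representable in it;
    // otherwise fall back to UTF-8, which can encode any Unicode string.
    static String chooseCharset(String text) {
        Charset sjis = Charset.forName("Shift_JIS");
        return sjis.newEncoder().canEncode(text) ? "Shift_JIS" : StandardCharsets.UTF_8.name();
    }

    public static void main(String[] args) {
        String common = "\u65E5\u672C\u8A9E";                   // 日本語
        String rare = new String(Character.toChars(0x25E56));  // 𥹖 (CJK Extension B)
        System.out.println(chooseCharset(common)); // Shift_JIS
        System.out.println(chooseCharset(rare));   // UTF-8
    }
}
```

The result could then be passed as the charset argument, e.g. `htmlPart.setText(japaneseString, chooseCharset(japaneseString), "html")`.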

How to convert a string into another string that does not contain specific characters?

I have a constraint: I cannot save some characters (like & and =) in a special storage.
The problem is that I have strings (user input) containing these disallowed characters, and I'd like to save them to that storage.
I'd like to convert such a string into another string that doesn't contain these special characters.
I'd still like to be able to convert back to the original string without ambiguity.
Any idea how to implement the conversion and its inverse? Thanks.
Convert the user input to hex and save that, then convert the hex value back to a string when reading it. Use these methods:
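import java.nio.charset.StandardCharsets;
import java.util.HexFormat; // Java 17+; javax.xml.bind.DatatypeConverter is no longer in the JDK since Java 11
public static String stringToHex(String arg) {
// Two hex digits per UTF-8 byte; BigInteger + "%x" would drop leading zeros (e.g. "\n" -> "a")
return HexFormat.of().formatHex(arg.getBytes(StandardCharsets.UTF_8));
}
public static String hexToString(String arg) {
byte[] bytes = HexFormat.of().parseHex(arg);
return new String(bytes, StandardCharsets.UTF_8);
}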
Usage:
String h = stringToHex("Perera & Sons");
System.out.println(h);
System.out.println(hexToString(h));
OUTPUT
506572657261202620536f6e73
Perera & Sons
As already pointed out in the comments, URL encoding looks like the way to go.
In Java this is done simply with URLEncoder and URLDecoder:
String encoded = URLEncoder.encode("My string &with& illegal = characters ", "UTF-8");
System.out.println("Encoded String:" + encoded);
String decoded = URLDecoder.decode(encoded, "UTF-8");
System.out.println("Decoded String:" + decoded);
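URL encoding also satisfies the "no ambiguity" requirement, because URLEncoder escapes '%' and '+' themselves (as %25 and %2B), so every stored string decodes to exactly one original. A small round-trip sketch; `toStorage` and `fromStorage` are hypothetical names, and the `Charset` overloads used here require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodingRoundTrip {

    // Encode before saving: '&' -> %26, '=' -> %3D, ' ' -> '+', '%' -> %25, '+' -> %2B.
    static String toStorage(String userInput) {
        return URLEncoder.encode(userInput, StandardCharsets.UTF_8);
    }

    // Decode after loading; the inverse of toStorage.
    static String fromStorage(String stored) {
        return URLDecoder.decode(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "My string &with& illegal = characters ";
        String stored = toStorage(input);
        System.out.println(stored);                            // My+string+%26with%26+illegal+%3D+characters+
        System.out.println(fromStorage(stored).equals(input)); // true
    }
}
```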

Java convert encoding

I have a string, originally an XML tag, that contains mojibake:
<Applicant_Place_Born>Ð&#156;Ð¾Ñ&#129;ÐºÐ²Ð°</Applicant_Place_Born>
I know that exactly the same string but in correct encoding is:
<Applicant_Place_Born>Москва</Applicant_Place_Born>
I know this because I can convert it into the proper string with a Tcl utility:
# The original string
set s "Ð&#156;Ð¾Ñ&#129;ÐºÐ²Ð°"
# substituting the html escapes
set t "Ð\x9cÐ¾Ñ\x81ÐºÐ²Ð°"
# decode from utf-8 into Unicode
encoding convertfrom utf-8 "Ð\x9cÐ¾Ñ\x81ÐºÐ²Ð°"
Москва
I tried different variations of this:
System.out.println(new String(original.getBytes("UTF-8"), "CP1251"));
but I always got other mojibakes or question marks instead of characters.
Q: How can I do the same as Tcl does but using Java code?
EDIT:
I have tried @Joop Eggen's approach:
import org.apache.commons.lang3.StringEscapeUtils;
import java.nio.charset.StandardCharsets;
public class s {
static String s;
public static void main(String[] args) {
try {
System.setProperty("file.encoding", "CP1251");
System.out.println("JVM encoding: " + System.getProperty("file.encoding"));
s = "Ð&#156;Ð¾Ñ&#129;ÐºÐ²Ð°";
System.out.println("Original text: " + s);
s = StringEscapeUtils.unescapeHtml4(s);
byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
s = new String(b, "UTF-16BE");
System.out.println("Result: " + s);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
The converted string came out as something that looks like Korean (Hangul) text:
JVM encoding: CP1251
Original text: Ð&#156;Ð¾Ñ&#129;ÐºÐ²Ð°
Result: 킜킾톁킺킲킰
A String in Java should always contain correct Unicode. In your case you seem to have UTF-16BE interpreted as some single-byte encoding.
A patch would be
String string = StringEscapeUtils.unescapeHtml4(s);
byte[] b = string.getBytes(StandardCharsets.ISO_8859_1);
string = new String(b, "UTF-16BE");
Now string should be a correct Unicode String.
System.out.println(string);
If the operating system uses Cp1251, for instance, the Cyrillic text should be displayed correctly.
The characters in s are actually bytes of UTF-16BE, I guess.
By getting the bytes of the string in a single-byte encoding, hopefully no conversion takes place.
Then make a String of the bytes as being in UTF-16BE, internally converted to Unicode (actually UTF-16BE too).
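You were pretty close. However, getBytes is used to encode rather than decode: each mojibake character is one byte of the UTF-8 sequence read as an ISO-8859-1 character, so get the bytes back with ISO-8859-1 and then decode them as UTF-8. Something along the lines of:
import java.nio.charset.StandardCharsets;
String string = "Ð\u009cÐ¾Ñ\u0081ÐºÐ²Ð°"; // Java has no \x escapes; use \u instead
byte[] bytes = string.getBytes(StandardCharsets.ISO_8859_1);
System.out.println(new String(bytes, StandardCharsets.UTF_8));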
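The two Tcl steps (substitute the HTML escapes, then `encoding convertfrom utf-8`) can be reproduced in plain Java. This is a sketch assuming the escapes are numeric character references such as &#156; or &#x9C;; `unescapeNumericRefs` is a hypothetical stand-in for `StringEscapeUtils.unescapeHtml4` that handles only numeric references. The byte-level step relies on ISO-8859-1 mapping U+0000..U+00FF one-to-one onto bytes, so `getBytes(ISO_8859_1)` recovers the original UTF-8 bytes. The Hangul output in the question fits this picture: reading the same 12 bytes as UTF-16BE pairs D0 9C into U+D09C (킜), and so on.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixMojibake {

    private static final Pattern NUMERIC_REF = Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // Hypothetical helper: replaces &#NNN; and &#xHH; with the character they denote.
    // Named entities such as &amp; are deliberately not handled.
    static String unescapeNumericRefs(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ref = m.group(1);
            int cp = ref.startsWith("x") ? Integer.parseInt(ref.substring(1), 16) : Integer.parseInt(ref);
            m.appendReplacement(out, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Java counterpart of Tcl's "encoding convertfrom utf-8": treat each char as one byte,
    // then decode those bytes as UTF-8.
    static String convertFromUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String mojibake = "\u00d0&#156;\u00d0\u00be\u00d1&#129;\u00d0\u00ba\u00d0\u00b2\u00d0\u00b0";
        String fixed = convertFromUtf8(unescapeNumericRefs(mojibake));
        System.out.println(fixed.equals("\u041c\u043e\u0441\u043a\u0432\u0430")); // true ("Москва")
    }
}
```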

Java Convert strange string to Burmese language string

Hi, my example code is:
String ln="á€á€­á€•á€¹á€•á€¶á€”á€²á€·";
try {
byte[] b = ln.getBytes("UTF-8");
String s = new String(b, "US-ASCII");
System.out.println(s);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
When I run it, it does not print Burmese. Is there a solution for that? Thanks
The real problem is that the server is sending back content either with the wrong charset, or double-encoded. If at all possible, you should get that fixed.
In the meantime, you have the right idea—converting the mis-encoded text to the correct charset.
Each character in your String was apparently supposed to be a single byte which was part of a UTF-8 byte sequence. What you're actually seeing is each of those single bytes being treated as a character in the Windows cp1252 charset, and converted to a Java char accordingly.
So, you first want to convert the chars from cp1252 back into the proper bytes:
byte[] b = ln.getBytes("cp1252");
Now you have a true UTF-8 byte sequence, which you can convert into the proper String:
String s = new String(b, StandardCharsets.UTF_8);
// In Java 6, you must use:
//String s = new String(b, "UTF-8");
You should never use US-ASCII if you are decoding, or trying to generate, Burmese characters, or any non-English characters. ASCII consists of codepoints 0 through 127 only.
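The cp1252 repair described above can be checked with a round trip: produce the mojibake deliberately, then undo it. A sketch; `repair` is a hypothetical helper name. Note that cp1252 leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); if the server's garbled output contained any of those bytes, they may not survive the round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    // Undo "UTF-8 bytes decoded as cp1252": re-encode with the wrong charset to
    // recover the original bytes, then decode those bytes as UTF-8.
    static String repair(String mojibake, Charset wrongCharset) {
        byte[] original = mojibake.getBytes(wrongCharset);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String burmese = "\u1015\u1039\u1015\u1036\u1014\u1032\u1037"; // ပ္ပံနဲ့

        // Simulate the server's mistake: UTF-8 bytes read back as cp1252.
        String garbled = new String(burmese.getBytes(StandardCharsets.UTF_8), cp1252);

        System.out.println(repair(garbled, cp1252).equals(burmese)); // true
    }
}
```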

Sending a non-Latin query string in a URL in Java ME

I want to make an HTTP GET request from my J2ME application using the HttpConnection class.
The problem is that I cannot send Russian text in the query string.
Here is the example of how I'm sending the request
c = (HttpConnection)Connector.open("http://127.0.0.1:1418/zp.ashx?тест");
InputStream s = c.openInputStream();
The receiving ASP.NET script receives the query part of the URL as %3f%3f%3f%3f
That is four identical codes (%3f is an encoded '?'), which is definitely not what I'm sending.
So how can I send non-latin text in an http query in J2ME?
Thank you in advance
Your code
Connector.open("http://127.0.0.1:1418/zp.ashx?тест");
is converted to bytes using the ASCII character set, and every character that ASCII cannot represent is replaced with a replacement character, here '?' (which is why the server sees %3f).
To get the behavior you want, you have to encode the URL before sending it. For example, when your server expects the URLs to be UTF-8-encoded:
String encodedParameter = URLEncoder.encode("тест", "UTF-8");
Connector.open("http://127.0.0.1:1418/zp.ashx?" + encodedParameter);
Note that if you have multiple parameters, you have to encode the parameter names and the parameter values individually, before joining each name and value with "=" and concatenating the pairs with "&". This class may help with that:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
public class UrlParamGenerator {
private final String encoding;
private final StringBuilder sb = new StringBuilder();
private String separator = "?";
public UrlParamGenerator(String charset) {
this.encoding = charset;
}
public void add(String key, String value) throws UnsupportedEncodingException {
sb.append(separator);
sb.append(URLEncoder.encode(key, encoding));
sb.append("=");
sb.append(URLEncoder.encode(value, encoding));
separator = "&";
}
@Override
public String toString() {
return sb.toString();
}
public static void main(String[] args) throws UnsupportedEncodingException {
UrlParamGenerator gen = new UrlParamGenerator("UTF-8");
gen.add("test", "\u0442\u0435\u0441\u0442");
gen.add("x", "0");
System.out.println(gen.toString());
}
}
You might need to explicitly set a character set that supports the Cyrillic alphabet in the HTTP header. You could use either UTF-8 or another charset such as windows-1251 (although UTF-8 should be the preferred choice). The connection has to be opened before its request properties can be set:
c = (HttpConnection)Connector.open("http://127.0.0.1:1418/zp.ashx?тест");
c.setRequestProperty("Content-type", "application/x-www-form-urlencoded;charset=utf-8");
If you use an appropriate charset, the server should be able to handle the Cyrillic request parameter properly, provided it also supports this charset.
A URL can contain only ASCII letters, digits and a few punctuation characters. Any other character must be %-encoded before it is added to the URL. Use URLEncoder.encode("тест", enc), where enc is the encoding scheme that the server expects.
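URLEncoder belongs to Java SE; as far as I know CLDC/MIDP does not include the java.net package, so on a real J2ME device you may have to percent-encode by hand. A minimal sketch, assuming the server decodes the query string as UTF-8; `PercentEncoder.encode` is a hypothetical helper that sticks to APIs CLDC also offers (`String.getBytes(String)` and `StringBuffer`).

```java
import java.io.UnsupportedEncodingException;

public class PercentEncoder {

    private static final String HEX = "0123456789ABCDEF";

    // Percent-encodes s as UTF-8 bytes. RFC 3986 unreserved characters
    // (A-Z a-z 0-9 - . _ ~) are kept as-is; every other byte becomes %XX.
    static String encode(String s) {
        byte[] bytes;
        try {
            bytes = s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("UTF-8 not supported");
        }
        StringBuffer out = new StringBuffer(bytes.length * 3);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF; // treat the byte as unsigned
            boolean unreserved = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '-' || b == '.' || b == '_' || b == '~';
            if (unreserved) {
                out.append((char) b);
            } else {
                out.append('%').append(HEX.charAt(b >> 4)).append(HEX.charAt(b & 0x0F));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "тест" is D1 82 D0 B5 D1 81 D1 82 in UTF-8
        System.out.println(encode("\u0442\u0435\u0441\u0442")); // %D1%82%D0%B5%D1%81%D1%82
    }
}
```

On the device this would be used as `Connector.open("http://127.0.0.1:1418/zp.ashx?" + PercentEncoder.encode("тест"))`.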
