Unable to decode Russian string with encodeURIComponent and java.net.URLDecoder.decode - java

I have a Russian string "этикетка" that I need to send to a web service. Before sending it, I use encodeURIComponent to encode the string, which gives me:
'%D1%8D%D1%82%D0%B8%D0%BA%D0%B5%D1%82%D0%BA%D0%B0'
On the web service side I receive the string and decode it using the following code:
String strLbl = java.net.URLDecoder.decode(label);
but I don't get the string back properly. The text is corrupted and I get ѿтикетка.
Can you please suggest how I can overcome this, or what the ideal way to send a Russian string is?
Thanks and regards

As explained in the link given by NULL, decode(string) is now deprecated in favour of decode(string, encoding).
I would guess that the encoding and decoding steps are not using the same character encoding.
Did you try to force UTF-8 on both sides?
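A minimal sketch of forcing UTF-8 on the Java side, using the two-argument URLDecoder.decode. The class and method names (UrlDecodeSketch, decodeUtf8) are made up for illustration; the input is the encoded value from the question. encodeURIComponent always emits UTF-8 bytes, so the decoder has to be told to read them as UTF-8 instead of falling back to the platform default.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, not part of the original service code.
class UrlDecodeSketch {

    // Decodes a percent-encoded value, reading the bytes as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java runtime, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String label = "%D1%8D%D1%82%D0%B8%D0%BA%D0%B5%D1%82%D0%BA%D0%B0";
        String decoded = decodeUtf8(label);
        // Compare against the expected text written with Unicode escapes,
        // so the check does not depend on the source file encoding.
        System.out.println(decoded.equals("\u044d\u0442\u0438\u043a\u0435\u0442\u043a\u0430"));
    }
}
```

If the decoded string is correct here but still looks wrong later, the remaining suspect is wherever the string is written out (console, response, database), since that step applies its own encoding.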

I misunderstood your question because of its formatting.
Use decodeURIComponent to decode url encoded strings in JavaScript:
> decodeURIComponent('%D1%8D%D1%82%D0%B8%D0%BA%D0%B5%D1%82%D0%BA%D0%B0')
"этикетка";

Related

Convert com.google.protobuf.ByteString to String

I have a Bytestring that I need to display to the console in java.
The Bytestring is of type com.google.protobuf.ByteString,
I am using:
System.out.println(myByteString);
however, when it is printed out in the terminal it is in this form:
\n\325\a\nk\b\003\032\v\b\312\371\336\343\005\020\254\200\307S\
How can I display the string in ASCII characters instead of this encoding?
I have tried using System.out.println(myByteString.toString());
Thanks
Try
System.out.println(myByteString.toString("UTF-8"));
or whatever encoding you are using.
Check out this link:
Google Developers: Class ByteString
If you are sure that the bytes are UTF-8, you can simply use
myByteString.toStringUtf8()
or, if you are not sure about the charset, refer to this page and use something similar to Luk's answer:
myByteString.toString("US-ASCII")
You need to call (Base64 here is the Android android.util.Base64 class):
Base64.encodeToString(myByteString.toByteArray(), Base64.DEFAULT)

How to decode a text from a charset and convert to another charset in Java?

I am using the JavaMail API to retrieve emails from Gmail via IMAP and show them in a web page powered by AngularJS.
When I get the data for an email using javax.mail.Message.getContent(), it is returned as an Object with the charset gb2312.
But my web page uses the UTF-8 charset, so some emails show strange characters in the page.
I need to convert from gb2312 (or any other charset) to UTF-8 so the text displays correctly in the web page.
Can anyone help with this?
You can create a new String like this and convert it to UTF-8:
String s = new String(bytes, "OriginalCharset");
byte[] utfBytes = s.getBytes("UTF-8");
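Note that a Java String is held internally as UTF-16, not UTF-8, and the no-argument getBytes() uses the platform default charset, so it is better to name the charset explicitly on both calls.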
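The two lines above can be wrapped into a small helper. This is only a sketch: TranscodeSketch and transcode are hypothetical names, and the demo uses ISO-8859-1 as the source charset because it is guaranteed to exist on every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class TranscodeSketch {

    // Decodes the bytes with the charset they were written in,
    // then re-encodes the resulting text in the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        // "Zürich" encoded as ISO-8859-1: the u-umlaut is the single byte 0xFC.
        byte[] latin1 = {0x5A, (byte) 0xFC, 0x72, 0x69, 0x63, 0x68};
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // In UTF-8 the u-umlaut takes two bytes (0xC3 0xBC), so the array grows by one.
        System.out.println(latin1.length + " -> " + utf8.length);
    }
}
```

For the mail case the source charset would be obtained with Charset.forName("GB2312"), assuming the runtime ships that charset. If the content is already a String (as getContent() often returns for text parts), it has already been decoded and only the final getBytes step is needed when writing the response.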

Play Framework - receiving email through SendGrid - character encoding of email body

I am developing a small mail client in the Java Play Framework and I'm using SendGrid for the e-mails. When an e-mail is received, it gets posted to a URL and I then parse the posted form using JsonNode. The "to", "from" and "subject" fields of that form are automatically converted by SendGrid to UTF-8. The problem is that the email message body is apparently encoded in "ISO-8859-1", and I need to convert that String to "UTF-8". I have already tried several ways of doing so, but most probably I'm doing something very wrong, since I always get strange characters for French or German words containing accents or umlauts (for example, "Zürich" comes out as "Z?rich"). The code I'm using for the conversion is the following:
byte[] msg = message.getBytes("ISO-8859-1");
byte[] msg_utf8 = new String(msg, "ISO-8859-1").getBytes("UTF-8");
message = new String(msg_utf8, "UTF-8");
Could you, please, suggest a solution? Thank you very much in advance!
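A sketch of why the three lines above cannot repair the text, assuming the body had already been decoded into a String with the wrong charset before this code ran. RoundTripSketch and its method names are hypothetical. Encoding a String and decoding it again with the same charset returns the same String, so the conversion is a no-op for intact text, and a character that was already lost (shown here as U+FFFD) can only degrade further into "?".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration only.
class RoundTripSketch {

    // The same three steps as in the question, with explicit charset constants.
    static String questionConversion(String message) {
        byte[] msg = message.getBytes(StandardCharsets.ISO_8859_1);
        byte[] msgUtf8 = new String(msg, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.UTF_8);
        return new String(msgUtf8, StandardCharsets.UTF_8);
    }

    // Decoding the raw bytes with the charset the sender actually used.
    static String decodeBody(byte[] rawBody) {
        return new String(rawBody, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Intact text passes through unchanged.
        System.out.println(questionConversion("Z\u00fcrich").equals("Z\u00fcrich"));
        // A replacement character has no ISO-8859-1 byte, so it becomes '?'.
        System.out.println(questionConversion("Z\uFFFDrich"));
        // Raw ISO-8859-1 bytes decode correctly when the right charset is named.
        byte[] raw = {0x5A, (byte) 0xFC, 0x72, 0x69, 0x63, 0x68};
        System.out.println(decodeBody(raw).equals("Z\u00fcrich"));
    }
}
```

The practical consequence is that the fix has to happen where the bytes first become a String, which is why access to the raw request body matters.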
Ok so I managed to get the raw byte request from SendGrid using the annotation and created the java String with the correct encodings:
@BodyParser.Of(BodyParser.Raw.class)
public static Result getmail() {
...
}
Now the problem is that for retrieving the file attachments from the request I would need the request to be parsed as MultipartFormData. With the annotation above set, I get a NullPointerException when calling, which was predictable:
request().body().asMultipartFormData().getFiles()
Does anyone have an idea how I could get the same request again, but parsed with @BodyParser.Of(BodyParser.MultipartFormData.class)? I need to either combine the two annotations or find a way to convert the byte[] I get from the Raw parser to a MultipartFormData. Thanks!

Converting from Java String to Windows-1252 Format

I want to send a URL request, but the parameter values in the URL can have french characters (eg. è). How do I convert from a Java String to Windows-1252 format (which supports the French characters)?
I am currently doing this:
String encodedURL = new String (unencodedUrl.getBytes("UTF-8"), "Windows-1252");
However, it makes:
param=Stationnement extèrieur into param=Stationnement extérieur .
How do I fix this? Any suggestions?
Edit for further clarification:
The user chooses values from a drop down. When the language is French, the values from the drop down sometimes include French characters, like 'è'. When I send this request to the server, it fails, saying it is unable to decipher the request. I have to figure out how to send the 'è' as a different format (preferably Windows-1252) that supports French characters. I have chosen to send as Windows-1252. The server will accept this format. I don't want to replace each character, because I could miss a special character, and then the server will throw an exception.
Use URLEncoder to encode parameter values as application/x-www-form-urlencoded data:
String param = "param="
+ URLEncoder.encode("Stationnement ext\u00e8rieur", "cp1252");
See here for an expanded explanation.
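To make the difference concrete, here is a small sketch that encodes the parameter value from the question with two charsets. UrlEncodeSketch and encode are hypothetical names. The point is that the charset controls which bytes get percent-encoded, whereas new String(bytes, charset) only reinterprets bytes and never produces a URL-safe value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical class for illustration only.
class UrlEncodeSketch {

    // Percent-encodes a parameter value using the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "Stationnement ext\u00e8rieur";
        // windows-1252 stores e-grave as the single byte 0xE8.
        System.out.println("param=" + encode(value, "windows-1252")); // param=Stationnement+ext%E8rieur
        // UTF-8 stores the same character as the two bytes 0xC3 0xA8.
        System.out.println("param=" + encode(value, "UTF-8"));        // param=Stationnement+ext%C3%A8rieur
    }
}
```

Which of the two the server accepts depends on how it decodes the query string, so the charset has to match on both sides.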
Try using
String encodedURL = new String (unencodedUrl.getBytes("UTF-8"), Charset.forName("Windows-1252"));
As per McDowell's suggestion, I tried encoding with:
URLEncoder.encode("stringValueWithFrenchCharacters", "cp1252") but it didn't work perfectly. I replaced "cp1252" with HTTP.ISO_8859_1 because I believe Android does not support Windows-1252 yet. It does allow ISO_8859_1, and after reading here, this supports MOST of the French characters, with the exception of 'Œ', 'œ', and 'Ÿ'.
So doing this made it work:
URLEncoder.encode(frenchString, HTTP.ISO_8859_1);
Works perfectly!

Java+HtmlUnit — problem with cyrillic urlencode

I am trying to send some HTTP POST parameters to a web server, and one of the parameters contains Cyrillic characters. The problem is that if I use this code:
wc.getPage(requestSettings);
requestSettings.setHttpMethod(HttpMethod.POST);
requestSettings.setRequestParameters(new ArrayList());
requestSettings.getRequestParameters().add(new NameValuePair("username", "Друже бобер"));
wc.getPage(requestSettings);
The server will receive the following URL-encoded parameter:
And this is the wrongly decoded string "Друже бобер".
So I think that HtmlUnit encodes the URL internally using ASCII rather than Unicode. How can I disable URL encoding, or how can I fix this bug? If I encode the string myself and pass it to NameValuePair, all the percent characters get encoded by HtmlUnit too.
I think you need to set the charset using the setCharset method.
