Convert com.google.protobuf.ByteString to String - java

I have a ByteString that I need to display on the console in Java.
The ByteString is of type com.google.protobuf.ByteString.
I am using:
System.out.println(myByteString);
However, when it is printed in the terminal it appears in this form:
\n\325\a\nk\b\003\032\v\b\312\371\336\343\005\020\254\200\307S\
How can I display the string in ASCII characters instead of this encoding?
I have also tried System.out.println(myByteString.toString());
Thanks

Try
System.out.println(myByteString.toString("UTF-8"));
or whatever encoding you are using.
Check out this link:
Google Developers: Class ByteString

If you are sure that you will be using UTF-8, you can simply use
myByteString.toStringUtf8()
If you are not sure about the charset, refer to this page and use something similar to Luk's answer:
myByteString.toString("US-ASCII")

You need to call (this is Android's android.util.Base64):
Base64.encodeToString(myByteString.toByteArray(), Base64.DEFAULT)
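The answers above differ in how the bytes are rendered. Here is a minimal sketch in plain Java, assuming the bytes have already been obtained with myByteString.toByteArray(). The class name ByteDisplay and its method names are hypothetical, and java.util.Base64 stands in for the Android Base64 class used in the last answer:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ByteDisplay {
    // Decode the bytes as UTF-8 text. Only meaningful if the bytes really are UTF-8 text.
    public static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Render arbitrary binary data as printable ASCII using Base64.
    public static String asBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Stand-in for myByteString.toByteArray() (assumption for illustration).
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(asUtf8(bytes));
        System.out.println(asBase64(bytes));
    }
}
```

The escaped output in the question looks like binary data rather than text. If that is the case, decoding it with any charset will still look garbled, and a Base64 or hex rendering is probably the more useful one.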

Related

How to get UTF-8 conversion for a string

Frédéric is converted to FrÃ©dÃ©ric in Java.
However, I need to pass the proper string to my client.
How can I achieve this in Java?
I tried:
String a = "FrÃ©dÃ©ric";
String b = new String(a.getBytes(), "UTF-8");
However, string b contains the same value as a.
I expect the string to hold the value Frédéric.
How can I pass this value properly to the client?
If I understand the question correctly, you're looking for a function that will repair strings that have been damaged by others' encoding mistakes?
Here's one that seems to work on the example you gave:
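static String fix(String badInput) {
    // Recover the raw bytes that were wrongly decoded as cp1252...
    byte[] bytes = badInput.getBytes(Charset.forName("cp1252"));
    // ...and decode them as the UTF-8 they really were.
    return new String(bytes, Charset.forName("UTF-8"));
}
// fix("FrÃ©dÃ©ric") returns "Frédéric"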
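To see why this repair works, it helps to reproduce the damage first. The following is only a sketch: the class name MojibakeDemo and the damage/repair method names are made up for illustration, and it assumes the running JDK provides the windows-1252 charset, which is not one of the guaranteed standard charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Assumption: windows-1252 is available in this JDK.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulate the mistake: UTF-8 bytes misread as windows-1252.
    public static String damage(String good) {
        return new String(good.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Undo the mistake: re-encode as windows-1252, then decode as UTF-8.
    public static String repair(String bad) {
        return new String(bad.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String original = "Fr\u00e9d\u00e9ric";
        String damaged = damage(original);
        System.out.println(repair(damaged).equals(original));
    }
}
```

The round trip can be lossy: a few byte values are undefined in windows-1252, so characters whose UTF-8 encoding contains one of those bytes may not survive the damage step and cannot be repaired afterwards.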
The answer is quite complicated. See http://www.joelonsoftware.com/articles/Unicode.html for basic understanding.
My first suggestion would be to save your Java file as UTF-8. The default for Eclipse on Windows is cp1252, which might be your problem. Hope I could help.
Find your language code here and use that.
String a = new String(yourString.getBytes(), YOUR_ENCODING);
You can also try:
String a = URLEncoder.encode(yourString, HTTP.YOUR_ENCODING);
If System.out.println("Frédéric") shows garbled output on the console, it is most likely that the encoding used in your source code (seems to be UTF-8) is not the same as the one used by the compiler, which by default is the platform encoding, so probably some flavor of ISO-8859. Try compiling your source with javac -encoding UTF-8 (or set the appropriate property of your build environment) and you should be OK.
If you are sending this to some other piece of client software it's most likely an encoding issue on the client-side.

Unable to decode Russian string with encodeURIComponent and java.net.URLDecoder

I have a Russian string "этикетка" that needs to be sent to a web service. Before sending it, I use encodeURIComponent to encode the string, which gives me:
'%D1%8D%D1%82%D0%B8%D0%BA%D0%B5%D1%82%D0%BA%D0%B0'
On the web service side I receive the string and decode it using the following code:
String strLbl = java.net.URLDecoder.decode(label);
but I don't get the string back properly. It loses its formatting and I get ѿтикетка.
Can you please suggest how I can overcome this, or what the ideal way to send a Russian string is?
Thanks and regards
As explained in the link given by NULL, decode(string) is now deprecated in favour of decode(string, encoding).
I would guess that the encoding and decoding methods are not using the same code page.
Did you try to force UTF-8 during both processes?
I misunderstood your question because of its formatting.
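Forcing UTF-8 on the Java side could look like the sketch below. It relies on encodeURIComponent always producing UTF-8 percent-escapes, and uses the two-argument URLDecoder.decode(String, String) overload. The class name UrlCodecDemo and its helper methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlCodecDemo {
    // Decode percent-escapes, interpreting the resulting bytes as UTF-8.
    public static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Encode a string as UTF-8 percent-escapes (form encoding).
    public static String encodeUtf8(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String label = "%D1%8D%D1%82%D0%B8%D0%BA%D0%B5%D1%82%D0%BA%D0%B0";
        System.out.println(decodeUtf8(label));
    }
}
```

Note that URLEncoder implements form encoding, so it is not byte-for-byte identical to encodeURIComponent (a space becomes + rather than %20, for instance). URLDecoder accepts both forms.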
Use decodeURIComponent to decode url encoded strings in JavaScript:
> decodeURIComponent('%D1%8D%D1%82%D0%B8%D0%BA%D0%B5%D1%82%D0%BA%D0%B0')
"этикетка";

Reload Java String in C++

There is a Java application that writes a string with non-english content to a file in this way:
byte[] bytes = str.getBytes("UTF-8");
writeToFile(bytes);
On the C++ side, how can I read the content from that file and save it to a WCHAR[] correctly? For example, I need to show the string with MessageBox.
Looks like this article describes the process: http://www.codeproject.com/Articles/38242/Reading-UTF-8-with-C-streams
Ok, I think the solution at least for Windows is MultiByteToWideChar()

Converting from Java String to Windows-1252 Format

I want to send a URL request, but the parameter values in the URL can have french characters (eg. è). How do I convert from a Java String to Windows-1252 format (which supports the French characters)?
I am currently doing this:
String encodedURL = new String (unencodedUrl.getBytes("UTF-8"), "Windows-1252");
However, it turns:
param=Stationnement extèrieur into param=Stationnement extÃ¨rieur.
How do I fix this? Any suggestions?
Edit for further clarification:
The user chooses values from a drop down. When the language is French, the values from the drop down sometimes include French characters, like 'è'. When I send this request to the server, it fails, saying it is unable to decipher the request. I have to figure out how to send the 'è' as a different format (preferably Windows-1252) that supports French characters. I have chosen to send as Windows-1252. The server will accept this format. I don't want to replace each character, because I could miss a special character, and then the server will throw an exception.
Use URLEncoder to encode parameter values as application/x-www-form-urlencoded data:
String param = "param="
+ URLEncoder.encode("Stationnement ext\u00e8rieur", "cp1252");
See here for an expanded explanation.
Try using
String encodedURL = new String (unencodedUrl.getBytes("UTF-8"), Charset.forName("Windows-1252"));
As per McDowell's suggestion, I tried encoding with:
URLEncoder.encode("stringValueWithFrenchCharacters", "cp1252"), but it didn't work perfectly. I replaced "cp1252" with HTTP.ISO_8859_1 because I believe Android does not support Windows-1252 yet. It does support ISO_8859_1, which, after reading here, covers MOST of the French characters, with the exception of 'Œ', 'œ', and 'Ÿ'.
So doing this made it work:
URLEncoder.encode(frenchString, HTTP.ISO_8859_1);
Works perfectly!
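For comparison, here is a small sketch of how the chosen charset changes the percent-escapes that URLEncoder produces. The class FrenchParamDemo and the method encodeParam are hypothetical names, and the standard charset name "ISO-8859-1" is used in place of the HTTP.ISO_8859_1 constant:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FrenchParamDemo {
    // Build "name=value" with the value percent-encoded in the given charset.
    public static String encodeParam(String name, String value, String charsetName) {
        try {
            return name + "=" + URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "Stationnement ext\u00e8rieur";
        // One byte per accented letter in ISO-8859-1, two bytes in UTF-8.
        System.out.println(encodeParam("param", value, "ISO-8859-1"));
        System.out.println(encodeParam("param", value, "UTF-8"));
    }
}
```

Whichever charset is used, the server has to decode with the same one. Characters the charset cannot represent, such as 'Œ' in ISO-8859-1, cannot be encoded faithfully this way.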

Java string encoding conversion within a webpage

I have a webpage that is encoded (through its header) as WIN-1255.
A Java program creates text strings that are automatically embedded in the page. The problem is that the original strings are encoded in UTF-8, which results in a gibberish text field in the page.
Unfortunately, I cannot change the page encoding; it is required by a customer's proprietary system.
Any ideas?
UPDATE:
The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.
SECOND UPDATE:
Thanks for all the responses. I managed to convert the string and still got gibberish. The problem was that the XML encoding needs to be set in addition to the header encoding.
Adam
To the point, you need to set the encoding of the response writer. With only a response header you're basically only instructing the client application which encoding to use to interpret/display the page. This ain't going to work if the response itself is written with a different encoding.
The context where you have this problem is entirely unclear (please elaborate about it as well in future problems like this), so here are several solutions:
If it is JSP, you need to set the following at the top of the JSP to set the response encoding:
<%@ page pageEncoding="WIN-1255" %>
If it is Servlet, you need to set the following before any first flush to set the response encoding:
response.setCharacterEncoding("WIN-1255");
Both by the way automagically implicitly set the Content-Type response header with a charset parameter to instruct the client to use the same encoding to interpret/display the page. Also see this article for more information.
If it is a homegrown application which relies on the basic java.net and/or java.io API's, then you need to write the characters through an OutputStreamWriter which is constructed using the constructor taking 2 arguments wherein you can specify the encoding:
Writer writer = new OutputStreamWriter(someOutputStream, "WIN-1255");
Assuming you have control of the original (properly represented) strings, and simply need to output them in win-1255:
import java.nio.charset.*;
import java.nio.*;
Charset win1255 = Charset.forName("windows-1255");
ByteBuffer bb = win1255.encode(someString);
byte[] ba = new byte[bb.limit()];
bb.get(ba); // copy the encoded bytes out of the buffer into ba
Then, simply write the contents of ba at the appropriate place.
EDIT: What you do with ba depends on your environment. For instance, if you're using servlets, you might do:
ServletOutputStream os = ...
os.write(ba);
We also should not overlook the possible approach of calling setContentType("text/html; charset=windows-1255"), then using getWriter normally. You did not make it completely clear whether windows-1255 was being set in a meta tag or in the HTTP response header.
You clarified that you have a UTF-8 file that you need to decode. If you're not already decoding the UTF-8 strings properly, this should be no big deal. Just look at InputStreamReader(someInputStream, Charset.forName("utf-8"))
What's embedding the data in the page? Either it should read it as text (in UTF-8) and then write it out again in the web page's encoding (Win-1255) or you should change the Java program to create the files (or whatever) in Win-1255 to start with.
If you can give more details about how the system works (what's generating the web page? How does it interact with the Java program?) then it will make things a lot clearer.
The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.
In this case, use a parser to load the UTF-8 XML. This should correctly decode the data to UTF-16 character data (Java Strings are always UTF-16). Your output mechanism should encode from UTF-16 to Windows-1255.
byte[] originalUtf8; // input goes here
// UTF-8 bytes to Java String:
String internal = new String(originalUtf8, Charset.forName("utf-8"));
// Java String to windows-1255 bytes:
byte[] win1255 = internal.getBytes(Charset.forName("cp1255"));
// output goes here
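The decode-then-encode step above can be wrapped in a small helper. This is a sketch under assumptions: the class name Transcoder and the method transcode are hypothetical, and windows-1255 is only used if the running JDK reports it as supported, since it is not one of the guaranteed standard charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcoder {
    // Decode the input bytes with one charset, then re-encode with another.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        String internal = new String(input, from); // Java Strings are UTF-16 internally
        return internal.getBytes(to);
    }

    public static void main(String[] args) {
        // A Hebrew sample word written with Unicode escapes, encoded as UTF-8 input.
        byte[] utf8 = "\u05e9\u05dc\u05d5\u05dd".getBytes(StandardCharsets.UTF_8);
        if (Charset.isSupported("windows-1255")) {
            byte[] win1255 = transcode(utf8, StandardCharsets.UTF_8,
                    Charset.forName("windows-1255"));
            System.out.println(win1255.length + " bytes in windows-1255");
        } else {
            System.out.println("windows-1255 is not available in this JDK");
        }
    }
}
```

Characters that the target charset cannot represent are silently replaced by getBytes, so it may be worth checking the feed content for anything outside the windows-1255 repertoire.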
