Reload Java String in C++

There is a Java application that writes a string with non-English content to a file in this way:
byte[] bytes = str.getBytes("UTF-8");
writeToFile(bytes);
On the C++ side, how can I read the content from that file and store it in a WCHAR[] correctly? For example, I need to show the string with MessageBox.
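For reference, a minimal sketch of the Java writing side; writeToFile is not shown in the question, so a plain FileOutputStream dump of the raw bytes is assumed, and the Hebrew sample text is just an arbitrary stand-in for the non-English content:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WriteUtf8 {
    public static void main(String[] args) throws IOException {
        String str = "שלום עולם"; // arbitrary non-English sample (compile the source as UTF-8)
        byte[] bytes = str.getBytes("UTF-8");
        writeToFile(bytes);
    }

    // Assumed implementation: dump the raw UTF-8 bytes to a file, no BOM, no extra framing.
    static void writeToFile(byte[] bytes) throws IOException {
        OutputStream out = new FileOutputStream("message.txt");
        out.write(bytes);
        out.close();
    }
}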

Looks like this article describes the process: http://www.codeproject.com/Articles/38242/Reading-UTF-8-with-C-streams

OK, I think the solution, at least on Windows, is MultiByteToWideChar(): read the raw UTF-8 bytes from the file and convert them with MultiByteToWideChar(CP_UTF8, ...), which yields the UTF-16 WCHAR[] that MessageBoxW expects.

Related

Convert com.google.protobuf.ByteString to String

I have a ByteString that I need to display to the console in Java.
The ByteString is of type com.google.protobuf.ByteString,
I am using:
System.out.println(myByteString);
however, when it is printed out in the terminal it is in this form:
\n\325\a\nk\b\003\032\v\b\312\371\336\343\005\020\254\200\307S\
How can I display the string in ASCII characters instead of this encoding?
I have tried using System.out.println(myByteString.toString());
Thanks
Try
System.out.println(myByteString.toString("UTF-8"));
or whatever encoding you are using.
Check out this link:
Google Developers: Class ByteString
If you are sure that you will be using UTF-8 then you can simply use
myByteString.toStringUtf8()
or
If you are not sure about the charset, refer to this page and use something similar to Luk's answer:
myByteString.toString("US-ASCII")
You need to call:
Base64.encodeToString(myByteString.toByteArray(), Base64.DEFAULT)
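Putting the suggestions together, a minimal sketch (assuming the protobuf Java library is on the classpath; the sample text is made up) might look like this:
import com.google.protobuf.ByteString;
import java.io.UnsupportedEncodingException;

public class ByteStringDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // In real code the ByteString would come from a parsed protobuf message.
        ByteString myByteString = ByteString.copyFromUtf8("héllo wörld");

        // Decode using the charset the bytes were actually written with.
        System.out.println(myByteString.toStringUtf8());    // UTF-8 convenience method
        System.out.println(myByteString.toString("UTF-8")); // same thing, explicit charset name
    }
}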

How to get UTF-8 conversion for a string

Frédéric in Java gets converted to FrÃ©dÃ©ric.
However, I need to pass the proper string to my client.
How can I achieve this in Java?
I did try:
String a = "FrÃ©dÃ©ric";
String b = new String(a.getBytes(), "UTF-8");
However, string b contains the same value as a.
I am expecting the string to be able to store the value as: Frédéric
How can I pass this value properly to the client?
If I understand the question correctly, you're looking for a function that will repair strings that have been damaged by others' encoding mistakes?
Here's one that seems to work on the example you gave:
static String fix(String badInput) {
    byte[] bytes = badInput.getBytes(Charset.forName("cp1252"));
    return new String(bytes, Charset.forName("UTF-8"));
}
fix("FrÃ©dÃ©ric") == "Frédéric"
The answer is quite complicated. See http://www.joelonsoftware.com/articles/Unicode.html for basic understanding.
My first suggestion would be to save your Java file with utf-8. Default for Eclipse on Windows would be cp1252 which might be your problem. Hope I could help.
Find your language code here and use that.
String a = new String(yourString.getBytes(), YOUR_ENCODING);
You can also try:
String a = URLEncoder.encode(yourString, HTTP.YOUR_ENCODING);
If System.out.println("Frédéric") shows the garbled output on the console, it is most likely that the encoding used in your source code (seemingly UTF-8) is not the same as the one used by the compiler, which by default is the platform encoding, so probably some flavor of ISO-8859. Try using javac -encoding UTF-8 to compile your source (or set the appropriate property of your build environment) and you should be OK.
If you are sending this to some other piece of client software it's most likely an encoding issue on the client-side.
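As a rough sketch of getting the bytes right on the sending side (the file name here is just a stand-in for whatever stream actually reaches the client), write with an explicit UTF-8 charset instead of the platform default, and compile the source with javac -encoding UTF-8 so the literal itself survives:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class SendToClient {
    public static void main(String[] args) throws IOException {
        String name = "Frédéric";

        // Encode explicitly as UTF-8 rather than relying on the platform default,
        // and make sure the client decodes the bytes with the same charset.
        Writer writer = new OutputStreamWriter(new FileOutputStream("client-payload.txt"), "UTF-8");
        writer.write(name);
        writer.close();
    }
}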

Reading UTF-8 .properties files in Java 1.5?

I have a project where everything is in UTF-8. I was using the Properties.load(Reader) method to read properties files in this encoding. But now, I need to make the project compatible with Java 1.5, and the mentioned method doesn't exist in Java 1.5. There is only a load method that takes an InputStream as a parameter, which is assumed to be in ISO-8859-1.
Is there any simple way to make my project 1.5-compatible without having to change all the .properties files to ISO-8859-1? I don't really want to have a mix of encodings in my project (encodings are already a time sink one at a time, let alone when you mix them) or change all my project to ISO-8859-1.
With "a simple way" I mean "without creating a custom Properties class from scratch".
Could you use XML properties instead? As I understand it, by the spec .properties files should be in ISO-8859-1; if you want other characters, they should be escaped, e.g. using the native2ascii tool.
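For illustration, a minimal sketch of the XML-properties route (the file name and key below are made up); the XML declaration carries the encoding, and Properties.loadFromXML has been available since Java 1.5:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class XmlPropertiesDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();

        // The XML declaration (e.g. <?xml version="1.0" encoding="UTF-8"?>) tells
        // the parser how to decode the file, so UTF-8 values survive intact.
        InputStream in = new FileInputStream("messages.xml");
        props.loadFromXML(in);
        in.close();

        System.out.println(props.getProperty("greeting"));
    }
}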
One strategy that might work for this situation is as follows:
Read the contents of the Reader into a ByteArrayOutputStream.
Once that is complete, call toByteArray(). (See the correction below.)
With the byte[], construct a ByteArrayInputStream.
Use the ByteArrayInputStream in Properties.load(InputStream).
As pointed out, the above fails to actually convert the character set from UTF-8 to ISO-8859-1. To fix that, a tweak:
After the ByteArrayOutputStream has been filled, instead of calling toByteArray():
Call toString("ISO-8859-1") to get an ISO-8859-1 encoded String, then
call String.getBytes() to get the byte[].
What you can do is start a thread that reads the data using a BufferedReader and writes it out to a PipedOutputStream, which is then linked to a PipedInputStream that load() uses.
PipedOutputStream pos = new PipedOutputStream();
PipedInputStream pis = new PipedInputStream(pos);
ReaderRunnable reader = new ReaderRunnable(pos, new File("utfproperty.properties"));
Thread t = new Thread(reader);
t.start();
Properties properties = new Properties();
properties.load(pis);
t.join();
The reader reads the data one character at a time, and if it detects a character that is not within the US-ASCII (i.e. low 7-bit) range, it writes "\u" plus the character's code into the PipedOutputStream.
ReaderRunnable would be a class that looks like:
import java.io.*;
public class ReaderRunnable implements Runnable {
    private final OutputStream os;
    private final File f;
    public ReaderRunnable(OutputStream os, File f) {
        this.os = os;
        this.f = f;
    }
    public void run() {
        try {
            Reader in = new BufferedReader(new InputStreamReader(new FileInputStream(f), "UTF-8"));
            Writer out = new OutputStreamWriter(os, "US-ASCII");
            int c;
            while ((c = in.read()) != -1) {
                // pass 7-bit US-ASCII through unchanged, escape everything else as a backslash-u sequence
                out.write(c < 0x80 ? String.valueOf((char) c) : String.format("\\u%04x", c));
            }
            out.close();
            in.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
Now after writing all that I was thinking that someone should've had this problem before and solved it, and the best place to look for these things is in Apache Commons. Fortunately, they have an implementation there.
https://commons.apache.org/io/apidocs/org/apache/commons/io/input/ReaderInputStream.html
The implementation from Apache is not without flaws, though: your input file, even if it is UTF-8, must only contain characters from the ISO-8859-1 character set. The design provided above can handle characters outside that range.
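For completeness, a minimal sketch of using that class, assuming a reasonably recent Commons IO on the classpath and reusing the utfproperty.properties name from the example above; note that ISO-8859-1 is the target encoding handed to Properties.load, which is where the flaw mentioned above comes from:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.Properties;
import org.apache.commons.io.input.ReaderInputStream;

public class CommonsIoPropertiesDemo {
    public static void main(String[] args) throws IOException {
        // Decode the file as UTF-8 characters, then present them to Properties.load
        // as an ISO-8859-1 byte stream (the encoding Properties.load expects).
        Reader reader = new InputStreamReader(new FileInputStream("utfproperty.properties"), "UTF-8");
        InputStream in = new ReaderInputStream(reader, Charset.forName("ISO-8859-1"));

        Properties props = new Properties();
        props.load(in);
        in.close();
    }
}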
Depending on your build engine you can \uXXXX-escape the properties into the build target directory. Maven can filter them via the native2ascii-maven-plugin.
What I personally do in my projects is keep my properties in UTF-8 files with the extension .uproperties and convert them at build time to ISO-8859-1 .properties files using native2ascii.exe. This lets me maintain my properties in UTF-8, and the Ant script does everything else for me.
What I just experienced is: make all .java files UTF-8 encoded as well (not only the properties files where you store UTF-8 characters). This way there is no need to use an InputStreamReader either. Also, make sure to compile with UTF-8 encoding.
This has worked for me without any added UTF-8 parameter.
To test this, write a simple stub program in Eclipse and change the encoding of that Java file by going to the file's Properties, Resource section, and setting the encoding to UTF-8.

Blackberry: Read a text file packaged in the project (faster)

I've tried this approach:
http://www.blackberry.com/knowledgecenterpublic/livelink.exe/fetch/2000/348583/800332/800620/How_To_-_Add_plain_text_or_binary_files_to_an_application.html?nodeid=800687&vernum=0
But it's REALLY slow for slightly large text files. Does anyone know of a better way of reading a plain text file that is included in the project? Is there a way to use FileConnection?
Figured it out using a combination of information:
IOUtilities.streamToBytes(is);
Call that directly on the input stream. So a more complete example would be as follows:
Class classs = Class.forName("com.packagename.stuff.FileDemo");
InputStream is = classs.getResourceAsStream("/test");
byte[] data = IOUtilities.streamToBytes(is);
String result = new String(data);
Deal? Deal.
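If the packaged file is known to be UTF-8, it may be safer to name the charset explicitly instead of relying on the device default. A small variation on the same idea, with the same class and resource names as above; the import path assumes the RIM API's IOUtilities used in the snippet:
import java.io.InputStream;
import net.rim.device.api.io.IOUtilities;

public class FileDemoUtf8 {
    public static String readPackagedText() throws Exception {
        // Same packaged resource as above, but decoded explicitly as UTF-8
        // rather than with the device's default character encoding.
        InputStream is = Class.forName("com.packagename.stuff.FileDemo").getResourceAsStream("/test");
        byte[] data = IOUtilities.streamToBytes(is);
        return new String(data, "UTF-8");
    }
}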

Java string encoding conversion within a webpage

I have a webpage that is encoded (through its header) as WIN-1255.
A Java program creates text strings that are automatically embedded in the page. The problem is that the original strings are encoded in UTF-8, producing a gibberish text field in the page.
Unfortunately, I cannot change the page encoding; it is required by a customer's proprietary system.
Any ideas?
UPDATE:
The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.
SECOND UPDATE:
Thanks for all the responses. I've managed to convert the string, and yet: gibberish. The problem was that the XML encoding needed to be set in addition to the header encoding.
Adam
To the point, you need to set the encoding of the response writer. With only a response header you're basically only instructing the client application which encoding to use to interpret/display the page. This ain't going to work if the response itself is written with a different encoding.
The context where you have this problem is entirely unclear (please elaborate about it as well in future problems like this), so here are several solutions:
If it is JSP, you need to set the following in top of JSP to set the response encoding:
<%@ page pageEncoding="WIN-1255" %>
If it is Servlet, you need to set the following before any first flush to set the response encoding:
response.setCharacterEncoding("WIN-1255");
Both, by the way, automagically set the Content-Type response header with a charset parameter to instruct the client to use the same encoding to interpret/display the page. Also see this article for more information.
If it is a homegrown application which relies on the basic java.net and/or java.io API's, then you need to write the characters through an OutputStreamWriter which is constructed using the constructor taking 2 arguments wherein you can specify the encoding:
Writer writer = new OutputStreamWriter(someOutputStream, "WIN-1255");
Assuming you have control of the original (properly represented) strings, and simply need to output them in win-1255:
import java.nio.charset.*;
import java.nio.*;

Charset win1255 = Charset.forName("windows-1255");
ByteBuffer bb = win1255.encode(someString);
byte[] ba = new byte[bb.limit()];
bb.get(ba); // copy the encoded bytes out of the buffer
Then, simply write the contents of ba at the appropriate place.
EDIT: What you do with ba depends on your environment. For instance, if you're using servlets, you might do:
ServletOutputStream os = ...
os.write(ba);
We also should not overlook the possible approach of calling setContentType("text/html; charset=windows-1255"), then using getWriter normally. You did not make completely clear whether windows-1255 was being set in a meta tag or in the HTTP response header.
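A minimal sketch of that approach, assuming a standard servlet environment (the class name and markup are made up):
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FeedServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Setting the content type before getWriter() makes the container encode
        // the writer's output as windows-1255 and advertise that charset to the client.
        resp.setContentType("text/html; charset=windows-1255");
        PrintWriter out = resp.getWriter();
        out.println("<html><body>שלום</body></html>");
    }
}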
You clarified that you have a UTF-8 file that you need to decode. If you're not already decoding the UTF-8 strings properly, this should be no big deal. Just look at new InputStreamReader(someInputStream, Charset.forName("UTF-8")).
What's embedding the data in the page? Either it should read it as text (in UTF-8) and then write it out again in the web page's encoding (Win-1255) or you should change the Java program to create the files (or whatever) in Win-1255 to start with.
If you can give more details about how the system works (what's generating the web page? How does it interact with the Java program?) then it will make things a lot clearer.
The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.
In this case, use a parser to load the UTF-8 XML. This should correctly decode the data to UTF-16 character data (Java Strings are always UTF-16). Your output mechanism should encode from UTF-16 to Windows-1255.
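A rough sketch of that pipeline for the RSS case (file names are made up): parse the UTF-8 feed with a DOM parser, then serialize it with the transformer's output encoding set to windows-1255, which takes care of both the byte encoding and the XML declaration mentioned in the second update:
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class FeedReencoder {
    public static void main(String[] args) throws Exception {
        // The parser reads the source feed's XML declaration and decodes the UTF-8
        // bytes into Java (UTF-16) strings.
        Document feed = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("source-feed.xml"));

        // Serializing with ENCODING set to windows-1255 re-encodes the characters
        // and writes a matching encoding="windows-1255" XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "windows-1255");
        t.transform(new DOMSource(feed), new StreamResult(new File("win1255-feed.xml")));
    }
}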
byte[] originalUtf8; // here: the input
// UTF-8 bytes to a Java String:
String internal = new String(originalUtf8, Charset.forName("UTF-8"));
// Java String to windows-1255 bytes:
byte[] win1255 = internal.getBytes(Charset.forName("cp1255"));
// here: the output
