Google App Engine encoding - java

I'm trying to log some Russian text:
LOG.info("тестирование русского");
But I get question symbols instead (viewing from web):
[app-id/app-version].:
15:18:44,753 INFO [class] -
???????????? ????????
The Java file is saved with UTF-8 encoding. All settings are default.
Even when I read a UTF-8 file containing Russian characters and log something from it, the encoding is wrong too.

I had a similar problem with Hebrew text. I found out it was caused by the default encoding.
To check the default encoding, I used this code:
OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
String encoding = out.getEncoding();
On my computer, the encoding is "UTF8". On the GAE server, it is "ASCII".
I solved the problem by replacing all the file readers in my code with:
new InputStreamReader(new FileInputStream(file), "UTF8");
This tells Java to ignore the default encoding, and open all input files as UTF8.
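The question marks in the log are consistent with an ASCII default: when Java encodes a string to a charset that cannot represent a character, it substitutes that charset's replacement byte, which for US-ASCII is `?`. A minimal sketch of the effect, assuming nothing about GAE itself (the class and method names are made up for illustration, and the Russian text is written as Unicode escapes so the example does not depend on the source file encoding):

```java
import java.nio.charset.StandardCharsets;

class AsciiReplacementDemo {
    // Encode the text as US-ASCII and decode it again.
    // Every character outside ASCII comes back as '?'.
    static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Four Cyrillic letters followed by plain ASCII.
        System.out.println(throughAscii("\u0442\u0435\u0441\u0442 ok"));
    }
}
```

Passing an explicit charset, as in the answer above, avoids depending on whatever default the server happens to have.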

Try this. Apparently GAE tries to autodetect the encoding and fails.
Even constant strings were messed up.
public class Util {
public static String FixRussianString(String string){
try {
return new String(string.getBytes("CP1251"), "UTF-8");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return string;
}
}

Related

How do I write Chinese characters in ZipEntry?

I want to export a string (Chinese text) to a CSV file inside a zip file. Where do I need to set the encoding to UTF-8? Or what approach should I take (based on the code below) to display Chinese characters in the exported CSV file?
This is the code I currently have.
ByteArrayOutputStream out = new ByteArrayOutputStream();
ZipOutputStream zipOut = new ZipOutputStream(out, StandardCharsets.UTF_8);
try {
ZipEntry entry = new ZipEntry("chinese.csv");
zipOut.putNextEntry(entry);
zipOut.write("类型".getBytes());
} catch (IOException e) {
e.printStackTrace();
} finally {
zipOut.close();
out.close();
}
Instead of "类型", I get "ç±»åž‹" in the CSV file.
First, you definitely need to change zipOut.write("类型".getBytes()); to zipOut.write("类型".getBytes(StandardCharsets.UTF_8));
Also, when you open the resulting CSV file, the editor might not be aware that the content is encoded in UTF-8, so you may need to tell it explicitly. For instance, in Notepad you can use the "Save As" option and change the encoding to UTF-8.
Your issue might also be just a display problem rather than an actual encoding problem. There is an open source Java library with a utility that converts any String to a Unicode escape sequence and back. This utility has helped me many times when diagnosing various charset-related issues. Here is a sample of what the code does:
result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);
The output of this code is:
\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World
The library can be found at Maven Central or on GitHub. It comes as a Maven artifact, with sources and javadoc.
Here is javadoc for the class StringUnicodeEncoderDecoder
I tried your inputs and got this:
System.out.println(StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence("类型"));
System.out.println(StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence("ç±»åž‹"));
And the output was:
\u7c7b\u578b
\u00e7\u00b1\u00bb\u00e5\u017e\u2039
So it looks like you did lose the info, and it is not just a display issue
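One way the second escape sequence above can arise is when the UTF-8 bytes of the text are decoded with a single-byte charset. The sketch below assumes windows-1252 as that charset (the `\u017e` and `\u2039` values are what windows-1252 assigns to bytes 9E and 8B) and assumes the JDK in use ships that charset. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8MisreadDemo {
    // Decode the UTF-8 bytes of the text as if they were windows-1252.
    static String misreadAsWindows1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    // Render every char as a lowercase hex escape so the result is unambiguous.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u7c7b\u578b"; // the two Chinese characters from the question
        System.out.println(toEscapes(original));
        System.out.println(toEscapes(misreadAsWindows1252(original)));
    }
}
```

Two characters become six because each of them takes three bytes in UTF-8 and every byte is turned into its own character.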
The getBytes() method is one culprit: without an explicit charset it uses the default character set of your machine. From the Java String documentation:
getBytes()
Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
getBytes(String charsetName)
Encodes this String into a sequence of bytes using the given charset, storing the result into a new byte array.
Furthermore, as @Slaw pointed out, make sure that you compile your files (javac -encoding <encoding>) with the same encoding the files are in:
-encoding Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
By the way, a call to closeEntry() was missing in the OP. I stripped the snippet down to what I found necessary to achieve the desired functionality.
try (FileOutputStream fileOut = new FileOutputStream("out.zip");
ZipOutputStream zipOut = new ZipOutputStream(fileOut)) {
zipOut.putNextEntry(new ZipEntry("chinese.csv"));
zipOut.write("类型".getBytes("UTF-8"));
zipOut.closeEntry();
}
Finally, as @MichaelGantman pointed out, you might want to check which encoding the bytes are actually in, using a tool such as a hex editor. This also rules out that the editor you view the result file in is displaying correct UTF-8 the wrong way. "类" is e7 b1 bb (hex) in UTF-8; in UTF-16 (the encoding Java uses internally for strings) it is 7c 7b.
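The byte check can also be done in code instead of a hex editor. The sketch below (class and method names are hypothetical) writes the text into an in-memory zip with an explicit UTF-8 encoding, reads it back, and prints the bytes in hex. Note that the charset passed to the ZipOutputStream constructor only governs entry names and comments; the entry content is whatever bytes are written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipUtf8RoundTrip {
    // Write one entry whose content is the text encoded explicitly as UTF-8.
    static byte[] zip(String entryName, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(out, StandardCharsets.UTF_8)) {
            zipOut.putNextEntry(new ZipEntry(entryName));
            zipOut.write(text.getBytes(StandardCharsets.UTF_8));
            zipOut.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Read the first entry back and decode its bytes as UTF-8.
    static String unzipFirstEntry(byte[] zipped) {
        try (ZipInputStream zipIn =
                 new ZipInputStream(new ByteArrayInputStream(zipped), StandardCharsets.UTF_8)) {
            if (zipIn.getNextEntry() == null) {
                throw new IllegalArgumentException("zip has no entries");
            }
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zipIn.read(chunk)) != -1) {
                content.write(chunk, 0, n);
            }
            return new String(content.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Space-separated lowercase hex, like a hex editor would show.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u7c7b\u578b";
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(text.equals(unzipFirstEntry(zip("chinese.csv", text))));
    }
}
```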

How to use UTF-8 in resource properties with ResourceBundle

I need to use UTF-8 in my resource properties using Java's ResourceBundle. When I enter the text directly into the properties file, it displays as mojibake.
My app runs on Google App Engine.
Can anyone give me an example? I can't get this to work.
Java 9 and newer
From Java 9 onwards property files are encoded as UTF-8 by default, and using characters outside of ISO-8859-1 should work out of the box.
Java 8 and older
The ResourceBundle#getBundle() uses under the covers PropertyResourceBundle when a .properties file is specified. This in turn uses by default Properties#load(InputStream) to load those properties files. As per the javadoc, they are by default read as ISO-8859-1.
public void load(InputStream inStream) throws IOException
Reads a property list (key and element pairs) from the input byte stream. The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section 3.3 of The Java™ Language Specification.
So, you'd need to save them as ISO-8859-1. If you have any characters beyond the ISO-8859-1 range, you can't write the \uXXXX escapes off the top of your head, and you're thus forced to save the file as UTF-8, then you'd need to use the native2ascii tool to convert a UTF-8 saved properties file to an ISO-8859-1 saved properties file in which all characters outside that range are converted into \uXXXX format. The example below converts a UTF-8 encoded properties file text_utf8.properties to a valid ISO-8859-1 encoded properties file text.properties.
native2ascii -encoding UTF-8 text_utf8.properties text.properties
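To make the conversion concrete, here is a rough Java sketch of the kind of escaping the command above performs. It is an illustration under simplifying assumptions, not the tool's actual implementation: every char above 0x7F is replaced by a backslash-u escape of its UTF-16 code unit, and the result is then loaded the way Properties.load(InputStream) would load it. `AsciiEscaper` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class AsciiEscaper {
    // Replace each non-ASCII char with a four-digit hex escape; ASCII passes through.
    static String toAsciiEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Load properties text through the byte-stream overload, which assumes ISO-8859-1.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        byte[] bytes = propertiesText.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String escaped = toAsciiEscapes("title=n\u00e4sta");
        System.out.println(escaped);
        System.out.println(loadValue(escaped, "title"));
    }
}
```

Because the escaped text is pure ASCII, it survives any ISO-8859-1 or UTF-8 round trip unchanged, which is why this format is safe on Java 8 and older.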
When using a sane IDE such as Eclipse, this is already done automatically when you create a .properties file in a Java-based project and use Eclipse's own editor. Eclipse will transparently convert the characters beyond the ISO-8859-1 range to \uXXXX format; the "Properties" and "Source" tabs at the bottom of the editor show the two views.
Alternatively, you could also create a custom ResourceBundle.Control implementation wherein you explicitly read the properties files as UTF-8 using InputStreamReader, so that you can just save them as UTF-8 without the need to hassle with native2ascii. Here's a kickoff example:
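import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.ResourceBundle.Control;
public class UTF8Control extends Control {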
public ResourceBundle newBundle
(String baseName, Locale locale, String format, ClassLoader loader, boolean reload)
throws IllegalAccessException, InstantiationException, IOException
{
// The below is a copy of the default implementation.
String bundleName = toBundleName(baseName, locale);
String resourceName = toResourceName(bundleName, "properties");
ResourceBundle bundle = null;
InputStream stream = null;
if (reload) {
URL url = loader.getResource(resourceName);
if (url != null) {
URLConnection connection = url.openConnection();
if (connection != null) {
connection.setUseCaches(false);
stream = connection.getInputStream();
}
}
} else {
stream = loader.getResourceAsStream(resourceName);
}
if (stream != null) {
try {
// Only this line is changed to make it to read properties files as UTF-8.
bundle = new PropertyResourceBundle(new InputStreamReader(stream, "UTF-8"));
} finally {
stream.close();
}
}
return bundle;
}
}
This can be used as follows:
ResourceBundle bundle = ResourceBundle.getBundle("com.example.i18n.text", new UTF8Control());
See also:
Unicode - How to get the characters right?
Given that you have an instance of ResourceBundle and you can get String by:
String val = bundle.getString(key);
I solved my Japanese display problem by:
return new String(val.getBytes("ISO-8859-1"), "UTF-8");
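Why this re-encoding works, and when it does not, can be seen in a small sketch (hypothetical class and method names). It only applies when the UTF-8 file really was decoded as ISO-8859-1, as on Java 8 and older. Applied to a string that was already decoded correctly, the same call destroys the text.

```java
import java.nio.charset.StandardCharsets;

class Latin1Repair {
    // Simulate a UTF-8 file being decoded byte-per-char as ISO-8859-1.
    static String misreadAsLatin1(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    // Undo the misreading: recover the original bytes, then decode them as UTF-8.
    static String repair(String misread) {
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // three Japanese characters
        String misread = misreadAsLatin1(original);
        System.out.println(misread.length());                 // 9 chars, one per UTF-8 byte
        System.out.println(original.equals(repair(misread))); // true
        // A correctly decoded string is NOT safe to "repair":
        System.out.println(repair(original));
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character. The last line prints question marks, since the characters cannot be encoded as ISO-8859-1 in the first place.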
Look at this: http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
Properties.load accepts a Reader object as its argument, which you can create from an InputStream.
When you create the Reader, you can specify its encoding:
InputStreamReader isr = new InputStreamReader(stream, "UTF-8");
Then pass this Reader to the load method:
prop.load(isr);
BTW, to get the stream from a .properties file:
InputStream stream = getClass().getClassLoader().getResourceAsStream("a.properties");
BTW, to get a resource bundle from the InputStreamReader:
ResourceBundle rb = new PropertyResourceBundle(isr);
Hope this helps!
This problem has finally been fixed in Java 9:
https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9
Default encoding for properties files is now UTF-8.
Most existing properties files should not be affected: UTF-8 and
ISO-8859-1 have the same encoding for ASCII characters, and
human-readable non-ASCII ISO-8859-1 encoding is not valid UTF-8. If an
invalid UTF-8 byte sequence is detected, the Java runtime
automatically rereads the file in ISO-8859-1.
ResourceBundle.Control with UTF-8 and the new String(...) approach don't work if the properties file uses the cp1251 charset, for example.
So I recommend a common method: write the text as Unicode escapes. For this:
IDEA has a special "Transparent native-to-ASCII conversion" option (Settings > File Encoding).
Eclipse has a "Properties Editor" plugin. It can also work as a separate application.
package com.varaneckas.utils;
import java.io.UnsupportedEncodingException;
import java.util.Enumeration;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
/**
* UTF-8 friendly ResourceBundle support
*
* Utility that allows having multi-byte characters inside java .property files.
* It removes the need for Sun's native2ascii application, you can simply have
* UTF-8 encoded editable .property files.
*
* Use:
* ResourceBundle bundle = Utf8ResourceBundle.getBundle("bundle_name");
*
* @author Tomas Varaneckas <tomas.varaneckas@gmail.com>
*/
public abstract class Utf8ResourceBundle {
/**
* Gets the unicode friendly resource bundle
*
* @param baseName
* @see ResourceBundle#getBundle(String)
* @return Unicode friendly resource bundle
*/
public static final ResourceBundle getBundle(final String baseName) {
return createUtf8PropertyResourceBundle(
ResourceBundle.getBundle(baseName));
}
/**
* Creates unicode friendly {@link PropertyResourceBundle} if possible.
*
* @param bundle
* @return Unicode friendly property resource bundle
*/
private static ResourceBundle createUtf8PropertyResourceBundle(
final ResourceBundle bundle) {
if (!(bundle instanceof PropertyResourceBundle)) {
return bundle;
}
return new Utf8PropertyResourceBundle((PropertyResourceBundle) bundle);
}
/**
* Resource Bundle that does the hard work
*/
private static class Utf8PropertyResourceBundle extends ResourceBundle {
/**
* Bundle with unicode data
*/
private final PropertyResourceBundle bundle;
/**
* Initializing constructor
*
* @param bundle
*/
private Utf8PropertyResourceBundle(final PropertyResourceBundle bundle) {
this.bundle = bundle;
}
@Override
@SuppressWarnings("unchecked")
public Enumeration getKeys() {
return bundle.getKeys();
}
@Override
protected Object handleGetObject(final String key) {
final String value = bundle.getString(key);
if (value == null)
return null;
try {
return new String(value.getBytes("ISO-8859-1"), "UTF-8");
} catch (final UnsupportedEncodingException e) {
throw new RuntimeException("Encoding not supported", e);
}
}
}
}
We create a resources.utf8 file that contains the resources in UTF-8 and have a rule to run the following:
native2ascii -encoding utf8 resources.utf8 resources.properties
Attention: In Java <= 8 java property files should be encoded in ISO 8859-1!
ISO 8859-1 character encoding.
Characters that cannot be directly
represented in this encoding can be
written using Unicode escapes ; only
a single 'u' character is allowed in
an escape sequence.
@see Properties Java Doc
If you still really want to do this: have a look at:
Java properties UTF-8 encoding in Eclipse -- there are some code samples
Since Java 9: property files are encoded in UTF-8, so there should be no problem/doubt
In Java SE 9, properties files are loaded in UTF-8 encoding. In previous releases, ISO-8859-1 encoding was used for loading property resource bundles.
(https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm#JSINT-GUID-9DCDB41C-A989-4220-8140-DBFB844A0FCA)
http://sourceforge.net/projects/eclipse-rbe/
As already stated, property files should be encoded in ISO 8859-1.
You can use the above plugin for the Eclipse IDE to do the Unicode conversion for you.
Here's a Java 7 solution that uses Guava's excellent support library and the try-with-resources construct. It reads and writes properties files using UTF-8 for the simplest overall experience.
To read a properties file as UTF-8:
File file = new File("/path/to/example.properties");
// Create an empty set of properties
Properties properties = new Properties();
if (file.exists()) {
// Use a UTF-8 reader from Guava
try (Reader reader = Files.newReader(file, Charsets.UTF_8)) {
properties.load(reader);
} catch (IOException e) {
// Do something
}
}
To write a properties file as UTF-8:
File file = new File("/path/to/example.properties");
// Use a UTF-8 writer from Guava
try (Writer writer = Files.newWriter(file, Charsets.UTF_8)) {
properties.store(writer, "Your title here");
writer.flush();
} catch (IOException e) {
// Do something
}
As suggested, I went through implementing a ResourceBundle.Control, but that did not help, because the bundle was always requested with the en_US locale. I tried setting my default locale to a different language, and my ResourceBundle.Control implementation was still being called with en_US. I added log messages and stepped through with a debugger to see whether a call with a different locale was made after I changed the locale at run time through XHTML and JSF calls; that did not happen. Then I tried setting the system default encoding to UTF-8 for files read by my server (Tomcat), but that caused problems: my class libraries were not all compiled as UTF-8, Tomcat started reading them as UTF-8, and the server did not run properly. I ended up implementing a method in my Java controller to be called from the XHTML files. In that method I did the following:
public String message(String key, boolean toUTF8) throws Throwable{
String result = "";
try{
FacesContext context = FacesContext.getCurrentInstance();
String message = context.getApplication().getResourceBundle(context, "messages").getString(key);
result = message==null ? "" : toUTF8 ? new String(message.getBytes("iso8859-1"), "utf-8") : message;
}catch(Throwable t){}
return result;
}
I was particularly nervous that this could slow down my application. However, after implementing it, the application actually seems faster. I think that is because I am now accessing the properties directly instead of letting JSF parse its way to them. I pass a boolean argument in this call because I know some of the properties are not translated and do not need the UTF-8 conversion.
Now I have saved my properties file in UTF8 format and it is working fine as each user in my application has a referent locale preference.
For what it's worth my issue was that the files themselves were in the wrong encoding. Using iconv worked for me
iconv -f ISO-8859-15 -t UTF-8 messages_nl.properties > messages_nl.properties.new
I tried the approach provided by Rod, but taking into consideration BalusC's concern about not repeating the same workaround all over the application, I came up with this class:
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.ResourceBundle;
public class MyResourceBundle {
// feature variables
private ResourceBundle bundle;
private String fileEncoding;
public MyResourceBundle(Locale locale, String fileEncoding){
this.bundle = ResourceBundle.getBundle("com.app.Bundle", locale);
this.fileEncoding = fileEncoding;
}
public MyResourceBundle(Locale locale){
this(locale, "UTF-8");
}
public String getString(String key){
String value = bundle.getString(key);
try {
return new String(value.getBytes("ISO-8859-1"), fileEncoding);
} catch (UnsupportedEncodingException e) {
return value;
}
}
}
Using it is very similar to regular ResourceBundle usage:
private MyResourceBundle labels = new MyResourceBundle(new Locale("es"), "UTF-8");
String label = labels.getString(key);
Or you can use the alternate constructor, which uses UTF-8 by default:
private MyResourceBundle labels = new MyResourceBundle(new Locale("es"));
Properties prop = new Properties();
String fileName = "./src/test/resources/predefined.properties";
FileInputStream inputStream = new FileInputStream(fileName);
InputStreamReader reader = new InputStreamReader(inputStream,"UTF-8");
prop.load(reader);
Open the Settings / Preferences dialog (Ctrl + Alt + S), then click Editor and File Encodings.
Then, at the bottom, you will find the default encoding for properties files. Choose your encoding type.
Alternatively you can use unicode symbols instead of text in your resource bundle (for example "ів" equals \u0456\u0432)
From Java 9, the default to load properties file has been changed to UTF-8. https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm
Speaking for current (2021-2) Java versions, there is still the old ISO-8859-1 method java.util.Properties#load(InputStream).
If you use Properties.load(InputStream), you must use ISO-8859-1.
If you use ResourceBundle, then UTF-8 should be fine.
Allow me to quote from the official doc.
PropertyResourceBundle
PropertyResourceBundle can be constructed either from an InputStream or a Reader, which represents a property file. Constructing a PropertyResourceBundle instance from an InputStream requires that the input stream be encoded in UTF-8. By default, if a MalformedInputException or an UnmappableCharacterException occurs on reading the input stream, then the PropertyResourceBundle instance resets to the state before the exception, re-reads the input stream in ISO-8859-1, and continues reading. If the system property java.util.PropertyResourceBundle.encoding is set to either "ISO-8859-1" or "UTF-8", the input stream is solely read in that encoding, and throws the exception if it encounters an invalid sequence. If "ISO-8859-1" is specified, characters that cannot be represented in ISO-8859-1 encoding must be represented by Unicode Escapes as defined in section 3.3 of The Java™ Language Specification whereas the other constructor which takes a Reader does not have that limitation. Other encoding values are ignored for this system property. The system property is read and evaluated when initializing this class. Changing or removing the property has no effect after the initialization.
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/PropertyResourceBundle.html
Properties#load
Reads a property list (key and element pairs) from the input byte stream. The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section 3.3 of The Java™ Language Specification.
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/Properties.html#load(java.io.InputStream)

Why is text in Swedish from a resource bundle showing up as gibberish? [duplicate]

Possible Duplicate:
How to use UTF-8 in resource properties with ResourceBundle
I want to allow internationalization to my Java Swing application. I use a bundle file to keep all labels inside it.
As a test I tried to set a Swedish title to a JButton. So in the bundle file I wrote:
nextStepButton=nästa
And in the Java code I wrote:
nextStepButton.setText(bundle.getString("nextStepButton"));
But the title characters of the button appear wrong at runtime:
I am using the Tahoma font, which supports Unicode.
When I set the button title manually through code it appears fine:
nextStepButton.setText("nästa");
Any idea why it fails in bundle file ?
Edit: Encoding the title:
I have tried encoding the text coming from the bundle file using the code:
nextStepButton.setText(new String(bundle.getString("nextStepButton").getBytes("UTF-8")));
And still the result is:
As per the javadoc, properties files are read using ISO-8859-1.
.. the input/output stream is encoded in ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes ; only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.
Apart from using the native2ascii tool to convert UTF-8 properties files to ISO-8859-1 properties files, you can also use a custom ResourceBundle.Control so that you can control the loading of properties files and use UTF-8 there. Here's a kickoff example:
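import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.ResourceBundle.Control;
public class UTF8Control extends Control {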
public ResourceBundle newBundle
(String baseName, Locale locale, String format, ClassLoader loader, boolean reload)
throws IllegalAccessException, InstantiationException, IOException
{
// The below is a copy of the default implementation.
String bundleName = toBundleName(baseName, locale);
String resourceName = toResourceName(bundleName, "properties");
ResourceBundle bundle = null;
InputStream stream = null;
if (reload) {
URL url = loader.getResource(resourceName);
if (url != null) {
URLConnection connection = url.openConnection();
if (connection != null) {
connection.setUseCaches(false);
stream = connection.getInputStream();
}
}
} else {
stream = loader.getResourceAsStream(resourceName);
}
if (stream != null) {
try {
// Only this line is changed to make it to read properties files as UTF-8.
bundle = new PropertyResourceBundle(new InputStreamReader(stream, "UTF-8"));
} finally {
stream.close();
}
}
return bundle;
}
}
Use it as follows:
ResourceBundle bundle = ResourceBundle.getBundle("com.example.i18n.text", new UTF8Control());
This way you don't need to hassle with native2ascii tool and you end up with better maintainable properties files.
See also:
Unicode - How to get the characters right?
Take a look at Java Internationalization FAQ. If you've put non ASCII characters in your .properties file, you must convert it using the native2ascii tool. Then everything should work.
The problem is that the resource bundle properties file is encoded in UTF-8 but your application is loading it using Latin-1.
If you take "LATIN SMALL LETTER A WITH DIAERESIS" (E4 in Latin-1, or U+00E4 as a Unicode code point) and represent it as UTF-8, you get C3 A4. If you then treat those as Latin-1 bytes, you get "LATIN CAPITAL LETTER A WITH TILDE" and the square "CURRENCY SIGN" character, which is how the characters are showing in your screenshot of the button!
(Incidentally, here's a neologism for the mangling you get as a result of using the wrong character encoding ... mojibake. Baffle your friends by using it in conversation.)
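The byte arithmetic in this answer is easy to check. A small sketch (hypothetical names, with the Swedish text written as a Unicode escape so the example does not depend on the source file encoding):

```java
import java.nio.charset.StandardCharsets;

class SwedishMojibake {
    // UTF-8 bytes of the text as space-separated lowercase hex.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode the UTF-8 bytes as Latin-1, as the properties loader did here.
    static String readUtf8AsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u00e4"));
        // The button text grows from five to six characters.
        System.out.println(readUtf8AsLatin1("n\u00e4sta").length());
    }
}
```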

File is not saved in UTF-8 encoding even when I set encoding to UTF-8

When I check my file with Notepad++, it's in ANSI encoding. What am I doing wrong here?
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file), "UTF8");
try
{
out.write(text);
out.flush();
} finally
{
out.close();
}
UPDATE:
This is solved now. The reason JBoss didn't understand my XML wasn't the encoding but the naming of my XML file. Thanks all for the help, even though there really wasn't any problem...
If you're creating an XML file (as your comments imply), I would strongly recommend that you use the XML libraries to output this and write the correct XML encoding header. Otherwise your character encoding won't conform to XML standards and other tools (like your JBoss instance) will rightfully complain.
// Prepare the DOM document for writing
Source source = new DOMSource(doc);
// Prepare the output file
File file = new File(filename);
Result result = new StreamResult(file);
// Write the DOM document to the file
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(source, result);
There's no such thing as plain text. The problem is that an application is decoding character data without you telling it which encoding the data uses.
Although many Microsoft apps rely on the presence of a Byte Order Mark to indicate a Unicode file, this is by no means standard. The Unicode BOM FAQ says more.
You can add a BOM to your output by writing the character '\uFEFF' at the start of the stream. More info here. This should be enough for applications that rely on BOMs.
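As a sketch of that suggestion (the class and method names are made up for illustration), writing the character U+FEFF through a UTF-8 writer produces the three bytes EF BB BF at the start of the output:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class BomWriter {
    // Encode text as UTF-8, preceded by a byte order mark written as a character.
    static byte[] encodeWithBom(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write('\uFEFF'); // the encoder turns this into EF BB BF
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encodeWithBom("hi")) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```

Whether a BOM is wanted depends on the consumer: some tools rely on it, while others treat it as unexpected content at the start of the file.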
UTF-8 is designed to be, in the common case, rather indistinguishable from ANSI. So when you write text to a file and encode the text with UTF-8, in the common case, it looks like ANSI to anyone else who opens the file.
UTF-8 is 1-byte-per-character for all ASCII characters, just like ANSI.
UTF-8 has all the same bytes for the ASCII characters as ANSI does.
UTF-8 does not have any special header characters, just as ANSI does not.
It's only when you start to get into the non-ASCII codepoints that things start looking different.
But in the common case, byte-for-byte, ANSI and UTF-8 are identical.
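This overlap can be checked directly. The sketch below takes "ANSI" to mean windows-1252, which is an assumption (the term refers to whatever the Windows system code page is), and compares the encoded bytes. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiOverlap {
    // True when the text encodes to identical bytes in UTF-8 and windows-1252.
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ansi = text.getBytes(Charset.forName("windows-1252"));
        return Arrays.equals(utf8, ansi);
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("<root>hello</root>")); // ASCII only
        System.out.println(sameBytes("Zur\u00fcck"));        // contains u-umlaut
    }
}
```

For an ASCII-only file, an editor therefore has nothing to go on when guessing between the two encodings.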
If there is no BOM (and Java doesn't output one for UTF8, it doesn't even recognize it), the text is identical in ANSI and UTF8 encoding as long as only characters in the ASCII range are being used. Therefore Notepad++ cannot detect any difference.
(And there seems to be an issue with UTF8 in Java anyways...)
The IANA registered type is "UTF-8", not "UTF8". However, Java should throw an exception for invalid encodings, so that's probably not the problem.
I suspect that Notepad is the problem. Examine the text using a hexdump program, and you should see it properly encoded.
Did you try to write a BOM at the beginning of the file? BOM is the only thing that can tell the editor the file is in UTF-8. Otherwise, the UTF-8 file can just look like Latin-1 or extended ANSI.
You can do it like this,
public final static byte[] UTF8_BOM = {(byte)0xEF, (byte)0xBB, (byte)0xBF};
...
OutputStream os = new FileOutputStream(file);
os.write(UTF8_BOM);
os.flush();
OutputStreamWriter out = new OutputStreamWriter(os, "UTF8");
try
{
out.write(text);
out.flush();
} finally
{
out.close();
}

Java properties UTF-8 encoding in Eclipse

I've recently had to switch encoding of webapp I'm working on from ISO-xx to utf8. Everything went smooth, except properties files. I added -Dfile.encoding=UTF-8 in eclipse.ini and normal files work fine. Properties however show some strange behaviour.
If I copy utf8 encoded properties from Notepad++ and paste them in Eclipse, they show and work fine. When I reopen properties file, I see some Unicode characters instead of proper ones, like:
Zur\u00EF\u00BF\u00BDck instead of Zurück
but app still works fine.
If I start to edit properties, add some special characters and save, they display correctly, however they don't work and all previously working special characters don't work any more.
When I compare local version with CVS I can see special characters correctly on remote file and after update I'm at start again: app works, but Eclipse displays Unicode chars.
I tried changing file encoding by right clicking it and selecting „Other: UTF8” but it didn't help. It also said: „determined from content: ISO-8859-1”
I'm using Java 6 and Jboss Developer based on Eclipse 3.3
I can live with it by editing properties in Notepad++ and pasting them in Eclipse, but I would be grateful if someone could help me with fixing this in Eclipse.
Answer for "pre-Java-9" is below. As of Java 9, properties files loaded as resource bundles are read as UTF-8 by default, falling back to ISO-8859-1 if an invalid UTF-8 byte sequence is detected; Properties.load(InputStream) itself still assumes ISO-8859-1. See the Java 9 release notes for details.
Properties files are ISO-8859-1 by definition - see the docs for the Properties class.
Spring has a replacement which can load with a specified encoding, using PropertiesFactoryBean.
EDIT: As Laurence noted in the comments, Java 1.6 introduced overloads for load and store which take a Reader/Writer. This means you can create a reader for the file with whatever encoding you want, and pass it to load. Unfortunately FileReader still doesn't let you specify the encoding in the constructor (aargh) so you'll be stuck with chaining FileInputStream and InputStreamReader together. However, it'll work.
For example, to read a file using UTF-8:
Properties properties = new Properties();
InputStream inputStream = new FileInputStream("path/to/file");
try {
Reader reader = new InputStreamReader(inputStream, "UTF-8");
try {
properties.load(reader);
} finally {
reader.close();
}
} finally {
inputStream.close();
}
Don't waste your time, you can use Resource Bundle plugin in Eclipse
Old Sourceforge page
It is not a problem with Eclipse. If you are using the Properties class to read and store the properties file, the class will escape all special characters.
From the class documentation:
When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.
From the API, store() method:
Characters less than \u0020 and characters greater than \u007E are written as \uxxxx for the appropriate hexadecimal value xxxx.
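The escaping described above can be observed by storing the same property through both overloads. This is a small sketch with hypothetical class and method names: the OutputStream overload writes a \uXXXX escape, while the Writer overload leaves the character as it is and lets the Writer's encoding decide the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class StoreEscapes {
    // store(OutputStream, ...) writes ISO-8859-1 and escapes chars outside 0x20..0x7E.
    static String storeToStream(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // store(Writer, ...) does not escape; the characters are written as they are.
    static String storeToWriter(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both outputs start with a timestamp comment line.
        System.out.print(storeToStream("back", "Zur\u00fcck"));
        System.out.print(storeToWriter("back", "Zur\u00fcck"));
    }
}
```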
Properties props = new Properties();
URL resource = getClass().getClassLoader().getResource("data.properties");
props.load(new InputStreamReader(resource.openStream(), "UTF8"));
Works like a charm
:-)
There are too many points in the process you describe where errors can occur, so I won't try to guess what you're doing wrong, but I think I know what's happening under the hood.
EF BF BD is the UTF-8 encoded form of U+FFFD, the standard replacement character that's inserted by decoders when they encounter malformed input. It sounds like your text is being saved as ISO-8859-1, then read as if it were UTF-8, then saved as UTF-8, then converted to the Properties format using native2ascii using the platform default encoding (e.g., windows-1252).
ü => 0xFC // save as ISO-8859-1
0xFC => U+FFFD // read as UTF-8
U+FFFD => 0xEF 0xBF 0xBD // save as UTF-8
0xEF 0xBF 0xBD => \u00EF\u00BF\u00BD // native2ascii
I suggest you leave the "file.encoding" property alone. Like "file.separator" and "line.separator", it's not nearly as useful as you would expect it to be. Instead, get into the habit of always specifying an encoding when reading and writing text files.
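The four-step corruption chain described in this answer can be reproduced in a few lines. This is a sketch under the answer's own assumptions: windows-1252 stands in for the platform default encoding, and the final loop is a simplified stand-in for native2ascii. The names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReplacementCharChain {
    static String corrupt(String text) {
        // 1. Save as ISO-8859-1: u-umlaut becomes the single byte FC.
        byte[] savedAsLatin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // 2. Read as UTF-8: FC is malformed, so the decoder substitutes U+FFFD.
        String readAsUtf8 = new String(savedAsLatin1, StandardCharsets.UTF_8);
        // 3. Save as UTF-8: U+FFFD becomes the bytes EF BF BD.
        byte[] savedAsUtf8 = readAsUtf8.getBytes(StandardCharsets.UTF_8);
        // 4. Escape byte by byte, as if the file were windows-1252.
        String perByte = new String(savedAsUtf8, Charset.forName("windows-1252"));
        StringBuilder sb = new StringBuilder();
        for (char c : perByte.toCharArray()) {
            if (c > 0x7F) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(corrupt("Zur\u00fcck"));
    }
}
```

The result matches the string shown in the question. Once step 2 has happened, the original character is gone and cannot be recovered from the file.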
Properties props = new Properties();
URL resource = getClass().getClassLoader().getResource("data.properties");
props.load(new InputStreamReader(resource.openStream(), "UTF8"));
This works well in Java 1.6. How can I do this in 1.5, since the Properties class there does not have a method that accepts an InputStreamReader?
There is a much easier way:
props.load(new InputStreamReader(new FileInputStream("properties_file"), "UTF8"));
Just another Eclipse plugin for *.properties files:
Properties Editor
You can define UTF-8 .properties files to store your translations and use ResourceBundle, to get values. To avoid problems you can change encoding:
String value = RESOURCE_BUNDLE.getString(key);
return new String(value.getBytes("ISO-8859-1"), "UTF-8");
This seems to work only for some characters, including special characters for German, Portuguese, and French. However, I ran into trouble with Russian, Hindi and Mandarin characters. These are not converted to the Properties format by native2ascii; instead they get saved as ?? ?? ??
The only way I could get my app to display these characters correctly was by putting them in the properties file as Unicode escapes, e.g. \u0915 instead of क, or \u044F instead of я.
Any advice?
I recommend Attesoro (http://attesoro.org/). It is simple and easy to use, and it is written in Java.
I found a solution to this problem. You need to write the *.properties file using the standard Properties class, for example:
Properties properties = new Properties();
properties.put("DB_DRIVER", "com.mysql.cj.jdbc.Driver");
properties.put("DB_URL", "jdbc:mysql://localhost:3306/world");
properties.put("DB_USERNAME", "root");
properties.put("DB_PASSWORD", "1111");
properties.put("DB_AUTO_RECONNECT", "true");
properties.put("DB_CHARACTER_ENCODING", "UTF-8");
properties.put("DB_USE_UNICODE", "true");
try {
properties.store(new FileWriter("src/connectionDB/base/db.properties"), "Comment writes");
} catch (IOException e) {
System.out.println(e.getMessage());
}
then, you can read file without mistakes:
try {
properties.load(new FileReader("src\\connectionDB\\base\\db.properties"));
properties.list(System.out);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
or
try {
String str = new String(Files.readAllBytes(Paths.get("src/connectionDB/base/db.properties")), StandardCharsets.UTF_8);
properties.load(new StringReader(str));
properties.list(System.out);
} catch (IOException e) {
System.out.println(e.getMessage());
}
or
InputStream inputStream = getClass().getClassLoader().getResourceAsStream("connectionDB/base/db.properties");
try {
Reader reader = new InputStreamReader(inputStream, "UTF-8");
try {
properties.load(reader);
properties.list(System.out);
} catch (IOException e) {
System.out.println(e.getMessage());
}
} catch (UnsupportedEncodingException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
never mind....
Then remove the code that creates this file and use the *.properties file.
If the properties are for XML or HTML, it's safest to use XML entities. They're uglier to read, but it means that the properties file can be treated as straight ASCII, so nothing will get mangled.
Note that HTML has entities that XML doesn't, so I keep it safe by using straight XML: http://www.w3.org/TR/html4/sgml/entities.html
