I have a property whose string value contains some special characters. After I save it to a properties file I have:
BB\u0161BB=0
I don't like the character being represented as \u0161. Why can't it be saved as the character I see on the screen and type from the keyboard?
UPD
What is the easiest way to read an ini-style file that contains special characters?
That's how Properties files are defined to behave. Any other system is likely to use UTF-8, which might not be readable either.
As your character is outside the range of the ISO-8859-1 encoding, it has to be written as a Unicode escape instead.
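To see the difference in practice, here is a minimal sketch (the file names are made up): store(OutputStream) always escapes characters outside ISO-8859-1, while store(Writer) uses whatever charset the writer was constructed with:
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("BBšBB", "0");

// store(OutputStream) writes ISO-8859-1: 'š' ends up as \u0161 in the file
try (FileOutputStream out = new FileOutputStream("escaped.properties")) {
    props.store(out, null);
}

// store(Writer) uses the writer's charset: 'š' stays literal in a UTF-8 file
try (OutputStreamWriter writer = new OutputStreamWriter(
        new FileOutputStream("utf8.properties"), StandardCharsets.UTF_8)) {
    props.store(writer, null);
}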
You can!
Encode your file in Unicode (UTF-8, for instance) using another application (Notepad++ works nicely) and read your properties file like this:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
File file = ... ;
Properties properties = new Properties();
// load(Reader) honors the reader's charset instead of assuming ISO-8859-1
try (FileInputStream in = new FileInputStream(file);
     Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
    properties.load(reader);
}
// use properties
There you go: a properties file in any charset that you can read and use.
You can't. From the Javadoc:
...the input/output stream is encoded in ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes as defined in section 3.3 of The Java™ Language Specification; only a single 'u' character is allowed in an escape sequence.
Please refer to this question: How to use UTF-8 in resource properties with ResourceBundle
Properties files are ISO-8859-1 encoded, so characters outside of that set need to be \uXXXX escaped.
Related
I often read that .properties files should/must be encoded in ISO-8859-1 before Java 9,
yet I have never had problems loading and storing .properties files in UTF-8.
Or is ISO-8859-1 recommended only for resource bundle files?
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public static void utf8_test()
{
    Path pfile = Paths.get("C:\\temp_\\properties_utf8.properties");
    Path pfileOut = Paths.get("C:\\temp_\\properties_utf-8_out.properties");
    Properties props = new Properties();
    try (BufferedReader br = Files.newBufferedReader(pfile, StandardCharsets.UTF_8);
         BufferedWriter bw = Files.newBufferedWriter(pfileOut, StandardCharsets.UTF_8))
    {
        // load(Reader) and store(Writer) respect the reader's/writer's charset
        props.load(br);
        props.setProperty("test", "test prop;Гадание по телефону;öüµäß#;Euro sign;€");
        props.list(System.out);
        props.store(bw, "comment");
    } catch (Exception e) { e.printStackTrace(); }
} //--- end
The input file is:
foo = aaa
utf_test = Гадание
bar = bbb
The output file is:
#comment
#Wed Dec 25 15:11:28 CET 2019
utf_test=Гадание
bar=bbb
foo=aaa
test=test prop;Гадание по телефону;öüµäß#;Euro sign;€
Properties File
Here's the class documentation of Properties (Java 13):
The load(Reader) / store(Writer, String) methods load and store properties from and to a character based stream in a simple line-oriented format specified below. The load(InputStream) / store(OutputStream, String) methods work the same way as the load(Reader) / store(Writer, String) pair, except the input/output stream is encoded in ISO 8859-1 character encoding [emphasis added]. Characters that cannot be directly represented in this encoding can be written using Unicode escapes as defined in section 3.3 of The Java™ Language Specification; only a single 'u' character is allowed in an escape sequence.
The loadFromXML(InputStream) and storeToXML(OutputStream, String, String) methods load and store properties in a simple XML format. By default the UTF-8 character encoding is used, however a specific encoding may be specified if required. Implementations are required to support UTF-8 and UTF-16 and may support other encodings. An XML properties document has the following DOCTYPE declaration: [...]
And here's the documentation of Properties#load(InputStream) (again, Java 13):
Reads a property list (key and element pairs) from the input byte stream. The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding [emphasis added]; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section 3.3 of The Java™ Language Specification.
The specified stream remains open after this method returns.
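As an aside, the XML variant mentioned in the class documentation avoids the escaping issue entirely, since it defaults to UTF-8. A minimal sketch (the file name is made up):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("utf_test", "Гадание");

// storeToXML uses UTF-8 by default; an encoding can be passed as a third argument
try (FileOutputStream out = new FileOutputStream("props.xml")) {
    props.storeToXML(out, "comment");
}

// loadFromXML picks up the encoding from the XML declaration
try (FileInputStream in = new FileInputStream("props.xml")) {
    props.loadFromXML(in);
}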
Resource Bundles
Meanwhile, here's the class documentation of PropertyResourceBundle (again, Java 13):
API Note:
PropertyResourceBundle can be constructed either from an InputStream or a Reader, which represents a property file. Constructing a PropertyResourceBundle instance from an InputStream requires that the input stream be encoded in UTF-8. By default, if a MalformedInputException or an UnmappableCharacterException occurs on reading the input stream, then the PropertyResourceBundle instance resets to the state before the exception, re-reads the input stream in ISO-8859-1, and continues reading. If the system property java.util.PropertyResourceBundle.encoding is set to either "ISO-8859-1" or "UTF-8", the input stream is solely read in that encoding, and throws the exception if it encounters an invalid sequence. If "ISO-8859-1" is specified, characters that cannot be represented in ISO-8859-1 encoding must be represented by Unicode Escapes as defined in section 3.3 of The Java™ Language Specification whereas the other constructor which takes a Reader does not have that limitation. Other encoding values are ignored for this system property. The system property is read and evaluated when initializing this class. Changing or removing the property has no effect after the initialization.
It's this class that has changed. If you look at the Java 8 documentation of PropertyResourceBundle you'll see it only accepts ISO 8859-1 encoding when constructed with an InputStream.
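So on Java 9 and later, a UTF-8 encoded bundle file should just work; here is a minimal sketch (the bundle name and key are made up):
import java.util.ResourceBundle;

// Labels.properties saved as UTF-8; on Java 9+ PropertyResourceBundle reads it
// as UTF-8 and only falls back to ISO-8859-1 if the bytes are not valid UTF-8.
ResourceBundle bundle = ResourceBundle.getBundle("Labels");
System.out.println(bundle.getString("quit"));

// To pin the encoding and fail fast on invalid sequences, start the JVM with:
// -Djava.util.PropertyResourceBundle.encoding=UTF-8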
I've got a method where one of the input attributes is a String xml. I just want to add a check of the encoding of that xml: if any character is in an encoding other than UTF-8, an error should be thrown.
Can you please tell me the easiest way to create and test it?
I've used something like this:
String xml = IOUtils.toString(new FileInputStream("c:/encoding.xml"));
Document doc = builder.parse(IOUtils.toInputStream(xml, "UTF-8"));
I added letters like Ľ, Š, Ť, Ž, ľ, š, ť, ž and saved it as a cp1250 file,
but there was no error.
What am I doing wrong?
This cannot be done natively in Java. A file is just a string of bytes; it can be interpreted however you like, and by default Java has no way to attach meaning to it. I recommend using this library (no, I didn't write it):
http://code.google.com/p/juniversalchardet/
Follow these instructions (copy-pasted from that link); a minimal sketch follows the steps:
How to use it
Construct an instance of org.mozilla.universalchardet.UniversalDetector.
Feed some data (typically several thousand bytes) to the detector by calling UniversalDetector.handleData().
Notify the detector of the end of data by calling UniversalDetector.dataEnd().
Get the detected encoding name by calling UniversalDetector.getDetectedCharset().
Don't forget to call UniversalDetector.reset() before you reuse the detector instance.
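Here is that sketch, assuming the juniversalchardet jar is on the classpath (the file name is made up):
import java.io.FileInputStream;
import org.mozilla.universalchardet.UniversalDetector;

UniversalDetector detector = new UniversalDetector(null);
byte[] buf = new byte[4096];

try (FileInputStream fis = new FileInputStream("c:/encoding.xml")) {
    int read;
    // Feed data until the detector is confident or the file ends
    while ((read = fis.read(buf)) > 0 && !detector.isDone()) {
        detector.handleData(buf, 0, read);
    }
}
detector.dataEnd();

String charset = detector.getDetectedCharset(); // null if nothing was detected
System.out.println("Detected encoding: " + charset);
detector.reset(); // required before reusing the detector instance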
String xml = IOUtils.toString(new FileInputStream("c:/encoding.xml"));
If this IOUtils is org.apache.commons.io.IOUtils then its Javadoc says
"Get the contents of an InputStream as a String using the default character encoding of the platform."
As you are saving as cp1250, I guess cp1250 is also your platform character encoding. What your code would be doing is:
Read the file as a byte stream
Convert the byte stream to chars using cp1250 (platform encoding)
Transform the chars to Java internal representation (UTF-16)
Convert from UTF-16 to UTF-8
Create XML document
That will always work, as cp1250 really is your file encoding, UTF-16 can represent every character in cp1250, and UTF-8 can represent every character in UTF-16.
If you want to read the bytes as UTF-8 and avoid automatic conversions, you should use one of the two-parameter variants of IOUtils.toString():
public static String toString(InputStream input, Charset encoding)
public static String toString(InputStream input, String encoding)
So I would try:
// Helper import: I always forget if the constant is "UTF8" or "UTF-8"
import org.apache.commons.lang.CharEncoding;
String xml = IOUtils.toString(new FileInputStream("c:/encoding.xml"), CharEncoding.UTF_8);
Document doc = builder.parse(IOUtils.toInputStream(xml, CharEncoding.UTF_8));
The rule of thumb here is: NEVER do any byte-to-string / string-to-byte conversion without specifying the source / destination encoding.
A minor rule of thumb would be: Unless you need to use some other encoding, use UTF-8 everywhere.
Both of those rules of thumb are independent of your programming language of choice.
I want to write "Arabic" in the message resource bundle (properties) file but when I try to save it I get this error:
"Save couldn't be completed
Some characters cannot be mapped using "ISO-85591-1" character encoding. Either change encoding or remove the character ..."
Can anyone guide please?
I want to write:
global.username = اسم المستخدم
How should I write the Arabic for "username" in the properties file, so that internationalization works?
http://sourceforge.net/projects/eclipse-rbe/
You can use the above plugin for eclipse IDE to make the Unicode conversion for you.
As described in the class reference for "Properties":
The load(Reader) / store(Writer, String) methods load and store properties from and to a character based stream in a simple line-oriented format specified below. The load(InputStream) / store(OutputStream, String) methods work the same way as the load(Reader)/store(Writer, String) pair, except the input/output stream is encoded in ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes; only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.
Properties-based resource bundles must be encoded in ISO-8859-1 to use the default loading mechanism, but I have successfully used this code to allow the properties files to be encoded in UTF-8:
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

private static class ResourceControl extends ResourceBundle.Control {
    @Override
    public ResourceBundle newBundle(String baseName, Locale locale,
            String format, ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        String bundlename = toBundleName(baseName, locale);
        String resName = toResourceName(bundlename, "properties");
        InputStream stream = loader.getResourceAsStream(resName);
        // Read the bundle as UTF-8 instead of the default ISO-8859-1
        return new PropertyResourceBundle(new InputStreamReader(stream, "UTF-8"));
    }
}
Then of course you have to change the encoding of the file itself to UTF-8 in your IDE, and you can use it like this:
ResourceBundle bundle = ResourceBundle.getBundle(
"package.Bundle", new ResourceControl());
new String(ret.getBytes("ISO-8859-1"), "UTF-8"); worked for me.
The property file was saved in ISO-8859-1 encoding.
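For context: this trick works when the file's bytes are actually UTF-8 but were decoded as ISO-8859-1 (which is what Properties.load(InputStream) does); re-encoding the characters as ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. A minimal sketch (the variable and key names are made up):
import java.nio.charset.StandardCharsets;

// 'ret' holds mojibake: UTF-8 bytes that were decoded byte-per-char as ISO-8859-1
String ret = bundle.getString("global.username");
String fixed = new String(ret.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);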
If you are using Eclipse, you can choose "Window-->Preferences" and then filter on "content types". Then you should be able to set the default encoding. There's a screenshot showing this at the top of this post.
This is mainly an editor configuration issue. If you're working in Windows, you can edit the text in an editor that supports UTF-8. Notepad or the Eclipse built-in editor should be more than enough, provided you've saved the file as UTF-8. In Linux, I've used gedit and emacs successfully. In Notepad, you can do this by clicking the 'Save As' button and choosing 'UTF-8' encoding. Other editors should have a similar feature. Some editors might require a font change in order to display the letters correctly, but it seems that you don't have this issue.
Having said that, there are other steps to consider when performing i18n for Arabic. You can find some useful links below. Make sure to run native2ascii on the properties file before using it, otherwise it might not work. I spent a lot of time until I figured this one out.
Tomcat webapps
Using native2ascii with properties files
Besides the native2ascii tool mentioned in other answers, there is an open-source Java library that provides conversion functionality for use in code.
The MgntUtils library has a utility that converts strings in any language (including special characters and emoji) to Unicode sequences and vice versa:
result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);
The output of this code is:
\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World
The library can be found on Maven Central or on GitHub. It comes as a Maven artifact with sources and Javadoc.
Here is the Javadoc for the class StringUnicodeEncoderDecoder.
I'm developing a small project and I'd like to use internationalization for it. The problem is that when I try to use .properties file with cyrillic symbols inside, the text is displayed as rubbish. When I hard-code the strings it's displayed just fine.
Here is my code:
ResourceBundle labels = ResourceBundle.getBundle("Labels");
btnQuit = new JButton(labels.getString("quit"));
And in my .properties file:
quit = Изход
And I get rubbish. When I try
btnQuit = new JButton("Изход");
It is displayed correctly. As far as I am aware, UTF-8 is the encoding used for the files.
Any ideas?
AnyEdit is an Eclipse plugin that allows you to easily convert your properties files from and to Unicode notation (avoiding the use of command-line tools like native2ascii).
If you were using the Properties class alone (without a resource bundle), since Java 1.6 you have had the option to load the file with a custom encoding, using a Reader (rather than an InputStream).
I'd guess you can also use new PropertyResourceBundle(reader), rather than ResourceBundle.getBundle(..), where reader is:
Reader reader = new BufferedReader(new InputStreamReader(
        getClass().getResourceAsStream("messages.properties"), "utf-8"));
Properties files are ISO-8859-1 encoded by default. You must use native2ascii to convert your UTF-8 properties file to a valid ISO-8859-1 properties file containing Unicode escape sequences for all non-ISO-8859-1 characters.
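For reference, the usual native2ascii invocation looks like this (the file names are made up); the -reverse flag converts an escaped file back:
native2ascii -encoding UTF-8 messages_utf8.properties messages.properties
native2ascii -reverse -encoding UTF-8 messages.properties messages_utf8.properties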
When I check my file with Notepad++ it's in ANSI encoding. What am I doing wrong here?
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file), "UTF8");
try
{
out.write(text);
out.flush();
} finally
{
out.close();
}
UPDATE:
This is solved now. The reason JBoss didn't understand my xml wasn't the encoding, it was the naming of my xml. Thanks all for the help, even though there really wasn't any problem...
If you're creating an XML file (as your comments imply), I would strongly recommend that you use the XML libraries to output this and write the correct XML encoding header. Otherwise your character encoding won't conform to XML standards and other tools (like your JBoss instance) will rightfully complain.
// Prepare the DOM document for writing
Source source = new DOMSource(doc);
// Prepare the output file
File file = new File(filename);
Result result = new StreamResult(file);
// Write the DOM document to the file with an explicit encoding declaration
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
xformer.transform(source, result);
There's no such thing as plain text. The problem is that an application is decoding character data without you telling it which encoding the data uses.
Although many Microsoft apps rely on the presence of a Byte Order Mark to indicate a Unicode file, this is by no means standard. The Unicode BOM FAQ says more.
You can add a BOM to your output by writing the character '\uFEFF' at the start of the stream. More info here. This should be enough for applications that rely on BOMs.
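A minimal sketch of that approach, reusing the file and text variables from the question's code; the UTF-8 writer encodes '\uFEFF' as the bytes EF BB BF:
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

try (Writer out = new OutputStreamWriter(
        new FileOutputStream(file), StandardCharsets.UTF_8)) {
    out.write('\uFEFF'); // the BOM, written as the very first character
    out.write(text);
}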
UTF-8 is designed to be, in the common case, rather indistinguishable from ANSI. So when you write text to a file and encode the text with UTF-8, in the common case, it looks like ANSI to anyone else who opens the file.
UTF-8 is 1-byte-per-character for all ASCII characters, just like ANSI.
UTF-8 has all the same bytes for the ASCII characters as ANSI does.
UTF-8 does not have any special header characters, just as ANSI does not.
It's only when you start to get into the non-ASCII codepoints that things start looking different.
But in the common case, byte-for-byte, ANSI and UTF-8 are identical.
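A quick sketch to see this for yourself (using ISO-8859-1 as a stand-in for ANSI, which is an assumption; Windows ANSI is usually windows-1252):
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

String ascii = "plain text";
String nonAscii = "Гадание";

// Identical bytes for pure-ASCII content
System.out.println(Arrays.equals(
        ascii.getBytes(StandardCharsets.UTF_8),
        ascii.getBytes(StandardCharsets.ISO_8859_1))); // true

// The encodings diverge as soon as non-ASCII characters appear
System.out.println(Arrays.equals(
        nonAscii.getBytes(StandardCharsets.UTF_8),
        nonAscii.getBytes(StandardCharsets.ISO_8859_1))); // false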
If there is no BOM (and Java doesn't output one for UTF8, it doesn't even recognize it), the text is identical in ANSI and UTF8 encoding as long as only characters in the ASCII range are being used. Therefore Notepad++ cannot detect any difference.
(And there seems to be an issue with UTF-8 in Java anyway...)
The IANA registered type is "UTF-8", not "UTF8". However, Java should throw an exception for invalid encodings, so that's probably not the problem.
I suspect that Notepad is the problem. Examine the text using a hexdump program, and you should see it properly encoded.
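If no hexdump program is handy, a few lines of Java will do (the file name is made up):
import java.nio.file.Files;
import java.nio.file.Paths;

byte[] bytes = Files.readAllBytes(Paths.get("out.txt"));
for (byte b : bytes) {
    System.out.printf("%02X ", b); // e.g. a UTF-8 'é' shows up as C3 A9
}
System.out.println();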
Did you try to write a BOM at the beginning of the file? BOM is the only thing that can tell the editor the file is in UTF-8. Otherwise, the UTF-8 file can just look like Latin-1 or extended ANSI.
You can do it like this:
public final static byte[] UTF8_BOM = {(byte)0xEF, (byte)0xBB, (byte)0xBF};
...
// Write the BOM bytes to the raw stream before wrapping it in a writer
OutputStream os = new FileOutputStream(file);
os.write(UTF8_BOM);
os.flush();
OutputStreamWriter out = new OutputStreamWriter(os, "UTF8");
try
{
    out.write(text);
    out.flush();
} finally
{
    out.close();
}