How to handle multiple languages in Java apps? - java

I am writing a program use JSP and Java. How can I use property files to support multiple languages?
And by the way, there are always some things like \u4345.
What is this? How do they come?

For the multiple languages, check out the ResourceBundle class.
About the \u4345, this is one of the dark and very annoying legacy corners of Java. The property files need to be in ASCII, so that all non-ASCII characters need to encoded as \uxxxx (their Unicode value). You can convert a file to use this encoding with the native2ascii command line tool.
If you are using an IDE or a build tool, there should be an option to invoke this automatically.
If the property file is something you have full control over yourself, you can starting from Java6 also use UTF-8 (or any other character set) directly in the property file, and specify that encoding when you load it:
// new in Java6
props.load(new InputStreamReader(new FileInputStream(file), 'UTF-8'));
Again, this only works if you load the Properties yourself, not if someone else does it, such as a ResourceBundle (used for internationalization).

there is an entire tutorial on http://java.sun.com/docs/books/tutorial/i18n/index.html
This specifies and explains about anything you need to know.

The Java tutorial on i18n has been mentioned already by Peter. If you are building JSPs you probably want to look at the JSTL which basically allows you to use the functionality of ResourceBundle through JSP tags.

Related

Using Freemarker to generate Java .properties files

I am currently using Freemarker to generate a number of configuration files. So far these have been either xml files or properietary format text files. I now would like to generate some Java .properties files but have hit a couple of issues.
The first is character encoding. As far as I can see simply adding
<#ftl encoding="8859_1">
to the start of the file should sort this out.
The second issue is the escaping of the keys and values. The keys are probably ok as I would be hardcoding these in the template anyway so I can escape them in the template. The values will be coming from my data model and so will need escaping.
I can see how I can create my own user defined directive and by installing it as a shared variable use it in my template.
Is this the best or only way to do this? I would have thought generating .properties files is something that has been tackled many times before and was hoping something may already exist before I start writing my own code.
The class java.util.Properties got various store methods to save properties to OutputStreams or files. This seems more preferable than trying to adapt freemarker.
I don't get what are the charset issues that are specific to generating properties files. But note that the charset of the template and the charset of the output are independent, so you might as well use the same charset for these templates as for the others (like maybe UTF-8).
As of escaping, always use auto-escaping if you can. In 2.3.24 that will be especially sleek, but unless you are allowed to use unreleased versions, you had to wait for that until the end of February or so. (If you can use unreleased/unofficial versions, you can find out about the internal testing releases in the developer list archive.) Before 2.3.24, there's <#escape x as propEsc(x)>all the template content here</#escape>, where propEsc is a TemplateMethodModelEx (not a TemplateDirectiveModel) that you have added as shared variable or such. And so all ${...}-s will be magically escaped.

What's the purpose of Properties.loadFromXML() and Properties.storeToXML() methods?

I am designing a simple library that deals with properties files.
I noticed that since JRE 1.5 the class Properties defines methods like:
public synchronized void loadFromXML(InputStream in)
public void storeToXML(OutputStream os, String comment)
I am questioning the fact that this is a real enhancement in the API of this class. Properties files have been, since JRE 1.5 text based files, and the newly introduced XML format is not adding anything to the functionalities, other than the possibility to use a different forma which is
more verbose
more complex (to understand, to change, to parse)
more inefficient (it uses dom internally to parse into an hastable: it consumes more memory, it requires helper classes in the implementation, and most likely is also slower)
more fragile (xml requires escaping of characters <>&"' while properties only need to escape backslashes, since it also supports Java backslash escaping)
it breaks backward compatibility of the programs using it, since users running JDK 1.4 won't be able to read xml properties. (ok, who cares...)
So I fail to understand the reason behind why engineers in Sun added this feature.
The question is:
Does anybody finds some advantage of using an XML-based properties files over a traditional text based one?
I need to evaluate this problem, since I don't want to add a useless feature to my simple library that I cited before.
Did you ever used an XML-based properties file over a Java Properties file? And why?
Note: same question can be made for Log4J xml file format, but at least Log4J xml format adds nesting ability and some sort of syntax which has some meaning, and I do understand that. But with this xml format for properties, I don't.
If staying within the Java environment, using a Java properties file works great. Even if you expect other programming languages to interact with your library, you'll probably be ok with a 'regular' properties file. However, for hierarchical data, XML is the standard. The reason you may want to support this change, and possibly the reason why Sun included it, is that other programming languages have extensive libraries for parsing XML files for hierarchical data.
The reason I'm answering is because I have actually used this feature before! But not for a great reason. In one program I'm working on now, I've found it easiest to keep a set of data in a properties object and I output the object to XML so that it can later be read by Python. At the moment, the data is further manipulated in a Python script and more children are added to the XML file. Without being able to output easily to XML, this would be a little more painful.
If I had the time, I wouldn't bother outputting to XML though. The main reason I'm using the Python code that takes in the XML is because somebody else wrote it and I'm temporarily using it until I have the time to reevaluate that section of my program and re-code it.
So there's a reason for using the XML! It isn't a good one, but it's a reason.
I imagine there are other cases like this where having the properties outputted as an XML aids in compatibility with other languages, since most languages have a robust XML parsing library and it makes it easier to manipulate hierarchical data. And in scientific programming, it seems you rarely get the luxury of sticking to one language.
Some points:
You can use standard, cross-platform tools to create it
You don't need to worry about peculiarities of escaping and character encoding, as you can use standard tools, which actually makes it more robust. The old properties file format is poorly specified.
Standard, cross-platform tools can use the data.
For most applications Java is used in, a bit of start up time isn't going to make much difference (particularly given the start up time of the rest of the system).
Java SE 1.6 is a bout to complete its end-of-life. Pre-1.5 isn't particularly relevant for Java SE (or EE).
But no, I've never seen it actually used.
Afaik the XML format is encouraged because of the encoding: (by specs) strictly ASCII for plain files (may I suggest you http://mojo.codehaus.org/native2ascii-maven-plugin/), UTF-8 (default) for XML property files as stated in http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Properties.html
edit: I beg your pardon: ISO-8859-1 for property plain files

Java problems with UTF-8 in different OS

I'm programing with other people an application to college homework, and sometime we use non-english characters in comments or in Strings displayed in the views. The problem is that everyone of use is using a different OS and sometimes different IDE's to program.
Concretely, one is using MacOS, another Windows7, and another and me Ubuntu Linux. Furthermore, all of them use Eclipse and I use gedit. We have no idea if Eclipse or gedit are configurable to work propertly with UTF8 bussiness, at least I don't found nothing for mine.
The fact is that what I write with non-english characters appears in Windows & MacOS virtual machines with strange symbols and vice-versa , and sometimes, what my non-linux friends write provokes compilation warnings like this: warning: unmappable character for encoding UTF8.
Do you have any Idea to solve this? It is not really urgent but it will be a help.
Thank you.
Not sure about gedit, but you can certainly configure eclipse to use whatever encoding you like for source code. It's part of the project properties (and saved in the .settings directory within the project).
Eclipse works fine with UTF-8. See Michael's answer about configuring it. Maybe for Windows and/or MacOS it is really necessary. Ubuntu uses UTF-8 as the default encoding so I don't think it's necessary to configure Eclipse there.
As for Gedit, this picture shows that it is possible to change the encoding when saving a file in Gedit.
Anyway, you need to make sure that all of you use UTF-8 for your sources. This is the only reasonable way to achieve cross-platform portability of your sources.
You could avoid the issue in Strings by using character escape sequences, and using only ASCII encoding for the files.
For example, an en dash can be expressed as "\u2013".
You can quickly search for the Java code for individual characters here.
As Sergey notes below, this works best for small numbers of non-ASCII characters. An alternative is to put all the UTF-8 strings in resource files. Eclipse provides a handy wizard for this.
If your UTF8 file contains a BOM (byte order mark) then you will have a problem. It is a known bug , see here and here.
The BOM is optional with UTF8 and most of the time it is not there because it breaks many tools (like Javadoc, XML parser,...).
More info here.

FileReader not found in Java ME

The class java.io.FileReader not found in Java ME.
I need this in order to get the file and then parse it with an xml parser.
Anyone know any alternatives for this class?
*added
using CLDC profile. The xml file to be read is in the JAR.
That's because Java ME provides only a limited subset of the java.io package. You need to use the java.microedition.io package instead.
For actual file I/O you'll need to use the FileConnection class provided by JSR-75.
What Java ME profile are you using? The CLDC does not support the concept of files at all.
In general, FileReader is nothing but a convenience class that wraps an InputStreamReader areound a FileInputStream. It's also very broken because it does not allow specifying the encoding, and should therefore almost never be used.
It would be especially wrong to use it to read XML because proper XML data specifies its encoding, and a proper XML parser will handle that, so you really should pass binary data to the XML parser.
So if you're on the CDC profile, just use a FileInputStream directly.
the question is a bit ambiguous. I think Joachim's answer might be only partial if you are trying to read a local file. I'm certainly not sure though.
If the file is stored as a resource in your JAR, you can access it through the getResourceAsStream method in Class.
If the file is a local file on the file system and if I recall correctly, you need JSR-75 support. Over at Sun's developer page there is an introduction to JSR 75 and the fileconnection
API.

Localizing a JSF 1.2 application with UTF-8 resources

(WARNING: this is my first java application, coming from .NET, so don't bash me if I write too much garbage)
I'm developing a simple JSF 1.2 web application which should support Russian, Chinese, and other languages outside ISO 8859-1, which is automatically used in Properties.load().
Is there a way to use the Properties loaded from XML files, with Properties.loadFromXml(), inside JSF, without writing too much code?
I know there are alternative ways to do so (writing my own loader, escaping the characters...), but I'd really love to find a simple solution, and I don't see it in all the forums I checked.
Thanks in advance for any help
I think the most widely used approach is to encode your .properties files with unicode escape sequences. This can easily be done with the AnyEdit plugin for Eclipse.
The problem is that ResourceBundle uses the Properties(inputStream) constructor, rather than Properties(reader).
You can use your own LoadBundle component instead of f:loadBundle to overcome this, but you'll have to:
extend the original one
define it as custom component (facelets and/or jsp)
define a new ResourceBundle implementation
instantiate it, using new InputStreamReader(classloader.getResourceAsStream(..))

Categories