Internationalizing java software

Internationalizing java software - java

For my internship i've been asked to do some research on software internationalization and the current practices and solutions.
I've done some research and have come to no viable solution. My project manager has asked that I ask on stackoverflow,
What are the current practices that you guys at your job do in order to internationalize your Java software?
EDIT
The following is a summary of my research in case any other person is interested in my findings:
As the software is written in Java, RessourceBundles are obviously used. RessourceBundles provide good key value lookup with fallback to default values if no specific translation for the current locale exists. ResourceBundles are also not limited to translation of text but to internationalization of, well, resources. For example, color or images mean differente things for different cultures.
While all that is nice, just purely using Java PropertiesResourceBundles fails to provide metadata for the translator and fails to handle plural forms.
GNU Gettext takes an alternate approche to internationalization. Messages are written in source code in english and then extracted and stored into a file. The extraction program searches for function calls and extracts the parameters. For example, tr("Hello, World!") the command line utility xgettext would search for occurences of the function "tr" and extract all string literals.
Java implementations of gettext exist, such as:
https://code.google.com/p/gettext-commons/
https://github.com/jhorstmann/i18n
What gettext provides that ResourceBundles don't is plural handling and context for translations.

Have a read of this trail as it should answer most of your questions.
For web applications we use the standard facilities offered by JavaEE. That essentially means passing a message bundle into a JSF page and then using mark up that looks like this #{msg.hello} in the page. "msg" is the name of the message bundle and "hello" is the key that will be used to look up the translated string.
The translations are all held in properties files which have a standardized format and naming convention. The process works in much the same way for client applications although I don't feel it's quite as smooth
As I understand it professional translators have software that will load properties files and assist them in producing the translations. Adding comments to your properties files is useful so the translators have some context when translating.

In addition to other answers I would suggest using some technique/software that can analyze/check that all localization resources in your project are in sync.
That usually should be done during build time, so you can find/catch errors earlier.
One of such tools that I personally use and would recommend is i18n-maven-plugin
Hope this helps.

Related

Good practice for layered application with internationalization

I'm designing a new application in JSE which I want to internationalize.
I've never done such an application. I'm looking for the best practices about the internationalization. The application while be writing the translated data in files or DB. I've searched about best practices but I didn't found anything about my main question(the first one).
Should I put all the internationalization data in some layer or next to the object they are about ?
Could I directly use the properties files as a kind of enum to do a switch case ?
Or can I reverse engineer the data catched and know the default internationalize value and work with it?

I did encounter several strategies. I would start with a properties file.
One factor is that the data must be professionally maintained:
keep it in version control.
keep a version number for us humans, "1.0.23"
keep the texts ordered and nice, to help translation.
keep a second properties file with a glossary for consistent translation.
Undermore I did see generating properties or java ListResourceBundles from DocBook XML, Excel, translation memories. And yes, database.
Maintenance of data must be done careful, as several different parties will use the text at different times.
Programming tools, consistency checks and preparing data, communicating are tasks not to neglect.
Properties files are not entirely ideal, but IDEs have generally some support for them.
Set up everything for UTF-8, though take notice that properties files use ISO-8859-1, but you can use \uXXXX escaping or do a encoding conversion in your build process. ListResourceBundle java sources, generated than, would be an alternative.

MessageResouce files in JSF

My application suppose to work with different languages. In order to provide this feature, I have created different messageResource files for each language.
Each resource files contain the same keys but values are different according to which language application is running. I load the specific language resource file on application load up.
This is working very fine.
However, as we are adding more features, the resource files for each language becoming very long, which makes it difficult to manage (edit) for non-techy person like Content editor guy.
Therefore, I would like to know, how can I redesign or remodularise in such a way that it will be easy to manage for Content Writers?
I hope I clear the scenario but please shout, if any thing needs?

I do not know a i18n standard mechanism in the Java world to split message properties in chunks. However, I have made good experiences with a standalone properties-edit-tool like jLokalize. It's pretty user-friendly even for non-programmers: hand over the properties files to the content writer and let him/her load the files in the editor.

What's the purpose of Properties.loadFromXML() and Properties.storeToXML() methods?

I am designing a simple library that deals with properties files.
I noticed that since JRE 1.5 the class Properties defines methods like:
public synchronized void loadFromXML(InputStream in)
public void storeToXML(OutputStream os, String comment)
I am questioning the fact that this is a real enhancement in the API of this class. Properties files have been, since JRE 1.5 text based files, and the newly introduced XML format is not adding anything to the functionalities, other than the possibility to use a different forma which is
more verbose
more complex (to understand, to change, to parse)
more inefficient (it uses dom internally to parse into an hastable: it consumes more memory, it requires helper classes in the implementation, and most likely is also slower)
more fragile (xml requires escaping of characters <>&"' while properties only need to escape backslashes, since it also supports Java backslash escaping)
it breaks backward compatibility of the programs using it, since users running JDK 1.4 won't be able to read xml properties. (ok, who cares...)
So I fail to understand the reason behind why engineers in Sun added this feature.
The question is:
Does anybody finds some advantage of using an XML-based properties files over a traditional text based one?
I need to evaluate this problem, since I don't want to add a useless feature to my simple library that I cited before.
Did you ever used an XML-based properties file over a Java Properties file? And why?
Note: same question can be made for Log4J xml file format, but at least Log4J xml format adds nesting ability and some sort of syntax which has some meaning, and I do understand that. But with this xml format for properties, I don't.

If staying within the Java environment, using a Java properties file works great. Even if you expect other programming languages to interact with your library, you'll probably be ok with a 'regular' properties file. However, for hierarchical data, XML is the standard. The reason you may want to support this change, and possibly the reason why Sun included it, is that other programming languages have extensive libraries for parsing XML files for hierarchical data.
The reason I'm answering is because I have actually used this feature before! But not for a great reason. In one program I'm working on now, I've found it easiest to keep a set of data in a properties object and I output the object to XML so that it can later be read by Python. At the moment, the data is further manipulated in a Python script and more children are added to the XML file. Without being able to output easily to XML, this would be a little more painful.
If I had the time, I wouldn't bother outputting to XML though. The main reason I'm using the Python code that takes in the XML is because somebody else wrote it and I'm temporarily using it until I have the time to reevaluate that section of my program and re-code it.
So there's a reason for using the XML! It isn't a good one, but it's a reason.
I imagine there are other cases like this where having the properties outputted as an XML aids in compatibility with other languages, since most languages have a robust XML parsing library and it makes it easier to manipulate hierarchical data. And in scientific programming, it seems you rarely get the luxury of sticking to one language.

Some points:
You can use standard, cross-platform tools to create it
You don't need to worry about peculiarities of escaping and character encoding, as you can use standard tools, which actually makes it more robust. The old properties file format is poorly specified.
Standard, cross-platform tools can use the data.
For most applications Java is used in, a bit of start up time isn't going to make much difference (particularly given the start up time of the rest of the system).
Java SE 1.6 is a bout to complete its end-of-life. Pre-1.5 isn't particularly relevant for Java SE (or EE).
But no, I've never seen it actually used.

Afaik the XML format is encouraged because of the encoding: (by specs) strictly ASCII for plain files (may I suggest you http://mojo.codehaus.org/native2ascii-maven-plugin/), UTF-8 (default) for XML property files as stated in http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Properties.html
edit: I beg your pardon: ISO-8859-1 for property plain files

Automatically build resource class based of XML in Java

In Android applications, resources are specified in xml documents, which automatically are built into the R class, readily accessible within the source code as strongly typed.
Is there any way I could use a similar approach for a regular Java desktop application?
What I'd like to accomplish, is both the removal of strings from the code (as a separation of "layers", more or less) and to make it easy to add support for localization, by simply telling the program to choose the xml file corresponding to the desired language.
I've googled around a bit, but the things I'm looking for seem to be drowning in results about parsing or outputting xml, rather than tools utilizing xml to generate code.

Eclipse's message bundle implementation (used by plugins for example) integrates with the Externalize Strings feature and generates both a static class and a resource properties file for your strings:
http://www.eclipse.org/eclipse/platform-core/documents/3.1/message_bundles.html
For this integration to work Eclipse needs to see org.eclipse.osgi.util.NLS on the class path. From memory, the dependencies of the libraries it was available in were a little tricky for the project I used this approach in, so I just got the source and have it as a stand-alone class in my core module (see the comments for more on that).
It provides the type safety you're looking for and the IDE features save a lot of time. I've found no downsides to the approach so far.
Edit: this is actually what ghostbust555 mentioned in the comments, but not clear in that article that this isn't limited to Eclipse plugins and you refer to your resources via static members of a messages class.
I haven't seen any mention of others using this approach with their own applications, but to me it makes complete sense given the IDE integration and type safety.

I'm not sure if this is what you mean but check out internationalization- http://netbeans.org/kb/docs/java/gui-automatic-i18n.html

Are you looking for something that parses XML files and generates Java instances of similar "struct-like" objects, like JAXP, and JAXB?

I came across ResGen which, given resource bundle XML files generates Java files that can be used to access the resources in a type-safe way.
http://eigenbase.sourceforge.net/resgen/

Java; Runtime Interpretation; Strategies To Add Plugins

I'm beginning to start on my first large project. It will be a program very similar to Rosetta Stone. It will be a program, used for learning a foreign language, written in Java using Swing. In my program I plan on the user being able to select downloaded courses to learn from. I will be able to create an English course since I am a native English speaker. However, I want people who speak other languages to be able to write courses for users to use as well (this is an essential part for my program to work).
Since I want the users to be able to download courses of languages they want, having it hard-coded into the program is out of the question. The courses needed to be interpreted during the runtime. Also since I want others to collaborate with my work (ie make courses), I need to make it easy for them to do so.
What would be the best way to go about doing this?
The idea I have come up with is having a strict empty course outline (hard-coded) with a simple xml file which details the text and sounds to be used. The drawback to this is that it extremely limits the author. Different languages may need to start out with learning different parts.
Any advice on the problem at hand as well as the project as a whole will be greatly appreciated. Any links to any relevant resources or information would also be greatly appreciated.
Think you for your time and effort,
Joseph Pond

Simply, you should base your program on a system such as Eclipse RCP, or the Netbeans Platform. Both of these systems already deal with exactly this problem, and both are perfectly adequate for this task. They're not just for IDEs.
It's a larger first step as you will need to learn one of these platforms beyond simply just Swing.
But, they solve the problem, and their overall organization and technique will serve your program well anyway.
Don't reinvent this wheel, just learn one of these instead.

If you are set on doing this from scratch (Will's idea isn't bad), What I would do is first lay down the file format that would be easiest to create your language course in. It could be XML, plaintext or some other format you come up with yourself.
You will probably need some flexibility in the language format because you will want to actually be able to specify things like questions and answers. XML is a pain because of all the extra terminators, but it gives a good amount of meta-data. If you like XML for that, you may consider defining your language file in YML, it gives you the data of XML but uses whitespace delineators instead of angle brackets.
You probably also want to define your file in the language it's created for, so you might or might not want to require english words as keys. If you don't want any english, you may have to skip both XML and YML and come up with your own file format--possibly where the layout and/or special symbols define the flow and "functionality".
Once you have defined the file format, you won't have to worry about hard-coding anything... you won't be able to because it will already be in the file.
Plug-in functionality would be nice as well... This is where your definition file also contains information that tells you what class to instantiate (reflectively) and use to parse/display the data. In that way you could add new types of questions just by delivering a new jar file.
If this is confusing, sorry, this is difficult in a one-way forum because I can't look at your face and see if you're following me or if I'm even going in the right direction. If you think I'm on the right track and want more details (I've done a bit of this stuff before) feel free to leave a follow-up question (or an email address) in a comment and I'd be glad to discuss it with you further.

If I was doing this, I'd seriously consider using Eclipse EMF to model the "language" for defining courses. EMF is rather daunting to start with, but it gives you:
A high-level model that can be entered/edited in a variety of ways.
An automatic mechanism for serializing "instances" (i.e. courses) to XML. (And you can tinker with the serialization if you choose.)
Automatically generated Java classes for in-memory representations of your instances. These provide APIs that are tuned to your model, an generic ones that are the EMF equivalent of Java reflection ... but based on EMF model classes rather than Java classes.
An automatically generated tree editor for your "instances".
Hooks for implementing your own constraints / validation rules to say what is a valid "course".
Related Eclipse plugins offer:
Mappings to text-based languages with generation of parsers/unparsers
Mappings to graphical languages; e.g. notations using boxes / arrows / etc
Various more advanced persistence mechanisms
Comparisons/differencing, model-to-model transformations, constraints in OCL, etc
I've used EMF in a couple of largish projects, and the main point that keeps me coming back for more is ease of model evolution ... compared with building everything at a lower level of abstraction. If my model (language) needs to be extended / changed, I can make the necessary changes using the EMF Model editor, regenerate the code, extend my custom code to do the right stuff with the extensions, and I'm pretty much done (modulo conversion of stored instances).

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.