What localization changes are needed for Arabic with a Java Applet - java

How big a task is it to implement support for Arabic localization? Our Java 1.5 Applet was designed to be fully localizable (for European languages), but now we plan to add Arabic as a new language.
We are using custom GUI text I/O components inherited from the Component class, drawing text with e.g. drawString. How well is Arabic supported within the Component class?
The keyboard input is done with KeyListener getKeyChar, getKeyCode etc.

It depends on the quality of the original internationalization work. If everything is implemented correctly, then it will be similar to adding support for a new European language - most of the work will be translation and testing.
However, if you've only tested the software with European languages, you might find a lot of problems with your original internationalization work. In particular you might need to consider:
bi-directional text
ligatures (joining the characters)
rendering (characters change shape depending on their position in the word)
number and date formats
specialized input methods
cultural differences (for icons etc)
file encodings
testing
If you have custom code that implements software features in a way that isn't fully localizable then you need to budget for fixing this too.
If you have manuals, help text and other collateral that also needs to be translated, then the software cost might not be such a large proportion of the total budget.
Also, if you have plans to perform localization for any Far Eastern languages (Japanese, Chinese, Korean, ...) you might consider sharing the cost across those projects, since many of the issues will be similar.
One final point - maintaining the localization for future releases might cost substantially more than providing it in the first place.
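For the bi-directional text point, a minimal sketch of how a custom drawString-based component might check what it is dealing with, using java.text.Bidi from the JDK. The class and method names (BidiRunsSketch, needsBidi, describeRuns) are made up for illustration, and this only analyses direction; it does not do shaping or input handling.

```java
import java.text.Bidi;

// Hypothetical helper, not part of the applet in the question.
final class BidiRunsSketch {

    // True if the text contains right-to-left characters and therefore
    // cannot simply be laid out left to right.
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // Describes each directional run as "start-limit:LTR" or "start-limit:RTL".
    // The base direction is taken from the first strong character,
    // falling back to left-to-right.
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Odd embedding levels are right-to-left runs.
            boolean rtl = (bidi.getRunLevel(i) % 2) == 1;
            sb.append(bidi.getRunStart(i)).append('-').append(bidi.getRunLimit(i));
            sb.append(rtl ? ":RTL" : ":LTR");
        }
        return sb.toString();
    }
}
```

A component that measures or positions text character by character (caret placement, selection, clipping) would need to work per run like this instead of assuming a single left-to-right string.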

Related

How to realistically implement model for user generated forms in swing

One of the projects I'm currently working on includes a Java Swing application for field users to input data about equipment parts scattered all over the company's facilities.
Most of the data collected is measured (field validation is required) or is some value the operator can choose from a predefined list. The software then does some calculation and displays back instructions to the operator.
So, for example, for part number 3435Af-B, engineering requires the operator to measure
the diameter of some bolt and choose the part maker from a list. The application then compares the measured diameter to the stock diameter and tells the operator if it should be replaced (this is obviously not a real-world example, but you get the idea).
The problem is that there are over 200 known equipment parts, and these are pretty old and heterogeneous, so the engineering team has a limited idea of what they would like to measure on each part in the future. It doesn't seem reasonable to have the development team write and maintain over 200 different classes, but it doesn't seem realistic either to have the engineers use a complicated system of form builders and BPM like Drools (I'm getting sweaty just thinking about it).
We're currently trying to make a poor man's solution that would allow the engineers to build forms with a simple GUI having limited features and outputting the forms to XML files. The few complicated cases not covered by the GUI solution would be custom made by the developers (very few cases). For the calculation part, we would use Java Expression library (JEL), the expressions to evaluate would be generated by the same GUI.
Before we commit to this solution, I was wondering if there is something we could do differently to have a more robust system. I know that some people will consider this too much soft-coding and I agree that it is going to be difficult extracting and treating the data.

Internationalization of distances in java

Is it possible in Java without any extra library to internationalize distances?
I mean it is possible to handle that with date, time, currencies, numbers...
I would have expected to find a NumberFormat.getDistanceInstance or something.
Is there something like that already built in, or should I make my own internationalization system for distances (mostly miles vs. kilometers)?
I would love to hear about such a formatter, but unfortunately I never have. The problem is that there is no such data in CLDR yet, so it is not too easy to do.
That said, people have actually been thinking about this for quite a while – see ICU's Measure class. Unfortunately, for now it seems the closest you can get is to determine the measurement system – see LocaleData and LocaleData.MeasurementSystem.
After that you are on your own. You would need to leave this to translators (they need to actually translate the units as well as the formatting pattern).
No, there's nothing in the JDK to i18n distances, weights and most other measurement units, except for calendars (I know it's not really a unit, but the lunar calendar is quite different from the Gregorian calendar). Even OSs don't have that kind of information.
The only i18n you can do with time, currencies, numbers is the formatting. There's no feature to change the measurement unit.
So you'll need to build your own for distances :S.
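A rough sketch of what "build your own" could look like with only the JDK. Everything here is an assumption made for illustration: the class name, the hard-coded list of countries that get miles, and the untranslated unit labels. Only NumberFormat and Locale are real JDK APIs; the number itself is still formatted per locale.

```java
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the JDK has no equivalent.
final class DistanceFormatSketch {

    // ASSUMPTION: illustrative list only, not taken from any locale database.
    private static final Set<String> MILE_COUNTRIES =
            new HashSet<String>(Arrays.asList("US", "GB"));

    private static final double KM_PER_MILE = 1.609344;

    static boolean usesMiles(Locale locale) {
        return MILE_COUNTRIES.contains(locale.getCountry());
    }

    // Takes a distance in kilometers and renders it in the unit chosen for the locale.
    static String format(double kilometers, Locale locale) {
        boolean miles = usesMiles(locale);
        double value = miles ? kilometers / KM_PER_MILE : kilometers;

        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setMaximumFractionDigits(1);

        // ASSUMPTION: unit labels are hard-coded here; in a real application the
        // label and the "{number} {unit}" pattern would come from translated resources.
        return nf.format(value) + " " + (miles ? "mi" : "km");
    }
}
```

For example, DistanceFormatSketch.format(10, Locale.US) converts to miles before formatting, while a locale with country "FR" keeps kilometers.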

Finding UnicodeBlock set for a given Locale

I'm currently trying to figure out how to get a Character.UnicodeBlock set for a given Locale.
Languages need different characters from one another.
What I'm exactly trying to achieve is having a String containing every character needed to write in a specific language. I can then use this String to precompute a set of OpenGL textures from a TrueTypeFont file, so I can easily write any text in any language.
Precaching every single character and having around 1000000 textures is of course not an option.
Does anyone have an idea? Or does anyone see a flaw in this procedure?
It's not as simple as that. Text in most European languages can often be written with a simple set of precomposed Unicode characters, but for many more complex scripts you need to handle composing characters. This starts fairly easily with combining accents for Western alphabets, progresses through Arabic letters that are context-sensitive (they have different shapes depending on whether they are first, last, or in the middle of a word), and ends with the utter madness that is found in many Indic scripts.
The Unicode Standard has chapters about the intricacies involved in rendering the various scripts it can encode. Just sample, for example, the description of Tibetan early in chapter 10, and if that doesn't scare you away, flip back to Devanagari in chapter 9. You will quickly drop your ambition of being able to "write text in any language". Doing so correctly requires specialized rendering software, written by experts deeply familiar with the scripts in question.
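For the simpler scripts where precomposed characters are enough, one different approach (not the per-Locale lookup the question asks for) is to scan the strings the application actually ships and cache only the code points found there. A minimal sketch, with hypothetical class and method names; it does nothing about shaping or combining sequences, so the caveats above still apply to Arabic, Indic scripts and the like.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for building a glyph cache from translated strings.
final class GlyphSetSketch {

    // Collects every distinct code point used in the given texts, in sorted order.
    static Set<Integer> codePointsOf(Iterable<String> texts) {
        Set<Integer> result = new TreeSet<Integer>();
        for (String text : texts) {
            for (int i = 0; i < text.length(); ) {
                int cp = text.codePointAt(i);   // handles surrogate pairs
                result.add(cp);
                i += Character.charCount(cp);
            }
        }
        return result;
    }

    // Reports which Unicode blocks those code points belong to.
    static Set<Character.UnicodeBlock> blocksOf(Set<Integer> codePoints) {
        Set<Character.UnicodeBlock> blocks = new LinkedHashSet<Character.UnicodeBlock>();
        for (int cp : codePoints) {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            if (block != null) {                // unassigned code points have no block
                blocks.add(block);
            }
        }
        return blocks;
    }
}
```

Feeding it the contents of the resource bundles for one language gives a bounded set of textures to precompute for that language.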

Java localization best practices

I have a Java application with server and Swing client. Now I need to localize the user interface and possibly also some of the data needs to be locale specific. There are few things in specific I would like to hear your opinions on.
How should I distribute the localized strings for the UI into properties files? In my application there are several views and each has several panels. Should I have one localization file per language for each panel or view or should I keep all translations for one language in the same file? I'm currently leaning towards one file per view and language, but I'm not sure how I should handle some domain specific terms which appear in many places. Having the same translation on several files does not sound too good.
The server throws some exceptions that contain a message that should be displayed to the user. I could get the selected locale from the session and handle the localization at the server, but I feel it would be more elegant to keep all localization files at the client. I have been thinking about sending only a localization key from the server with some kind of placeholders for error specific information, which would be sent with the exception. Then the client could construct the message based on the localization key and replace the placeholders with the error specific information. Does that sound like a good way to handle it, or are there other options? Typically my exception messages contain some additional information that changes for each case. It could be for example "A user with username Khilon already exists", in which case the string in the properties file would be something like "A user with username {0} already exists".
The localization of the data is the area that is the most unclear to me. As I'm not sure if it will be ever required, I have so far not planned it very much. The database part sounds straightforward enough, you basically just need an additional table for the strings and a column to tell for which locale the string is. Though I'm not sure if it would be best to have a localization table for each data table (eg Product and Product_names), or could I use one table for localization strings for all the data tables. The really tricky part is how to handle the UI, as to some degree it would be required for an user to enter text for an object in multiple languages. In practice this could mean for example, that a worker in Finland would give the object a name in Finnish and English, and then a worker in another country could translate it to her own language. If any of you has done something similar, I'd be happy to hear how you did it.
I'm very grateful to everybody who can share their experiences with me.
P.S. If you happen to know any exceptionally good websites or books on the subject, I would be happy to hear of them. I have of course done some googling and read some articles about localization, but nothing mind blowing yet.
Actually, what you are talking about is Internationalization (i18n), not Localization (L10n).
From my experience, you are on the right path.
ad 1). One properties file per view and locale (not necessarily per language, as you may want different translations for the same language depending on country, e.g. different strings for British and American English, hence different locales) is the right approach. Since applications tend to evolve, it could save a good deal of money when you want to modify just one view (translators will charge you even for strings they don't touch, since they have to actually find the strings that need to be updated or newly translated). It would also be easier to use with Translation Memory tools if you do it right (always add new strings at the end of the file).
ad 2). The best idea is to send only the resource key from the server or other process; another approach could be attaching the data (e.g. a numeric value) to the resource key using delimiters, so the message can be recreated and formatted in the local language.
ad 3). I have seen several approaches to localizing databases, but the best (and it is not only my opinion, but also that of IEEE members) is to store resource keys and recreate the data on the client side using the appropriate locale. Of course this goes for pre-installed data; if you let users enter the data, other issues will arise... There is no silver bullet; one needs to think about what works best in one's context. I would lean towards including a foreign key column that identifies the language, but it really depends on the kind of data that will be stored.
Unfortunately i18n doesn't end here, please remember about correctly formatting dates and numbers so that they will be understandable for people using your program. And also, if you happen to have some list of strings, the sorting order should also depend on locale (it's called collation).
Sun (now our beloved Oracle) has a quite good i18n trail, which you can find here: http://download.oracle.com/javase/tutorial/i18n/index.html .
If you want to read a good book on the subject of i18n and L10n that will save you years of learning these topics (although it will not necessarily teach you how to program it), there is a great book from Microsoft Press: "Developing International Software" - http://www.amazon.com/Developing-International-Software-Dr/dp/0735615837 . It is still relevant, although quite old.
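To illustrate the collation point, a small sketch using java.text.Collator from the JDK. The wrapper class and method name are made up for the example.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper showing locale-aware sorting.
final class CollationSketch {

    // Returns a sorted copy of the items using the collation rules of the locale.
    static String[] sortForLocale(String[] items, Locale locale) {
        String[] copy = items.clone();
        Collator collator = Collator.getInstance(locale);
        Arrays.sort(copy, collator);
        return copy;
    }
}
```

With Locale.ENGLISH, the items "cherry", "apple", "Banana" come back as apple, Banana, cherry, whereas a plain Arrays.sort would put "Banana" first because it compares raw char values and uppercase letters sort before lowercase ones.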
1) I usually keep everything in one file and use names that signify where the properties are used. For example, I prefix with things like "view" and "menu"
view.add_request.title
view.add_request.contact_information.sectionheader
view.add_request.contact_information.first_name.label
view.add_request.contact_information.last_name.label
menu.admin.user_management.add_user.label
menu.admin.user_management.add_role.label
2) Yes, passing around the key makes things simpler and makes the server code easier to test. It also avoids having to pass locale information to the server to have it decide on a language for the client. It's a thick client, so let it handle the localization.
3) I haven't localized data before (usually just labels and static UI verbiage), but I would probably lean towards having a single table with all the localized strings and locales to start with (just to keep it simple). I'm not sure what you're asking about in reference to the UI, but I would suggest you make sure that whatever character set you're using allows all the languages you want to support. Make sure you read Joel Spolsky's article entitled: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
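A minimal sketch of the "send the key plus placeholders" idea from point 2, using the "A user with username {0} already exists" example from the question. The exception class, the key name error.user.exists and the Map standing in for a loaded properties file are all hypothetical; only MessageFormat is a real JDK API here.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Map;

final class KeyedErrorSketch {

    // Hypothetical exception the server would throw: it carries a resource key
    // and the error-specific values, but no user-visible text.
    static final class LocalizableException extends Exception {
        private final String key;
        private final Object[] args;

        LocalizableException(String key, Object... args) {
            super(key);
            this.key = key;
            this.args = args.clone();
        }

        String getKey() { return key; }
        Object[] getArgs() { return args.clone(); }
    }

    // Client side: look up the pattern for the key and fill in the placeholders.
    // ASSUMPTION: 'bundle' stands in for the client's loaded properties file.
    static String render(LocalizableException e, Map<String, String> bundle, Locale locale) {
        String pattern = bundle.get(e.getKey());
        if (pattern == null) {
            return e.getKey();  // fall back to the raw key if no translation exists
        }
        return new MessageFormat(pattern, locale).format(e.getArgs());
    }
}
```

With a bundle entry error.user.exists=A user with username {0} already exists, rendering new LocalizableException("error.user.exists", "Khilon") yields the full sentence on the client, and the server never needs to know the user's locale.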

Best practices in internationalizing text with lots of markup?

I'm working on a web project that will (hopefully) be available in several languages one day (I say "hopefully" because while we only have an English language site planned today, other products of my company are multilingual and I am hoping we are successful enough to need that too).
I understand that the best practice (I'm using Java, Spring MVC, and Velocity here) is to put all text that the user will see in external files, and refer to them in the UI files by name, such as:
#in messages_en.properties:
welcome.header = Welcome to AppName!
#in the markup
<title>#springMessage("welcome.header")</title>
But, having never had to go through this process on a project myself before, I'm curious what the best way to deal with this is when you have some segments of the UI that are heavy on markup, such as:
<p>We are excited to announce that Company1 has been acquired by
Division X,
a fast-growing division of Company 2, Inc.
(Nasdaq: BLAH), based in...
One option I can think of would be to store this "low-level" of markup in messages.properties itself for the message - but this seems like the worst possible option.
Other options that I can think of are:
Store each non-markup inner fragment in messages.properties, such as acquisitionAnnounce1, acquisitionAnnounce2, acquisitionAnnounce3. This seems very tedious though.
Break this message into more reusable components, such as Company1.name, Company2.name, Company2.ticker, etc., as each of these is likely reused in many other messages. This would probably account for 80% of the words in this particular message.
Are there any best practices for dealing with internationalizing text that is heavy with markup such as this? Do you just have to bite down and bear the pain of breaking up every piece of text? What is the best solution from any projects you've personally dealt with?
Typically if you use a template engine such as Sitemesh or Velocity you can manage these smaller HTML building blocks as subtemplates more effectively.
By doing so, you can incrementally boil the strings down to the purely internationalized ones, group them, and make them relevant to those markup subtemplates. Having done this sort of work using templates for an app which spanned multiple languages in the same locale, as well as multiple locales, we never ever placed markup in our message bundles.
I'd suggest that a key good practice is to avoid placing markup (even at a low level, as you put it) inside message properties files at all costs! The potential this has for unleashing hell is not something to be overlooked: biting the bullet and breaking things up correctly is far less of a pain than having to manage many files with scattered HTML markup. It's important that you can visualise markup as holistic chunks; scattering it everywhere would make everyday development a chore since:
You would lose IDE color highlighting and syntax validation
High possibility that one locale file or another can easily be missed when changes to designs / markup filter down
Breaking things down (to a realistic point, eg logical sentence structures but no finer) is somewhat hard work upfront but worth the effort.
Regarding string breakdown granularity, here's a sample of what we did:
comment.atom-details=Subscribe To Comments
comment.username-mandatory=You must supply your name
comment.useremail-mandatory=You must supply your email address
comment.email.notification=Dear {0}, the comment thread you are watching has been updated.
comment.feed.title=Comments on {0}
comment.feed.title.default=Comments
comment.feed.entry.title=Comment on {0} at {1,date,medium} {2,time,HH:mm} by {3}
comment.atom-details=Suscribir a Comentarios
comment.username-mandatory=Debes indicar tu nombre
comment.useremail-mandatory=Debes indicar tu direcci\u00f3n de correo electr\u00f3nico
comment.email.notification=La conversaci\u00f3n que estas viendo ha sido actualizada
comment.feed.title=Comentarios sobre {0}
comment.feed.title.default=Comentarios
comment.feed.entry.title=Comentarios sobre {0} a {1,date,medium} {2,time,HH:mm} por {3}
So you can do interesting things with how you replace strings in the message bundle, which may also help you preserve a message's logical meaning while still allowing you to manipulate it mid-sentence.
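As a sketch of how a pattern like comment.feed.entry.title above gets filled in, here is java.text.MessageFormat applied to it. The wrapper class and its parameters are hypothetical; the date and time are passed as the same Date object for arguments {1} and {2}, and the exact text of the medium date depends on the locale data of the JDK in use.

```java
import java.text.MessageFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper around the feed entry title pattern.
final class FeedTitleSketch {

    // 'pattern' is the value looked up from the message bundle for the given locale.
    static String entryTitle(String pattern, Locale locale,
                             String subject, Date when, String author) {
        MessageFormat format = new MessageFormat(pattern, locale);
        // {0} = subject, {1} = date part, {2} = time part, {3} = author
        return format.format(new Object[] { subject, when, when, author });
    }
}
```

Because the whole sentence stays in one pattern, the Spanish bundle can reorder or reword around the placeholders without any change to the calling code.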
As others have said, please never split the strings into segments. You will cause translators grief as they have to coerce their language syntax to the ad-hoc rules you inadvertently create. Often it will not be possible to provide a grammatically correct translation, especially if you reuse certain segments in different contexts.
Do not remove the markup, either.
Please do not assume professional translators work in Notepad :) Computer-aided translation (CAT) tools, such as the Trados suite, know about markup perfectly well. If the tagging is HTML, rather than some custom XML format, no special preparation is required. Trados will protect the tags from accidental modification, while still allowing changes where necessary. Note that certain elements of tags often need to be localized, e.g. alt text or some query strings, so just stripping all the markup won't do.
Best of all, unless you're working on a zero-budget personal project, consider contacting a localization vendor. Localization is a service just like web design. A competent vendor will help you pick the optimal solution/format for your project and guide you through the preparation of the source material and incorporating the localized result. And of course they and their translators will have all the necessary tools. (Full disclosure: I am a translator / localization specialist. And don't split up strings :)
First off, don't split up your strings. This makes it much harder for localizers to translate text because they can't see the entire string to translate.
I would probably try to use placeholders around the links:
Division X
That's how I did it when I was localizing a site into 30 languages. It's not perfect, but it works.
I don't think it's possible (or easy) to remove all markup from strings, you need to have a way to insert the urls and any extra markup.
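A hypothetical illustration of placeholders around a link (the pattern, class name and URL below are invented for this sketch, not recovered from the answer): the translator sees the whole sentence with {0} and {1} marking where the link opens and closes, and the code supplies the tags.

```java
import java.text.MessageFormat;

// Hypothetical helper; names and pattern are assumptions for illustration.
final class LinkPlaceholderSketch {

    // 'translated' is a full sentence from the bundle, e.g.
    // "acquired by {0}Division X{1}, a fast-growing division"
    static String withLink(String translated, String url) {
        // ASSUMPTION: 'url' is trusted; a real application would escape it for HTML.
        String open = "<a href=\"" + url + "\">";
        return MessageFormat.format(translated, open, "</a>");
    }
}
```

One thing to watch with MessageFormat patterns is that a literal apostrophe in the translated text has to be doubled, otherwise it is treated as a quoting character.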
You should avoid breaking up your strings. Not only does this become a nightmare to translate, but it also makes grammatical assumptions which may not be correct in the target language.
While placeholders can be helpful for many things, I would not recommend using placeholders for URLs. Keeping the URL inside the translatable string allows you to customize it for different locales. After all, there is no sense sending users to an English-language page when their locale is Argentine Spanish!
