How do you handle internationalization for "Your input 'xyz' is excellent!"

I would like to know the right way to handle internationalization for statements that have runtime data embedded in them. For example:
1) Your input "xyz" is excellent!
2) You were "4 years" old when you switched from "Barney and Friends" to "Spongebob" shows.
The double-quoted values are user data obtained or calculated at run time. My platforms are primarily Java/Android. A solution that works well for Western languages is preferred over a weaker universal one.

Java
To retrieve localized text (messages), use the java.util.ResourceBundle API. To format messages, use java.text.MessageFormat API.
Basically, first create a properties file like so:
key1 = Your input {0} is excellent!
key2 = You were {0} old when you switched from {1} to {2} shows.
The {n} things are placeholders for the arguments that you pass in via MessageFormat#format().
Then load it like so:
ResourceBundle bundle = ResourceBundle.getBundle("filename", Locale.ENGLISH);
Then to get the messages by key, do:
String key1 = bundle.getString("key1");
String key2 = bundle.getString("key2");
Then to format it, do:
String formattedKey1 = MessageFormat.format(key1, "xyz");
String formattedKey2 = MessageFormat.format(key2, "4 years", "Barney and Friends", "Spongebob");
See also:
Trail: internationalization
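The steps above can be put together as a minimal, self-contained sketch. To keep it runnable without a file on the classpath, the properties content is loaded from a string; the class name MessageDemo and the keys input.praise and show.switch are made up for illustration. One detail worth knowing: MessageFormat treats a single quote as an escape character, so a literal apostrophe in a pattern has to be written as two single quotes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class MessageDemo {
    // Stand-in for a .properties file (hypothetical keys and content).
    // Note the doubled single quotes: MessageFormat prints '' as one '.
    static final String PROPS =
        "input.praise = Your input ''{0}'' is excellent!\n"
      + "show.switch = You were {0} old when you switched from {1} to {2} shows.\n";

    static String render(String key, Object... args) {
        try {
            ResourceBundle bundle = new PropertyResourceBundle(new StringReader(PROPS));
            // Pass the locale explicitly so number/date arguments format predictably.
            return new MessageFormat(bundle.getString(key), Locale.ENGLISH).format(args);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] unused) {
        // prints: Your input 'xyz' is excellent!
        System.out.println(render("input.praise", "xyz"));
        System.out.println(render("show.switch", "4 years", "Barney and Friends", "Spongebob"));
    }
}
```

In a real application the bundle would normally come from ResourceBundle.getBundle(...) as shown earlier, with one properties file per language.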
Android
With regards to Android, the process is easier. You just have to put all those String messages in the res/values/strings.xml file. Then, you can create a version of that file in different languages, and place the file in a values folder that contains the language code. For instance, if you want to add Spanish support, you just have to create a folder called res/values-es/ and put the Spanish version of your strings.xml there. Android will automatically decide which file to use depending on the configuration of the handset.
See also:
Developer guide - Localization

One non-technical consideration. Embedding free data inside English phrases isn't going to look very smooth in many cultures (including Western ones), where you need grammatical agreement on e.g. number, gender or case. A more telegraphic style usually helps (e.g. Excellent input: "xyz") -- then at least everybody gets the same level of clunkiness!

I think one will probably have to define one's format string to include a "1-of-N" feature, preferably defined so as to make common cases easy (e.g. plurals). For example, define {0#string1/string2/string3} to output string1 if parameter 0 is zero or less, string2 if it's precisely 1, and string3 if it's greater than 1. Then one could say "You have {0} {0#knives/knife/knives} in the drawer."
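For what it's worth, java.text.MessageFormat already has something close to this "1-of-N" idea through its choice sub-format. The sketch below is only an illustration (the class name PluralDemo is made up), and it covers simple English-style plurals only; languages with more plural categories need more than a choice on the number.

```java
import java.text.MessageFormat;
import java.util.Locale;

class PluralDemo {
    // 0#knives : used for values below 1 (including 0 and negatives)
    // 1#knife  : used for exactly 1
    // 1<knives : used for values greater than 1
    static final String PATTERN =
        "You have {0} {0,choice,0#knives|1#knife|1<knives} in the drawer.";

    static String render(int count) {
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(new Object[] { count });
    }

    public static void main(String[] unused) {
        System.out.println(render(0)); // You have 0 knives in the drawer.
        System.out.println(render(1)); // You have 1 knife in the drawer.
        System.out.println(render(3)); // You have 3 knives in the drawer.
    }
}
```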

Related

What is the best way to represent Middle Earth using Java's Locale framework?

I'm in the process of putting together an android app which will have a Quenya translation out of the box (an Elvish dialect).
If we would like to maintain the maximum compliance with the ISO standards while representing this fantasy world, how should we go about it?
Also, if there is a standard for representing Middle Earth that has already been agreed on by the community, what is it?
Perhaps we would:
require more letters for the language or country codes (like "TME" for "Tolkien's Middle Earth" or "MEGN" for "Middle Earth, Gondor")

I am not sure whether there is a community-agreed standard for such a country code yet.
However, your suggestion of "more letters for the language or country codes" is surely a bad idea. The ISO standards already define how many characters a country or language code can have. For example, an ISO 3166-1 alpha-3 country code is 3 characters long, while an ISO 3166-1 alpha-2 code is 2 characters long.
I think your best bet is to choose a code that is not used by any country, or to pick one of the deleted codes, so that presumably no one is using it or is going to use it. (For example, MID is a deleted code that looks like a good fit for Middle Earth. grin)

Quenya has already been registered in ISO 639-3 with the code "qya": http://www-01.sil.org/iso639-3/documentation.asp?id=qya

As Kevin Gruber pointed out Quenya is registered. There are a few reserved user-assignable codes available in the 2 character space, these might be acceptable as well, to keep from colliding with valid country codes.
https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#ZZ
The following alpha-2 codes can be user-assigned: AA, QM to QZ, XA to
XZ, and ZZ.
ZZ is unofficially used to represent "Unknown or Invalid Territory", which is probably the best fit in this case.
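Combining the two suggestions above (language code qya, user-assignable region ZZ), a Locale can be built as in the sketch below. This only shows that java.util.Locale accepts the combination as well-formed; the class name QuenyaLocale is made up, and how Android resource folders should be named for a three-letter language code is not covered here.

```java
import java.util.Locale;

class QuenyaLocale {
    // Locale.Builder validates that the subtags are well-formed:
    // 2-8 letters for the language, 2 letters (or 3 digits) for the region.
    static Locale quenya() {
        return new Locale.Builder().setLanguage("qya").setRegion("ZZ").build();
    }

    public static void main(String[] unused) {
        Locale l = quenya();
        System.out.println(l.toLanguageTag()); // qya-ZZ
        System.out.println(l.getLanguage());   // qya
        System.out.println(l.getCountry());    // ZZ
    }
}
```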

Configure Play framework messages to not format numbers

In the Play Framework, you can define files that have a key=value list of text.
For instance
myerror.number=The number {0} is not a double
Which I can then get using: Messages.get("myerror.number", 5);
And it will say:
The number 5 is not a double
When going above 1000 though, the output will format it with a group separator, like this: "1,000"
How can I correct the settings to not do this? No separators at all for whole numbers?

According to the documentation, Play uses the MessageFormat class for formatting, and MessageFormat generates its output based on the provided Locale object. Hence the thousands separator appears in your example.
The simplest solution is to pass the string value instead of the number.
Messages.get("myerror.number", String.valueOf(5));
Play doesn't provide any global configuration for that purpose.
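Since the formatting is done by MessageFormat, another option that may work is to put a number sub-format into the message itself, e.g. {0,number,#}. The sketch below demonstrates this with plain java.text.MessageFormat; whether Play hands the pattern through unchanged is an assumption you should verify in your version, and the class name NoGroupingDemo is made up. Note that the pattern # also drops any fraction digits.

```java
import java.text.MessageFormat;
import java.util.Locale;

class NoGroupingDemo {
    static String render(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.US).format(args);
    }

    public static void main(String[] unused) {
        // Default number formatting uses the locale's grouping separator.
        System.out.println(render("The number {0} is not a double", 1000));
        // prints: The number 1,000 is not a double

        // An explicit DecimalFormat-style sub-pattern without grouping.
        System.out.println(render("The number {0,number,#} is not a double", 1000));
        // prints: The number 1000 is not a double
    }
}
```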

How do you convert a java String to a mailing address object?

As input I am getting an address as a String. It may say something like "123 Fake Street\nLos Angeles, CA 99988". How can I convert this into an object with fields like this:
Address1
Address2
City
State
Zip Code
Or something similar to this? If there is a java library that can do this, all the better.
Unfortunately, I don't have a choice about the String as input. It's part of a specification I'm trying to implement.
The input is not going to be very well structured so the code will need to be very fault tolerant. Also, the addresses could be from all over the world, but 99 out of 100 are probably in the US.

You can use JGeocoder
public static void main(String[] args) {
Map<AddressComponent, String> parsedAddr = AddressParser.parseAddress("Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043");
System.out.println(parsedAddr);
Map<AddressComponent, String> normalizedAddr = AddressStandardizer.normalizeParsedAddress(parsedAddr);
System.out.println(normalizedAddr);
}
Output will be:
{street=Amphitheatre, city=Mountain View, number=1600, zip=94043, state=CA, name=Google Inc, type=Parkway}
{street=AMPHITHEATRE, city=MOUNTAIN VIEW, number=1600, zip=94043, state=CA, name=GOOGLE INC, type=PKWY}
There is another library, International Address Parser; you can check out its trial version. It supports countries as well.
AddressParser addressParser = AddressParser.getInstance();
AddressStandardizer standardizer = AddressStandardizer.getInstance();//if enabled
AddressFormater formater = AddressFormater.getInstance();
String rawAddress = "101 Avenue des Champs-Elysées 75008 Paris";
//you can try to detect the country
CountryDetector detector = CountryDetector.getInstance();
String countryCode = detector.getCountryCode("7580 Commerce Center Dr ALABAMA");
System.out.println("detected country=" + countryCode);
Also, please check Implemented Countries in this library.
Cheers !!

I work at SmartyStreets where we develop address parsing and extraction algorithms.
It's hard.
If most of your addresses are in the US, you can use an address verification service to provide guaranteed accurate parse results (since the addresses are checked against a master list).
There are several providers out there, so take a look around and find one that suits you. Since you probably won't be able to install the database locally (not without a big fee, because address data is licensed by the USPS), look for one that offers a REST endpoint so you can just make an HTTP request. Since it sounds like you have a lot of addresses, make sure the API is high-performing and lets you do batch requests.
For example, with ours:
Input:
13001 Point Richmond Dr NW, Gig Harbor WA
Output:
Or the more specific breakdown of components, if needed:
If the input is even messier, there are a few address extraction services available that can handle a little bit of noise within an address and parse addresses out of text and turn them into their components. (SmartyStreets offers this also, as a beta API. I believe some other NLP services do similar things too.)
Granted, this only works for US addresses. I'm not as expert on UK or Canadian addresses, but I believe they may be slightly simpler in general.
(Beyond a small handful of well-developed countries, international data is really hit-and-miss. Reliable data sets are hard to obtain or don't exist. But if you're on a really tight budget you could write your own parser for all the address formats.)

If you are sure on the format, you can use regular expressions to get the address out of the string. For the example you provided something like this:
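// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String address = "123 Fake Street\nLos Angeles, CA 99988";
Matcher m = Pattern.compile("(.*)\\n(.*), ([A-Z]{2}) ([0-9]{5})").matcher(address);
if (m.matches()) { String street = m.group(1), city = m.group(2), state = m.group(3), zip = m.group(4); }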

I assume the sequence of information is always the same, as in the user will never enter the postal code before the state. If I understood your question correctly, you need logic to process an address that may be incomplete (i.e. missing a portion).
One way to do it is to look for portions of the string you know are correct. You can treat the known parts of the address as separators. You will need city and state names and address words (such as "Street", "Avenue", "Road" etc.) in an array.
1) Perform an index-of search with the cities, states and address words (and store the results).
2) Substring and cut out the 1st line of the address (from the start to the index of the address-signifying word plus its length).
3) Check the index of the city name (found in step 1). If it's -1, skip this step. If it's 0, take it out (0 also means address line 2 is not in the string). If it's more than 0, substring and cut out everything from the start of the string to the index of the city name as the 2nd line of the address.
4) Check the index of the state name. Once again, if it's -1, skip this step. If it's 0, substring and cut it out as the state name.
5) Whatever remains is your postal code.
6) Check the strings you just extracted for leftover separators (commas, dots, new lines etc.) and remove them.
If the address is missing both state and city you would actually need a list of zip codes too, so better ensure the user enters at least one of them.
It's not impossible to implement what you need, but you probably don't want to waste all that time doing it. It's easier to just ensure user enters everything correctly.
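The keyword-based steps described in this answer could look roughly like the sketch below. Everything in it is illustrative: the class name KeywordAddressSplitter, the field names and the tiny lookup lists are made up, and a real implementation would need far larger lists and better matching (for example, the state check here is a plain prefix test).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeywordAddressSplitter {
    // Hypothetical lookup lists; real data would be much larger.
    static final List<String> STREET_WORDS = List.of("Street", "Avenue", "Road");
    static final List<String> CITIES = List.of("Los Angeles", "Mountain View");
    static final List<String> STATES = List.of("CA", "WA");

    static Map<String, String> split(String raw) {
        Map<String, String> out = new LinkedHashMap<>();
        String rest = clean(raw.replaceAll("\\s+", " "));
        // Line 1 ends right after the first street word from the list that occurs.
        for (String word : STREET_WORDS) {
            int i = rest.indexOf(word);
            if (i >= 0) {
                out.put("address1", clean(rest.substring(0, i + word.length())));
                rest = clean(rest.substring(i + word.length()));
                break;
            }
        }
        // Anything before the city name is treated as address line 2.
        for (String city : CITIES) {
            int i = rest.indexOf(city);
            if (i >= 0) {
                if (i > 0) out.put("address2", clean(rest.substring(0, i)));
                out.put("city", city);
                rest = clean(rest.substring(i + city.length()));
                break;
            }
        }
        // The state is only accepted at the very start of what is left.
        for (String state : STATES) {
            if (rest.startsWith(state)) {
                out.put("state", state);
                rest = clean(rest.substring(state.length()));
                break;
            }
        }
        // Whatever remains is taken to be the postal code.
        if (!rest.isEmpty()) out.put("zip", rest);
        return out;
    }

    // Strips leftover separators (whitespace, commas, dots) from both ends.
    static String clean(String s) {
        return s.replaceAll("^[\\s,.]+|[\\s,.]+$", "");
    }

    public static void main(String[] unused) {
        System.out.println(split("123 Fake Street\nLos Angeles, CA 99988"));
        // prints: {address1=123 Fake Street, city=Los Angeles, state=CA, zip=99988}
    }
}
```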

Maybe you can use Regular Expression

Localizing a string containing list of names

I have string containing a list of name like below:
"John asked Kim, Kelly, Lee and Bob about the new year plans". The number of names in the list can vary.
How can I localize this in Java?
I am thinking about ResourceBundle and MessageFormat. How will I write the pattern for this in MessageFormat?
Is there any better approach?

Localizing an (inline) list is more than just translating the word “and.” CLDR deals with the issue of formatting lists; check out their page on lists. I’m afraid ICU doesn’t have support for this yet, so you might need to code it separately.
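If you do code it separately, one possible shape is a small joiner driven by per-language patterns, loosely modeled on the CLDR list patterns (a pattern for exactly two items, plus start, middle and end patterns for longer lists). The sketch below is hand-rolled and only illustrative: the class name ListJoiner and the English patterns are assumptions, and in practice the patterns would live in your resource bundles next to the other messages.

```java
import java.text.MessageFormat;
import java.util.List;
import java.util.Map;

class ListJoiner {
    // Hypothetical English patterns; translators would supply one set per language.
    static final Map<String, String> EN = Map.of(
        "two", "{0} and {1}",
        "start", "{0}, {1}",
        "middle", "{0}, {1}",
        "end", "{0} and {1}");

    static String join(List<String> items, Map<String, String> patterns) {
        int n = items.size();
        if (n == 0) return "";
        if (n == 1) return items.get(0);
        if (n == 2) return MessageFormat.format(patterns.get("two"), items.get(0), items.get(1));
        // Build from the right: "end" joins the last two items, "middle" prepends
        // the inner items one by one, and "start" attaches the first item.
        String result = MessageFormat.format(patterns.get("end"), items.get(n - 2), items.get(n - 1));
        for (int i = n - 3; i >= 1; i--) {
            result = MessageFormat.format(patterns.get("middle"), items.get(i), result);
        }
        return MessageFormat.format(patterns.get("start"), items.get(0), result);
    }

    public static void main(String[] unused) {
        String names = join(List.of("Kim", "Kelly", "Lee", "Bob"), EN);
        System.out.println(MessageFormat.format("{0} asked {1} about the new year plans", "John", names));
        // prints: John asked Kim, Kelly, Lee and Bob about the new year plans
    }
}
```

This handles only the punctuation and conjunction; it does nothing about the case, gender or transliteration issues.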
Another issue is that you cannot expect to be able to use names as such in sentences like this. Many languages require the object to be in an inclined form, for example. In Finnish, your sample sentence would read as “John kysyi Kimiltä, Kellyltä, Leeltä ja Bobilta uudenvuoden suunnitelmista.” So you may need to find out and include different inclined forms of the names. Moreover, if the language used does not have Latin alphabet, you may need transliterated forms of the names (e.g., in Arabic, John is جون). There are other problems as well. In Russian, the verb corresponding to “asked” depends on the gender of the subject (e.g., спросила vs. спросил).
I know this sounds complex, but localization is often complex. If you target a limited set of languages only, things can be much easier, so it is important to define your goals, perhaps accepting some simplifications that may result in grammatically incorrect expressions. But for localization that is to cover a wide range of languages, you may need to make the generating function localized. That is, you would have, for each language, a function that accepts a list of names as arguments and returns a string representing the statement, possibly using resource files containing information (transliterated form, different inclined form, gender) about proper names that may appear.
In some situations, you might even consider generating the sentence in English, then sending it to an online translator. For example, Google Translator can deal with some of the issues that I mentioned. It surely produces wrong translations a lot, but for sentences with grammatically very simple structure, it might be a pragmatic solution, if you can accept some amount of errors. If you consider trying this, make sure you test sufficiently how the automatic translator can handle the specific sentences you will use. Quite often you can improve the results by reformulating the sentences. Dividing a sentence with several clauses into separate sentences often helps. But even your simple sentence causes problems in automatic translation.
You might avoid some complications if you can reformulate the sentence structure, e.g. so that all the nouns appear in the subject position and you avoid “packed” expressions like “new year plans.” For example, “John asked what plans Kim, Kelly, Lee, and Bob have for the new year” would be simpler, both for automatic translation and for pattern-based localization.

You could do something like:
"{0} asked {1} about the new year plans"
where 0 is the first name and 1 is a comma-separated list of the other names.
Hope this helps.

I see an answer was already accepted, I'm just adding this here as an alternative. The code has hard coded values for the data, but is only meant to present an idea that can be refined:
MessageFormat people = new MessageFormat("{0} asked {1,choice,0#no one|1#{2}|2#{2} and {3}|2<{2}, and {3}} about the new year plans");
String john = "John";
Object[][] parties = new Object[][] { {john, 0}, {john, 1, "Kim"}, {john, 2, "Kim", "Kelly"}, {john, 4, "Kim, Kelly, Lee", "Bob"}};
for (final Object[] strings : parties) {
System.out.println(people.format(strings));
}
This outputs the following:
John asked no one about the new year plans
John asked Kim about the new year plans
John asked Kim and Kelly about the new year plans
John asked Kim, Kelly, Lee, and Bob about the new year plans
Determining the number of names that is used for the 2nd argument and creating the comma-delimited string for the 3rd argument isn't displayed in that sample, but can easily be done instead of using the hard coded values I used.

For localization, the normal approach is to use external language packs: files that contain the text you are going to display, with a name/key assigned to each text. The program then loads the text by its key.

You could combine your ResourceBundle (for I18N) with a MessageFormat (to replace placeholders with the names) : "{0} asked {1} about the new year plans"
It would be up to you to prepare the names beforehand, though.

Parse BigDecimal from String containing a number in arbitrary format

We read data from XLS cells formatted as text.
The cell hopefully contains a number, output will be a BigDecimal (because of arbitrary precision).
Problem is, the cell format is also arbitrary, which means it may contain numbers like:
with currency symbols ($1000)
leading and trailing whitespaces, or whitespaces in between digits (eg. 1 000 )
digit grouping symbols (eg. 1,000.0)
of course, negative numbers
'o's and 'O's as zeros (eg. 1,ooo.oo)
others I can't think of
It's mostly because of this last point that I'm looking for a standard library that can do all this, and which is configurable, well tested etc.
I looked at Apache first, found nothing but I might be blind... perhaps it's a trivial answer for someone else...
UPDATE: the domain of the question is financial applications. Actually I'm expecting a library where the domain could be an input parameter - financial, scientific, etc. Maybe even more specific: financial with currency symbols? With stock symbols? With distances and other measurement units? I can't believe I'm the first person to think of something like this...

I don't know of any library, but you can try this:
1) Put your number in a string (e.g. $1,00o,oOO.00).
2) Remove all occurrences of $, whitespace, and any other strange symbols you can think of.
3) Replace occurrences of o and O with 0.
4) Try to parse the number =]
That should handle 99% of the entries.
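A minimal sketch of these clean-up steps follows. It assumes US-style input, where "." is the decimal separator and "," is only used for grouping; the class name LenientNumberParser is made up. It replaces the letter o first and then strips every character that is not a digit, a dot or a minus sign, so it does not handle conventions such as negative amounts written in parentheses.

```java
import java.math.BigDecimal;

class LenientNumberParser {
    static BigDecimal parse(String raw) {
        // Treat o/O as zeros first, so they survive the stripping step below.
        String s = raw.replace('o', '0').replace('O', '0');
        // Drop currency symbols, whitespace and grouping commas:
        // keep only digits, the decimal point and the minus sign.
        s = s.replaceAll("[^0-9.\\-]", "");
        // Throws NumberFormatException if what is left is still not a number.
        return new BigDecimal(s);
    }

    public static void main(String[] unused) {
        System.out.println(parse("$1,00o,oOO.00")); // 1000000.00
        System.out.println(parse(" 1 000 "));       // 1000
        System.out.println(parse("-$1,ooo.oo"));    // -1000.00
    }
}
```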

Buy bunch photos or even better videos with legal adult content. Create a web site with these resources but limit the access with captcha which will be displaying unsolved number formats. Create a set of number decoders out of known number formats and create an algorithm which will add new ones based on user solved captchas.

I think this is what I've been looking for:
http://site.icu-project.org/
Very powerful library, although at the moment it's not clear whether it can only format or all the formatted stuff can be parsed back as well.
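On the question of parsing formatted numbers back: the sketch below uses the JDK's own java.text.DecimalFormat rather than ICU (a deliberate swap to keep the example dependency-free), which can parse locale-formatted text straight into a BigDecimal. The class name LocaleNumberParser and the pattern are assumptions for illustration; ICU's API is not shown here.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

class LocaleNumberParser {
    static BigDecimal parse(String text, Locale locale) {
        // Grouping and decimal symbols are taken from the given locale.
        DecimalFormat df = new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(locale));
        df.setParseBigDecimal(true); // return BigDecimal instead of Long/Double
        ParsePosition pos = new ParsePosition(0);
        Number n = df.parse(text, pos);
        // Reject input that failed to parse or has trailing garbage.
        if (!(n instanceof BigDecimal) || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException("Not a number: " + text);
        }
        return (BigDecimal) n;
    }

    public static void main(String[] unused) {
        System.out.println(parse("1,000.5", Locale.US));
        System.out.println(parse("1.000,5", Locale.GERMANY));
    }
}
```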
