How to extend the list of available Java Locales

I am looking for a way to add more Locales to the Locales available in Java 1.6. But the Locales I want to create do not have ISO-3166 country codes, nor ISO-639 language codes. Is there any way to do this anyways? The Locales I want to add only differ in the language names, but the smaller an ethnic group is, the more picky they get about their identity ;-)
So I thought about extending an existing Locale, something like
class UserDefinedLocale extends Locale {
    UserDefinedLocale(Locale parentLocale) {...}
}
but java.util.Locale is final, which makes it especially hard to hack something around...
So, is the idea that the list of Java Locales is exhaustive? Am I the first to miss some more Locales?

Read the Javadoc for java.util.Locale.
It says :
"Create a Locale object using the constructors in this class: "
It also says :
"Because a Locale object is just an identifier for a region, no validity check is performed when you construct a Locale"
It also says :
"A Locale is the mechanism for identifying the kind of object (NumberFormat) that you would like to get. The locale is just a mechanism for identifying objects, not a container for the objects themselves"
And finally, the javadoc for the getAvailableLocales() method says :
"The returned array represents the union of locales supported by the Java runtime environment and by installed LocaleServiceProvider implementations"
So you just have to invent a language code which is not in the standard list, and use it as an identifier for your locale.
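As a minimal sketch of that suggestion, the following builds a Locale from an invented language code. The codes "xx" and "YY" are made up for illustration (they are assumptions, not registered ISO codes), and the constructor accepts them without any validity check.

```java
import java.util.Locale;

class InventedLocaleDemo {
    // "xx" and "YY" are hypothetical codes chosen for this sketch.
    static Locale inventedLocale() {
        return new Locale("xx", "YY");
    }

    public static void main(String[] args) {
        Locale custom = inventedLocale();
        System.out.println(custom.getLanguage()); // xx
        System.out.println(custom.getCountry());  // YY
        System.out.println(custom);               // xx_YY
    }
}
```

Such a Locale is only an identifier. Formatters will fall back to default data for it unless your own resource bundles or a LocaleServiceProvider supply something for that identifier.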

See this answer:
...You can plug in support for additional locales via the SPIs (described here). For example, to provide a date formatter for a new locale, you would do it by implementing a DateFormatProvider service. You might be able to do this by decorating an existing implementation - I'd have a look at the ICU4J library to see if it provides the support you want.

Related

Java currency display name internationalization

Is there any reason why certain display names are shown in English even though the locale is not English (i.e., they are not translated)?
For example:
Locale: "ru" is not translated
Locale locale = new Locale("ru");
Currency curr = Currency.getInstance("USD");
System.out.println(curr.getDisplayName(locale));
// US Dollar
Locale: "es" is translated
Locale locale = new Locale("es");
Currency curr = Currency.getInstance("USD");
System.out.println(curr.getDisplayName(locale));
// dólar estadounidense
Is this intentional?
Or has Java not gotten around to translating it?
Or am I doing something incorrectly?
I tried to locate the files where these translations are stored but couldn't find them. If someone could point me to that resource as well that would be helpful.
Thanks.

Locale Service Providers
Java uses an extensible mechanism to provide data (such as strings,
formatters, etc) that is localized.
Classes can extend LocaleServiceProvider to act as factories of
locale-sensitive data. A lot of classes in java.util and java.text
rely on these providers to work properly with different Locales by
delegating the creation of objects to them.
You can find examples of locale service providers in the java.util.spi package; they are typically used to display text or numbers in a Locale-dependent way. The package includes CurrencyNameProvider, which Currency uses when you call Currency#getDisplayName.
Finding implementations
Classes wanting to use a specific LocaleServiceProvider (such as CurrencyNameProvider) use LocaleServiceProviderPool to find the instances of the provider that support a specific Locale.
LocaleServiceProviderPool first tries to use a default implementation included in the JRE. If none is found, it relies on Java's simple Service Provider Interface (SPI) mechanism and uses a ServiceLoader [1] to try to find implementations provided by third-party libraries.
This is what is written in the tutorial about Locale:
These methods first check whether the Java runtime environment supports the requested locale; if so, they use that support. Otherwise, the methods call the getAvailableLocales() methods of installed providers for the appropriate interface to find a provider that supports the requested locale.
The default implementations of the providers that come with the JRE can be found in the package sun.util.locale.provider. It's complex, but it essentially gets the data from localedata.jar. In the Oracle JDK, it is located in java_home/jre/lib/ext/localedata.jar. If you list the files in this jar and compare the files in sun.util.resources.es with those in sun.util.resources.ru, you will see that there are many more currency names defined for Spanish than for Russian.
Here are the files for OpenJDK: Russian vs Spanish.
What if it's not defined at all
Locales are organized in a hierarchy. For example, a specific region of a country can have a Locale which reflects some local differences from the Locale of the country. If data is not found for a Locale, LocaleServiceProviderPool will try to use the Locale's parent.
The root of the tree of Locales is basically a "fallback" fictional Locale which provides default values for all localized data.
That's probably what is happening when you ask for the display name of USD in Russian.
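The parent chain described above can be illustrated with ResourceBundle.Control, which computes the same kind of fallback list for resource bundles. Treating it as a stand-in for what LocaleServiceProviderPool does internally is an assumption of this sketch, and the es_MX locale is just an example.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class LocaleParentChain {
    // Returns the locale followed by its parents, ending with the root locale.
    static List<Locale> parentChain(Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        // The default implementation ignores the base name, so "" is enough here.
        return control.getCandidateLocales("", locale);
    }

    public static void main(String[] args) {
        for (Locale l : parentChain(new Locale("es", "MX"))) {
            // Locale.ROOT has an empty string representation.
            System.out.println(l.toString().isEmpty() ? "(root)" : l.toString());
        }
        // prints es_MX, then es, then (root), one per line
    }
}
```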
Extensibility
Any program can provide additional Locale information. It needs to define a service provider [1] by creating the metadata file and implementing a CurrencyNameProvider. That way you can fill in the missing localized data in your own jar.
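A minimal sketch of such a provider could look like the following. The class name RussianCurrencyNameProvider and the single USD entry are hypothetical; a real provider would cover far more currencies.

```java
import java.util.Locale;
import java.util.spi.CurrencyNameProvider;

// Hypothetical provider. When packaged for ServiceLoader it should be a public
// class in its own file with a public no-argument constructor.
class RussianCurrencyNameProvider extends CurrencyNameProvider {
    static final Locale RUSSIAN = new Locale("ru");
    static final String USD_NAME = "доллар США";

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { RUSSIAN };
    }

    @Override
    public String getSymbol(String currencyCode, Locale locale) {
        return null; // no localized symbol here, so the caller uses its default
    }

    @Override
    public String getDisplayName(String currencyCode, Locale locale) {
        if ("USD".equals(currencyCode) && RUSSIAN.getLanguage().equals(locale.getLanguage())) {
            return USD_NAME;
        }
        return null; // unknown currency, so the caller uses its default
    }
}
```

The metadata file would then be META-INF/services/java.util.spi.CurrencyNameProvider, containing the fully qualified name of this class. Depending on the Java version, the jar may need to be installed as an extension rather than just placed on the classpath; check the LocaleServiceProvider Javadoc for your runtime.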
Conclusion
Or has Java not gotten around to translating it?
That's pretty much the case.
Or am I doing something incorrectly?
No, you can either rely on the defaults or provide the localized data yourself.
[1] The ServiceLoader finds them by asking the classloader to load a resource named after the provider type, such as META-INF/services/java.text.spi.DateFormatProvider. If such a file is found, it should contain the fully qualified class name of the implementation. The ServiceLoader then tries to create an instance of that class through the classloader.

BreakIterator API Java

The documentation for BreakIterator.getWordInstance() offers an overload that takes a Locale parameter, presumably because the results of the factory methods (getWordInstance, getLineInstance, getSentenceInstance, getCharacterInstance) can vary between locales.
But, when I do not use this parameter, I still get the same results as I get when calling it with any Locale in getAvailableLocales().
Is there some pattern, String, or Locale which actually causes these methods to give different results?

I believe all "western" languages have the same rules.
A cursory scan shows that locale th (Thai) has its own rules, given in the file /sun/text/resources/th/WordBreakIteratorData_th inside .../jre/lib/ext/localedata.jar.
It's a binary file, so I don't know what it says, and even if I could understand the file, not knowing Thai, I still wouldn't understand it.
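One way to check this yourself is to print the word boundaries that different locales produce for the same text. The sketch below assumes a short Thai sample string, and whether the Thai and root results actually differ depends on the locale data shipped with your JRE, so no output is claimed for those two lines.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBoundaries {
    // Collects every boundary offset the word iterator reports for the text.
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<Integer>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        String thai = "สวัสดีครับ"; // sample Thai text without spaces
        System.out.println(boundaries(thai, Locale.ROOT));
        System.out.println(boundaries(thai, new Locale("th")));
        System.out.println(boundaries("Hello world", Locale.ENGLISH)); // [0, 5, 6, 11]
    }
}
```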

Purpose of String.toLowerCase() with default locale?

Java has two overloads each for String.toLowerCase and toUpperCase. One of the overloads takes a Locale as a parameter while the other one takes no parameters and uses the default locale (Locale.getDefault()).
The parameterless variants might not work as expected because case conversion respects internationalization, and the default locale is system dependent. Most notably, the lower case i is converted to an upper case dotted İ in the Turkish locale.
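The Turkish case can be reproduced with a short sketch. It prints code points instead of the characters themselves so that the result does not depend on the console encoding; the helper name describe is made up for this example.

```java
import java.util.Locale;

class TurkishCaseDemo {
    static final Locale TURKISH = new Locale("tr", "TR");

    // Renders each char of the string as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format(Locale.ROOT, "U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("i".toUpperCase(TURKISH)));     // U+0130 (dotted capital I)
        System.out.println(describe("i".toUpperCase(Locale.ROOT))); // U+0049 (plain I)
        System.out.println(describe("I".toLowerCase(TURKISH)));     // U+0131 (dotless small i)
        System.out.println(describe("I".toLowerCase(Locale.ROOT))); // U+0069 (plain i)
    }
}
```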
What is the purpose of these methods? Do the parameterless variants have any legitimate use? Or perhaps they were just a design mistake? (Not unlike several I/O APIs that use the system default character encoding by default.)

I think they're just convenience methods that will work most of the time, since apps that really need i18n are probably a small minority among all the Java apps in the world.
If you hardcode a Unix path for a file name in a Java program and try to run it on a Windows box, you will also get wrong results, and that's not Java's fault.

I guess that's an implementation of the "write once, run anywhere" principle.
It makes sense, because you can provide the default locale at JVM startup as one of the runtime parameters (for example, the user.language and user.country system properties).
Furthermore, the Java runtime has a bunch of similar formatting classes for dates and numbers (SimpleDateFormat, NumberFormat, etc.).

Several blog posts suggest that default locales and charsets indeed were a design mistake and have no meaningful use.

Possible to dynamically customize locale in Java?

I have a situation in which a client, for obscure reasons, wants a specific locale to be in place, with one modification: month names that the locale writes in lower case should be shown in upper case (which is not a standard variant of the locale in question). I already have SimpleDateFormat code in place referencing an instance of Locale.
My question is whether it is possible to dynamically construct an instance of Locale based on a designated country code, but with specifically given modifications. Alternatively, is it possible to build a locale instance from scratch, specifying all details at runtime, such that a SimpleDateFormat referencing it would change its casing of months accordingly?
Thanks in advance.

The Javadoc for LocaleServiceProvider should get you started.
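If a full provider is more than you need, a simpler route (a different technique from the LocaleServiceProvider one above, named here plainly as an alternative) is to hand SimpleDateFormat a modified DateFormatSymbols. The sketch below assumes French as the example locale; the class and method names are hypothetical.

```java
import java.text.DateFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class UpperCaseMonths {
    // Builds a formatter whose month names are the locale's names in upper case.
    // Note: this constructor takes everything except the symbols (calendar,
    // number format) from the default locale.
    static SimpleDateFormat upperCaseMonthFormat(String pattern, Locale locale) {
        DateFormatSymbols symbols = DateFormatSymbols.getInstance(locale);
        symbols.setMonths(upper(symbols.getMonths(), locale));
        symbols.setShortMonths(upper(symbols.getShortMonths(), locale));
        return new SimpleDateFormat(pattern, symbols);
    }

    private static String[] upper(String[] names, Locale locale) {
        String[] result = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            result[i] = names[i].toUpperCase(locale);
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleDateFormat format = upperCaseMonthFormat("d MMMM yyyy", Locale.FRENCH);
        System.out.println(format.format(new GregorianCalendar(2020, Calendar.JANUARY, 15).getTime()));
    }
}
```

This changes only the formatter you build this way; other code that asks for the same Locale still gets the standard month names.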

Parsing Joda-Time Partials

I'd like to produce Partials from Strings, but can't find anything in the API that supports that. Obviously, I can write my own parser outside of the Joda-Time framework and create the Partials, but I can't imagine that the API doesn't already have the ability to do this.
Use of ThreeTen (JSR-310) would be an acceptable solution, but it doesn't seem to support Partials. I don't know whether that is due to its alpha status, or whether the Partial concept is handled in a different manner that I haven't discovered.
What is the best way to convert a String (2011, 02/11, etc) into a Partial?

I've extended DateTimeParserBucket. My extended class intercepts calls to the saveField() methods, and stores the field type and value before delegating to super. I've also implemented a method that uses those stored field values to create a Partial.
I'm able to pass my bucket instance to DateTimeParser.parseInto(), and then ask it to create the Partial.
It works, but I can't say I'm impressed with Joda-Time - given that it doesn't support parsing Partials out of the box. The lack of DateTimeFormatter.parsePartial(String) is a glaring omission.

You have to start by defining the valid formats for the Partials you will accept. There is no class that will just take text and infer the best possible match for a Partial; that is far too subjective, depending on locale, user preference, etc. So there's no way around making a list of all valid input formats. It will be very difficult to make them all mutually exclusive, so they need priorities. For example, you might want MM/dd and MM/yy to both be valid formats. If I give you the string 02/11, which one should have priority?
Once you've determined exactly the valid formats, you should use DateTimeFormat.forPattern to create a DateTimeFormatter for each one. Then you can use each formatter to try to parseInto a MutableDateTime. Then, go through each field in the MutableDateTime and transfer the value into a Partial.
Unfortunately, there is no better way to handle this in the Joda library.

The ISODateTimeFormat class allows partial printing. As you say, there is no parsing method on DateTimeFormatter (although you can parse to a LocalDate and interpret that).
ThreeTen/JSR-310 has the DateTimeFields class, which replaces Partial. Parsing of partials into a CalendricalMerger is supported; however, that may not be convertible back into a DateTimeFields yet.
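For readers on Java 8 or later, the prioritized-formats idea from the earlier answer can be sketched with java.time, the form in which JSR-310 was finally released. This is an assumption-laden illustration, not the alpha API named above: it uses Year, YearMonth and MonthDay instead of DateTimeFields, and the three patterns and their order are arbitrary choices for this example.

```java
import java.time.MonthDay;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;

class PartialParsing {
    private static final DateTimeFormatter MONTH_DAY = DateTimeFormatter.ofPattern("MM/dd");
    private static final DateTimeFormatter MONTH_YEAR = DateTimeFormatter.ofPattern("MM/uu");
    private static final DateTimeFormatter YEAR = DateTimeFormatter.ofPattern("uuuu");

    // Tries each format in priority order and returns the first match.
    // MM/dd is tried before MM/uu, so "02/11" is read as 11 February.
    static TemporalAccessor parsePartial(String text) {
        try { return MonthDay.parse(text, MONTH_DAY); } catch (DateTimeParseException ignored) { }
        try { return YearMonth.parse(text, MONTH_YEAR); } catch (DateTimeParseException ignored) { }
        try { return Year.parse(text, YEAR); } catch (DateTimeParseException ignored) { }
        throw new IllegalArgumentException("No known partial format matches: " + text);
    }

    public static void main(String[] args) {
        System.out.println(parsePartial("2011"));  // 2011
        System.out.println(parsePartial("02/11")); // --02-11
    }
}
```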
