Convert language code to (dojo) locale code - java

I have a system that returns me language codes, and I wish to use the same to compute the corresponding locale code that Dojo accepts.
Is there a neat way to do this?
(not sure what kind of mapping can be used)

Given that you've tagged this with Java, I'm guessing you're getting a java.util.Locale. Convert the underscores to dashes and make everything lowercase.
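A minimal sketch of that conversion (the method name toDojoLocale is my own, not a Dojo or JDK API):

```java
import java.util.Locale;

public class DojoLocale {
    // java.util.Locale.toString() gives e.g. "en_US";
    // Dojo expects the form "en-us": dashes instead of underscores, all lowercase.
    static String toDojoLocale(Locale locale) {
        return locale.toString().replace('_', '-').toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toDojoLocale(new Locale("en", "US"))); // en-us
        System.out.println(toDojoLocale(new Locale("pt", "BR"))); // pt-br
    }
}
```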

Related

Getting the alphabet of a language from the Locale in Java

I am making an internationalized app in Java. I need a list of all the letters in a language, starting from the Locale. There are some questions like Alphabet constant in Java? or Create Alphabet List from list : Java which touch on the issue, but I'm wondering whether there is a Utils class or something where this is already defined, and where I can get a list of chars or a String containing all the letters in the alphabet of a language by its Locale.
You can refer to the com.ibm.icu.util.LocaleData class in the ICU library; see its methods for details. Pass Locale.ENGLISH as the argument to get the English alphabet.
There are several issues here.
First, I have to point out that there are many languages that aren't alphabetic. Chinese and Japanese are obvious examples of ideographic languages. Unfortunately, it would be very hard, next to impossible, to create a list of all the characters in these languages.
Second, although the Common Locale Data Repository (and, as a consequence, ICU) has predefined sets of index and example exemplar characters, this information is far from complete.
Third, there are languages that use more than one script (i.e. writing system). Depending on the source of your locale, you may or may not know which characters need to be displayed.
Finally, it is hard to give you the right answer when you haven't provided your use case. The design of your application may impose serious limitations on usability or localizability...

Is it possible to code java in another language?

I was curious as to whether or not a person could code in another language.
I DON'T mean naming your variables in different languages like this:
String[] tableau = {"Janvier", "Fevrier"};
System.out.println(tableau[0].length());
But more like
Chaîne[] tableau = {"Janvier", "Fevrier"};
Système.sortie.imprimeln(tableau[0].longueur());
Is that doable or would you need to write your own french or [insert language] based coding language?
If you were to use a preprocessor to accomplish this, I believe it would work perfectly well. Java does not ship with one, but C and C++ do (e.g. cpp), so you could add a step to your build chain that performs the preprocessing, translating your code into the "hosted" English Java before it is compiled. For another example, consider CoffeeScript, a language that translates itself into JavaScript. So, as long as your mapping is one-for-one, I believe the answer is Oui.
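To illustrate the idea, here is a deliberately naive sketch of such a preprocessing pass: a plain text substitution of French tokens for their Java equivalents (the class name, token map, and its entries are all my own invention; a real preprocessor would need a proper lexer so it doesn't rewrite tokens inside string literals):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FrenchPreprocessor {
    // Hypothetical French-to-Java token map; longest entries first so
    // compound names are rewritten before their parts.
    static final Map<String, String> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("Système.sortie.imprimeln", "System.out.println");
        TOKENS.put("Chaîne", "String");
        TOKENS.put("longueur", "length");
    }

    // Textual translation pass, run before handing the source to javac.
    static String preprocess(String source) {
        for (Map.Entry<String, String> e : TOKENS.entrySet()) {
            source = source.replace(e.getKey(), e.getValue());
        }
        return source;
    }

    public static void main(String[] args) {
        String french = "Système.sortie.imprimeln(tableau[0].longueur());";
        // prints: System.out.println(tableau[0].length());
        System.out.println(preprocess(french));
    }
}
```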
No.
You could write your own frontend to convert "French Java source" to "English Java" (either bytecode or source) for the base language, but you are still going to have problems with all the libraries and any 3rd party tools which will still be English based.
Java syntax has to be written in plain English; the compiler only understands the English keywords when turning source into bytecode. You can, of course, display labels and messages in other languages using Locale.
Programming languages need not change their library identifiers, and built-in keywords, from one language to another.
If you're programming in Java, you can use UTF-8 for encoding your source files.
You can then use Unicode symbols such as characters from languages other than English in your own identifiers.
You can name your own type Chaîne; but the String type stays String, and keywords like if or for or public stay English.
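A small sketch of what that looks like in practice (the class and member names are made up for illustration; the source file must be saved as UTF-8):

```java
public class Exemple {
    // Your own identifiers may use non-ASCII characters,
    // but keywords (class, static, return) and JDK names (String, int) stay English.
    static class Chaîne {
        final String valeur;
        Chaîne(String valeur) { this.valeur = valeur; }
        int longueur() { return valeur.length(); }
    }

    public static void main(String[] args) {
        Chaîne[] tableau = { new Chaîne("Janvier"), new Chaîne("Février") };
        System.out.println(tableau[0].longueur()); // 7
    }
}
```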
The concept of localizing the keywords of a language has been tried. For instance in the obscure language Protium. (I'd give a link if all leads weren't defunct; but Rosetta Code has some examples.) In Protium, all symbols are made up of character trigraphs to create semi-readable abbreviations. For instance, this code snippet which is rendered in English:
<# SAI>
<# ITEFORLI3>2121|2008|
<# LETVARCAP>Christmas Day|25-Dec-<# SAYVALFOR>...</#></#>
<# TSTDOWVARLIT>Christmas Day|1</#>
<# IFF>
<# SAYCAP>Christmas Day <# SAYVALFOR>...</#> is a Sunday</#><# SAYKEY>__Newline</#>
</#>
</#>
</#>
Now the idea is that these trigraphs, like LET VAR CAP, which make up an identifier like LETVARCAP, individually map to some corresponding trigraphs in other languages. Or perhaps, in the case of languages with complex characters in their writing system like Chinese or Japanese, to a single ideographic character.
Make of it what you will.

Java library for cleaning up user-entered title to make it show up in a URL?

I am doing a web application. I would like to have a SEO-friendly link such as the following:
http://somesite.org/user-entered-title
The above user-entered-title is extracted from user-created records that have a field called title.
I am wondering whether there is any Java library for cleaning up such user-entered text (remove spaces, for example) before displaying it in a URL.
My target text is something such as "stackoverflow-is-great" after cleanup from user-entered "stackoverflow is great".
I am able to write code to replace spaces in a string with dashes, but not sure what are other rules/ideas/best practices out there for making text part of a url.
Please note that user-entered-title may be in different languages, not just English.
Thanks for any input and pointers!
Regards.
What you want is some kind of "slugifying" of the phrase into a URL, so it is SEO-friendly.
Once I had that problem, I ended up using a solution provided at maddemcode.com; below you'll find its adapted code.
The trick is to properly use the Normalizer JDK class with a little additional cleanup. The usage is simple:
// casingchange-aeiouaeiou-takesexcess-spaces
System.out.println(slugify("CaSiNgChAnGe áéíóúâêîôû takesexcess spaces "));
// these-are-good-special-characters-sic
System.out.println(slugify("These are good Special Characters šíč"));
// some-exceptions-123-aeiou
System.out.println(slugify(" some exceptions ¥123 ã~e~iõ~u!##$%¨&*() "));
// gonna-accomplish-yadda
System.out.println(slugify("gonna accomplish, yadda, 완수하다, 소양양)이 있는 "));
Function code (requires java.text.Normalizer and java.util.Locale):
import java.text.Normalizer;
import java.util.Locale;

public static String slugify(String input) {
    return Normalizer.normalize(input, Normalizer.Form.NFD)
            .replaceAll("[^\\p{ASCII}]", "")
            .replaceAll("[^ \\w]", "").trim()
            .replaceAll("\\s+", "-").toLowerCase(Locale.ENGLISH);
}
In the source page (http://maddemcode.com/java/seo-friendly-urls-using-slugify-in-java/) you can take a look at where this comes from. The small snippet above, though, works the same.
As you can see, there are some exceptional chars that aren't converted. To my knowledge, everyone who translates them uses some kind of map, like Django's urlify (see the example map there). If you need them, I believe your best bet is making one of your own.
It seems you want to URL-encode a string. It's possible in core Java, without using external libraries. URLEncoder is the class you need.
Languages other than English shouldn't be a problem as the class allows you to specify the character encoding, which takes care of special characters like accents, etc.
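For concreteness, here is what URLEncoder produces; note that this is application/x-www-form-urlencoded percent-encoding, not slugifying, so spaces become '+' and non-ASCII characters become escaped UTF-8 bytes (the Charset overload used here exists since Java 10):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeTitle {
    public static void main(String[] args) {
        // Spaces become '+', accented characters become percent-escaped UTF-8 bytes.
        System.out.println(URLEncoder.encode("stackoverflow is great", StandardCharsets.UTF_8)); // stackoverflow+is+great
        System.out.println(URLEncoder.encode("café", StandardCharsets.UTF_8)); // caf%C3%A9
    }
}
```

The result is safe to put in a URL, but for a readable slug you would still want the Normalizer-based cleanup shown in the other answer.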

Replacing Java unicode encodings with actual characters

When I make web queries, for accented characters, I get special character encodings back as strings such as "\u00f3" , but I need to replace it with the actual character, like "ó" before making another query.
How would I find these cases without actually looking for each one, one by one?
It seems you're handling JSON formatted data.
Use any of the many freely available JSON libraries to handle this (and other parsing issues) for you instead of trying to do it manually.
The one from JSON.org is pretty widely used, but there are surely others that work just as well.
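If you do need to decode such escapes by hand rather than letting a JSON parser handle them, a minimal regex-based sketch (class and method names are my own) looks like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unescape {
    static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replace every \uXXXX escape with the character it encodes.
    // Characters outside the BMP arrive as two escapes (a surrogate pair),
    // which this naive loop reassembles correctly by appending both halves.
    static String unescape(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("informaci\\u00f3n")); // información
    }
}
```

A proper JSON library is still the safer choice, since it also handles the other escapes (\n, \", \\) and malformed input.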

How can I determine what the alphabet for a locale is in java?

I would like to determine what the alphabet for a given locale is, preferably based on the browser Accept-Language header values. Anyone know how to do this, using a library if necessary?
Take a look at [LocaleData.getExemplarSet][1].
For example, for English this returns abcdefghijklmnopqrstuvwxyz.
[1]: http://icu-project.org/apiref/icu4j/com/ibm/icu/util/LocaleData.html#getExemplarSet(com.ibm.icu.util.ULocale, int)
If you just want to know the name of an appropriate character set for a user's locale then you might try the java.nio.charset.Charset class.
If you really want to use the Accept-Language header, then there's an old O'Reilly article on this matter which introduces a pretty handy class called LanguageNegotiator.
I think one of those will give you a decent enough start.
It depends on how specific you want to get. One place to look would be at the "Suppress-Script" properties in the IANA language registry.
Some languages have multiple "alphabets" that can be used for writing. For example, Azerbaijani can be written in Latin or Arabic script. Most languages, like English, are written almost exclusively in a single script, so the correct script goes without saying, and should be "suppressed" in language codes.
So, looking at the entry for Russian, you can tell that the preferred script is Cyrillic, while for Amharic it is Ethiopic. But German, Norwegian, and English aren't more specific than "Latin". So, with this method, you'd have a hard time hiding umlauts and thorns from Americans, or offering any script choice to a Kashmiri writer.
This is an English answer written in Århus. Yesterday, I heard some Germans say 'Blödheit, à propos, ist dumm'. However, one of them wore a shirt that said 'I know the difference between 文字 and الْعَرَبيّة'.
What's the answer to your question for this text? Is it allowed? Isn't this an English text?
The International Components for Unicode might help here. Specifically the UScript class looks promising.
Out of curiosity: What do you need it for?
