How to get an equivalent localized string of a number? - java

I'm trying to convert numbers into a localized equivalent string (for an android app).
For example I would like to convert 25 into twenty five if the locale is US.
If the locale is FR I would like 237 to be converted into deux cent trente sept.
I searched the Android documentation extensively (Locale, TextUtils, ...) without finding anything.
I also looked into other libraries such as Apache Commons LocaleUtils, without success.
I'm wondering if such a library even exists.
Any ideas?

I think you are mixing up localization with translation here. Locales in Java are typically used for formatting.
You could have a look at google-api-translate-java.

Related

How to use unsupported Locale in Java 11 and numbers in String.format()

How can I use an unsupported Locale (e.g. ar-US) in Java 11 when I output a number via String.format()?
In Java 8 this worked just fine (try jdoodle, select JDK 1.8.0_66):
Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: 120
Since Java 11 the output is in Eastern Arabic numerals (try jdoodle, use default JDK 11.0.4):
Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: ١٢٠
It seems this problem comes from the switch in the Locale Data Providers from JRE to CLDR (source: Localization Changes in Java 9 by @mcarth). Here is a list of supported locales: JDK 11 Supported Locales
UPDATE
I updated the question's example to ar-US, as my earlier example didn't make sense. The idea is to have a format which makes sense in that given country. In the example it would be the United States (US).
The behavior conforms to the CLDR being treated as the preferred Locale. To confirm this, the same snippet in Java-8 could be executed with
-Djava.locale.providers=CLDR
If you step back to look at the JEP 252: Use CLDR Locale Data by Default, the details follow :
The default lookup order will be CLDR, COMPAT, SPI, where COMPAT
designates the JRE's locale data in JDK 9. If a particular provider
cannot offer the requested locale data, the search will proceed to the
next provider in order.
So, in short if you really don't want the default behaviour to be that of Java-11, you can change the order of lookup with the VM argument
-Djava.locale.providers=COMPAT,CLDR,SPI
What might help further is understanding more about picking the right language using CLDR!
I'm sure I'm missing some nuance, but the problem is with your tag, so fix that. Specifically:
ar-EN makes no sense. That's short for:
language = arabic
country = ?? nobody knows.
EN is not a country. en is certainly a language code (for english), but the second part in a language tag is for country, and EN is not a country. (for context, there is en-GB for british english and en-US for american english).
Thus, this is as good as ar (as in, language = arabic, not tied to any particular country). Even if you did tie it to some country, that is mostly immaterial here; that would affect things like 'what is the first day of the week' ,'which currency symbol is to be presumed' and 'should temperatures be stated in Kelvin or Fahrenheit' perhaps. It has no bearing on how to show digits, because that's all based on language.
And the language is Arabic, thus ١٢٠ is what you get when you use ar as a language tag when printing the number 120. The problem is that you expect this to return "120", which is a bizarre wish [1], combined with the fact that Java, unfortunately, shipped with a bug for a long, long time that made it act in this bizarre fashion, thinking that rendering the number 120 in Arabic is best done with "120", which is wrong.
So, with that context, in order of preference:
Best solution
Find out why your system ends up with ar-EN and nevertheless expects '120', and fix this. Also fix ar-EN in general; EN is not a country.
More generally, 'unsupported locale' isn't really a thing. The ar part is supported, and it's the only relevant part of the tag for rendering digits.
Alternatives
The most likely best answer if the above is not possible is to explicitly work around it. Detect the tag yourself, and write code that will just respond with the result of formatting this number using Locale.ENGLISH instead, guaranteeing that you get Output: 120. The rest seems considerably worse: You could try to write a localization provider which is a ton of work, or you can try to tell java to use the JRE version of the provider, but that one is obsoleted and will not be updated, so you're kicking the can down the road and setting yourself up for a maintenance burden later.
[1] Given that the JRE variant actually printed 120, and you're also indicating you want this, I get that nagging feeling I'm missing some political or historical info and the expectation that ar-EN results in rendering the number 120 as "120" is not so crazy. I'd love to hear that story if you care to provide it!
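The explicit workaround described above (detect the tag yourself and format with Locale.ENGLISH) could look roughly like the sketch below. The class and method names are made up for illustration, and the rule "any Arabic-language tag falls back to English" is an assumption; a real application may want a narrower check.

```java
import java.util.Locale;

final class LatinDigitsFormat {

    // Hypothetical helper: formats with the requested locale unless its
    // language is Arabic, in which case Locale.ENGLISH is used instead so
    // that %d is rendered with the digits 0-9.
    static String format(Locale requested, String pattern, Object... args) {
        Locale effective = "ar".equals(requested.getLanguage()) ? Locale.ENGLISH : requested;
        return String.format(effective, pattern, args);
    }

    public static void main(String[] args) {
        Locale locale = Locale.forLanguageTag("ar-US");
        System.out.println(format(locale, "Output: %d", 120)); // Output: 120
    }
}
```

Note that the fallback replaces the whole locale, so any other locale-sensitive conversions in the same pattern are also formatted the English way.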

How to identify date from a string in Java

Recently I have been challenged by quite an "easy" problem. Suppose that there are sentences (saved in a String), and I need to find out if there is any date in this String. The challenge is that the date can be in a lot of different formats. Some examples are shown in the list:
June 12, 1956
London, 21st October 2014
13 October 1999
01/11/2003
Worth mentioning that these are contained in one string. So as an example it can be like:
String s = "This event took place on 13 October 1999.";
My question in this case is how I can detect that there is a date in this string. My first approach was to search for the word "event", and then try to locate the date. But with more and more possible date formats this solution is not very elegant. The second solution that I tried was to create a list of month names and search for them. This had good results but still misses the cases where the date is expressed entirely in digits.
One solution which I have not tried till now is to design regular expressions and try to find a match in the string. Not sure how much this solution might decrease the performance.
What could be a good solution that I should probably consider? Did anybody face a similar problem before and what solutions did you find?
One thing is for sure: there is no time component, so the only interesting part is the date.
Using the natty.joestelmach.com library
Natty is a natural language date parser written in Java. Given a date expression, natty will apply standard language recognition and translation techniques to produce a list of corresponding dates with optional parse and syntax information.
import com.joestelmach.natty.*;
import java.util.Date;
import java.util.List;
List<Date> dates = new Parser().parse("Start date 11/30/2013 , end date Friday, Sept. 7, 2013").get(0).getDates();
System.out.println(dates.get(0));
System.out.println(dates.get(1));
//output:
//Sat Nov 30 11:14:30 BDT 2013
//Sat Sep 07 11:14:30 BDT 2013
You are after Named Entity Recognition. I'd start with Stanford NLP. The 7 class model includes date, but the online demo struggles and misses the "13". :(
Natty mentioned above gives a better answer.
If it's only one String you could use regular expressions, as you mentioned. You would have to write an expression for each date format you expect. Here are some examples:
Regular Expressions - dates
In case it's a document or a big text, you will need a parser. You could use a Lexical analysis approach.
Depending on the project using an external library as mentioned in some answers might be a good idea. Sometimes it's not an option.
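When an external library is not an option, the regular-expression approach can be sketched with the JDK alone. The pattern below only covers the four example formats from the question (English month names, an optional ordinal suffix, and digits separated by slashes); the class name and the exact set of alternatives are assumptions for illustration, not a complete date grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateFinder {

    private static final String MONTH =
        "(?:January|February|March|April|May|June|July|August"
        + "|September|October|November|December)";

    // Three alternatives, tried in order at each position:
    //   "June 12, 1956"
    //   "21st October 2014" or "13 October 1999"
    //   "01/11/2003"
    private static final Pattern DATE = Pattern.compile(
        "\\b" + MONTH + " \\d{1,2}, \\d{4}\\b"
        + "|\\b\\d{1,2}(?:st|nd|rd|th)? " + MONTH + " \\d{4}\\b"
        + "|\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b");

    // Returns every substring of the text that looks like a date.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "This event took place on 13 October 1999.";
        System.out.println(findDates(s)); // [13 October 1999]
    }
}
```

The pattern is compiled once and reused, so the cost per string is a single scan. It does not check that a match is a valid calendar date, and "01/11/2003" stays ambiguous between day-first and month-first.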
I've done this before with good precision and recall. You'll need GATE and its ANNIE plugin.
1. Use the GATE UI tool to create a .GAPP file that will contain your processing resources.
2. Use the .GAPP file to use the extracted Date annotation set.
Step 2 can be done as follows:
Corpus corpus = Factory.newCorpus("Gate Corpus");
Document gateDoc = Factory.newDocument("This event took place on 13 October 1999.");
corpus.add(gateDoc);
File pluginsHome = Gate.getPluginsHome();
File anniePlugin = new File(pluginsHome, "ANNIE");
File annieGapp = new File(anniePlugin, "Test.gapp");
// Load the saved application and declare the controller variable (the declaration was missing).
CorpusController annieController = (CorpusController) PersistenceManager.loadObjectFromFile(annieGapp);
annieController.setCorpus(corpus);
annieController.execute();
Later you can see the extracted annotations like this:
AnnotationSetImpl ann = (AnnotationSetImpl) gateDoc.getAnnotations();
System.out.println("Found annotations of the following types: "+ gateDoc.getAnnotations().getAllTypes());
I'm sure you can do it easily with the built-in Date annotation set. It is also very extensible.
To extend the Date annotation set, create a lenient annotation rule in JAPE, say 'DateEnhanced', based on the built-in ANNIE Date annotation, so that it includes certain kinds of dates like "9/11". On the right-hand side of the 'DateEnhanced' JAPE rule you can chain Java regular expressions to filter out unwanted matches (if any).

Common datetime formats in log files

I'm looking for a list of common datetime formats used in logs (e.g. webserver, database, etc).
Even better would be a (java) library that can extract date and time from a given string ( < 10KB).
Does anyone know a good one?
This class is likely a good place to start: SimpleDateFormat
The docs contain an introduction to the standard datetime format strings. But as @Olaf points out, you're going to need to specify what the format is beforehand, or there is literally no way to differentiate certain dates from one another.
Looks like what you'd want to do is construct a range of date formats that might match, apply all of them to a date string, then see which date is closest to Datetime.now().
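A minimal sketch of that idea, using java.time rather than SimpleDateFormat: try every candidate pattern and keep the successful parse that lies closest to a reference time (normally "now"). The four patterns, the class name and the method name are assumptions chosen for illustration; a real list would be built from the log sources you actually have.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class LogTimestampParser {

    // Candidate patterns (illustrative only). The last two are deliberately
    // ambiguous with each other: day-first versus month-first.
    private static final List<DateTimeFormatter> CANDIDATES = Arrays.asList(
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH),
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH),
        DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss", Locale.ENGLISH),
        DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss", Locale.ENGLISH));

    // Parses with every candidate and returns the result closest to the reference time.
    static Optional<LocalDateTime> parseClosestTo(String timestamp, LocalDateTime reference) {
        LocalDateTime best = null;
        for (DateTimeFormatter formatter : CANDIDATES) {
            try {
                LocalDateTime candidate = LocalDateTime.parse(timestamp, formatter);
                if (best == null || distance(candidate, reference) < distance(best, reference)) {
                    best = candidate;
                }
            } catch (DateTimeParseException e) {
                // This pattern does not fit the input; try the next one.
            }
        }
        return Optional.ofNullable(best);
    }

    private static long distance(LocalDateTime a, LocalDateTime b) {
        return Math.abs(Duration.between(a, b).getSeconds());
    }

    public static void main(String[] args) {
        System.out.println(parseClosestTo("2014-10-21 13:45:00", LocalDateTime.now()));
    }
}
```

The "closest to now" rule is only a heuristic for fresh logs. For archived logs the reference time should be something like the file's modification time instead.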
This doesn't answer your question directly, but Java includes libraries for working with regular expressions. It would be pretty easy to write a library of your own based on that. I've had a lot of success extracting all sorts of data using regular expressions. It would certainly be less than 10 KB and would require no external dependencies other than the JDK.

How to convert English numbers to Chinese in Java

I have to convert English numbers to Chinese numbers, but the Chinese number system is different from the English one. Is there any way to convert an English number obtained at run time into Chinese?
Thank You.
Vikram
Instead of rolling your own, it is advisable to use ICU4J NumberFormat, as in @mcdowell's answer.
The only difference is that the numbering system ID "hansfin" should be replaced with "hans" if you wish to convert 61305 into "六万一千三百零五".
Locale chineseNumbers = new Locale("C@numbers=hans");
com.ibm.icu.text.NumberFormat formatter =
com.ibm.icu.text.NumberFormat.getInstance(chineseNumbers);
System.out.println(formatter.format(61305));
Here are the results for the different numbering system IDs.
hans 六万一千三百零五
hant 六萬一千三百零五
hansfin 陆万壹仟叁佰零伍
hantfin 陸萬壹仟參佰零伍
"hans" is the abbreviation of "Han Simplified" (i.e. Simplified Chinese), "hant" is "Han Traditional" (i.e. Traditional Chinese), and the "fin" suffix stands for "Finance".
ICU4J has support for this:
Locale chineseNumbers = new Locale("en_US@numbers=hansfin");
com.ibm.icu.text.NumberFormat formatter =
com.ibm.icu.text.NumberFormat.getInstance(chineseNumbers);
System.out.println(formatter.format(100));
Tested with version 4.8.
In that case, I suggest you build a hash table for it.
It's not that difficult to start with.
Chinese numerals are pretty much defined by a small set of digit and unit characters.
See: http://en.wikipedia.org/wiki/Chinese_numerals
With that, I think you are more than capable of building a table in your preferred programming language, Java.
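As a rough sketch of the table-based approach, the class below maps each decimal digit and each position to its Simplified Chinese character and inserts 零 for internal gaps. Plain arrays stand in for the hash table, the class name is made up, and the range is limited to 0 to 99,999 as a simplifying assumption. It also writes 10 as 一十 rather than the shorter 十, so treat it as a starting point rather than a replacement for ICU4J.

```java
final class ChineseNumerals {

    // Lookup tables: index = decimal digit, and index = position from the left.
    private static final String[] DIGITS =
        {"零", "一", "二", "三", "四", "五", "六", "七", "八", "九"};
    private static final String[] UNITS = {"万", "千", "百", "十", ""};

    // Converts 0..99,999 to Simplified Chinese numerals (simplified rules, see above).
    static String toChinese(int n) {
        if (n < 0 || n > 99_999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        if (n == 0) {
            return DIGITS[0];
        }
        StringBuilder sb = new StringBuilder();
        boolean pendingZero = false;
        int divisor = 10_000;
        for (int i = 0; i < UNITS.length; i++) {
            int digit = (n / divisor) % 10;
            if (digit == 0) {
                // Remember the gap; it is only written if a non-zero digit follows.
                pendingZero = true;
            } else {
                if (pendingZero && sb.length() > 0) {
                    sb.append(DIGITS[0]);
                }
                sb.append(DIGITS[digit]).append(UNITS[i]);
                pendingZero = false;
            }
            divisor /= 10;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toChinese(61305)); // 六万一千三百零五
    }
}
```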

Need some help with String.format

I'm trying to find a complete tutorial about formatting strings in java.
I need to create a receipt, like this:
HEADER IN MIDDLE
''''''''''''''''''''''''''''''
Item1 Price
Item2 x 5 Price
Item3 that has a very
long name.... Price
''''''''''''''''''''''''''''''
Netprice: xxx
Grossprice: xxx
VAT: xxx
Shipping cost: xxx
Total: xxx
''''''''''''''''''''''''''''''
FOOTER IN MIDDLE
The format to pass to String.format is documented here:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Formatter.html#syntax
From the page:
The format specifiers for general,
character, and numeric types have the
following syntax:
%[argument_index$][flags][width][.precision]conversion
The optional argument_index is a
decimal integer indicating the
position of the argument in the
argument list. The first argument is
referenced by "1$", the second by
"2$", etc.
The optional flags is a set of
characters that modify the output
format. The set of valid flags depends
on the conversion.
The optional width is a non-negative
decimal integer indicating the minimum
number of characters to be written to
the output.
The optional precision is a
non-negative decimal integer usually
used to restrict the number of
characters. The specific behavior
depends on the conversion.
The required conversion is a character
indicating how the argument should be
formatted. The set of valid
conversions for a given argument
depends on the argument's data type.
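Applied to the receipt, the flags, width and precision described above are enough for fixed-width columns. The following is a small sketch, assuming a 30-character receipt with a 20-column name and a 10-column price; the class name, the widths and the sample prices are made up for illustration.

```java
import java.util.Locale;

final class ReceiptFormatter {

    private static final int WIDTH = 30;

    // "%-20s" left-justifies the name in 20 columns (the '-' flag),
    // "%10.2f" right-justifies the price in 10 columns with 2 decimals.
    static String itemLine(String name, double price) {
        return String.format(Locale.US, "%-20s%10.2f", name, price);
    }

    // Centers text by left-padding it with half of the remaining width.
    static String center(String text) {
        int padding = Math.max(0, (WIDTH - text.length()) / 2);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < padding; i++) {
            sb.append(' ');
        }
        return sb.append(text).toString();
    }

    public static void main(String[] args) {
        System.out.println(center("HEADER IN MIDDLE"));
        System.out.println(itemLine("Item1", 9.99));
        System.out.println(itemLine("Item2 x 5", 49.95));
        System.out.println(itemLine("Total:", 59.94));
        System.out.println(center("FOOTER IN MIDDLE"));
    }
}
```

Width is a minimum, not a maximum, so a name longer than 20 characters pushes the price to the right. Long names such as "Item3 that has a very long name" have to be wrapped by hand before formatting.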
Formatting strings is somewhat complicated for this kind of requirement, so it is better to use a reporting tool with the layout you have given. That would be the better approach.
Either Crystal Reports or some other tool that is easy to implement.
Trying to do this by formatting strings will cost you too much time and nerves. I would suggest a templating engine like StringTemplate or something similar.
By doing this you will separate the presentation from the data, and that will be a very good thing in the long run.
See if these classes in java.text package can help..
Format
MessageFormat
Yes, as solairaja said, if you are planning to create reports or receipts you can go for a reporting tool such as Crystal Reports.
Crystal Report Crystal Report Tutorial
Or, if you plan to stick with string formatting itself, then StringBuffer would be the best option because you can build the output up piece by piece.
You should probably look at Java templating tools for this sort of multi-line reporting formatting.
Velocity is simple and forgiving of errors. Freemarker is very powerful but more intolerant. I would perhaps look at Velocity initially, and if you have to do more of this sort of work, take a further look at Freemarker.
Looks like the general advice from the community as a better approach to solve your problem is using a reporting tool.
Here you have a detailed list of open source Java charting and reporting tools:
http://java-source.net/open-source/charting-and-reporting
The best known is, in my opinion, JasperReports. A lot of resources about it are available on the web.
