BreakIterator behaving differently in Android API 29 and API 30

I wrote the function below to break a String into Hindi characters, but it behaves differently on Android API 29 and API 30. On API 29 the Hindi word चक्की is broken into च क् की, whereas on API 30 it is correctly broken into च क्की.
public List<String> breakIntoHindiChar(String textAnswer) {
    List<String> ansCharList = new ArrayList<String>();
    Locale hindi = new Locale("hi", "IN");
    BreakIterator breaker = BreakIterator.getCharacterInstance(hindi);
    breaker.setText(textAnswer);
    int start = breaker.first();
    for (int end = breaker.next();
         end != BreakIterator.DONE;
         start = end, end = breaker.next()) {
        ansCharList.add(textAnswer.substring(start, end));
    }
    return ansCharList;
}
How can I solve this problem?

Android's implementation of BreakIterator was not able to interpret Bharatiya scripts (abugida type) accurately. See this bug: https://code.google.com/p/android/issues/detail?id=230832
It looks like this has been fixed in API 30 and has not been backported to previous versions.

How can I solve this problem?
As you noted in your question, the behavior in Android 29 is incorrect, and the behavior in Android 30 is correct.
So it depends on what you think the problem is:
If you think the problem is that they fixed the behavior in Android 30 and that this has "broken" your app on later versions, then a solution would be for you to copy the Android 29 BreakIterator into your app (with a different name) and use it.
If you think that it is that they didn't fix the behavior in Android 29 (and earlier) and you want your app to behave correctly on older Android, then a solution would be for you to copy the Android 30 BreakIterator into your app (with a different name) and use it.
Note that what you would be doing is to implement either forwards or backwards compatible behavior for BreakIterator, albeit not in the standard class.
(There may be a more elegant solution, but I don't have my own local copy of the Android master repo to dig through the history.)
Note: If you can't replace all of your app's uses of BreakIterator with an alternative version of the class, then (AFAIK) there won't be a way to make the behavior consistent. You cannot patch the Android platform. And note that while you could use android.icu.text.BreakIterator instead of java.text.BreakIterator in your own code, that wouldn't fix the problem for other code that your app depends on.
But if you think the problem is that BreakIterator behaves differently in Android 29 vs 30 ... there is no solution to that. It is a fact. Yes ... it does behave differently ... and even Google can't make it not behave differently in Android 30 ... now.
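If copying a whole BreakIterator implementation is too heavy, a lighter option for this particular symptom may be to post-process the segments. The following is only a minimal sketch, not a platform fix. It assumes that the only difference on older API levels is that a cluster ending in the Devanagari virama (U+094D) gets split from the consonant that follows it, and it glues such a cluster onto the next one. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ViramaMerger {
    // Devanagari sign virama (halant), which joins consonants into a conjunct.
    private static final char DEVANAGARI_VIRAMA = '\u094D';

    // Assumption: 'segments' is the output of a character BreakIterator.
    // Any segment that ends in a virama is merged with the segment after it.
    static List<String> mergeViramaClusters(List<String> segments) {
        List<String> merged = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String segment : segments) {
            pending.append(segment);
            boolean endsWithVirama = pending.length() > 0
                    && pending.charAt(pending.length() - 1) == DEVANAGARI_VIRAMA;
            if (!endsWithVirama) {
                merged.add(pending.toString());
                pending.setLength(0);
            }
        }
        // A trailing virama with nothing after it is kept as its own segment.
        if (pending.length() > 0) {
            merged.add(pending.toString());
        }
        return merged;
    }
}
```

Under that assumption, the result of breakIntoHindiChar(text) could be passed through mergeViramaClusters on every API level. A list whose segments are already merged (no segment ending in a virama) passes through unchanged.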

Related

How to use unsupported Locale in Java 11 and numbers in String.format()

How can I use an unsupported Locale (eg. ar-US) in JAVA 11 when I output a number via String.format()?
In Java 8 this worked just fine (try jdoodle, select JDK 1.8.0_66):
Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: 120
Since Java 11 the output is in Eastern Arabic numerals (try jdoodle, use default JDK 11.0.4):
Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: ١٢٠
It seems this problem comes from the switch in the Locale Data Providers from JRE to CLDR (source: Localization Changes in Java 9 by @mcarth). Here is a list of supported locales: JDK 11 Supported Locales
UPDATE
I updated the question's example to ar-US, as my earlier example didn't make sense. The idea is to have a format which makes sense in the given country. In the example that would be the United States (US).

The behavior conforms to the CLDR being treated as the preferred Locale. To confirm this, the same snippet in Java-8 could be executed with
-Djava.locale.providers=CLDR
If you step back to look at the JEP 252: Use CLDR Locale Data by Default, the details follow :
The default lookup order will be CLDR, COMPAT, SPI, where COMPAT
designates the JRE's locale data in JDK 9. If a particular provider
cannot offer the requested locale data, the search will proceed to the
next provider in order.
So, in short if you really don't want the default behaviour to be that of Java-11, you can change the order of lookup with the VM argument
-Djava.locale.providers=COMPAT,CLDR,SPI
What might help further is understanding more about picking the right language using CLDR!
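To see which digits a given provider order produces, one could inspect the zero digit that the locale's DecimalFormatSymbols reports, since that is what %d formatting builds on. This is a small diagnostic sketch; the class and method names are hypothetical, and the value printed for ar-US depends on the JDK version and on the java.locale.providers setting, so no output is claimed for it here.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class ProviderCheck {
    // Returns the code point of the locale's zero digit, e.g. "U+0030" for ASCII '0'.
    static String zeroDigitOf(Locale locale) {
        char zero = DecimalFormatSymbols.getInstance(locale).getZeroDigit();
        return String.format(Locale.ROOT, "U+%04X", (int) zero);
    }

    public static void main(String[] args) {
        // Null unless the property was set, e.g. with -Djava.locale.providers=COMPAT,CLDR,SPI
        System.out.println("java.locale.providers = "
                + System.getProperty("java.locale.providers"));
        System.out.println("ar-US zero digit: "
                + zeroDigitOf(Locale.forLanguageTag("ar-US")));
    }
}
```

Running it once with and once without the VM argument should show whether the provider order changes the digits for the locale in question.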

I'm sure I'm missing some nuance, but the problem is with your tag, so fix that. Specifically:
ar-EN makes no sense. That's short for:
language = arabic
country = ?? nobody knows.
EN is not a country. en is certainly a language code (for English), but the second part of a language tag is for the country, and EN is not a country. (For context, there is en-GB for British English and en-US for American English.)
Thus, this is as good as ar (as in, language = Arabic, not tied to any particular country). Even if you did tie it to some country, that is mostly immaterial here; that would affect things like 'what is the first day of the week', 'which currency symbol is to be presumed' and 'should temperatures be stated in Kelvin or Fahrenheit' perhaps. It has no bearing on how to show digits, because that's all based on language.
And the language is Arabic, thus ١٢٠ is what you get when you use ar as a language tag when printing the number 120. The problem is that you expect this to return "120", which is a bizarre wish (1), combined with the fact that Java, unfortunately, shipped with a bug for a long, long time that made it act in this bizarre fashion, thinking that rendering the number 120 in Arabic is best done with "120", which is wrong.
So, with that context, in order of preference:
Best solution
Find out why your system ends up with ar-EN and nevertheless expects '120', and fix this. Also fix ar-EN in general; EN is not a country.
More generally, 'unsupported locale' isn't really a thing. The ar part is supported, and it's the only part of the tag that is relevant for rendering digits.
Alternatives
The most likely best answer if the above is not possible is to explicitly work around it. Detect the tag yourself, and write code that will just respond with the result of formatting this number using Locale.ENGLISH instead, guaranteeing that you get Output: 120. The rest seems considerably worse: You could try to write a localization provider which is a ton of work, or you can try to tell java to use the JRE version of the provider, but that one is obsoleted and will not be updated, so you're kicking the can down the road and setting yourself up for a maintenance burden later.
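The explicit workaround described above could look roughly like the following. It is a sketch under stated assumptions: the helper names are hypothetical, and the rule "Arabic in the US gets Western digits" is just the example from the question, not a general policy.

```java
import java.util.Locale;

final class DigitLocale {
    // Picks the locale used for number formatting only.
    // Assumption: ar-US is the one tag that should fall back to Western digits.
    static Locale forDigits(Locale requested) {
        boolean arabicInUs = "ar".equals(requested.getLanguage())
                && "US".equals(requested.getCountry());
        return arabicInUs ? Locale.ENGLISH : requested;
    }

    static String formatCount(Locale requested, int value) {
        return String.format(forDigits(requested), "Output: %d", value);
    }
}
```

With this in place, formatCount(Locale.forLanguageTag("ar-US"), 120) formats with Locale.ENGLISH and therefore yields Output: 120 regardless of the locale data provider in use.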
1.) Given that the JRE variant actually printed 120, and you're also indicating you want this, I get that nagging feeling I'm missing some political or historical info and the expectation that ar-EN results in rendering the number 120 as "120" is not so crazy. I'd love to hear that story if you care to provide it!

Compare strings below API 18

I searched and learned that == should not be used to compare the content of String variables; equals() should be used instead.
However, Android Studio reports that equals() is only available on API 19 (Android 4.4) and up, and I targeted API 18 (my only phone runs Android 4.3).
So right now, I'm doing if (var1.contains(var2)) or if (array[i].contains(var)) to compare strings and it works but it doesn't seem correct to me.
What would be the correct way to achieve this on API < 19?
Thanks.
Edit: for clarification (I don't know how to put inline images)
With ==
With equals()
Comparison fails with equals().

The equals function of the Object class was added in Java JDK 1.0.
This version was released on January 23, 1996. It was called 'Oak' back then, so technically it predates even Java itself. (source).
In contrast, Android API 1 was released on September 23, 2008. At this time it would be made with Java JDK 1.5 (latest version was Java SE 5 Update 16).
So in conclusion, equals is available on API level 18, there must be some other error.
After seeing the posted code, you are using Objects.equals(), which is a utility method that checks equals() in a null-safe manner.
In many cases, like yours, you don't need the extra null check because you know at least one of the objects is not null and you can just call equals directly:
if("una".equals(hourNames[realHour]))
Your hourNames array will probably not contain null elements so you should turn it around to the more readable order:
if(hourNames[realHour].equals("una"))
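If a null-safe comparison is still wanted on API levels where java.util.Objects is not available, a tiny helper built on String.equals() is enough. This is a minimal sketch; the class and method names are hypothetical.

```java
final class StringCompare {
    // Null-safe content comparison that only relies on String.equals(),
    // which has been part of Java since 1.0.
    static boolean sameContent(String a, String b) {
        return a == null ? b == null : a.equals(b);
    }
}
```

Unlike ==, this compares content rather than references, so two distinct String objects holding "una" compare as equal.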

Use hourNames[realHour].equals("una").
Yes, that is a bug in Android Studio (or maybe in IntelliJ IDEA).

Since if (var1.contains(var2)) works for you, why don't you post the exact strings? If you don't have a debugger available, I would suggest using these methods to find the point where the strings differ:
boolean contentEquals(CharSequence cs)
public int compareTo(String anotherString)
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#compareTo(java.lang.String)
You could also use the emulator and run this small test:
String s1 = "Test";
String s2 = "Test";
if (s1.equals(s2))
    System.out.println("Equal");
else
    System.out.println("Not Equal");

Java Library to validate and compare MAC address

We have a Java web service that gets a String representing a MAC address. We want to validate if the given String actually matches the required format. Further we want to create a normalized form to make them comparable.
I searched quite a while but found only some "loose regular expressions". We would really prefer to have a library that can parse different formats and return a normalized (String) representation (i.e. 01-23-45-67-89-ab and 01:23:45:67:89:ab would return the same representation and be comparable).
I expected to find some mature and well tested library, which could do that kind of task. Can anyone please point me to it? I just cannot believe that it doesn't exist yet.
I would be very thankful to not see any RegExes as possible solutions (we know how to do that if necessary).

The IPAddress Java library will do it. The javadoc is available at the link. Disclaimer: I am the project manager.
The library will read various common formats for MAC addresses, like aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff, aabb.ccdd.eeff, it supports addresses that are 48 or 64 bits, and also allows you to specify ranges of addresses like aa-ff:bb:cc:*:ee:ff
Verify if an address is valid:
String str = "aa:bb:cc:dd:ee:ff";
MACAddressString addrString = new MACAddressString(str);
try {
    MACAddress addr = addrString.toAddress();
    ...
} catch(AddressStringException e) {
    // e.getMessage() describes the validation issue
}
The library is well tested, it has a test suite with thousands of tests.

mature and well tested library
To verify MAC addresses? It's 6 bytes in hex optionally separated by a delimiter. It's a homework assignment or light interview question, no need to write a library. My solution is 10 lines, and it's more paranoid than necessary...
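The answer above does not show its code, so the following is only a hypothetical sketch of what such a short, regex-free solution might look like for 48-bit addresses. It assumes ':', '-' and '.' are the only separators and, for brevity, does not check where they appear; the class name and the colon-separated lower-case output format are choices made here, not part of any library.

```java
final class MacNormalizer {
    // Returns the address as lower-case hex pairs joined by ':',
    // or throws IllegalArgumentException if it is not 12 hex digits.
    static String normalize(String input) {
        if (input == null) {
            throw new IllegalArgumentException("MAC address is null");
        }
        StringBuilder hex = new StringBuilder(12);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ':' || c == '-' || c == '.') {
                continue; // separators are ignored wherever they appear
            }
            boolean isHex = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'f')
                    || (c >= 'A' && c <= 'F');
            if (!isHex) {
                throw new IllegalArgumentException("Not a hex digit: " + c);
            }
            hex.append(Character.toLowerCase(c));
        }
        if (hex.length() != 12) {
            throw new IllegalArgumentException("Expected 12 hex digits, got " + hex.length());
        }
        StringBuilder out = new StringBuilder(17);
        for (int i = 0; i < 12; i += 2) {
            if (i > 0) {
                out.append(':');
            }
            out.append(hex, i, i + 2);
        }
        return out.toString();
    }
}
```

With this, 01-23-45-67-89-ab and 01:23:45:67:89:ab normalize to the same string and can be compared with equals().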

How to get an equivalent localized string of a number?

I'm trying to convert numbers into a localized equivalent string (for an android app).
For example I would like to convert 25 into twenty five if the locale is US.
If the locale is FR I would like 237 to be converted into deux cent trente sept.
I searched a lot in the Android documentation without finding anything. ( Locale, TextUtils, ... )
I also looked into other libraries such as Apache Commons LocaleUtils, without success.
I'm wondering if such a library even exists.
Any ideas ?

I think you are mixing up localization with translation here. Locales in Java are typically used for formatting.
You could have a look at google-api-translate-java.

Byte alignment problem using Preon

Hello everybody :)
I am currently using preon for a spare time project, and I have encountered the following problem: I am trying to read a fixed length String with the following code:
@Bound int string_size;
@ByteAlign @BoundString(size = "string_size") String my_string;
The file specification expects a variable padding, so that the next block's offset is a multiple of 4.
For example, if string_size = 5, then 3 null bytes will be added, and so on. I initially thought that the @ByteAlign annotation did exactly this, however, looking into the source code, I realized that it wasn't the case.
I tried to make this quick fix:
@If ("string_size % 4 == 2") @BoundList(size = "2", type = Byte.class) byte[] padding;
Sadly, Limbo doesn't seem to support the "%" operator. Is there a way around this?
(Also, where/how can I get the latest version?)
Thanks in advance.

Preon currently doesn't have a built-in solution for your issue. As you said, its expression language doesn't have a modulo operator, and it looks like you could use one. You can, however, implement your own CodecDecorator, which is probably what you want to do. You could implement a CodecDecorator that inserts a Codec which reads a couple of extra bytes after it has decoded the value.
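The number of extra bytes such a Codec would have to skip is plain arithmetic that can live in Java rather than in a Limbo expression. The helper below is a hypothetical sketch and not part of Preon; it only assumes the 4-byte alignment described in the question.

```java
final class Padding {
    // Number of null bytes needed after 'size' bytes so that the next
    // block starts at an offset that is a multiple of 4.
    static int paddingFor(int size) {
        return (4 - size % 4) % 4;
    }
}
```

For string_size = 5 this gives 3, matching the example in the question, and for sizes that are already a multiple of 4 it gives 0.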
The latest version of Preon is at Codehaus:
git://git.codehaus.org/preon.git
You could checkout the head, but there's also a separate branch called PREON-35 that has the bits for doing what is discussed over here.
