Why deleteSurroundingText doesn't work with emojis and select all? - java

I'm working on a custom keyboard and I'm using deleteSurroundingText to delete characters. I have two issues with this. First, deleteSurroundingText doesn't work well when deleting emojis: I need to press the delete key twice to get rid of a single emoji. Second, the delete key does not work with the select all option.
case Keyboard.KEYCODE_DELETE:
getCurrentInputConnection().deleteSurroundingText(1,0);
break;
This is what happens to an emoji when I try to delete it:
?
It turns into a question mark.
Also, when I try to delete text after doing select all, nothing happens.
Any help would be appreciated

Java uses 16-bit characters (see the note in the documentation), so one char can store a code point from U+0000 to U+FFFF.
Modern Unicode defines a code point range from U+0000 to U+10FFFF.
Most emojis have code points beyond U+FFFF. To represent such code points, so-called "surrogate pairs" are used.
In other words, each emoji (and every other code point beyond the U+FFFF boundary) is represented by two consecutive characters in the string.
When you call deleteSurroundingText(1,0); you corrupt the surrogate pair. The half of the pair that has not been deleted is rendered as a ? mark.
The documentation for deleteSurroundingText() specifically emphasizes this case:
IME authors: please be careful not to delete only half of a surrogate pair. Also take care not to delete more characters than are in the editor, as that may have ill effects on the application. Calling this method will cause the editor to call onUpdateSelection(int, int, int, int, int, int) on your service after the batch input is over.
Next time, please read the method documentation carefully before using it.
To determine whether a character is part of a surrogate pair, use the Character.isSurrogate(char) method.
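One way to avoid splitting a pair is to look at the text just before the cursor and decide how many chars to delete. The sketch below is plain Java so it can run outside Android; the class and method names are hypothetical, and in a keyboard service the input would presumably come from getCurrentInputConnection().getTextBeforeCursor(2, 0). It only handles a single surrogate pair; emojis built from several code points (joiners, modifiers) would need more work.

```java
final class SurrogateSafeDelete {
    // Returns how many chars to remove so that a surrogate pair is never split.
    // "before" is the text immediately before the cursor (hypothetical input).
    static int charsToDelete(CharSequence before) {
        int n = before == null ? 0 : before.length();
        if (n == 0) {
            return 0; // nothing to delete
        }
        if (n >= 2
                && Character.isLowSurrogate(before.charAt(n - 1))
                && Character.isHighSurrogate(before.charAt(n - 2))) {
            return 2; // the last two chars form one code point
        }
        return 1;
    }

    public static void main(String[] args) {
        String grinningFace = new String(Character.toChars(0x1F600));
        System.out.println(charsToDelete("ab"));               // 1
        System.out.println(charsToDelete("a" + grinningFace)); // 2
    }
}
```

In the KEYCODE_DELETE branch, the result would replace the hard-coded 1 passed to deleteSurroundingText.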

In the case where there is some selection, and all of it should be deleted, commitText("", 1) could be used - this will replace the selected text with an empty string.

Related

Does a Unicode small subscript alpha exist (similar to „ᵦ“ (U+1D66))? Or is there another way to get this char without HTML?

I have to replace letters with subscript letters in a Java application. I cannot use HTML labels, but I can rename the subjects using Unicode characters. Unfortunately I am not able to find a subscript small alpha (the only character I am still missing). Does this Unicode character even exist? Or is there another way to get a small subscript alpha? Thanks in advance, and sorry for my English.
I do not think so. In any case, the Unicode character U+1D66 is a member of the Phonetic Extensions block. That means you should not use it if you only want to write a normal Greek β (U+03B2) in a subscript text block.
Having same or close display does not mean that characters are the same. For example the German Eszett ß (U+00DF) is a totally different character...
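To check which block a code point belongs to, the JDK can be asked directly. This is a minimal sketch with a hypothetical class name; the exact character names printed depend on the Unicode data shipped with the running JDK.

```java
final class SubscriptBlockCheck {
    // Formats a code point together with its Unicode name and block.
    static String describe(int codePoint) {
        return String.format("U+%04X %s [%s]",
                codePoint,
                Character.getName(codePoint),
                Character.UnicodeBlock.of(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x1D66)); // the subscript beta from the question
        System.out.println(describe(0x03B2)); // the ordinary Greek beta
    }
}
```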

Encoding a string in 128c barcode symbology

I am having some trouble with encoding this string into barcode symbology - Code 128.
Text to encode:
1021448642241082212700794828592311
I am using the universal encoder from idautomation.com:
https://www.bcgen.com/fontencoder/
I get the following output for the encoded text for Code 128:
Í*5LvJ8*r5;ÂoP<[7+.Î
However, in ";Âo" the character between the semi-colon and o (let us call it special A) - is not part of the extended character set used in Code128. (See the Latin Supplements at https://www.fonts2u.com/code-128.font)
Yet the same string shows a valid barcode at
https://www.bcgen.com/linear-barcode-creator.html
How?
If I use the output with the Special A on a webpage with a font face for barcodes, the special A character does not show up as the barcode (and that seems correct since the special A is not part of the character set).
What gives? Please help.
I am using the IDAutomation utility to encode the string to 128c symbology. If you can share code to do the encoding (in Java/Python/C/Perl) that would help too.
There are multiple fonts for Code128 that may use different characters to represent the barcode symbols. Make sure the font and the encoding logic match each other.
I used this one http://www.jtbarton.com/Barcodes/Code128.aspx (there is also sample code how to encode it on the site, but you have to translate it from VB). The font works for all three encodings (A, B and C).
Sorry, this is very late.
When you are dealing with the encoding of code 128, in any subset, it's a good idea to think of that coding in terms of numbers, not characters. At this level, when you have shifts, code-changes, checksums and stuff, intermixed with the data, the whole concept of "character" is lost.
However, this is what is happening:
The semicolon in the output corresponds to "27"
The lowercase o corresponds to "79" and the P to "48"
The "A with circumflex" (Â) corresponds to your "00" sequence. This is why you should be dealing with numbers, not characters, at this level of encoding.
How would you expect it to show a character with a code of 00? That would be a space or NUL, neither of which is particularly visible.
Your software has simply rendered it the best way it can, which is to make the value 'visible' by substituting a printable character for it, here Â (code 0xC2).
The rest (indeed all) of your encoded string looks correct for a set C encoding.
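Thinking in numbers, as suggested above, can be sketched in Java. The mapping from symbol values to font characters is an assumption inferred from the output quoted in the question (value 0 as Â, values 1 to 94 as value + 32, values 95 to 106 as value + 100); other Code 128 fonts use different mappings. Class and method names are hypothetical.

```java
final class Code128CSketch {
    private static final int START_C = 105;
    private static final int STOP = 106;

    // Assumed font mapping, inferred from the encoder output in the question.
    static char toFontChar(int value) {
        if (value == 0) {
            return (char) 194;           // shown as A with circumflex
        }
        if (value <= 94) {
            return (char) (value + 32);  // printable ASCII range
        }
        return (char) (value + 100);     // start, stop and other high values
    }

    // Encodes an even-length digit string with code set C.
    static String encode(String digits) {
        if (digits.length() % 2 != 0 || !digits.matches("[0-9]*")) {
            throw new IllegalArgumentException("need an even number of digits");
        }
        StringBuilder out = new StringBuilder();
        out.append(toFontChar(START_C));
        int sum = START_C; // the start symbol counts with weight 1
        for (int i = 0, position = 1; i < digits.length(); i += 2, position++) {
            int value = Integer.parseInt(digits.substring(i, i + 2));
            sum += position * value; // each digit pair is weighted by its position
            out.append(toFontChar(value));
        }
        out.append(toFontChar(sum % 103)); // modulo 103 check symbol
        out.append(toFontChar(STOP));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("1021448642241082212700794828592311"));
    }
}
```

For the 34-digit string in the question, encode returns the same text the online encoder produced; the check value works out to 14, which this mapping shows as '.'.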

Adding Annotation with Hebrew letters in Itext

When I add an annotation using:
PdfAnnotation.createFileAttachment(writer,null,null , null, , "שם קובץ", "שם קובץ");
the Hebrew letters in the annotation are not shown.
Is there a way to fix it?
You're using Hebrew characters in your code. That's not safe. Please replace them with a unicode notation (you'll need to know their unicode value; for instance \u00a0 is the value for a non-breaking space). If you don't do this, compilers could interpret the characters incorrectly (see the remarks that were given).
It appears to me that you don't have the correct number of parameters in the method. I assume that you're using this method.
You're using a 'short-cut' method that assumes that the characters aren't Unicode characters. Please don't. Use the method where you create a PdfFileSpecification object, and use methods such as setUnicodeFileName() with the unicode parameter set to true. This way, iText knows that the characters should be interpreted as Unicode characters.
You probably want the characters to appear from right to left. I don't know if this is supported in PDF. I browsed ISO-32000-1 and looked at Table 44 (Entries in a file specification dictionary), but all I saw was: Unicode text string that provides file specification of the form described in 7.11.2, "File Specification Strings." This is a text string encoded using PDFDocEncoding or UTF-16BE with a leading byte-order marker (as defined in 7.9.2.2, "Text String Type"). You'll have to dig into those sections if you want to know more.
You pass null as the value for the Rectangle. That doesn't make sense. Are you sure you want to add a file attachment annotation? Based on your code I would assume that you want to add a document-level attachment instead. That's done like this: writer.addFileAttachment(fs); with fs an instance of the PdfFileSpecification class.
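For the first point, about replacing literal Hebrew characters with Unicode notation, a small helper can produce the escapes. This is only a sketch with hypothetical names; it converts every non-ASCII char and does not treat backslashes already present in the input specially.

```java
final class UnicodeEscaper {
    // Converts every non-ASCII char to a Java escape: backslash, u, four hex digits.
    static String toJavaEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c); // plain ASCII is kept as-is
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }
}
```

For the file name in the question this gives "\u05E9\u05DD \u05E7\u05D5\u05D1\u05E5", which can be pasted into the source in place of the Hebrew literal.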

Java regex to distinguish special characters while allowing non english chars

I am trying to do above. One option is get a set of chars which are special characters and then with some java logic we can accomplish this. But then I have to make sure I include all special chars.
Is there any better way of doing this ?
You need to decide what constitutes a special character. One method that may be of interest is Character.getType(char) which returns an int which will match one of the constant values of Character such as Character.LOWERCASE_LETTER or Character.CURRENCY_SYMBOL. This lets you determine the general category of a character, and then you need to decide which categories count as 'special' characters and which you will accept as part of text.
Note that Java uses UTF-16 to encode its char and String values, and consequently you may need to deal with supplementary characters (see the link in the description of the getType method). This is a nuisance, but the Character class does offer methods which help you detect this situation and work around it. See the Character.isSupplementaryCodePoint(int) and Character.codePointAt(char[], int) methods.
Also be aware that Java 6 is far less knowledgeable about Unicode than is Java 7. The newest version of Java has added far more to its Unicode database, but code running on Java 6 will not recognise some (actually quite a few) exotic codepoints as being part of a Unicode block or general category, so you need to bear this in mind when writing your code.
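The category-based approach might look like the sketch below. Which categories count as "special" is a policy decision; the set chosen here is only an example, and the class and method names are hypothetical. The loop walks code points instead of chars so supplementary characters are classified as a whole.

```java
final class SpecialCharCheck {
    // Example policy: symbols and control characters are "special".
    static boolean isSpecial(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CURRENCY_SYMBOL:
            case Character.MATH_SYMBOL:
            case Character.MODIFIER_SYMBOL:
            case Character.OTHER_SYMBOL:
            case Character.CONTROL:
                return true;
            default:
                return false; // letters of any script, digits, spaces, punctuation
        }
    }

    static boolean containsSpecial(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isSpecial(cp)) {
                return true;
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return false;
    }
}
```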
It sounds like you would like to remove all control characters from a Unicode string. You can accomplish this by using a Unicode character category identifier in a regex. The category "Cc" contains those characters, see http://www.fileformat.info/info/unicode/category/Cc/list.htm.
myString = myString.replaceAll("[\\p{Cc}]+", "");
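A runnable version of this idea, as a small sketch with a hypothetical class name. Note that the backslash has to be doubled inside a Java string literal. Non-English letters are not in category Cc, so they pass through unchanged.

```java
final class ControlCharStripper {
    // Removes every character in the Unicode general category Cc (control).
    static String stripControls(String s) {
        return s.replaceAll("\\p{Cc}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripControls("a\tb\r\nc")); // abc
    }
}
```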

How to display X-Bar statistics symbol in Java Swing label?

What's the best way to insert statistics symbols in a JLabel's text? For example, the x-bar? I tried assigning the text field the following with no success:
<html>x̄
HTML codes will not work in Java. However, you can use a Unicode escape in Java strings.
For example:
JLabel label = new JLabel("x\u0304");
Also, here is a cool website for taking Unicode text and turning it into Java String literals.
Well, that's completely malformed HTML, probably even for Swing (I think you would need the </html> at the end for it to work). But I would try never to go down that road if you can help it, as Swing's HTML support has many drawbacks and bugs.
You can probably simply insert the appropriate character directly, either directly in the source code if you're using Unicode or with the appropriate Unicode escape:
"x\u0304"
This should work, actually. But it depends on font support and some fonts are pretty bad in positioning combining characters. But short of drawing it yourself it should be your best option.
You can obtain x̄ by appending \u0304 to the x character.
In your case, you must generate the following string:
"x\u0304"
The character \u0304 alone is only an overscore (macron) character. It is a combining diacritic in the Unicode table. You can use it in combination with another normal character to obtain a composite character.
You can find more information on diacritics at https://en.wikipedia.org/wiki/Combining_character
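How a base letter and a combining mark behave in a Java String can be checked with java.text.Normalizer. This is a minimal sketch with a hypothetical class name. Some combinations have a precomposed form that NFC normalization collapses to a single char, while x with a macron has none and stays as two chars, which is why its rendering depends on the font.

```java
final class CombiningMarks {
    // Normalizes to NFC, which composes base + mark where a precomposed form exists.
    static String nfc(String s) {
        return java.text.Normalizer.normalize(s, java.text.Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String xBar = "x\u0304";  // x followed by the combining macron
        String aRing = "A\u030A"; // A followed by the combining ring above
        System.out.println(xBar.length());       // 2: base letter plus combining mark
        System.out.println(nfc(xBar).length());  // 2: no precomposed form exists
        System.out.println(nfc(aRing).length()); // 1: composed into a single char
    }
}
```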
You can also use
\u0305 to have a longer bar
\u0307 to have a dot above the previous character (speed in mechanics)
\u0308 to have 2 dots above the previous character (acceleration in mechanics)
\u030A to have a circle above the previous character
Example
"[x\u0304]" -> x̄
"[x\u0305]" -> x̅
"[t\u0307]" -> ṫ
"[u\u0308]" -> ü
"[A\u030A]" -> Å
In this example, only the x with the long bar is not displayed correctly in HTML, because the Chrome browser has a bug (I think).
If you copy and paste these characters into MS Word, you will see all of them displayed correctly.
To type a new overlined character in MS Word, type the character and immediately afterwards press Alt and 0773, where 773 is the base 10 representation of the combining overline character (U+0305).
