From what I understand, it's up to fonts to turn Arabic (and Persian) characters from their canonical forms (in the U+0600 block) into their contextual glyph forms (the presentation forms around U+FB50) where appropriate.
(Arabic letters join to the previous and/or next character, so each usually has four glyph forms: isolated, initial, medial, and final.)
I am now trying to draw charts using Vaadin, with labels that may contain such letters, and some specific letters (like 'ک' or 'ی') stay in their isolated form no matter where they appear in the word; what I expect is "الکتاب" but what I get is "الکتاب".
The solution I thought of was to manually map every letter to its appropriate glyph form using a HashMap from base forms to arrays of glyph forms, but I believe there should be a way to do this with Java's libraries. I have seen this answer, which does a similar thing using a GlyphVector on a font, but it's somewhat complicated for my case.
Thanks in advance.
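As a sketch of the built-in route: Java's 2D text pipeline performs the contextual shaping itself when laying out glyphs, so there is no need to map code points to presentation forms by hand. Note that `layoutGlyphVector` returns font-specific glyph indices, not the U+FBxx code points, and whether shaping actually happens depends on the font's tables; this is a minimal sketch, assuming a default sans-serif font with Arabic support.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;

public class ArabicShaping {
    public static void main(String[] args) {
        // Logical font; a real application would load the same font the chart uses.
        Font font = new Font(Font.SANS_SERIF, Font.PLAIN, 24);
        FontRenderContext frc = new FontRenderContext(null, true, true);

        char[] text = "\u0627\u0644\u06A9\u062A\u0627\u0628".toCharArray(); // الکتاب
        // layoutGlyphVector runs the font's shaping (joining forms, ligatures);
        // the result is glyph indices into the font, not presentation-form code points.
        GlyphVector gv = font.layoutGlyphVector(
                frc, text, 0, text.length, Font.LAYOUT_RIGHT_TO_LEFT);

        System.out.println("glyphs: " + gv.getNumGlyphs());
    }
}
```

Drawing the resulting GlyphVector with `Graphics2D.drawGlyphVector` then produces the joined forms, which may be a workable alternative when the rendering layer (here, the chart labels) cannot be told to shape the text itself.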
I'm having a problem with iText when I'm trying to create a PDF that contains characters like the ones in the title.
What happens is that the accent circonflexe does not sit properly above the letter but rather right next to it or (depending on what font I use) somewhat "merged" into it (see screenshot below where I used FreeSans).
I know that all the characters that have this problem are "composite" characters. What I mean by this is that they are composed of two unicode characters. For example the "D̂" is represented as "\u0044\u0302" whereas all the usual characters are of course represented as "\uXXXX".
So I'm pretty sure that it has to do with this.
For example an "Ê" which has a normal unicode representation is displayed just fine.
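One can check which of these sequences have precomposed equivalents with the standard `java.text.Normalizer`: NFC normalization only composes a base letter and combining mark where a single precomposed code point exists, which is exactly why Ê renders fine while D̂ forces the font to position the mark itself.

```java
import java.text.Normalizer;

public class ComposeCheck {
    public static void main(String[] args) {
        // E + combining circumflex has a precomposed form, so NFC folds it to U+00CA.
        String e = Normalizer.normalize("\u0045\u0302", Normalizer.Form.NFC);
        System.out.println(e.length());   // 1: the precomposed Ê exists

        // D + combining circumflex has no precomposed code point; NFC keeps both,
        // so correct display depends on the font's mark-positioning tables.
        String d = Normalizer.normalize("\u0044\u0302", Normalizer.Form.NFC);
        System.out.println(d.length());   // 2: still base letter + combining mark
    }
}
```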
Here is a tiny code snippet that hopefully contains everything you need to know:
String TEXT = "\u0044\u0302 \u004A\u030C \u004C\u0302 \u004D\u0302 \u004E\u0302 \u0064\u0302 \u006C\u0302 \u006D\u0302 \u006E\u0302";//D̂ J̌ L̂ M̂ N̂ d̂ l̂ m̂ n̂
BaseFont bf = BaseFont.createFont("FreeSans.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(bf, 12); // wrap the embedded BaseFont for use in the Paragraph
document.add(new Paragraph(TEXT, font));
Any help would be highly appreciated.
Thanks in advance!
You'll need to use iText 7 with the pdfCalligraph module. This kind of composition requires access to the OpenType (OTF) tables to correctly align the characters depending on size, height, etc.
For more info on pdfCalligraph, see chapter 2 of the "iText 7: Building Blocks" tutorial (scroll towards the end of the chapter) to find out how it works. You can get a free trial version of pdfCalligraph here.
Modern text editors like Notepad++ can visualize control characters like CR, LF, STX, ETX, EOT. I have started to wonder how text editors visualize these characters so neatly.
Note: I am familiar with how encodings and character sets work. And I'm also familiar with the reason why these characters exist.
Some ideas:
Does it apply a special font for these specific characters,
i.e. a font that contains a representation of all of them?
Or does it use an advanced text-field control/GUI component that renders (i.e. draws) them on the canvas?
Or does it just replace the characters? (e.g. replacing a 0x0D with the Unicode character 0x240D, i.e. ␍)
This seems to be the easiest, but then how does it ensure that copying the text still yields the original characters?
The reason for my question: I would like to create a java application that does the same thing.
The third idea is the right one: you can replace the characters with the Unicode control pictures, provided a suitable font is used.
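A minimal sketch of that replacement: the C0 control codes map one-to-one onto the U+2400 "Control Pictures" block, and the mapping is applied for display only, so the document model (and therefore copy/paste) keeps the original characters.

```java
public class ControlPictures {
    // Map C0 control codes to the U+2400 "Control Pictures" block for display only;
    // the editor's document model keeps the original characters untouched.
    static String visualize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < 0x20) {
                sb.append((char) (0x2400 + c));   // e.g. CR 0x0D -> U+240D "␍"
            } else if (c == 0x7F) {
                sb.append('\u2421');              // DEL -> U+2421 "␡"
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(visualize("line1\r\nline2"));  // line1␍␊line2
    }
}
```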
There are some inherent problems with assigning glyphs ('images') to control codes; most have to do with the fact that they already have a particular function. For example, if you send a Tab code to your display, you'd typically expect the cursor to move by a certain number of positions, not to see a character ○ pop up.
Also, typically, fonts use Unicode as their native encoding. Unicode does not allow a glyph to be assigned to the control codes:
Sixty-five code points (U+0000–U+001F and U+007F–U+009F) are reserved as control codes (https://en.wikipedia.org/wiki/Unicode)
There is an 'alias' sort of set defined: U+2400 to U+241F for 0x00 to 0x1f, U+2420 "␠" for "symbol for space", and U+2421 "␡" for "symbol for Delete" (your #3) but then you need to make sure the user has a font that contains these glyphs.
The most configurable way is to 'manually' draw whatever you like. This means you can use any font you want (no special font needed), and no character replacement is necessary (only the drawing code needs to filter out the 'specials'). A drawback, though, is that you are then also in charge of drawing the regular text.
If that is overkill or you don't have sufficient control over the text draw area, you can simply use different foreground and background colors for the control characters only. This is a screenshot of a quick-and-dirty hex viewer I wrote a while ago – I only change the colors here, but I could have written out custom text for all as well.
For a good overview of what it takes, see James Brown's Design & Implementation of a Win32 Text Editor; it focuses on using Win32 API calls but there is a lot of background as well. Drawing neat Control Codes is addressed in the section Enhanced Drawing & Painting.
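The filtering step in the manual-drawing approach above can be sketched as splitting each line into runs of "plain" and "control" characters, so the paint code switches font or colors once per run rather than per character. This is an illustrative sketch; the `Run` type and `split` helper are my own names, not from any library.

```java
import java.util.ArrayList;
import java.util.List;

public class RunSplitter {
    // A run of consecutive characters that are either all plain or all controls,
    // so the paint code can change color/font once per run.
    record Run(String text, boolean control) { }

    static boolean isControl(char c) {
        return c < 0x20 || c == 0x7F;
    }

    static List<Run> split(String line) {
        List<Run> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= line.length(); i++) {
            // Close the current run at end of line or when the character class flips.
            if (i == line.length()
                    || isControl(line.charAt(i)) != isControl(line.charAt(start))) {
                runs.add(new Run(line.substring(start, i), isControl(line.charAt(start))));
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // "ab" (plain), "\t" (control), "cd" (plain) -> three runs
        System.out.println(split("ab\tcd").size());
    }
}
```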
I don't understand why ligatures can be turned on and off. What should happen if there is a string containing "fi"? I would think a string either contains a ligature or it doesn't, and likewise for the font used. So what does it mean that ligatures can be turned off?
A ligature in font land is a technical term meaning "a replacement, when rendering, of two or more code points in the data with an alternate shape", and is one of the ways in which fonts can perform automatic substitutions (other examples are full-word substitutions, or positional substitutions, which are important in, for instance, Arabic, where a letter is drawn differently depending on where in a word it is written).
Having string data that contains the single Unicode "character" fi (U+FB01) and then seeing that same thing rendered by the font you're using is not seeing a ligature: the data and the rendered form are the same, so what you're seeing is functionally identical to having an "a" in your data and seeing that same "a" rendered by the font.
However, if your data contains the multiple letters fi (two letters) or ffl (three letters) and the font turns that into the single glyphs fi or ffl respectively, then those are ligatures: what's in the data and what gets rendered are different. So it is that behaviour that you can turn on or off:
"Should the font be allowed to perform replacements in my data based on what the type designers for that font thought looks better, or should it render my data exactly, without ligature substitutions?"
Given the resource
<string name="squareRoot">√x̅</string>
And the java code
System.out.println("unicode: " + getString(R.string.squareRoot));
The output is
Shouldn't the overline (U+0305) be on top of the 'x'?
When I try to use the same string resource as the text for a TextView, the overline does not show at all. It does occupy space: I know this because I tried swapping the 'x' and the overline and got a blank space before the 'x'.
Yes, U+0305 COMBINING OVERLINE should cause an overline to be placed above the preceding character. However, there are several reasons why this may fail to varying degrees. First, placement of combining characters requires a good rendering engine (e.g., the overline must be placed higher if the preceding character is a capital X). Second, the font being used may lack U+0305; this may or may not cause a fallback font to be used, possibly one that is not stylistically similar to the base font. Third, U+0305 was not really designed for use as a vinculum in conjunction with a square root sign, so it may look misplaced, depending on the font.
In plain text, it is usually best to avoid trying to produce "smart" square root expressions with a vinculum. Using just √x or (if x is an expression with operators) √(x) is much safer and in full conformance with the mathematics standard ISO 80000-2.
I don't think the Android font has the glyph for U+0305.
To confirm (or rule out) this theory, you can try embedding a font you know is OK with your application and using that one.
But even if the font has the glyph, the text layout engine might not be smart enough to do the right thing.
I need to draw a text line where certain specified characters are replaced by arbitrary polygons. These polygons must be painted directly with Graphics, using the drawPolygon method. While Unicode contains a range of graphical symbols, they are not appropriate for this task.
I was wondering if it was possible to replace a character with a polygon, in any instance of that character's occurrence in a string?
For example, if I typed-in the word 'Holly' and hit 'enter', the first letter would be replaced by the polygon.
If I then went to type the word 'thistle', the polygon's new position would be in place of the second letter?
Any help/guidance will be greatly appreciated.
Assuming the shape you want is available as a Unicode character, all you have to do is string replacement.
System.out.println("Hello".replace('H', '\u25C6'));
produces
◆ello
If you want to display a polygon, maybe the easiest way is to choose a Unicode symbol;
there are lots of them with graphic content (even a snowman, ☃).
This is fully doable with FontMetrics, which lets you measure the dimensions of a string as printed in a given font. Use FontMetrics to compute the locations of the string fragments, then draw the fragments with the polygons in between.
This approach seems reasonable if the polygons must be somehow very special (maybe non-Unicode shapes?), or if their required dimensions are very different from the letter dimensions in the font used.
In the early days of Java, when Unicode support was not yet very good, it was not uncommon to draw unsupported national characters that way.
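A minimal sketch of the FontMetrics approach for the 'Holly' example: measure the replaced letter's advance with `charWidth`, fill a polygon in that slot, then draw the rest of the string starting after it. The diamond coordinates here are arbitrary illustration values.

```java
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.Polygon;
import java.awt.image.BufferedImage;

public class PolygonInText {
    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(200, 50, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setFont(new Font(Font.SANS_SERIF, Font.PLAIN, 18));
        FontMetrics fm = g.getFontMetrics();

        // Draw "Holly" with the first letter replaced by a small diamond polygon.
        int x = 2, baseline = 30;
        int w = fm.charWidth('H');           // reserve the replaced letter's advance
        int h = fm.getAscent();
        Polygon diamond = new Polygon(
                new int[] { x + w / 2, x + w, x + w / 2, x },
                new int[] { baseline - h, baseline - h / 2, baseline, baseline - h / 2 },
                4);
        g.fillPolygon(diamond);
        g.drawString("olly", x + w, baseline);  // continue the text after the polygon
        g.dispose();
    }
}
```

The same measurement (`fm.stringWidth` for longer fragments) gives the x positions when several characters in the middle of the line are replaced.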