Modern text editors such as Notepad++ can visualize control characters like CR, LF, STX, ETX and EOT. I have started to wonder how text editors render these characters so neatly.
Note: I am familiar with how encodings and character sets work. And I'm also familiar with the reason why these characters exist.
Some ideas:
Does it apply a special font for these specific characters?
(i.e. a font which contains a representation of all characters)
Or does it use an advanced text-field control/GUI component that renders (i.e. draws) them on the canvas?
Or does it just replace the characters? (e.g. replacing 0x0D with the Unicode character U+240D, i.e. ␍)
This seems to be the easiest, but then how does it make sure that copying the text still yields the original characters?
The reason for my question: I would like to create a Java application that does the same thing.
The third idea should work: you can replace the characters with Unicode control pictures, provided you use a font that contains them.
There are some inherent problems with assigning glyphs ('images') to control codes; most stem from the fact that they already have a particular use! For example, if you send a Tab code to your display, you'd typically expect the cursor to move by a certain number of positions, not a character like ○ to pop up.
Also, fonts typically use Unicode as their native encoding, and Unicode does not allow a glyph to be assigned to the control codes:
Sixty-five code points (U+0000–U+001F and U+007F–U+009F) are reserved as control codes (https://en.wikipedia.org/wiki/Unicode)
There is a sort of 'alias' set defined, the Control Pictures block: U+2400 to U+241F for 0x00 to 0x1F, U+2420 "␠" for "symbol for space", and U+2421 "␡" for "symbol for delete" (your idea #3). But then you need to make sure the user has a font that contains these glyphs.
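A minimal Java sketch of idea #3, assuming the simplest mapping described above (each C0 control code c becomes U+2400 + c, and DEL becomes U+2421). The answer to the copy question is that the editor keeps the original string as its model and only builds the replaced copy for display; copying then takes the original. The class and method names are illustrative, and a real editor would still break the line after showing ␊, which is left out here.

```java
import java.awt.Font;

public final class ControlPictures {
    /**
     * Maps C0 controls (U+0000..U+001F) to U+2400..U+241F and DEL (U+007F)
     * to U+2421. Every other character is returned unchanged.
     */
    static char toPicture(char c) {
        if (c < 0x20) return (char) (0x2400 + c);
        if (c == 0x7F) return '\u2421';
        return c;
    }

    /** Builds a display-only copy; the original string stays the document model. */
    static String toDisplay(String original) {
        StringBuilder sb = new StringBuilder(original.length());
        for (int i = 0; i < original.length(); i++) {
            sb.append(toPicture(original.charAt(i)));
        }
        return sb.toString();
    }

    /** True if the font can display the whole display string, pictures included. */
    static boolean fontCovers(Font font, String original) {
        return font.canDisplayUpTo(toDisplay(original)) == -1;
    }
}
```

In a text component you would render `toDisplay(text)` but put `text` itself on the clipboard, and check `fontCovers(...)` before relying on the pictures.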
The most configurable way is to 'manually' draw whatever you like. This means you can use any font you want (without the need for a special font), and character replacement is not necessary (only the drawing code needs to filter out 'specials'). A drawback, though, is that you are also in charge of drawing regular text.
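A minimal Java2D sketch of the "draw it yourself" approach, under stated assumptions: it draws a single line (no line breaks) with the Graphics2D's current font, batches ordinary characters into runs drawn with `drawString`, and draws each control code as its ASCII abbreviation in a filled rounded box, roughly like the CR/LF badges Notepad++ shows. The class and helper names are hypothetical.

```java
import java.awt.Color;
import java.awt.FontMetrics;
import java.awt.Graphics2D;

public final class ControlCodePainter {
    // ASCII names of the C0 control codes, indexed by code (0x00..0x1F).
    private static final String[] NAMES = {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
        "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"
    };

    /** Label to draw for a control character, or null for ordinary text. */
    static String labelFor(char c) {
        if (c < 0x20) return NAMES[c];
        if (c == 0x7F) return "DEL";
        return null;
    }

    /** Draws one line starting at (x, baseline); returns the x after the last item. */
    static int drawLine(Graphics2D g, String line, int x, int baseline) {
        FontMetrics fm = g.getFontMetrics();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            String label = labelFor(c);
            if (label == null) {
                run.append(c); // ordinary character: collect it into the current run
                continue;
            }
            x = flush(g, fm, run, x, baseline);
            x = drawBadge(g, fm, label, x, baseline);
        }
        return flush(g, fm, run, x, baseline);
    }

    // Draws the pending run of ordinary text, clears it and returns the new x.
    private static int flush(Graphics2D g, FontMetrics fm, StringBuilder run, int x, int baseline) {
        if (run.length() == 0) return x;
        String s = run.toString();
        g.drawString(s, x, baseline);
        run.setLength(0);
        return x + fm.stringWidth(s);
    }

    // Draws the label in white on a dark rounded box and returns the new x.
    private static int drawBadge(Graphics2D g, FontMetrics fm, String label, int x, int baseline) {
        int w = fm.stringWidth(label) + 4;
        int top = baseline - fm.getAscent();
        Color old = g.getColor();
        g.setColor(Color.DARK_GRAY);
        g.fillRoundRect(x, top, w, fm.getHeight(), 6, 6);
        g.setColor(Color.WHITE);
        g.drawString(label, x + 2, baseline);
        g.setColor(old);
        return x + w;
    }
}
```

Because the model string is never modified, copy and paste work on the original characters; the cost, as noted above, is that this code also has to draw (and later measure, select and caret-position) the regular text.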
If that is overkill, or you don't have sufficient control over the text drawing area, you can simply use different foreground and background colors for the control characters only. A quick-and-dirty hex viewer I wrote a while ago does just that: it only changes the colors, although I could have drawn custom text for every code as well.
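A hedged Swing sketch of the colors-only variant: keep the original text in a `StyledDocument` (for example behind a `JTextPane`) and give every character for which `Character.isISOControl` is true its own foreground and background. Only attributes change, so copying still yields the original characters. How the control character itself is drawn (a missing-glyph box, nothing at all, or a line break for LF) depends on the font and the view, so the effect may be subtle for some codes; the class name and the white-on-red choice are arbitrary.

```java
import java.awt.Color;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

public final class ControlCodeHighlighter {
    /** Gives every ISO control character its own colors; the characters are untouched. */
    static void highlight(StyledDocument doc, Color fg, Color bg) throws BadLocationException {
        SimpleAttributeSet style = new SimpleAttributeSet();
        StyleConstants.setForeground(style, fg);
        StyleConstants.setBackground(style, bg);
        String text = doc.getText(0, doc.getLength());
        for (int i = 0; i < text.length(); i++) {
            if (Character.isISOControl(text.charAt(i))) {
                // false = merge with the existing attributes instead of replacing them
                doc.setCharacterAttributes(i, 1, style, false);
            }
        }
    }

    /** Builds a small document containing STX and ETX and highlights them. */
    static StyledDocument demo() {
        StyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "A" + (char) 0x02 + "B" + (char) 0x03, null);
            highlight(doc, Color.WHITE, Color.RED);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }
}
```

To show it, pass the document to a text pane, e.g. `new JTextPane(ControlCodeHighlighter.demo())`.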
For a good overview of what it takes, see James Brown's Design & Implementation of a Win32 Text Editor; it focuses on using Win32 API calls but there is a lot of background as well. Drawing neat Control Codes is addressed in the section Enhanced Drawing & Painting.
Related
From what I understand, it's up to fonts to turn Arabic (and Persian) characters from their canonical forms (Unicode around U+0600) into their glyph forms (Unicode around U+FB00) where appropriate.
(Arabic letters can be connected to their previous and/or next character, so they usually have 4 different glyph forms.)
I am now trying to draw charts using Vaadin with labels which may contain such letters, and some specific letters (like 'ک' or 'ی') stay in their base form no matter where they appear in the word. For example, in "الکتاب" the ک should be joined to the letters around it, but it is drawn in its standalone form.
The solution I thought of was to manually change every letter into its appropriate glyph form using a HashMap from base forms to arrays of glyph forms, but I believe there should be a way to do this with Java's libraries. I have seen this answer, which does something similar using a GlyphVector on a font, but it's rather complicated for my case.
Thanks in advance.
I don't understand why ligatures can be turned on and off. What should happen if a string contains fi? I would think a string either contains a ligature or it doesn't, and the same goes for the font being used. So what does it mean that ligatures can be turned off?
A ligature in font land is a technical term meaning "a replacement, when rendering, of two or more code points in the data with an alternate shape", and is one of the ways in which fonts can perform automatic substitutions (other examples are full-word substitutions, or positional substitutions, which are important in, for instance, Arabic, where a letter is drawn differently depending on where in a word it is written).
Having string data that contains the single Unicode "character" ﬁ (U+FB01) and then seeing that same thing rendered by the font you're using is not seeing a ligature: the data and the rendered form are the same, so what you're seeing is functionally identical to having an "a" in your data and seeing that same "a" rendered by the font.
However, if your data contains the separate letters fi (two letters) or ffl (three letters) and the font turns them into the single glyphs ﬁ or ﬄ respectively, then those are ligatures: what's in the data and what gets rendered are different. It is that behaviour that you can turn on or off:
"Should the font be allowed to perform replacements in my data based on what the type designers for that font thought looks better, or should it render my data exactly, without ligature substitutions?"
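In Java2D this on/off switch is exposed as the `TextAttribute.LIGATURES` font attribute. The sketch below contrasts the two kinds of data from the answer (two letters versus the single character U+FB01) and derives a font that requests optional ligatures; whether anything visibly changes depends on the font and on the JDK's layout engine. The class and method names are illustrative.

```java
import java.awt.Font;
import java.awt.font.TextAttribute;
import java.text.Normalizer;
import java.util.Collections;

public final class LigatureDemo {
    /**
     * Returns a font that asks the layout engine to apply (on) or skip (off)
     * the font's optional ligatures. The string being drawn is never modified.
     */
    static Font withLigatures(Font base, boolean on) {
        Integer value = on ? TextAttribute.LIGATURES_ON : Integer.valueOf(0);
        return base.deriveFont(Collections.singletonMap(TextAttribute.LIGATURES, value));
    }

    /** Compatibility-decomposes text, e.g. the single character U+FB01 into "fi". */
    static String compatibilityForm(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String twoLetters = "fi";       // data: U+0066 U+0069; a font MAY draw one glyph
        String oneCharacter = "\uFB01"; // data: U+FB01; the "ligature" is already in the data
        System.out.println(twoLetters.length() + " vs " + oneCharacter.length());
        System.out.println(compatibilityForm(oneCharacter).equals(twoLetters));
        Font font = withLigatures(new Font(Font.SERIF, Font.PLAIN, 24), true);
        System.out.println(font.getName());
    }
}
```

Drawing `twoLetters` with the derived font may show one ﬁ glyph, yet the string still holds two characters; that rendering-only substitution is exactly what the switch controls.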
I am looking to use an alphabet that does not exist in the Unicode (UTF) setup currently applied to all of my view elements and Swing components. To display these new characters, would I have to keep each one as its own image and present the images next to one another in a character-like pattern, or is there a way to import letters from pictures so that they can be added to something like a text area when a button is pressed?
Basically, if I have a pictograph system, can I import these images as characters, or would I have to keep them as pictures?
To be more specific, picture Klingon writing or the Dragon language: something that is certainly not defined in the standard character sets.
Thank you!
The best way I can think of is to create a font file (.ttf, .otf, etc.) representing your special alphabet and then follow the instructions in this answer here.
The downside is that there is no easy way to create font files: it usually involves many hours of manually tracing symbols in a vector graphics editor and compiling them into a font file.
If your characters are already vector images, then most of the work will have already been done.
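A hedged Swing sketch of the font route. It assumes a hypothetical TrueType file (e.g. `alphabet.ttf`) in which letter n of the alphabet was mapped to the Private Use Area code point U+E000 + n; the Private Use Area (U+E000 to U+F8FF) is reserved for exactly this kind of private assignment. Loading uses `Font.createFont` and `GraphicsEnvironment.registerFont`; the class, the mapping and the helper names are made up for illustration.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;
import javax.swing.JButton;
import javax.swing.JTextArea;

public final class CustomAlphabet {
    // Assumption: the (hypothetical) font maps letter n to U+E000 + n.
    static final int FIRST_LETTER = 0xE000;

    /** Returns the text for the n-th letter of the custom alphabet. */
    static String letter(int n) {
        return new String(Character.toChars(FIRST_LETTER + n));
    }

    /** Loads a TrueType font from disk at the given size and registers it by name. */
    static Font loadFont(File ttf, float size) throws IOException, FontFormatException {
        Font font = Font.createFont(Font.TRUETYPE_FONT, ttf).deriveFont(size);
        GraphicsEnvironment.getLocalGraphicsEnvironment().registerFont(font);
        return font;
    }

    /** Creates a button that appends the n-th letter to the text area. */
    static JButton letterButton(JTextArea area, int n) {
        JButton button = new JButton(letter(n));
        button.setFont(area.getFont()); // the label needs the custom font too
        button.addActionListener(e -> area.append(letter(n)));
        return button;
    }
}
```

Usage would look like `area.setFont(CustomAlphabet.loadFont(new File("alphabet.ttf"), 24f))` followed by one `letterButton(area, n)` per letter. Because the letters are ordinary (private-use) characters, copy, paste and search keep working inside your application; other applications will only show them with the same font installed.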
Given the resource
<string name="squareRoot">√x̅</string>
And the java code
System.out.println("unicode: " + getString(R.string.squareRoot));
The output is
Shouldn't the overline (U+0305) be on top of the 'x'?
When I try to use the same string resource as the text of a TextView, the overline does not show at all. It does occupy space: I know this because when I swapped the 'x' and the overline, I got a blank space before the 'x'.
Yes, U+0305 COMBINING OVERLINE should place an overline above the preceding character. However, there are several reasons why this may fail, completely or partially. First, placement of combining characters requires a good rendering engine (e.g., the overline must be placed higher if the preceding character is a capital X). Second, the font being used may lack U+0305. This may or may not cause a fallback font to be used, possibly a font that is not stylistically similar to the basic font. Third, U+0305 was not really designed for use as a vinculum in conjunction with a square root sign, so it may look misplaced, depending on the font.
In plain text, it is usually best to avoid trying to produce “smart” square root expressions with a vinculum. Using just √x or (if x is an expression with operators) √(x) is much safer and fully conforms to the mathematics standard ISO 80000-2.
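To separate the data side from the rendering side, a small Java sketch can list the code points of the string and flag combining marks (general category Mn, to which U+0305 belongs). If U+0305 shows up right after the x, the resource is fine and the problem lies in font coverage or the layout engine; on desktop Java `Font.canDisplay(int)` checks coverage, and on Android `Paint.hasGlyph(String)` (API level 23 and up) serves a similar purpose. The class name is hypothetical.

```java
public final class CombiningCheck {
    /** Lists the code points of s as U+XXXX, appending "(Mn)" to non-spacing marks. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            if (Character.getType(cp) == Character.NON_SPACING_MARK) {
                sb.append("(Mn)");
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same content as the squareRoot resource: square root sign, x, combining overline.
        System.out.println(describe("\u221Ax\u0305")); // prints "U+221A U+0078 U+0305(Mn)"
    }
}
```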
I don't think the default Android font has a glyph for U+0305.
To confirm (or refute) this theory, try embedding in your application a font that you know contains the glyph, and use that one instead.
But even if the font has the glyph, the text layout engine might not be smart enough to do the right thing.
I have seen http://docs.oracle.com/javase/1.5.0/docs/api/java/awt/font/GlyphVector.html but I don't know how you would use it to display a glyph on the screen. Let's say you want to print glyph number 1042 to the screen (glyph numbers are likely to differ between fonts, and glyph 1042 is unlikely, though possible, to be the same character as Unicode code point 1042). How do you go from the number to the character on screen? Is GlyphVector the way to go, or is there a better method?
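The GlyphVector class is not available on Android, and there don't seem to be any public API calls in Android graphics that allow access to font glyphs without going through the Unicode encoding. (Android 12, API level 31, later added Canvas.drawGlyphs, which takes glyph IDs together with an android.graphics.fonts.Font.)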
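For completeness, on desktop Java (not Android) the GlyphVector route from the question looks roughly like this: `Font.createGlyphVector(FontRenderContext, int[])` accepts raw glyph IDs and `Graphics2D.drawGlyphVector` draws them. Glyph IDs only make sense for one physical font file, so this sketch optionally loads a font from a path given on the command line; logical fonts such as `Font.SERIF` are composites, so their glyph codes need not match the IDs in any single file. The class name is illustrative.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.image.BufferedImage;
import java.io.File;

public final class GlyphById {
    /** Wraps raw glyph IDs (font-specific indices, not Unicode code points) in a GlyphVector. */
    static GlyphVector vectorFor(Font font, FontRenderContext frc, int... glyphIds) {
        return font.createGlyphVector(frc, glyphIds);
    }

    /** Draws the glyphs with the left end of their baseline at (x, y). */
    static void draw(Graphics2D g, Font font, float x, float y, int... glyphIds) {
        g.drawGlyphVector(vectorFor(font, g.getFontRenderContext(), glyphIds), x, y);
    }

    public static void main(String[] args) throws Exception {
        // Off-screen image for illustration; in a Swing component you would call
        // draw(...) from paintComponent(Graphics) instead.
        BufferedImage img = new BufferedImage(200, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        Font font = args.length > 0
                ? Font.createFont(Font.TRUETYPE_FONT, new File(args[0])).deriveFont(48f)
                : new Font(Font.SERIF, Font.PLAIN, 48); // logical font: IDs may not match a file
        int glyphId = 1042; // only meaningful for one particular font file
        if (glyphId < font.getNumGlyphs()) {
            draw(g, font, 20f, 70f, glyphId);
        }
        g.dispose();
    }
}
```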
The obvious option would seem to be parsing the TrueType font file directly. Perhaps there is a library that has already been ported to Android, or that could be ported (FreeType?).
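The first step of that route can be sketched in plain Java using `java.nio`, which Android also provides: read the table directory at the start of a TrueType/OpenType (sfnt) file. The layout assumed here is the standard one: a 12-byte header (sfntVersion, numTables, searchRange, entrySelector, rangeShift) followed by 16-byte table records (tag, checksum, offset, length), all big-endian. Actually drawing glyph N would then require the `cmap`, `loca` and `glyf` (or `CFF `) tables and an outline rasterizer, which is where a library such as FreeType earns its keep. The class name is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public final class SfntTables {
    /** Reads the sfnt table directory and returns table tag -> byte offset of the table. */
    static Map<String, Long> tableOffsets(ByteBuffer font) {
        ByteBuffer b = font.duplicate().order(ByteOrder.BIG_ENDIAN); // sfnt data is big-endian
        b.position(0);
        b.getInt();                                  // sfntVersion: 0x00010000 or 'OTTO'
        int numTables = b.getShort() & 0xFFFF;
        b.position(12);                              // skip searchRange, entrySelector, rangeShift
        Map<String, Long> tables = new LinkedHashMap<>();
        for (int i = 0; i < numTables; i++) {
            byte[] tag = new byte[4];
            b.get(tag);                              // e.g. "cmap", "glyf", "loca", "maxp"
            b.getInt();                              // checksum (not verified here)
            long offset = b.getInt() & 0xFFFFFFFFL;  // unsigned 32-bit offset from file start
            b.getInt();                              // table length (not needed here)
            tables.put(new String(tag, StandardCharsets.US_ASCII), offset);
        }
        return tables;
    }
}
```

On Android the buffer could come from reading the font asset into a byte array and wrapping it with `ByteBuffer.wrap`.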