I was extracting text from a PDF.
I have attached a screenshot of the scenario where I hit the problem.
Why did the Eclipse console fail to print the word "specification"?
Instead, it is printed as "speci?cation".
I can see that the characters overlap.
But while debugging the code, the same text is shown without the question mark.
Is there any way to print the same text to the console?
Please help.
The problem is the "fi" ligature ("overlapping letters") that is a single character in Unicode. In the debugging view the Windows methods for drawing text are used; these know about Unicode and can render the ligature correctly.
The console view uses a certain encoding. On Windows the default is "Cp1252" (Windows code page 1252), which is closely related to ISO 8859-1. Neither encoding contains this character, so it cannot be printed and a question mark is used as a substitute.
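The substitution can be reproduced without Eclipse. The following is a minimal sketch, not Eclipse's actual console code: it round-trips the string through windows-1252 and through UTF-8. The class and method names are made up for illustration, and it assumes the extracted text contains U+FB01 (LATIN SMALL LIGATURE FI).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LigatureEncodingDemo {
    // Encodes the text with the given charset and decodes it again.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "speci\uFB01cation"; // contains the single-character ligature

        // windows-1252 has no mapping for U+FB01, so '?' is substituted
        System.out.println(roundTrip(text, Charset.forName("windows-1252")));

        // UTF-8 can represent every Unicode character, so nothing is lost
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```

The first line printed is speci?cation, the second is true.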
You can set the encoding for Eclipse in general via Window > Preferences, General > Workspace, Text file encoding. While I think it is a good idea to use UTF-8 everywhere it may lead to problems with existing files.
You can set the encoding per project in the project properties, category Resource.
If you just want to set the encoding for the console view, which is the least invasive solution, the setting is not exactly intuitive to find. The console view encoding is a property of the run configuration you use for running your project: Run > Run Configurations..., select your run configuration, then the Common tab.
When you use one of these methods to set the encoding to UTF-8 then the ligature will be printed correctly to the console view.
Of course, the more general settings only take effect if they are not overridden by more specific ones (in increasing specificity: Workspace, Project, Run Configuration).
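To check which encoding a launched program actually ended up with, and to force UTF-8 output from the code side, something like the following sketch may help. The class and helper names are hypothetical. Note that the console still has to be set to UTF-8 as described above; otherwise the bytes are written correctly but displayed wrongly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class Utf8ConsoleDemo {
    // Wraps a stream so that text printed to it is encoded as UTF-8.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Shows the default encoding the JVM was started with
        System.out.println("default charset: " + Charset.defaultCharset());

        PrintStream console = utf8(System.out);
        console.println("speci\uFB01cation");
    }
}
```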
I'm surely getting all the terminology wrong here, but the PDF is probably using a glyph for the "fi" combination that isn't part of the ASCII character set. Thus it renders in the console as "?". Notice in the middle part of the window that the "i" in "fi" is closer to the "f" than it would be if it were the ASCII sequence "f" followed by "i" and that the "i" is also missing the dot.
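If the goal is simply to get plain "fi" out of the extracted text, independent of the console encoding, one option (not mentioned in the answers above) is Unicode compatibility normalization. A minimal sketch, assuming the ligature is U+FB01; the class and method names are hypothetical:

```java
import java.text.Normalizer;

public class LigatureNormalizer {
    // NFKC replaces compatibility characters such as the fi ligature
    // with their plain equivalents ("f" followed by "i").
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(expandLigatures("speci\uFB01cation")); // prints specification
    }
}
```

Be aware that NFKC also folds other compatibility characters (superscript digits or full-width letters, for example), which may or may not be wanted for extracted PDF text.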
Related
I'm trying to type a password containing letters and special characters, but the caret character "^" doesn't work. I tried escaping it as \^ in the sendKeys argument (I'm testing with Java), using its Unicode escape, and so on.
Other characters like "`" work fine, but neither of these works:
webDriver.findElement(By.id("password")).sendKeys("\\^");
webDriver.findElement(By.id("password")).sendKeys("ExA^mplePass1");
Can you help me, please?
Thanks
I could confirm my suspicion: ChromeDriver is configured to always use keyboard layout US - QWERTY. If it is not found, some other, very basic layout is used which does not include special characters like ^ or °. Consequence: Special characters are just not printed, no matter what you pass to SendKeys().
This behavior is actually by design and even well documented. An informational log entry mentions the problem if one actually enables logging:
Cannot switch to US keyboard layout - some keys may be interpreted
incorrectly
The solution is to install keyboard layout US - QWERTY (code 00000409) (not US International - QWERTY or anything similar). It does not matter for which language you add this layout.
Go to Windows "Language settings", click any language under "Preferred languages" and choose "Options". Then add US - QWERTY.
I would like to use this interface:
http://javaslang.com/javadoc/2.0.0-RC2/javaslang/%CE%BB.html
Fortunately, auto-completion works if I start from the package name.
But is it possible to type the Greek letter lambda (λ) in IntelliJ IDEA, or generally in any editor?
This is all about your keyboard and OS, and not specific to IntelliJ IDEA. You can enter the character by copying and pasting it, which is pretty easy. If you want to type it directly, use your platform's way of entering special characters (Windows, Mac, Linux, ...).
So for example on Mac you could:
Do Edit > Special Characters and click on the gear wheel at the top left, select Customize and check the box for Greek. Double click or drag drop to input.
Go to system prefs/language & text/input sources and check the box for Greek, plus the box for Show Input Menu in Finder. Then select Greek from the "flag" menu at the top right of the screen and type.
There are also some suggestions on the Wikipedia article on entering special characters, though IntelliJ may intercept some of these.
I have to admit, I always just copy and paste.
You can express any Unicode character anywhere in Java source code via the \uXXXX form, where XXXX are four hexadecimal digits. It is not just for Strings and chars. (For characters outside the BMP, you need two of those, forming a surrogate pair.) Because several characters sometimes have similar glyphs, Unicode escapes have the advantage of being totally unambiguous.
Presumably the character wanted in this case is the one Unicode names "Greek small letter lamda", U+03BB, which you can express anywhere in Java source code as \u03bb.
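As a small illustration of the point above, the following sketch declares and calls a method whose name is written only as the escape \u03bb. The class name and the method body are made up for illustration; the javaslang interface itself is not used here.

```java
public class UnicodeEscapeDemo {
    // \u03bb is translated to the Greek letter before the source is tokenized,
    // so this declares a method whose name is that single letter.
    static int \u03bb(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        // The call site uses the same escape as the declaration
        System.out.println(\u03bb(41)); // prints 42
    }
}
```

Because the file stays pure ASCII, it compiles the same way regardless of the source encoding configured in the editor or passed to the compiler.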
For a play button in a Java GUI I currently use a button with the label set to ' ▻ ' (found this symbol in a Unicode symbol table). As I understand, it is better to not use such symbols directly in source code but rather use the explicit unicode representation like \u25BB in this example, because some tools (editor, ...) might not be able to handle files with non-ASCII content (is that correct?).
Assuming the compiled class contains the correct character, under which circumstances would the GUI not show the intended symbol on a current PC operating system? Linux, Windows, Mac should all support UTF-16, right? Do available fonts or font settings cause problems to this approach?
(Of course I could add an icon, but why add extra resources if a symbol should already be available... given that this is a portable solution)
Do available fonts or font settings cause problems to this approach?
Unfortunately they do. You can use Unicode in the source code of course, but the problem is that Unicode currently has 246,943 code points assigned, so no single font defines more than a fraction of them. You'll get squares or some other weird rendering when the glyph isn't available. I've had cases where relatively simple symbols such as ³ render fine on one Windows computer and show up as squares on the next, almost identical computer. All sorts of language and locale settings and minor version changes affect this, so it's quite fragile.
AFAIK there are few, if any, characters guaranteed to be always available. Java's Font class has some methods such as canDisplay and canDisplayUpTo, which can be useful to check this at runtime.
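A runtime check along those lines could look like the sketch below. The class and method names and the ">" fallback are assumptions for illustration. The decision is kept separate from the font lookup so it can be exercised without any fonts installed; in a real GUI you would probably test the button's own font (button.getFont()) rather than a freshly created one.

```java
import java.awt.Font;
import java.util.function.Predicate;

public class GlyphFallbackDemo {
    // Returns the preferred label if the check accepts it, otherwise the fallback.
    static String chooseLabel(Predicate<String> canShow, String preferred, String fallback) {
        return canShow.test(preferred) ? preferred : fallback;
    }

    public static void main(String[] args) {
        Font font = new Font(Font.DIALOG, Font.PLAIN, 12);

        // canDisplayUpTo returns -1 when the font has a glyph for every character
        Predicate<String> fontCanShow = s -> font.canDisplayUpTo(s) == -1;

        // The result depends on the fonts installed on the machine
        System.out.println(chooseLabel(fontCanShow, "\u25BB", ">"));
    }
}
```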
Instead of using icons, you could bundle some good TrueType font that has the special characters you need, and then use that font everywhere in your app.
I currently use a button with the label set to ' ▻ '
Rather than that, I always use JButton(String text, Icon icon). An Icon is rendered the same way regardless of which font is available or which encoding is in use.
Most editors support Unicode, so go ahead.
Look at this post: Eclipse French support
If you are using a simple editor like Notepad, then in the Save dialog type the file name and choose a UTF encoding from the encoding list below it ( http://www.sevenforums.com/software/72727-how-make-notepad-save-txt-files-unicode.html )
I'd like to do this:
System.out.println("안녕하세요!");
But I get a "Some characters could not be encoded using the MacRoman character encoding" popup error message when I try to compile in Eclipse. I'm running Mac OS X. Is there a way to get around that?
So, I guess I'll try using Unicode:
System.out.println((char)0xD0A4);
Which I'd like to print out '키', but instead get a '?'. I guess the command line doesn't support Unicode. But even if it did, it'd be really annoying to have to go to one of several charts (like this one) to find each character block.
Anyway, FINE! So I'll use a JLabel...
JLabel lbl = new JLabel("" + (char) 0xD0A4);
Awesome, this prints out 키! ... but I still have to look up the Unicode characters for each block. How can I easily spew out Korean characters in a program?
Switch to UTF-8, as said before.
However, instead of doing it on a per-project basis (as J-16 suggests), go through
Window -> Preferences -> General -> Workspace and change the "Text file encoding" to "Other: UTF-8".
This changes the setting for the entire workspace.
Afterwards, you can input your characters as you are used to.
The Eclipse console doesn't use unicode encoding so it can't display those. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=13865.
Try the fix mentioned here: http://paranoid-engineering.blogspot.com/2008/05/getting-unicode-output-in-eclipse.html
Just right-click the file in the project view and choose Properties. Change the encoding to UTF-8 there.
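If changing the file encoding is not an option, the chart lookups the question complains about can be automated. The following is only a sketch with hypothetical names: it turns any string into the backslash-u escape form that Java source accepts, so the output can be pasted into a file saved in any encoding. Characters outside the BMP come out as two escapes (a surrogate pair), which is what Java source expects.

```java
public class EscapeHelper {
    // Converts every non-ASCII char into a backslash-u escape with four
    // hexadecimal digits and leaves ASCII characters untouched.
    static String toJavaEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample is the syllable from the question (code point D0A4) plus '!'
        System.out.println(toJavaEscapes("\uD0A4!"));
    }
}
```

For the sample above, the program prints a backslash followed by uD0A4 and the exclamation mark.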
This is similar to my own previous question, but that solution didn't work here.
As mentioned in the previous question, I'm working on a cross-platform (Windows/Ubuntu) application that has to transliterate English into one of several official Indian languages. The application has a custom input method, and typing in English and pressing space will transliterate the typed text into the specific local language. Urdu is different from the others in being right to left, like Arabic/Hebrew.
I managed to find an open licensed Urdu font that has both English and Urdu glyphs, but when I type characters in English, nothing shows up.
I don't understand whether it's a font painting issue, or related to the input method. So far, if I disable the custom input method (InputMethod.dispatchEvent() ) for this language, I am able to see the English text (but of course no transliteration takes place).
My findings:
Change font to one of Windows' built in Arabic fonts - same result.
Instead of using ComponentOrientation to align text in the text field, I used setHorizontalAlignment for when the locale is Urdu. Same result.
Decompiled the JDK's default input method provider on Windows (sun.awt.windows.WInputMethod). Here I see the dispatchEvent() makes a native call to the OS for handling IME. I can't do that here.
Found a custom IM for Hebrew - my version of dispatchEvent() is essentially the same.
Stepped through code for JTextField in Eclipse - wasn't able to find anything in the AbstractDocument and subclasses. The AbstractDocument.insertUpdate() method checks for and updates bidirectional text input, but there wasn't anything else significant.
I'm unable to understand what happens after the dispatchEvent() call. The characters are being registered, i.e. the transliteration engine is able to detect the typed characters and process them, but they just don't show up on screen.
Workaround
If I let the text field's orientation be as it is for regular left to right languages, I can see the English text. However, this would not be acceptable to an Urdu speaking user.
Can someone point me in the right direction?
I set the locale to ur_IN.
Sadly, ur_IN is not among the supported locales; I only see en_IN and hi_IN. In the example cited, I used the following code to get the image below:
spinner.setLocale(new Locale("hi", "IN"));
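One way to check at runtime which locales the running JRE reports, rather than relying on a list, is sketched below. The class and method names are hypothetical, and the result depends on the Java version and its locale data, so a locale missing on one installation may be present on another.

```java
import java.util.Arrays;
import java.util.Locale;

public class LocaleCheck {
    // True if the running JRE lists the locale among its installed locales.
    static boolean isAvailable(Locale locale) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-IN", "hi-IN", "ur-IN"}) {
            System.out.println(tag + ": " + isAvailable(Locale.forLanguageTag(tag)));
        }
    }
}
```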