SWT/JFace TableViewer not showing Unicode characters - java

I'm using a TableViewer to show some data in my app. The problem is that the widget does not display Unicode characters, and I don't understand why.
Is there a way to solve this?
Edit: I've done a debugging session. I think the problem isn't the Table but the ArrayList where the strings are stored.
The program loads a byte[] array and decodes it, generating Unicode strings. In my previous app the list appeared to be stored correctly, but in the new one (which generates the strings the same way) Unicode special characters like 'è' are not stored correctly. Why?

I've solved it ;)
The trick was to use UTF-8 instead of UTF-16 (Eclipse did the job for me)
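For anyone hitting the same thing: the usual cause is relying on the platform default charset when turning the byte[] into Strings, which can differ between workspaces. A minimal sketch of the fix, assuming the data really is UTF-8 (the byte values here are just an illustration):

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) {
        // UTF-8 bytes for "è" (0xC3 0xA8), standing in for the loaded byte[].
        byte[] raw = {(byte) 0xC3, (byte) 0xA8};

        // new String(raw) would use the platform default charset, which varies
        // between machines/workspaces; passing it explicitly is deterministic.
        String s = new String(raw, StandardCharsets.UTF_8);
        System.out.println(s); // è
    }
}
```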

Related

Display special characters using entity or hex values

I am trying to display ŵ on my JSF page but am unable to do so. The text with special characters is read from a properties file, but on my application screen it becomes something else. I tried using entity values but am not succeeding. For example, if the original text is:
ŵyhsne klqdw dwql
then after replacing the character with an entity or hex value:
**&wcirc ;**yhsne klqdw dwql, but my page displays the entity literally, as-is
I can only guess at your question. Please edit and improve it.
If you are displaying on the web, you should use `&wcirc;` (note: without spaces), but this also requires a font on the client side that supports the character.
If the string is in your code: replace the character with \u0175.
But probably the best way is to use just ŵ, whether in code, on the web, or in any file, and to make sure that such files (or sources) are interpreted as UTF-8 and that the pages you deliver are UTF-8. If you are not using UTF-8, check in the same way that you are using your chosen encoding consistently.
And sending a character doesn't mean it can be displayed: there is always the possibility that a font does not contain all 'special' characters.
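To illustrate the code-side option above: the \u0175 escape survives any source-file encoding, while a literal ŵ only survives if the compiler reads the file as UTF-8. A small sketch (the class name is made up for illustration):

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // The escape form is immune to source-file encoding problems.
        String fromEscape = "\u0175";
        // The literal form works only if the compiler decodes this file as UTF-8.
        String fromLiteral = "ŵ";
        // When the encodings match, both are the same single character, U+0175.
        System.out.println(fromEscape.equals(fromLiteral));
    }
}
```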

Java MySQL Encoding issue with UTF-8

I have an issue inserting text from a PDF into a MySQL table. The error message is as follows:
"Incorrect string value: '\xF0\x9D\x9B\xBC i...' for column 'text' at row 1"
I know that this code refers to the Greek letter alpha. However, I have set the character set to UTF-8 for the column text and also in the MySQL connection. I have also tried utf8mb4. None of it worked.
The Greek letter alpha occurs in different font types. I am not sure if this matters.
Any ideas why this does not work?
I also created a PDF file myself which contained an alpha in the text. For that example, my program runs without any errors. So although I know that the error message refers to the alpha, there seems to be an additional issue.
Thanks in advance!
UPDATE:
After some checking, I found that some really strange symbols were created from a formula which contained the Greek letter alpha. So apparently these unknown symbols led to the error.
However, I still do not know how to exclude such unknown symbols from the text. What is the easiest way to do this?
These are the symbols:
I restricted the string in Java to Latin symbols only. Maybe that's not the most general way of getting rid of those strange symbols, but it works for now.
In MySQL, use CHARACTER SET utf8mb4.
Add ?useUnicode=yes&characterEncoding=UTF-8 to the JDBC URL
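As background for why plain utf8 failed here: \xF0\x9D\x9B\xBC is a four-byte UTF-8 sequence (U+1D6FC, the mathematical italic alpha used in formulas), and MySQL's legacy utf8 charset stores at most three bytes per character, so only utf8mb4 can hold it. A sketch of detecting such characters on the Java side before the insert (needsUtf8mb4 is a hypothetical helper name):

```java
public class Utf8mb4Check {
    // MySQL's legacy "utf8" charset stores at most 3 bytes per character.
    // Any supplementary code point (above U+FFFF, like the math alpha U+1D6FC
    // from the error message) needs 4 bytes, i.e. the utf8mb4 charset.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    public static void main(String[] args) {
        String mathAlpha = new String(Character.toChars(0x1D6FC));
        System.out.println(needsUtf8mb4(mathAlpha));  // true
        System.out.println(needsUtf8mb4("plain α"));  // false, Greek alpha is in the BMP
    }
}
```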

Java - How can I use Arabic characters?

I'm creating a flashcard-type game to help in learning a new language. The actual language I'm trying to use in my program is Urdu, but when I look at a Unicode chart, Arabic and Urdu letters are mixed together, and I thought more people would know what I was talking about if I said Arabic.
So, on my Windows 8 machine I can change the keyboard layout to Urdu, and whatever I type into Java is correctly displayed back to me. However, transferring this code to another computer with Windows 7 (at my school) turns the Urdu characters in the raw Java file into mumbo-jumbo. Copying and pasting a character from the online Unicode chart displays fine in the Java file, but it is shown as a '?' in the actual program itself and in the System.out output.
When I use Unicode escape sequences (e.g. \uXXXX), the characters are displayed correctly on both computers.
The problem is that I don't want to use escape sequences every time I want to write something in Urdu. I plan on writing long sentences and many words. I was thinking of making an array of the Unicode code points and then perhaps a method that converts an English string of letters into Urdu using this array, but I thought there must be an easier way to fix this problem.
I'm still kind of a beginner, but I wasn't planning on making a very complex program anyway. Thanks for any help.
This sounds like a problem with the source-file encoding used by the compiler on the Windows 7 computer. You should make sure that both computers compile with an encoding that supports Arabic/Urdu characters, such as UTF-8.
If no encoding is specified, the compiler uses the system's default encoding, which might not support Arabic/Urdu characters. See this link for information on how to find/set encoding properties.
You can print the encoding currently used with this piece of code:
System.out.println(System.getProperty("file.encoding"));
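One way to avoid escape sequences entirely is to keep the Urdu text out of the .java sources altogether and load it from a UTF-8 data file opened with an explicit charset; then the compiler's encoding no longer matters. A sketch under that assumption (class and file names are made up):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class UrduLoader {
    // Reads lines of flashcard text from any Reader. Keeping the Urdu text in
    // a UTF-8 data file means the .java sources can stay plain ASCII.
    static List<String> loadLines(Reader in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In the real app the Reader would wrap a file, e.g.:
        //   new InputStreamReader(new FileInputStream("cards.txt"), StandardCharsets.UTF_8)
        // ("cards.txt" is a hypothetical file name.)
        List<String> cards = loadLines(new StringReader("\u0627\u0631\u062F\u0648\nUrdu"));
        cards.forEach(System.out::println);
    }
}
```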

Properly storing copy/pasted text from a Microsoft Office document into a MySQL database

I know that Microsoft Office uses a different encoding. What happens is: when someone copies and pastes text from Office into a Java text panel, it looks OK. But when you store it into a MySQL database and retrieve it, it suddenly becomes all kinds of rubbish Latin characters.
I've tried converting it to UTF-8 before storing, but that doesn't seem to work.
I wonder if there is any way to detect whether there are any such characters in the text, so I can simply pop up an alert to let the user know before they save it.
Or, is there any way to force the jTextField to display everything as UTF-8 characters, so that when the user copies and pastes from Word it shows the random codes right away instead of looking fine (at the beginning)?
Example: the user enters something in Word and pastes it into the jTextField; we pass the string directly (note: our SQL database is utf8_general_ci), then fetch it into the JPanel, and we get:
ÃÃâ€
’¢â‚¬ââââ‚
I've had similar issues. The first thing to do is find out exactly what has been written to the database. This is very easy with MySQL; just log on and run
SELECT HEX(column) FROM table;
That'll give you the bytes that were written to the table. You can then use an app I wrote for this very purpose: take the hex string you got back from MySQL and give it to the main class using the -b flag for bytes. You'll get a whole heap of output, and hopefully one of the candidates will be what you had originally.
Once you know what it's being stored as, you have a starting point for debugging.
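To show what that kind of diagnosis typically reveals: the "Ã...â€..." pattern in the question is the classic signature of UTF-8 bytes being decoded as windows-1252 somewhere in the chain. A small sketch that reproduces the damage and, while the bytes are still intact, reverses it:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        String original = "\u2019"; // curly apostrophe, a typical Word character

        // Encode as UTF-8 (bytes E2 80 99), then wrongly decode as windows-1252:
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, cp1252);
        System.out.println(garbled); // â€™

        // As long as no byte was lost, the damage is reversible:
        String recovered = new String(garbled.getBytes(cp1252), StandardCharsets.UTF_8);
        System.out.println(recovered.equals(original)); // true
    }
}
```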

Cleaning an inputstring containing binary junk to produce an ascii printable string

In our application we have a text field that is controlled by TinyMCE. If the customer pastes text from Word into the text field, Oracle balks when we try to store this text in our database:
ORA-01461: can bind a LONG value only for insert into a LONG column
Cleaning the text in, say, Notepad does not produce any problems, so my guess is that the input string contains some kind of binary junk that Oracle mistakes for a delimiter between the values used in the SQL insert string.
Upgrading our ancient TinyMCE will probably fix the problem, but I also want to ensure the text really is clean when passed to the lower layers. So I thought I might check whether the text is true ASCII and, if not, clean out everything that is not ASCII by looping through the lines of the input and doing the following:
line.replaceAll("[^\\p{ASCII}]", "")
Is this a viable solution, and if not, what are the pitfalls?
What about cleaning the pasted content like I described here?
That might also remove the junk.
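One pitfall worth spelling out: the raw [^\\p{ASCII}] strip deletes accented letters outright ("déjà" becomes "dj"), so legitimate text from Word can get mangled along with the junk. Normalizing to NFKD first keeps the base letters. A sketch, where toAscii is a hypothetical helper:

```java
import java.text.Normalizer;

public class AsciiClean {
    // Decompose accented characters into base letter + combining mark (NFKD),
    // then strip the non-ASCII remainder. The base letters survive.
    static String toAscii(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("déjà vu"));                           // deja vu
        System.out.println("déjà vu".replaceAll("[^\\p{ASCII}]", ""));    // dj vu
    }
}
```

Note this still silently drops characters with no ASCII decomposition (curly quotes, dashes, and the like), so it reduces the junk rather than mapping everything.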
