Unable to insert UTF-8 characters in mysql [duplicate] - java

I tried to use UTF-8 and ran into trouble.
I have tried so many things; here are the results I have gotten:
???? instead of Asian characters. Even for European text, I got Se?or for Señor.
Strange gibberish (Mojibake?) such as SeÃ±or for Señor, or æ–°æµªæ–°é—» for 新浪新闻.
Black diamonds, such as Se�or.
Finally, I got into a situation where the data was lost, or at least truncated: Se for Señor.
Even when I got text to look right, it did not sort correctly.
What am I doing wrong? How can I fix the code? Can I recover the data? If so, how?

This problem plagues the participants of this site, and many others.
You have listed the five main cases of CHARACTER SET troubles.
Best Practice
Going forward, it is best to use CHARACTER SET utf8mb4 and COLLATION utf8mb4_unicode_520_ci. (There is a newer version of the Unicode collation in the pipeline.)
utf8mb4 is a superset of utf8 in that it handles 4-byte utf8 codes, which are needed by Emoji and some of Chinese.
Outside of MySQL, "UTF-8" refers to encodings of every size (1 to 4 bytes per character), so it is effectively the same as MySQL's utf8mb4, not utf8.
I will try to use those spellings and capitalizations to distinguish inside versus outside MySQL in the following.
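To make the 1-to-4-byte distinction concrete, here is a small Java sketch. The class and method names are made up for illustration. It counts the UTF-8 bytes needed for a code point and flags strings containing a character outside the Basic Multilingual Plane. Such a character needs 4 bytes, so it requires utf8mb4 rather than MySQL's 3-byte utf8.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    // Number of bytes UTF-8 needs for one code point (1 to 4).
    public static int utf8Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point lies outside the BMP, i.e. needs 4 bytes:
    // storable in utf8mb4 but not in MySQL's 3-byte utf8.
    public static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // 'A', 'ñ', '新', and the Emoji 👽
        int[] samples = {0x41, 0xF1, 0x65B0, 0x1F47D};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Bytes(cp));
        }
        System.out.println(needsUtf8mb4("Se\u00F1or"));      // false
        System.out.println(needsUtf8mb4("hi \uD83D\uDC7D")); // true
    }
}
```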
Overview of what you should do
Have your editor, etc. set to UTF-8.
HTML forms should start like <form accept-charset="UTF-8">.
Have your bytes encoded as UTF-8.
Establish UTF-8 as the encoding being used in the client.
Have the column/table declared CHARACTER SET utf8mb4. (Check with SHOW CREATE TABLE.)
Put <meta charset=UTF-8> at the beginning of the HTML.
Stored Routines acquire the current charset/collation. They may need rebuilding.
UTF-8 all the way through
More details for computer languages (and its following sections)
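On the Java side, a common way to break "Have your bytes encoded as UTF-8" is to rely on the platform default charset. This happens with String.getBytes() called without an argument, or with readers and writers created without a charset. Before JDK 18, that default depended on the OS and locale. The minimal sketch below names UTF-8 explicitly. It assumes plain file I/O stands in for wherever your text really comes from, and the class name is illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitUtf8Io {
    // Write and read text with an explicit charset instead of the platform default.
    public static String roundTrip(String text) throws IOException {
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        try {
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        String s = "Se\u00F1or \u65B0\u6D6A"; // "Señor 新浪"
        System.out.println(roundTrip(s).equals(s)); // true
    }
}
```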
Test the data
Viewing the data with a tool or with SELECT cannot be trusted.
Too many such clients, especially browsers, try to compensate for incorrect encodings, and show you correct text even if the database is mangled.
So, pick a table and column that has some non-English text and do
SELECT col, HEX(col) FROM tbl WHERE ...
The HEX for correctly stored UTF-8 will be:
For a blank space (in any language): 20
For English: 4x, 5x, 6x, or 7x
For most of Western Europe, accented letters should be Cxyy
Cyrillic, Hebrew, and Farsi/Arabic: Dxyy
Most of Asia: Exyyzz
Emoji and some of Chinese: F0yyzzww
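To compare against these patterns from Java, you could use a hypothetical helper like the one below. It produces the same uppercase hex string that HEX(col) should return for a correctly stored utf8mb4 value. It is only a sketch for checking expectations and does not talk to the database.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {
    // Uppercase hex of the string's UTF-8 bytes, as HEX(col) should show it.
    public static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("Se\u00F1or"));   // 5365C3B16F72
        System.out.println(hex("\u65B0"));       // E696B0 (新)
        System.out.println(hex("\uD83D\uDC7D")); // F09F91BD (👽)
    }
}
```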
More details
Specific causes and fixes of the problems seen
Truncated text (Se for Señor):
The bytes to be stored are not encoded as utf8mb4. Fix this.
Also, check that the connection during reading is UTF-8.
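Here is a rough Java illustration of why the text stops at "Se". In latin1, ñ is the single byte F1, and F1 is not valid UTF-8 in that position. The sketch below (illustrative names) decodes strictly and keeps only the part before the first invalid sequence. This approximates what MySQL does with such bytes in a utf8mb4 column when it truncates instead of rejecting the row, which depends on sql_mode.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class TruncationDemo {
    // Decode bytes as UTF-8 and stop at the first malformed sequence,
    // returning only the part that decoded cleanly.
    public static String validUtf8Prefix(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // UTF-8 never yields more chars than bytes, so this buffer is large enough.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        dec.decode(ByteBuffer.wrap(bytes), out, true); // an error result just stops decoding
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] latin1 = "Se\u00F1or".getBytes(StandardCharsets.ISO_8859_1); // 53 65 F1 6F 72
        System.out.println(validUtf8Prefix(latin1)); // Se
    }
}
```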
Black Diamonds with question marks (Se�or for Señor):
one of these cases exists:
Case 1 (original bytes were not UTF-8):
The bytes to be stored are not encoded as utf8. Fix this.
The connection (or SET NAMES) for the INSERT and the SELECT were not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Case 2 (original bytes were UTF-8):
The connection (or SET NAMES) for the SELECT was not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Black diamonds occur only when the browser is set to <meta charset=UTF-8>.
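Case 1 can be reproduced in a few lines of Java by decoding latin1 bytes leniently as UTF-8. The invalid byte turns into U+FFFD, the replacement character, which the browser draws as a black diamond. The names in this sketch are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class BlackDiamondDemo {
    // Lenient UTF-8 decoding: each invalid sequence becomes U+FFFD.
    public static String readAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Señor" as latin1 bytes: 53 65 F1 6F 72 (ñ is the single byte F1)
        byte[] latin1 = "Se\u00F1or".getBytes(StandardCharsets.ISO_8859_1);
        String shown = readAsUtf8(latin1);
        System.out.println(shown.equals("Se\uFFFDor")); // true
    }
}
```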
Question Marks (regular ones, not black diamonds) (Se?or for Señor):
The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this. (Use SHOW CREATE TABLE.)
Also, check that the connection during reading is UTF-8.
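A regular question mark is a real 3F byte. It is produced when text is converted to a character set that has no slot for the character. At that point the original character is gone, which is why this data cannot be recovered. Java's String.getBytes(Charset) behaves the same way: it substitutes the charset's replacement byte, which is '?' for US-ASCII and ISO-8859-1. The small sketch below (illustrative names) shows the loss.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encode into cs and decode back; characters cs cannot represent
    // come back as '?' because getBytes substituted the replacement byte.
    public static String throughCharset(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        System.out.println(throughCharset("Se\u00F1or", StandardCharsets.US_ASCII));     // Se?or
        System.out.println(throughCharset("\uD55C\uAD6D", StandardCharsets.ISO_8859_1)); // ?? (Korean 한국)
    }
}
```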
Mojibake (SeÃ±or for Señor):
(This discussion also applies to Double Encoding, which is not necessarily visible.)
The bytes to be stored need to be UTF-8-encoded. Fix this.
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
HTML should start with <meta charset=UTF-8>.
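The sketch below shows where the Mojibake pattern comes from. It takes correct UTF-8 bytes and reads them back as if each byte were a latin1 character. It uses Java's windows-1252 charset because MySQL's latin1 is actually cp1252. The class and method names are illustrative only, and the sketch shows the visible effect rather than MySQL's exact internal path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // MySQL's latin1 is really Windows cp1252, so use that to mimic it.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Correct UTF-8 bytes, read back as if each byte were one latin1 character.
    public static String misreadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    public static void main(String[] args) {
        System.out.println(misreadAsLatin1("Se\u00F1or"));               // SeÃ±or
        System.out.println(misreadAsLatin1("\u65B0\u6D6A\u65B0\u95FB")); // æ–°æµªæ–°é—»
    }
}
```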
If the data looks correct, but won't sort correctly, then
either you have picked the wrong collation,
or there is no collation that suits your need,
or you have Double Encoding.
Double Encoding can be confirmed by doing the SELECT .. HEX .. described above.
é should come back C3A9, but instead shows C383C2A9
The Emoji 👽 should come back F09F91BD, but comes back C3B0C5B8E28098C2BD
That is, the hex is about twice as long as it should be.
This is caused by converting from latin1 (or whatever) to utf8, then treating those bytes as if they were latin1 and repeating the conversion.
The sorting (and comparing) does not work correctly because it is, for example, sorting as if the string were SeÃ±or.
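The hex values above can be reproduced by doing one conversion too many. This Java sketch (illustrative names, with windows-1252 standing in for MySQL's latin1) misreads correct UTF-8 bytes as latin1 and then encodes the result to UTF-8 again.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    static final Charset CP1252 = Charset.forName("windows-1252"); // MySQL's "latin1"

    // UTF-8 bytes are misread as latin1 characters, and those characters
    // are encoded to UTF-8 a second time.
    public static byte[] doubleEncode(String s) {
        String misread = new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(doubleEncode("\u00E9")));       // C383C2A9 (é)
        System.out.println(hex(doubleEncode("\uD83D\uDC7D"))); // C3B0C5B8E28098C2BD (👽)
    }
}
```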
Fixing the Data, where possible
For Truncation and Question Marks, the data is lost.
For Mojibake / Double Encoding, ...
For Black Diamonds, ...
The Fixes are listed here. (5 different fixes for 5 different situations; pick carefully): http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
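Those fixes operate on the stored data inside MySQL. If you only need to repair a few strings that have already reached your Java code, a hypothetical helper like the one below undoes one layer of Mojibake by reversing the misreading. It is a sketch, not one of the linked fixes. Java's windows-1252 leaves five bytes (81, 8D, 8F, 90, 9D) undefined, so characters whose UTF-8 bytes include one of them cannot be repaired this way. In that case the helper returns empty rather than guessing.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class MojibakeRepair {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Turn the text back into the cp1252 bytes it was misread from, then
    // decode those bytes as UTF-8. Both steps are strict, so text that is
    // not Mojibake of this kind yields Optional.empty().
    public static Optional<String> undoOnce(String s) {
        try {
            ByteBuffer bytes = CP1252.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes);
            return Optional.of(chars.toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(undoOnce("Se\u00C3\u00B1or")); // Optional[Señor]
        System.out.println(undoOnce("Se\u00F1or"));       // Optional.empty (already correct)
    }
}
```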

I had similar issues with two of my projects after a server migration. After searching and trying a lot of solutions, I came across this one:
mysqli_set_charset($con,"utf8mb4");
After adding this line to my configuration file, everything works fine!
I found this solution for MySQLi (the PHP mysqli set_charset() function) while trying to fix an insert from an HTML query.

I was also searching for the same issue. It took me nearly one month to find the appropriate solution.
First of all, you will have to update your database so that its CHARACTER SET and COLLATION are utf8mb4, or at least something that supports UTF-8 data.
For Java:
When making a JDBC connection, add useUnicode=yes&characterEncoding=UTF-8 as parameters to the connection URL, and it will work.
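A minimal Java sketch of that URL follows. "localhost", "mydb", and the class name are placeholders. The same settings can also be passed as connection properties. From Java, the Connector/J documentation advises against issuing SET NAMES yourself, because the driver does not notice the change. Set the encoding through these parameters instead. Whether characterEncoding=UTF-8 ends up as utf8 or utf8mb4 on the server depends on the Connector/J version, so it is worth checking with SHOW VARIABLES LIKE 'character_set%' after connecting.

```java
import java.util.Properties;

public class Utf8JdbcUrl {
    // Placeholder host/database; the parameters ask Connector/J to use UTF-8.
    public static String utf8Url(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Same settings as properties, e.g. for DriverManager.getConnection(url, props).
    public static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(utf8Url("localhost", "mydb"));
        // With the MySQL driver on the classpath, something like:
        // Connection c = DriverManager.getConnection("jdbc:mysql://localhost/mydb", utf8Properties("root", ""));
    }
}
```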
For Python:
Before querying the database, try executing these statements on the cursor:
cursor.execute('SET NAMES utf8mb4')
cursor.execute("SET CHARACTER SET utf8mb4")
cursor.execute("SET character_set_connection=utf8mb4")
If it does not work, happy hunting for the right solution.

Set your IDE's file encoding to UTF-8.
Add <meta charset="utf-8"> to the header of the web page where you collect the form data.
Check that your MySQL table definition looks like this:
CREATE TABLE your_table (
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
If you are using PDO, make sure to set:
$options = array(PDO::MYSQL_ATTR_INIT_COMMAND=>'SET NAMES utf8mb4');
$dbL = new PDO($pdo, $user, $pass, $options);
If you already have a large database with the above problem, you can try SIDU to export it with the correct charset and import it back as UTF-8.

Depending on how the server is set up, you have to change the encoding accordingly. From what you said, utf8 should work best. However, if you're getting weird characters, it might help to change the webpage encoding to ANSI.
This helped me when I was setting up a PHP MySQLi. This might help you understand more: ANSI to UTF-8 in Notepad++

Related

Display special characters using entity or hex values

I am trying to display ŵ on my JSF page, but I am unable to do so. The text with special characters is read from a properties file, but on my application screen it turns into something else. I also tried entity values without success. For example, if the original text is:
ŵyhsne klqdw dwql
then, after replacing the character with its entity or hex value:
&wcirc ;yhsne klqdw dwql
my page displays it exactly as written.
I can just guess your question. Please edit it and improve it.
If you are displaying it on the web, you should use &wcirc; (note: without spaces), but this also requires a font on the client side that supports the character.
If the string is in your code: replace the character with \u0175.
But probably the best way is to use ŵ directly, whether in code, on the web, or in any file. Make sure that such files (or sources) are interpreted as UTF-8 and that the pages you deliver are UTF-8. If you are not using UTF-8, check in a similar way that you consistently use the correct encoding.
And sending a character doesn't mean it can be displayed. There is always the possibility that a font does not contain all "special" characters.
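If you do want the reference approach in HTML, a numeric character reference such as &#x175; avoids depending on the named entity. The Java sketch below uses a hypothetical helper name. It rewrites every non-ASCII code point as &#x...; and leaves ASCII alone. It does not escape &, <, or >. One assumption worth checking for JSF: h:outputText escapes its value by default, so references produced this way are shown literally unless escape="false" is set. That may be why the page displays them as written.

```java
public class HtmlNumericRefs {
    // Replace each non-ASCII code point with a hex numeric character
    // reference, e.g. ŵ (U+0175) -> &#x175;. ASCII is copied unchanged.
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u0175yhsne klqdw dwql"; // "ŵyhsne klqdw dwql"; \u0175 is the escape mentioned above
        System.out.println(escapeNonAscii(text)); // &#x175;yhsne klqdw dwql
    }
}
```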

Java MySQL Encoding issue with UTF-8

I have an issue inserting text from a PDF into a MySQL table. The error message is as follows:
" Incorrect string value: '\xF0\x9D\x9B\xBC i...' for column 'text' at row 1"
I know that this code refers to the Greek letter alpha. However, I have set the character set to UTF-8 for the column text and also for the MySQL connection. I have also tried utf8mb4. However, none of it worked.
The Greek letter alpha occurs in different font types. I am not sure if this matters.
Any ideas why this does not work?
I also created a pdf file myself which contained an alpha in the text. For this example, my programme runs without any errors. Although I know that the error message refers to the alpha, there seems to be an additional issue.
Thanks in advance!
UPDATE:
After some checking, I found that some really strange symbols were created from a formula that contained the Greek letter alpha. So, apparently these unknown symbols led to the error.
However, I still do not know how to exclude any unknown symbols from the text. What is the easiest way to do this?
These are the symbols:
I restricted the string in Java to only Latin symbols. Maybe that's not the most general way of getting rid of those strange symbols, but it works for now.
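A narrower alternative to restricting the text to Latin letters is to drop only the characters that need 4 bytes in UTF-8, which MySQL's 3-byte utf8 cannot store. The U+1D6FC mathematical italic alpha (F0 9D 9B BC) from the error message is one of these. Ordinary Greek α (U+03B1) is kept. The method name below is illustrative. Switching the column and the connection to utf8mb4 keeps the data instead of discarding it.

```java
public class BmpFilter {
    // Keep only code points in the Basic Multilingual Plane (U+0000..U+FFFF),
    // i.e. those that fit in MySQL's 3-byte utf8.
    public static String stripSupplementary(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(cp -> !Character.isSupplementaryCodePoint(cp))
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String mathAlpha = new String(Character.toChars(0x1D6FC)); // 𝛼
        System.out.println(stripSupplementary("a\u03B1" + mathAlpha + "b")); // aαb
    }
}
```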
In MySQL, use CHARACTER SET utf8mb4.
Add ?useUnicode=yes&characterEncoding=UTF-8 to the JDBC URL

Store Korean characters in MySQL

I have a form, but every time I submit it with Korean characters, they show up in my phpMyAdmin database as question marks or are badly garbled. I want to be able to submit entries to my MySQL database table using both Latin and Asian characters. I'm using Java in Eclipse.
I have already done the following:
added this to my JSP files:
contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"
modified the Connector tag in my server.xml file to have:
URIEncoding="UTF-8"
modified the connection URL:
conn = DriverManager.getConnection("jdbc:mysql://localhost/login?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8",
"root", "");
added this to the doPost method of the servlet that handles the form data:
response.setContentType("text/html; charset=UTF-8");
Many thanks in advance for any help.
When trying to use utf8/utf8mb4, if you see Question Marks (regular ones, not black diamonds):
The bytes to be stored are not encoded as utf8. Fix this.
The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this.
Also, check that the connection during reading is utf8.
When trying to use utf8/utf8mb4, if you see Black Diamonds with question marks,
one of these cases exists:
Case 1 (original bytes were not utf8):
The bytes to be stored are not encoded as utf8. Fix this.
The connection (or SET NAMES) for the INSERT and the SELECT were not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Case 2 (original bytes were utf8):
The connection (or SET NAMES) for the SELECT was not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Black diamonds occur only when the browser is set to <meta charset=UTF-8>.
Note: euckr is not the same as utf8. It can be handled, but different steps are needed.
I recommend you take everything out of the equation except the database to ensure the problem really is with the database. First examine the values in hexadecimal:
SELECT HEX(column_name) FROM table_name
If you see "3F" where you are seeing "?", there is most likely a problem with the data coming from your web application. Section A.11.2 has more help, such as using SET NAMES and changing the MySQL INI file. You should also try manually inserting hexadecimal values into the database table and selecting them back, both normally and as hexadecimal, to ensure the data goes in and comes out correctly.
If you still suspect a database problem, ensure that your table's character set (i.e. DEFAULT CHARSET) is correct, for example:
SHOW CREATE TABLE table_name

How can I run an Arabic search query on MySQL from JavaFX?

SELECT * FROM `employee` WHERE `name` LIKE "%شريف%"
The above query works fine and finds the row when run in phpMyAdmin, but the same query run inside JavaFX finds nothing.
English searches work, so what do I need to add in Java to be able to search in Arabic?
As per my comments above, I guess this is an encoding/decoding issue in Java with nothing specific to do with JavaFX, and I also assume that you are not getting any exceptions. You have to use a proper standard both when inserting and when retrieving data. Helpful information is here: How to store arabic text in mysql database using python?
Refer to this article on working only with bytes so that your application is always properly internationalized: Byte Encodings and Strings
Refer to this one too for how to set the encoding in Java: How can I insert arabic word to mysql database using java
Your console might be UTF8 enabled so you are able to match strings there and see Arabic characters.
Hope it helps.
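On the Java side, the usual approach is to bind the search term as a parameter rather than pasting it into the SQL text. Open the connection with characterEncoding=UTF-8 so the Arabic characters reach the server intact. The sketch below assumes the employee table and name column from the question. The class and method names are hypothetical, and LIKE wildcards inside the term are not escaped.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ArabicNameSearch {
    // Wrap the term for a "contains" match; % and _ inside it are not escaped.
    public static String containsPattern(String term) {
        return "%" + term + "%";
    }

    // Runs the question's query with the term bound as a parameter.
    // Assumes conn was opened with characterEncoding=UTF-8.
    public static List<String> findByName(Connection conn, String term) throws SQLException {
        String sql = "SELECT name FROM employee WHERE name LIKE ?";
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, containsPattern(term));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}
```

Usage would be along the lines of findByName(conn, "شريف") on a connection opened with the UTF-8 URL parameters.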

Reading Unicode Text from Java ResultSet

How do I read Unicode text from a Java ResultSet?
rs.getString() returns a Java String which is Unicode by definition.
If you get mangled characters, you have to configure your database driver to use the right encoding for the connection to the database.
Just read the strings. All strings in Java are already Unicode. If you're having problems, then:
It could be a diagnostic problem - you may be reading the right data out of the ResultSet but displaying it so it looks like you haven't read it properly
It could be a configuration problem - there may be something you need to do when connecting to the database so that it determines the right encoding to use
It could be a database problem - the database may not be configured to store full Unicode data
It could be a database schema problem - the particular column you're using may be configured using a column type which doesn't support full Unicode
It could be a problem in the data, e.g. with another program incorrectly submitting data.
I've seen all of these before now. You should use detailed logging (e.g. of the individual characters, in hex) to work out whether you've got the data correctly or not - that will tell you where to look next.
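One way to do that logging in Java is to print each code point as U+XXXX, which does not depend on how the console or font renders it. The helper name in this sketch is illustrative. For example, a correct Señor logs U+00F1 for the ñ, while a Mojibake value shows U+00C3 U+00B1 in its place.

```java
public class CodePointDump {
    // Describe each code point as U+XXXX, separated by spaces.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String fromDb = "Se\u00F1or"; // e.g. the value returned by rs.getString("col")
        System.out.println(describe(fromDb)); // U+0053 U+0065 U+00F1 U+006F U+0072
    }
}
```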
If you are using a DataSource (e.g. com.mysql.jdbc.jdbc2.optional.MysqlDataSource), you can set the connection encoding to UTF-8 directly, like ds.setEncoding("UTF-8").
