I've had this problem for a long time. I've searched the internet many times and tried lots of suggested solutions, but haven't found one that works.
I really don't know what to do, so if you could help me I'd be very thankful.
(Sorry for my poor English.)
Question: How can I solve the charset incompatibility between the input file and a MySQL table?
Problem: When I import the file from my computer, the data appears in my database, but some characters such as 'ã', 'ç', 'á' are shown as ?.
Additional information
I'm using MySQL; my version and variable status are:
MySQL VERSION : 5.5.10
HOST : localhost
USER : root
PORT : 3306
SERVER DEFAULT CHARSET : utf8
character_set_client : utf8
character_set_connection : utf8
character_set_database : utf8
character_set_filesystem : BINARY
character_set_results : utf8
character_set_server : utf8
character_set_system : utf8
collation_connection : utf8_general_ci
collation_database : utf8_general_ci
collation_server : utf8_general_ci
completion_type : NO_CHAIN
concurrent_insert : AUTO
The query that's being used is:
LOAD DATA LOCAL INFILE 'xxxxx/file.txt'
INTO TABLE xxxxTable
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
( status_ordenar,numero,newstatus,rede,data_emissao,inicio,termino,tempo_indisp
, cli_afet,qtd_cli_afet,cod_encerr,uf_ofensor,localidades,clientes_afetados
, especificacao,equipamentos,area_ofens,descricao_encerr,criticidade,cod_erro
, observacao,id_falha_perc,id_falha_conf,nba,solucao,falhapercebida,falhaconfirmada
, resp_i,resp_f,resp_ue,pre_handover,falha_identificada,report_netcool,tipo_falha
, num_notificacao,equip_afetados,descricao)
About the file being imported:
I've opened the file in OpenOffice with several charsets:
UTF-8: strange characters in place of 'ç', 'ã', etc.
ISO-8859-1: OK.
WIN-1252: OK.
ASCII/US: OK.
Already tested: I've tried several charsets for my database (latin1, utf8, ascii), but all of them gave the same result (? instead of 'á', 'ç', etc.).
Extra: I'm using Java with JDBC to generate and send the query.
file.txt is saved in ISO-8859-1 or Windows-1252 (these two are very similar) and is being interpreted as UTF-8 by MySQL. These encodings are incompatible.
How can I tell?
The file displays correctly when OpenOffice interprets it as ISO-8859-1 or Windows-1252.
MySQL reports character_set_database : utf8.
Solution: either convert the file to UTF-8, or tell MySQL to interpret it as ISO-8859-1 or Windows-1252.
Background: the characters you mention (ã etc.) are single-byte values in Windows-1252, and these bytes are not valid UTF-8 sequences on their own, so they end up as '?' (replacement characters).
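A minimal Java sketch of this, assuming the file really is Windows-1252/ISO-8859-1 as the OpenOffice test suggests: the same three bytes decode correctly as ISO-8859-1 but not as UTF-8, and re-encoding the decoded text gives a UTF-8 copy (the first option in the solution). The class name and the optional file-conversion arguments are illustrative only. Java marks the invalid bytes with U+FFFD, whereas the question shows '?' coming out of MySQL.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LatinToUtf8 {
    // Decode raw bytes using the given charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // Re-encode bytes that were saved in `source` as UTF-8.
    static byte[] toUtf8(byte[] raw, Charset source) {
        return decode(raw, source).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // "ãçá" as a Latin-1 / Windows-1252 editor stores it: bytes E3 E7 E1
        byte[] latin = "\u00E3\u00E7\u00E1".getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: the text round-trips.
        System.out.println(decode(latin, StandardCharsets.ISO_8859_1));
        // Wrong charset: these bytes are not valid UTF-8 sequences,
        // so Java substitutes U+FFFD for them.
        System.out.println(decode(latin, StandardCharsets.UTF_8));

        // Optional: java LatinToUtf8 file.txt file-utf8.txt
        // (assumes the input file really is Windows-1252)
        if (args.length == 2) {
            byte[] raw = Files.readAllBytes(Paths.get(args[0]));
            Files.write(Paths.get(args[1]), toUtf8(raw, Charset.forName("windows-1252")));
        }
    }
}
```

After converting, the unchanged LOAD DATA statement can read the UTF-8 copy, since character_set_database is already utf8.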
Snippet from the MySQL docs:
LOAD DATA INFILE Syntax
The character set indicated by the character_set_database system variable is used to interpret the information in the file.
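For the second option, LOAD DATA also accepts a CHARACTER SET clause that names the file's charset, so MySQL uses it instead of character_set_database. MySQL's latin1 is actually Windows-1252 (cp1252), so it covers both candidates here. Since the question builds the query in Java, here is a minimal sketch of generating it; the helper name loadDataSql is hypothetical, the column list is omitted, and LINES TERMINATED BY '\n' is an assumption about the file's line endings. The path and table are concatenated without escaping, so only pass trusted values.

```java
public class LoadDataSql {
    // Build a LOAD DATA statement that declares the file's charset explicitly.
    // CHARACTER SET must come after INTO TABLE and before FIELDS/LINES.
    static String loadDataSql(String path, String table, String fileCharset) {
        return "LOAD DATA LOCAL INFILE '" + path + "'\n"
             + "INTO TABLE " + table + "\n"
             + "CHARACTER SET " + fileCharset + "\n"
             + "FIELDS TERMINATED BY ';'\n"
             + "LINES TERMINATED BY '\\n'\n"   // Java "\\n" -> SQL '\n'
             + "IGNORE 1 LINES";
    }

    public static void main(String[] args) {
        // Placeholders taken from the question; append the column list as needed.
        System.out.println(loadDataSql("xxxxx/file.txt", "xxxxTable", "latin1"));
    }
}
```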
I saved your characters as a UTF-8 file with standard Windows Notepad (Notepad++ also works).
Exact file content:
'ã', 'ç', 'á'
MySQL version: 5.5.22
Database charset: utf8
Database collation: utf8_general_ci
CREATE TABLE `abc` (
`qwe` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I imported the data with this command:
LOAD DATA LOCAL INFILE 'C:/test/utf8.txt'
INTO TABLE abc
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
( qwe)
Result (displayed in SQLyog):
So first, check the original file with a reliable editor (Notepad, Notepad++). If the file is corrupted, get another copy of it.
Second, if the file is OK, add the Java code you use to send the data to MySQL to your question.
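Besides eyeballing the file in an editor, you can check programmatically whether it is valid UTF-8. A minimal sketch using a strict CharsetDecoder that reports errors instead of silently substituting U+FFFD; the class name is illustrative. Note that a pure ASCII file also passes, because ASCII is a subset of UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    // True only if every byte sequence in `data` is valid UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java Utf8Check file.txt");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data)
                ? "valid UTF-8"
                : "NOT valid UTF-8 (maybe ISO-8859-1 / Windows-1252?)");
    }
}
```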
Related
I'm using Jsoup to scrape a webpage. My code takes the text and inserts it directly into the database.
The text on the target webpage looks perfectly fine, but after inserting it into the database I get question marks in place of certain characters.
For example, the right single quotation marks (U+2019) in the following sentence:
I can’t imagine uh, a domain of human endeavor that isn’t impacted by
the imagination.
It shows up like this in the database and on the webpage I'm outputting it on:
I can?t imagine uh, a domain of human endeavor that isn?t impacted by
the imagination.
Initially I thought this was just a problem with the charset/collation of the database, but after trying out different ones, the problem persists.
The SQL database I'm currently working in is in UTF-8:
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
And the meta tag is set:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I've tried setting it explicitly in Java like so:
url = "jdbc:mysql://localhost:3306/somedb?useUnicode=true&characterEncoding=utf-8";
I've tried SQL queries like:
SET NAMES 'utf8'
SET CHARACTER SET utf8
I've tried creating a new database, and nothing seems to work.
Any ideas why this might be happening?
Jsoup automatically detects the charset of the webpage being crawled.
However, many websites do not declare the character set in the Content-Type header.
If you crawl such a webpage, where the charset attribute is missing from the HTTP response's Content-Type header, Jsoup parses the page using the platform's default character set. That also means you might not get the expected results, as the platform's default character set might differ from that of the webpage you are crawling.
This can result in characters being lost or parsed/printed incorrectly.
To avoid this, read the URL as an InputStream and specify the desired character set manually in Jsoup's parse method, as shown below:
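import java.io.InputStream;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String page = "http://www.somepage.com";
// get an input stream from the URL (closed automatically)
try (InputStream in = new URL(page).openStream()) {
    // parse the document from the stream, specifying the charset explicitly
    Document doc = Jsoup.parse(in, "ISO-8859-1", page);
    // ..do your processing
}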
Several things have to be set up correctly for a page to work.
See the "question mark" section of "Trouble with UTF-8 characters; what I see is not what I stored".
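A minimal Java sketch of how "can’t" turns into "can?t": if any stage between the scraper and the database encodes the text with a charset that has no ’ (U+2019), such as ISO-8859-1, the character is replaced with '?'. Which stage does this in your setup is an assumption you still have to verify; the sketch (with a made-up class name) only reproduces the symptom. Note that Windows-1252 does contain ’, while ISO-8859-1 does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Pass text through a byte stage that uses `cs`, as a driver or
    // connection configured for that charset would.
    static String roundTrip(String text, Charset cs) {
        // String.getBytes(Charset) replaces unmappable characters with
        // the charset's replacement byte, which is '?' for ISO-8859-1.
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String s = "I can\u2019t imagine";
        // ISO-8859-1 cannot represent U+2019
        System.out.println(roundTrip(s, StandardCharsets.ISO_8859_1)); // prints: I can?t imagine
        // UTF-8 can, so the text survives unchanged
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s)); // prints: true
    }
}
```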
I tried to insert a special character into an Oracle table via Java and then retrieve it again, assuming my encoding would work.
Below is the code I tried.
String s = new String("yesterday" + "\u2019" + "s");
...
statement.executeUpdate("INSERT into test1 values ('" + s + "')");
ResultSet rs = statement.executeQuery("select * from test1");
while (rs.next()) {
    System.out.println(new String(rs.getString(1).getBytes("UTF-8"), "UTF-8"));
}
...
Now, when I view the output by running it from the command line, the special character is always displayed as: yesterday’s
My question is: why, even after specifying the encoding, does it not show the expected result, i.e. yesterday’s? Is the code above incorrect, or is some modification required?
P.S.: In Eclipse the code may output yesterday’s, but when executed from the command line it shows yesterday’s
I am using:
-- JDK 1.6
-- Oracle: 11.1.0.6.0
-- NLS_Database_Parameters: NLS_CHARACTERSET WE8MSWIN1252
-- Windows
Edit:
\u2019 is RIGHT SINGLE QUOTATION MARK, and this is the only character I am looking for.
Check the Java property "file.encoding" when you run from the command line. It may be set to something other than "UTF-8", causing the text to display incorrectly when you output it on the command line.
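A small sketch for checking this; the class name and the optional encoding argument are made up for illustration. It prints the JVM's default encoding and then writes the character through a PrintStream with an explicit encoding. Even then, the console's code page (compare the chcp answer) has to match the encoding you choose for the glyph to display correctly.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncoding {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The JVM's platform default encoding, which System.out typically
        // uses unless told otherwise
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        // Write the character with an explicit encoding instead of the default.
        // Pass e.g. "windows-1252" or "UTF-8" to match the console's code page.
        String encoding = args.length > 0 ? args[0] : "UTF-8";
        PrintStream out = new PrintStream(System.out, true, encoding);
        out.println("yesterday\u2019s");
    }
}
```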
Here is an illustration of what I suggested in a comment (changing the character set of your client), straight from my SQL*Plus:
SQL> select unistr('\2019') from dual;
U
-
Æ
SQL> $chcp 1252
Active code page: 1252
SQL> select unistr('\2019') from dual;
U
-
’
If this works for you, you may want to add $chcp 1252 to your [g]login.sql.
The problem is that the character code for the apostrophe is \u0027.
I ran this from the command line:
public class Yesterday {
    public static void main(String[] args) {
        String s = new String("yesterday" + "\u0027" + "s");
        System.out.println(s);
    }
}
it resulted in:
yesterday's
When I run mysql -u root -p db2 < mySuperMarketDB.sql and enter the password, I get the following error:
Unknown Os Character Set 'cp720' , switching to the default character set 'latin1'
How can I fix that?
Change the code page to 1252:
C:\> chcp 1252
You can change the code page permanently as follows:
Start -> Run -> regedit
Go to [HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor]
Add new String Value named: Autorun
Change the value to 'chcp 1252'
If you are trying to run an SQL text file in mysql and get an error like this:
C:\wamp\bin\mysql\mysql5.6.12\bin\mysql.exe: Unknown OS character set 'cp862'.
C:\wamp\bin\mysql\mysql5.6.12\bin\mysql.exe: Switching to the default character set 'latin1'.
You get the error because mysql does not recognize the console's code page and falls back to the default character set. To set the default to UTF-8, add these lines to the "my.ini" file:
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
Save the file and restart the service.
Credit goes to: https://shlomovitz.blogspot.com/2013/11/unknown-os-character-set-cp862-and.html?fbclid=IwAR15ADTZd5E3cKeQX0w2vR6b5Ef8GBW86ptpd-M2CpyORp3bNkLN2QiXLek
I want to load an Arabic CSV file into MySQL.
I tried the following:
set session collation_database=utf8_general_ci;
set session character_set_database=utf8 ;
#SET NAMES utf8;
#SET character_set_database=utf8;
LOAD DATA local INFILE "D:\\trade20120314.csv"
INTO TABLE trade
CHARACTER SET utf8
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n' ;
but the Arabic columns still appear as
"?????? ????????????????".
I am sure that the CSV file is encoded in UTF-8, that the default character set of both the server and the client is utf8, and that the columns' character set and collation are utf8 as well.
I don't know where the problem is. Any suggestions?
Thanks.
Try adding ?useUnicode=true&characterEncoding=UTF-8 to the JDBC URI (or &useUnicode=... if the URI already has parameters). The driver also has to know which encoding to use on the wire. (Of course, the file itself should be in UTF-8.)
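A tiny sketch of the "? or &" detail: the first URL parameter starts with '?', later ones with '&'. The helper name withUtf8 is hypothetical, and the useSSL=false parameter in the usage line just stands in for whatever parameters your URL already has.

```java
public class JdbcUrl {
    // Append the Connector/J encoding parameters, using '?' if the URL has
    // no query string yet and '&' if it already does.
    static String withUtf8(String url) {
        String params = "useUnicode=true&characterEncoding=UTF-8";
        return url + (url.contains("?") ? "&" : "?") + params;
    }

    public static void main(String[] args) {
        // prints: jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/dbname"));
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/dbname?useSSL=false"));
    }
}
```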
I am using an Ajax call to insert Indian characters into a MySQL database, and I am facing a UTF-8 encoding problem somewhere in my application's flow.
When I insert the non-English characters directly via JDBC (not using an Ajax call), the database shows "????".
When I include
response.setCharacterEncoding("UTF-8");
request.setCharacterEncoding("UTF-8");
response.setContentType("text/html;charset=UTF-8");
in my JSP file, I get the following in my database (question marks instead of the non-English characters):
????????
When I do not include the above lines, it shows junk characters like this in the database:
મà«?àª?પà«?ષà«?àª
Whereas the actual value is
મખપષ
So the actual problem lies in, or after, sending the INSERT request to MySQL from the JSP through the JDBC connector.
I have included the following directives in all my JSP files to ensure the character encoding:
<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>
and
<meta http-equiv="Content-Type" content="text/html; charset=utf-8;charset=UTF-8">
I checked that the MySQL tables are Unicode-enabled, and I am able to enter non-English text correctly through the terminal.
How is this problem caused and how can I solve it?
Now I am able to write using an INSERT statement alone, but when I mix some queries with the INSERT statement, my application returns the following error:
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
Following are my database variables:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
| completion_type | 0 |
| concurrent_insert | 1 |
| connect_timeout | 10 |
When I insert the non-English characters directly via JDBC (not using an Ajax call), the database shows "????".
This only happens when both sides are perfectly aware of the character encoding used on each side. Any character that is not covered by the encoding used on the other side is replaced by a question mark ?. Otherwise you would have seen Mojibake.
In this particular case, those sides are the Java side and the database side, with the JDBC driver as mediator. To fix this, you need to tell the JDBC driver what encoding those characters are in. You can do that by setting the useUnicode=true&characterEncoding=UTF-8 parameters in the JDBC connection URL.
jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
Then, depending on how you're sending the parameters from the client to the server, you may also need to fix the request encoding. Given that you see Mojibake when you remove request.setCharacterEncoding("UTF-8"), you're using POST, so that part is fine.
If you were using GET to send the parameters, you would need to configure the URI encoding on the server side. It's unclear which server you're using, but in the case of Tomcat, for example, it's a matter of editing the <Connector> entry in /conf/server.xml as follows:
<Connector ... URIEncoding="UTF-8">
See also:
Unicode - How to get the characters right?
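The difference between the two symptoms in the answer above can be reproduced in a few lines of Java. This is a minimal sketch with a made-up class name, using ISO-8859-1 as the "wrong" charset: decoding UTF-8 bytes with the wrong charset gives Mojibake like the "àª" sequences in the question, while encoding into a charset that cannot represent the character gives '?'.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeVsQuestionMark {
    // Mojibake: correct UTF-8 bytes, decoded with the wrong charset.
    static String mojibake(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Question marks: text encoded into a charset that cannot represent it.
    static String questionMarks(String text) {
        return new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String gujarati = "\u0AAE"; // મ, UTF-8 bytes E0 AA AE
        // E0 AA AE read as ISO-8859-1 is "àª®"
        System.out.println(mojibake(gujarati).equals("\u00E0\u00AA\u00AE")); // prints: true
        System.out.println(questionMarks(gujarati)); // prints: ?
    }
}
```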
Please help me to solve this issue...
You need to figure out where in the processing chain things are going wrong.
You say that you have created the tables correctly and that you can enter and display text from your terminal. So you have something that is known to handle these characters correctly; try the following experiments, in this order, to isolate where things go wrong.
Using the mysql command, attempt to insert a row containing the problem characters, and then to select and display the inserted row.
Write a simple Java program to do the same, using the same JDBC URL that your application currently uses.
Modify your app to capture and log the request parameter strings it is receiving from the browser.
(If possible) capture the requests as received by your server and as sent by the browser. Check both the request parameters and the headers.
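For the logging step, it can help to log code points rather than the string itself, so that a genuine '?' (U+003F) in the data can be told apart from a character that the log viewer or terminal merely fails to display. A minimal sketch; the class and method names are hypothetical, and in the app you would log describe(request.getParameter(...)) with your own parameter name.

```java
public class CodePoints {
    // Render each code point as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // A real Gujarati letter followed by a literal question mark
        System.out.println(describe("\u0AAE?")); // prints: U+0AAE U+003F
    }
}
```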
One aside:
Place the directives together without whitespace between them. (You can even combine the attributes in a single @page directive.) The reason is that HTTP headers should be set before any HTML content is written. Because of page buffering this is not strictly necessary, but formally it is.
The other answers so far are correct.
One additional issue is the database, table and column definitions, each of which can have a default and an actual character set.
Of course, one should be really careful: મ��પ�ષ�ઠmight just be an incorrect display of correct data, as the displaying program might not be using UTF-8.