Can't save Persian characters (UTF-8) in MySQL with Java
I get the following response from Google at http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello which contains some Persian letters, and I want to save it into a MySQL database with the following code:
pageurl = new URL("http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=of");
t = pageurl.openConnection();
t.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
in = new BufferedReader(new InputStreamReader(t.getInputStream()));
preparedStatement2 = con.prepareStatement("update `en_db` set `meaning` = ? where `id` = ?");
preparedStatement2.setString(1, in.readLine());
preparedStatement2.setInt(2, id);
preparedStatement2.executeUpdate();
in.close();
but it will save something wrong in database,like [[["??","of","",""]],[["preposition",["??","?? ????","?? ???","?? ????","?? ???","?","?? ????","?? ????","?? ???","??????"],[["??",["of","from","in","by"]],["?? ????",["of"]],["?? ???",["on behalf of","of","for"]],["?? ????",["about","on","concerning","of","toward","in re"]],["?? ???",["of","with"]],["?",["of"]],["?? ????",["of"]],["?? ????",["of"]],["?? ???",["of"]],["??????",["by","via","per","of","with"]]]]],"en",,[["??",[5],0,0,1000,0,1,0]],[["of",4,,,""],["of",5,[["??",1000,0,0],["?? ??",0,0,0],["??????? ??",0,0,0],["?? ??",0,0,0]],[[0,2]],"of"]],,,,6]
if I print it by System.out.println it will show [[["از","of","",""]],[["preposition",["از","از لحاظ","از طرف","در باره","در جهت","ز","از مبدا","از منشا","در سوی","بوسیله"],[["از",["of","from","in","by"]],["از لحاظ",["of"]],["از طرف",["on behalf of","of","for"]],["در باره",["about","on","concerning","of","toward","in re"]],["در جهت",["of","with"]],["ز",["of"]],["از مبدا",["of"]],["از منشا",["of"]],["در سوی",["of"]],["بوسیله",["by","via","per","of","with"]]]]],"en",,[["از",[5],0,0,1000,0,1,0]],[["of",4,,,""],["of",5,[["از",1000,0,0],["ای از",0,0,0],["استفاده از",0,0,0],["را از",0,0,0]],[[0,2]],"of"]],,,,16]
How should I solve it?
In addition to all of the other answers: your database connection URL should be something like:
jdbc:mysql://localhost/mydatabase?useUnicode=true&characterEncoding=UTF-8
This ensures that the driver communicates in UTF-8 too.
Important
in = new BufferedReader(new InputStreamReader(t.getInputStream(), "UTF-8"));
I have seen Stack Overflow answers about Google Translate in which a request header specifying the language was set in order to receive the correct encoding, but here everything else is already fine.
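To see why the charset argument to InputStreamReader matters, here is a minimal sketch that decodes the same UTF-8 bytes twice, once as UTF-8 and once as ISO-8859-1. The class name ExplicitCharsetDemo, the helper firstLine, and the in-memory byte array standing in for the HTTP response are all assumptions made for illustration; they are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the byte array stands in for t.getInputStream().
class ExplicitCharsetDemo {

    // Reads the first line of a byte stream, decoding it with the given charset.
    static String firstLine(byte[] body, Charset charset) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(body), charset))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A short response-like string containing two Persian letters, sent as UTF-8.
        String original = "[[[\"\u0627\u0632\",\"of\"]]]";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(firstLine(body, StandardCharsets.UTF_8).equals(original));

        // Decoding with a single-byte charset turns each Persian letter into two wrong characters.
        System.out.println(firstLine(body, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

Without an explicit charset, InputStreamReader uses the platform default, so the result depends on the machine the code runs on.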
Your table fields are defined with a character set that does not support Persian characters (most probably Latin1).
You need to convert them to a character set that supports them:
ALTER TABLE en_db MODIFY meaning VARCHAR(100) CHARACTER SET UTF8;
(for each field individually), or
ALTER TABLE en_db CONVERT TO CHARACTER SET UTF8;
(for all fields).
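The question marks in the stored value are what you get when text is forced through a character set that has no Persian letters. The following sketch only mimics that effect in plain Java by encoding to ISO-8859-1 and back; it does not talk to MySQL, and the class name Latin1LossDemo and its helpers are hypothetical names chosen for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: imitates storing text in a column with a given charset.
class Latin1LossDemo {

    // Encodes the text to bytes in the given charset and decodes it again.
    // Characters the charset cannot represent are replaced with '?' by getBytes.
    static String roundTrip(String text, Charset charset) {
        byte[] stored = text.getBytes(charset);
        return new String(stored, charset);
    }

    public static void main(String[] args) {
        String persian = "\u0627\u0632 \u0644\u062d\u0627\u0638"; // two Persian words

        // Latin-1 cannot represent Persian letters, so each one is lost.
        System.out.println(roundTrip(persian, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(persian, StandardCharsets.UTF_8).equals(persian));
    }
}
```

Once a letter has been replaced by a question mark the original is gone, which is why the column has to be converted before the data is written.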
Check how your MySQL DB is configured.
Take a look at the following article: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
The encoding can be configured either globally (per DB) or per table:
CREATE TABLE `mytable` (
.................
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Related
AS400 SQL Script on a parameter file returns
I'm integrating an application to the AS400 using Java/JT400 driver. I'm having an issue when I extract data from a parameter file - the data retrieved seems to be encoded.
SELECT SUBSTR(F00001,1,20) FROM QS36F."FX.PARA" WHERE K00001 LIKE '16FFC%%%%%' FETCH FIRST 5 ROWS ONLY
Output
00001: C6C9D9C540C3D6D4D4C5D9C3C9C1D34040404040, - 1
00001: C6C9D9C5406040C3D6D4D4C5D9C3C9C1D3406040, - 2
How can I convert this to a readable format? Is there a function which I can use to decode this? On the terminal connection to the AS400 the information is displayed correctly through the same SQL query. I have no experience working with AS400 before this and could really use some help. This issue is only with the parameter files. The database tables work fine.
What you are seeing is EBCDIC output instead of ASCII. This is due to the CCSID not being specified in the database, as mentioned in other answers. The ideal solution is to assign the CCSID to your field in the database. If you don't have the ability to do so and can't convince those responsible to do so, then the following solution should also work:
SELECT CAST(SUBSTR(F00001,1,20) AS CHAR(20) CCSID 37) FROM QS36F."FX.PARA" WHERE K00001 LIKE '16FFC%%%%%' FETCH FIRST 5 ROWS ONLY
Replace the CCSID with whichever one you need. The CCSID definitions can be found here: https://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html
Since the file is in QS36F, I would guess that the file is a flat file and not externally defined, so the data in the file would have to be manually interpreted if being accessed via SQL. You could try casting the field, after you substring it, into a character format. (I don't have an S/36 file handy, so I really can't try it.)
It is the hex of the bytes of a text in EBCDIC, the AS/400 charset.
static String fromEbcdic(String hex) {
    int m = hex.length();
    if (m % 2 != 0) {
        throw new IllegalArgumentException("Must be even length");
    }
    int n = m / 2;
    byte[] bytes = new byte[n];
    // Each pair of hex digits is one EBCDIC byte.
    for (int i = 0; i < n; ++i) {
        int b = Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        bytes[i] = (byte) b;
    }
    return new String(bytes, Charset.forName("Cp500"));
}
Call it with "C6C9D9C540C3D6D4D4C5D9C3C9C1D34040404040".
To convert a file, read it with Cp500 as the charset:
Path path = Paths.get("...");
List<String> lines = Files.readAllLines(path, Charset.forName("Cp500"));
For line endings, which on the AS/400 are the NEL char, U+0085, one can use a regex:
content = content.replaceAll("\\R", "\r\n");
The regex \R will match exactly one line break, whether \r, \n, \r\n, or \u0085.
A big thank you for all the answers provided, they are all correct. It is a flat parameter file in the AS400 and I have no control over changing anything in the system. So it has to be at runtime of the SQL query or once received. I had absolutely no clue about what the code page was as I have no prior experience with AS400 and files in it. Hence all your answers have helped resolve and enlighten me on this. :) So, the best answer is the last one. I have changed the SQL as follows and I get the desired result.
SELECT CAST(F00001 AS CHAR(20) CCSID 37) FROM QS36F."FX.PARA" WHERE K00001 LIKE '16FFC%%%%%' FETCH FIRST 5 ROWS ONLY
00001: FIRE COMMERCIAL , - 1
00001: FIRE - COMMERCIAL - , - 2
Thanks once again. Dilanke
Java Special character encoding issue
I tried to insert a special character via Java into an Oracle table and then retrieve it again, assuming my encoding would work. Below is the code I tried.
String s=new String("yesterday"+"\u2019"+"s");
...
statement.executeUpdate("INSERT into test1 values ('"+s+"')");
ResultSet rs=statement.executeQuery("select * from test1");
while (rs.next()) {
    System.out.println(new String(rs.getString(1).getBytes("UTF-8"),"UTF-8"));
}
...
Now, when I try to see the output via command-line execution, it always displays a wrong special character: yesterday’s
My question is: why, even after using the encoding, is it not showing the expected result, i.e. yesterday’s? Is the above code incorrect, or is some modification required?
P.S.: In Eclipse the code might result in yesterday’s, but if executed via the command line it shows yesterday’s
I am using:
-- JDK1.6
-- Oracle : 11.1.0.6.0
-- NLS_Database_Parameters: NLS_CHARACTERSET WE8MSWIN1252
-- Windows
Edit: \u2019 is RIGHT SINGLE QUOTATION MARK, and I am looking for this character only.
Check the Java system property "file.encoding" when you run on the command line. It may be set to something other than "UTF-8", causing the text to display incorrectly when you print it there.
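A quick way to check this is to print the property and the default charset, and to compare the bytes that a PrintStream emits for the same string under different encodings. This is only a sketch: the class name ConsoleEncodingDemo and the helper printedBytes are made up for the example, and it assumes the windows-1252 charset is available in your JRE.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class for inspecting output encodings.
class ConsoleEncodingDemo {

    // Returns the bytes a PrintStream with the given encoding writes for the text.
    static byte[] printedBytes(String text, String encoding) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // What the JVM believes the platform encoding is.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String s = "yesterday\u2019s";
        // The quotation mark U+2019 is one byte (0x92) in windows-1252 and three bytes in UTF-8.
        System.out.println("windows-1252 bytes: " + printedBytes(s, "windows-1252").length);
        System.out.println("UTF-8 bytes:        " + printedBytes(s, "UTF-8").length);
    }
}
```

If the bytes written by Java are interpreted by the console under a different code page, the quotation mark shows up as some other character, so the encoding Java writes and the console's code page need to agree.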
Here is an illustration of what I suggested in a comment (change the character set of your client). Straight from my SQL*Plus:
SQL> select unistr('\2019') from dual;
U
-
Æ
SQL> $chcp 1252
Active code page: 1252
SQL> select unistr('\2019') from dual;
U
-
’
If this works for you, you may want to add $chcp 1252 to your [g]login.sql.
The problem is that the code point for the apostrophe is \u0027. I ran this on the command line:
public class Yesterday{
    public static void main(String[] args) {
        String s = new String("yesterday" + "\u0027" +"s");
        System.out.println(s);
    }
}
It resulted in: yesterday's
Unknown Os Character Set 'cp720' , switching to the default character set 'latin1'
When I run
mysql -root -p db2 <mySuperMarketDB.sql
and enter the password, I get the following error:
Unknown Os Character Set 'cp720' , switching to the default character set 'latin1'
How can I fix that?
Change the code page to 1252:
c:\> chcp 1252
You can change the code page permanently as follows:
Start -> Run -> regedit
Go to [HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor]
Add a new String Value named: Autorun
Change the value to 'chcp 1252'
If you are trying to run an SQL text file in mysql and get an error like this:
C:\wamp\bin\mysql\mysql5.6.12\bin\mysql.exe: Unknown OS character set 'cp862'.
C:\wamp\bin\mysql\mysql5.6.12\bin\mysql.exe: Switching to the default character set 'latin1'.
You get the error because the engine is using the default character set. To set the default to UTF-8, add these lines to the "my.ini" file:
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
Save the file and restart the service.
Credit goes here: https://shlomovitz.blogspot.com/2013/11/unknown-os-character-set-cp862-and.html?fbclid=IwAR15ADTZd5E3cKeQX0w2vR6b5Ef8GBW86ptpd-M2CpyORp3bNkLN2QiXLek
How to Generate Matching MD5 Hash: SQL Server nvarchar Field vs. ColdFusion
I am trying to figure out how to generate a matching MD5 hash value between SQL Server and ColdFusion. The root cause appears to be that the SQL Server field is an nvarchar datatype, which seems to mean I need to do something with the encoding of the string I would hash in ColdFusion or Java to make it match, but I am unable to figure it out. To be clear, if this were a SQL Server varchar field, everything would work. Here's the code I'm trying:
<cfset stringToHash = "Hello world!">
<cfquery name="sqlserver" datasource="#mySqlServerDSN#">
SELECT RIGHT(
    master.dbo.fn_varbintohexstr(
        HashBytes( 'MD5', CAST(<cfqueryparam value="#stringToHash#" cfsqltype="cf_sql_varchar"> AS nvarchar(max)) )
    )
, 32) AS HASHED
</cfquery>
<cfoutput>
<pre>
CF UTF-8: #hash(stringToHash, 'MD5', 'UTF-8')#
CF UTF-16: #hash(stringToHash, 'MD5', 'UTF-16')#
SQL Server: #sqlserver.hashed#
</pre>
</cfoutput>
Produces
CF UTF-8: 86FB269D190D2C85F6E0468CECA42A20
CF UTF-16: 0C89A9720D83539E3723BB99C07D069F
SQL Server: f9a6119c6ec37ce652960382f8b59f2c
So I'm guessing I need to change the final argument I'm passing to hash() to be a different encoding, but I can't figure it out. I've also tagged this question as Java, because I'm more than happy to take an answer in that language as well.
By default, SQL Server stores nvarchar fields as UTF-16 in little-endian byte order, so in ColdFusion you must use the 'UTF-16LE' character set.
<cfscript>
    helloWorld = "Hello, World!";
    utf16leHashCF = lcase(hash(helloWorld, 'MD5', 'UTF-16LE'));
</cfscript>
<cfoutput>
    #utf16leHashCF# <br />
</cfoutput>
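Since the question also asks for a Java answer, the same idea can be sketched in plain Java: hash the bytes of the string in the encoding the database uses. The class name NvarcharMd5Demo and the helper md5Hex are hypothetical names for this example, and the claim that the UTF-16LE digest lines up with SQL Server's nvarchar hash rests on the answer above rather than on a test against a real server.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: MD5 over the bytes of a string in a chosen encoding.
class NvarcharMd5Demo {

    // Returns the lowercase hex MD5 digest of the text encoded with the given charset.
    static String md5Hex(String text, Charset charset) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(text.getBytes(charset));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // keeps leading zeros
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String text = "Hello world!";
        // Comparable to hashing a varchar value.
        System.out.println("UTF-8:    " + md5Hex(text, StandardCharsets.UTF_8));
        // Comparable to hashing an nvarchar value (UTF-16, little-endian, no byte order mark).
        System.out.println("UTF-16LE: " + md5Hex(text, StandardCharsets.UTF_16LE));
    }
}
```

The hash function itself is identical in both cases; only the bytes fed into it differ, which is why the same visible string produces different digests.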
I'm curious why your sql server column is nvarchar; it's not necessary for hashes. nvarchar is for storing extended character sets, which you shouldn't be getting back from a hash function. Regardless, I tried all of the hash algorithms available in CF9 and none of them generate the hash you're looking for. Unless you need to keep the column set to nvarchar for some reason you haven't already explained, why not change it to varchar?
I don't think it's the CF hashing because if you compare the CF to Java they create the same hash. Both the CF & Java output "65a8e27d8879283831b664bd8b7f0ad4" on my box and it matched the SQL hash when I changed the cast to varchar(32). In the past when I've needed to do any sort of hash creation and comparison, I created a service that returns a string so you don't have to worry about cross platform algorithm issues. You could also just have sql do it all for you, but then you have the business logic in the wrong layers but to each their own.
<cfscript>
    helloWorld = "Hello, World!";
    javaString = CreateObject( "java", "java.lang.String" ).Init(helloWorld);
    javaHash = CreateObject( "java", "java.security.MessageDigest" ).getInstance("MD5");
    javaHash.reset();
    javaHash.update(javaString.getBytes("UTF-8"),0,javaString.length());
    javaBigInt = CreateObject( "java", "java.math.BigInteger" ).Init(1,javaHash.digest());
    utf8HashCF = lcase(hash(helloWorld, 'MD5', 'UTF-8'));
    utf8HashJava = variables.javaBigInt.toString(16);
</cfscript>
<cfoutput>
    #utf8HashCF# <br />
    #utf8HashJava#
</cfoutput>
Java mail vs. MySQL: probably a character encoding issue?
I'm currently fetching data from a MySQL database using JDBC and executeQuery. One of the fields contains the email content, which I fetch via ResultSet.getString("emailBody"). The mail is sent using the following code (simplified):
Properties props = new Properties();
Session session;
Message message;
props.put("mail.smtp.host", "mysmtpserver");
session = Session.getInstance(props, null);
message = new MimeMessage(session);
message.setFrom(new InternetAddress("myaddress#example.com", "System"));
message.setSubject("Automatic notification");
message.setRecipient(RecipientType.BCC, new InternetAddress("admin#example.com", "Admin Distribution List"));
// email contains the previously fetched value
message.setContent(email, "text/plain");
Transport.send(message);
This works fine for all characters, including German umlauts, brackets, etc. Unfortunately the following characters fail:
– which is displayed as ? on the mail clients
" which becomes \"
' which is sent as \'
I couldn't find anything useful on the web, please advise. Many thanks!
Your mail is probably sent encoded as iso-8859-1, which does not include the code point for the en dash. You could try to specify the charset as utf-8 in the setContent call:
message.setContent(email, "text/plain; charset=utf-8");
This does not, however, explain the problem with quotes you are seeing, but I guess these are actually two different problems.
" turning into \" and ' turning into \' is an escaping issue. During the insert those values were escaped so they wouldn't break the SQL insert query. During the select you have to unescape them. (I don't know the specific Java functions.)
The quotes problem is happening because at some point the strings are being escaped one too many times. If you select the strings from the database manually, does this return quotes visually escaped? If so, you're escaping too many times before inserting into the database.
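On the en dash part of the problem, the claim that iso-8859-1 cannot represent it while it can represent umlauts is easy to check with the JDK's charset encoders. This is a small sketch with hypothetical names (EnDashEncodableDemo, canEncode); it does not involve JavaMail and says nothing about which charset the mail was actually sent in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: asks a charset whether it can represent a character.
class EnDashEncodableDemo {

    // True if the given charset has an encoding for the character.
    static boolean canEncode(char c, Charset charset) {
        return charset.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char enDash = '\u2013';
        char uUmlaut = '\u00fc';

        System.out.println("ISO-8859-1, u umlaut: " + canEncode(uUmlaut, StandardCharsets.ISO_8859_1));
        System.out.println("ISO-8859-1, en dash:  " + canEncode(enDash, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8, en dash:       " + canEncode(enDash, StandardCharsets.UTF_8));
    }
}
```

A character the target charset cannot encode is typically replaced with a question mark, which matches what the mail clients display.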