Escaping issue with MySQL JDBC connector - java

So I'm trying to input blog comments into a database for an NLP experiment but I'm having some issues: I'm using prepared statements on the inserts, but all the single quotes are turning into question marks.
I'm testing on OS X and don't know the character encoding: I assume it's the default latin1_swedish_ci, etc., but after a few hours of scattered Googling I haven't been able to figure out how to determine it. I'm submitting something like "I didn't say that" as a param to
PreparedStatement statement = connect.prepareStatement("INSERT IGNORE INTO bwog.article (article_id, date, title, content, url) VALUES (?, ?, ?, ?, ?)");
...
...
String s = "I didn't say that"; //not literal string, but printlns like this
statement.setString(4, s);
and it's turning into "I didn?t say that" in the database after execution and all that.
I assume there is some precondition that I didn't know about or forgot to fulfill.
SOLUTION: It was character encoding. Database and tables were in UTF-8 but command line connection was in latin1 for all the "character_set%" variables, so even though the data was fine it appeared garbled.

In order to remove this from the "Unanswered" filter...
Prediction: Your problem is character encoding. I bet your database and tables are in UTF-8 but your command line connection is in latin1 for all the "character_set%" variables, so even though the data is fine it appears garbled.
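For anyone wondering how to determine those variables from the Java side, here is a minimal sketch that runs the same "character_set%" lookup over the JDBC connection itself. The class name, the method names, and the choice of which four variables to compare are assumptions made for illustration; the JDBC URL, user, and password are supplied on the command line, and a MySQL driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CharsetVariablesSketch {

    // MySQL returns one (name, value) row per matching variable.
    static final String QUERY = "SHOW VARIABLES LIKE 'character_set%'";

    // Variables worth comparing with each other (an illustrative selection).
    static final String[] COMPARED_KEYS = {
        "character_set_client", "character_set_connection",
        "character_set_results", "character_set_database"
    };

    /** Reads the character_set% variables as seen by this particular connection. */
    static Map<String, String> readCharsetVariables(Connection connection) throws SQLException {
        Map<String, String> variables = new LinkedHashMap<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(QUERY)) {
            while (rows.next()) {
                variables.put(rows.getString(1), rows.getString(2));
            }
        }
        return variables;
    }

    /** Collects the distinct values of the compared variables; more than one hints at a mismatch. */
    static Set<String> distinctCharsets(Map<String, String> variables) {
        Set<String> distinct = new TreeSet<>();
        for (String key : COMPARED_KEYS) {
            String value = variables.get(key);
            if (value != null) {
                distinct.add(value);
            }
        }
        return distinct;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: CharsetVariablesSketch <jdbc-url> <user> <password>");
            return;
        }
        try (Connection connection = DriverManager.getConnection(args[0], args[1], args[2])) {
            Map<String, String> variables = readCharsetVariables(connection);
            variables.forEach((name, value) -> System.out.println(name + " = " + value));
            System.out.println("charsets in use: " + distinctCharsets(variables));
        }
    }
}
```

Note that the values belong to the connection that asks: the command line client and the JDBC driver can report different settings against the same server.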

Related

Can write but cannot read Unicode from MYSQL table using java and jdbc

I can successfully write the Unicode text "привет моя работа программист" to a MySQL database table using Java/JDBC.
When I query the table from the mysql command prompt on Windows 10, I see the exact text in the table.
However, when I read the text back using Java JDBC, the text from the result set is as follows:
привет Ð¼Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð° программиÑÑ‚
The url I use to call is
jdbc:mysql://localhost/dbname?useUnicode=true&characterEncoding=utf-8
I use the following code
PreparedStatement ps = con.prepareStatement(SELECT_STATEMENT_EMAIL);
ps.setString(1, idemail);
ps.setString(2, password);
ResultSet res = ps.executeQuery();
if (res.next()) {
String description = res.getString("description");
}
I have converted the database and database table to utf8 using the following commands
ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Can anyone point me in the right direction?
Change the file encoding to UTF8.
Most likely, you have IntelliJ IDEA that likes to change the encoding of the file...
I fear you have two problems.
привет м sounds like Mojibake.
In trying to decode that, I get привет мо� работа программи�т. The black diamonds usually come from the wrong <meta...> on the output page. But they must be coming from somewhere else.
Never mind. I see that я is D18F. But 8F, treated as latin1 seems to be a non-printing character, thereby messing up the cadence of 2-byte utf8 codes, leading to the black diamond.
The decoding was BINARY(CONVERT(col USING latin1)), but I suspect that cannot be relied upon.
Mojibake usually comes from
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
xx The column in the table was declared CHARACTER SET latin1. (Or possibly it was inherited from the table/database.) (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
So, that gives you two things to fix. And, sorry, but I don't think the data already in the table can be fixed. However, I could look deeper, if you provide SELECT col, HEX(col) FROM ... WHERE ... showing either that string, or at least some string with я or с in it.
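To make the mechanism concrete, here is a small sketch in plain Java (no database involved) of how UTF-8 bytes read as latin1 produce this kind of Mojibake, and how the reverse conversion recovers the text when no byte was lost along the way. The class and method names are made up for illustration, and ISO-8859-1 stands in for MySQL's latin1, which is an assumption: the two differ for some bytes in the 0x80 to 0x9F range.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {

    /** Encodes text as UTF-8, then misreads those bytes as ISO-8859-1 (latin1). */
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Reverses garble(): turns the latin1 characters back into bytes and reads them as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Upper-case hex of the UTF-8 bytes of the text. */
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Two Cyrillic words from the question, written as Unicode escapes so
        // that the encoding of this source file cannot interfere.
        String original = "\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u043e\u044f";
        String garbled = garble(original);
        System.out.println("UTF-8 hex of U+044F: " + utf8Hex("\u044f"));
        System.out.println("garbled text: " + garbled);
        System.out.println("repair restores original: " + repair(garbled).equals(original));
    }
}
```

The repair only works because ISO-8859-1 maps every byte to some character. If a byte such as 8F was dropped or replaced in between, the original text cannot be reconstructed this way.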

What's wrong with my JDBC sql statement

I'm writing a Java socket app that allows a client to communicate with a server; one of the other requirements is that it also needs to initialize JDBC. I believe I have written my JDBC connection method correctly, and my insert statement has worked like this on similar projects. It might be a simple mistake, as I'm not using an IDE. Can someone tell me what is wrong with my SQL statement? All the info is right, but it won't compile.
Error:
C:\Users\imallin\My Documents> javac provider.java
Provider.java:88 ';' expected
String sql = "Insert INTO 'users' ('ID', 'firstName') VALUES ("123","123")";
Your immediate problem is that you need to escape the double quotes that are in your string. When the compiler sees another ", it thinks it has reached the end of the string literal and expects a semicolon.
String sql = "Insert INTO 'users' ('ID', 'firstName') VALUES (\"123\",\"123\")";
Now that the Java compiler is happy, you will have SQL-related issues.
In general with SQL, you will want to use single quotes to represent a string. It appears MySQL specifically allows double quotes, but only when the ANSI_QUOTES SQL mode is not set. So it is best to use single quotes to represent strings here.
Here is what you probably want, assuming that the ID column is an integer, and that the firstName column is a string/varchar.
String sql = "Insert INTO users (ID, firstName) VALUES (123,'123')";
To slightly differ from the other answers that have been posted, you need to not use double quotes in your SQL. The single quotes you've used are all in the wrong places, and the double quotes are simply not allowed. Your statement should look like
String sql = "Insert INTO users (ID, firstName) VALUES ('123','123')";
It looks like you haven't escaped the double quotes in your SQL statement. Java sees your string as finishing before the first 123.
In the line:
String sql = "Insert INTO 'users' ('ID', 'firstName') VALUES ("123","123")";
The double quoted string ends after VALUES (, and is immediately followed by a numeric token. That's illegal in Java. The immediate fix is to add backslashes:
String sql = "Insert INTO 'users' ('ID', 'firstName') VALUES (\"123\",\"123\")";
Though this would also work (assuming it's talking about integers, not strings):
String sql = "Insert INTO 'users' ('ID', 'firstName') VALUES (" + 123 + "," + 123 + ")";
More generally though, what's wrong with it is that you're doing an INSERT without using parameterization. This is virtually always the wrong thing in real code! JDBC has good support for parameterized queries, which you should use.
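A minimal sketch of what the parameterized version might look like, assuming (as another answer here does) that ID is an integer column and firstName is a string column. The class name, method name, and command-line handling are made up for illustration, and a suitable JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ParameterizedInsertSketch {

    // No quotes of any kind around the placeholders: the driver sends the
    // values separately, so neither Java nor SQL quoting rules get in the way.
    static final String INSERT_USER = "INSERT INTO users (ID, firstName) VALUES (?, ?)";

    /** Inserts one row; assumes ID is an integer column and firstName a string column. */
    static int insertUser(Connection connection, int id, String firstName) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_USER)) {
            statement.setInt(1, id);
            statement.setString(2, firstName);
            return statement.executeUpdate(); // number of affected rows
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: ParameterizedInsertSketch <jdbc-url> <user> <password>");
            return;
        }
        try (Connection connection = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println("rows inserted: " + insertUser(connection, 123, "123"));
        }
    }
}
```

Because the SQL text contains no literal values, the original compile error cannot occur and the single-versus-double quote question goes away.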
You can use single quotes instead.
"Insert INTO users (ID, firstName) VALUES ('123','123')";

Java PreparedStatement setString changes characters

As in the title: while debugging my application, I found that on the lines where I put strings into the PreparedStatement variable, special characters change to "?". I don't know where to look for a fix, so I don't know if code is required. Anyway, I'll put some here:
PreparedStatement stm = null;
String sql = "";
try{
sql = "INSERT INTO methods (name, description) VALUES (?, ?)";
stm = connection.prepareStatement(sql);
stm.setString(1, method.getName());
stm.setString(2, method.getDescription());
//...
}catch(Exception e){}
While debugging, the 'name' field was correct in the method object, but after adding it to the stm variable, its characters changed to '?'.
I have found one topic about a similar situation on SO, but there wasn't any answer that could help me, since I know exactly that something goes wrong when adding the string to the statement, not in the database. But I don't know what.
Any suggestions?
PS. I'm using NetBeans version 6.7.1.
EDIT: I was debugging with standard netbeans debugger, and was checking state of variables before adding strings to 'stm' variable. I was even changing getName() method to static string with special characters. So for sure everything is ok with Method class.
EDIT2: I've made one more test. I checked the stm variable, and one of its properties is "charEncoding", which is set to "cp1252". So the main question is: how do I change that?
This normally happens when different charsets are used in different locations. It sounds like you're getting your input as UTF-8 and converting it to another charset (maybe your database is set to something else), which breaks the special characters.
To fix this: use the same charset everywhere*. (I would recommend using UTF-8.)
*Take a look at this or my answer to another thread (that's about a problem in PHP, but in Java it's almost the same).
Sounds like a character encoding issue to me. Perhaps the driver is transcoding your strings into the appropriate encoding for the field/table/schema/database rather than letting the server do it? If you are trying to store a character which has no representation in the encoding of the field/table/schema/database, that would explain the '?' characters.
Are you using Oracle? I have had similar situations, if the environment variables regarding character sets weren't defined correctly.
By default, an Oracle connection is ASCII (7-bit characters, A-Z, a-z, numbers, punctuation, ...). If you use any character outside of that (e.g. European accents, Chinese characters, ..) then you need to use something other than ASCII. UTF-8 is best. If you don't, your characters will get replaced by "?".
You'd need to get your sysadmin to set this up for you. Alternatively take a look here:
http://arjudba.blogspot.com/2009/02/what-is-nlslang-environmental-variable.html

Java PreparedStatement UTF-8 character problem

I have a prepared statement:
PreparedStatement st;
and in my code I try to use the st.setString method.
st.setString(1, userName);
The value of userName is şakça. The setString method changes 'şakça' to '?akça'. It doesn't recognize UTF-8 characters. How can I solve this problem?
Thanks.
The number of ways this can get screwed up is actually quite impressive. If you're using MySQL, try adding a characterEncoding=UTF-8 parameter to the end of your JDBC connection URL:
jdbc:mysql://server/database?characterEncoding=UTF-8
You should also check that the table / column character set is UTF-8.
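If you would rather not append parameters to the URL, the same settings can be passed as connection properties. This is only a sketch: the class and method names are made up, the property names useUnicode and characterEncoding are the MySQL driver parameters used in the URLs on this page, and the JDBC URL and credentials are supplied on the command line.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class Utf8ConnectionSketch {

    /** Builds driver properties intended to match ?useUnicode=true&characterEncoding=UTF-8. */
    static Properties utf8Properties(String user, String password) {
        Properties properties = new Properties();
        properties.setProperty("user", user);
        properties.setProperty("password", password);
        properties.setProperty("useUnicode", "true");
        properties.setProperty("characterEncoding", "UTF-8");
        return properties;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: Utf8ConnectionSketch <jdbc-url> <user> <password>");
            return;
        }
        Properties properties = utf8Properties(args[1], args[2]);
        try (Connection connection = DriverManager.getConnection(args[0], properties)) {
            System.out.println("connection open: " + !connection.isClosed());
        }
    }
}
```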
Whenever a database changes a character to ?, it simply means that the code point of the character in question is outside the range of the character encoding the table is configured to use.
As to the cause of the problem: ç lies within the ISO-8859-1 range and has exactly the same code point there as in Unicode (U+00E7). However, the Unicode code point of ş lies completely outside the range of ISO-8859-1 (U+015F, while ISO-8859-1 only goes up to U+00FF). The DB won't persist the character and replaces it with ?.
So, I suspect that your DB table is still configured to use ISO-8859-1 (or one of the other compatible ISO-8859 encodings where ç has the same code point as in Unicode).
The Java/JDBC API is doing its job perfectly fine with regard to character encoding (Java uses Unicode all the way) and the JDBC DB connection encoding is also configured correctly. If Java/JDBC had incorrectly used ISO-8859-1, the persisted result would have been Å, an invisible control character, and then akÃ§a (ş consists of the bytes 0xC5 and 0x9F, which represent Å and a non-printing control character in ISO-8859-1, and ç consists of the bytes 0xC3 and 0xA7, which represent Ã and § in ISO-8859-1).
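The two outcomes can be reproduced in plain Java without a database. The sketch below uses made-up class and method names and uses Java's ISO-8859-1 charset as a stand-in for whatever the table is actually configured with, which is an assumption about the setup.

```java
import java.nio.charset.StandardCharsets;

final class Latin1RangeDemo {

    /** True if ISO-8859-1 has a byte for this character. */
    static boolean fitsLatin1(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    /** What survives when text is forced through ISO-8859-1: unmappable characters become '?'. */
    static String storeAsLatin1(String text) {
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    /** What appears when correct UTF-8 bytes are misread as ISO-8859-1. */
    static String misreadUtf8AsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The user name from the question, written as Unicode escapes so that
        // the encoding of this source file cannot interfere.
        String name = "\u015fak\u00e7a";
        System.out.println("U+015F fits ISO-8859-1: " + fitsLatin1('\u015f'));
        System.out.println("U+00E7 fits ISO-8859-1: " + fitsLatin1('\u00e7'));
        System.out.println("forced through ISO-8859-1: " + storeAsLatin1(name));
        System.out.println("UTF-8 misread as ISO-8859-1: " + misreadUtf8AsLatin1(name));
    }
}
```

A single ? with an intact ç points to the first case (a target encoding that lacks ş), while two-character sequences such as Ã§ in place of ç point to the second.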
setString methods changes 'şakça' to '?akça'
How do you know that setString changes this? Or do you see the content in the database and decide this?
It could be that the database is not configured for UTF-8, or simply that the tool you use to see the contents of the database (SQL*Plus for Oracle...) is not capable of displaying UTF-8.
You can use a query like the one below to set Unicode strings in a prepared statement.
PreparedStatement st = conn.prepareStatement("select * from users where username = unistr(?)"); // unistr is an Oracle function
st.setString(1, userName);

DB2 database using unicode

I have a problem with DB2 databases that should store unicode characters. The connection is established using JDBC.
What do I have to do if I would like to insert a unicode string into the database?
INSERT INTO my_table(id, string_field) VALUES(1, N'my unicode string');
or
INSERT INTO my_table(id, string_field) VALUES(1, 'my unicode string');
I don't know if I have to use the N-prefix or not. For most of the databases out there it works pretty well when using it but I am not quite sure about DB2. I also have the problem that I do not have a DB2 database at hand where I could test these statements. :-(
Thanks a lot!
The documentation on constants (as of DB2 9.7) says this about graphic strings:
A graphic string constant specifies a varying-length graphic string consisting of a sequence of double-byte characters that starts and ends with a single-byte apostrophe ('), and that is preceded by a single-byte G or N. The characters between the apostrophes must represent an even number of bytes, and the length of the graphic string must not exceed 16 336 bytes.
I have never heard of this in the context of DB2. Google tells me that this is more MS SQL Server specific. In DB2 and every other decent RDBMS you only need to ensure that the database is using the UTF-8 charset. You normally specify that in the CREATE statement. Here's the DB2 variant:
CREATE DATABASE my_db USING CODESET UTF-8;
That should be it on the DB2 side. You don't need to change the standard SQL statements for that. You also don't need to worry about Java as it internally already uses Unicode.
Enclosing the Unicode string constant in N'' worked in a JDBC application against a DB2 database.
