Getting question marks when inserting Hebrew characters into a MySQL table - java

I'm using NetBeans to build a web application with Java and JSP that handles a database with Hebrew fields.
The DDL is as follows:
String cityTable = "CREATE TABLE IF NOT EXISTS hebrew_test.table ("
+"id int(11) NOT NULL AUTO_INCREMENT,"
+"en varchar(30) NOT NULL,"
+"he varchar(30) COLLATE utf8_bin NOT NULL,"
+"PRIMARY KEY (id)"
+") ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;";
String insert = "INSERT INTO hebrew_test.table (en, he) VALUES ('A','a')";
String insert2 = "INSERT INTO hebrew_test.table (en, he) VALUES ('B','ב')";
String insert3 = "INSERT INTO hebrew_test.table (en, he) VALUES ('C','אבג')";
executeSQLCommand(cityTable);
executeSQLCommand(insert);
executeSQLCommand(insert2);
executeSQLCommand(insert3);
The output table I get:
1 A a
2 B ?
3 C ???
Instead of:
1 A a
2 B ב
3 C אבג
I tried Hebrew appears as question marks in Netbeans, but that isn't the same problem: here the question marks end up in the table itself.
Also, I defined the table with the utf8_bin collation, as you can see in the code above.

You need to tell the JDBC driver to use UTF-8 when encoding the characters of the SQL query into bytes. You can do that by adding the useUnicode=yes and characterEncoding=UTF-8 query parameters to the JDBC connection URL.
jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8
Otherwise it will use the operating system's platform default charset. The MySQL JDBC driver is itself well aware of the encoding used on both the client side (where the JDBC code runs) and the server side (where the DB table lives). Any character not covered by the charset used by the DB table will be replaced by a question mark.
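For example, the connection could be created like this (a minimal sketch; user and password are placeholders, not values from the question):
String url = "jdbc:mysql://localhost:3306/hebrew_test"
        + "?useUnicode=yes&characterEncoding=UTF-8";
// The two URL parameters make the driver encode the query text as UTF-8.
try (Connection con = DriverManager.getConnection(url, "user", "password")) {
    // create the table and run the inserts here
}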
See also:
Spring Encoding with CharacterEncodingFilter in web.xml

You're including your values directly into the SQL. That's always a bad idea. Use a PreparedStatement, parameterized SQL, and set the values as parameters. It may not fix the problem - but it's definitely the first thing to attempt, as you should be using parameterized SQL anyway. (Parameterized SQL avoids SQL injection attacks, separates code from data, and avoids unnecessary conversions.)
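For example, the third insert from the question could be rewritten like this (a sketch assuming an open Connection named connection):
String sql = "INSERT INTO hebrew_test.table (en, he) VALUES (?, ?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setString(1, "C");
    ps.setString(2, "אבג"); // the driver handles the encoding; no escaping needed
    ps.executeUpdate();
}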
Next, you should work out exactly where the problem is really occurring:
Make sure that the value you're trying to insert is correct.
Check that the value you retrieve is correct.
Check what's in your web response using Wireshark: verify both the declared encoding and what's actually in the data.
When checking the values, you should iterate over each character in the string and print out the value as a UTF-16 code unit (either use toCharArray() or use charAt() in a loop). Just printing the value to the console leaves too much chance of other problems.
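For example (a sketch; value stands for the string at whichever point you're checking):
for (char c : value.toCharArray()) {
    System.out.printf("U+%04X ", (int) c); // א, for instance, should print as U+05D0
}
System.out.println();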
EDIT: For a little context on why I wrote this as an answer:
In my experience, passing string values as parameters rather than embedding them directly in the SQL can sometimes avoid issues like this (and is of course better for security reasons etc).
In my experience, diagnosing whether the problem is on the database side or the web side is also important. That diagnosis is best done by logging the exact UTF-16 code units being used, not just the strings (as otherwise further encoding issues can creep in during logging or console output).
In my experience, problems like this can easily occur at either insert or read code paths.
All of this is important as a way of moving the OP forward, not just in a comment-like request for more information.

Related

Java / Sql-server parameter binding does not work as expected

We notice strange behaviour in our application concerning bind parameters. We use Java with JDBC to connect to a SQL Server database. In a table cell we have the value 'µ', and we compare it with a bind parameter that is also set to the value 'µ'.
Now, in an SQL statement like "... where value != ?", where 'value' holds the 'µ' in the database and ? is the bind variable, also set to 'µ', we notice that we get a record back, though we would expect 'µ' to equal 'µ'.
The method that we use to fill the bind parameter is java.sql.PreparedStatement.setString(int, String).
Some facts:
The character value of µ in different encodings is:
ISO-8859-1 (Latin-1) : 0xB5
UTF-8 : 0xC2B5
UTF-16 (= Java) : 0x00B5
Now I did some investigating to see which bytes the database actually receives. Therefore I tried an SQL statement like this:
select convert(VARBINARY(MAX), value), -- selects µ from database table
convert(VARBINARY(MAX), N'µ'), -- selects µ from literal
convert(VARBINARY(MAX), ?) -- selects µ from bind parameter
from ...
The result for the three values is:
B500
B500
C200B500 <-- Here is the problem!
So, the internal representation of µ in the database and as NVARCHAR literal is B500.
Now we can't understand what is going on here. We have the value 'µ' in a Java variable (which should internally be 0x00B5). When it is passed as a bind variable, it seems to be converted to UTF-8 (giving the byte sequence 0xC2 0xB5), and the database then treats those bytes as two characters, producing the character sequence C200B500.
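That suspicion is easy to reproduce in isolation (a sketch; the exact charsets involved are an assumption based on the symptoms):
byte[] utf8 = "µ".getBytes(java.nio.charset.StandardCharsets.UTF_8); // 0xC2 0xB5
String misread = new String(utf8, java.nio.charset.StandardCharsets.ISO_8859_1);
// misread is "Âµ": two characters, 0x00C2 and 0x00B5 as UTF-16 code units,
// which SQL Server stores little-endian as C200B500 -- exactly the value above.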
To make things even more confusing:
(1) On another machine with a different database, the same code works as expected. The result of the three lines is B500/B500/B500, so the bind variable is converted to a proper B500.
(2) On the same machine and the same database, a different program (using the same JDBC driver library and the same connection parameters) also works as expected, giving the result B500/B500/B500.
Some additional facts, maybe they are important:
The database is SQL Server 2014
Java is Java 7
The application in question is a webapp running in Tomcat 7
The JDBC library is sqljdbc 4.2
Any help to sort this out is greatly appreciated!
I now found the solution. It did not have anything to do with SQL Server or parameter binding at all, but instead...
Tomcat 7 does not run in UTF-8 mode by default (I wasn't aware of that). The µ we are talking about comes from another application that provides this value via webservice calls. That application, however, uses UTF-8 by default. So it was sending a UTF-8 µ, but the webservice did not expect UTF-8, took the two bytes for two characters, and filled the internal String variable with the characters 0xC2 and 0xB5 (which is, for SQL Server, C200B500).
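One common fix (a sketch of the general technique, not necessarily the exact change made here) is to force UTF-8 on incoming requests before any parameter is read, for example with a servlet filter:
public class ForceUtf8Filter implements javax.servlet.Filter {
    public void doFilter(javax.servlet.ServletRequest req,
                         javax.servlet.ServletResponse res,
                         javax.servlet.FilterChain chain)
            throws java.io.IOException, javax.servlet.ServletException {
        if (req.getCharacterEncoding() == null) {
            req.setCharacterEncoding("UTF-8"); // decode request bodies as UTF-8
        }
        chain.doFilter(req, res);
    }
    public void init(javax.servlet.FilterConfig config) {}
    public void destroy() {}
}
Alternatively, setting URIEncoding="UTF-8" on Tomcat's connector covers URL parameters.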

JTDS: Unicode parameters using CallableStatement with sendStringParametersAsUnicode=false

We have encountered the performance issues described in the JTDS documentation regarding index scans (SQL Server 2000 and upwards), and have therefore had to set the sendStringParametersAsUnicode parameter to false.
This is fine for 99.9% of our cases; however, we have an application that relies on Unicode data in an ntext field. We write to the table in question using a stored procedure, which has an NTEXT parameter. Since changing the above setting, our Unicode strings are translated to '?' characters, which is not particularly useful.
I have fiddled with various things, including:
setObject(1, unicode_string, Types.NCLOB); //as well as NVARCHAR
stmt.setUnicodeStream(1, new ByteArrayInputStream(unicode_string.getBytes("UTF16")), unicode_string.length());
setNClob(1, unicode_string);
None of these works, however. Any ideas?
One workaround (though it's not the correct answer) is to use a Statement rather than a CallableStatement:
stmt = cn.createStatement();
stmt.execute("INSERT INTO test_unicode (my_unicode) VALUES (N'" + input + "')");
This however presents a significant performance overhead.

Escape sequence when adding multiple records to DerbyDB

I'm converting (or trying to convert) an MS Access DB into Derby.
When I extract the data from certain varchar / text / memo fields in Access, they are filled with apostrophes, mathematical symbols (percent, less than, etc.), and possibly foreign characters.
I need to keep these, and I test for them so that I can use an 'escape sequence' to make sure they get put into the database.
For now, however, I am unable to get the data into the DB without it failing on these fields. When the SQL fails I output the SQL string and cut and paste it into ij. Then I modify just the first record, and it is always these characters that cause me grief.
I've tried to modify the strings by surrounding them with "double quote marks", but that just gives a different error (stating that it 'encountered """ at line 1, column x', where x is always the first occurrence of the double quote).
I haven't yet found a setting in Derby to alter this behaviour for strings. Is there one?
I have also tried turning the SQL statement into a PreparedStatement and then using {call preparedStatement}; again this fails. I can't use {escape "escape char"} in a normal statement either, as Derby just reports incorrect syntax.
How do others manage to get user content with strange characters into a field in Derby?
Do I need to change my field into a CLOB or something other than varchar / long varchar?
Are my problems caused by using the wrong character set (e.g. ISO-8859-1 rather than UTF-8)? How do I tell which it is, and how do I change it?
Below is a sample of the SQL insert that fails when I send it to Derby (via my Java program):
insert into S1.SORTIEDESSAI (OBS, DATEDUSORTIE, CONTREINDIC, FIN,
PDEVU, REFUS, INVDECISN, ADMIN, MOTIF_DE_LA_SORTIE, NOMVALIDEE,
DATEVALIDEE) values ('"0001/0001"' , '2007-07-15' , false , true ,
'"null"' , '"null"' , '"null"' , '"null"' , '"2. FIN DE L’ESSAI"' ,
'"DR SIMON"' , '2011-04-19' )
Note:
Actually, I look at the above and notice that the order of the column names isn't good? It was OK yesterday; not sure why it would have changed. Something to do with Access returning the column names in a random order from the resultSetMetaData, which would be a surprise.
For now I recommend that any further answers hold off whilst I sort this problem out. OK, solved that problem; do I need to open another question about this behaviour?
Back to the main thread...
OK, as you can see in my SQL statement, I have wrapped the varchar fields in double quotes. This always fails (even directly through ij). help help help...
I'm not quite sure what your question is, but in general you can input these characters by using a PreparedStatement of the form INSERT INTO tablename (columnname) VALUES (?), and then using the PreparedStatement.setString() method to supply your character data for that column.
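Applied to the failing insert from the question, that might look like this (a sketch using just two of the columns; conn is assumed to be an open Derby connection):
String sql = "INSERT INTO S1.SORTIEDESSAI (OBS, MOTIF_DE_LA_SORTIE) VALUES (?, ?)";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, "0001/0001");
    ps.setString(2, "2. FIN DE L’ESSAI"); // apostrophes and symbols need no escaping
    ps.executeUpdate();
}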

Error Inserting Java Character object value into Oracle CHAR(1) column

I'm using a Spring jdbcTemplate.update(String sql, Object[] args) to execute a prepared insert statement on an Oracle database. One of the objects is a Character object containing the value 'Y', and the target column is of CHAR(1) type, but I'm receiving a
java.sql.SQLException: Invalid column type
exception.
I've debugged this backwards and forwards and there is no doubt that this one particular object is causing the problem. The insert executes as expected when the Character object is omitted.
I can also output the sql and Object[] values, copy the sql into sql developer, replace the value placeholders (?'s) with the actual values of the Objects, and the insert will work fine.
The sql (obfuscated to protect the guilty):
INSERT INTO SCHEMA.TABLE(NUMBER_COLUMN,VARCHAR_COLUMN,DATE_COLUMN,CHAR_COLUMN) VALUES (?,?,?,?);
The object values:
values[0] = [123]
values[1] = [Some String]
values[2] = [2012-04-19]
values[3] = [Y]
The combination run manually in sql developer and that works just fine:
INSERT INTO SCHEMA.TABLE(NUMBER_COLUMN,VARCHAR_COLUMN,DATE_COLUMN,CHAR_COLUMN) VALUES (123,'Some String','19-Apr-2012','Y');
The prepared statement sql itself is generated dynamically based on the non-null instance variable objects contained within a data transfer object (we want the database to handle generation of default values), so I can't accept any answers suggesting that I just rework the sql or insertion routine.
Has anyone ever encountered this and can explain what's going on and how to fix it? It's frustratingly bizarre that I can't seem to insert a Character object into a CHAR(1) field. Any help would be much appreciated.
Sincerely, Longtime Lurker First-time Poster
There is no PreparedStatement.setXxx() method that takes a character value, and the Oracle docs state that all JDBC character types map to Java Strings. See also http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html#1039196, which does not include a mapping from Java char or Character to a JDBC type.
You will have to convert the value to a String.
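In the jdbcTemplate call from the question, that means converting the Character before building the argument array (a sketch; dateValue is illustrative):
Object[] values = new Object[] { 123, "Some String", dateValue,
        Character.toString('Y') }; // pass a String, not a Character
jdbcTemplate.update(sql, values);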

Hibernate and padding on CHAR primary key column in Oracle

I'm having a little trouble using Hibernate with a char(6) column in Oracle. Here's the structure of the table:
CREATE TABLE ACCEPTANCE
(
USER_ID char(6) PRIMARY KEY NOT NULL,
ACCEPT_DATE date
);
For records whose user id has fewer than 6 characters, I can select them without padding the user id when running queries in SQuirreL, i.e. the following returns a record if there's a record with a user id of 'abc':
select * from acceptance where user_id = 'abc'
Unfortunately, when doing the select via Hibernate (JPA), the following returns null:
em.find(Acceptance.class, "abc");
If I pad the value though, it returns the correct record:
em.find(Acceptance.class, "abc ");
The module that I'm working on gets the user id unpadded from other parts of the system. Is there a better way to get Hibernate working other than putting in code to adapt the user id to a certain length before giving it to Hibernate? (which could present maintenance issues down the road if the length ever changes)
That's God's way of telling you to never use CHAR() for primary key :-)
Seriously, however: since your user_id is mapped as a String in your entity, Hibernate's Oracle dialect translates it into varchar. And since Hibernate uses prepared statements for all its queries, that semantics carries over (unlike SQuirreL, where the value is specified as a literal and thus converted differently).
Based on Oracle's type conversion rules, the column value is then promoted to varchar2 and compared as such; thus you get back no records.
If you can't change the underlying column type, your best option is probably to use an HQL query with the rtrim() function, which is supported by the Oracle dialect.
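For example (a sketch; the entity property name userId is an assumption based on the USER_ID column):
Acceptance a = (Acceptance) em.createQuery(
        "from Acceptance a where rtrim(a.userId) = :id")
        .setParameter("id", "abc")
        .getSingleResult();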
How come your module gets an unpadded value from other parts of the system?
According to my understanding, if the other parts of the system don't alter the PK, they should read 6 chars from the DB and pass 6 chars all along the way -- that would be OK. The only exception would be when a PK is generated, in which case it may need to be padded.
You can circumvent the problem (by trimming or padding the value each time it's necessary), but it won't solve the underlying problem that your PK is not handled consistently. To solve the problem upfront you must either
always receive 6 chars from the other parts of the module
use varchar2 to deal with dynamic size correctly
If you can't solve the problem upfront, then you will indeed need to either
add trimming/padding all around the place when necessary
add trimming/padding in the DAO if you have one
add trimming/padding in the user type if this works (suggestion from N. Hughes)