Java / SQL Server parameter binding does not work as expected

We noticed some strange behaviour in our application concerning bind parameters. We use Java with JDBC to connect to a SQL Server database. In a table cell we have the value 'µ', and we compare it with a bind parameter that is also set to the value 'µ'.
Now, in an SQL statement like "... where value != ?", where value is the column containing 'µ' and ? is the bind variable, also set to 'µ', we get a record back, even though we would expect 'µ' to equal 'µ'.
The method that we use to fill the bind parameter is java.sql.PreparedStatement.setString(int, String).
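For illustration, here is a minimal sketch of the comparison (table and column names are placeholders; 'conn' is assumed to be an open Connection):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Minimal sketch of the comparison described above; table/column names
// are placeholders and 'conn' is an open Connection to the SQL Server DB.
static void findMismatches(Connection conn) throws Exception {
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT value FROM some_table WHERE value != ?")) {
        ps.setString(1, "µ");
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // Unexpectedly prints the row whose value is 'µ'
                System.out.println(rs.getString("value"));
            }
        }
    }
}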
Some facts:
The character value of µ in different encodings is:
ISO-8859-1 (Latin-1): 0xB5 (µ is not part of 7-bit ASCII)
UTF-8: 0xC2 0xB5
UTF-16 (used internally by Java): 0x00B5
Now I did some investigating to see which bytes the database actually receives. Therefore I tried an SQL statement like this:
select convert(VARBINARY(MAX), value), -- selects µ from database table
convert(VARBINARY(MAX), N'µ'), -- selects µ from literal
convert(VARBINARY(MAX), ?) -- selects µ from bind parameter
from ...
The result for the three values is:
B500
B500
C200B500 <-- Here is the problem!
So the internal representation of µ, both in the database and as an NVARCHAR literal, is B500.
Now we can't understand what is going on here. We have the value 'µ' in a Java variable (which should internally be 0x00B5). When it is passed as a bind variable, it seems to be converted to UTF-8 (giving the byte sequence 0xC2B5), and the database then treats it as two characters, producing the character sequence C200B500.
To make things even more confusing:
(1) On another machine with a different database, the same code works as expected. The result of the three lines is B500/B500/B500, so the bind variable is converted to a proper B500.
(2) On the same machine and the same database, but from a different program (using the same JDBC driver library and the same connection parameters), this also works as expected, giving B500/B500/B500.
Some additional facts, maybe they are important:
The database is SQL Server 2014
Java is Java 7
The application in question is a webapp running in Tomcat 7
The JDBC library is sqljdbc 4.2
Any help to sort this out is greatly appreciated!

I have now found the solution. It had nothing to do with SQL Server or parameter binding at all, but instead...
Tomcat 7 does not run in UTF-8 mode by default (I wasn't aware of that). The µ we are talking about comes from another application that provides this value via webservice calls. That application uses UTF-8 by default. So it was sending a UTF-8 encoded µ, but the webservice did not expect UTF-8, took the two bytes for two separate characters, and filled the internal String variable with the characters 0xC2 and 0xB5 (which, for SQL Server, is C200B500).
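To see the effect in isolation, here is a small demo of the mojibake described above (plain Java, no database involved): a UTF-8 encoded µ decoded as ISO-8859-1 turns into the two characters U+00C2 and U+00B5.

import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        byte[] utf8 = "µ".getBytes(StandardCharsets.UTF_8); // 0xC2 0xB5
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        for (char c : misread.toCharArray()) {
            System.out.printf("U+%04X ", (int) c); // prints U+00C2 U+00B5
        }
        // Bound as an NVARCHAR parameter, these two characters reach
        // SQL Server as the byte sequence C200B500 seen above.
    }
}

(For the webservice itself, the usual fixes are setting URIEncoding="UTF-8" on the Tomcat connector and/or calling request.setCharacterEncoding("UTF-8") before reading the request, depending on where the parameters arrive.)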

Related

Oracle JDBC Unicode for Polish characters is not working properly

Hi: We have a tool that handles reports with Unicode support. It worked fine until we encountered this new report with Polish characters.
We are able to retrieve the data and display it correctly; however, when we use the data as input to perform a search, some of the characters do not seem to be converted correctly, and therefore no data can be retrieved. Here is a sample.
Table polish has two columns: party, description. One of the values of party is "Bełchatów". I use JDBC to read that value from the database and search with the following SQL statement:
SELECT * from polish where party = N'Bełchatów'
However, this gives me no result. This is with ojdbc6.jar (JDK 8). It does give me a result back with ojdbc7.jar.
What is the reason? And how can we fix this when using ojdbc6.jar?
Thanks!
This is because the Oracle JDBC driver doesn't convert the string into Unicode characters by default. There is a connection property for this: oracle.jdbc.defaultNChar=true.
http://docs.oracle.com/cd/B14117_01/java.101/b10979/global.htm
When this property is true, the driver converts a string marked as an nchar literal, such as N'Bełchatów', into u'Be\0142chat\00f3w'.
The property can also be set at the data source level. Depending on your persistence API vendor, the way to set it may differ.
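For example, at the plain JDBC level the property can be passed when opening the connection; a hedged sketch (URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Sketch: enable defaultNChar so plain String binds are sent as
// NCHAR/NVARCHAR2; URL and credentials are placeholders.
Properties props = new Properties();
props.setProperty("user", "scott");
props.setProperty("password", "tiger");
props.setProperty("oracle.jdbc.defaultNChar", "true");
Connection conn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL", props);

Alternatively, if only a few parameters are affected, PreparedStatement.setNString() (JDBC 4) can be used to mark individual parameters as national-character data.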

Getting question marks when inserting Hebrew characters into a MySQL table

I'm using NetBeans to build a web application with Java and JSP that handles a database with Hebrew fields.
The DDL is as follows:
String cityTable = "CREATE TABLE IF NOT EXISTS hebrew_test.table ("
+"id int(11) NOT NULL AUTO_INCREMENT,"
+"en varchar(30) NOT NULL,"
+"he varchar(30) COLLATE utf8_bin NOT NULL,"
+"PRIMARY KEY (id)"
+") ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;";
String insert = "INSERT INTO hebrew_test.table (en, he) VALUES ('A','a')";
String insert2 = "INSERT INTO hebrew_test.table (en, he) VALUES ('B','ב')";
String insert3 = "INSERT INTO hebrew_test.table (en, he) VALUES ('C','אבג')";
executeSQLCommand(cityTable);
executeSQLCommand(insert);
executeSQLCommand(insert2);
executeSQLCommand(insert3);
The output table I get:
1 A a
2 B ?
3 C ???
Instead of:
1 A a
2 B ב
3 C אבג
I tried 'Hebrew appears as question marks in Netbeans', but that isn't the same problem: I get the question marks in the table itself.
Also, I defined the table to use utf8_bin, as you can see in the code above.
You need to tell the JDBC driver to use UTF-8 encoding when converting the characters of the SQL query to bytes. You can do that by adding the useUnicode=yes and characterEncoding=UTF-8 query parameters to the JDBC connection URL.
jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8
Otherwise it will use the operating system's platform default charset. The MySQL JDBC driver is itself well aware of the encoding used on both the client side (where the JDBC code runs) and the server side (where the DB table lives). Any character not covered by the charset used by the DB table will be replaced by a question mark.
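For completeness, a sketch of opening the connection with those parameters (database name and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

// Sketch: connection URL with the encoding parameters from above.
Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/hebrew_test"
        + "?useUnicode=yes&characterEncoding=UTF-8",
        "user", "password");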
See also:
Spring Encoding with CharacterEncodingFilter in web.xml
You're including your values directly into the SQL. That's always a bad idea. Use a PreparedStatement, parameterized SQL, and set the values as parameters. It may not fix the problem - but it's definitely the first thing to attempt, as you should be using parameterized SQL anyway. (Parameterized SQL avoids SQL injection attacks, separates code from data, and avoids unnecessary conversions.)
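As a sketch of what that looks like for the inserts above (assuming an open Connection 'conn'):

import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch: the same insert done with parameterized SQL; the driver takes
// care of encoding the Hebrew value, no string concatenation involved.
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO hebrew_test.table (en, he) VALUES (?, ?)")) {
    ps.setString(1, "B");
    ps.setString(2, "ב");
    ps.executeUpdate();
}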
Next, you should work out exactly where the problem is really occurring:
Make sure that the value you're trying to insert is correct.
Check that the value you retrieve is correct.
Check what's in your web response using Wireshark - check the declared encoding and what's in the actual data
When checking the values, you should iterate over each character in the string and print out the value as a UTF-16 code unit (either use toCharArray() or use charAt() in a loop). Just printing the value to the console leaves too much chance of other problems.
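For example (the expected output here is an assumption based on the sample data):

// Sketch: dump each UTF-16 code unit so console or log encoding
// problems cannot mask what the string really contains.
String value = "אבג";
for (char c : value.toCharArray()) {
    System.out.printf("U+%04X%n", (int) c); // expect U+05D0 U+05D1 U+05D2
}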
EDIT: For a little context of why I wrote this as an answer:
In my experience, including string values as parameters rather than directly into SQL can sometimes avoid such issues (and is of course better for security reasons etc).
In my experience, diagnosing whether the problem is at the database side or the web side is also important. This diagnosis is best done via logging the exact UTF-16 code units being used, not just strings (as otherwise further encoding issues during logging or console output can occur).
In my experience, problems like this can easily occur at either insert or read code paths.
All of this is important as a way of moving the OP forward, not just in a comment-like request for more information.

Escape sequence when adding multiple records to DerbyDB

I'm converting (or trying to convert) an MS Access DB into Derby.
When I extract the data from certain varchar / text / memo fields in Access, they are filled with apostrophes, mathematical symbols (percent, less than, etc.), and possibly foreign characters.
I need to keep these, and I test for them so that I can use an 'escape sequence' to ensure they get put into the database.
However, for now I am unable to get the data into the DB without it failing on these fields. When the SQL fails I output the SQL string and cut and paste it into ij. Then I modify just the first record, and it is always these characters that cause me grief.
I've tried to modify the strings by surrounding them with "double quote marks", but that just gives a different error (stating that it 'encountered """ at line 1, column x', which is always the first occurrence of the double quote).
I haven't yet found a setting in Derby to alter this behaviour for strings. Is there one?
I have also tried turning the SQL statement into a PreparedStatement and then using {call preparedStatement}; again, this fails. I can't use {escape "escape char"} in a normal statement either, as Derby just reports incorrect syntax.
How do others manage to get user content with strange characters into a field in Derby?
Do I need to change my field into a CLOB or something other than VARCHAR / LONG VARCHAR?
Are my problems caused by using the wrong character set (e.g. ISO-8859-1 rather than UTF-8)? How do I tell which one is in use, and how do I change it?
Below is a sample of the SQL insert that fails when I send it to Derby (via my Java program):
insert into S1.SORTIEDESSAI (OBS, DATEDUSORTIE, CONTREINDIC, FIN,
PDEVU, REFUS, INVDECISN, ADMIN, MOTIF_DE_LA_SORTIE, NOMVALIDEE,
DATEVALIDEE) values ('"0001/0001"' , '2007-07-15' , false , true ,
'"null"' , '"null"' , '"null"' , '"null"' , '"2. FIN DE L’ESSAI"' ,
'"DR SIMON"' , '2011-04-19' )
Note:
Actually, looking at the above I notice that the order of the column names isn't right. It was OK yesterday; I'm not sure why it would have changed. Something to do with Access returning the column names in a random order from the ResultSetMetaData, which would be a surprise.
For now I recommend holding off on further answers whilst I sort this problem out. OK, solved that one; do I need to open another question about this behaviour?
Back to the main thread...
OK, as you can see in my SQL statement, I have wrapped the varchar fields in double quotes. This always fails (even directly through ij). Help!
I'm not quite sure what your question is, but in general you can input these characters by using a PreparedStatement of the form INSERT INTO tablename (columnname) VALUES (?), and then using the PreparedStatement.setString() method to supply your character data for that column.
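A hedged sketch using two of the columns from the question (assuming an open Derby Connection 'conn'; the remaining columns are omitted for brevity):

import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch: parameterized insert; apostrophes, percent signs, etc. in the
// values need no escaping because they never pass through the SQL text.
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO S1.SORTIEDESSAI (OBS, MOTIF_DE_LA_SORTIE) VALUES (?, ?)")) {
    ps.setString(1, "0001/0001");
    ps.setString(2, "2. FIN DE L'ESSAI");
    ps.executeUpdate();
}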

Performance is slow with Hibernate and MS SQL Server

I'm using Hibernate and the DB is SQL Server.
SQL Server differentiates its data types that support Unicode from the ones that only support ASCII. For example, the character data types that support Unicode are nchar, nvarchar and longnvarchar, whereas their ASCII counterparts are char, varchar and longvarchar, respectively. By default, all of Microsoft's JDBC drivers send strings in Unicode format to SQL Server, irrespective of whether the data type of the corresponding column defined in SQL Server supports Unicode or not. When the column data types support Unicode, everything is smooth. But when they do not, serious performance issues arise, especially during data fetches: SQL Server tries to convert the table's non-Unicode data to Unicode before doing the comparison. Moreover, if an index exists on the non-Unicode column, it will be ignored. This ultimately leads to a whole table scan during data fetch, slowing down search queries drastically.
The solution we used: we found that there is a property called sendStringParametersAsUnicode, which gets rid of this Unicode conversion. This property defaults to 'true', which makes the JDBC driver send every string to the database in Unicode format. We switched this property off.
My question: now we cannot send data as Unicode at all. If, in the future, one varchar column (only one, not all varchar columns) is changed to nvarchar, we would then need to send strings for that column in Unicode format.
Please suggest how to handle this scenario.
Thanks.
You need to specify the property sendStringParametersAsUnicode=false in the connection URL:
jdbc:sqlserver://localhost:1433;databaseName=mydb;sendStringParametersAsUnicode=false
Unicode is the native string representation for communication with SQL Server; if you are converting to MBCS (multibyte character sets), then you are doing two conversions for every string. I suggest that if you are concerned with performance, you use all Unicode instead of all MBCS.
ref: http://social.msdn.microsoft.com/Forums/en/sqldataaccess/thread/249c629f-b8f2-4a8a-91e8-aad0d83919ca
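Regarding the follow-up question (a single column later becoming nvarchar): the Microsoft driver documents that even with sendStringParametersAsUnicode=false, individual parameters can still be sent as Unicode via the JDBC 4 setN* methods. A hedged sketch (table and column names are placeholders; 'conn' is an open connection):

import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch: keep the fast non-Unicode default for varchar columns and
// opt in to Unicode per parameter for the one nvarchar column.
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM mytable WHERE ascii_col = ? AND unicode_col = ?")) {
    ps.setString(1, "plain");   // sent as VARCHAR: index-friendly
    ps.setNString(2, "zażółć"); // sent as NVARCHAR despite the setting
    ps.executeQuery();
}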

JDBC, MySQL: getting bits into a BIT(M!=1) column

I'm new to using JDBC + MySQL.
I have several 1/0 values which I want to stick into a database with a PreparedStatement. The destination column is a BIT(M), M != 1. I'm unclear on which of the setXXX methods to use. I can find the references for how the data comes out easily enough, but how it goes in is eluding me.
The values effectively live as an ordered collection of booleans in the objects used by the application. Also, I'll occasionally be importing data from flat text files with 1/0 characters.
To set a BIT(M) column in MySQL
For M==1
setBoolean(int parameterIndex, boolean x)
From the javadoc: "Sets the designated parameter to the given Java boolean value. The driver converts this to an SQL BIT value when it sends it to the database."
For M>1
Support for BIT(M) where M != 1 is problematic with JDBC, as BIT(M) is only required for "full" SQL-92 and only a few DBs support it.
Check here Mapping SQL and Java Types: 8.3.3 BIT
The following works for me with MySQL (at least with MySQL 5.0.45, Java 1.6 and MySQL Connector/J 5.0.8)
...
PreparedStatement insert = con.prepareStatement(
    "INSERT INTO bittable (bitcolumn) VALUES (b?)"
);
insert.setString(1, "111000");
...
This uses the special b'110101010' syntax of MySQL to set the value for BIT columns.
You can also use get/setObject with a byte array (byte[]). 8 bits are packed into each byte, with the least significant bit in the last array element.
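A sketch of that variant (hedged; packing behaviour as described above, 'con' as in the earlier example):

import java.sql.PreparedStatement;

// Sketch: write the bits 111000 into a BIT(6) column as a packed byte[];
// 0x38 == binary 00111000, least significant bit last.
PreparedStatement ps = con.prepareStatement(
        "INSERT INTO bittable (bitcolumn) VALUES (?)");
ps.setObject(1, new byte[] { (byte) 0x38 });
ps.executeUpdate();
ps.close();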
