How to change JDBC Encoding to support French accent characters - java

I'm having a problem fetching data from MySQL 5.1 using JDBC (mysql-connector-java-5.1.18-bin.jar).
My INSERT statement goes like this:
INSERT INTO Categories VALUES ('01','Matières premières');
and the output in the NetBeans terminal and in the Swing interfaces looks like this:
MatiŠres premiŠres
I think I need to specify the encoding parameters.
Can you please help?
P.S.: the OS is Windows 7 French.
This is my URL to the database in the connection class:
DriverManager.getConnection("jdbc:mysql://X.X.X.X:XXXX/XXXX?useUnicode=yes&characterEncoding=UTF-8","XXXX","XXXX");

Two solutions:
The sanest one: set your tables to use the collation utf8_general_ci and set MySQL to use the same collation for connections. That way you won't have any problem and you won't need to specify anything when connecting with JDBC.
Another one, barely acceptable but possible: determine the collation of your tables and use it to configure your JDBC connection as described here: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html . But you must know the encoding of your tables (you don't yet, and if you're in France it may well be Latin1).
To determine or change the encoding of your tables you can use any MySQL admin tool. Be sure to stay consistent between the tables and the connection.
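For illustration, a minimal sketch of the first option (the table name and INSERT come from the question; host, port, schema and credentials are placeholders):
Connection con = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8",
        "user", "password");
try (Statement st = con.createStatement()) {
    // convert the existing table so its text columns use utf8 / utf8_general_ci
    st.executeUpdate("ALTER TABLE Categories CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci");
    // re-insert the accented data over the UTF-8 connection
    st.executeUpdate("INSERT INTO Categories VALUES ('01', 'Matières premières')");
}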

You have to set the encoding to ISO-8859-1 with System.setProperty("file.encoding", "ISO-8859-1").

Related

Issue in saving special characters in Oracle Database in BLOB datatype

Storing a CSV report generated from Java code (via Hibernate) into a BLOB column in Oracle 11g turns all the special characters into question marks (?). The character encoding of the Oracle DB is WE8ISO8859P1; as far as I know, the database character set should be UTF-8 or AL32UTF-8 to support all UTF-8 characters. But since this is an existing application running in production, I cannot change the character set to UTF-8/AL32UTF-8 because of the risk involved. I have been looking for a solution we can apply in the Java code when saving, so that the character set is enforced.
I have tried the code below:
testReport.setTestReportBlob(new SerialBlob(byteArrayToStore));
and I use Hibernate to save the reportBlob object:
getCurrentSession().saveOrUpdate(testReport);
When we extract the file from the BLOB, it shows the special characters as "?".
Below is the piece of code where we are enforcing the character set before setting the byte array to be stored in the database:
fileProcess.setFiledata(exchange.getIn().getBody(String.class).getBytes());
Would setting the properties useUnicode=true&characterEncoding=UTF-8 when getting the connection to the Oracle database help? Any help or suggestion would be much appreciated. Thanks a lot in advance.
PS: I am using Toad for Oracle as the DB client.
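A hedged sketch of what enforcing the character set in Java could look like before the bytes reach the BLOB (the names exchange, fileProcess and testReport are taken from the question; the charset choice is an assumption and must match whatever is used when reading the data back):
String report = exchange.getIn().getBody(String.class);
// pick the charset explicitly instead of relying on the JVM default;
// WE8ISO8859P1 roughly corresponds to ISO-8859-1
byte[] byteArrayToStore = report.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
fileProcess.setFiledata(byteArrayToStore);
testReport.setTestReportBlob(new javax.sql.rowset.serial.SerialBlob(byteArrayToStore));
getCurrentSession().saveOrUpdate(testReport);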

Informix, MySQL and Oracle blob contains

We have an application that runs with any of IBM Informix, MySQL and Oracle, and we are using Java with Hibernate to connect to the database. We will store XML, CSV and other text-based files inside the database (clob column). The entities in Java are byte[] objects.
One feature request to the application is now to "grep" content inside the data. So I need to find all files with a specific content.
On regular char/varchar fields I can use like '%xyz%', but this is not working on byte[] / blobs.
The first approach was to load each entity, turn the byte[] into a String and use the contains method in Java. If the user enters any filter parameters on other (non-clob) columns, I apply those filters before testing the clob in order to reduce the number of blobs I have to scan.
That worked quite well for 100 files (clobs) as long as the application and database are on the same server. But I think it will get really slow once there are 1,000,000 files in the database and the database is not always on the same network. So I don't think that is a good idea.
My next thought was to create a database procedure, but I am not sure whether that is possible across Informix, MySQL and Oracle.
The last, but not favored, option is to not store the content inside a clob at all. Maybe I can use a different datatype for that?
Does anyone have a good idea how to realize this? I need a solution for all three DBMSs. The application knows which kind of DBMS it is connected to, so it would be okay to have three different solutions (one for each DBMS).
I am completely open to changing what kind of datatype I use (BLOB, CLOB ...) — I can modify that as I want.
Note: the clobs will range from about 5 KiB to about 500 KiB, with a maximum of 1 MiB.
Look into Apache Lucene or another text-indexing library.
https://en.wikipedia.org/wiki/Lucene
http://en.wikipedia.org/wiki/Full_text_search
If you go with a DB-specific solution like Oracle Text Search you will have to implement a custom solution for each database. I know from experience that Oracle Text Search takes significant time to learn and involves a lot of tweaking to get just right.
Also, if you use a DB solution you would receive different results in each DB even if the data sets were the same (each DB has its own methods of indexing and retrieving the data).
By going with a third-party solution like Lucene, you only have to learn one solution and the results will be consistent regardless of the DB.
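A rough sketch of indexing and searching the clob content with Lucene (assuming a reasonably recent Lucene release, roughly 5.x or later; clobBytes stands for the byte[] loaded from the entity, and "id" / "content" are field names chosen for the example):
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;

FSDirectory dir = FSDirectory.open(java.nio.file.Paths.get("clob-index"));

// index each clob once (or whenever it changes)
try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
    Document doc = new Document();
    doc.add(new StringField("id", "42", Field.Store.YES)); // the entity's primary key
    doc.add(new TextField("content", new String(clobBytes, java.nio.charset.StandardCharsets.UTF_8), Field.Store.NO));
    writer.addDocument(doc);
}

// "grep": find the ids of all documents containing the word xyz
try (DirectoryReader reader = DirectoryReader.open(dir)) {
    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs hits = searcher.search(new TermQuery(new Term("content", "xyz")), 100);
    for (ScoreDoc hit : hits.scoreDocs) {
        System.out.println(searcher.doc(hit.doc).get("id"));
    }
}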

Where is "latin1_german1_ci" collation coming from?

I'm receiving the following error message from a Java/Spring/Hibernate application when it tries to execute a prepared statement against a MySQL database:
Caused by: java.sql.SQLException: Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (latin1_german1_ci,COERCIBLE) for operation '='
The select statement which generates this (as shown in the Tomcat log) is:
SELECT s.* FROM score_items s where
s.s_score_id_l=299 and
(s.p_is_plu_b = 'F') and
isTestProduct(s.p_upc_st) = 'N' and
v_is_complete_b='T'
order by s.nc_name_st, s.p_upc_st
The table collation per the show table status command is:
utf8_general_ci
The collation for all the char, varchar and text fields is "utf8_general_ci". It's null for the bigint, int and datetime fields.
The database collation is latin1_swedish_ci as displayed by the command:
show variables like "collation_database";
Edit: I was able to successfully run this from my local machine using Eclipse/STS and a Tomcat 6 instance. The local process is reading from the same database as the process on the production server which generated the error. The server where the error occurs is a Tomcat 7 instance on an Amazon Linux server.
Edit 2: I was also able to successfully run the report when I ran it from our QA environment, with the JDBC statement in server.xml reset to point at the production database. QA is essentially a mirror of the production environment, with some dev work going on. I should also note that I saw a similar error last month, but it disappeared when I reran the report. Finally, I'm not sure why it would make a difference, but the table being queried is huge, with over 7 million rows and probably 100 fields per row.
Edit 3: Based on Shadow's comments, I discovered the character set "latin1" was being specified on the test function. I've changed that to utf8 and am hoping this solves the issue.
How do I find out which field is "latin1_german1_ci"?
Why is the comparison using "latin1_swedish_ci" when the table and fields are either "utf8_general_ci" or null?
Could the problem be related to function character set, and if so how do I identify which character set/collation it's using?
How do I narrow down which field/function is causing the problem?
This has got nothing to do with Java or Hibernate; this is purely down to MySQL and perhaps to the connection string.
In mysql you can define character set and collation at multiple levels, which can cause a lot of issues:
server
database
table
column
connection
See mysql documentation on character sets and collations for more details.
To sum up: the higher-level defaults kick in if and only if you do not specify a character set or collation at the lower level. So a column-level definition overrides a table-level definition. The show table status command shows the table-level defaults, but these may have been overridden at the column level. The show full columns or show create table commands will show you the true character sets and collations used by any given field.
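A small sketch of checking this over the same JDBC connection (the table and function names are taken from the question):
try (Statement st = con.createStatement()) {
    // true per-column collations, regardless of the table default
    try (ResultSet rs = st.executeQuery("SHOW FULL COLUMNS FROM score_items")) {
        while (rs.next()) {
            System.out.println(rs.getString("Field") + " -> " + rs.getString("Collation"));
        }
    }
    // character set / collation recorded for the stored function used in the query
    try (ResultSet rs = st.executeQuery("SHOW CREATE FUNCTION isTestProduct")) {
        if (rs.next()) {
            System.out.println(rs.getString("character_set_client") + " / "
                    + rs.getString("collation_connection"));
        }
    }
}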
Connection level character set / collation definitions further complicate the picture because string constants used in the sql statements will use the connection character set / collation, unless they have an explicit declaration.
However, mysql uses coercibility values to avoid most issues arising from the use of various character sets and expressions as described in mysql documentation on character sets / collations used in expressions.
The fact that the query works when executed from another computer indicates that the issue is with the connection character set / collation. I think it will be around the isTestProduct() call.
The only way to really determine which condition causes the issue is to eliminate the conditions one by one; when the error is gone, the last eliminated condition was the culprit. But defining an appropriate connection character set and collation, in line with what the fields use, will also help.

Message Invalid column type: getBLOB not implemented for class oracle.jdbc.driver.T4CLongRawAccessor

I have a problem when trying to read a BLOB from an Oracle DB using this:
rs.getBlob("ARCHIVE_REQ_FILE_BLOB")
I also tried this:
oracle.sql.BLOB blob= (oracle.sql.BLOB) ((OracleResultSet) rs).getBlob("ARCHIVE_REQ_FILE_BLOB");
The following error appears:
SQL Message Invalid column type: getBLOB not implemented for class oracle.jdbc.driver.T4CLongRawAccessor
I use IBM WebSphere Application Server 8.5.5 and open the connection through a WebSphere datasource against Oracle 11.2.0.2.
You are not trying to read a BLOB value. You actually have a LONG RAW value in the database and you are trying to read that as if it were a BLOB.
I would recommend that you read the Oracle documentation for reading data from LONG and LONG RAW values in JDBC. Oracle even provides example code to help you out.
If your column really is a BLOB, then you need to make sure that your Java code is not defining the column as a LONG_RAW (search for calls to defineColumnType), as this makes the server send the data as a LONG_RAW instead of a BLOB.
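For example, a minimal sketch of reading the column as a stream (assuming it really is a LONG RAW; rs is the ResultSet from the question):
byte[] fileBytes = null;
try (java.io.InputStream in = rs.getBinaryStream("ARCHIVE_REQ_FILE_BLOB")) {
    if (in != null) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        fileBytes = out.toByteArray();
    }
}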
In some cases this is solvable at the SQL level, if the content is not too big for your use case:
select dbms_lob.substr( some_blob, 4000 ) as some_blob
from some_tab
Depending on your Oracle DB you may be able to choose a value higher than 4000, but for older versions this should work almost everywhere.
Sometimes 3500 is safer because of Unicode conversion of some characters into multiple 8-bit characters.
(The above shortens the blob content to 4000 characters if necessary and converts the blob to a more suitable datatype.)
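A minimal JDBC sketch of this workaround (some_tab / some_blob are the placeholder names from the query above; DBMS_LOB.SUBSTR on a BLOB returns RAW, so the result is read as bytes and decoded in Java; the charset is an assumption):
String sql = "select dbms_lob.substr(some_blob, 3500) as some_blob from some_tab";
try (PreparedStatement ps = con.prepareStatement(sql);
     ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        byte[] chunk = rs.getBytes("some_blob");
        String text = (chunk == null) ? null : new String(chunk, java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(text);
    }
}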

Prevent jdbc from padding strings on sql insert and update

We are using the JDBC-ODBC bridge to connect to an MS SQL database. When performing inserts or updates, strings are put into the database padded to the length of the database field. Is there any way to turn off this behavior (strings should go into the table without padding)?
For reference, we are able to insert field values that don't contain the padding using the SQL management tools and query analyzer, so I'm pretty sure this is occurring at the JDBC or ODBC layer of things.
EDIT: The fields in the database are listed as nvarchar(X), where X = 50, 255, whatever
EDIT 2: The call to do the insert is using a prepared statement, just like:
PreparedStatement stmt = con.prepareStatement("INSERT INTO....");
stmt.setString(1, "somevalue");
How are you setting the String? Are you doing this?
PreparedStatement stmt = con.prepareStatement("INSERT INTO....");
stmt.setString(1, "somevalue");
If so, try this:
stmt.setObject(1, "somevalue", Types.VARCHAR);
Again, this is just guessing without seeing how you are inserting.
Are you using CHAR fields in the database or VARCHAR?
CHAR pads values to the size of the field. VARCHAR does not.
I don't think JDBC would be causing this.
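A quick way to double-check what the columns really are, over the same connection ("MyTable" is a placeholder for the real table name); NCHAR/CHAR columns are blank-padded by the database, NVARCHAR/VARCHAR columns are not:
DatabaseMetaData md = con.getMetaData();
try (ResultSet cols = md.getColumns(null, null, "MyTable", "%")) {
    while (cols.next()) {
        System.out.println(cols.getString("COLUMN_NAME") + " : "
                + cols.getString("TYPE_NAME") + "(" + cols.getInt("COLUMN_SIZE") + ")");
    }
}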
If you can make your insert work with regular SQL tools (like, I don't know, Toad for MS SQL Server or something), then changing the driver should do it.
Use Microsoft SQL Server JDBC type IV driver.
Give this link a try
http://www.microsoft.com/downloads/details.aspx?familyid=F914793A-6FB4-475F-9537-B8FCB776BEFD&displaylang=en
Unfortunately these kinds of downloads come with a lot of garbage. There's an install tool and hundreds of other files. Just look for something like:
installdir\lib\someSingle.jar
Copy it somewhere else and uninstall/delete the rest.
I did this a couple of months ago; unfortunately I don't remember exactly where it was.
EDIT
Ok, I got it.
Click on the download and, at the end of the page, click on "I agree and want to download the UNIX version".
This is a regular compressed file (use WinRAR or another archiver) and in there look for that single jar.
That should work.
If you are using the bundled Sun JDBC-ODBC Bridge driver, you may want to consider migrating to a proper MS SQL JDBC driver. Sun does not recommend that the bridge driver be used in a production environment.
The JDBC-ODBC Bridge driver is recommended only for experimental use or when no other alternative is available.
Moving to a more targeted driver may fix your problem altogether, or at least it will provide a production-ready solution when you do fix the bug.
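A minimal sketch of connecting through Microsoft's own JDBC driver instead of the bridge (host, port, database, table and credentials are placeholders):
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
Connection con = DriverManager.getConnection(
        "jdbc:sqlserver://dbhost:1433;databaseName=mydb", "user", "password");
try (PreparedStatement stmt = con.prepareStatement("INSERT INTO MyTable (name) VALUES (?)")) {
    stmt.setString(1, "somevalue");
    stmt.executeUpdate();
}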
