I'm using Hibernate and the database is SQL Server.
SQL Server differentiates its data types that support Unicode from the ones that support only ASCII. For example, the character data types that support Unicode are nchar, nvarchar, and longnvarchar, whereas their ASCII counterparts are char, varchar, and longvarchar, respectively. By default, Microsoft's JDBC drivers send strings to SQL Server in Unicode format, irrespective of whether the data type of the corresponding column defined in SQL Server supports Unicode or not. When the column data types support Unicode, everything is smooth. But when they do not, serious performance issues arise, especially during data fetches: SQL Server tries to convert the non-Unicode data types in the table to Unicode before doing the comparison. Moreover, if an index exists on the non-Unicode column, it will be ignored. This ultimately leads to a full table scan during data fetch, thereby slowing down the search queries drastically.
The solution we used: we found that there is a property called sendStringParametersAsUnicode which gets rid of this Unicode conversion. It defaults to 'true', which makes the JDBC driver send every string to the database in Unicode format. We switched this property off.
My question: now we cannot send data in Unicode format at all. If, in the future, one varchar column (only one column, not all varchar columns) is changed to nvarchar, we would then need to send the string for that column in Unicode format.
Please suggest how to handle this scenario.
Thanks.
You need to specify the property sendStringParametersAsUnicode=false in the connection URL:
jdbc:sqlserver://localhost:1433;databaseName=mydb;sendStringParametersAsUnicode=false
Unicode is the native string representation for communication with SQL Server; if you are converting to MBCS (multibyte character sets), then you are doing two conversions for every string. I suggest that if you are concerned with performance, use all Unicode instead of all MBCS.
ref: http://social.msdn.microsoft.com/Forums/en/sqldataaccess/thread/249c629f-b8f2-4a8a-91e8-aad0d83919ca
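If later only one column is changed to nvarchar, one option is to keep sendStringParametersAsUnicode=false for the connection and force Unicode just for that one parameter with PreparedStatement.setNString. This is only a sketch under that assumption; the customer table, its columns, and the credentials are made up for illustration, so verify against your driver version that setNString parameters really go over the wire as nvarchar:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MixedColumnExample {
    public static void main(String[] args) throws SQLException {
        // Non-Unicode parameters remain the default for the whole connection.
        String url = "jdbc:sqlserver://localhost:1433;databaseName=mydb;"
                + "sendStringParametersAsUnicode=false";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "UPDATE customer SET name = ? WHERE id = ?")) {
            // Send only this parameter as Unicode, for the column that became nvarchar.
            ps.setNString(1, "Bełchatów");
            ps.setInt(2, 42);
            ps.executeUpdate();
        }
    }
}

If the column is mapped through Hibernate, newer Hibernate versions also offer an org.hibernate.annotations.Nationalized annotation for nvarchar columns; check whether your Hibernate version supports it.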
Related
Hi: We have a tool that handles reports with Unicode support. It worked fine until we encountered a new report containing Polish characters.
We are able to retrieve the data and display it correctly; however, when we use that data as input to perform a search, some of the characters do not seem to be converted correctly, and therefore no data is retrieved. Here is a sample.
The table polish has two columns: party and description. One of the values of party is "Bełchatów". I use JDBC to read that value from the database and search with the following SQL statement:
SELECT * from polish where party = N'Bełchatów'
However, this gives me no result. This is with ojdbc6.jar (JDK 8). It does give me a result back with ojdbc7.jar.
What is the reason? And how can we fix it when using ojdbc6.jar?
Thanks!
This is because the Oracle JDBC driver doesn't convert the string into national-character-set (Unicode) form. There is a connection property for this, oracle.jdbc.defaultNChar=true.
http://docs.oracle.com/cd/B14117_01/java.101/b10979/global.htm
When this property is true, the driver converts the string marked as an NCHAR literal, N'Bełchatów', into u'Be\0142chat\00f3w'.
You can also set it at the data source level. Depending on your persistence API vendor, the way to set it can be different.
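For plain JDBC, a minimal sketch of setting it on the connection (the thin URL, host, and scott/tiger credentials are placeholders); the same flag can also be passed as the system property -Doracle.jdbc.defaultNChar=true:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;

public class DefaultNCharExample {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "scott");
        props.setProperty("password", "tiger");
        // Treat character bind parameters as national character set data
        // (NCHAR/NVARCHAR2), so characters like ł are not lost in conversion.
        props.setProperty("oracle.jdbc.defaultNChar", "true");

        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@localhost:1521:orcl", props);
             PreparedStatement ps = con.prepareStatement(
                     "SELECT description FROM polish WHERE party = ?")) {
            ps.setString(1, "Bełchatów");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}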
I am parsing rss news feeds from over 10 different languages.
All the parsing is done in Java, and the data is stored in MySQL before my APIs, written in PHP, respond to the clients.
I constantly come across garbage characters when I read the data.
What I have tried:
I have configured MySQL to store UTF-8 data. My DB, table, and even the column have utf8 as their default charset.
While connecting to my DB, I set the character set of results to UTF-8.
When I run the jar file manually to insert the data, the characters appear fine. But when I set a cronjob for the same jar file, I start facing the problem all over again.
In English, I particularly face problems like this, and in other vernacular languages the characters appear to be total garbage and I can't even recognize a single character.
Is there anything that I am missing?
Sample garbage characters:
Gujarati :"રેલવે મà«àª¸àª¾àª«àª°à«€àª®àª¾àª‚ સામાન ચોરી થશે તો મળશે વળતર!"
Malyalam : "നേപàµà´ªà´¾à´³à´¿à´²àµ‡à´•àµà´•àµà´³àµà´³ കോളàµâ€ നിരകàµà´•àµ à´•àµà´±à´šàµà´šàµ"
English : Bank Board Bureau’s ambit to widen to financial sector PSUs
The Gujarati starts with રેલવે, correct? And the Malayalam starts with നേപ, correct? And the English should have included Bureau’s.
This is the classic case of Mojibake:
The bytes you have in the client are correctly encoded in utf8. (Bureau is encoded in the ASCII/latin1 subset of utf8, but ’ is not the ASCII apostrophe.)
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the table was declared CHARACTER SET latin1. (Or possibly it was inherited from the table/database.) (It should have been utf8.)
The fix for the data is a "2-step ALTER".
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
where the lengths are big enough and the other "..." have whatever else (NOT NULL, etc) was already on the column.
Unfortunately, if you have a lot of columns to work with, it will take a lot of ALTERs. You can (and should) MODIFY all the necessary columns of a single table together, so each table needs only one pair of ALTERs.
The fix for the code is to establish utf8 as the connection character set; how to do that depends on the API used in PHP. The ALTERs will change the column definitions.
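On the Java (Connector/J) side that does the inserts, the analogous fix is to make the connection charset explicit instead of relying on the JVM/platform default, which can differ under cron. A minimal sketch; the schema, table, and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Utf8ConnectionExample {
    public static void main(String[] args) throws SQLException {
        // Force a UTF-8 connection so the wire encoding no longer depends
        // on the default locale of the environment the jar runs in.
        String url = "jdbc:mysql://localhost:3306/newsdb"
                + "?useUnicode=true&characterEncoding=UTF-8";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO feed_item (title) VALUES (?)")) {
            ps.setString(1, "Bank Board Bureau’s ambit to widen to financial sector PSUs");
            ps.executeUpdate();
        }
    }
}

Running the cron job with an explicit -Dfile.encoding=UTF-8 (or a UTF-8 LANG setting) may also help if the feed parsing itself relies on the platform default charset.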
Edit
You have VARCHAR with the wrong CHARACTER SET. Hence, you see Mojibake like રેલ. Most conversion techniques try to preserve રેલ, but that is not what you need. Instead, taking a step to VARBINARY preserves the bits while ignoring the old definition of the bits representing latin1-encoded characters. The second step again preserves the bits, but now claiming they represent utf8 characters.
I am using MSSQL with the J2EE Spring framework.
When inserting data into a table, I am using a bulk insert with an XML argument in MSSQL.
Can anyone say how much data we can pass this way?
I would like to know the size limit for the XML argument.
T.Saravanan
On the SQL Server side, it is 2 GB:
The stored representation of xml data type instances cannot exceed 2 gigabytes (GB) in size
"Stored" means after some processing for efficiency
SQL Server internally represents XML in an efficient binary representation that uses UTF-16 encoding. User-provided encoding is not preserved, but is considered during the parse process.
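For reference, a minimal JDBC sketch of passing the XML argument to a stored procedure (dbo.BulkInsertFromXml and its shape are made up for illustration); the 2 GB limit above applies to the xml instance the server ends up with:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class XmlArgumentExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost:1433;databaseName=mydb";
        String xml = "<rows><row id=\"1\" name=\"first\"/><row id=\"2\" name=\"second\"/></rows>";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             CallableStatement cs = con.prepareCall("{call dbo.BulkInsertFromXml(?)}")) {
            // Sent as a string parameter and converted to xml on the server,
            // where the stored representation may not exceed 2 GB.
            cs.setString(1, xml);
            cs.execute();
        }
    }
}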
I have a couple of fields in Oracle which are NVARCHAR, and I am using Java 1.5. If I read those values as String, is that okay, or is there a better approach for reading columns with NVARCHAR?
Assuming you have a ResultSet named rs, then this is appropriate for NVARCHAR:
String myColumn = rs.getString("my_column");
See Globalization Support for JDBC Drivers:
Oracle JDBC drivers provide globalization support by allowing you to retrieve data from or insert data into columns of the SQL CHAR and NCHAR datatypes of an Oracle9i database. Because Java strings are encoded as UTF-16 (16-bit Unicode) for JDBC programs, the target character set on the client is always UTF-16. For data stored in the CHAR, VARCHAR2, LONG, and CLOB datatypes, JDBC transparently converts the data from the database character set to UTF-16. For Unicode data stored in the NCHAR, NVARCHAR2, and NCLOB datatypes, JDBC transparently converts the data from the national character set to UTF-16.
I'm working with third party user data that may or may not fit into our database. The data needs to be truncated if it is too long.
We are using iBatis with Connector/J. If the data is too long, a SQL exception is thrown. I have had two choices: either truncate the strings in Java or truncate the strings in SQL using SUBSTRING.
I don't like truncating the strings in SQL, because I would be writing table structure into our iBatis XML; but SQL, on the other hand, knows about our database collation (which isn't consistent and would be expensive to make consistent) and can truncate strings in a multibyte-safe manner.
Is there a way to have Connector/J just go ahead and insert the data anyway, and if not, which route would people recommend?
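For what the "truncate in Java" option might look like, here is a hedged sketch that truncates by Unicode code points so a surrogate pair is never split; the limit still has to match the column definition, and byte-based column limits can differ by character set and collation:

public final class Truncator {
    private Truncator() {
    }

    // Truncate to at most maxCodePoints Unicode code points.
    public static String truncate(String s, int maxCodePoints) {
        if (s == null || s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s;
        }
        int endIndex = s.offsetByCodePoints(0, maxCodePoints);
        return s.substring(0, endIndex);
    }
}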
According to the MySQL documentation, it's possible for inserting data that exceeds the column length to be treated as a warning:
Inserting a string into a string column (CHAR, VARCHAR, TEXT, or BLOB) that exceeds the column's maximum length. The value is truncated to the column's maximum length.
One of the Connector/J properties is jdbcCompliantTruncation. This is its description:
This sets whether Connector/J should throw java.sql.DataTruncation exceptions when data is truncated. This is required by the JDBC specification when connected to a server that supports warnings (MySQL 4.1.0 and newer). This property has no effect if the server sql-mode includes STRICT_TRANS_TABLES. Note that if STRICT_TRANS_TABLES is not set, it will be set as a result of using this connection string option.
If I understand correctly, setting this property to false means the exception is not thrown and the truncated data is inserted. This solution doesn't require you to truncate the data in program code or in SQL statements, but delegates it to the database.
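A minimal sketch of what such a connection could look like (host, schema, table, and credentials are placeholders); remember that the server sql-mode must not include STRICT_TRANS_TABLES, or the statement will still fail instead of truncating:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TruncatingInsertExample {
    public static void main(String[] args) throws SQLException {
        // With jdbcCompliantTruncation=false, an oversized string is truncated
        // by the server with a warning instead of raising java.sql.DataTruncation.
        String url = "jdbc:mysql://localhost:3306/mydb"
                + "?jdbcCompliantTruncation=false";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO contact (note) VALUES (?)")) {
            ps.setString(1, thirdPartyData());
            ps.executeUpdate();
        }
    }

    // Stand-in for the third-party data that may exceed the column length.
    private static String thirdPartyData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append('x');
        }
        return sb.toString();
    }
}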