Set the charset for Firebird connection from Pentaho DI - java

I'm trying to set the connection charset for my Firebird connection using Pentaho DI, but I still can't read the data in the right encoding.
I have tried several parameters (encoding, charSet, etc.) with no luck.
What have I missed?

You either need to use encoding with the Firebird name of the character set, or charSet with the Java name of the character set(*).
WIN1256 is not a valid Java character set name, so the connection will fail. If you specify charSet, then you need to use the Java name Cp1256 or - with Jaybird 2.2.1 or newer - windows-1256.
If this doesn't work then either Pentaho is not correctly passing connection properties, or your data is stored in a column with character set NONE in a different encoding than WIN1256 (or worse: stored in a column with character set WIN1256, but the data is actually a different encoding).
*: Technically you can combine encoding and charSet, but it is only for special use cases where you want Firebird to read data in one character set, and have Jaybird interpret it in another character set.
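For reference, a minimal sketch of passing the property through plain JDBC (the host, database path, and credentials are placeholders; in Pentaho DI the same key/value pair would go into the connection's options panel):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class FirebirdCharsetExample {

    // Build the connection properties; normally exactly one of the
    // two character-set keys is needed.
    static Properties connectionProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Firebird name of the character set:
        props.setProperty("encoding", "WIN1256");
        // ...or the Java name instead (Jaybird 2.2.1+ also accepts "windows-1256"):
        // props.setProperty("charSet", "Cp1256");
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL - adjust host, port and database path for your setup.
        String url = "jdbc:firebirdsql://localhost:3050/C:/data/mydb.fdb";
        try (Connection conn = DriverManager.getConnection(url,
                connectionProps("SYSDBA", "masterkey"))) {
            // CHAR/VARCHAR/TEXT values are now decoded as windows-1256
        }
    }
}
```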

Related

Is mysql load file data charset handling environment dependent?

I called LOAD DATA INFILE from java.sql.Statement.executeUpdate(String sql) to load a UTF-8 CSV file into a table.
When I use
LOAD DATA INFILE '/var/lib/mysql-files/upload/utf8table.csv' INTO TABLE temp.utf8table CHARACTER SET utf8 FIELDS TERMINATED BY ';' LINES TERMINATED BY '\r\n' (#vC1, #vC2) set C1=#vC1, C2=nullif(#vC2,'');
without specifying CHARACTER SET utf8, non-ASCII characters were corrupted. But the same query imported all characters correctly when it was executed in MySQL Workbench. The query with the charset specified works well in both cases. What difference in the execution environments could have led to this behavior?
According to the docs:
The server uses the character set indicated by the character_set_database system variable to interpret the information in the file. SET NAMES and the setting of character_set_client do not affect interpretation of input. If the contents of the input file use a character set that differs from the default, it is usually preferable to specify the character set of the file by using the CHARACTER SET clause. A character set of binary specifies “no conversion.”
See also sysvar_character_set_client. The default is latin1 if not specified.
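Given that behavior, the safest route from JDBC is to always spell out the charset in the statement rather than rely on character_set_database. A sketch (the buildLoadDataSql helper is illustrative, not part of any API; file, table, and column names are taken from the question):

```java
import java.sql.Connection;
import java.sql.Statement;

public class LoadDataExample {

    // Hypothetical helper: builds a LOAD DATA statement with an explicit
    // CHARACTER SET clause so the server never guesses the file encoding.
    static String buildLoadDataSql(String file, String table) {
        return "LOAD DATA INFILE '" + file + "'"
             + " INTO TABLE " + table
             + " CHARACTER SET utf8"
             + " FIELDS TERMINATED BY ';' LINES TERMINATED BY '\\r\\n'"
             + " (@vC1, @vC2) SET C1=@vC1, C2=NULLIF(@vC2,'')";
    }

    static void load(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(buildLoadDataSql(
                    "/var/lib/mysql-files/upload/utf8table.csv",
                    "temp.utf8table"));
        }
    }
}
```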

Getting UTF-8 character issue in Oracle and Java

I have a table in Oracle with a column of type NVARCHAR2, in which I am trying to save Russian characters. But it shows up as ¿¿¿¿¿¿¿¿¿¿¿. When I try to fetch the same characters from Java, I get the same string.
I have tried NCHAR and VARCHAR2 as well, but the issue is the same in all cases.
Is this problem in Java or Oracle? I have tried the same Java code with the same characters against PostgreSQL, and it works fine, so I cannot tell whether the problem is in Oracle or Java.
In my Oracle database the NLS_NCHAR_CHARACTERSET property value is AL16UTF16.
Any idea how I can display the UTF-8 characters saved in Oracle as-is from Java?
The problem with characters is that you cannot trust your eyes. Maybe the database stores the correct character values but your viewing tool does not understand them. Or maybe the characters get converted somewhere along the way due to language settings.
To find out what character values are stored in your table use the dump function:
Example:
select dump(mycolumn),mycolumn from mytable;
This will give you the byte values of your data and you can check whether or not the values in your table are as they should be.
After doing some googling I resolved my issue. Here is the answer: AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters.
It means that when creating the database in Oracle, you should set the AL32UTF8 character set.
Here is the link:
http://docs.oracle.com/cd/B19306_01/server.102/b14231/create.htm#i1008322
You need to specify useUnicode=true&characterEncoding=UTF-8 while getting the connection from the database. Also make sure the column supports UTF-8 encoding. Go through Oracle documentation.

How to specify the character set for a Sybase JDBC connection using jTDS?

I am using jTDS to connect to a Sybase database, and non-ASCII character data is broken. This happens both in my own app and in SQuirreLSQL.
Where can I specify the character set to be used for the connection? And can I find out what that character set should be somewhere in the data dictionary?
You can set the charset property
charset (default - the character set the server was installed with)
Very important setting, determines the byte value to character mapping for CHAR/VARCHAR/TEXT values. Applies to characters from the extended set (codes 128-255). For NCHAR/NVARCHAR/NTEXT values it doesn't have any effect since these are stored using Unicode.
Simply append ;<property>=<value> to your JDBC URL.
See the FAQ
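As a concrete sketch (server name, port, database, and cp850 are placeholders for your environment):

```java
public class JtdsCharsetExample {

    // Append a driver property to a jTDS URL; jTDS uses ';' as the separator.
    static String withCharset(String baseUrl, String charset) {
        return baseUrl + ";charset=" + charset;
    }

    public static void main(String[] args) {
        String url = withCharset("jdbc:jtds:sybase://dbhost:5000/mydb", "cp850");
        System.out.println(url);
        // jdbc:jtds:sybase://dbhost:5000/mydb;charset=cp850
    }
}
```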

Parse from XML, insert to mysql; characters give java.sql.SQLException: Incorrect string value

I am parsing a bunch of XML files and inserting the value obtained from them into a MySQL database. The character set of the mysql tables is set to utf8. I'm connecting to the database using the following connection url - jdbc:mysql://localhost:3306/articles_data?useUnicode=false&characterEncoding=utf8
Most of the string values with Unicode characters are entered fine (like Greek letters etc.), except for some that have a math symbol. A particular example: when I try to insert a string with MATHEMATICAL SCRIPT CAPITAL G (img at www.ncbi.nlm.nih.gov/corehtml/pmc/pmcents/1D4A2.gif) (http://graphemica.com/𝒢) (trying to parse and insert this article), I get the following exception -
java.sql.SQLException: Incorrect string value: '\xF0\x9D\x92\xA2 i...' for column 'text' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3515)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3447)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1951)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2101)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2554)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1761)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2046)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1964)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1949)
If I change my connection URL to - jdbc:mysql://localhost:3306/articles_data, then the insert works, but all regular UTF8 characters are replaced with a question mark.
There are two possible ways I'm trying to fix it, and haven't succeeded at either yet -
When parsing the article, maintain the encoding. I'm using org.apache.xerces.parsers.DOMParser to parse the xml files, but can't figure out how to prevent it from decoding (relevant XML - <p>𝒢 is a set containing...</p>). I could re-encode it, but that just seems inefficient.
Insert the math symbols into the database.
MySQL up to version 5.1 seems to support only Unicode characters in the basic multilingual plane, which take no more than 3 bytes when encoded as UTF-8. From the manual on Unicode support in version 5.1:
MySQL 5.1 supports two character sets for storing Unicode data:
ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character
utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character
In version 5.5 some new character sets were added:
...
utf8mb4, a UTF-8 encoding of the Unicode character set using one to four bytes per character
ucs2 and utf8 support BMP characters. utf8mb4, utf16, and utf32 support BMP and supplementary characters.
So if you are on MySQL 5.1 you would first have to upgrade. In later versions you have to change the charset to utf8mb4 to work with these supplementary characters.
It seems the jdbc connector also requires some further configuration (From Connector/J Notes and Tips):
To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set.
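A quick way to confirm that a character needs utf8mb4 is its UTF-8 byte length; anything over 3 bytes falls outside MySQL's legacy utf8 charset. A small self-contained check for the character from the question:

```java
import java.nio.charset.StandardCharsets;

public class Utf8mb4Check {

    // UTF-8 byte length of a string; supplementary-plane characters
    // take 4 bytes, which MySQL's legacy "utf8" (max 3 bytes) cannot store.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String scriptG = "\uD835\uDCA2"; // U+1D4A2 MATHEMATICAL SCRIPT CAPITAL G
        System.out.println(utf8ByteLength(scriptG)); // 4 -> column must be utf8mb4
    }
}
```

The 4 bytes printed here match the `\xF0\x9D\x92\xA2` sequence in the exception message above.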

Extra spaces in the end while retrieving CHAR values

I am using iBATIS and Oracle 10. The JDBC driver is oracle.jdbc.driver.OracleDriver. When I retrieve data from a table, I find that two spaces ' ' are appended. For example, for a column ACTIVE_IND CHAR(1), the data retrieved is 'A '.
Please note that this happens for all the CHAR fields, and the number of extra spaces is always twice the length of the CHAR. For example, for a column of CHAR(14), the number of extra spaces at the end is 28.
This happens in the System Testing environment only. On our local desktops, using the same ojdbc14.jar and the same code, we do not get any extra spaces.
I think the only thing different in the System Testing environment is the database. Is this related to some character encoding? Is there some configuration in the database to change it?
It sounds very much like a character encoding issue. Have you checked:
the configuration of the Oracle db in each case
what character encoding your app is running under for each environment (you can configure this using -Dfile.encoding=UTF8 or similar - I would strongly recommend this)
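To compare the second point across environments, a tiny diagnostic that prints the JVM's effective default encoding can help (purely a check, no database involved):

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    public static void main(String[] args) {
        // Differs between environments unless pinned with -Dfile.encoding=...
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```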
It cannot be completely determined from the information you gave what exactly went wrong. As far as the encoding for a column is concerned, it is affected by the collation setting. You can check the following link for more information: http://www.lessanvaezi.com/collation-in-oracle-using-nls_sort/
