Extra spaces in the end while retrieving CHAR values

I am using iBATIS and Oracle 10. The JDBC driver is oracle.jdbc.driver.OracleDriver. When I retrieve data from a table, I find that two spaces are appended. For example, for a column ACTIVE_IND CHAR(1), the data retrieved is 'A  '.
Note that this happens for all CHAR fields, and the number of extra spaces is always twice the length of the CHAR column. For example, a CHAR(14) column comes back with 28 extra trailing spaces.
This happens only in the System Testing environment. On our local desktops, using the same ojdbc14.jar and the same code, we do not get any extra spaces.
I think the only difference in the System Testing environment is the database. Is this related to character encoding? Is there a database configuration setting that would change it?

It sounds very much like a character encoding issue. Have you checked:
the configuration of the Oracle database in each environment
the character encoding your app is running under in each environment (you can set this with -Dfile.encoding=UTF8 or similar; I would strongly recommend doing so)
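To compare the encoding the application actually runs under in each environment, a small check like the one below can be run on both the desktop and the System Testing machine. This is only a sketch: the class name EncodingCheck is made up for illustration, and it reports the JVM side only, not the database character set.

```java
import java.nio.charset.Charset;

// Hypothetical helper: prints the encoding the JVM falls back to
// when code does not name a charset explicitly.
final class EncodingCheck {

    // Name of the JVM default charset (influenced by -Dfile.encoding on older JVMs).
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset = " + defaultCharsetName());
        // May be null or differ from the default charset depending on the JVM version.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
    }
}
```

If the two environments print different values, that difference is worth ruling out before looking at the database configuration.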

The information you gave is not enough to determine exactly what went wrong. As far as the encoding of a column is concerned, it is affected by the collation setting. See the following link for more information: http://www.lessanvaezi.com/collation-in-oracle-using-nls_sort/

Related

Storing Unicode and special characters in MySQL tables

My current requirement is to store Unicode and other special characters, such as double quotes, in MySQL tables. For that purpose, as many have suggested, we should use Apache's StringEscapeUtils.escapeJava() method. The problem is that, although this method does replace special characters with their Unicode escapes (\uxxxx), the MySQL table stores them as uxxxx and not \uxxxx. Because of this, when I try to decode the value after fetching it from the database, StringEscapeUtils.unescapeJava() fails (since it cannot find the '\').
Here are my questions:
Why is this happening (that is, why is the '\' dropped by the table)?
What is the solution for this?

Don't use Unicode escapes (\uxxxx); use UTF-8.
Don't use any special functions. Instead, declare that everything is UTF-8 (utf8mb4 in MySQL).
See Best Practice
(If you are being given \uxxxx, then you are stuck with converting to UTF-8 first. If your real question is how to convert, then ask it that way.)
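As a rough illustration of why no escaping step is needed, the sketch below encodes a string containing quotes and non-ASCII characters to UTF-8 bytes and decodes it back unchanged. The class name Utf8RoundTrip and the sample text are made up for illustration; the database round trip itself is not shown and would rely on the connection and columns being set to utf8mb4.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: UTF-8 carries the characters themselves,
// so there are no backslash escapes for the database to lose.
final class Utf8RoundTrip {

    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // e-acute, double quotes and two CJK characters
        String original = "caf\u00e9 \"quoted\" \u4e2d\u6587";
        String restored = decode(encode(original));
        System.out.println(original.equals(restored));
    }
}
```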

Skip rows with non-BMP characters (emojis) when using LOAD DATA INFILE with REPLACE option?

Emoji characters are messing up a loading system we built and I'm looking for a simple short-term solution.
It's a Java loading program that uses JDBC to execute MySQL commands with this structure:
LOAD DATA
LOCAL INFILE `filepath`
REPLACE INTO TABLE `SOME_TABLE`
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\'' ESCAPED BY ''
LINES TERMINATED BY '\n'
(`col1`,...,`coln`)
SOME_TABLE has ENGINE=InnoDB DEFAULT CHARSET=utf8.
We are running MySQL 5.6.22.
It has been working great for years, but recently the files that we load started having occasional non-BMP characters (which happen to be emojis), and the LOAD DATA LOCAL INFILE ... command throws exceptions like:
java.sql.SQLException: Incorrect string value: '\xF0\x9D\x93\x9C' for column 'fieldm' at row 3004
I understand that the long-term solution is we need to move the table to CHARSET=utf8mb4. However, the tables are huge at this point and conversion will not be easy. There are also VARCHAR(255) fields indexed, and these need to be converted to VARCHAR(191) [to fit under max key length 767], or we need to go to DYNAMIC row format and set innodb_large_prefix=true.
We are looking for a short-term solution until we get to a point where we have the time and resources to migrate to utf8mb4.
It would be OK, in the short term, to simply discard the rows with non-BMP (emoji) characters. But, LOAD DATA LOCAL INFILE filepath REPLACE ... will not skip the bad rows, it fails the entire file.
At this point, it looks like we will need to write some filtering in Java to remove the non-BMP (emoji) rows before calling LOAD DATA LOCAL INFILE filepath REPLACE .... But, I am thinking that there must be some way to do this in MySQL without having to introduce that kind of pre-filter.
Does anybody have any ideas for a simple way to get MySQL to simply skip the rows that have non-BMP (emoji) data?
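For reference, the Java pre-filter mentioned above could be fairly small. The sketch below assumes one row per line (matching LINES TERMINATED BY '\n') and no embedded newlines inside fields; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical pre-filter: drops rows containing any character outside the BMP
// (code points above U+FFFF), which MySQL's 3-byte utf8 cannot store.
final class NonBmpFilter {

    // True if the line contains at least one supplementary (non-BMP) code point.
    static boolean hasNonBmp(String line) {
        return line.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Keeps only the lines that are safe for a utf8 (3-byte) table.
    static List<String> keepBmpOnly(List<String> lines) {
        return lines.stream()
                .filter(line -> !hasNonBmp(line))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // U+1D4DC is the character behind the bytes F0 9D 93 9C in the error message.
        String nonBmp = new String(Character.toChars(0x1D4DC));
        List<String> rows = Arrays.asList("1,'plain'", "2,'" + nonBmp + "'", "3,'caf\u00e9'");
        System.out.println(keepBmpOnly(rows).size());
    }
}
```

The filtered lines would then be written to a temporary file that is passed to LOAD DATA LOCAL INFILE in place of the original.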
***** UPDATE *****
It looks like using CONVERT might be the solution for short term. Doing this replaces the Emoji with '????' in col4.
LOAD DATA
LOCAL INFILE `filepath`
REPLACE INTO TABLE `SOME_TABLE`
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\'' ESCAPED BY ''
LINES TERMINATED BY '\n'
(`col1`,`col2`,`col3`,@q, ..., `coln`)
SET `col4` = CONVERT(CONVERT(@q USING utf8mb4) USING utf8);
Does anybody see a problem with that?

In order to store Emoji, you must use utf8mb4 throughout, not utf8.
A shortcut (perhaps) for the 191 index issue is to upgrade to 5.7. There, you can keep 255 and have indexes.
Only certain columns will be holding Emoji, correct? Convert just those columns. (It is OK for different columns in the same table to have different charset and/or collation.)

How do hyphen, en-dash and em-dash get stored in an Oracle database with character set WE8ISO8859P1?

I have a Java web application that uses an Oracle database. The character set of my database is WE8ISO8859P1. The problem I am facing is that characters like en-dash and em-dash get stored as an inverted question mark (¿) or some other weird symbol. The same happens when retrieving and displaying the data.
What is a workaround for this problem?

You cannot store “en-dash” (–) and “em-dash” (—) in an Oracle database with character set WE8ISO8859P1, since these characters are not defined in that character set.
The best thing would be to convert the database to character set WE8MSWIN1252. You can do that without changing any of the data in the database, since WE8MSWIN1252 is a superset of WE8ISO8859P1. WE8MSWIN1252 contains “en-dash” (byte value 0x96) and “em-dash” (byte value 0x97).
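The difference between the two character sets can be checked from Java. The sketch below uses Java's ISO-8859-1 and windows-1252 charsets as stand-ins for Oracle's WE8ISO8859P1 and WE8MSWIN1252; that mapping is an assumption made for illustration, and the class name DashEncoding is made up.

```java
import java.nio.charset.Charset;

// Hypothetical demo: checks which charset can represent the dash characters.
final class DashEncoding {

    // True if the named charset has a byte representation for the character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    // Unsigned value of the first byte the charset produces for the character.
    static int byteValue(String charsetName, char c) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName(charsetName));
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        char enDash = '\u2013';
        char emDash = '\u2014';
        System.out.println(canEncode("ISO-8859-1", enDash));   // not representable
        System.out.println(canEncode("windows-1252", enDash)); // representable
        System.out.printf("%02X %02X%n",
                byteValue("windows-1252", enDash),
                byteValue("windows-1252", emDash));
    }
}
```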

Newline escape sequence not unescaping in proper way in Java

I am fetching a String from a SQL Server 2008 database into my Java code and trying to print it. Unfortunately, the newline escape sequence is not automatically converted into a newline.
I know the reason is that we are not putting the string inside double quotes in the database table. Below is a sample value stored in the varchar column:
Remarks \nTestRemarks Issue\nTestIssue\n\nRegards \nSunny
When I print it to the log file, it is printed with the \n sequences intact. My application convention doesn't allow me to store a String within double quotes inside a database varchar column, so I chose to unescape it explicitly using Apache StringEscapeUtils.unescapeJava(str). Unfortunately, the first and last newline escape sequences are converted to newlines, but all the others remain unchanged. If I put a space before the newline escape sequence in the DB, it is recognized and converted, but not otherwise. Can you please help me solve this?

How about doing the opposite once you retrieve it, i.e. StringEscapeUtils.escapeJava(str), or repeating StringEscapeUtils.unescapeJava(str) after you retrieve it from the database? Either one might work.
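If the column really holds a literal backslash followed by n, and newlines are the only escape that matters, a plain String.replace is another option that avoids the library call. This is a sketch under that assumption; the class and method names are made up, and it would also rewrite a sequence that was meant as an escaped backslash followed by the letter n.

```java
// Hypothetical helper: turns the two-character sequence backslash + n
// into a real newline, leaving everything else untouched.
final class NewlineUnescape {

    static String unescapeNewlines(String stored) {
        // "\\n" is the two characters '\' and 'n'; "\n" is a single line feed.
        return stored.replace("\\n", "\n");
    }

    public static void main(String[] args) {
        String stored = "Remarks \\nTestRemarks Issue\\nTestIssue\\n\\nRegards \\nSunny";
        System.out.println(unescapeNewlines(stored));
    }
}
```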

My setup is behaving in a weird manner. For some reason, after a system restart, an Eclipse restart and a Tomcat restart, everything seems to work seamlessly. Closing the answer as a non-issue.

Getting UTF-8 character issue in Oracle and Java

I have a table in Oracle with a column of type NVARCHAR2. I am trying to save Russian characters in this column, but they show up as ¿¿¿¿¿¿¿¿¿¿¿. When I fetch the same characters from Java, I get the same string.
I have also tried NCHAR and VARCHAR2, but I get the same issue in all cases.
Is this a problem with Java or with Oracle? I have tried the same Java code and the same characters with PostgreSQL, and it works fine.
In my Oracle database, the NLS_NCHAR_CHARACTERSET property value is AL16UTF16.
Any idea how I can display UTF-8 characters saved in Oracle as they are in Java?

The problem with characters is that you cannot trust your eyes. Maybe the database stores the correct character values but your viewing tool does not understand them. Or maybe the characters get converted somewhere along the way due to language settings.
To find out what character values are stored in your table use the dump function:
Example:
select dump(mycolumn),mycolumn from mytable;
This will give you the byte values of your data and you can check whether or not the values in your table are as they should be.
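The same idea works on the Java side: printing the code points and bytes of the fetched string shows whether the characters were already replaced before they reached the application. The sketch below is only an illustration with made-up class and method names; its output format is not meant to match what Oracle's dump prints.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical helper: shows what a Java String really contains,
// independent of how a console or log viewer renders it.
final class HexDump {

    // Space-separated lowercase hex of the string's UTF-8 bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Space-separated code points in U+XXXX notation.
    static String codePointsHex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String fetched = "\u041f\u0440\u0438\u0432\u0435\u0442"; // sample Cyrillic text
        System.out.println(codePointsHex(fetched));
        System.out.println(utf8Hex(fetched));
    }
}
```

If the fetched value prints as a run of U+00BF, the inverted question marks are already in the string, so the replacement happened before the data reached this code.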

After some searching, I resolved my issue. Here is the answer: AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters.
This means the AL32UTF8 character set should be set when creating the database in Oracle.
Here is the link:
http://docs.oracle.com/cd/B19306_01/server.102/b14231/create.htm#i1008322

You need to specify useUnicode=true&characterEncoding=UTF-8 while getting the connection from the database. Also make sure the column supports UTF-8 encoding. Go through Oracle documentation.
