I ran into problems while reading Arabic characters from Oracle in Java using the JDBC driver. The main problem was that I couldn't find the proper character encoding to get the correct data, but I worked around it manually using this method:
public static String cleanORCLString(String s) throws UnsupportedEncodingException {
    // Re-encode the string as UTF-16 bytes (a 2-byte BOM followed by 2 bytes per character).
    byte[] bytes = s.getBytes("UTF16");
    // Reinterpret those bytes as Windows-1256 (Arabic) characters.
    String x = new String(bytes, "Cp1256");
    // Skip the BOM and the first high byte.
    String finalS = x.substring(3);
    StringBuilder sb = new StringBuilder(finalS);
    // Keep only the characters at even positions; isEven is a small helper defined elsewhere.
    for (int k = sb.length() - 1; k > 0; k--) {
        if (!isEven(k)) {
            sb.deleteCharAt(k);
        }
    }
    return sb.toString();
}
This method gives me the correct characters as shown in the database, but when I try to update/insert Arabic data, it saves wrong characters.
For example: my text is saved in the database as "?????????" instead of "مرحبا".
This is how I connect to the Oracle database:
URL = ORCLConnProperties.ORCL_THIN_PREFIX + orclProp.getIpAddress()
        + orclProp.getPortNumber() + ORCLConnProperties.ORCL_THIN_SUFIX;
// URL = jdbc:oracle:thin:@10.0.0.12:1521:ORCL
System.out.println("URL: " + URL);

Properties connectionProps = new Properties();
connectionProps.put("characterEncoding", "Cp1256");
connectionProps.put("useUnicode", "true");
connectionProps.put("user", orclProp.getUserName());
connectionProps.put("password", orclProp.getPassword());

try {
    Class.forName("oracle.jdbc.driver.OracleDriver");
} catch (ClassNotFoundException ex) {
    System.out.println("Error: unable to load driver class!");
    System.exit(1);
}

myDriver = new oracle.jdbc.driver.OracleDriver();
DriverManager.registerDriver(myDriver);
conn = DriverManager.getConnection(URL, connectionProps);
Please help me solve this issue.
Thanks.
New note:
The database itself doesn't use the UTF-16 character set, but
"the JDBC OCI driver transfers the data from the server to the client
in the character set of the database. Depending on the value of the
NLS_LANG environment variable, the driver handles character set
conversions: OCI converts the data from the database character set to
UTF-8. The JDBC OCI driver then passes the UTF-8 data to the JDBC
Class Library, where the UTF-8 data is converted to UTF-16."
this note is mentioned here:
http://docs.oracle.com/cd/B10501_01/java.920/a96654/advanc.htm
First, check the NLS_CHARACTERSET parameter of your database using the following SQL*Plus command:
select * from v$nls_parameters where parameter = 'NLS_CHARACTERSET';
The result should be:

PARAMETER          VALUE
----------------   ------------
NLS_CHARACTERSET   AR8MSWIN1256
If it's not, you have to change the value of this parameter as follows:
Hit WINDOWS KEY + R on your keyboard.
Type: SQLPLUS sys as sysdba
Press Enter, then enter the password (or just hit Enter again).
Issue the following commands:
SHUTDOWN IMMEDIATE
STARTUP RESTRICT
ALTER DATABASE CHARACTER SET INTERNAL_USE AR8MSWIN1256;
ALTER DATABASE CHARACTER SET AR8MSWIN1256;
SHUTDOWN IMMEDIATE
STARTUP
Change the value of the NLS_LANG registry string to AMERICAN_AMERICA.AR8MSWIN1256.
If your operating system is a flavor of UNIX, use AR8ISO8859P6 instead of AR8MSWIN1256 as the value of NLS_CHARACTERSET.
DON'T use national data types (i.e. NVARCHAR, NTEXT, or NCLOB) in your database unless you are going to use languages other than Arabic and English in your database.
The AR8MSWIN1256 character set is sufficient for mixing Arabic and English in the same field (as far as I know).
TAKEN FROM
https://www.youtube.com/watch?v=zMphHE78imM
https://ksadba.wordpress.com/2008/06/10/how-to-show-arabic-characters-in-your-client-app/
Check your Oracle version. If it is old, it may not support UTF-16.
Here is an article that may be useful:
http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch6unicode.htm
Related
According to Oracle's 19c documentation:
The schema name can be 128 bytes, the table name can be 128 bytes, and the column name can be 128 bytes.
However, I'm facing this issue whenever I try to use a schema name bigger than 30 bytes:
Caused by: java.sql.SQLException: Invalid argument(s) in call
at oracle.jdbc.driver.PhysicalConnection.setSchema(PhysicalConnection.java:9462)
at com.zaxxer.hikari.pool.ProxyConnection.setSchema(ProxyConnection.java:460)
at com.zaxxer.hikari.pool.HikariProxyConnection.setSchema(HikariProxyConnection.java)
The driver used is:
<dependency>
    <groupId>com.oracle.database.jdbc</groupId>
    <artifactId>ojdbc8</artifactId>
    <version>19.7.0.0</version>
</dependency>
It looks like the driver does not support the longer object names introduced in version 12c. Any clues whether this is somehow configurable? Could it also perhaps be an AWS RDS specific issue?
In SQL Developer, using the same JDBC URL:
SELECT name, value FROM v$parameter WHERE name = 'compatible';
NAME | VALUE
-------------------
compatible | 19.0.0
ALTER SESSION SET CURRENT_SCHEMA = VERY_VERY_VERY_LONG_SCHEMA_NAME;
Session altered.
UPDATE:
After decompiling the driver code, this is what I see:
public void setSchema(String schema) throws SQLException {
    try {
        String quoted = "\"[^\u0000\"]{0,28}\"";
        String unquoted = "(\\p{javaLowerCase}|\\p{javaUpperCase})(\\p{javaLowerCase}|\\p{javaUpperCase}|\\d|_|\\$|#){0,29}";
        String idPat = "(" + quoted + ")|(" + unquoted + ")";
        SQLException var10000;
        SQLException var9;
        if (schema == null) {
            var10000 = var9 = (SQLException)((SQLException)DatabaseError.createSqlException(this.getConnectionDuringExceptionHandling(), 68).fillInStackTrace());
            throw var10000;
        } else if (!schema.matches(idPat)) {
            var10000 = var9 = (SQLException)((SQLException)DatabaseError.createSqlException(this.getConnectionDuringExceptionHandling(), 68).fillInStackTrace());
            throw var10000;
        } else {
            String sql = "alter session set current_schema = " + schema;
            Statement stmt = null;
            try {
                stmt = this.createStatement();
                stmt.execute(sql);
                ...
}
This means the driver is hardcoded to accept identifiers of at most 30 characters, so it seems to be a bug in the Oracle JDBC driver implementation. Any ideas for alternatives?
setSchema seems to have been forgotten in the long-identifier change (and it seems you'll have to open an SR with Oracle to get it fixed).
In contrast, basic usage of long identifiers (including binding by name) in JDBC seems to work fine.
Example
def rs = stmt.executeQuery("select COL1, LAAAAAAAAAAAAAAAAAAAAAAAAAAAAARGE_NAME from LAAAAAAAAAAAAAAAAAAAAAAAAAAAAARGE_NAME.LAAAAAAAAAAAAAAAAAAAAAAAAAAAAARGE_NAME")
while (rs.next())
{
    println "col1= ${rs.getInt('COL1')} col2= ${rs.getInt('LAAAAAAAAAAAAAAAAAAAAAAAAAAAAARGE_NAME')}"
}
Tested with DB version 19.3.0.0.0 and driver versions 19.3.0.0.0 and 21.1.0.0.0.
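As a possible workaround for setSchema() itself, you could issue the same ALTER SESSION statement the driver builds internally through a plain Statement. A minimal sketch (the helper name is mine, and the schema name must come from a trusted source since it is concatenated into the SQL):

// Hypothetical helper: bypasses the 30-character check in the driver's setSchema()
// by running ALTER SESSION SET CURRENT_SCHEMA directly.
static void setCurrentSchema(java.sql.Connection conn, String schema) throws java.sql.SQLException {
    try (java.sql.Statement stmt = conn.createStatement()) {
        // Only pass schema names you control; this string is not bind-parameterized.
        stmt.execute("alter session set current_schema = " + schema);
    }
}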
From your documentation link:
The following list of rules applies to both quoted and nonquoted identifiers unless otherwise indicated:
The maximum length of identifier names depends on the value of the COMPATIBLE initialization parameter.
If COMPATIBLE is set to a value of 12.2 or higher, then names must be from 1 to 128 bytes long with these exceptions:
Names of databases are limited to 8 bytes.
Names of disk groups, pluggable databases (PDBs), rollback segments, tablespaces, and tablespace sets are limited to 30 bytes.
If COMPATIBLE is set to a value lower than 12.2, then names must be from 1 to 30 bytes long with these exceptions:
Names of databases are limited to 8 bytes.
Names of database links can be as long as 128 bytes.
You need to check the COMPATIBLE initialization parameter, and if it is set below 12.2 then you are limited to 30 bytes.
We have an Oracle database with the following charset settings
SELECT parameter, value FROM nls_database_parameters WHERE parameter like 'NLS%CHARACTERSET'
NLS_NCHAR_CHARACTERSET: AL16UTF16
NLS_CHARACTERSET: WE8ISO8859P15
In this database we have a table with a CLOB field, which has a record that starts with the following string, stored obviously in ISO-8859-15: X²ARB (here correctly converted to unicode, in particular that 2-superscript is important and correct).
Then we have the following trivial piece of code to get the value out, which is supposed to automatically convert the charset to unicode via globalization support in Oracle:
private static final String STATEMENT = "SELECT data FROM datatable d WHERE d.id=2562456";

public static void main(String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    try (Connection conn = DriverManager.getConnection(DB_URL);
         ResultSet rs = conn.createStatement().executeQuery(STATEMENT))
    {
        if (rs.next()) {
            System.out.println(rs.getString(1).substring(0, 5));
        }
    }
}
Running the code prints:
with ojdbc8.jar and orai18n.jar: X�ARB -- incorrect
with ojdbc7.jar and orai18n.jar: X�ARB -- incorrect
with ojdbc-6.jar: X²ARB -- correct
By using UNISTR and changing the statement to SELECT UNISTR(data) FROM datatable d WHERE d.id=2562456 I can bring ojdbc7.jar and ojdbc8.jar to return the correct value, but this would require an unknown number of changes to the code as this is probably not the only place where the problem occurs.
Is there anything I can do to the client or server configurations to make all queries return correctly encoded values without statement modifications?
It definitely looks like a bug in the JDBC thin driver (I assume you're using thin). It could be related to LOB prefetch, where the CLOB's length, character set ID, and the first part of the LOB data are sent in-band. This feature was introduced in 11.2. As a workaround, you can disable LOB prefetch by setting the connection property
oracle.jdbc.defaultLobPrefetchSize
to "-1". Meanwhile, I'll follow up on this bug to make sure that it gets fixed.
Please have a look at Database JDBC Developer's Guide - Globalization Support
The basic Java Archive (JAR) file ojdbc7.jar, contains all the
necessary classes to provide complete globalization support for:
CHAR or VARCHAR data members of object and collection for the character sets US7ASCII, WE8DEC, WE8ISO8859P1, WE8MSWIN1252, and UTF8.
To use any other character sets in CHAR or VARCHAR data members of
objects or collections, you must include orai18n.jar in the CLASSPATH
environment variable:
ORACLE_HOME/jlib/orai18n.jar
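If the project uses Maven, orai18n is also published to Maven Central. An illustrative snippet (the coordinates and version are my assumption and should be checked against the driver version you use):

<dependency>
    <groupId>com.oracle.database.nls</groupId>
    <artifactId>orai18n</artifactId>
    <version>19.7.0.0</version>
</dependency>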
I have found this article which describes how to get the Derby database schema version from the command line with derbyrun.jar.
How can I determine the schema version from within my Java program?
EDIT: I answered my own question but I think it is not the best solution. When someone proposes a better solution I will accept that as an answer.
I've come up with something to get the schema version in Java, but it's ugly (and not quite complete). I hope somebody out there has something better:
public static void main(String[] args) throws Exception {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    // Note: backslashes must be escaped in a Java string literal.
    String strConnectionURL = "jdbc:derby:c:\\data\\tomcat7\\active\\LearnyErnie\\data\\derby;create=false";
    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
    Connection connection = DriverManager.getConnection(strConnectionURL);
    String strCommand = "values syscs_util.syscs_get_database_property( 'DataDictionaryVersion' );";
    InputStream inputStream = new ByteArrayInputStream(strCommand.getBytes("UTF-8"));
    ij.runScript(connection, inputStream, "UTF-8", outputStream, "UTF-8");
    String strOutput = new String(outputStream.toByteArray());
    System.out.println("Output = " + strOutput);
}
strOutput will contain the following text:
Output = CONNECTION0* - jdbc:derby:c:\data\tomcat7\active\LearnyErnie\data\derby
* = current connection
ij> values syscs_util.syscs_get_database_property( 'DataDictionaryVersion' );
1
(long line of dashes taken out because it squirrels up the SO output)
10.9
1 row selected
ij>
The schema version is "10.9" in the output above. It is up to you to parse it out. (I'm not bothering with that today. This is just for extra info in a Help -> About dialog and I will just write the whole ugly mess to the screen.)
We certainly can't count on that parsing logic to work in future versions, or even the call to syscs_get_database_property() itself.
I am hoping someone out there has a more resilient way to do this.
Edit: It occurred to me later that perhaps the JDBC DatabaseMetaData object would contain info on the database schema in one of its properties, but I've examined them all and it doesn't. They give info on the driver version only (which can be different than the database schema).
This is many years later, but I was looking for information on using syscs_util.syscs_get_database_property( 'DataDictionaryVersion' ) from Java and found this post. Thought I'd share an easier way to get the DataDictionaryVersion from Java. This example doesn't have any exception or other error handling, but provides the basic code on how to do it. I initially wrote more code to determine that the Statement execute() call returns a ResultSet containing one column of varchar data, with a column name of 1. I used ResultSet.getMetaData() to get the column information. Once I figured that out, I chose to assume that the column information won't change and did not keep that code.
String connectionURL = "jdbc:derby:dbname";
java.sql.Connection conn = java.sql.DriverManager.getConnection(connectionURL);
String ddVersionQueryStr = "values syscs_util.syscs_get_database_property( 'DataDictionaryVersion' )";
java.sql.Statement stmt = conn.createStatement();
boolean isResultSet = stmt.execute(ddVersionQueryStr);
java.sql.ResultSet resultSet = stmt.getResultSet();
resultSet.next();
System.out.print("ddVersion = " + resultSet.getString(1) + "\n");
The output from this for me is "10.11".
I have three text fields (ID, Name, Address) which reflect columns in a DB table. I'm trying to insert Arabic characters into that table, but they appear as "?????".
I am connecting to the DB using JDBC.
My database info: MySQL Server 5.5.14 Community
Server characterset: latin1
Db characterset: utf8
Client characterset: latin1
Conn. characterset: latin1
I tried to encode strings using the following code:
private String ArEncode(String text) {
    String txt = "";
    try {
        Charset cset = Charset.forName("utf8");
        CharsetEncoder encoder = cset.newEncoder();
        CharsetDecoder decoder = cset.newDecoder();
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        txt = buffer.asCharBuffer().toString();
    } catch (Exception ex) {
        Logger.getLogger(UserView.class.getName()).log(Level.SEVERE, null, ex);
    }
    return txt;
}
Then the returned string "txt" is inserted into the database.
Note: when I insert values directly into the DB from NetBeans, they are inserted correctly and the Arabic characters appear correctly.
Why would you do this? It's the job of the MySQL JDBC driver to encode strings correctly in the declared database character set. Set your character set in the database to UTF-8, set the JDBC options correctly, and just do a plain insert of a plain old String.
Your encoding logic, if it works correctly, should return the exact same value in txt as in text, so it is not needed.
And: set the connection character set to utf8 and the JDBC driver will/should take care of the rest.
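For illustration, a minimal sketch of a plain insert over a UTF-8 connection, letting the driver handle the conversion (table and column names are made up to match the three text fields):

String url = "jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8";
try (java.sql.Connection conn = java.sql.DriverManager.getConnection(url, "user", "pass");
     java.sql.PreparedStatement ps = conn.prepareStatement(
             "INSERT INTO persons (id, name, address) VALUES (?, ?, ?)")) {
    ps.setInt(1, 1);
    ps.setString(2, "مرحبا");        // a plain Java String; no manual encoding needed
    ps.setString(3, "some address");
    ps.executeUpdate();
}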
I'm doing
private void doSomething(ScrollableResults scrollableResults) {
    while (scrollableResults.next()) {
        Object[] result = scrollableResults.get();
        String columnValue = (String) result[0];
    }
}
I tried this on two computers.
On the first it works fine. It is a Windows 7 machine; System.getProperty("file.encoding") returns Cp1252.
On the second, when the word in the database has accents, columnValue gets strange values. It is a CentOS machine; System.getProperty("file.encoding") returns UTF-8.
Both databases are MySQL, charset: latin1, collation: latin1_swedish_ci.
What should I do to correct this?
My suggestion would be to use UTF-8 everywhere:
at the database/tables level (the following ALTER will change the character set not only for the table itself, but also for all existing textual columns)
ALTER TABLE <some table> CONVERT TO CHARACTER SET utf8
in the connection string (which is required with MySQL's JDBC driver or it will use the client's encoding)
jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8
References
MySQL 5.0 Reference Manual
9.1.3.2. Database Character Set and Collation
9.1.3.3. Table Character Set and Collation
Connector/J (JDBC) Reference
20.3.4.4. Using Character Sets and Unicode