We need to pull some tables from an Informix SE database, truncate tables on Oracle 10g, and then populate them with the Informix data.
Does a bulk import work? Will data types clash?
I'd like to use a simple Java executable that we can schedule daily. Can a Java program call the bulk import? Is there an example you can provide? Thanks.
Interesting scenario!
There are several issues to worry about:
What format does Oracle's bulk import expect the data to be in?
What is the correct format for the DATE and DATETIME values?
Pragmatically (and based on experience with Informix rather than Oracle), rather than truncating the tables before bulk loading, I would bulk load the data into newly created tables (a relatively time-consuming process) and then arrange to replace the old tables with the new ones. Depending on what works quickest, I'd either do a sequence of operations:
Rename old table to junk table
Rename new table to old table
followed by a sequence of 'drop junk table' operations, or I'd do:
Drop old table
Rename new table to old table
If the operations are done this way, the 'down time' for the tables is minimized, compared with 'truncate table' followed by 'load table'.
Oracle is like SE - its DDL statements are non-transactional (unlike IDS where you can have a transaction that drops a table, creates a new one, and then rolls back the whole set of operations).
How to export the data?
This depends on how flexible the Oracle loaders are. If they can adapt to Informix's standard output formats (for example, the UNLOAD format), then the unloading operations are trivial. You might need to set the DBDATE environment variable to ensure that date values are recognized by Oracle. I could believe that 'DBDATE="Y4MD-"' is likely to be accepted; that is the SQL standard 2009-12-02 notation for 2nd December 2009.
The default UNLOAD format can be summarized as 'pipe-delimited fields with backslash escaping embedded newlines, backslash and pipe symbols':
abc|123|2009-12-02|a\|b\\c\
d||
This is one record with a character string, a number, a date, and another character string (containing 'a', '|', 'b', '\', 'c', newline and 'd') and a null field. Trailing blanks are removed from character strings; an empty but non-null character field has a single blank in the unload file.
If Oracle cannot readily be made to handle that, then consider whether Perl + DBI + DBD::Informix + DBD::Oracle might be a toolset to use - this allows you to connect to both the Oracle and the Informix (SE) databases and transfer the data between them.
Failing that, you would need to investigate other unloaders for SE. One program that may be worth investigating, unless you're using Windows, is SQLCMD (fair warning: author's bias creeping in). It has a fairly powerful set of output formatting options and can probably create a text format that Oracle would find acceptable (CSV, for example).
A final fallback would be to have a tool generate INSERT statements for the selected data. I think this could be useful as an addition to SQLCMD, but it isn't there yet. So, you would have to use:
SELECT 'INSERT INTO Target(Col1, Col2) VALUES (' ||
Col1 || ', ''' || Col2 || ''');'
FROM Source
This generates a simple INSERT statement. The snag with this is that it is not robust if Col2 (a character string) itself contains quotes (and newlines may cause problems on the receiving end too). You'd have to evaluate whether this is acceptable.
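On the original question of a simple Java executable to schedule daily: the Perl + DBI route mentioned above has a direct JDBC analogue, reading rows from Informix and batch-inserting them into the freshly created Oracle table, which sidesteps the bulk-loader format question entirely. What follows is a rough sketch rather than a tested implementation; every URL, credential, and table or column name is a placeholder, null and error handling are elided, and the Informix SE connection details in particular will depend on your setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DailyTransfer {
    public static void main(String[] args) throws Exception {
        try (Connection ifx = DriverManager.getConnection(
                 "jdbc:informix-sqli://ifxhost:9088/stores:INFORMIXSERVER=se_server",
                 "user", "password");
             Connection ora = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//orahost:1521/ORCL", "user", "password")) {
            ora.setAutoCommit(false); // commit once per table, not per row
            try (Statement src = ifx.createStatement();
                 ResultSet rs = src.executeQuery(
                     "SELECT col1, col2, col3 FROM source_table");
                 PreparedStatement ins = ora.prepareStatement(
                     "INSERT INTO target_new (col1, col2, col3) VALUES (?, ?, ?)")) {
                int n = 0;
                while (rs.next()) {
                    ins.setInt(1, rs.getInt(1));
                    ins.setString(2, rs.getString(2));
                    ins.setDate(3, rs.getDate(3)); // JDBC maps DATE values, so DBDATE doesn't matter here
                    ins.addBatch();
                    if (++n % 1000 == 0) ins.executeBatch(); // flush in chunks
                }
                ins.executeBatch();
                ora.commit();
            }
        }
    }
}

The rename-and-drop swap described earlier can then be issued over the same Oracle connection once the load commits.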
Related
I am writing a Scala program that interoperates with several database engines (for example, MySQL and PostgreSQL).
I use the JDBC API to handle SQL queries. I use queries to create tables, and I want to create a table with fields given by users; these field names can contain spaces or accented words.
For example, create a table dummy with two varchar fields, 'column 1' and 'column 2'.
To write this query for a MySQL database while preserving the spaces in the field names, we need to use backticks:
CREATE TABLE dummy (`column 1` varchar(20), `column 2` varchar(20));
In the same way, the right way to write this query for PostgreSQL while preserving the spaces is:
CREATE TABLE dummy ("column 1" varchar(20), "column 2" varchar(20));
Maybe for another database engine there is yet another way to write this query.
Is there any standard way to write this query, with the constraints above and using JDBC, so that it works with any database engine?
Thanks in advance for your answers.
This doesn't directly address your question as asked, but I think you would be better off not naming your columns this way. I would 'normalize' the column names (e.g., by replacing spaces with underscores).
The column names should probably not be exposed directly to the users anyway.
If you need 'human readable' names for columns, I would store them in another table. Or, if it is as simple as preserving spaces, just reverse the process, replacing underscores with spaces.
As already mentioned, you should not use spaces, special characters, or reserved words as column or table names. To play it safe, you can generally put quotes around table and column names to avoid case-sensitivity issues across databases.
CREATE TABLE "foo" ("id" VARCHAR(32), "bar" VARCHAR(64))
According to the SQL-99 standard, double quotes (") are used to delimit identifiers. Case sensitivity actually depends on the type of database used and its settings. There are databases that will upper- or lowercase your table and column names if they are not quoted, sometimes only in CREATE TABLE or ALTER TABLE commands, which can lead to runtime errors.
For example, CREATE TABLE foo might actually create a table named FOO. When doing a SELECT * FROM foo you might then get an error because table foo does not exist, while table FOO does. Having worked across lots of DBMSs, I tend to use quotes for table and column names.
The important part is that you have to stick to either lowercase or uppercase when using quotes, because "foo" is not equal to "FOO", whereas foo might be equal to FOO depending on the DBMS used. Pick lowercase or uppercase, but stick to it if you're using quotes.
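JDBC does expose the engine-specific quote character, via DatabaseMetaData.getIdentifierQuoteString(), so you can build the CREATE TABLE portably instead of hard-coding backticks or double quotes. A minimal sketch; the connection URL and credentials are placeholders:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.Statement;

public class QuotedIdentifiers {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {
            DatabaseMetaData meta = conn.getMetaData();
            // "`" on MySQL, "\"" on PostgreSQL, a single space if unsupported
            String q = meta.getIdentifierQuoteString();
            String sql = "CREATE TABLE dummy ("
                       + q + "column 1" + q + " varchar(20), "
                       + q + "column 2" + q + " varchar(20))";
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(sql);
            }
        }
    }
}

Note that this only solves the quoting; the case-sensitivity caveats above still apply.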
You should also avoid database-specific column types (stick to ANSI SQL whenever possible).
But doing database migrations by hand is very tedious and error-prone. I would suggest using migration tools (such as Flyway) or letting the migrations be generated by libraries like Slick.
As you mentioned Scala, please have a look at Slick (http://slick.lightbend.com/), which is a great functional database layer for Scala. There are others too, but that is the one I use heavily and can recommend.
Hi: We have a tool that handles reports with Unicode support. It works fine until we encounter this new report with Polish characters.
We are able to retrieve the data and display it correctly; however, when we use the data as input to perform a search, some of the characters do not seem to be converted correctly, and therefore no data is retrieved. Here is a sample.
Table polish has two columns: party and description. One of the values of party is "Bełchatów". I use JDBC to read that value from the database and search with the following SQL statement:
SELECT * from polish where party = N'Bełchatów'
However, this gives me no result with ojdbc6.jar (JDK 8), while it does give a result back with ojdbc7.jar.
What is the reason? And how can we fix it when using ojdbc6.jar?
Thanks!
This is because the Oracle JDBC driver doesn't convert the string into a national-character (Unicode) literal by default. There is a connection property that controls this, oracle.jdbc.defaultNChar=true.
http://docs.oracle.com/cd/B14117_01/java.101/b10979/global.htm
When this property is true, the driver converts a string marked as an NCHAR literal, such as N'Bełchatów', into u'Be\0142chat\00f3w'.
You can also set this at the data source level. Depending on your persistence API vendor, the way to set it can be different.
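With plain JDBC, one way to set it is as a connection property when opening the connection. A minimal sketch; the URL and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class NCharExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "scott");
        props.setProperty("password", "tiger");
        // Treat character data as NCHAR/NVARCHAR2 by default.
        props.setProperty("oracle.jdbc.defaultNChar", "true");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCL", props)) {
            // ... run the query with the N'...' literal, or better, bind the
            // Polish value with PreparedStatement.setNString where available.
        }
    }
}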
I am parsing RSS news feeds in over 10 different languages.
All the parsing is done in Java and the data is stored in MySQL before my APIs, written in PHP, respond to the clients.
I constantly come across garbage characters when I read the data.
What have I tried:
I have configured MySQL to store UTF-8 data. My db, table, and even the column have utf8 as their default charset.
While connecting to my db, I set the character set of results to utf-8.
When I run the jar file manually to insert the data, the characters appear fine. But when I set up a cron job for the same jar file, I start facing the problem all over again.
In English I particularly face problems like this, and in other vernacular languages the characters appear to be total garbage; I can't even recognize a single character.
Is there anything that I am missing?
Sample garbage characters:
Gujarati :"રેલવે મà«àª¸àª¾àª«àª°à«€àª®àª¾àª‚ સામાન ચોરી થશે તો મળશે વળતર!"
Malyalam : "നേപàµà´ªà´¾à´³à´¿à´²àµ‡à´•àµà´•àµà´³àµà´³ കോളàµâ€ നിരകàµà´•àµ à´•àµà´±à´šàµà´šàµ"
English : Bank Board Bureau’s ambit to widen to financial sector PSUs
The Gujarati starts with રેલવે, correct? And the Malayalam starts with നേപ, correct? And the English should have included Bureau’s.
This is the classic case of Mojibake. It occurs when all of the following are true:
The bytes you have in the client are correctly encoded in utf8. (Bureau is encoded in the ASCII/latin1 subset of utf8, but ’ is not the ASCII apostrophe.)
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the table was declared CHARACTER SET latin1. (Or possibly it was inherited from the table/database.) (It should have been utf8.)
The fix for the data is a "2-step ALTER".
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
where the lengths are big enough and the other "..." have whatever else (NOT NULL, etc) was already on the column.
Unfortunately, if you have a lot of columns to work with, it will take a lot of ALTERs. You can (and should) MODIFY all the necessary columns of a single table in one pair of ALTERs.
The fix for the code is to establish utf8 as the connection character set; how to do that depends on the API used in PHP. The ALTERs will change the column definitions.
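On the Java side (the jar the cron job runs), the equivalent fix is to pin the connection encoding in the JDBC URL rather than inheriting whatever default charset the cron environment provides, which would also explain why manual runs and cron runs behave differently. A sketch, with a placeholder URL and credentials:

import java.sql.Connection;
import java.sql.DriverManager;

public class FeedInserter {
    public static void main(String[] args) throws Exception {
        // An explicit characterEncoding makes the connection charset
        // independent of the platform default, so cron and interactive
        // runs behave the same.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/feeds"
                + "?useUnicode=yes&characterEncoding=UTF-8",
                "user", "password")) {
            // ... perform the inserts with PreparedStatement as usual.
        }
    }
}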
Edit
You have VARCHAR with the wrong CHARACTER SET. Hence, you see Mojibake like રેલ. Most conversion techniques try to preserve રેલ, but that is not what you need. Instead, the step through VARBINARY preserves the bits while dropping the old claim that those bits represent latin1-encoded characters. The second step again preserves the bits, but now declares that they represent utf8 characters.
I am having trouble putting single quotes around values for ASCII/timestamp columns and not putting them around other types like int, decimal, boolean, etc.
The data comes from another db/table, which is SQL.
I have all the column data as strings. I don't want to inspect each column's data to check for null values and then decide whether or not to quote it.
Is it possible to pass in the insert data values without giving single quotes, using a prepared statement or something similar?
If you don't want to write a loader that uses prepared statements (via the CQL driver...which is a good idea), I can think of one other way. To import without using single quotes, you should be able to accomplish this with the COPY FROM CQL3 command (setting the QUOTE parameter to an empty string). If you can dump your RDBMS data to a csv file, you should be able to insert those values into Cassandra like this:
COPY myColumnFamily (colname1,colname2,colname3)
FROM '/home/myUser/rdbmsdata.csv' WITH QUOTE='';
Check out the documentation on the COPY command for more information. Examples can be found here.
EDIT:
I also read the above question and assumed that you did not want a prepared statement-based answer. Since that's obviously not the case, I thought I'd also provide one here (using DataStax's Java CQL driver). Note that this answer is based on my column family and column names from my example above, and assumes that col1 is the (only) primary key.
PreparedStatement statement = session.prepare(
"UPDATE myKeyspace.myColumnFamily " +
"SET col2=?, col3=? " +
"WHERE col1=?");
BoundStatement boundStatement = statement.bind(
strCol2, strCol3, strCol1);
session.execute(boundStatement);
This solution does not require you to wrap your string data in single quotes, and has a few added benefits over your String.replaceAll approach:
Allows you to insert values containing single quotes.
Escapes your values, protecting you from CQL-Injection (the lesser-known relative of SQL-Injection).
In CQL, both UPDATE and INSERT add a record if it does not exist and update it if it does (effectively known as an "UPSERT"). Using an UPDATE over an INSERT supports counter columns (if your schema ends up using them).
Prepared statements are faster, because they allow Cassandra to parse the query only once and then re-run it with different values.
For more information, check out DataStax's documentation on using prepared statements with the Java Driver.
Finally did it using String.format combined with replaceAll:
String.format("INSERT INTO xyz_zx(A,B,C,D) VALUES('%s','%s',%s,%s);", (Object[]) Strings).replaceAll("'null'", "null");
I'm using NetBeans to build a web application in Java/JSP that handles a database with Hebrew fields.
The DDL is as follows:
String cityTable = "CREATE TABLE IF NOT EXISTS hebrew_test.table ("
+"id int(11) NOT NULL AUTO_INCREMENT,"
+"en varchar(30) NOT NULL,"
+"he varchar(30) COLLATE utf8_bin NOT NULL,"
+"PRIMARY KEY (id)"
+") ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;";
String insert = "INSERT INTO hebrew_test.table (en, he) VALUES ('A','a')";
String insert2 = "INSERT INTO hebrew_test.table (en, he) VALUES ('B','ב')";
String insert3 = "INSERT INTO hebrew_test.table (en, he) VALUES ('C','אבג')";
executeSQLCommand(cityTable);
executeSQLCommand(insert);
executeSQLCommand(insert2);
executeSQLCommand(insert3);
The output table I get:
1 A a
2 B ?
3 C ???
Instead of:
1 A a
2 B ב
3 C אבג
I tried the suggestions from "Hebrew appears as question marks in Netbeans", but that isn't the same problem: here I get the question marks in the table itself.
Also, I defined the table to use utf8_bin, as you can see in the code above.
You need to tell the JDBC driver to use UTF-8 when encoding the characters of the SQL query to bytes. You can do that by adding the useUnicode=yes and characterEncoding=UTF-8 query parameters to the JDBC connection URL.
jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8
Otherwise it will use the operating system's platform default charset. The MySQL JDBC driver is itself well aware of the encodings used on both the client side (where the JDBC code runs) and the server side (where the DB table lives). Any character not covered by the charset used by the DB table will be replaced by a question mark.
See also:
Spring Encoding with CharacterEncodingFilter in web.xml
You're including your values directly into the SQL. That's always a bad idea. Use a PreparedStatement, parameterized SQL, and set the values as parameters. It may not fix the problem - but it's definitely the first thing to attempt, as you should be using parameterized SQL anyway. (Parameterized SQL avoids SQL injection attacks, separates code from data, and avoids unnecessary conversions.)
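As a minimal sketch of what that looks like for the insert from the question (reusing the table from the DDL above, and assuming a java.sql.Connection named connection, presumably the one executeSQLCommand already uses):

String insert3 = "INSERT INTO hebrew_test.table (en, he) VALUES (?, ?)";
try (PreparedStatement ps = connection.prepareStatement(insert3)) {
    ps.setString(1, "C");
    ps.setString(2, "אבג"); // the Hebrew value travels as a parameter, not as SQL text
    ps.executeUpdate();
}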
Next, you should work out exactly where the problem is really occurring:
Make sure that the value you're trying to insert is correct.
Check that the value you retrieve is correct.
Check what's in your web response using Wireshark - both the declared encoding and the actual data.
When checking the values, you should iterate over each character in the string and print out the value as a UTF-16 code unit (either use toCharArray() or use charAt() in a loop). Just printing the value to the console leaves too much chance of other problems.
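For example, a small helper along these lines (the name is mine) makes the actual code units visible:

// Prints each UTF-16 code unit, so encoding bugs can't hide behind
// whatever charset the console happens to use.
static void dumpCodeUnits(String s) {
    for (char c : s.toCharArray()) {
        System.out.printf("U+%04X ", (int) c);
    }
    System.out.println();
}

A correctly round-tripped אבג should come back as U+05D0 U+05D1 U+05D2; three U+003F values would mean the data had already been reduced to question marks before it reached this point.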
EDIT: For a little context on why I wrote this as an answer:
In my experience, including string values as parameters rather than directly into SQL can sometimes avoid such issues (and is of course better for security reasons etc).
In my experience, diagnosing whether the problem is at the database side or the web side is also important. This diagnosis is best done via logging the exact UTF-16 code units being used, not just strings (as otherwise further encoding issues during logging or console output can occur).
In my experience, problems like this can easily occur at either insert or read code paths.
All of this is important as a way of moving the OP forward, not just in a comment-like request for more information.