Data transfer from one Database server to another in Java - java

I don't usually ask a lot of questions, but this time I'm stuck. Here's the problem.
I have two database (Sybase) servers, and there's a database with over 90 tables, but I need to archive only 20 of them.
These tables are quite large, however, and can contain up to 90 million records. So here's the deal. Currently, what I do is:
For the big tables (a lot of records), I create a temp table and copy from the temp table to the destination, running an insert statement for each record.
After the copying is done, I drop the temp table I created.
I've also tried other approaches, such as multi-threading the copy.
Up to now, the multi-threading works, but the archival speed is still not good enough: it archives about 1.6 million records within one hour, and that is not good enough for my boss.
Kindly advise on any other solution, approach or thought you think could help. Please note that all suggestions are welcome.
Thanks in advance.

Do not copy that amount of data yourself. Create database jobs to copy/archive the tables, and monitor the output/logs of those jobs from your application. It will be much faster.
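As a very rough sketch of the monitoring side only (the job name, the job_log table, its columns, and the connection details below are all hypothetical, not part of any Sybase API), the Java application would just poll whatever log the server-side job writes:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ArchiveJobMonitor {
    public static void main(String[] args) throws Exception {
        // Hypothetical jConnect URL and credentials -- adjust to your environment.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sybase:Tds:dbserver:5000/archive_db", "user", "password")) {
            String sql = "select status, rows_copied from job_log where job_name = ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, "archive_big_tables");
                String status = "RUNNING";
                while ("RUNNING".equals(status)) {
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next()) {
                            status = rs.getString("status");
                            System.out.println("Rows copied so far: " + rs.getLong("rows_copied"));
                        }
                    }
                    Thread.sleep(30_000); // poll every 30 seconds
                }
                System.out.println("Archive job finished with status: " + status);
            }
        }
    }
}

The actual copy happens entirely inside the database server; the application only reports progress.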

Generate an executable SQL script and pass it to the database. That means: fetch all records with a select statement and build insert/update statements from them:
String query = "UPDATE OR INSERT INTO TABLE (ID, VALUE) VALUES (9, 2) MATCHING (IDPRODUCT, COUNT); "+
"UPDATE OR INSERT INTO TABLE (ID, VALUE) VALUES (10, 1) MATCHING (IDPRODUCT, COUNT); "+
"COMMIT WORK;";
If Sybase can connect to the other Sybase instance, create a procedure that executes the script above. In FirebirdSQL this is possible through the ON EXTERNAL and EXECUTE PROCEDURE (with the procedure name as a parameter) instructions.
The users need to monitor and know how the tool is running, so update the user interface after each committed table.
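As a sketch of how such a generated script could be sent from Java without one network round trip per statement (the connection URL is illustrative, and the statements would normally come from the generator described above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GeneratedScriptRunner {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:sybase:Tds:dbserver:5000/target_db", "user", "password")) {
            con.setAutoCommit(false);
            try (Statement st = con.createStatement()) {
                // In practice these strings are produced by the generator, one per source record.
                st.addBatch("UPDATE OR INSERT INTO TABLE (ID, VALUE) VALUES (9, 2) MATCHING (IDPRODUCT, COUNT)");
                st.addBatch("UPDATE OR INSERT INTO TABLE (ID, VALUE) VALUES (10, 1) MATCHING (IDPRODUCT, COUNT)");
                st.executeBatch(); // one round trip for the whole batch
            }
            con.commit();
        }
    }
}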

Related

How to copy selective data from one database to another (ORACLE)

We have a need to find a way to copy certain data from production into our dev regions so that we can debug/fix any issue.
Sometimes single user related data gets impacted. We have to replicate the same scenario in dev and find a solution.
Presently we follow two approaches:
1. Check the audit history and try to recreate a similar scenario in dev. <50% success rate in recreating the exact same scenario.
2. Restore+Encrypt the "whole" production into dev and then continue the work. It is overkill if the issue impacts only a single user.
So I am trying to find a way to select just a single user's data from production and insert it into the dev region.
We just have Java and Oracle. We can't use any external tools, because we don't have licenses and cannot download freeware due to security restrictions.
I tried the following:
1. Write Java code which queries the information schema tables to find the relationships between the tables and creates select statements like the ones below:
select 'insert into TABLE1(C1,C2,C3,C4) values ('||''''||C1||''''||','||coalesce(to_char(C2),'null')||','||''''||C3||''''||','||coalesce(to_char(C4),'null')||');'
from TABLE1 where ID='1006' union all
select 'insert into TABLE2(C1,C2,C3,C4) values ('||''''||C1||''''||','||coalesce(to_char(C2),'null')||','||''''||C3||''''||','||coalesce(to_char(C4),'null')||');'
from TABLE2 where TABLE1ID in (select ID from TABLE1 where ID='1006') union all
select 'insert into TABLE3(C1,C2,C3,C4) values ('||''''||C1||''''||','||coalesce(to_char(C2),'null')||','||''''||C3||''''||','||coalesce(to_char(C4),'null')||');'
from TABLE3 where TABLE2ID in (select ID from TABLE2 where TABLE1ID in (select ID from TABLE1 where ID='1006'));
2. Use this set of selects in production, so that you get a set of insert statements as output.
3. Use the insert statements in dev.
Problem:
The select queries are becoming huge. Around 25 MB in total :(
We cannot even execute that big query in production.
Could you suggest any better approach for this use case?
Does Oracle itself allow selective data exports? Or is there another way I should write my Java code?
We use something like this to move records from one database to another:
copy from username/password@database1 to username/password@database2 insert target_table using select * from source_table where where_clause_goes_here;
Use Data Pump to move the data for the tables you need, with the WHERE clause you want. Straightforward, standard functionality of the database.
If both DBs are Oracle, you can create a DBLINK in your local database for the remote DB, and create a job in your local DB that queries all the data from the remote DB over the DBLINK and updates the tables in your local database.
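As an illustrative sketch (the link name, table, filter, and connection details are hypothetical), the copy over such a DBLINK can be kicked off from Java with a single statement:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DbLinkCopy {
    public static void main(String[] args) throws Exception {
        // Connect to the dev database; PROD_LINK is a hypothetical database link to production.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@devhost:1521/devdb", "dev_user", "dev_password");
             Statement st = con.createStatement()) {
            int rows = st.executeUpdate(
                "insert into TABLE1 select * from TABLE1@PROD_LINK where ID = '1006'");
            System.out.println("Copied " + rows + " rows over the database link");
        }
    }
}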
Alternatively, there are plenty of data migration APIs available; you can give one of them a try.
Below are some links. Have a look at them; maybe one will solve your problem:
http://code.google.com/p/c5-db-migration/
http://flywaydb.org/documentation/migration/java.html
http://migrate4j.sourceforge.net/
http://flywaydb.org/ --- this one is better to use
http://www.operatornew.com/2012/11/automatic-db-migration-for-java-web.html

Table rows seem to be disappearing

I have a ton of raw HTML files that I'm parsing and inserting into a MySQL database via a connection in Java.
I'm using "REPLACE INTO" statements and this method:
public void migrate(SomeThread thread) throws Exception {
    PreparedStatement threadStatement = SQL.prepareStatement(threadQuery);
    thread.prepareThreadStatement(threadStatement);
    threadStatement.executeUpdate();
    threadStatement.close();

    for (SomeThread.Post P : thread.threadPosts) {
        PreparedStatement postStatement = SQL.prepareStatement(postQuery);
        P.preparePostStatement(postStatement);
        postStatement.executeUpdate();
        postStatement.close();
    }
}
I am running 3 separate instances of my program each in its own command prompt, with their own separate directory of htmls to parse and commit.
I'm using HeidiSQL to monitor the database and a funny thing is happening where I'll see that I have 500,000 rows in a table at one point for example, then I'll close HeidiSQL and check back later to find that I now have 440,000 rows. The same thing occurs for the two tables that I'm using.
Both of my tables use a primary key called "id". Each table's IDs have their own domain, but is it possible their values overlap and are overwriting each other? I'm not sure whether this could be an issue, because I'd expect SQL to keep each table's "local" id values separate.
Otherwise I was thinking it could be that, since I'm running 3 separate instances that each have their own connection to the DB, some kind of magic is happening: right as one row is being committed, execution swaps to another commit statement, displaces the table, then swaps back to the first commit, and some further magic causes the database to roll back some of the rows collected.
I'm pretty new to SQL so I'm not too sure where to start, if somebody has an idea about what the heck is going on and could point me in the right direction I'd really appreciate it.
Thanks
You might want to use INSERT INTO instead of REPLACE INTO.
Data doesn't disappear.
Here are some tips:
Do you have another thread running that actually deletes entries?
Do other people have access to the database?
Not sure what HeidiSQL may do. To exclude that possibility maybe use MySQL Workbench instead.
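One quick way to rule out a display artifact is to compare the exact count with the estimate GUI tools usually show for InnoDB tables (which comes from information_schema and is only approximate). A small sketch, assuming a hypothetical schema forum and table thread_posts:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RowCountCheck {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/forum", "user", "password");
             Statement st = con.createStatement()) {
            // Exact count: scans the table/index, can be slow on large tables.
            try (ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM thread_posts")) {
                rs.next();
                System.out.println("Exact row count: " + rs.getLong(1));
            }
            // Approximate count: what GUI tools often display for InnoDB tables.
            try (ResultSet rs = st.executeQuery(
                    "SELECT TABLE_ROWS FROM information_schema.TABLES " +
                    "WHERE TABLE_SCHEMA = 'forum' AND TABLE_NAME = 'thread_posts'")) {
                rs.next();
                System.out.println("Estimated row count: " + rs.getLong(1));
            }
        }
    }
}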
Yeah now that I run a COUNT(*) query against my tables I see that all my rows are in fact there.
Most likely the heidiSQL summary page is just a very rough estimate.
Thanks for the suggestion to use Workbench, Pete. I will try it and see if it is better than Heidi, as Heidi is freezing up on me on a regular basis.

Get inner SELECT result as ResultSet from INSERT INTO SELECT statement in mySQL Java

I have 2 DBs, Database A and Database B.
What I want to achieve:
1. Build records from Database A and insert them into Database B
2. Process those records in my Java app
What I'm currently doing:
I use two separate queries:
For (1) I use INSERT INTO ... SELECT ...
For (2) I perform another SELECT.
My solution works but it isn't optimal since I'm getting the records from Database A twice (instead of just one time).
Is there a way to execute the INSERT INTO ... SELECT ... and get the inner select result as a ResultSet?
I know I can perform only a SELECT and then insert the records in a batch, but that's a bit cumbersome and I want to find out if there's a cleaner solution.
Your "cleaner" solution looks more cumbersome than a simple read-and-write operation. Since you have to process the data that ends up in Database B anyway, simply do this:
Read the data from A into your app
Process the data
Write the data to B from your app
Then you have a single read and a single write, and it stays simple.
You cannot get the result of the INSERT INTO as a ResultSet, because it is an INSERT statement.
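A minimal sketch of that read-process-write flow (the connection details, the records table, and the trim() "processing" step are all illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadProcessWrite {
    public static void main(String[] args) throws Exception {
        try (Connection a = DriverManager.getConnection("jdbc:mysql://hostA:3306/dba", "user", "pass");
             Connection b = DriverManager.getConnection("jdbc:mysql://hostB:3306/dbb", "user", "pass")) {
            b.setAutoCommit(false);
            try (Statement read = a.createStatement();
                 ResultSet rs = read.executeQuery("SELECT id, value FROM records");
                 PreparedStatement write = b.prepareStatement(
                         "INSERT INTO records (id, value) VALUES (?, ?)")) {
                while (rs.next()) {
                    int id = rs.getInt("id");
                    String value = rs.getString("value").trim(); // the "process" step
                    write.setInt(1, id);
                    write.setString(2, value);
                    write.addBatch(); // batch the writes to avoid a round trip per row
                }
                write.executeBatch();
            }
            b.commit();
        }
    }
}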
Sadly, I do not think that this is possible. What you are trying to achieve are two distinct operations, i.e. an INSERT and a SELECT. However you cut it, you are still going to have to do at least one INSERT and one SELECT.
If both databases are on the same MySQL server, you can use a cross-database INSERT ... SELECT (both tables must have the same field names):
INSERT INTO database2.your_table (field1, field2, field3)
SELECT field1, field2, field3 FROM database1.your_table;

Storing result set for later fetch

I have some queries that run for quite a long time (20-30 minutes). If a lot of them are started simultaneously, the connection pool is drained quickly.
Is it possible to wrap the long-running query in a statement (procedure) that will store the result of a generic query into a temp table, terminating the connection, and fetching (polling) the results later on demand?
EDIT: the queries and data structures are optimized, and tips like 'check your indices and execution plan' don't work for me. I'm looking for a way to store [maybe a] byte representation of a generic result set, for later retrieval.
First of all, 20-30 minutes is an extremely long time for a query - are you sure you aren't missing any indexes for the query? Do check your execution plan - you could get a huge performance gain from a well-placed index.
In MySQL, you could do
INSERT INTO `cached_result_table` (
SELECT your_query_here
)
(of course, cached_result_table needs to have the exact same column structure as your SELECT returns, otherwise you'll get an error).
Then, you could query these cached results (instead of the original tables), and only run the above query from time to time - to update the cached_result_table.
Of course, the query will need to run at least once initially, which will take the 20-30 minutes you mentioned. I suggest pre-populating the cached table before the data are requested, and keeping some locking mechanism to prevent the update query from running several times simultaneously. Pseudocode:
init:
    insert select your_big_query

work:
    if your_big_query cached table is empty or nearing expiration:
        refresh in the background:
            check flag to see if there's another "refresh" process running
            if yes
                end // don't run two your_big_queries at the same time
            else
                set flag
                re-run your_big_query, save to cached table
                clear flag
    serve data to clients always from cached table
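A rough Java sketch of that refresh-with-flag idea, assuming a hypothetical refresh_flags table used as the lock and the cached_result_table from above (all names and SQL here are illustrative):

import java.sql.Connection;
import java.sql.Statement;

public class CacheRefresher {
    /** Re-runs the big query into the cached table, unless another refresh is already running. */
    public static void refreshIfNeeded(Connection con) throws Exception {
        con.setAutoCommit(false);
        try (Statement st = con.createStatement()) {
            // Take the "flag": only one session succeeds in flipping running from 0 to 1.
            int claimed = st.executeUpdate(
                "UPDATE refresh_flags SET running = 1 WHERE name = 'big_query' AND running = 0");
            con.commit();
            if (claimed == 0) {
                return; // another refresh is in progress -- don't start a second one
            }
            try {
                st.executeUpdate("DELETE FROM cached_result_table");
                st.executeUpdate("INSERT INTO cached_result_table (SELECT * FROM source_view)");
                con.commit();
            } finally {
                st.executeUpdate("UPDATE refresh_flags SET running = 0 WHERE name = 'big_query'");
                con.commit(); // always clear the flag, even if the refresh failed
            }
        }
    }
}

Clients then read only from cached_result_table, which stays available while the refresh runs elsewhere.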
An easy way to do that in Oracle is "CREATE TABLE sometempname AS SELECT...". That will create a new table using the result columns from the select.
Not quite sure what you are requesting.
Currently you have 50 database sessions. Say you get 40 running long-running queries, that leaves 10 to service the rest.
What you seem to be asking for is to run those 40 queries asynchronously (in the background) so they don't clog up the connection pool of 50. The question is: do you want those 40 running concurrently with (potentially) another 50 queries from the connection pool, or do you want them queued up in some way?
Queuing can be done (look into DBMS_SCHEDULER and DBMS_JOB). But you will need to deliver those results into some other table and know how to deliver that result set. The old fashioned way is simply to generate reports on request that get delivered to a directory on a shared drive or by email. Could be PDF or CSV or Excel.
If you want the 40 running concurrently alongside the 50 'connection pool' settings, then you may be best off setting up a separate connection pool for the long-running queries.
You can look into Resource Manager for terminating calls that take too long or too many resources. That way the quickie pool can't get bogged down in long running requests.
The most generic approach in Oracle I can think of is creating a stored procedure that will convert a result set into XML and store it as a CLOB/XMLType in a table holding the results of your long-running queries.
You can find more on generating XML from a generic result set here.
select dbms_xmlgen.getxml('select employee_id, first_name, last_name, phone_number
                           from employees where rownum < 6') xml
from dual;
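A small JDBC sketch of that approach, storing the generated XML into a hypothetical query_results table (the table, key, and connection details are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class XmlResultArchiver {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521/orcl", "user", "password")) {
            String longQuery = "select employee_id, first_name, last_name, phone_number "
                             + "from employees where rownum < 6";
            // Let the database serialize the generic result set to XML.
            try (PreparedStatement ps = con.prepareStatement(
                    "select dbms_xmlgen.getxml(?) from dual")) {
                ps.setString(1, longQuery);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    String xml = rs.getString(1);
                    // Store the XML so a client can poll for it later without holding a connection open.
                    try (PreparedStatement ins = con.prepareStatement(
                            "insert into query_results (result_id, result_xml) values (?, ?)")) {
                        ins.setString(1, "employees_snapshot");
                        ins.setString(2, xml);
                        ins.executeUpdate();
                    }
                }
            }
        }
    }
}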

Insert a lot of data into database in very small inserts

So I have a database where a lot of data is being inserted from a Java application. Usually I insert into table1 and get the last id, then insert into table2 and get the last id from there, and finally insert into table3 and get that id as well and work with it within the application. I insert around 1000-2000 rows of data every 10-15 minutes.
Using a lot of small inserts and selects on a production web server is not really good, because it sometimes bogs down the server.
My question is: is there a way to insert data into table1, table2 and table3 without using such a huge number of selects and inserts? Is there some SQL-fu technique I'm missing?
Since you're probably relying on auto_increment primary keys, you have to do the inserts one at a time, at least for table1 and table2. Because MySQL won't give you more than the very last key generated.
You should never have to select. You can get the last inserted id from the Statement using the getGeneratedKeys() method. See an example showing this in the MySQL manual for the Connector/J:
http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-basic.html#connector-j-examples-autoincrement-getgeneratedkeys
Other recommendations:
Use multi-row INSERT syntax for table3.
Use ALTER TABLE DISABLE KEYS while you're importing, and re-enable them when you're finished.
Use explicit transactions. I.e. begin a transaction before your data-loading routine, and commit at the end. I'd probably also commit after every 1000 rows of table1.
Use prepared statements.
Unfortunately, you can't use the fastest method for bulk load of data, LOAD DATA INFILE, because that doesn't allow you to get the generated id values per row.
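A condensed sketch of the id-retrieval part (table and column names are made up for illustration): it inserts a parent row, reads the generated key via getGeneratedKeys() instead of a separate SELECT, and then batches the child rows.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class ParentChildInsert {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/app", "user", "password")) {
            con.setAutoCommit(false); // explicit transaction around the whole unit of work

            long table1Id;
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO table1 (name) VALUES (?)", Statement.RETURN_GENERATED_KEYS)) {
                ps.setString(1, "example batch");
                ps.executeUpdate();
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    keys.next();
                    table1Id = keys.getLong(1); // no extra SELECT needed
                }
            }

            // Child rows for table3, sent as one batch instead of many tiny inserts.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO table3 (table1_id, payload) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setLong(1, table1Id);
                    ps.setString(2, "row " + i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            con.commit();
        }
    }
}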
There's a lot to talk about here:
It's likely that network latency is killing you if each of those INSERTs is another network roundtrip. Try batching your requests so they only require a single roundtrip for the entire transaction.
Speaking of transactions, you don't mention them. If all three of those INSERTs need to be a single unit of work you'd better be handling transactions properly. If you don't know how, better research them.
Try caching requests if they're reused a lot. The fastest roundtrip is the one you don't make.
You could redesign your database such that the primary key is not a database-generated, auto-incremented value, but rather a client-generated UUID. Then you could generate all the keys for every record upfront and batch the inserts however you like.
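A brief sketch of that client-generated key idea (the tables and columns are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class UuidKeyInsert {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/app", "user", "password")) {
            con.setAutoCommit(false);
            // Generate the key on the client -- no round trip needed to learn the id.
            String table1Id = UUID.randomUUID().toString();

            try (PreparedStatement p1 = con.prepareStatement(
                        "INSERT INTO table1 (id, name) VALUES (?, ?)");
                 PreparedStatement p2 = con.prepareStatement(
                        "INSERT INTO table2 (id, table1_id, detail) VALUES (?, ?, ?)")) {
                p1.setString(1, table1Id);
                p1.setString(2, "example");
                p1.addBatch();

                p2.setString(1, UUID.randomUUID().toString());
                p2.setString(2, table1Id); // foreign key known upfront, so both inserts can be batched
                p2.setString(3, "child row");
                p2.addBatch();

                p1.executeBatch();
                p2.executeBatch();
            }
            con.commit();
        }
    }
}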
