How to programmatically transfer a lot of data between tables? - java

I have two tables: the first holds 14 million rows and the second 1.5 million.
How can I transfer this data into other tables so that it is normalized?
And how do I convert one type to another? For example, I have a field called 'year' whose type is varchar, but I want it to be an integer instead. How do I do that?
I thought about doing this with JDBC in a while loop from Java, but I don't think that would be efficient.
-- 1.5 million rows
CREATE TABLE dbo.directorsmovies
(
movieid INT NULL,
directorid INT NULL,
dname VARCHAR (500) NULL,
addition VARCHAR (1000) NULL
)
-- 14 million rows
CREATE TABLE dbo.movies
(
movieid VARCHAR (20) NULL,
title VARCHAR (400) NULL,
mvyear VARCHAR (100) NULL,
actorid VARCHAR (20) NULL,
actorname VARCHAR (250) NULL,
sex CHAR (1) NULL,
as_character VARCHAR (1500) NULL,
languages VARCHAR (1500) NULL,
genres VARCHAR (100) NULL
)
And these are my new tables:
DROP TABLE actor
CREATE TABLE actor (
id INT PRIMARY KEY IDENTITY,
name VARCHAR(200) NOT NULL,
sex VARCHAR(1) NOT NULL
)
DROP TABLE actor_character
CREATE TABLE actor_character(
id INT PRIMARY KEY IDENTITY,
character VARCHAR(100)
)
DROP TABLE director
CREATE TABLE director(
id INT PRIMARY KEY IDENTITY,
name VARCHAR(200) NOT NULL,
addition VARCHAR(150)
)
DROP TABLE movie
CREATE TABLE movie(
id INT PRIMARY KEY IDENTITY,
title VARCHAR(200) NOT NULL,
year INT
)
DROP TABLE language
CREATE TABLE language(
id INT PRIMARY KEY IDENTITY,
language VARCHAR (100) NOT NULL
)
DROP TABLE genre
CREATE TABLE genre(
id INT PRIMARY KEY IDENTITY,
genre VARCHAR(100) NOT NULL
)
DROP TABLE director_movie
CREATE TABLE director_movie(
idDirector INT,
idMovie INT,
CONSTRAINT fk_director_movie_1 FOREIGN KEY (idDirector) REFERENCES director(id),
CONSTRAINT fk_director_movie_2 FOREIGN KEY (idMovie) REFERENCES movie(id),
CONSTRAINT pk_director_movie PRIMARY KEY(idDirector,idMovie)
)
DROP TABLE genre_movie
CREATE TABLE genre_movie(
idGenre INT,
idMovie INT,
CONSTRAINT fk_genre_movie_1 FOREIGN KEY (idMovie) REFERENCES movie(id),
CONSTRAINT fk_genre_movie_2 FOREIGN KEY (idGenre) REFERENCES genre(id),
CONSTRAINT pk_genre_movie PRIMARY KEY (idMovie, idGenre)
)
DROP TABLE language_movie
CREATE TABLE language_movie(
idLanguage INT,
idMovie INT,
CONSTRAINT fk_language_movie_1 FOREIGN KEY (idLanguage) REFERENCES language(id),
CONSTRAINT fk_language_movie_2 FOREIGN KEY (idMovie) REFERENCES movie(id),
CONSTRAINT pk_language_movie PRIMARY KEY (idLanguage, idMovie)
)
DROP TABLE movie_actor
CREATE TABLE movie_actor(
idMovie INT,
idActor INT,
CONSTRAINT fk_movie_actor_1 FOREIGN KEY (idMovie) REFERENCES movie(id),
CONSTRAINT fk_movie_actor_2 FOREIGN KEY (idActor) REFERENCES actor(id),
CONSTRAINT pk_movie_actor PRIMARY KEY (idMovie,idActor)
)
UPDATE:
I'm using SQL Server 2008.
Sorry, I forgot to mention that these are different databases:
the non-normalized one is called disciplinedb and my normalized one is called imdb.
Best regards,
Valter Henrique.

If both tables are in the same database, then the most efficient transfer is to do it all within the database, preferably by sending a SQL statement to be executed there.
Any movement of data from the database server to somewhere else and then back to the database server is to be avoided unless there is a reason it can only be transformed off-server. If the destination is a different server, then this is much less of an issue.
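A minimal sketch of what "sending a SQL statement to be executed there" could look like from Java, assuming both databases from the question (disciplinedb and imdb) live on the same SQL Server instance so that three-part names resolve. The class and method names are hypothetical, and the digit check on mvyear and the LEFT(title, 200) truncation are assumptions about how to handle dirty data, not something the question specifies.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class ServerSideCopy {

    // One set-based statement: the rows never leave the database server.
    // Assumption: mvyear values that are not 1 to 9 digits become NULL.
    static String movieCopySql() {
        return "INSERT INTO imdb.dbo.movie (title, year) "
             + "SELECT DISTINCT LEFT(m.title, 200), "
             + "CASE WHEN m.mvyear NOT LIKE '%[^0-9]%' AND LEN(m.mvyear) BETWEEN 1 AND 9 "
             + "THEN CAST(m.mvyear AS INT) END "
             + "FROM disciplinedb.dbo.movies AS m "
             + "WHERE m.title IS NOT NULL";
    }

    // Sends the statement over JDBC and returns the number of rows inserted.
    static int copyMovies(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            return stmt.executeUpdate(movieCopySql());
        }
    }
}
```

Java only ships the statement text here; the type conversion and the copy both happen inside SQL Server. Mapping the old movieid values to the new IDENTITY ids, which the junction tables would need, is not covered by this sketch.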

Though my tables were dwarfs compared to yours, I once got over this kind of problem with stored procedures. Below is a simplified (and untested) essence of my script for MySQL, but something similar should work with all major SQL databases.
First you should just add a new integer year column (int_year in example) and then iterate over all rows using the procedure below:
DROP PROCEDURE IF EXISTS move_data;
DELIMITER //
CREATE PROCEDURE move_data()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE orig_id INT DEFAULT 0;
    DECLARE orig_year VARCHAR(100) DEFAULT '';
    DECLARE cur1 CURSOR FOR SELECT id, year FROM table1;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN cur1;
    PREPARE stmt FROM 'UPDATE table1 SET int_year = ? WHERE id = ?';
    read_loop: LOOP
        FETCH cur1 INTO orig_id, orig_year;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- EXECUTE ... USING accepts only user variables, so copy the values first
        SET @year = orig_year;
        SET @id = orig_id;
        EXECUTE stmt USING @year, @id;
    END LOOP;
    CLOSE cur1;
    DEALLOCATE PREPARE stmt;
END //
DELIMITER ;
And to start the procedure, just CALL move_data().
The above SQL relies on two main ideas to speed things up:
Use a CURSOR to iterate over a large table
Use a PREPARED statement to quickly execute a command that is known in advance
PS. In my case this sped things up from ages to seconds, though in your case it may still take a considerable amount of time. So it is probably best to run it from the command line rather than from a web interface (e.g. phpMyAdmin).

I recently did this for ~150 GB of data. I used a pair of merge statements for each table. The first merge statement said "if it's not in the destination table, copy it there" and the second said "if it's in the destination table, delete it from the source". I put both in a while loop and processed only 10000 rows per operation. Keeping it on the server (and not transferring it through a client) is going to be a huge boon for performance. Give it a shot!
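The batching idea above can be sketched from Java as well. This is only an illustration under assumptions: it uses INSERT ... WHERE NOT EXISTS with TOP (10000) rather than the pair of MERGE statements described, it does not delete anything from the source, and the class, method and table mapping (movies.actorname to actor.name) are hypothetical choices based on the schema in the question.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.IntSupplier;

final class BatchedCopy {

    // Repeats the batch until it affects no rows; returns the total row count.
    static long runUntilEmpty(IntSupplier batch) {
        long total = 0;
        int affected;
        while ((affected = batch.getAsInt()) > 0) {
            total += affected;
        }
        return total;
    }

    // Hypothetical use: copy distinct actors 10000 rows per statement.
    // Assumes every actorname fits into actor.name VARCHAR(200).
    static long copyActors(Connection conn) throws SQLException {
        String sql =
            "INSERT INTO imdb.dbo.actor (name, sex) "
          + "SELECT DISTINCT TOP (10000) m.actorname, m.sex "
          + "FROM disciplinedb.dbo.movies AS m "
          + "WHERE m.actorname IS NOT NULL AND m.sex IS NOT NULL "
          + "AND NOT EXISTS (SELECT 1 FROM imdb.dbo.actor AS a "
          + "WHERE a.name = m.actorname AND a.sex = m.sex)";
        try (Statement stmt = conn.createStatement()) {
            return runUntilEmpty(() -> {
                try {
                    return stmt.executeUpdate(sql);
                } catch (SQLException e) {
                    // Wrapped so the loop helper stays free of checked exceptions.
                    throw new IllegalStateException("batch failed", e);
                }
            });
        }
    }
}
```

With JDBC's default auto-commit mode each batch commits on its own, which keeps individual transactions small. The data itself still stays on the server; Java only drives the loop.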

Related

How to deal with metadata in a Many-To-Many relationship with Spring Data JDBC?

I have a project, where bot_users play in game_tables. So I have a join table. I also store the buy_in (points available to the player at the table) in that join table. My SQL:
CREATE TABLE bot_users (
user_id bigint PRIMARY KEY,
free_points bigint CHECK (free_points >= 0),
frozen_points bigint CHECK (frozen_points >= 0)
);
CREATE TABLE game_tables (
channel_id bigint PRIMARY KEY,
owner bigint REFERENCES bot_users(user_id),
in_game boolean NOT NULL
);
CREATE TABLE game_tables_participants (
game_table_id bigint REFERENCES game_tables(channel_id),
participant_id bigint REFERENCES bot_users(user_id),
buy_in bigint NOT NULL,
PRIMARY KEY (game_table_id, participant_id)
);
My question now is: how do I represent that buy_in metadata in entities in Java? If buy_in didn't exist, I would simply have a ParticipantRef of which I would have a Set in the GameTable entity, and then have methods that work on IDs there. But I want to have buy_in available in the code too, so should I create a ParticipantRef-like entity that contains the buy_in? If yes, then how would persisting it work?
Your proposed solution is pretty much the way to do this. Instead of a ParticipantRef with just the userId of the BotUser, it also gets a buyIn attribute.
Since you have an object reference from GameTable via a Set to the ParticipantRef, it is considered part of the GameTable aggregate and gets persisted whenever you persist a GameTable instance.
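A rough sketch of how such an aggregate could be shaped, written as plain Java so it compiles without Spring on the classpath. The field and method names (join, buyInOf) are hypothetical, the owner and in_game columns are left out, and the Spring Data JDBC annotations that would normally be present are only mentioned in comments as assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// One row of game_tables_participants. With Spring Data JDBC this class would
// typically carry @Table("game_tables_participants") (assumption).
final class ParticipantRef {
    final long participantId; // column participant_id
    final long buyIn;         // column buy_in

    ParticipantRef(long participantId, long buyIn) {
        this.participantId = participantId;
        this.buyIn = buyIn;
    }

    // Within one game table a participant appears once, matching the composite key.
    @Override public boolean equals(Object o) {
        return o instanceof ParticipantRef
            && ((ParticipantRef) o).participantId == participantId;
    }

    @Override public int hashCode() {
        return Long.hashCode(participantId);
    }
}

// Aggregate root for game_tables; channelId would be the @Id field.
final class GameTable {
    final long channelId;
    // Would typically be annotated @MappedCollection(idColumn = "game_table_id").
    final Set<ParticipantRef> participants = new HashSet<>();

    GameTable(long channelId) {
        this.channelId = channelId;
    }

    // Adds the participant, or replaces the buy-in if already seated.
    void join(long userId, long buyIn) {
        ParticipantRef ref = new ParticipantRef(userId, buyIn);
        participants.remove(ref);
        participants.add(ref);
    }

    // Returns the buy-in, or -1 if the user is not at this table.
    long buyInOf(long userId) {
        for (ParticipantRef ref : participants) {
            if (ref.participantId == userId) {
                return ref.buyIn;
            }
        }
        return -1;
    }
}
```

Because the set lives inside GameTable, saving the GameTable through its repository is what would persist the buy-in rows; there is no separate repository for ParticipantRef in this design.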

Creating a table in mysql with java named by the user?

The question is: is there a way to create a new table whose name the user enters in a text field? I know it's a huge injection risk, but I really need new tables, and it will only run offline. I tried
String newtable = jTextField1.getText();
PreparedStatement create = conn.prepareStatement("CREATE TABLE IF NOT EXISTS '"+newtable+"'(ID INTEGER NOT NULL AUTO_INCREMENT, IDapol INTEGER, ΗΜΕΡΟΜΗΝΙΑ DATE, ΕΣΟΔΑ DOUBLE, PRIMARY KEY(ID), CONSTRAINT IDapol FOREIGN KEY(IDapol) REFERENCES apol(IDapol)");
but i get an error saying: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near "1718"(ID INTEGER NOT NULL AUTO_INCREMENT, IDapol, INTEGER, ΗΜΕΡΟΜΗΝΙΑ' at line 1
1718 is the value of jTextField1.
Any help would be appreciated. Thanks
As per here: https://dev.mysql.com/doc/refman/5.7/en/identifiers.html , "Identifiers may begin with a digit but unless quoted may not consist solely of digits."
Also, currently your code is wide open for an SQL injection attack.
Table names are never enclosed in single quotes ('). Either use backticks (`) or simply nothing:
"CREATE TABLE IF NOT EXISTS " + newtable + " (...)"
Your table's name should also start with a letter!
CREATE TABLE IF NOT EXISTS `1718`
(ID INTEGER NOT NULL AUTO_INCREMENT,
IDapol INTEGER,
ΗΜΕΡΟΜΗΝΙΑ DATE,
ΕΣΟΔΑ DOUBLE,
PRIMARY KEY(ID),
CONSTRAINT IDapol
FOREIGN KEY(IDapol) REFERENCES apol(IDapol));
Put the table name in backticks (``).
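Since identifiers cannot be bound as PreparedStatement parameters, one way to combine the backtick advice with some protection against injection is to whitelist the name before concatenating it. The following is a minimal sketch; the class and method names are hypothetical, and the allowed character set is an assumption (MySQL itself permits more).

```java
import java.util.regex.Pattern;

final class TableNames {

    // Assumption: only ASCII letters, digits and underscores are accepted,
    // up to MySQL's 64-character limit for table names.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_]{1,64}");

    // Returns the name wrapped in backticks, or throws if it is not whitelisted.
    static String quoteTableName(String name) {
        if (name == null || !SAFE.matcher(name).matches()) {
            throw new IllegalArgumentException("Illegal table name: " + name);
        }
        return "`" + name + "`";
    }

    // Builds a CREATE TABLE statement for a user-supplied name.
    // columnDefs is trusted text written by the programmer, not user input.
    static String createTableSql(String name, String columnDefs) {
        return "CREATE TABLE IF NOT EXISTS " + quoteTableName(name)
             + " (" + columnDefs + ")";
    }
}
```

With the value from the question, createTableSql("1718", "ID INTEGER NOT NULL AUTO_INCREMENT, PRIMARY KEY(ID)") yields a statement whose table name is `1718` in backticks, while a name containing a backtick, quote or space is rejected before any SQL is sent.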

MySQL Get value inserted by trigger in Java

Using MySQL, I have the following SQL Table definition:
CREATE TABLE books (
author INT,
book INT,
name VARCHAR(128),
PRIMARY KEY(author, book)
);
What I want is that I have an Id for author that I set manually and an Id for book that is incremented for each author id. Therefore I created a trigger like so:
CREATE TRIGGER trBooks
BEFORE INSERT ON books
FOR EACH ROW SET NEW.book = (
SELECT COALESCE(MAX(book), -1) + 1 FROM books
WHERE author = NEW.author
);
This works fine for me. But now I need to know the book id that was set for my inserted entry that I inserted in Java. Something like the Insert with Output as in MSSQL or a Statement.executeQuery("INSERT ..."). The solution has to be thread safe, so a separate INSERT and SELECT is no good solution, since there might have been another INSERT in the meantime.
Thanks for your help!
Your data model just doesn't make sense. You have two entities, "books" and "authors". These should each be represented as a table. Because a book can have multiple authors and an author can write multiple books, you want a junction table.
It looks like this:
CREATE TABLE Books (
BookId INT auto_increment primary key,
Title VARCHAR(255)
);
CREATE TABLE Authors (
AuthorId INT auto_increment primary key,
Name VARCHAR(255)
);
CREATE TABLE BookAuthors (
BookAuthorId INT auto_increment primary key,
AuthorId INT,
BookId INT,
CONSTRAINT fk_BookAuthor_BookId FOREIGN KEY (BookId) REFERENCES Books(BookId),
CONSTRAINT fk_BookAuthor_AuthorId FOREIGN KEY (AuthorId) REFERENCES Authors(AuthorId),
UNIQUE (AuthorId, BookId)
);
As for your question about inserts: you don't need a trigger to set auto-incremented ids. You can use LAST_INSERT_ID() to fetch the most recently inserted value.

MYSQL update with inner join performance

The join is done on the primary key column of both these tables.
I am unsure whether I should fire a SELECT query before the update, or whether the query below is a good alternative in terms of performance.
order item table
order_item_id
order_id
quantity
unit_price
shipping_price
business_id
workflow_id
delivery_id
item_id
Orders table
billing_address_id
shipping_address_id
payment_mode
total_price
shipping_price
customer_id
order_id
Following is the query I fire from my Java service (using jdbc) :
UPDATE order_items t1
INNER
JOIN Orders t2
ON t2.order_id = t1.order_id
SET t1.workflow_id = ?
WHERE t1.order_item_id = ?
and t2.order_id = ?
and t2.customer_id = ?
and t1.delivery_id = ?
UPDATE : Adding show create table order_items
CREATE TABLE `order_items` (
`order_item_id` int(20) NOT NULL AUTO_INCREMENT,
`quantity` int(10) unsigned NOT NULL,
`unit_price` int(10) unsigned NOT NULL,
`shipping_price` int(10) unsigned NOT NULL,
`pickup_date` datetime DEFAULT NULL,
`create_TS` datetime DEFAULT CURRENT_TIMESTAMP,
`update_TS` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`business_id` int(10) NOT NULL,
`order_id` int(11) NOT NULL,
`item_id` int(10) unsigned NOT NULL,
`delivery_id` int(11) NOT NULL,
`workflow_id` int(11) DEFAULT NULL,
PRIMARY KEY (`order_item_id`),
KEY `fk_business_id` (`business_id`),
KEY `fk_order_id` (`order_id`),
KEY `fk_item_id` (`item_id`),
KEY `fk_delivery_id` (`delivery_id`),
CONSTRAINT `fk_business_id` FOREIGN KEY (`business_id`) REFERENCES `business` (`MID`),
CONSTRAINT `fk_delivery_id` FOREIGN KEY (`delivery_id`) REFERENCES `delivery_mode` (`delivery_id`),
CONSTRAINT `fk_item_id` FOREIGN KEY (`item_id`) REFERENCES `item_business` (`item_id`),
CONSTRAINT `fk_order_id` FOREIGN KEY (`order_id`) REFERENCES `Orders` (`order_id`)
)
Talking in theory
You should reduce the data to the minimum set before you do the join, so the join is performed only on the rows you need. That holds for an UPDATE as well, which internally is a special SELECT followed by "write this data to the selected rows".
Talking in practice
One of the jobs of any DBMS is to perform aggressive optimization using relational algebra and other techniques, so most of the time the effort you spend optimizing your query is futile because the DBMS will perform the same optimization anyway.
So what
I would try to keep the table as slim as possible without getting too crazy. I ran an update query on about 100k rows on an AWS db2.micro machine and it took about 4 seconds, so in my opinion: try it and see whether you get the result you need.
tl;dr just try it and see if the speed increases

Is it too much overhead if ResultSet could be used to getBinaryStream() but wasn't to?

In mysql I have table
CREATE TABLE `articles_attachments` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(200) NOT NULL ,
`size` BIGINT NOT NULL ,
`article_id` BIGINT NOT NULL ,
`contents` LONGBLOB NOT NULL ,
PRIMARY KEY (`id`) ,
UNIQUE INDEX `id_UNIQUE` (`id` ASC) ,
UNIQUE INDEX `unique_file` (`article_id` ASC, `name` ASC),
INDEX `fk_article` (`article_id` ASC)
) ENGINE = InnoDB DEFAULT CHARACTER SET = utf8 COLLATE = utf8_general_ci;
In application code I often need to just list attachments without fetching their contents. So when I retrieve rows from that table I don't want resources wasted on serving the "contents" field.
The tricky part is that I use a custom library which does "SELECT * FROM articles_attachments", so the query returns all fields.
What I can easily do is override the RowMapper (from Spring JDBC) and simply not map the "contents" field (i.e. not call ResultSet.getBinaryStream).
Question: will that help to avoid wasting resources? I don't want 100 streams to be opened when I retrieve 100 rows of the attachments table.
I ran a couple of tests and it turns out the answer is: "Yes, you waste resources (specifically bandwidth) if the query result contains fields of a BLOB type, even if you don't call ResultSet.getBinaryStream".
I tested it with:
MySQL 5.6.20 & MariaDB 10.0.13
mysql-connector-java-5.1.19-bin.jar
HikariCP-java6-2.2.5.jar
