Insert 2 million records into table from file - java

I have 2 million records in a file, and I'm trying to insert all of them into my table. I can't decide which approach I should use: LOAD DATA INFILE or a Hibernate transaction.
How can I insert all the data as fast as possible?
The file is a plain text file with one value per line. Only one column needs to be inserted; the other columns are generated automatically.

LOAD DATA INFILE is the first choice of MySQL users for bulk loading. But if you want to validate the data, that takes extra effort. You can also use a data integration tool for this; for example, Talend is an open-source data integration tool that can load from a file into a database with a few clicks. It is useful for large data sets, and you can also validate and clean your data with it.
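For reference, a rough JDBC sketch of running LOAD DATA LOCAL INFILE from Java; the file path, table and column names are assumptions, and the MySQL Connector/J URL needs allowLoadLocalInfile=true for LOCAL to work (exception handling omitted):

import java.sql.*;

try (Connection con = DriverManager.getConnection(
         "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true", "user", "pass");
     Statement st = con.createStatement()) {
    // One value per line, loaded straight into the single target column
    int rows = st.executeUpdate(
        "LOAD DATA LOCAL INFILE '/path/to/codes.txt' " +
        "INTO TABLE Code " +
        "LINES TERMINATED BY '\\n' " +
        "(code)");
    System.out.println(rows + " rows loaded");
}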

I decided to use LOAD DATA INFILE, but now there is another problem. When the process finishes I get this warning:
WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper - SQL Warning Code: 1062, SQLState: 23000
' for key 'PRIMARY'
This is my query:
String query = " LOAD DATA LOCAL INFILE :file " +
" IGNORE INTO TABLE Code" +
" (code) " +
" SET point = 0, created = NOW(), activated = 0; ";
And when I check the records in MySQL, the code column has no values:
+------+-------+------+-----------+---------------+---------------------+
| code | point | user | activated | activatedDate | created             |
+------+-------+------+-----------+---------------+---------------------+
|      |     0 | NULL |         0 | NULL          | 2015-10-01 16:35:02 |
|      |     0 | NULL |         0 | NULL          | 2015-10-01 16:35:02 |
+------+-------+------+-----------+---------------+---------------------+
2 rows in set (0.00 sec)
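One thing worth checking (an assumption here, since the input file isn't shown) is the line terminator: if the file was produced on Windows, each line ends in \r\n, and with the default settings the stray \r can corrupt or empty the code value; once the key column is empty, IGNORE silently skips every following duplicate, which would also explain the 1062 duplicate-key warnings. A sketch of the same query with the terminator spelled out:

String query =
    " LOAD DATA LOCAL INFILE :file " +
    " IGNORE INTO TABLE Code " +
    " LINES TERMINATED BY '\\r\\n' " +   // or '\\n' if the file uses Unix line endings
    " (code) " +
    " SET point = 0, created = NOW(), activated = 0 ";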

Related

Updating table in Schema 1 based on values of few columns of a table in Schema 2

I will try to explain the problem to the best of my ability.
I have 2 tables in 2 different schemas, with a few columns in each, and I own only one of the schemas.
What I need to do is update Table A in Schema 1 with a value from one of the fields of Table B in Schema 2.
I need to update only a few rows in this table.
The problem is that by the time Table A is populated, the data in Table B is not ready yet.
I am trying to do this programmatically if possible.
Since they are in different schemas and the update is comparatively smaller than Table A's size, what would be the best way to do this?
SAMPLE DATA
Table A
orderNum | orderNumInternal | validity                | averageSales | type
1000     | 5636             | 2020-06-30 00:00:00.000 | NULL         | valid
Table B
orderNum | orderNumInternal | validity                | averageSales
1000     | 5636             | 2020-06-30 00:00:00.000 | 65
Here I need to update Table A with the averageSales value from Table B whenever the type in Table A is 'valid' and there is a match in Table B on the first 3 columns.
Table A is created overnight, and I don't have control over when the data becomes available in Table B.
Would this not simply be an UPDATE with a JOIN?
UPDATE A
SET averageSales = B.averageSales
FROM Schema1.TableA A
JOIN Schema2.TableB B ON A.orderNum = B.orderNum
WHERE A.averageSales IS NULL; --Unsure if this WHERE is needed
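Since the question mentions doing this programmatically and matching on the first three columns plus the 'valid' type, here is a rough JDBC sketch along those lines; the connection variables are placeholders and the schema/column names are taken from the sample data above (exception handling omitted):

import java.sql.*;

String sql =
    "UPDATE A SET averageSales = B.averageSales " +
    "FROM Schema1.TableA A " +
    "JOIN Schema2.TableB B " +
    "  ON  A.orderNum = B.orderNum " +
    "  AND A.orderNumInternal = B.orderNumInternal " +
    "  AND A.validity = B.validity " +
    "WHERE A.type = 'valid' AND A.averageSales IS NULL";

// jdbcUrl, user and password are assumed to be defined elsewhere
try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
     Statement st = con.createStatement()) {
    int updated = st.executeUpdate(sql);   // rows in Table A that received averageSales
}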

How to format data in column for WHERE clause just before executing SELECT?

I am using Microsoft SQL Server with already stored data.
In one of my tables I can find data like:
+--------+------------+
| id | value |
+--------+------------+
| 1 | 12-34 |
| 2 | 5678 |
| 3 | 1-23-4 |
+--------+------------+
I realized that the VALUE column was not properly formatted when inserted.
What I am trying to achieve is to get id by given value:
SELECT d.id FROM data d WHERE d.value = '1234';
Is there any way to format the data in the column just before the SELECT is executed?
Should I create a new view and modify the column in that view, or maybe use a complicated regex to extract only the digits (with the LIKE comparator)?
P.S. I manage the database in a Jakarta EE project using Hibernate.
P.S.2. I am not able to modify the stored data.
One method is to use replace() before the comparison:
WHERE REPLACE(d.value, '-', '') = '1234'
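Since the question mentions Hibernate in a Jakarta EE project, the same comparison can also be issued as a native query; a minimal sketch, where the EntityManager, table and column names are assumed:

import jakarta.persistence.EntityManager;
import java.util.List;

List<?> ids = entityManager
        .createNativeQuery("SELECT d.id FROM data d WHERE REPLACE(d.value, '-', '') = :val")
        .setParameter("val", "1234")
        .getResultList();   // ids of rows whose value matches '1234' once '-' is stripped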

SQL Dynamic column handling in Table during data load

I need to design a table in Oracle/SQL, and data will be uploaded via a Java/C# application from a CSV with 50 fields (mapped to columns of the table). How should I design the table/DB with the constraints below in mind when importing data from the CSV?
The CSV may have new fields added to the existing 50 fields.
In that case, instead of manually adding a column to the table and then loading the data, how can we design the table for smooth/automatic handling of files with dynamic fields?
EX:
CSV has S_ID, S_NAME, SUBJECT, MARK_VALUE fields in it
+------+---------+-------------+------------+
| S_ID | S_NAME | SUBJECT | MARK_VALUE |
+------+---------+-------------+------------+
| 1 | Stud | SUB_1 | 50 |
| 2 | Stud | SUB_2 | 60 |
| 3 | Stud | SUB_3 | 70 |
+------+---------+-------------+------------+
What if the CSV has a new field "RANK" (and similar additional fields) added to it, and I need to store all the new fields in the table?
Please suggest a DB design for this scenario.
A few approaches come to mind. One would be to keep the metadata (record) information in one table (column name, data type, any constraints) and have another free-form table with a large enough number of columns to hold the data. Use the metadata table while inserting data into the free-form table to maintain data integrity; a sketch of the idea follows.
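A rough sketch of that layout as it might be created from the Java loader; every name, type and column count here is an assumption (exception handling omitted):

import java.sql.*;

// Describes which CSV header maps to which generic column and how to interpret it
String metaDdl =
    "CREATE TABLE csv_field_meta (" +
    "  field_name VARCHAR2(64)  PRIMARY KEY," +   // e.g. S_ID, S_NAME, RANK ...
    "  data_type  VARCHAR2(32)  NOT NULL," +      // how the loader should treat the value
    "  target_col VARCHAR2(10)  NOT NULL)";       // e.g. COL_5 in the data table

// Wide free-form table pre-created with more columns than the CSV will ever need
String dataDdl =
    "CREATE TABLE csv_data (" +
    "  row_id NUMBER PRIMARY KEY," +
    "  col_1 VARCHAR2(4000), col_2 VARCHAR2(4000), col_3 VARCHAR2(4000)" +
    "  /* ... up to some generous limit, e.g. col_100 */ )";

try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
     Statement st = con.createStatement()) {
    st.execute(metaDdl);
    st.execute(dataDdl);
}
// When a new header such as RANK appears, the loader only inserts a row into
// csv_field_meta pointing it at the next free col_n; no ALTER TABLE is needed.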

Get TEXT column value in psql

I have created a simple entity with Hibernate with a @Lob String field. Everything works fine in Java; however, I am not able to check the values directly in the DB with psql or pgAdmin.
Here is the definition from DB:
=> \d+ user_feedback
                     Table "public.user_feedback"
 Column |  Type  | Modifiers | Storage  | Stats target | Description
--------+--------+-----------+----------+--------------+-------------
 id     | bigint | not null  | plain    |              |
 body   | text   |           | extended |              |
Indexes:
    "user_feedback_pkey" PRIMARY KEY, btree (id)
Has OIDs: no
And here is what I get from a select:
=> select * from user_feedback;
id | body
----+-------
34 | 16512
35 | 16513
36 | 16514
(3 rows)
The actual "body" content is for all rows "normal" text, definitely not these numbers.
How to retrieve actual value of body column from psql?
This will store the content of LOB 16512 in the file out.txt:
\lo_export 16512 out.txt
Note that using @Lob is usually not recommended here (database backup issues, ...). See store-strings-of-arbitrary-length-in-postgresql for alternatives.
Hibernate is storing the values as out-of-line objects in the pg_largeobject table, and storing the Object ID for the pg_largeobject entry in your table. See PostgreSQL manual - large objects.
It sounds like you expected inline byte array (bytea) storage instead. If so, you may want to map a byte[] field without a @Lob annotation, rather than a @Lob String. Note that this change will not be backward compatible - you'll have to export your data from the database, then drop the table and re-create it with Hibernate's new definition.
The selection of how to map your data is made by Hibernate, not PostgreSQL.
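A minimal sketch of the two mappings being contrasted; the entity and column names are assumptions, not the asker's actual code:

import javax.persistence.*;   // or jakarta.persistence on newer stacks

@Entity
@Table(name = "user_feedback")
public class UserFeedback {

    @Id
    private Long id;

    // Current mapping: on PostgreSQL a @Lob String is stored in pg_largeobject,
    // so the body column only holds the large-object OID (the numbers seen in psql).
    // @Lob
    // private String body;

    // Alternative suggested above: a plain byte[] (no @Lob) maps to an inline bytea column.
    private byte[] body;
}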
See related:
proper hibernate annotation for byte[]
How to store image into postgres database using hibernate

Solr - Index MySQL Database

Is it possible to index a complete database without mentioning the table names explicitly in data-config.xml? New tables are added every day, and I cannot change data-config.xml every day to add the new tables.
Having table names based on the date smells like there is something wrong in your design. But given this requirement, you can add data to your Solr server without Solr ever knowing there is a DB behind it. You just have to make sure you have a unique ID for each data record in your Solr server with which you can identify the corresponding record in your DB, something like abcd_2011_03_19.uniqueid. You can post the data to Solr from Java with SolrJ, or as plain XML or JSON.
Example:
--------------
| User Input |
--------------
      | post
      V
-----------------------------------
| My Backend (generate unique id) |
-----------------------------------
      | post (sql)         | post (e.g. solrj)
      V                     V
    ------                --------
    | DB |                | solr |
    ------                --------
My ascii skillz are mad :D
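For the "post (e.g. solrj)" leg, a minimal SolrJ sketch; the Solr URL, core name and field names are assumptions (exception handling omitted):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "abcd_2011_03_19.uniqueid");   // same unique id as the DB record
doc.addField("content", "the record's text");     // whatever fields your schema defines
solr.add(doc);
solr.commit();
solr.close();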
