Sorry if my question is not specific or if it has been answered before. I tried looking for it and for a better way to ask but this is the most accurate way.
I have developed a program in Java in which I insert a new row into my database in the following way:
INSERT INTO table_name VALUES (?,?,?)
The thing is that I have this query in many parts of the program, and now I have decided to add a fourth column to my table. Do I have to update EVERY SINGLE query in the program with a new question mark? If I don't, it crashes.
What is the best way to proceed in these cases?
Yes.
You need to add an extra ? (parameter placeholder) because you are using an implicit INSERT statement. That means you didn't specify the column names of the table into which the values will be inserted.
INSERT INTO table_name VALUES (?,?,?)
-- the server assumes that you are inserting values for all
-- columns in your table;
-- if you fail to supply a value for one column, an exception will be thrown
The next time you create an INSERT statement, make sure that you specify the column names in it, so that when you alter the table by adding an extra column you won't have to update all your placeholders.
INSERT INTO table_name (Col1, col2, col3) VALUES (?,?,?)
-- the server knows that you are inserting values for specific columns
Do I have to update EVERY SINGLE query with a new question mark in the program?
Probably. What you should do, while you're updating every single one of those queries, is to encapsulate them into an object, probably using a Data Source pattern such as a Table Data Gateway or a Row Data Gateway. That way you Don't Repeat Yourself and the next time you update the table, you only have one place to update the query.
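As a rough illustration of that idea, here is a minimal Java sketch of a gateway that owns the column list and the INSERT text. The class name TableNameGateway and the column names col1, col2, col3 are hypothetical; only table_name comes from the question. It is a sketch under those assumptions, not a full Table Data Gateway.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical gateway: the only place in the program that knows the column list.
final class TableNameGateway {
    // Hypothetical column names; a new column would be added here and nowhere else.
    static final List<String> COLUMNS = List.of("col1", "col2", "col3");

    // Builds an INSERT with explicit column names and one placeholder per column.
    static String insertSql() {
        StringJoiner names = new StringJoiner(", ", "(", ")");
        StringJoiner marks = new StringJoiner(", ", "(", ")");
        for (String column : COLUMNS) {
            names.add(column);
            marks.add("?");
        }
        return "INSERT INTO table_name " + names + " VALUES " + marks;
    }

    // Inserts one row; callers pass values in the same order as COLUMNS.
    static int insert(Connection connection, Object... values) throws SQLException {
        if (values.length != COLUMNS.size()) {
            throw new IllegalArgumentException("expected " + COLUMNS.size() + " values");
        }
        try (PreparedStatement ps = connection.prepareStatement(insertSql())) {
            for (int i = 0; i < values.length; i++) {
                ps.setObject(i + 1, values[i]); // JDBC parameters are 1-based
            }
            return ps.executeUpdate();
        }
    }
}
```

The rest of the program would then call TableNameGateway.insert(connection, a, b, c) instead of repeating the SQL text, so a schema change touches one class.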
Because of the syntax you've used, you might run into some issues. I'm referring to the lack of column names. Your INSERT queries will start failing as soon as you change your table structure.
If you had used the following syntax:
INSERT INTO table_name (C1, C2, C3) VALUES (?,?,?)
then, assuming your new column has a proper default value, it would have worked fine.
I'm a bit lost and don't really know how to handle this one.
Consider that I have two DB tables in Talend. First,
a table invoices_only, which has the fields invoiceNummer and authors, like this
Then, a table invoices_table with the fields invoiceNummer, article, quantity and price. One invoice can have many articles, for example
Through a tMap I want to obtain a table invoice_table_result with two new columns: one for the article position and another for the total price. For the position I know that I can use something like the Numeric.sequence("s1",1,1) function, but I don't know how to restart my counter when a new invoice number is found. The total price is, of course, just a basic multiplication.
So my result should be something like this
Here is a draft of my Talend job. I'm doing a lookup on invoiceNummer between the tables invoices_only and invoices_table.
Any advice? Thanks.
A trick I use is to do the sequence like this:
Numeric.sequence("s" + row.InvoiceNummer, 1, 1)
This way, the sequence gets incremented while you're still on the same InvoiceNummer, and a new one is started whenever a new InvoiceNummer is found.
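To make the behaviour concrete, here is a small plain-Java sketch that mimics the idea of one counter per sequence name. It is not Talend's implementation of Numeric.sequence; the class PerKeySequence is a hypothetical stand-in written only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a named-sequence helper: one independent counter per name.
final class PerKeySequence {
    private final Map<String, Integer> nextValues = new HashMap<>();
    private final int start;
    private final int step;

    PerKeySequence(int start, int step) {
        this.start = start;
        this.step = step;
    }

    // Returns the current value for this name, then advances that name's counter.
    int next(String name) {
        int value = nextValues.getOrDefault(name, start);
        nextValues.put(name, value + step);
        return value;
    }
}
```

Calling next("s" + invoiceNummer) for each row gives 1, 2, 3, ... within one invoice and starts again at 1 the first time a different invoice number appears, because that name has no counter yet.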
There are two ways to achieve this:
1. tJavaFlex
2. SQL
tJavaFlex
You can compare the current row's key with the previous row's key and reset the sequence value using the function below:
if (/* current key differs from the previous key */) {
Numeric.resetSequence(seqName, startValue);
}
SQL
Once the data is loaded into the tables, create a post job and use an update query to update the records. You have to select the records and compute the rank of each row, then perform the update on top of that select.
select invoicenumber, row_number() over (partition by invoicenumber order by invoicenumber) from table_name; -- add a WHERE clause with conditions, if any
Update statements vary between databases. Please say which database you are using so that I can provide the update query.
I would recommend doing this through SQL.
I am implementing application specific data import feature from one database to another.
I have a CSV file containing say 10000 rows. These rows need to be inserted/updated into database.
I am using mysql database and inserting from Java.
It may be the case that some of the rows are already present in the database, which means they need to be updated. If they are not present in the database, they need to be inserted.
One possible solution is to read the file line by line, check for the entry in the database and build insert/update queries accordingly. But this process may take a long time to create the update/insert queries and execute them in the database. Sometimes my CSV file may have millions of records.
Is there any other faster way to achieve this feature?
I don't know how you determine "is already present", but if it's any kind of database-level constraint (probably a primary key or unique index?) you can make use of the REPLACE INTO statement. It inserts a new row, and if that row would conflict with an existing row on the primary key or a unique index, the existing row is deleted first and the new one takes its place.
It works just like INSERT basically:
REPLACE INTO table_name ( id, field1, field2 )
VALUES ( 1, 'value1', 'value2' )
If a row with ID 1 exists, it is replaced by a row with these values (columns you don't list fall back to their defaults); otherwise the row is created.
Given that you're using MySQL, you could use the INSERT ... ON DUPLICATE KEY UPDATE ... statement, which functions similarly to the SQL standard MERGE statement. See the MySQL documentation for INSERT ... ON DUPLICATE KEY UPDATE and the Wikipedia article on SQL MERGE for details. The statement would look something like
INSERT INTO MY_TABLE
(PRIMARY_KEY_COL, COL2, COL3, COL4)
VALUES
(1, 2, 3, 4)
ON DUPLICATE KEY
UPDATE COL2 = 2,
COL3 = 3,
COL4 = 4
In this example I'm assuming that PRIMARY_KEY_COL is a primary or unique key on MY_TABLE. If the INSERT statement would fail due to a duplicate value on the primary or unique key, then the UPDATE clause is executed. Also note (on the MySQL doc page) that there are some gotchas associated with auto-increment columns on an InnoDB table.
Share and enjoy.
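For a large CSV file, the INSERT ... ON DUPLICATE KEY UPDATE statement above could be sent from Java in JDBC batches rather than one row at a time. The following is only a sketch: the class name CsvUpsert, the int[] row shape and the batchSize parameter are assumptions made for illustration, and the table and column names are the placeholders from the example above. It uses MySQL's VALUES(col) form in the UPDATE clause so that each row needs only four parameters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper that upserts already-parsed CSV rows in batches.
final class CsvUpsert {
    // VALUES(col) refers to the value the INSERT part tried to write for that column.
    static final String UPSERT_SQL =
        "INSERT INTO MY_TABLE (PRIMARY_KEY_COL, COL2, COL3, COL4) "
      + "VALUES (?, ?, ?, ?) "
      + "ON DUPLICATE KEY UPDATE "
      + "COL2 = VALUES(COL2), COL3 = VALUES(COL3), COL4 = VALUES(COL4)";

    // Each row is assumed to hold four ints, in the column order used above.
    static void upsertAll(Connection connection, List<int[]> rows, int batchSize)
            throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(UPSERT_SQL)) {
            int pending = 0;
            for (int[] row : rows) {
                for (int i = 0; i < 4; i++) {
                    ps.setInt(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // send a full batch to the server
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the final partial batch
            }
        }
    }
}
```

Wrapping the whole load in one transaction (connection.setAutoCommit(false) followed by commit()) usually reduces overhead further, though the best batch size depends on the data and the server.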
Do you need to do this often or just once in a while?
I need to load CSV files into a database for analysis from time to time, so I created an SSIS solution with a Data Flow task that loads the CSV file into a table on the SQL Server.
For more information, see this blog post:
http://blog.sqlauthority.com/2011/05/12/sql-server-import-csv-file-into-database-table-using-ssis/
Add a stored procedure in SQL for inserting. In the stored procedure use a try catch block to do the insert. If the insert fails do an update. Then you can simply call this method from your program.
Alternatively:
UPDATE Table1 SET (...) WHERE Column1='SomeValue'
IF @@ROWCOUNT=0
INSERT INTO Table1 VALUES (...)
I have a database with three tables: stud_first, stud_second and stud_audit. Both stud_first and stud_second have the same column names, which are
name,
stud-id,
age,
class
number_of-course_taken
I want stud_second to always receive any data inserted into stud_first, and at the same time stud_audit should keep a record of the data copied, i.e. a log of the names of students and the time they were copied from stud_first to stud_second or deleted. The columns in stud_audit should look like this
name,
time copied
I want to do this in MySQL alone, or in MySQL combined with Java.
Not a complete answer, but this may be enough to get you started in the right direction...
DELIMITER $$
CREATE TRIGGER stud_first_ar
AFTER INSERT ON stud_first
FOR EACH ROW
BEGIN
INSERT INTO stud_second
(`name`, `stud-id`, `age`,`class`,`number_of-course_taken`)
VALUES
(NEW.`name`,NEW.`stud-id`,NEW.`age`,NEW.`class`,NEW.`number_of-course_taken`);
INSERT INTO stud_audit (`name`, `time copied`)
VALUES (NEW.`name`,UTC_TIMESTAMP());
END$$
DELIMITER ;
You could use NOW() in place of UTC_TIMESTAMP(), if you aren't concerned with timezone issues.
The choice of column names containing dashes and spaces is non-standard... it's allowed, but it's usually easier when you avoid doing that.
I would actually have just one audit table, rather than two separate ones. It could be a copy of the table with additional columns for "action" (identifying whether the change was due to an INSERT, UPDATE or DELETE), "actor" (identifying the process or user that caused the action), and a UTC timestamp.
You may want to consider "audit" triggers for UPDATE and DELETE actions as well, where you have the special "OLD." record available.
Again, not a complete answer, but this may be enough to get you started in the right direction.
When using Spring's JdbcTemplate, I use a row mapper to map the results coming back.
The benefit of this is that there are fewer places where I have to change my code if I change my MySQL schema etc.
Are there any other tips on how to minimize changes in code when adding/removing columns in mysql?
If you are retrieving columns by name (SELECT col1, col2, col3) you will be immune to adding and rearranging of columns. Never use SELECT *.
However, if you are removing columns, you have no choice. In fact, how is this supposed to work? Previously you fetched e.g. the price column and used it in your business layer. Now the column does not exist, so how would you handle this?
But adding columns is safe, unless the new columns are non-nullable and have no default. In that case you will have a problem when adding new records, since the VALUES clause won't include the new columns. Optional columns are fine.
One tip is to not do SELECT *, select on specific columns so in case you add stuff you don't break your code :)
I have a webservice in java that receives a list of information to be inserted or updated in a database. I don't know which one is to insert or update.
Which is the best approach to obtain better performance?
1. Iterate over the list (an object list that includes the table's primary key) and try to insert each entry into the database. If the insert fails, run an update.
2. Try to load the entry from the database. If a row is retrieved, update it; if not, insert the entry.
3. Another option? Tell me about it :)
In the first calls, I believe that most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.
The important thing is to understand the ratio between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this in the database". Then create concrete strategies that (for example):
check for the primary key and, if zero records are found, do the insert, else update
do the update and, if it fails, do the insert
others
And then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible, which may depend on your domain, you can add a heuristic that selects the best strategy based on the input entities in the set.
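A minimal Java sketch of that arrangement might look like the following. All names here (PersistStrategy, CheckThenWrite, UpdateThenInsert, Strategies) are hypothetical, and a java.util.Map stands in for the database table so the example stays self-contained; a real strategy would issue SQL instead.

```java
import java.util.Map;

// Abstract strategy: "persist this in the database" (a Map stands in for the table).
interface PersistStrategy {
    void persist(Map<Integer, String> table, int id, String value);
}

// Strategy 1: look the key up first, then insert or update accordingly.
final class CheckThenWrite implements PersistStrategy {
    @Override
    public void persist(Map<Integer, String> table, int id, String value) {
        if (table.containsKey(id)) {
            table.replace(id, value); // row exists: update
        } else {
            table.put(id, value);     // no row: insert
        }
    }
}

// Strategy 2: attempt the update and fall back to an insert if nothing was updated.
final class UpdateThenInsert implements PersistStrategy {
    @Override
    public void persist(Map<Integer, String> table, int id, String value) {
        // Map.replace returns null when there was no existing entry ("0 rows updated").
        if (table.replace(id, value) == null) {
            table.put(id, value);
        }
    }
}

// Loads the strategy whose class name was read from configuration.
final class Strategies {
    static PersistStrategy byClassName(String className)
            throws ReflectiveOperationException {
        return (PersistStrategy) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

The calling code depends only on PersistStrategy, so switching between the two approaches is a configuration change rather than a code change.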
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    -- Optimistic path: try the insert first
    INSERT INTO
        my_table (id, col1)
    SELECT
        _id, _col1;
EXCEPTION WHEN unique_violation THEN
    -- The key already exists, so update the existing row instead
    UPDATE
        my_table
    SET
        col1 = _col1
    WHERE
        id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
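As one possible shape for those four steps, here is a hedged JDBC sketch. The SQL is MySQL-flavoured and the names master, staging, id and col1 are hypothetical; other databases will need different statements for the temporary table and for the joined update.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical set-based merge through a temporary staging table (MySQL-style SQL).
final class StagingMerge {
    static final String CREATE_STAGING = "CREATE TEMPORARY TABLE staging LIKE master";
    static final String LOAD_STAGING = "INSERT INTO staging (id, col1) VALUES (?, ?)";
    static final String UPDATE_EXISTING =
        "UPDATE master m JOIN staging s ON m.id = s.id SET m.col1 = s.col1";
    static final String INSERT_MISSING =
        "INSERT INTO master (id, col1) "
      + "SELECT s.id, s.col1 FROM staging s "
      + "LEFT JOIN master m ON m.id = s.id WHERE m.id IS NULL";
    static final String DROP_STAGING = "DROP TEMPORARY TABLE staging";

    // Each row is assumed to be {id, col1}.
    static void merge(Connection connection, List<Object[]> rows) throws SQLException {
        try (Statement st = connection.createStatement()) {
            st.executeUpdate(CREATE_STAGING);          // step 1a: create the temp table
            try (PreparedStatement load = connection.prepareStatement(LOAD_STAGING)) {
                for (Object[] row : rows) {
                    load.setObject(1, row[0]);
                    load.setObject(2, row[1]);
                    load.addBatch();
                }
                load.executeBatch();                   // step 1b: bulk load the rows
            }
            st.executeUpdate(UPDATE_EXISTING);         // step 2: update matching rows
            st.executeUpdate(INSERT_MISSING);          // step 3: insert the new rows
            st.executeUpdate(DROP_STAGING);            // step 4: dispose of the temp table
        }
    }
}
```

The update runs before the insert so that freshly inserted rows are not updated a second time, and both statements treat the input as a single set instead of looping over it row by row.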