I have a table with a unique column, "token", enforced by a unique constraint in the database. In a certain case, I need to change an existing row to have the same token as another existing row, changing the 2nd row to have a new value.
So, say I have:
id;token
0;'aaa'
1;'bbb'
I want id 0 ('aaa') to instead have the token 'bbb'. So I need to change 'bbb' to be 'jfeisefjse', and then I can change 'aaa' to be 'bbb'. This can be done in Postgres in a single transaction.
I've tried to do the same thing in code: in one transaction I get the token from the existing row (row 1), set that row's token to a random value, update the other row (row 0) to have row 1's old token, then commit. However, Hibernate doesn't respect the order in which I made the updates. It seems to always run the update statement for row 0 first, and Postgres complains that it violates the unique constraint.
How can I make hibernate do this? Either force a certain order of update statements, or some other way to do this?
Note: Doing this in two transactions (one for scrambling row 1, then another transaction to update row 0) is not an option.
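Session.flush() will force Hibernate to write any pending SQL without committing the transaction. Call it right after scrambling row 1, so that update reaches the database before the update for row 0. It's a bit clunky but sometimes you do need to get a little less 'ORM' and make things just work :)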
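To see why the statement order matters, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Postgres (the table name `tokens` and the helper names are made up for illustration). It issues the two updates in the order that a flush after the first change would force: scramble the donor row first, then reuse its token, all inside one transaction.

```python
import sqlite3


def make_db():
    """Build the two-row example table from the question (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, token TEXT UNIQUE)")
    conn.executemany("INSERT INTO tokens VALUES (?, ?)", [(0, "aaa"), (1, "bbb")])
    conn.commit()
    return conn


def swap_token(conn, keep_id, donor_id, scratch_token):
    """Give row keep_id the token of row donor_id inside a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        (token,) = conn.execute(
            "SELECT token FROM tokens WHERE id = ?", (donor_id,)
        ).fetchone()
        # Statement 1: free the token by scrambling the donor row first.
        conn.execute(
            "UPDATE tokens SET token = ? WHERE id = ?", (scratch_token, donor_id)
        )
        # Statement 2: only now can the other row take the token.
        conn.execute("UPDATE tokens SET token = ? WHERE id = ?", (token, keep_id))
```

Running statement 2 before statement 1 raises a unique-constraint error, which is the failure described in the question.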
Related
I have a table with an auto increment primary key. This table is meant to store millions of records and I don't need to delete anything for now. The problem is that when new rows are inserted, because of some error, the auto increment key leaves gaps in the ids. For example, after 5 the next id is 8, leaving a gap at 6 and 7. As a result, when I count the rows I get 28000, but the max id is 58000. What can be the reason? I am not deleting anything. And how can I fix this issue?
P.S. I am using insert ignore while inserting records so that it doesn't give error when I try to insert duplicate entry in unique column.
This is by design and will always happen.
Why?
Let's take 2 overlapping transactions that are doing INSERTs
Transaction 1 does an INSERT, gets the value (let's say 42), does more work
Transaction 2 does an INSERT, gets the value 43, does more work
Then
Transaction 1 fails. Rolls back. 42 stays unused
Transaction 2 completes with 43
If consecutive values were guaranteed, every transaction would have to happen one after the other. Not very scalable.
Also see Do Inserted Records Always Receive Contiguous Identity Values (SQL Server but same principle applies)
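The scenario above can be modelled in a few lines of Python. This is only a toy simulation, not how any database implements it: `IdCounter` and `surviving_ids` are made-up names, and the only property being illustrated is that an id handed out to a transaction is not given back when that transaction rolls back.

```python
import itertools


class IdCounter:
    """Toy auto-increment counter: allocated values are never returned."""

    def __init__(self, start=1):
        self._values = itertools.count(start)

    def allocate(self):
        return next(self._values)


def surviving_ids(counter, outcomes):
    """Each transaction allocates one id; outcomes[i] is True if it commits."""
    allocated = [(counter.allocate(), committed) for committed in outcomes]
    # Only committed transactions leave a row behind.
    return [row_id for row_id, committed in allocated if committed]
```

With a counter starting at 42, a rolled-back transaction followed by a committed one leaves only id 43 in the table, so 42 becomes a permanent gap.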
You can create a trigger to handle the auto increment as:
CREATE DEFINER=`root`@`localhost` TRIGGER `mytable_before_insert` BEFORE INSERT ON `mytable` FOR EACH ROW
BEGIN
    SET NEW.id = (SELECT IFNULL(MAX(id), 0) + 1 FROM mytable);
END
This is a problem in the InnoDB, the storage engine of MySQL.
It really isn't a problem: when you check the docs on "AUTO_INCREMENT Handling in InnoDB", they basically say InnoDB keeps an in-memory counter for the auto increments and initializes it at startup
And the query it uses is something like
SELECT MAX(ai_col) FROM t FOR UPDATE;
This improves concurrency without really having an effect on your data.
To avoid this, use MyISAM instead of InnoDB as the storage engine.
Perhaps (I haven't tested this) a solution is to set innodb_autoinc_lock_mode to 0.
According to http://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html this might make things a bit slower (if you perform inserts of multiple rows in a single query) but should remove gaps.
You can try insert like :
insert ignore into table select (select max(id)+1 from table), "value1", "value2" ;
This will:
try to insert the new data with the next unused id (not the auto increment value)
ignore the insert if a duplicate entry is found in a unique field
otherwise insert the new data normally
(Note that this method cannot update fields when a duplicate entry is found.)
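As a rough illustration of the same idea, here is a sketch against SQLite via Python's sqlite3, where the equivalent of MySQL's INSERT IGNORE is spelled INSERT OR IGNORE. The table `items` and its columns are hypothetical. One caveat worth stating: two concurrent sessions can compute the same MAX(id)+1, so this sketch is only safe with a single writer or with suitable locking.

```python
import sqlite3

# The id comes from MAX(id) + 1 instead of an auto-increment counter,
# so an ignored duplicate does not consume an id.
INSERT_SQL = """
INSERT OR IGNORE INTO items (id, name)
SELECT (SELECT COALESCE(MAX(id), 0) + 1 FROM items), ?
"""


def make_items_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    return conn


def add_item(conn, name):
    """Insert name; return True if a row was added, False if it was a duplicate."""
    with conn:
        cur = conn.execute(INSERT_SQL, (name,))
    return cur.rowcount == 1
```

Inserting 'a', 'b', a duplicate 'a', and then 'c' yields ids 1, 2 and 3 with no gap left by the ignored row.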
I have an application that uses an H2 database to store records of data. Each record is assigned a unique ID using H2's auto-increment feature. I want the lowest number to always be 1, or at least to fill in the numbers that are freed when a record has been deleted. What I mean is: if there are 5 records numbered 1-5 and I delete the third record, I want the next record added to be numbered 3 instead of 6. How should I go about achieving this?
So far, I've tried
ALTER TABLE <table_name> ALTER COLUMN <id_column> RESTART WITH 1
Which doesn't have the intended effect that I wanted.
Edit: I'm an idiot, I wrote the SQL query without actually executing it. It does indeed restart from 1, but throws an exception whenever the next increment value already exists in the table. How should I fix this?
I'm trying to implement a counter with Java, Spring, Hibernate and Oracle SQL. Each record represents a count for a given timestamp. Let's say each record is uniquely identified by the minute, and each record holds a count column. The service should expect to receive a ton of concurrent requests and may update the counter column of the same record.
In my table, if the record does not exist, just insert the record in and set its count to 1. Otherwise, find the record by timestamp and increase its existing counter column by 1.
In order to maintain data consistency and integrity, I'm using pessimistic locking. For example, if 20 counts come in at the same time, and not necessarily from the same user, it's possible that we overwrite the record based on a stale read of that record. With locking, I'm ensuring that if 20 counts come in, the net effect on the database represents all 20 counts.
So locking is fine, but the problem is that if the record never existed in the first place, and we have two or more concurrent requests trying to update the not-yet-existent record, I've observed that a duplicate record gets inserted, because we cannot lock a record that doesn't exist yet. How can we ensure that no duplicates get created in the table? Should it be controlled via Oracle? Or can I manage this via my app and Hibernate?
Thank you.
One way to avoid this sort of problem altogether would be to just generate the count at the time you actually query the data. Oracle has an analytic function ROW_NUMBER() which can assign a row number to each record in the result set of a query. As a rough example, consider the following query:
SELECT
ts,
ROW_NUMBER() OVER (ORDER BY ts) rn
FROM yourTable
The count you want would be in the rn column, representing the number of records appearing since the first entry in the table. Of course, you could further restrict the query.
This approach is robust to removing records, as the count would always start with 1. One drawback is that row number functionality is not supported by Hibernate. You would have to run this either as a native query or a stored proc.
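For a quick feel of what the query returns, the sketch below runs it against SQLite through Python's sqlite3 rather than Oracle. That is an assumption made purely to keep the example self-contained: it needs an SQLite build of 3.25 or later for window functions, and the helper name `running_counts` is made up.

```python
import sqlite3


def running_counts(conn):
    """Return (ts, rn) pairs, where rn numbers the rows in timestamp order."""
    return conn.execute(
        "SELECT ts, ROW_NUMBER() OVER (ORDER BY ts) AS rn "
        "FROM yourTable ORDER BY ts"
    ).fetchall()


def make_events_db(timestamps):
    # One row per event; no counter column is stored at all.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (ts TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?)", [(t,) for t in timestamps])
    conn.commit()
    return conn
```

Because every request only inserts a row and nobody updates a shared counter, there is no record to lock and no duplicate counter row to worry about.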
I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations on this table. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK Hibernate merge works on PK only, so it's inappropriate. I guess, a native query might help or not, as it may or may not be committed when the second INSERT takes place.
Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
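A minimal sketch of that insert-then-fall-back-to-read pattern, using Python's sqlite3 instead of Hibernate so it runs on its own. The table name `texts` and both function names are assumptions for illustration; `sqlite3.IntegrityError` plays the role of Hibernate's ConstraintViolationException.

```python
import sqlite3


def find_id(conn, sometext):
    row = conn.execute(
        "SELECT id FROM texts WHERE sometext = ?", (sometext,)
    ).fetchone()
    return row[0] if row else None


def insert_or_read(conn, sometext):
    """Try the insert in its own short transaction; on a duplicate, read instead."""
    try:
        with conn:  # contains only the INSERT, so it stays open briefly
            cur = conn.execute(
                "INSERT INTO texts (sometext) VALUES (?)", (sometext,)
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # Someone else inserted the same text first; the failed insert was
        # rolled back, and the winner's row is now the one to use.
        return find_id(conn, sometext)


def get_or_create_id(conn, sometext):
    existing = find_id(conn, sometext)
    return existing if existing is not None else insert_or_read(conn, sometext)
```

The unique constraint does the arbitration: whichever request inserts first wins, and every other request ends up reading that same id.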
I'm not sure this scales well if the likelihood of a duplicate is high: it's a lot of extra work if some percentage (depending on the database) of transactions have to fail the insert and then reselect.
A secondary transaction minimizes the length of time the transaction that adds the new text stays open. Even so, assuming the database supports it correctly, the thread 1 transaction might cause the thread 2 select/insert to hang until thread 1 commits or rolls back. Overall database design might also affect transaction throughput.
I don't necessarily question why sometext can't be the PK; I'm wondering why you need to break it out at all. Of course, with large volumes you might save substantial space if the sometext values are large. It almost seems like you're trying to emulate a Lucene index to give you a complete list of text values.
I have a webservice in java that receives a list of information to be inserted or updated in a database. I don't know which one is to insert or update.
Which is the best approach to obtain better performance:
1. Iterate over the list (a list of objects, each carrying the table PK) and try to insert each entry into the database. If the insert fails, run an update.
2. Try to load the entry from the database. If a result is retrieved, update it; if not, insert the entry.
3. Another option? Tell me about it :)
In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal then first get rid of the word iterate from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.
The important thing is to understand the ratio between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this to the database". Then create concrete strategies that (for example):
check for the primary key, do the insert if zero records are found, else update
do the update and, if it fails, do the insert
others
And then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible, which may depend on your domain, you can add a heuristic that selects the best strategy based on the entities in the incoming set.
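The strategy idea can be sketched as follows, in Python with sqlite3 rather than Java so the example is self-contained. All names here (`PersistStrategy`, the two concrete classes, the `entries` table, the registry keys) are hypothetical, and "the update fails" is interpreted as "the update affected zero rows".

```python
import sqlite3
from abc import ABC, abstractmethod


class PersistStrategy(ABC):
    """Abstract 'persist this to the database' operation."""

    @abstractmethod
    def persist(self, conn, pk, value):
        ...


class SelectThenWrite(PersistStrategy):
    """Look the key up first, then insert or update accordingly."""

    def persist(self, conn, pk, value):
        found = conn.execute("SELECT 1 FROM entries WHERE id = ?", (pk,)).fetchone()
        if found:
            conn.execute("UPDATE entries SET value = ? WHERE id = ?", (value, pk))
        else:
            conn.execute("INSERT INTO entries (id, value) VALUES (?, ?)", (pk, value))


class UpdateThenInsert(PersistStrategy):
    """Update optimistically; insert only if no row was affected."""

    def persist(self, conn, pk, value):
        cur = conn.execute("UPDATE entries SET value = ? WHERE id = ?", (value, pk))
        if cur.rowcount == 0:
            conn.execute("INSERT INTO entries (id, value) VALUES (?, ?)", (pk, value))


# The key would come from a configuration file in a real application.
STRATEGIES = {
    "select_then_write": SelectThenWrite,
    "update_then_insert": UpdateThenInsert,
}


def load_strategy(name):
    return STRATEGIES[name]()
```

Calling code only ever sees `load_strategy(configured_name).persist(...)`, so switching strategies as the insert/update ratio shifts is a configuration change rather than a code change.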
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    -- Optimistic case: the row is new.
    INSERT INTO my_table (id, col1)
    VALUES (_id, _col1);
EXCEPTION WHEN unique_violation THEN
    -- The id already exists, so update the row instead.
    UPDATE my_table
    SET col1 = _col1
    WHERE id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
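The four steps above can be sketched like this, using SQLite through Python's sqlite3 as an example database. The table and column names (`my_table`, `incoming`, `col1`) are assumptions, and the exact SQL for the update-from-temp step varies between databases.

```python
import sqlite3


def upsert_batch(conn, rows):
    """Apply a batch of (id, col1) rows to my_table in set-based steps."""
    conn.execute("CREATE TEMP TABLE incoming (id INTEGER PRIMARY KEY, col1 INTEGER)")
    try:
        with conn:  # steps 1-3 commit or roll back together
            # 1. Bulk insert the whole batch into the temp table.
            conn.executemany("INSERT INTO incoming VALUES (?, ?)", rows)
            # 2. Update rows that already exist in the master table.
            conn.execute(
                "UPDATE my_table SET col1 = "
                "(SELECT col1 FROM incoming WHERE incoming.id = my_table.id) "
                "WHERE id IN (SELECT id FROM incoming)"
            )
            # 3. Insert the rows the master table does not have yet.
            conn.execute(
                "INSERT INTO my_table (id, col1) "
                "SELECT id, col1 FROM incoming "
                "WHERE id NOT IN (SELECT id FROM my_table)"
            )
    finally:
        # 4. Dispose of the temp table, even if the batch failed.
        conn.execute("DROP TABLE incoming")
```

However large the batch is, the master table sees exactly one UPDATE and one INSERT statement, which is the point of working in sets instead of row by row.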