I came across this problem:
For our integration tests, we have an older database with already populated data. Some of the data doesn't have the right values (for example, a boolean column also contains null values). Now, when creating some integration tests, they fail because the data doesn't have correct values.
What I thought would be a good idea was to have some scripts in the data.sql file that correct the data (for example UPDATE my_table SET my_column = 0 WHERE my_column IS NULL). But the problem is that this update also commits to the database and thus the data is changed (now there are no more null values). Changing the database data is not an option, so what I'm trying to do is some sort of rollback of the data.sql file at the end of each test / class. Can you please advise?
The version is Spring Boot 2.0.7.RELEASE, the dependency for testing is spring-boot-starter-test, the tests are annotated with @SpringBootTest and the database is Oracle.
application.yml:
spring:
  datasource:
    driver-class-name: oracle.jdbc.OracleDriver
    url: ${URL}
    username: ${USERNAME}
    password: ${PASSWORD}
    continue-on-error: true
You might be able to use Oracle's flashback table feature to roll back all DML changes that happened since data.sql was run. I'm not sure how Spring Boot testing works, but I assume there is some way to call pre- and post-test actions that can run the Oracle commands below; a sketch of how that wiring might look follows the SQL steps.
First, you will likely need to enable row movement on the relevant tables. This step is only needed once for each table and cannot be undone. (But the change is also pretty harmless. If I recall correctly, the only downside is a very tiny increase in metadata space.)
alter table my_table1 enable row movement;
alter table my_table2 enable row movement;
Right before the test begins, create a uniquely named restore point that is used to record the exact system change number to roll back to.
create restore point restore_point_1;
Then run data.sql and all other testing changes.
When testing is done, run a FLASHBACK TABLE command that will restore all of the relevant tables back to their state as of the restore point.
flashback table my_table1, my_table2 to restore point restore_point_1;
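Putting it together in the test class, here is a minimal, untested sketch with JUnit 4 (the default in spring-boot-starter-test 2.0.x). It assumes a JdbcTemplate bean is available, that the test user has the restore point / FLASHBACK privileges, and that the corrective UPDATE is run from the test itself after the restore point is created (instead of letting Spring Boot run data.sql at startup), so the flashback undoes it as well:
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.test.context.junit4.SpringRunner;

// Sketch only: creates a restore point before each test and flashes the
// tables back afterwards. Assumes row movement is already enabled (see above).
@RunWith(SpringRunner.class)
@SpringBootTest
public class MyIntegrationTest {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Before
    public void setUp() {
        // record the state to return to...
        jdbcTemplate.execute("create restore point restore_point_1");
        // ...and only then run the corrective statements from data.sql,
        // so the flashback undoes them too
        jdbcTemplate.update("UPDATE my_table SET my_column = 0 WHERE my_column IS NULL");
    }

    @Test
    public void someTest() {
        // assertions against the corrected data
    }

    @After
    public void tearDown() {
        jdbcTemplate.execute(
            "flashback table my_table1, my_table2 to restore point restore_point_1");
        jdbcTemplate.execute("drop restore point restore_point_1");
    }
}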
As commenters have suggested, there are cleaner, more modern ways to instantly recreate data. But not everybody has containers or build scripts set up. I've seen this flashback approach used successfully by testers with only a small amount of effort.
There are some potential gotchas when using flashback. The restore point and changed data will only last so long. If your tests will go on for days, you may need to look into guaranteed restore points and adjusting your database's UNDO tablespace. Flashback table only covers DML: it will not work if a table has been altered, and it will not restore things like stored procedures or sequence values. If you need everything flashed back, then you might be able to use flashback database, but that command also has some complications.
For my website, I'm creating a book database. I have a catalog with a root node; each node has subnodes, each subnode has documents, each document has versions, and each version is made of several paragraphs.
In order to create this database as fast as possible, I first create the entire tree model in memory, and then I call session.save(rootNode).
This single save populates my entire database (at the end, when I do a mysqldump of the database, it weighs 1 GB).
The save costs a lot (more than an hour), and since the database grows with new books and new versions of existing books, it costs more and more. I would like to optimize this save.
I've tried to increase the batch_size, but it changes nothing since it's a single save. When I take a mysqldump script and insert it back into MySQL, the operation takes 2 minutes or less.
And when I run "htop" on the Ubuntu machine, I can see that mysql is only using 2 or 3% CPU, which means that Hibernate is the slow part.
If someone could give me possible techniques that I could try, or possible leads, it would be great... I already know some of the reasons why it takes time. If someone wants to discuss it with me, thanks for your help.
Here are some of my problems (I think): For example, I have self-assigned ids for most of my entities. Because of that, Hibernate checks each time whether the row exists before it saves it. I don't need this because the batch I'm executing is executed only once, when I create the database from scratch. The best would be to tell Hibernate to ignore the primary key checks (like mysqldump does) and re-enable the key checking once the database has been created. It's just a one-shot batch to initialize my database.
The second problem would again be about the foreign keys. Hibernate inserts rows with null values, then performs an update in order to make the foreign keys work.
About using another technology: I would like to make this batch work with Hibernate because the rest of my website works very well with Hibernate, and if Hibernate creates the database, I'm sure the naming rules and all the foreign keys will be created correctly.
Finally, it's a read-only database. (I have a user database using InnoDB, where I do updates and inserts while my website is running, but the document database is read-only and MyISAM.)
Here is an example of what I'm doing:
TreeNode rootNode = new TreeNode();
recursiveLoadSubNodes(rootNode); // This method creates my big tree, in memory only.
hibernateSession.beginTransaction();
hibernateSession.save(rootNode); // for more than an hour, it saves 1 GB of data: hundreds of sub tree nodes, thousands of documents, tens of thousands of paragraphs.
hibernateSession.getTransaction().commit();
It's a little hard to guess what could be the problem here, but I can think of 3 things:
Increasing batch_size alone might not help because, depending on your model, inserts might be interleaved (i.e. A B A B ...). You can allow Hibernate to reorder inserts and updates so that they can be batched (i.e. A A ... B B ...). Depending on your model this might not work because the inserts might not be batchable. The necessary properties are hibernate.order_inserts and hibernate.order_updates, and a blog post that describes the situation can be found here: https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
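For reference, a rough sketch of how those settings could be applied programmatically (assuming a classic hibernate.cfg.xml setup; the same keys work in persistence.xml or Spring properties):
import java.util.Properties;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class BatchingSessionFactoryBuilder {

    // Sketch: enable JDBC batching plus insert/update reordering.
    // The batch size of 50 is an arbitrary starting point to tune.
    public static SessionFactory build() {
        Properties props = new Properties();
        props.setProperty("hibernate.jdbc.batch_size", "50");
        props.setProperty("hibernate.order_inserts", "true");
        props.setProperty("hibernate.order_updates", "true");

        return new Configuration()
                .configure()          // reads the usual hibernate.cfg.xml
                .addProperties(props) // adds/overrides the batching settings
                .buildSessionFactory();
    }
}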
If the entities don't already exist (which seems to be the case) then the problem might be the first-level cache. This cache will cause Hibernate to get slower and slower because each time it wants to flush changes it will check all entries in the cache by iterating over them and calling equals() (or something similar). As you can see, that will take longer with each new entity that's created. To fix that you could either try to disable the first-level cache (I'd have to look up whether that's possible for write operations and how it is done - or you do that :) ) or try to keep the cache small, e.g. by inserting the books yourself and evicting each book from the first-level cache after the insert (you could also go deeper and do that on the document or paragraph level).
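A sketch of the "keep the cache small" option, assuming the graph can be saved in chunks instead of a single save(rootNode); getAllDocuments() is a made-up accessor standing in for however you walk your tree:
import java.util.List;

import org.hibernate.Session;
import org.hibernate.Transaction;

public class TreeBatchSaver {

    // Sketch: save in chunks and flush/clear regularly so the first-level
    // cache never grows unbounded. getAllDocuments() is hypothetical.
    public void saveTree(Session session, TreeNode rootNode, int batchSize) {
        Transaction tx = session.beginTransaction();
        List<Document> documents = rootNode.getAllDocuments();
        int count = 0;
        for (Document document : documents) {
            session.save(document);
            if (++count % batchSize == 0) {
                session.flush(); // send the pending inserts as a JDBC batch
                session.clear(); // empty the first-level cache
            }
        }
        tx.commit();
    }
}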
It might not actually be Hibernate (or at least not Hibernate alone) but your DB as well. Note that restoring dumps often removes/disables constraint checks and indices along with other optimizations, so comparing that with Hibernate isn't that useful. What you'd need to do is create a bunch of insert statements and then just execute those - ideally via a JDBC batch - on an empty database but with all constraints and indices enabled. That would provide a more accurate benchmark.
Assuming that comparison shows that the plain SQL insert isn't that much faster then you could decide to either keep what you have so far or refactor your batch insert to temporarily disable (or remove and re-create) constraints and indices.
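A sketch of such a benchmark with a plain JDBC batch (the paragraph table, its columns and the Paragraph accessors are placeholders for your actual schema):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class PlainJdbcInsertBenchmark {

    // Sketch: batch-insert rows on an empty schema that still has all
    // constraints and indices enabled, to get a fair comparison.
    public void insertParagraphs(Connection connection, List<Paragraph> paragraphs) throws Exception {
        connection.setAutoCommit(false);
        String sql = "INSERT INTO paragraph (id, version_id, content) VALUES (?, ?, ?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            int count = 0;
            for (Paragraph p : paragraphs) {
                ps.setLong(1, p.getId());
                ps.setLong(2, p.getVersionId());
                ps.setString(3, p.getContent());
                ps.addBatch();
                if (++count % 1000 == 0) {
                    ps.executeBatch(); // send 1000 inserts at a time
                }
            }
            ps.executeBatch(); // flush the remainder
        }
        connection.commit();
    }
}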
Alternatively you could try not to use Hibernate at all or change your model - if that's possible given your requirements which I don't know. That means you could try to generate and execute the SQL queries yourself, use a NoSQL database or NoSQL storage in a SQL database that supports it - like Postgres.
We're doing something similar, i.e. we have Hibernate entities that contain some complex data which is stored in a JSONB column. Hibernate can read and write that column via a custom usertype but it can't filter (Postgres would support that but we didn't manage to enable the necessary syntax in Hibernate).
Recently my team ran into a situation in which some records in our shared test database disappeared for no clear reason. Because it's a shared database (used by many teams), we can't track down whether it was a programming mistake or someone just running a bad SQL script.
So I'm looking for a way to be notified (at the database level) when a row of a specific table A gets deleted. I have looked at Postgres TRIGGERs, but they fail to give me the specific SQL that caused the deletion.
Is there any way I can log the SQL statement that caused the deletion of rows in table A?
You could use something like this.
It allows you to create special triggers for PostgreSQL tables that log all the changes to the chosen tables.
These triggers can log the query that caused the change (via current_query()).
Using this as a base, you can add more fields/information to log.
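For illustration, a minimal sketch of that approach. The statements can be run from any SQL client; they are shown here executed over JDBC. The audit table, function and trigger names are made up, with table_a standing in for your table:
import java.sql.Connection;
import java.sql.Statement;

public class DeleteAuditSetup {

    // Sketch: an audit table plus a DELETE trigger on table_a that records
    // the deleted row and the statement that deleted it via current_query().
    public void install(Connection connection) throws Exception {
        try (Statement st = connection.createStatement()) {
            st.execute("CREATE TABLE IF NOT EXISTS delete_audit ("
                    + " deleted_at timestamptz DEFAULT now(),"
                    + " db_user    text DEFAULT current_user,"
                    + " old_row    text,"
                    + " query      text)");

            st.execute("CREATE OR REPLACE FUNCTION log_table_a_delete() RETURNS trigger AS $$"
                    + " BEGIN"
                    + "   INSERT INTO delete_audit(old_row, query)"
                    + "   VALUES (OLD::text, current_query());"
                    + "   RETURN OLD;"
                    + " END;"
                    + " $$ LANGUAGE plpgsql");

            st.execute("DROP TRIGGER IF EXISTS table_a_delete_audit ON table_a");
            st.execute("CREATE TRIGGER table_a_delete_audit"
                    + " BEFORE DELETE ON table_a"
                    + " FOR EACH ROW EXECUTE PROCEDURE log_table_a_delete()");
        }
    }
}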
You would do this in the actual Postgres config files:
http://www.postgresql.org/docs/9.0/static/runtime-config-logging.html
log_statement (enum)
Controls which SQL statements are logged. Valid values are none (off), ddl, mod, and all (all statements). ddl logs all data definition statements, such as CREATE, ALTER, and DROP statements. mod logs all ddl statements, plus data-modifying statements such as INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE, EXECUTE, and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type. For clients using extended query protocol, logging occurs when an Execute message is received, and values of the Bind parameters are included (with any embedded single-quote marks doubled).
The default is none. Only superusers can change this setting.
You want either mod or all to be the selection (ddl alone logs only schema changes, not DELETEs). This is what you need to alter:
In your data/postgresql.conf file, change the log_statement setting to 'mod' or 'all'. Further, the following may also need to be validated:
1) make sure you have turned on the log_destination variable
2) make sure you turn on the logging_collector
3) also make sure that pg_log actually exists relative to your data directory, and that the postgres user can write to it.
taken from here
I wrote Java/JDBC code which performs simple/basic operations on a database.
I want to add code which helps me keep track of when a particular database was accessed, updated, modified, etc. by this program.
I am thinking of creating another database inside my DBMS where these details or logs will be stored for each database involved.
Is this the best way to do it? Are there any other ways (preferably simple) to do this?
EDIT:
For now, I am using MySQL. But I also want my code to work with at least Oracle and MS-SQL as well.
It is pretty standard to add a "last_modified" column to a table and then add an update trigger on the table to set it to the db current time. Then your apps don't need to worry about it. Also, a "create_time" is often used as well, populated by an insert trigger.
Update after comment:
Seems you are looking for audit logs. Some people write apps where data manipulation only happens through stored procedures and not through direct inserts and updates - a fixed API. So when you want to add an item to a table, you call the stored proc:
addItem(itemName, itemDescription)
Then the proc inserts into the item table and does whatever logging is necessary.
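From the Java side, the call through JDBC could look like this (add_item is a hypothetical procedure you would create in each target database):
import java.sql.CallableStatement;
import java.sql.Connection;

public class ItemDao {

    // Sketch: every write goes through a stored procedure, which does the
    // insert and the audit logging in one place. The {call ...} escape
    // syntax works the same way on MySQL, Oracle and MS-SQL.
    public void addItem(Connection connection, String name, String description) throws Exception {
        try (CallableStatement call = connection.prepareCall("{call add_item(?, ?)}")) {
            call.setString(1, name);
            call.setString(2, description);
            call.execute();
        }
    }
}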
Another technique, if you are using some kind of framework for your JDBC access (say Spring), might be to intercept at that layer.
In almost all tables, I have the following columns:
CreatedBy
CreatedAt
These columns have default values of the current user and current time, respectively. They are populated when a row is added.
This solves only part of your problem. You can start adding triggers, but that gets complicated. Another method is to force modification access to the database through stored procedures, and then log the stored procedures. This has other advantages, in terms of controlling what users can do. But, you might want more flexibility.
A third possibility is auditing tools that keep track of all queries being run on the database. I think most databases have a way of turning on internal auditing, although these are very specific to the database. There are also third-party tools that allow you to see what has happened. Note, though, that these methods will affect performance if your database is doing high-volume transactions.
For more information, you should revise your question to specify which database you are using or planning on using.
I have a locally installed MySQL server on my laptop, and I want to use the information in it for a unit test, so I want to create a script to generate all the data automatically. I'm using MySQL Workbench, which already generates the tables (from the model). Is it possible to use it, or another tool, to create an automatic script to populate it with data?
EDIT: I see now that I wasn't clear. I do have meaningful data for the unit test. When I said "generate all the data automatically", I meant the tool should take the meaningful data I have in my local DB today and create a script to generate the same data in other developers' DBs.
The most useful unit tests are those that reflect data you expect or have seen in practice. Pumping your schema full of random bits is not a substitute for carefully crafted test data. As @McWafflestix suggested, mysqldump is a useful tool, but if you want something simpler, consider using LOAD DATA INFILE, which populates a table from a CSV.
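As a rough sketch, loading a CSV through JDBC might look like this (file, table and column names are made up; for LOCAL loads the Connector/J URL may need allowLoadLocalInfile=true):
import java.sql.Connection;
import java.sql.Statement;

public class CsvLoader {

    // Sketch: bulk-load test rows from a CSV with a header line.
    // Paths and names are placeholders.
    public void load(Connection connection) throws Exception {
        try (Statement st = connection.createStatement()) {
            st.execute("LOAD DATA LOCAL INFILE 'src/test/resources/books.csv'"
                    + " INTO TABLE books"
                    + " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
                    + " LINES TERMINATED BY '\\n'"
                    + " IGNORE 1 LINES"
                    + " (id, title, author)");
        }
    }
}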
Some other things to think about:
Test with a database in a known state. Wrap all your database interaction unit tests in transactions that always roll back (see the sketch after this list).
Use dbUnit to achieve the same end.
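A sketch of the rollback idea with plain JDBC and JUnit (connection details are placeholders; a Spring @Transactional test achieves the same with less code):
import java.sql.Connection;
import java.sql.DriverManager;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class BookRepositoryTest {

    private Connection connection;

    @Before
    public void openTransaction() throws Exception {
        connection = DriverManager.getConnection("jdbc:mysql://localhost/testdb", "test", "test");
        connection.setAutoCommit(false); // nothing below gets committed
    }

    @Test
    public void insertAndQuery() throws Exception {
        // run inserts/updates and assertions against 'connection' here
    }

    @After
    public void rollBack() throws Exception {
        connection.rollback(); // leave the database exactly as it was
        connection.close();
    }
}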
Update
If you're in a Java environment, dbUnit is a good solution:
You can import and export data in an XML format through its APIs, which would solve the issue of going from your computer to other members on your team.
It's designed to restore database state: it snapshots the database before tests are executed and then restores it at the end, so tests are side-effect free (i.e. they don't permanently change data).
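A sketch of the export/import round trip with dbUnit (the dataset file name is arbitrary):
import java.io.File;
import java.io.FileOutputStream;
import java.sql.Connection;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public class DbUnitDatasetTool {

    // Export the current contents of the local database to an XML dataset
    // that can be checked into version control and shared with the team.
    public void export(Connection jdbcConnection) throws Exception {
        IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
        IDataSet fullDataSet = connection.createDataSet();
        FlatXmlDataSet.write(fullDataSet, new FileOutputStream("dataset.xml"));
    }

    // Load the shared dataset before a test run. CLEAN_INSERT empties the
    // tables present in the dataset and re-inserts the recorded rows.
    public void importDataset(Connection jdbcConnection) throws Exception {
        IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
        IDataSet dataSet = new FlatXmlDataSetBuilder().build(new File("dataset.xml"));
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
    }
}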
You can populate with defaults (if defined)
CREATE TABLE #t(c1 int DEFAULT 0,c2 varchar(10) DEFAULT '-')
GO
--This inserts 50 rows into the table (GO 50 repeats the batch 50 times)
INSERT INTO #t( c1, c2 )
DEFAULT VALUES
GO 50
SELECT * FROM #t
DROP TABLE #t
I am looking for the MySQL equivalent of CONTEXT_INFO that is present in SQL Server, or any other session-variable-like mechanism I can use to pass the username to the trigger.
I am currently working on logging table data for audit. I need to pass the username of the logged-in user to the delete trigger.
Any ideas? We are deleting the rows from the table in a few cases and marking them as deleted in others.
Any alternate solutions are welcome. I thought of using AOP, but it could prove problematic when deletes cascade. I want to look into Hibernate Interceptors; I'm not sure at this point if that works.
If I can find the MySQL equivalent of CONTEXT_INFO, my job is done and elegant as well.
Thanks,
Julia.
You should be able to get the current user with the USER() function. See the doc for details.
Ok, I didn't quite understand what you were asking. I think you may want to take a look at MySQL's support for connection-level user variables. Basically, on the connection that will run the UPDATE / INSERT / DELETE, but before the actual query runs, you need to run a SET statement: SET @user = 'my_user_id'. Then you should be able to use @user as the user in your trigger.
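For example, from the Java side you would set the variable on the same connection right before the delete; the trigger can then read @app_user (the audit table and column names here are made up):
import java.sql.Connection;
import java.sql.PreparedStatement;

public class AuditedDeleteDao {

    // Sketch: pass the application user to the DELETE trigger through a
    // connection-scoped user variable. Both statements must use the same
    // connection. A trigger could then do something like:
    //   INSERT INTO my_table_audit(deleted_id, deleted_by)
    //   VALUES (OLD.id, @app_user);
    public void deleteRow(Connection connection, long id, String appUser) throws Exception {
        try (PreparedStatement setUser = connection.prepareStatement("SET @app_user = ?")) {
            setUser.setString(1, appUser);
            setUser.execute();
        }
        try (PreparedStatement delete = connection.prepareStatement("DELETE FROM my_table WHERE id = ?")) {
            delete.setLong(1, id);
            delete.execute();
        }
    }
}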