Lazy DB migration using Flyway

Lazy DB migration using Flyway - java

We are using Flyway for migration of our DB which and we literally have thousands of schemas (a.k.a Silos).
When we deploy a new build, the DB migration may take 10 minutes even if there is no migration needed for the software build.
I wonder if we could configure Flyway to do a lazy DB migration: check each the schema_version table for all the schemas, if latest DB version in the table is equal to that of the current software build, then there is no need to do anything; otherwise, if db version in software builder is newer, just do the necessary migration starting from latest version in the table.
If we can do the above, the migration time could shrink from 10 minutes to a few seconds.
Could anyone shed any light on how to do this? Thanks!

As stated by JB Nizet, Flyway checks the current version, and does not perform any update unless required.
However, if your goal is getting rid of this check because it takes time, you could maybe switch to Java-based migrations instead of SQL-based ones, because unlike the latter ones, Java migrations do not have checksums, and this is were you might reduce some execution time; still has to be tested though.
This is from the official doc in this matter:
Unlike SQL migrations, Java migrations by default do not have a
checksum and therefore do not participate in the change detection of
Flyway’s validation. This can be remedied by implementing the
MigrationChecksumProvider interface.
I do not have an idea if this will reduce execution time, but it may be worth testing at least.

Related

When to use schema.sql in spring boot app

I am trying to understand when to use schema.sql db creation technique and when to rely on Spring boot's creation based on my entity classes. How to decide?

Let's leave for the moment the schema.sql out.
ORM automatic schema creation (create or update or create-drop) is normally useful during the development of application. Even during release candidates and QA reviews it is still useful because the changes which happen from development team with issues arising could be live much faster.
When the application reaches a critical phase and is mature in production, then any changes happening automatically from ORM could be considered dangerous.
At this stage you would normally in big companies rollout in production database only some sql scripts that affect the database, which should be reviewed first and tested before rollout happens. Also a rollback sql might be necessary.
So normally ORM having effect in database schema is only during early stages of an application and not when it has matured enough in production.
Let's now come back to schema.sql. This file can be used just once to create the database from some sql commands. This also would not be expected to be run anytime the application executes. At least not in majority of applications.
I think a logical approach would be to use the ORM during intial stages of development and QA and then when you are about to reach a mature phase, you inspect what type of database the ORM has created and then you make a manual review just to be sure of everything and any optimizations which would make sense and at this stage you can create with the already existing schema of ORM your own permanent schema.sql.
The obvious reason for the above is that with schema.sql you have 100% control of how the database is created. Using ORM you depend on the ORM provider to build this for you. This ORM provider might provide some new library that affects how the previous was used and many other things. The result is that with ORM you don't have 100% control on the database to be created.

Changing flyway migration files after setting new baseline

I have flyway integrated in one of my projects. I have many migrations and it takes a long time to migrate a new empty database, mainly because there are seed data added along the way as well. Now I want to change that. Unfortunately those migrations were already pushed to production (and yes, at some point the seed data was migrated there as well).
My idea was to set a baseline for the current version of the production system and clean up the old migrations afterwards: Squash the schema-migrations and move the seed- and test-data to a new location, that is not deployed to production.
Now my questions are:
How can I set a baseline in my production database, without affecting all others? Call flyway baseline ... on the database directly? Or can I use any kind of special migration file? Maybe write the baseline line directly to the schema_version table of the database? How would such a query look like?
My newest migration is V4_6_3__.... So my baseline needs to be on V5__...? Or is V4__... enough and all migrations with the same major version are included?
When the baseline is set, is it possible/save to add, edit, and remove migrations, older than the baseline, without breaking my production database on the next migration task?
Sorry for the basic questions, but it seems to me, that the flyway documentation is no help at all...
Thanks in advance!

Sry for the late answer. I have done a pretty similar thing to the one #markdsievers suggested:
I assured that the production environment was at least on version X (flyway.setTarget(X)). Then I changed to a new schema version table (flyway.setTable('temporary_schema_version')) and ran a single migration, that deleted the old table. Afterwards I changed the schema version table back to the original one, set a baseline to version Y > X and ran another migration that deleted the temporary table.
Now I can edit/squash/delete all migrations with a version lower than Y without crashing my production-deployment.

I went through a very similar scenario, and even wrote our own in house tool called the "Rebaser" which does most of what you want. Our main motivation was to upgrade from Flyway 3.x to 4.3 but we also had a large history which needed to be squashed. The gist of it is this:
Squash all your migrations into however many makes sense. I typically have a V2__base_ddl.sql and V3__base_data.sql (Flyway can use the first couple of version numbers for schema creation etc). This is the manual part.
Your rebase tool detects the old schema_version table and deletes it.
Your rebase tool then runs init + migrate with your new baseline version set.
Rebase tool leaves behind a footprint (a rebase key in a custom table) that indicates it has been done.
For my integration test builds (that spin up a vanilla database and migrate forwards to latest) I add an extra folder of test data SQL scripts using Flyways locations argument, thus ensuring I have test data for integration tests but not in any non-test environments.
Our Rebaser is just a thin wrapper around the Flyway Java API, adding in the prestep to do the rebase if configured and then delegate to Flyway.
Flyway doesn't have a notion of rebasing but it's something we have found is necessary to do as your history gets large and contains obsolete data / DDL. So far this system has worked flawlessly.

java library to maintain database structure

My application is always developing, so occasionally - when the version upgrades - some tables need to be created/altered/deleted, some data modified, etc. Generally some sql code needs to be executed.
Is there a Java library that can be used to keep my database structure up to date (by analyzing something like "db structure version" information and executing custom sql to code to update from one version to another)?
Also it would be great to have some basic actions (like add/remove column) ready to use with minimal configuration, ie name/type and no sql code.

Try DBDeploy. Although I haven't used it in the past, it sounds like this project would help in your case. DBDeploy is a database refactoring manager that:
"Automates the process of establishing
which database refactorings need to be
run against a specific database in
order to migrate it to a particular
build."
It is known to integrate with both Ant and Maven.

Try Liquibase.
Liquibase is an open source (Apache
2.0 Licensed), database-independent library for tracking, managing and
applying database changes. It is built
on a simple premise: All database
changes are stored in a human readable
yet trackable form and checked into
source control.
Supported features:
Extensibility
Merging changes from multiple developers
Code branches
Multiple Databases
Managing production data as well as various test datasets
Cluster-safe database upgrades
Automated updates or generation of SQL scripts that can be approved and
applied by a DBA
Update rollbacks
Database ”diff“s
Generating starting change logs from existing databases
Generating database change documentation

We use a piece of software called Liquibase for this. It's very flexible and you can set it up pretty much however you want it. We have it integrated with Maven so our database is always up to date.

You can also check Flyway (400 questions tagged on SOW) or mybatis (1049 questions tagged). To add to the comparison the other options mentioned: Liquibase (663 questions tagged) and DBDeploy (24 questions tagged).
Another resource that you can find useful is the feature comparison in the Flyway website (There are other related projects mentioned there).

You should take a look into OR Mapping libraries, e.g. Hibernate

Most ORM mappers have logic to do schema upgrades for you, I have successfully used Hibernate which gets at least the basic stuff right automatically.

How to roll back migrations using Flyway?

MyBatis migrations splits each SQL file into two sections:
One for migrating forward one version
One for migrating back one version
How does one roll back versions using Flyway?

While Flyway supports rollbacks (as a commercial-only feature) its use is discouraged:
https://flywaydb.org/documentation/command/undo
While the idea of undo migrations is nice, unfortunately it sometimes breaks down in practice. As soon as you have destructive changes (drop, delete, truncate, …), you start getting into trouble. And even if you don’t, you end up creating home-made alternatives for restoring backups, which need to be properly tested as well.
Undo migrations assume the whole migration succeeded and should now be undone. This does not help with failed versioned migrations on databases without DDL transactions. Why? A migration can fail at any point. If you have 10 statements, it is possible for the 1st, the 5th, the 7th or the 10th to fail. There is simply no way to know in advance. In contrast, undo migrations are written to undo an entire versioned migration and will not help under such conditions.
An alternative approach which we find preferable is to maintain backwards compatibility between the DB and all versions of the code currently deployed in production. This way a failed migration is not a disaster. The old version of the application is still compatible with the DB, so you can simply roll back the application code, investigate, and take corrective measures.
This should be complemented with a proper, well tested, backup and restore strategy. It is independent of the database structure, and once it is tested and proven to work, no migration script can break it. For optimal performance, and if your infrastructure supports this, we recommend using the snapshot technology of your underlying storage solution. Especially for larger data volumes, this can be several orders of magnitude faster than traditional backups and restores.

This is supported since Flyway 5.0. Sadly it's a commercial only feature though.
https://flywaydb.org/documentation/command/undo

I assume you need a rollback strategy, when e.g. a partner fails at production stage and his deployment is essential for your release.
You could name your flyway SQL scripts like these:
V< YourReleaseNumber >.000_< description >.sql
Now you can leave
V< YourReleaseNumber >.998_rollback.sql for rollback
and make V< YourReleaseNumber >.999_reenroll.sql to reenroll.
In your CI/CD Environment you need 2 more Jobs (manually triggered) after your deployment job. One for rollback, which runs the rollback process including flyway migrate. Other for reenroll.
You just have to care for the target configuration in flyway.
For your deployment job your target should be < YourReleaseNumber >.997
For your rollback job < YourReleaseNumber >.998
When you start a new release, make sure you won't run the rollback/reenroll script of the old release.
As said before a well tested, backup and restore strategy is the recommended solution.
(sry for bad english)

Java equivalent for database schema changes like South for Django?

I've been working on a Django project using South to track and manage database schema changes. I'm starting a new Java project using Google Web Toolkit and wonder if there is an equivalent tool. For those who don't know, here's what South does:
Automatically recognize changes to my Python database models (add/delete columns, tables etc.)
Automatically create SQL statements to apply those changes to my database
Track the applied schema migrations and apply them in order
Allow data migrations using Python code. For example, splitting a name field into a first-name and last-name field using the Python split() function
I haven't decided on my Java ORM yet, but Hibernate looks like the most popular. For me, the ability to easily make database schema changes will be an important factor.

Wow, South sounds pretty awesome! I'm not sure of anything out-of-the-box that will help you nearly as much as that does, however if you choose Hibernate as your ORM solution you can build your own incremental data migration suite without a lot of trouble.
Here was the approach that I used in my own project, it worked fairly well for me for over a couple of years and several updates/schema changes:
Maintain a schema_version table in the database that simply defines a number that represents the version of your database schema. This table can be handled outside of the scope of Hibernate if you wish.
Maintain the "current" version number for your schema inside your code.
When the version number in code is newer than what's the database, you can use Hibernate's SchemaUpdate utility which will detect any schema additions (NOTE, just additions) such as new tables, columns, and constraints.
Finally I maintained a "script" if you will of migration steps that were more than just schema changes that were identified by which schema version number they were required for. For instance new columns needed default values applied or something of that nature.
This may sounds like a lot of work, especially when coming from an environment that took care of a lot of it for you, but you can get a setup like this rolling pretty quickly with Hibernate and it is pretty easy to add onto as you go on. I never ended up making any changes to my incremental update framework over that time except to add new migration tasks.
Hopefully someone will come along with a good answer for a more "hands-off" approach, but I thought I'd share an approach that worked pretty well for me.
Good luck to you!

as I'm looking for the same heres what I've achieved so far.
We first used dbdeploy. It manages the most stuff for you but you will have to write all the transition scripts by yourself! That means every change you make has to be in its own script which you will have to write from scratch. Not very handy, but works very reliable.
The second thing I encountered is liquibase. It stores the configuration in one single xml file. Not very intuitive to read, but managable. Plus there is an Intellij Idea plugin for it. At the moment of writing it still has some minor issues, but as the author assured me, they will be fixed soon.
The perfect solution would be to get south working with your java environment. That really would be a tool to marry :D

Maybe try flyaway. Seems like a good alternative.

I've been thinking about using django-jython just for db migrations in our legacy Java application. The latest Jython version is 2.5.4rc1, but I think I can mitigate the risk by just using it for South migrations.
Especially since I can use inspectdb to generate the models for me. And then replace parts of the Java with Python "seamlessly".

If you're using hibernate, then checkout liquibase
http://www.liquibase.org/databases.html
It's been around for 10 years so it's pretty solid. It may support other ORM's, just have a dig around on their website. Checkout the liquibase+hibernate extension here:
https://github.com/liquibase/liquibase-hibernate

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.