Dropwizard: Using migrations for a cloud based application

We have an application that exposes RESTful web services, and we plan to deploy it in the cloud. We need to set up a database schema for the application once, on a database instance in the cloud.
Can someone tell me whether it is a good approach to use migrations with Liquibase for this one-time schema setup? We will use alter scripts if DDL modifications are needed in future releases.

Someone stop me if I'm wrong, but the fact that your application will be deployed in the cloud only means it will run on a virtual server hosted by an external company, which in the case of your question doesn't change anything.
So the question is: "Is the database versioning system Liquibase worth it for a database whose schema is meant to stay stable?"
In absolute terms it could be considered overkill, and a lot of big companies still manage database schema evolution with bare SQL scripts. You could simply export the final build script of your development database and go with it.
But since you know Liquibase, the overhead is pretty low, and the comfort it gives you if you later have to modify your schema is significant.
So yes, I think it's a pretty good practice (safer than applying scripts by hand under the stress of a production server problem) which costs one or two hours (given you know how to use the tool) and can save dozens when you have to hotfix a production database.

I assume that you will be deploying this application in more than one place - not just production in the cloud, but also development servers, test servers, staging, etc. If that is true, then it seems to me that you definitely want to have a process around how you make changes to the database schema.
For me, over the course of my 20+ years in software development, I have seen several things that I use now that were not in common use when I started but that have now become 'baseline' practices on any project I work on. Yeah, I used to work without source control, but that is an absolute must now. I used to write software without tests, but not any more. I used to work without continuous integration, but that is yet another practice that I consider a must-have. The most recent addition to my must-have list is some sort of automated database migration process.
Also, since Liquibase support is built into Dropwizard, I don't see any reason not to use it.
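To make the idea of an automated migration process concrete, here is a minimal sketch of what such a tool does conceptually: it keeps a record of which change sets have already been applied and runs only the new ones, in order. This is not Liquibase's or Dropwizard's API; the names ChangeSet and MigrationRunner are hypothetical, and the in-memory set and list stand in for the tracking table and the real database.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of what a migration tool does; not Liquibase's API.
class ChangeSet {
    final String id;   // unique, stable identifier for this change
    final String sql;  // DDL that must be applied exactly once per database

    ChangeSet(String id, String sql) {
        this.id = id;
        this.sql = sql;
    }
}

class MigrationRunner {
    // Stands in for the tracking table a real tool keeps inside the database.
    private final Set<String> appliedIds = new LinkedHashSet<>();
    // Stands in for actually executing statements against the database.
    private final List<String> executedSql = new ArrayList<>();

    // Applies, in order, every change set not yet recorded; returns how many ran.
    int migrate(List<ChangeSet> changeLog) {
        int ran = 0;
        for (ChangeSet cs : changeLog) {
            if (appliedIds.add(cs.id)) { // add() returns false if already recorded
                executedSql.add(cs.sql);
                ran++;
            }
        }
        return ran;
    }

    List<String> executedSql() {
        return executedSql;
    }
}
```

Running migrate twice with the same change log applies nothing the second time, which is what lets the same process run unchanged against development, test, staging, and production databases.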

Related

Oracle database (online) replication

I am a Java guy, I can work with Oracle Database, I know PLSQL, SQL. But I am not good at managing database servers. I think it is a completely different area.
My question is related to database replication. I googled it, found millions of answers but I am still confused.
Many times in my professional career I have seen developers create complete (complicated) applications to keep a target database schema in sync with a source one. These sync apps take time to develop and are very hard to maintain, especially when a data structure changes, for example in tables.
I have seen such apps built with JPA, JDBC, Spring, MyBatis, and PL/SQL as well. Usually they sync the DBs during the night, scheduled by cron, Quartz, Spring, etc. During the sync process the source DB is usually available only for querying data, not for inserting, and DB constraints and triggers are disabled.
These kinds of custom applications always scare me. I cannot believe that there is no general, easy, and official way to keep two databases in sync without developing a new application.
Now I have a similar task and, honestly, I would like to write zero lines of code for it. I believe that the database vendors offer existing, recommended solutions that cover this topic.
It would be great if you could push me in the right direction. I feel that writing yet another DB sync application is not the right way.
I need to focus on Oracle Database sync, but I would be happy to know a general, database vendor-independent way.

There are many ways to perform replication in an Oracle Database. Oracle has two replication techniques for the database: "Advanced Replication" and "GoldenGate". GoldenGate is the newer, preferred method of replication and works from the database's redo log files. Both methods are geared toward an Oracle DBA.
Often application developers will create an "interface" that moves data from one database to another. An interface is a program (PL/SQL, bash, C, etc.) that runs from a cron job (database or system) or wakes on an event to move data. Interfaces are useful when data needs to be processed during replication.

Two Spring apps each use jpa to control a single database

Two Spring apps each use JPA to control a single database.
Both Spring apps must use that same database.
Will spring.jpa.hibernate.ddl-auto = update work properly?

In my opinion, having two applications directly use the same database is a poor design.
Here is a quote from this software engineering answer:
The more applications use the same database, the more likely it is
that you hit performance bottlenecks and that you can't easily scale
the load as desired. SQL Databases don't really scale. You can buy
bigger machines but they do not scale well in clusters!
Maintenance and development costs can increase: Development is harder
if an application needs to use database structures which aren't suited
for the task at hand but have to be used as they are already present.
It's also likely that adjustments of one application will have side
effects on other applications ("why is there such an unnecessary
trigger??!"/"We don't need that data anymore!"). It's already hard
with one database for a single application, when the developers
don't/can't know all the use-cases.
Administration becomes harder: Which object belongs to which
application? Chaos rising. Where do I have to look for my data? Which
user is allowed to interact with which objects? What can I grant whom?
Upgrading: You'll need a version that is the lowest common denominator
for all applications using it. That means that certain applications
won't be able to use powerful features. You'll have to stick with
older versions. It also increases development costs a bit.
Concurrency: Can you really be sure that there're no chronological
dependencies between processes? What if one application modifies data
that is outdated or should've been altered by another application
first? What about different applications working on the same tables
concurrently?
What I would suggest is to create a service layer that is responsible for all database access. This service can then be accessed in different ways (a REST web service might be an option).
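As a rough sketch of that service-layer idea: both applications would depend only on a small interface, and a single implementation would own all access to the storage. Every name here (CustomerService, InMemoryCustomerService) is hypothetical, and the in-memory map merely stands in for the JPA-backed database so that the example runs on its own.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; a sketch of the idea, not a prescribed design.
interface CustomerService {
    void save(String id, String name);
    Optional<String> findName(String id);
}

// The only component allowed to touch the storage. In a real system this would
// wrap the JPA repositories and be exposed to both apps, for example over REST.
class InMemoryCustomerService implements CustomerService {
    private final Map<String, String> table = new ConcurrentHashMap<>();

    @Override
    public void save(String id, String name) {
        // Validation lives in one place instead of being duplicated per app.
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("name must not be blank");
        }
        table.put(id, name);
    }

    @Override
    public Optional<String> findName(String id) {
        return Optional.ofNullable(table.get(id));
    }
}
```

With this shape, only the service owns the schema, so a setting such as spring.jpa.hibernate.ddl-auto would need to be configured in one place rather than in two competing applications.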

@Vinod Bokare's comment is correct: you must create a JAR of the POJOs and use it in both projects.
And @Heejeong Jang, it will be okay if each of the Spring apps has its own separate set of tables for insert, update, and delete.

Usability: How do I provide & easily deploy a (preferably node.js + MongoDB based) server backend for my users?

I'm currently planning an application (brainstorming, more or less) designed to be used in small organizations. The app will require synchronization with a backend server, e.g. for user management and some advanced, centralized functionality. This server has to be hosted locally and should be able to run on Linux, Mac and Windows. I haven't decided how I'm going to realize this, mainly because I simply don't know which approach would be the smartest.
Technically speaking, a very interesting approach seemed to be node.js + mongoose, connecting to a local MongoDB. But this is where I'm struggling: How do I ensure that it's easy and convenient for an organization's IT to set this up?
Installing node.js + MongoDB is tedious work and far from standardized and easy. I don't have the resources to provide a detailed walkthrough for every major OS and configuration, or to take over the setup myself. Ideally, the local administrator should run some sort of setup on the machine used as server (a "regular" PC running 24/7 should suffice) and have the system up and running, similar to the way some games provide executables for hosting small game servers for a couple of friends (Minecraft, for instance).
I also thought about Java EE, though I haven't dug into any details here. I'm unsure whether this is really an option.
Many people suggest outsourcing the backend (BaaS), e.g. to parse.com or similar services. This is not an option, since it's mandatory that the backend be hosted locally.
I'm sorry if this question is too unspecific, but unfortunately, I really don't know where to start.

I can give you advice from both the sysadmin's side and the developer's side.
Sysadmin
Setting up node.js is not a big task. Setting up MongoDB correctly is. But that is not your business as an application vendor, especially not when you are a one-man-show FOSS project, as I assume. It is an administrator's task to set up a database, so let them do it. Just tell them what you need, maybe point out security concerns, and any capable sysadmin will do the job and set up the environment.
There are some things you underestimate, however.
Applications, especially useful ones, tend to get used. MongoDB has many benefits, but being frugal with resources isn't exactly one of them. So running on a surplus PC may work in a software development or visual effects company, where every workstation has plenty of memory, but in an accounting firm your application will run short of resources quite fast. Do not make promises like "will run on your surplus desktop" until you are absolutely, positively sure about it because you did extensive load tests. Any sensible sysadmin will monitor the application anyway and scale resources up when necessary. But when you make such promises and break them, you lose the single most important factor for software: the users' trust. Once you lose it, it is very hard to get it back.
Developer
You really have to decide whether MongoDB is the right tool for the job. As soon as you have relations between your documents, in which a change to one document has to be reflected in others, you have to be really careful. Ask yourself whether your decision rests on a rational, educated basis. I have seen some projects implemented with NoSQL databases that would have been way better off with a relational database, just because NoSQL is some sort of everybody's darling.
It is a FAR way from node.js to Java EE. The concepts of Java EE are not necessarily easy to grasp, especially if you have little experience in application development in general and in Java in particular.
The Problem
Without knowing anything about the application, it is very hard to make a suggestion or give you advice. Why exactly does the MongoDB have to be local? Can't it be done with a VPC? Is it a web app, desktop app or server app? Can the source code be disclosed or not? How many concurrent users per installation can be expected? Do you want a modular or monolithic app? What are your communication needs? What is your experience with programming languages? It is all about what you want to accomplish and which services you want to provide with the app.

Simple and to the point: Chef (chef solo for vagrant) + Vagrant.
Vagrant provides a uniform environment that can be as close to production as you want, and Chef provides provisioning for those environments.
This repository is very close to what you want: https://github.com/TryGhost/Ghost-Vagrant
There is a very large number of community Chef recipes to install and configure pretty much anything on the market.

Designing an app to be deployed with AND without Google App Engine

After I finish developing an app using Google App Engine, how easy will it be to distribute if I ever need to do so without App Engine? The only thing I've thought of is that GAE has some proprietary API for using the datastore. So, if I need to deliver my app as a .war file (for example) which would not be deployed with App Engine, all I would need to do is first refactor any code which is getting/storing data, before building the .war, right?
I don't know what the standard way is to deliver a finished web app product - I've only ever used GAE, but I'm starting a project now for which the requirements for final deliverables are unsure at this time.
So I'm wondering, if I develop for GAE, how easy will it be to convert?
Also, is there anything I can do or consider while writing for GAE to optimize the project for whatever packaging options I may have in the end?

So long as your app does not have any elements that depend on Google App Engine, you should be able to deploy it anywhere the location can support a Tomcat or GlassFish server. Sometimes this requires that you install the server manually, so you must read up on that. There are lots of YouTube videos that help on this subject; just try to break your issue down into the smallest steps possible.
I also suggest using frameworks like Spring and Hibernate to help lessen the headaches. They will take a while to understand but are worth the effort if you want to be programming for the rest of your life.

I disagree with Pbrain19.
The GAE datastore is quite different from SQL, and has its own interesting eventually consistent behavior for transactions. That means for anything that requires strong consistency or transactions, you're going to have to structure your data with appropriate ancestors. This is going to have a pretty big impact on your code.
You're also going to need to denormalize your data structures (compared to SQL) to minimize datastore costs and improve performance. There are also many queries you can do in SQL that you can't do in GAE; you'd have to structure your app in ways that work around this.
Once you do any of this, you'll probably have a significant chunk of the app to rebuild.
You also wouldn't want to use Spring because it'll make your instance start up time pretty painful.
So unless it's a very simple hello world app, the refactoring will not be trivial - particularly once you begin using ancestors in any of your data modelling.
I recommend not trying to design your app to be portable if you're using the GAE datastore.
You'll have better luck making a portable app if you're using Cloud SQL.

Am I crazy? Switching an established product from HSQLDB to Apache Derby

I have an established software product that uses HSQLDB as its internal settings database. Customer projects are stored in this database. Over the years, HSQLDB has served us reasonably well, but it has some stability/corruption issues that we've had to code circles around, and even then, we can't seem to protect ourselves from them completely.
I'm considering changing internal databases. Doing this would be fairly painful from a development perspective, but corrupted databases (and lost data) are not fun to explain to customers.
So my question is: Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? I found a post via Google complaining that Derby was unstable, but it was from 2006, so I'd entertain the idea that it has been improved in the last 4 years. Or is there another pure Java embedded (in-process) database that I could use (commercial or open-source)? Performance isn't very important to me. Stability is king. Data integrity across power loss, good BLOB support, and hot backups are all a must.
Please don't suggest something that isn't a SQL-based relational database. I'm trying to retrofit an existing product, not start from scratch, thanks.

For each database engine there is a certain risk of corruption. I am the main author of the H2 database, and I also got reports about broken databases. Testing can reduce the probability of bugs, but unfortunately it's almost impossible to guarantee some software is 'bug free'.
As for the three Java databases HSQLDB, Apache Derby, and H2, I can't really say which one is the most stable. I can only speak about H2. I think for most operations, H2 is now stable. There are many test cases that specifically test for databases getting corrupted. This includes automated tests on power loss (using a Christmas light timer). With the power failure tests I found out that stability also depends on the file system: sometimes I got 'CRC error' messages, meaning the operating system couldn't read the file (it was Windows). In that case, there is not much you can do.
For mission-critical data, in any case I wouldn't rely on the software being stable. It's very important to create backups regularly, and to test them. Some databases have multiple ways to create backups. H2, for example, has an online backup feature and a feature to write a SQL script file. An alternative is to use replication or clustering. H2 supports a simple cluster mode; I believe Derby supports replication.

I ran Derby 24/7 as the internal database supporting a build automation and test management system for 4 years. It was used by a worldwide team, and never crashed, lost data, or corrupted my records. The only reason we stopped using it is because our company was bought by another and a higher-level decision was handed down. Derby is solid, reliable, and well worth your consideration.

This search shows 215 posts in HSQLDB Users mailing list containing the string "corrupt".
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.java.hsqldb.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.java.hsqldb.user---A
This search shows 264 posts in Derby Users mailing list containing the same string.
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.user---A
This one shows 1003 posts in Derby Dev mailing list with the same string.
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.devel&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.devel---A
A look at some of the posts shows that possible or real cases of database corruption happen despite all the best efforts of database developers.
HSQLDB has had its own share of database corruption issues but has improved over the years. In the latest versions precautions and fixes have been introduced to prevent all the issues that were reported in the last few years.
The new lob storage feature however, turned out to have a logic bug that results in the lobs being "forgotten" after an update. This is being fixed right now, with more extensive tests to support the fix.
Users like CarlG have helped a lot over the years in the bug fixing efforts of both Derby and HSQLDB.
Fred Toussi, HSQLDB Project

Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? (...)
Derby, formerly IBM Cloudscape (and now also distributed by Sun as JavaDB), is an ACID-compliant database that can handle a lot of concurrent users, running embedded or in server mode, and is known to be robust and production ready. It is not as fast as HSQLDB (Derby uses durable operations), but it's robust. Still, you should run your own tests against it.
See also
François Orsini's blog

I have been using Apache Derby since 2009 in many of my projects, some of them with 24/7 operation and many millions of rows.
Never ever had a single event of data corruption. Rock solid and fast.
I keep choosing it as my RDBMS, unless a good reason not to pops up.

Try looking into H2. It was created by the guy who originally made HSQLDB but built from scratch so doesn't use any HSQLDB code. Not sure how its stability compares to HSQL since I haven't used HSQL in ages and I'm only using H2 for short-lived databases currently. I personally found H2 to be easier to get going than Derby but maybe that's because H2 has a cheat sheet web page.
It might be possible to re-code to use an abstraction layer and then run tests to compare H2 and Derby with the issues you have found.
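A minimal sketch of that abstraction-layer approach, under stated assumptions: the product talks only to a small interface, and one shared set of checks is run against each candidate backend. The names ProjectStore, InMemoryProjectStore, and ProjectStoreContract are hypothetical, and the in-memory backend is only a stand-in so the example runs without any database driver; real implementations would wrap JDBC connections to H2, Derby, or HSQLDB.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the embedded database; names are illustrative.
interface ProjectStore {
    void put(String projectId, byte[] data);
    byte[] get(String projectId); // returns null when the project is unknown
}

// Stand-in backend so the sketch runs without a database driver.
class InMemoryProjectStore implements ProjectStore {
    private final Map<String, byte[]> rows = new HashMap<>();

    @Override
    public void put(String projectId, byte[] data) {
        rows.put(projectId, data.clone()); // copy so callers cannot mutate stored data
    }

    @Override
    public byte[] get(String projectId) {
        byte[] data = rows.get(projectId);
        return data == null ? null : data.clone();
    }
}

// One set of checks, reusable against every backend under comparison.
class ProjectStoreContract {
    static boolean roundTrips(ProjectStore store) {
        byte[] blob = {1, 2, 3};
        store.put("p1", blob);
        return Arrays.equals(blob, store.get("p1")) && store.get("missing") == null;
    }
}
```

The same contract could then be extended with the specific failure scenarios already observed in production and run unchanged against each database being evaluated.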
On the project management side of the fence, does your roadmap have a major version coming up? That might be a rather appropriate time to rip out the guts this way, and I wouldn't say you were crazy, because it could potentially remove lots of hard-to-manage workarounds. If you wanted to make the change where it could affect live systems without plenty of warning and backups in place, then you may be crazy.

With regard to HSQLDB, one thing SQLite has as a project that HSQLDB lacks is documentation of a robust test suite and online documentation of assiduous ACID compliance.
I don't mean to take anything away from HSQLDB. It's meant to serve as an alternative to MySQL, not to fopen() as SQLite is intended to. One can say that the scope of HSQLDB (of all the Java RDBMSs, really) is much more ambitious. Fredt and his group have accomplished something extraordinary with HSQLDB. Even so, doing the Google search "Is HSQLDB ACID compliant" doesn't leave an early adopter feeling as confident as one feels after reading about the testing harnesses on the SQLite website.
At http://sqlite.org/transactional.html
"SQLite is Transactional
A transactional database is one in which all changes and queries appear to be Atomic, Consistent, Isolated, and Durable (ACID). SQLite implements serializable transactions that are atomic, consistent, isolated, and durable, even if the transaction is interrupted by a program crash, an operating system crash, or a power failure to the computer.
We here restate and amplify the previous sentence for emphasis: All changes within a single transaction in SQLite either occur completely or not at all, even if the act of writing the change out to the disk is interrupted by
a program crash,
an operating system crash, or
a power failure.
The claim of the previous paragraph is extensively checked in the SQLite regression test suite using a special test harness that simulates the effects on a database file of operating system crashes and power failures."
At http://sqlite.org/testing.html
"1.0 Introduction
The reliability and robustness of SQLite is achieved in part by thorough and careful testing.
As of version 3.7.14, the SQLite library consists of approximately 81.3 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 1124 times as much test code and test scripts - 91421.1 KSLOC.
1.1 Executive Summary
Three independently developed test harnesses
100% branch test coverage in an as-deployed configuration
Millions and millions of test cases
Out-of-memory tests
I/O error tests
Crash and power loss tests
Fuzz tests
Boundary value tests
Disabled optimization tests
Regression tests
Malformed database tests
Extensive use of assert() and run-time checks
Valgrind analysis
Signed-integer overflow checks"

Give SQLite a try if you're looking for something self-contained (no server involved). This is what backs Android's DB API, and it is highly stable.
