Java equivalent for database schema changes like South for Django?

I've been working on a Django project using South to track and manage database schema changes. I'm starting a new Java project using Google Web Toolkit and wonder if there is an equivalent tool. For those who don't know, here's what South does:
Automatically recognize changes to my Python database models (add/delete columns, tables etc.)
Automatically create SQL statements to apply those changes to my database
Track the applied schema migrations and apply them in order
Allow data migrations using Python code. For example, splitting a name field into a first-name and last-name field using the Python split() function
I haven't decided on my Java ORM yet, but Hibernate looks like the most popular. For me, the ability to easily make database schema changes will be an important factor.
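For comparison, the data-migration item in the list above (splitting a name field into first and last name) could look roughly like this in plain Java. This is only a sketch: the class name NameSplitter and the rule "split at the first run of whitespace" are assumptions for illustration, and a real migration would apply it to every row being migrated.

```java
// Hypothetical helper, not part of South or of any Java migration tool.
final class NameSplitter {
    private NameSplitter() {}

    // Returns {firstName, lastName}; lastName is empty when the input is a single word.
    static String[] split(String fullName) {
        // Limit 2 keeps everything after the first whitespace run together as the last name.
        String[] parts = fullName.trim().split("\\s+", 2);
        String first = parts[0];
        String last = parts.length > 1 ? parts[1] : "";
        return new String[] { first, last };
    }
}
```

With this rule, "Mary Ann Jones" becomes first name "Mary" and last name "Ann Jones"; whether that is the right split is a data decision, not a tooling one.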

Wow, South sounds pretty awesome! I don't know of anything out of the box that will help you nearly as much as that does. However, if you choose Hibernate as your ORM solution, you can build your own incremental data migration suite without a lot of trouble.
Here is the approach that I used in my own project. It worked fairly well for me for over a couple of years and several updates/schema changes:
Maintain a schema_version table in the database that simply defines a number that represents the version of your database schema. This table can be handled outside of the scope of Hibernate if you wish.
Maintain the "current" version number for your schema inside your code.
When the version number in code is newer than what's in the database, you can use Hibernate's SchemaUpdate utility, which will detect any schema additions (NOTE: just additions) such as new tables, columns, and constraints.
Finally, I maintained a "script" of migration steps that were more than just schema changes, each identified by the schema version number it was required for. For instance, new columns needed default values applied, or something of that nature.
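The version-number bookkeeping in the steps above can be sketched in plain Java as follows. This is a minimal illustration under assumptions, not the code from that project: SchemaMigrator, register and migrate are made-up names, each step is just a Runnable, and reading or writing the schema_version table is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from Hibernate.
final class SchemaMigrator {
    // Migration steps keyed by the schema version they bring the database to, kept sorted.
    private final TreeMap<Integer, Runnable> steps = new TreeMap<>();

    void register(int targetVersion, Runnable step) {
        steps.put(targetVersion, step);
    }

    // Runs every step whose version is newer than dbVersion, in ascending order,
    // and returns the version the database ends up at.
    int migrate(int dbVersion) {
        int current = dbVersion;
        for (Map.Entry<Integer, Runnable> entry : steps.tailMap(dbVersion, false).entrySet()) {
            entry.getValue().run();
            current = entry.getKey(); // a real setup would persist this to schema_version
        }
        return current;
    }
}
```

Adding a new migration task then means registering one more step under the next version number, which matches the "only add new tasks" experience described here.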
This may sound like a lot of work, especially when coming from an environment that took care of a lot of it for you, but you can get a setup like this rolling pretty quickly with Hibernate, and it is pretty easy to add onto as you go on. I never ended up making any changes to my incremental update framework over that time except to add new migration tasks.
Hopefully someone will come along with a good answer for a more "hands-off" approach, but I thought I'd share an approach that worked pretty well for me.
Good luck to you!

I'm looking for the same thing, so here's what I've found so far.
We first used dbdeploy. It manages most things for you, but you have to write all the transition scripts yourself! That means every change you make has to be in its own script, which you have to write from scratch. Not very handy, but it works very reliably.
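The core idea of one script per change can be illustrated with a small Java sketch that picks the scripts still to be run. It assumes each script name starts with a number (for example "2_add_email.sql") and that the numbers already applied are known; the class and method names are hypothetical and this is not dbdeploy's own code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical helper for choosing which numbered change scripts still need to run.
final class PendingScripts {
    private PendingScripts() {}

    // Parses the leading digits of a name such as "10_add_index.sql" into 10.
    // Throws NumberFormatException if the name does not start with a digit.
    static int number(String fileName) {
        int end = 0;
        while (end < fileName.length() && Character.isDigit(fileName.charAt(end))) {
            end++;
        }
        return Integer.parseInt(fileName.substring(0, end));
    }

    // Returns the scripts whose number is not in 'applied', sorted numerically.
    static List<String> pending(List<String> files, Set<Integer> applied) {
        List<String> result = new ArrayList<>();
        for (String file : files) {
            if (!applied.contains(number(file))) {
                result.add(file);
            }
        }
        result.sort(Comparator.comparingInt(PendingScripts::number));
        return result;
    }
}
```

Sorting by the parsed number rather than by name avoids "10_..." being ordered before "2_..." when the prefixes are not zero-padded.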
The second thing I encountered is Liquibase. It stores the configuration in one single XML file. Not very intuitive to read, but manageable. Plus there is an IntelliJ IDEA plugin for it. At the time of writing it still has some minor issues, but the author assured me they will be fixed soon.
The perfect solution would be to get South working with your Java environment. That really would be a tool to marry :D

Maybe try Flyway. It seems like a good alternative.

I've been thinking about using django-jython just for db migrations in our legacy Java application. The latest Jython version is 2.5.4rc1, but I think I can mitigate the risk by just using it for South migrations.
Especially since I can use inspectdb to generate the models for me. And then replace parts of the Java with Python "seamlessly".

If you're using Hibernate, then check out Liquibase:
http://www.liquibase.org/databases.html
It's been around for 10 years so it's pretty solid. It may support other ORMs; just have a dig around on their website. Check out the Liquibase + Hibernate extension here:
https://github.com/liquibase/liquibase-hibernate

Related

Nutch: What version of Nutch + Cassandra actually works?

I'm trying to do some crawling with Nutch and I'd like to test out Cassandra as a backend. However, using the latest version of Nutch and its dependencies, Cassandra throws a variety of errors as you move through the inject, generate, fetch, etc. process.
The errors are all related to actual problems in code, not out of memory or configuration. I've fixed some of them by modifying code within gora-cassandra, but it's still not functional.
My question is: does a working version of these two projects exist? By working I mean you can run through inject, generate, fetch, parse, updatedb on at least a small set of URLs, without error.
Here's an example of one of the classes giving an error during fetch:
java.lang.NullPointerException
at org.apache.gora.cassandra.query.CassandraSuperColumn.getUnionIndex
I have used HBase as the backend and that just works, although HBase itself is a monster to manage, which is why I'd like to test out Cassandra. However, I'm about to give up on this as I don't think I should have to modify gora-cassandra code just to get a basic example to run.
Thanks
According to this link, which is about 3 months old, it's just broken: http://lucene.472066.n3.nabble.com/Re-user-Digest-3-Jun-2017-19-27-20-0000-Issue-2758-td4339060.html
Its unclear why backends that do not work are even documented.
HBase is most widely used, followed by MongoDB... on the other end of
the spectrum, Cassandra is least used and broken. It has not been
maintained for quite some time... and yes this is reflected by use of
Super Columns. We are currently re-writing the backend as part of a
GSoC project.
I would agree with the guy making the original statement: it's unclear why backends that do not work are even documented.
Really tired of this project and its lack of usable documentation.

On mixing Java and SQL queries, how to do best? [duplicate]

This question already has answers here:
Java Programming - Where should SQL statements be stored? [closed]
(15 answers)
Closed 9 years ago.
As part of my Java program, I need to run a lot of queries against an (Oracle) database.
Currently, we create a mix of SQL and Java, which (I know) is a bad, bad thing.
What is a right way to handle something like this? If possible, include examples.
Thank you.
EDIT:
A bit more information about the application. It is a web application that derives content mainly from the database (it takes user input and paints content to be seen next based on what database believes to be true).
The biggest concern I have with how it's done today is that mixing Java code and SQL queries looks "out-of-place" when coupled as tightly as it is (queries hardcoded as part of source code).
I am looking for a cleaner way to handle this situation, which would improve maintainability and clarity of the project at hand
For what you've described, incorporating an object relational mapper (ORM) or rewriting as stored procedures is probably more work than you want to embrace. Both have non-trivial learning curves.
Instead a good practice is consolidating SQL in a class per table or purpose. Take a look at the table data gateway object and the data access object design patterns to see how this is done in practice.
This approach has several benefits. You are better positioned for reuse because queries are in one spot. Client code becomes more readable as you replace several lines of JDBC and SQL with a method call (e.g. userTableDataGateway.getContentToShow(pageId)). Finally, this will help you see more clearly the problem an ORM helps solve.
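A minimal sketch of such a gateway, using only the standard java.sql API, might look like the following. The table and column names (page_content, body, page_id, position) are invented for illustration, and how the Connection is obtained is left to the application.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical gateway: every query for this table or purpose lives in this one class.
final class UserTableDataGateway {
    // Assumed schema, purely illustrative.
    static final String SELECT_CONTENT =
        "SELECT body FROM page_content WHERE page_id = ? ORDER BY position";

    private final Connection connection;

    UserTableDataGateway(Connection connection) {
        this.connection = connection;
    }

    // Callers get plain Java values back and never see SQL or JDBC types.
    List<String> getContentToShow(int pageId) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(SELECT_CONTENT)) {
            statement.setInt(1, pageId); // bound parameter, no string concatenation
            try (ResultSet results = statement.executeQuery()) {
                while (results.next()) {
                    rows.add(results.getString("body"));
                }
            }
        }
        return rows;
    }
}
```

If the query ever changes, only this class needs to be touched; the calling code keeps using getContentToShow(pageId).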
Well, one thing you could consider is an Object Relational Mapper (for example, Hibernate). This would allow you to map your database schema to Java objects, which would generally clean up your Java code.
However, if performance and speed is of the essence, you might be better off using a plain JDBC driver.
This would of course also depend on the task your application is trying to accomplish. If, for example, you need to do batch updates based on a CSV file, I might go with a pure JDBC solution. If you're designing a web application, I would definitely go with an ORM solution.
Also, note that a pure JDBC solution would involve having SQL in your Java code. Actually, for that matter, you would have to have some form of SQL, be it HQL, JPQL, or plain SQL, in any ORM solution as well. Point being, there's nothing wrong with some SQL in your Java application.
Edit in response to the OP's edits
If I were writing a web application from scratch, I would use an ORM. However, since you already have a working application, making the transition from a pure JDBC solution to an ORM would be pretty painful. It would clean up your code, but there is a significant learning curve involved and it takes quite a bit of set-up. Some of the pain from setting-up would be alleviated if you are working with some sort of bean-management system, like Spring, but it would still be pretty significant.
It would also depend on where you want to go with your application. If you plan on maintaining and adding to this code for a significant period, a refactor may be in order. I would not, however, recommend a re-write of your system just because you don't like having SQL hard-coded in your application.
Based on your updates, I concur with Tim Pote's edits re: the learning curve to integrate ORM. However, instead of integrating ORM, you could do things like using prepared statements, which you in turn store in a properties file. Or even store your queries in the DB so that you can make subtle updates to them that can then be read in immediately without restarting your app server. Both of these strategies would declutter your Java code of hard-coded SQL.
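As a rough sketch of the properties-file idea, the class below loads named queries with java.util.Properties and hands them out by key. The class name QueryCatalog, the key user.byId and the query text are all hypothetical; the demo reads from an inline string, whereas a real application would load a file or classpath resource and pass the result to a PreparedStatement.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical catalog of named SQL statements kept outside the Java source.
final class QueryCatalog {
    private final Properties queries = new Properties();

    QueryCatalog(Reader source) {
        try {
            queries.load(source); // standard key=value properties format
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the SQL text for a query name and fails loudly on a typo.
    String sql(String name) {
        String sql = queries.getProperty(name);
        if (sql == null) {
            throw new IllegalArgumentException("No query named " + name);
        }
        return sql;
    }

    // Demo only: inline text stands in for a properties file.
    static QueryCatalog demo() {
        return new QueryCatalog(new StringReader(
            "user.byId=SELECT name FROM users WHERE id = ?\n"));
    }
}
```

The looked-up text still contains ? placeholders, so it combines naturally with prepared statements as suggested above.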
Ultimately though, I don't think there's a clear answer to your question, because there's nothing inherently wrong with what you're doing. It's just a bit inflexible, but perhaps acceptably so for your circumstances.
That said, I'm posting this as an answer!
I'm not sure of the state of the project, but you may also want to look at an alternative object relational mapper called MyBatis. It has a lower learning curve than the popular Hibernate or EclipseLink and lets you actually write the queries, so you know what the code is doing. That is, if ORM is your thing.
I'm working with JPA right now (mainly because it is the current trend and it needs to be learned). JPA is the Java standard for ORM. If you are going to learn what is currently a typical ORM way of doing things, JPA is probably the best way to go. Frameworks like Hibernate and EclipseLink drive it. Depending on what framework you choose to underpin your JPA app, you can use proprietary features, but that will tie you to that framework pretty much for good. JPA is not hard to start using, but it can be very cryptic when it doesn't work, since it obfuscates the interaction with the database quite a bit (mind you, it does allow the option of using native SQL queries, but that kind of negates the reason why people say JPA-style DB access is good).
And yes, there are still people using JDBC with prepared statements. And normally there are practices/patterns that you will use when programming with plain old JDBC that act like a very, very minimalist ORM... or really, closer to MyBatis. Again, if you go this route, use prepared statements. They negate a number of dangers.
This is a religious kind of question, so you will hear a lot of proselytizing the way you wrote the question. In fact someone might shoot down your question for this. I think the only thing you could ask that might be worse is whether emacs or vi is better to a crowd of unix geeks.
Your question seems too generic. However, if you have a mix of direct SQL on Oracle and SQL in Java, it would be better to invest some time in an ORM like Hibernate or Apache Cayenne. An ORM is a separate design approach that segregates database operations from the Java side: all the DB interactions and DB design are implemented in the ORM, and all the access and business logic resides in Java. This is only a suggestion; I'm still unclear about your actual problem.
The biggest concern I have with how it's done today is that mixing
Java code and a SQL queries look "out-of-place" when coupled as
tightly as it is (Queries hardcoded as part of source code)
This assumption of yours is not something that can be settled with a true/false answer. This question here explains that there are several ways of dealing with mixing Java and SQL:
Java Programming - Where should SQL statements be stored?
It essentially distinguishes between SQL being:
Hardcoded in business objects
Embedded in SQLJ clauses
Encapsulated in separate classes e.g. Data Access Objects
Metadata driven (decouple the object schema from the data schema - describe the mappings between them in metadata)
Put into external files (e.g. Properties or Resource files)
Put into stored procedures
I'll add to that:
Embedded in CriteriaQuery statements
Embedded in jOOQ statements.
Apache Cayenne is one of the easiest ORMs to use. It comes with the Cayenne Modeler, which models data objects and does the mappings. I would recommend Cayenne for a beginner in ORM. It can create mapping classes and sync the DB through the modeler.

java library to maintain database structure

My application is always developing, so occasionally - when the version upgrades - some tables need to be created/altered/deleted, some data modified, etc. Generally some sql code needs to be executed.
Is there a Java library that can be used to keep my database structure up to date (by analyzing something like "db structure version" information and executing custom SQL code to update from one version to another)?
Also it would be great to have some basic actions (like add/remove column) ready to use with minimal configuration, ie name/type and no sql code.
Try DBDeploy. Although I haven't used it in the past, it sounds like this project would help in your case. DBDeploy is a database refactoring manager that:
"Automates the process of establishing
which database refactorings need to be
run against a specific database in
order to migrate it to a particular
build."
It is known to integrate with both Ant and Maven.
Try Liquibase.
Liquibase is an open source (Apache
2.0 Licensed), database-independent library for tracking, managing and
applying database changes. It is built
on a simple premise: All database
changes are stored in a human readable
yet trackable form and checked into
source control.
Supported features:
Extensibility
Merging changes from multiple developers
Code branches
Multiple Databases
Managing production data as well as various test datasets
Cluster-safe database upgrades
Automated updates or generation of SQL scripts that can be approved and
applied by a DBA
Update rollbacks
Database "diff"s
Generating starting change logs from existing databases
Generating database change documentation
We use a piece of software called Liquibase for this. It's very flexible and you can set it up pretty much however you want it. We have it integrated with Maven so our database is always up to date.
You can also check Flyway (400 questions tagged on SO) or MyBatis (1049 questions tagged). To add the other options mentioned to the comparison: Liquibase (663 questions tagged) and DBDeploy (24 questions tagged).
Another resource that you may find useful is the feature comparison on the Flyway website (there are other related projects mentioned there).
You should take a look at OR mapping libraries, e.g. Hibernate.
Most ORMs have logic to do schema upgrades for you. I have successfully used Hibernate, which gets at least the basic stuff right automatically.

Suggestion for annotation only ORM framework (Java)

I'm working on a medium-sized project in Java (GWT to be precise), and I'm still in the process of deciding what ORM to use.
I just refuse to write SQL queries unless utterly and completely necessary (not the case :D)
I want to use ONLY annotations, no XML configuring [except database location, username, etc], and I DON'T want to create any tables or define them. I want this to be done by the framework completely.
Call me lazy, but I like Java/GWT programming, not creating tables and coping with that sort of things, and it's a plus in my assignment if I actually use an ORM :D
I've considered so far:
Hibernate with annotations: I've found little documentation to get started from the ground up using this, and few examples and the like. It's as if they didn't actually want you to use 100% annotations.
DataNucleus
JDO: It seems interesting. I'd never heard of DataNucleus until this week, but it seems extremely mature, and I actually discovered it because Google uses it in GWT, so that's a good sign. I also like the fact that they mention I don't need to define any tables or columns, though I think Hibernate can achieve this as well. I actually enjoyed reading through their documentation (though I haven't finished yet), quite the opposite of Hibernate.
JPA I'm not totally sure if DataNucleus/JPA can work with annotation-only configuration, though I might need to take a deeper look into the documentation.
As you might guess, I'm quite inclined to JDO... but it'd be nice to hear what people who've used it have to say vs the other alternatives, and whether I'm missing some very important point here.
Edit 1: I know I'll need to XML the database location/usr/pwd, I meant I don't want to use an XML to configure the mapping or database schema.
JPA (1 and 2) is pretty much XML free, depending on how it's packaged. You most certainly don't need it for the schema. It also supports annotations for details when the tables are generated.
The only issue with these is that while they can create a database, they're a DB MAPPING tool, not a DB DEFINITION tool. Specifically, most won't allow you to create the arbitrary indexes that you may well need to get the DB tuned properly to your queries.
But other than that, JPA should fill your needs, and it has a lot of implementations (Hibernate is just one implementation).
This is self-publicizing, but I've been working for a while on a simple Java ORM package called ORMLite. I wanted something much less complicated than Hibernate but without writing SQL directly. It's completely annotation based, can create (and destroy) tables, and currently supports MySQL, Postgres, Derby, and H2. Adding other databases would be simple if I had access to a server.
http://ormlite.com/
It has a pretty flexible QueryBuilder and table paging. Joining is, however, not supported.

What's the best way to persist data in a Java Desktop Application?

I have a large tree of Java Objects in my Desktop Application and am trying to decide on the best way of persisting them as a file to the file system.
Some thoughts I've had were:
Roll my own serializer using DataOutputStream: This would give me the greatest control of what was in the file, but at the cost of micromanaging it.
Straight old Serialization using ObjectOutputStream and its various related classes: I'm not sold on it though since I find the data brittle. Changing any object's structure breaks the serialized instances of it. So I'm locked in to what seems to be a horrible versioning nightmare.
XML Serialization: It's not as brittle, but it's significantly slower than straight-out serialization. It can be transformed outside of my program.
JavaDB: I'd considered this since I'm comfortable writing JDBC applications. The difference here is that the database instance would only persist while the file was being opened or saved. It's not pretty but... it does lend itself to migrating to a central server architecture if the need arises later, and it introduces the possibility of querying the data model in a simpler way.
I'm curious to see what other people think. And I'm hoping that I've missed some obvious, and simpler approach than the ones above.
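For reference, the plain ObjectOutputStream option from the list above looks roughly like the sketch below. The class names are hypothetical and the round trip goes through a byte array instead of a file. Declaring serialVersionUID explicitly is the standard way to soften the versioning problem: with a fixed ID, compatible changes such as adding a field do not invalidate previously saved data, although incompatible changes still do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical node type standing in for one object of the application's tree.
final class DocumentNode implements Serializable {
    // Fixed ID: without it the JVM derives one from the class shape,
    // so almost any edit to the class would reject old data.
    private static final long serialVersionUID = 1L;

    final String label;

    DocumentNode(String label) {
        this.label = label;
    }
}

final class SerializationDemo {
    private SerializationDemo() {}

    // Serializes the node (and anything it references) to bytes.
    static byte[] save(DocumentNode node) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(node);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the node from bytes produced by save().
    static DocumentNode load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DocumentNode) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Swapping the byte array streams for FileOutputStream and FileInputStream gives the file-based version.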
Here are some more options culled from the answers below:
An Object Database - Has significantly less infrastructure than ORM approaches and performs faster than an XML approach. thanks aku
I would go for your final option, JavaDB (Sun's distribution of Derby), and use an object relational layer like Hibernate or iBatis. Using the first three approaches means you are going to spend more time building a database engine than developing application features.
Have a look at Hibernate as a simpler way to interface to a database.
In my experience, you're probably better off using an embedded database. SQL, while less than perfect, is usually much easier than designing a file format that performs well and is reliable.
I haven't used JavaDB, but I've had good luck with H2 and SQLite. SQLite is a C library, which means a little more work in terms of deployment. However, it has the benefit of storing the entire database in a single, cross-platform file. Basically, it is a pre-packaged, generic file format. SQLite has been so useful that I've even started using it instead of text files in scripts.
Be careful using Hibernate if you're working with a small persistence problem. It adds a lot of complexity and library overhead. Hibernate is really nice if you're working with a large number of tables, but it will probably be cumbersome if you only need a few tables.
db4objects might be the best choice
XStream from codehaus.org
XML serialization/deserialization largely without coding.
You can use annotations to tweak it.
Working well in two projects where I work.
See my users group presentation at http://cjugaustralia.org/?p=61
I think it depends on what you need. Let's see the options:
1) Discarded immediately! I won't even justify it. :)
2) If you need a simple, quick, one-method persistence, stick with it. It will persist the complete data graph as it is! Beware of how long you'll be maintaining the persisted objects. As you pointed out yourself, versioning can be a problem.
3) Slower than (2), needs extra code and can be edited by the user. I would only use it if the data is supposed to be used by a client in another language.
4) If you need to query your data in any way, stick with the DB solution.
Well, I think you had already answered your question :)
