We are developing a SaaS-based application. One of the requirements is to record every change in the database tables, i.e. keep a date/time-based version of the data. The client should be able to revert to any version of the data.
I have almost 30 tables in the database, and roughly 80,000 records are added or updated per day through bulk import. Clients can also insert data through GUI forms, in addition to the bulk import.
Before creating any strategy to implement this requirement, I would love to have your comments/suggestions on how to approach it.
On a side note, I have reviewed this blog post and found it a very good starting point, but I am still unsure how to restore past data.
A database snapshot is a promising solution, but as I said earlier, this is a SaaS application and we store multiple clients' data in a single database, so restoring a snapshot would roll back other clients' data as well.
Please suggest a strategy/plan for how to execute this requirement.
If you plan on using JPA/Hibernate to fetch your data, you can give Envers a shot.
Envers is a JBoss open-source project for maintaining versions of database entities. You can mark individual columns or the entire entity with the @Audited annotation to start tracking audit history. By default it stores the audit data in tables with an _AUD suffix. It also provides an API to query historical data.
For details, please go through http://www.jboss.org/envers
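For illustration, here is a minimal sketch of an audited entity plus a historical lookup; the Account entity, its fields, and the helper method are invented for the example, and mapping/configuration details are omitted:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.Id;
    import java.util.List;
    import org.hibernate.envers.AuditReader;
    import org.hibernate.envers.AuditReaderFactory;
    import org.hibernate.envers.Audited;

    // Every change to this entity is copied into an ACCOUNT_AUD table
    // together with a revision number, so past states stay queryable.
    @Entity
    @Audited
    public class Account {
        @Id
        private Long id;
        private String name;
        // getters/setters omitted for brevity
    }

    class AccountHistory {
        // List all revisions of one row, then fetch its state at the oldest one.
        static Account oldestVersion(EntityManager em, Long accountId) {
            AuditReader reader = AuditReaderFactory.get(em);
            List<Number> revisions = reader.getRevisions(Account.class, accountId);
            return reader.find(Account.class, accountId, revisions.get(0));
        }
    }

Restoring a past version then amounts to reading the historical state with AuditReader and writing it back as the current row, which can be done per client in a shared multi-tenant database.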
I want to get some filtered data from one Oracle DB and refresh tables in another Oracle DB, and this refresh needs to be done frequently. What are the best possible ways to do it?
Please suggest the optimal approach: using DB links, Oracle scheduler jobs, or writing Java code.
There are numerous ways to do this, but the most straightforward is to use materialized views built on queries over database links, with refreshes scheduled through DBMS_SCHEDULER. There are a lot of docs online to help you. Here's one:
Working with Materialized Views
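As a rough end-to-end sketch (connection details, object names, and the hourly interval are all invented, and the same statements can be run from SQL*Plus instead of JDBC):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class MviewSetup {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//targethost:1521/TARGET", "app", "secret");
                 Statement st = conn.createStatement()) {
                // Link to the source database (TNS alias 'SRC' is an assumption).
                st.execute("CREATE DATABASE LINK src_link "
                        + "CONNECT TO app IDENTIFIED BY secret USING 'SRC'");
                // Materialized view holding the filtered data from the source DB.
                st.execute("CREATE MATERIALIZED VIEW orders_mv REFRESH COMPLETE AS "
                        + "SELECT * FROM orders@src_link WHERE status = 'OPEN'");
                // Refresh it every hour via DBMS_SCHEDULER.
                st.execute("BEGIN DBMS_SCHEDULER.CREATE_JOB("
                        + "job_name => 'REFRESH_ORDERS_MV', "
                        + "job_type => 'PLSQL_BLOCK', "
                        + "job_action => 'BEGIN DBMS_MVIEW.REFRESH(''ORDERS_MV''); END;', "
                        + "repeat_interval => 'FREQ=HOURLY', "
                        + "enabled => TRUE); END;");
            }
        }
    }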
I don't know Java, so I can't comment on it.
As far as the database is concerned, one option is to create a database link between the two databases and a materialized view in one of them which fetches data over the database link from the other database.
You can schedule refreshes; there are various options, so read the documentation to pick the right one for your situation. Have a quick look at Tim Hall's materialized views article; if you find it interesting, search the Oracle documentation (for the version you use) for more info.
Create a database link between the source and target databases and follow any of these options:
Create a materialized view using a query that points to the source database.
Write a procedure at the target site that uses SELECT queries to read data from the source site and inserts/updates the target tables accordingly. Then schedule those procedures using scheduler jobs.
Use Oracle GoldenGate, provided the tables you choose have a primary key or unique key.
Write your own Java or Python code that works in a pub/sub fashion to publish the data to the target site.
Is there any tool/way to export the data that was inserted into tables on a given day to a file, with the job running every day?
If you are targeting specific tables that grow rapidly every day, I would suggest implementing daily partitions on those tables using interval partitioning on the column that determines the record's date. That way each day's data can easily be archived (using an exchange partition) or backed up. Also ensure that the chosen partitioning column is used in queries across the application, so SQL statements benefit from partition pruning rather than scanning all partitions.
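A minimal sketch of such a table, with invented names and boundary date (the DDL can equally be run from SQL*Plus):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DailyPartitionDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "secret");
                 Statement st = conn.createStatement()) {
                // Oracle creates one partition per day of data automatically;
                // each day's partition can then be exported or exchanged out.
                st.execute("CREATE TABLE daily_events ("
                        + " id NUMBER,"
                        + " payload VARCHAR2(4000),"
                        + " created_on DATE"
                        + ") PARTITION BY RANGE (created_on)"
                        + " INTERVAL (NUMTODSINTERVAL(1, 'DAY'))"
                        + " (PARTITION p0 VALUES LESS THAN (DATE '2015-01-01'))");
            }
        }
    }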
If you are targeting all the application tables in your database, my suggestion does not apply. Thanks.
I am working on a Spring-MVC application in which the database is growing large. The space is mostly consumed by chat message history, plus other things like old notifications, which are not that useful.
Because of this, we thought of moving that data to text/XML files to give the DB some room to breathe and thereby improve query performance. Indexes are not that useful, as there are too many insertions.
I wanted to know whether PostgreSQL or Hibernate has any support for such a task, where data is moved out of the DB and saved in plain files that can still be accessed, resulting in at least decent performance gains.
I have only started looking up some stuff, so I don't have much in hand to show. Kindly let me know if there are any questions you guys have.
Thanks a lot.
I would use the PostgreSQL JSON storage and have two databases:
the current operations DB, the one you are moving data away from to slim it down
the archive database where old data is aggregated to save storage
This way you can move data from the current database into the archive database without compromising ACID properties, and you can aggregate the old data to simplify retrieval, grouping related entities under some common root entity which you'll then use to access the old data.
The current operations database thus remains small enough, while the archive database can be shared. It's also easier to configure the current operations database for high performance and the archive one for scalability.
Anyway, Hibernate doesn't support this out of the box, but you can implement it using custom Hibernate types and JTA transactions.
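As a very rough illustration of the move itself (table and column names are invented, and a plain two-connection JDBC sketch stands in for a proper JTA setup):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ChatArchiver {
        public static void main(String[] args) throws Exception {
            try (Connection current = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost/current", "app", "secret");
                 Connection archive = DriverManager.getConnection(
                    "jdbc:postgresql://archivehost/archive", "app", "secret")) {
                // Aggregate each old conversation into one JSON document,
                // keyed by the common root entity (the conversation id).
                try (PreparedStatement read = current.prepareStatement(
                        "SELECT conversation_id, json_agg(m)::text FROM chat_message m "
                      + "WHERE m.sent_at < now() - interval '1 year' "
                      + "GROUP BY conversation_id");
                     PreparedStatement write = archive.prepareStatement(
                        "INSERT INTO chat_archive (conversation_id, messages) "
                      + "VALUES (?, ?::jsonb)");
                     ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong(1));
                        write.setString(2, rs.getString(2));
                        write.executeUpdate();
                    }
                }
                // Deleting the archived rows from the current DB is omitted;
                // in practice both steps belong in one (JTA) transaction.
            }
        }
    }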
I'm currently working on a simple Java application that calculates and graphs the different types of profit for a company. A company can have many branches, and each branch can have many years, and each year can have up to 12 months.
The hierarchy looks as follows:
-company
    +branch
    -branch
        +year
        -year
            +month
            -month
My intention was to keep the data storage as simple as possible for the user. The structure I had in mind was XML that stores everything to do with a single company: either a single XML file, or multiple XML files linked together by unique IDs.
Both of these options would also allow the user to easily transport the data, as opposed to using a database.
The problem with a database, which is stopping me right now, is that the user would have to set it up themselves, which would be very difficult if they aren't the technical type.
What do you think I should go for: an XML file, a database, or something else?
It will be more complicated to use XML; XML is more of an interchange format than a substitute for a DB.
You can use an embedded database such as H2 or Apache Derby/JavaDB, in which case the user won't have to set up a database. The data will only be stored locally, though, so consider whether that is OK for your application.
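For example (file name and schema are invented), opening an embedded H2 database is one line and creates the data file on first use:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class EmbeddedDbDemo {
        public static void main(String[] args) throws Exception {
            // "jdbc:h2:./company" stores the data in company.mv.db next to the
            // application; the user never installs or starts a database server.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:h2:./company", "sa", "");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS branch ("
                        + "id BIGINT AUTO_INCREMENT PRIMARY KEY,"
                        + "name VARCHAR(100))");
                st.execute("INSERT INTO branch (name) VALUES ('Head Office')");
            }
        }
    }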
I would definitely go for the DB:
you have relational data, which is exactly what DBs are good at
you can query relational data much more easily than XML
the CRUD operations (create, read, update, delete) are much easier against a DB than against XML
You can avoid the need for the user to install a DB engine by embedding SQLite with your app for example.
If it's a single-user application and the amount of data is unlikely to exceed a couple of megabytes, then using an XML file for the persistent storage might well make sense in that it reduces the complexity of the package and its installation process. But you're limiting the scalability: is that wise?
We are designing a fairly large brownfield application and have run into a bit of an issue.
We have a fairly large amount of information in a DB2 database from a legacy application that is still loading data. We also have information in an Oracle database that we control.
We have to do a 'JOIN' type of operation across the tables. Right now I am thinking of pulling the information out of the DB2 table into a List<> and then feeding those values into a SQL statement against the Oracle database, such as:
select * from accounts where accountnum in (...)
Is there any easier way to interact between the databases, or at least, what is the best practice for this sort of action?
I've done this two ways.
With two Sybase databases on different boxes, I set up stored procedures and called them like functions to send data back and forth. This additionally allowed the sprocs to audit/log, to convince the customer that no data was being lost in the process.
For a one-way Oracle-to-Sybase feed, I used a view to marshal the data, with each vendor's C libraries called from a C++ program that gave the two C APIs a common interface.
On a MySQL and DB2 setup where, as in your situation, the DB2 was "legacy but live", I employed a setup similar to what you're describing: pulling the data out into a (Java) client program.
If the join is always one-to-one, and each box's result set has the same key, you can pull them both with the same ordering and trivially connect them in the client. Even if the join is one-to-many, stitching them together is just a one-way iteration over both of your lists; see the sketch below.
If it gets to be many-to-many, then I might fall back to processing one item at a time (though you could use a HashSet lookup).
Basically, though, your choices are sprocs (for which you'd still need a client layer) or just doing it all in the client.
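To make the one-way iteration concrete, here is a small client-side merge sketch; the row types are invented, and both lists are assumed to be ordered by the join key:

    import java.util.ArrayList;
    import java.util.List;

    public class ClientMergeJoin {
        record Db2Row(long key, String legacyData) {}
        record OracleRow(long key, String accountData) {}
        record Joined(long key, String legacyData, String accountData) {}

        // One pass over two key-ordered result lists, like a merge join.
        static List<Joined> join(List<Db2Row> left, List<OracleRow> right) {
            List<Joined> out = new ArrayList<>();
            int i = 0, j = 0;
            while (i < left.size() && j < right.size()) {
                long lk = left.get(i).key(), rk = right.get(j).key();
                if (lk == rk) {
                    out.add(new Joined(lk, left.get(i).legacyData(),
                            right.get(j).accountData()));
                    i++; j++;   // one-to-one: advance both sides
                } else if (lk < rk) {
                    i++;        // legacy row with no Oracle match
                } else {
                    j++;        // Oracle row with no legacy match
                }
            }
            return out;
        }
    }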
You can export data from DB2 in flat-file format and use that flat file as an external table, or load it with SQL*Loader; this is a batch process.
There is also something called heterogeneous connectivity, where you create a database link from Oracle to DB2. This makes it possible to query your DB2 database in real time, and you can join an Oracle table with a DB2 table.
You can also use this database link in combination with materialized views.
There are different kinds of heterogeneous connectivity so read the documentation carefully.
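Once such a link exists, the join is plain SQL issued from the Oracle side; for example (link, table, and connection names are invented):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HeterogeneousJoin {
        public static void main(String[] args) throws Exception {
            // db2_link is a database link reaching DB2 through a gateway.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//orahost:1521/ORCL", "app", "secret");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                    "SELECT a.accountnum, l.status "
                  + "FROM accounts a JOIN legacy_accounts@db2_link l "
                  + "ON a.accountnum = l.accountnum")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " " + rs.getString(2));
                }
            }
        }
    }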
Does it have to be real-time data? If so, there are products available for heterogeneous connectivity, especially DB2 Relational Connect, which is part of Federation Server. If some lag is acceptable, you can set up scripts to replicate the data to Oracle, after which you can do a native join.
You will get poor performance when pulling the data into a client application. If that is the only option, try creating a DB2 stored procedure to return the data, which will make the performance slightly better.
If it is possible to copy the data from the legacy database to the database you control, you can consider a data-extraction job that copies the new records from the legacy DB to the Oracle DB once per day (or as often as possible). It might not be so simple if you can't identify which records have been produced in the legacy database since the last load.
Then, you can do the joins in your Oracle instance.
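A simple watermark-based sketch of such a job (table and column names are invented, and it assumes the legacy rows carry a usable timestamp):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Timestamp;

    public class NightlyCopyJob {
        public static void main(String[] args) throws Exception {
            try (Connection db2 = DriverManager.getConnection(
                    "jdbc:db2://legacyhost:50000/LEGACY", "app", "secret");
                 Connection oracle = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//orahost:1521/ORCL", "app", "secret")) {
                // Watermark: newest legacy timestamp that was already copied.
                Timestamp lastLoaded = readWatermark(oracle);
                try (PreparedStatement read = db2.prepareStatement(
                        "SELECT accountnum, status, updated_at "
                      + "FROM legacy_accounts WHERE updated_at > ?");
                     PreparedStatement write = oracle.prepareStatement(
                        "INSERT INTO accounts_stage (accountnum, status, updated_at) "
                      + "VALUES (?, ?, ?)")) {
                    read.setTimestamp(1, lastLoaded);
                    try (ResultSet rs = read.executeQuery()) {
                        while (rs.next()) {
                            write.setLong(1, rs.getLong(1));
                            write.setString(2, rs.getString(2));
                            write.setTimestamp(3, rs.getTimestamp(3));
                            write.executeUpdate();
                        }
                    }
                }
            }
        }

        // Placeholder: in practice, persist the watermark in a control table.
        static Timestamp readWatermark(Connection oracle) {
            return Timestamp.valueOf("1970-01-01 00:00:00");
        }
    }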
If you ask the vendors, probably the best practice would be to buy another product.
From the IBM side, there is IBM Federation Server, which can "Combine data from disparate sources such as DB2, Oracle, and SQL Server into a single virtual view." I imagine there is also one from Oracle but I'm less familiar with their products.
Oracle Transparent Gateway for DRDA http://www.oracle.com/technetwork/database/gateways/index.html
IBM Infosphere Federation Server
http://www-03.ibm.com/software/products/en/ibminfofedeserv/
Note if you have DB2 Advanced Enterprise Server Edition (AESE), Infosphere Federation Server is included.
Both products would allow you to use a single join query sent to one DB that returns data from both DBs. The Oracle product is really nice in that it allows Oracle to see the DB2 database as another Oracle DB and for DB2 to see the Oracle database as another DB2 database. (Thanks to IBM publishing the specs for both the client and server side of the DRDA protocol DB2 uses. Too bad no other vendor is willing to do so, though they have no trouble taking advantage of the fact IBM did so.)
Neither product is what I would call cheap.
For cheap, you could take advantage of Oracle Database Gateway for ODBC
http://docs.oracle.com/cd/E16655_01/gateways.121/e17936/toc.htm