I have working code that copies records from one database to another using JPA. It works fine but takes a while, so I wonder whether there is a faster way to do this.
I thought of using threads, but I run into race conditions, and synchronizing those pieces of code ends up taking as long as the one-by-one process.
Any ideas?
Update
Here's the scenario:
Application (Core) has a database.
Plugins have default data (same structure as Core, but with different data)
When the plugin is enabled, it checks the Core database and, if a record is not found, copies it from its default data into the Core database.
Most databases provide native tools to support this. Unless you need to write additional custom logic to transform the data in some way, I would recommend looking at the export/import tools provided by your database vendor.
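If custom logic rules out the vendor tools, one common way to reduce the per-record cost is to load the existing Core keys once, work out the missing defaults in memory, and write them in fixed-size batches. The following is a minimal sketch of that bookkeeping only. `MissingRecordPlanner`, the use of string keys, and the batch size are all hypothetical; the actual write (for example calling `EntityManager.persist` for each item and then `flush()` and `clear()` once per batch) is left to the caller.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which default records are missing from Core
// and groups them into batches so each batch can be written in one go.
final class MissingRecordPlanner {

    private MissingRecordPlanner() {}

    // Returns the default keys not yet present in Core, in input order.
    // Duplicate keys inside defaultKeys are returned only once.
    static List<String> missingKeys(Collection<String> coreKeys, List<String> defaultKeys) {
        Set<String> seen = new HashSet<>(coreKeys); // one in-memory lookup, no per-record query
        List<String> missing = new ArrayList<>();
        for (String key : defaultKeys) {
            if (seen.add(key)) { // add() returns true only for keys not seen so far
                missing.add(key);
            }
        }
        return missing;
    }

    // Splits the items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(i, end)));
        }
        return result;
    }
}
```

Because the missing set is computed up front, the batches are independent of each other, which also makes it easier to hand them to separate threads without the race on "does this record exist yet?".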
I am creating a webapp in Spring Boot (Spring + Hibernate + MySQL).
I have already created all the CRUD operations for the data of my app, and now I need to process the data and create reports.
Given the complexity of these reports, I will create some summary or pre-processed tables. This way, I can trigger report creation once and then fetch the results efficiently.
My question is whether I should build all the reports in Java or in stored procedures in MySQL.
Pros of doing it in Java:
More logging
More control of the structures (entities, maps, list, etc)
Catching exceptions
Portability if I change my DB engine (it is unlikely to happen, but you never know)
Cons of doing it in Java:
Maybe memory?
Any thoughts on this?
Thanks!
Java, though both are possible. It depends on what matters most, what skills are available for maintenance, and the cost of that maintenance. Stored procedures are usually very fast, but their availability and performance also depend on exactly which database you use. You will need specialized skills, and the result is tied to that specific database.
Hibernate ships with dialects for most major databases to get good performance out of the persistence layer. It is not as fast as a stored procedure, but it comes pretty close. With Spring Data on top of that, most of the difficulty goes away. Maintenance will not cost much, and people who know Spring Data are easier to find than specialists in any particular database vendor.
You can still write various “difficult” queries easily with HQL, so nothing blocks you there. Hibernate also offers more: you can have caching handled by Ehcache, and with Hibernate Envers you get auditing in no time. That is the nice thing about this framework: it is widely used, and many free Maven dependencies are there for the taking. And if you want to change your database in the future, with Spring Data you can do it by changing about three parameters in your application.properties file.
You can play with some annotations and see what performs better. For example, the @Inheritance annotation lets you map several classes to the same table or split them across multiple tables. There is also @MappedSuperclass, with which you can define one JpaObject holding the id that all your entities extend. If you want more JPA tricks, maybe check this post with my answer on how to use a superclass and a general repository.
Given the complexity of these reports, I will create some summary or
pre-processed tables. This way, I can trigger report creation
once and then fetch the results efficiently.
My first thought is: is this required? It seems to add complexity to the application that perhaps isn't needed. Premature optimisation and all that. Try writing the reports in SQL and looking at the execution plan. If it's good enough, you have less code to maintain and no added batch jobs to administer. Consider load testing with e.g. JMeter or Gatling to see how it holds up under stress.
Consider using Querydsl or jOOQ for reporting. Both provide a database abstraction layer and a fluent API for querying databases, which deliver the benefits listed in the "Pros of doing it in Java" section of the question and may be better suited to the problem. The blog post "jOOQ vs. Hibernate: When to Choose Which" is well worth a read.
I'm investigating the possibility of using Neo4j to handle some of the queries of our Java web application that simply take too long to run on MSSQL, as they require so many joins on large tables, even with indexes in place.
I am, however, concerned about the time it might take to complete the ETL, which ultimately affects how outdated the information may be when queried.
Can someone advise on either a production strategy or toolkit / library that can assist in reading a production sql-server database (using deltas if possible to optimise) and updating a running instance of a neo4j database? I imagine that there will have to be some kind of mapping configuration but the idea is to have this run in an automated manner, updating the neo4j database with one or more sql-server table or view contents.
The direct way to connect a MS SQL database to a Neo4j database would be using the apoc.load.jdbc procedure.
For an initial load you can use Neo4j ETL (https://neo4j.com/blog/rdbms-neo4j-etl-tool/).
There is, however, no way around the fact that some planning and work will be involved if you want to keep two databases continuously in sync (and if the logic involved goes beyond a few simple queries). You might want to offload a delta every so often (monthly, daily, hourly, ...) into CSV files and load those with LOAD CSV, using Cypher syntax to determine what needs to be added, removed, changed or connected.
Sadly enough there's no such thing as a free lunch.
Hope this helps,
Tom
I've got an Oracle database that has two schemas in it which are identical. One is essentially the "on" schema, and the other is the "off" schema. We update data in the off schema and then switch the schemas behind an alias which our production servers use. Not a great solution, but it's what I've been given to work with.
My problem is that there is a separate application that will now be streaming data to the database (also handed to me) which is currently only updating the alias, which means it is only updating the "on" schema at any given time. That means that when the schemas get switched, all the data from this separate application vanishes from production (the schema it is in is now the "off" schema).
This application is using Hibernate 3.3.2 to update the database. There's Spring 3.0.6 in the mix as well, but not for the database updates. Finally, we're running on Java 1.6.
Can anyone point me in a direction to updating both "on" and "off" schemas simultaneously that does not involve rewriting the whole DAO layer using Spring JDBC to load two separate connection pools? I have not been able to find anything about getting hibernate to do this. Thanks in advance!
You shouldn't be updating two separate databases this way, especially from the application's point of view. All it should know or care about is whether or not the data is there, without having to deal with two separate databases.
Frankly, this sounds like you may need to purchase an ETL tool. Even if you can't get it to update the 'on' schema from the 'off' one (fast enough to be practical), you will likely be able to use it to keep the two in sync (mirror changes from 'on' to 'off').
HA-JDBC is a replicating JDBC Driver we investigated for a short while. It will automatically replicate all inserts and updates, and distribute all selects. There are other database specific master-slave solutions as well.
On the other hand, I wouldn't recommend doing this for 4-8 hour procedures. It is better to lock the database first, update one database, then backup and restore a copy, and then unlock again.
My application is under constant development, so occasionally, when the version is upgraded, some tables need to be created, altered or deleted, some data modified, etc. Generally, some SQL code needs to be executed.
Is there a Java library that can be used to keep my database structure up to date (by analyzing something like "db structure version" information and executing custom SQL code to update from one version to another)?
It would also be great to have some basic actions (like add/remove column) ready to use with minimal configuration, i.e. name/type and no SQL code.
Try DBDeploy. Although I haven't used it in the past, it sounds like this project would help in your case. DBDeploy is a database refactoring manager that:
"Automates the process of establishing
which database refactorings need to be
run against a specific database in
order to migrate it to a particular
build."
It is known to integrate with both Ant and Maven.
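The idea described in that quote, working out which change scripts still need to run for a given database, can be sketched in a few lines. This is not DBDeploy's API; `MigrationPlanner` and its methods are hypothetical, and the sketch assumes the database stores a single integer schema version and that scripts are numbered in the order they must run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified model of a versioned-migration planner.
final class MigrationPlanner {

    // Scripts keyed by target version; TreeMap keeps them sorted ascending.
    private final TreeMap<Integer, String> scripts = new TreeMap<>();

    // Registers the SQL that upgrades the schema to the given version.
    MigrationPlanner register(int version, String sql) {
        if (scripts.putIfAbsent(version, sql) != null) {
            throw new IllegalArgumentException("duplicate version " + version);
        }
        return this;
    }

    // Returns the scripts with a version strictly greater than currentVersion,
    // in ascending version order, i.e. the ones still to be applied.
    List<String> pending(int currentVersion) {
        return new ArrayList<>(scripts.tailMap(currentVersion, false).values());
    }

    // Highest registered version, or 0 if nothing is registered.
    int latestVersion() {
        return scripts.isEmpty() ? 0 : scripts.lastKey();
    }
}
```

At startup the application would read the stored version, run each script from `pending(...)` in order, and then record `latestVersion()`; real tools add locking, checksums and rollback on top of this.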
Try Liquibase.
Liquibase is an open source (Apache 2.0 Licensed), database-independent library for tracking, managing and applying database changes. It is built on a simple premise: All database changes are stored in a human readable yet trackable form and checked into source control.
Supported features:
Extensibility
Merging changes from multiple developers
Code branches
Multiple Databases
Managing production data as well as various test datasets
Cluster-safe database upgrades
Automated updates or generation of SQL scripts that can be approved and applied by a DBA
Update rollbacks
Database "diff"s
Generating starting change logs from existing databases
Generating database change documentation
We use a piece of software called Liquibase for this. It's very flexible and you can set it up pretty much however you want it. We have it integrated with Maven so our database is always up to date.
You can also check Flyway (400 questions tagged on SO) or MyBatis (1049 questions tagged). For comparison, the other options mentioned are Liquibase (663 questions tagged) and DBDeploy (24 questions tagged).
Another resource you may find useful is the feature comparison on the Flyway website (other related projects are mentioned there).
You should take a look at O/R mapping libraries, e.g. Hibernate.
Most ORMs have logic to do schema upgrades for you. I have successfully used Hibernate, which gets at least the basic stuff right automatically.
I have a Java app using a MySQL database through Hibernate. The database is really used as a persistence layer: it is read at the initial load of the program, and the records are then maintained in memory.
However, we are adding extra complexity, where another process may change the database as well, and it would be nice for those changes to be reflected in the Java app. Yet I don't particularly like polling mechanisms that query the database every few seconds, especially since the database is rarely updated.
Is there a way to have a callback to listen to database changes? Would triggers help?
Or change both applications so the Java app is truly the owner of the MySQL database and exposes it as a service. You're coupling the two apps at the database level by doing what you're proposing.
If you have one owner of the data you can hide schema changes and such behind the service interface. You can also make it possible to have a publish/subscribe mechanism to alert interested parties about database changes. If those things are important to you, I'd reconsider letting another application access MySQL directly.
Is there a way to have a callback to listen to database changes? Would triggers help?
To my knowledge, such a thing doesn't exist and I don't think a trigger would help. You might want to check this similar question here on SO.
So, I'd expose a hook at the Java application level (it could be a simple servlet in the case of a webapp) to notify it after an update of the database and have it invalidate its cache.
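A minimal sketch of the cache side of such a hook is shown below. `ReloadingCache` and its loader function are hypothetical names; the notification endpoint (servlet or otherwise) would simply call `invalidate` or `invalidateAll` after the other process has written to the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-memory cache that reloads an entry from the database
// the first time it is requested after being invalidated.
final class ReloadingCache<K, V> {

    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // assumed to read the current value from the database

    ReloadingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, loading it on first access or after invalidation.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Called by the notification hook when a single record has changed.
    void invalidate(K key) {
        entries.remove(key);
    }

    // Called by the notification hook when it is unknown what has changed.
    void invalidateAll() {
        entries.clear();
    }
}
```

This keeps the Java app passive: it does no polling, and a stale entry lives only until the writer calls the hook.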
Another way would be to use a self-compiled MySQL server with the patches from the External Language Stored Procedures project.
For a more detailed introduction, check the blog post "Calling Java code in MySQL".
One option would be to tail the binary logs (or set up a replication slave) and look for changes relevant to your application. This is likely to be a quite involved solution.
Another would be to add a "last_updated" indexed column to the relevant tables (you can even have MySQL update this automatically) and poll for changes since the last time you checked. The queries should be very cheap.
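The polling variant boils down to remembering a watermark and asking only for rows newer than it. The sketch below shows that loop body under stated assumptions: `ChangePoller` is a hypothetical class, the `fetchChangedSince` function stands in for a query such as `SELECT ... WHERE last_updated > ?`, and rows are assumed to expose their `last_updated` value as an `Instant`.

```java
import java.time.Instant;
import java.util.List;
import java.util.function.Function;

// Hypothetical poller that tracks the newest last_updated value seen so far.
final class ChangePoller<R> {

    private final Function<Instant, List<R>> fetchChangedSince; // stands in for the SQL query
    private final Function<R, Instant> lastUpdatedOf;           // reads last_updated from a row
    private Instant watermark;

    ChangePoller(Instant start,
                 Function<Instant, List<R>> fetchChangedSince,
                 Function<R, Instant> lastUpdatedOf) {
        this.watermark = start;
        this.fetchChangedSince = fetchChangedSince;
        this.lastUpdatedOf = lastUpdatedOf;
    }

    // Fetches rows changed after the current watermark and advances the
    // watermark to the newest timestamp among the returned rows.
    List<R> poll() {
        List<R> changed = fetchChangedSince.apply(watermark);
        for (R row : changed) {
            Instant updated = lastUpdatedOf.apply(row);
            if (updated.isAfter(watermark)) {
                watermark = updated;
            }
        }
        return changed;
    }

    Instant watermark() {
        return watermark;
    }
}
```

One caveat with a strict "greater than" comparison is that a row committed later with the same timestamp as the watermark would be missed, so a real implementation may need to overlap the window slightly and de-duplicate.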
Instead of caching the database contents within the memory space of the Java app, you could use an external cache like memcached or Ehcache. When either process updates (or reads from) the database, have it update the cache as well.
This way whenever either process updates the DB, its updates will be in the cache that the other process reads from.