Populate an H2 in-memory database from a DB2 database in Spring Boot (Java)

I'm currently building a Spring Boot service with an H2 in-memory database.
This database acts as a cache for part of the data on a central DB2 database with a different database schema.
When the Spring Boot service starts, it needs to populate the H2 database with the latest data from the central database.
How can I do this in the best way, performance-wise?
I'm currently looking at creating a different data source in my service to first get the data and then save it to H2.
This doesn't feel like a good solution, and it would take quite a long time to populate the database.
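For what it's worth, the two-datasource copy described above can be reasonably fast if the inserts are batched. A minimal sketch, assuming two JdbcTemplate beans wired against the DB2 and H2 datasources; the table and column names are invented for illustration:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.springframework.jdbc.core.JdbcTemplate;

public class CacheLoader {

    private final JdbcTemplate db2Template; // points at the central DB2
    private final JdbcTemplate h2Template;  // points at the local H2 cache

    public CacheLoader(JdbcTemplate db2Template, JdbcTemplate h2Template) {
        this.db2Template = db2Template;
        this.h2Template = h2Template;
    }

    public void populate() {
        // Pull only the subset of columns the cache actually needs from DB2.
        List<Map<String, Object>> rows =
                db2Template.queryForList("SELECT ID, NAME FROM CENTRAL.CUSTOMER");

        // Batch-insert into H2: one statement with many parameter sets,
        // instead of one JDBC round trip per row. The per-row round trips
        // are usually what make a naive copy slow.
        h2Template.batchUpdate(
                "INSERT INTO CUSTOMER (ID, NAME) VALUES (?, ?)",
                rows.stream()
                    .map(r -> new Object[] { r.get("ID"), r.get("NAME") })
                    .collect(Collectors.toList()));
    }
}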

If you want to use H2 instead of your DB2 database ... and if you don't want to re-create the database each time you run your app ...
... then consider using an H2 file, instead of in-memory:
http://www.h2database.com/html/features.html
jdbc:h2:[file:][<path>]<databaseName>
jdbc:h2:~/test
jdbc:h2:file:/data/sample
jdbc:h2:file:C:/data/sample (Windows only)
You can "initialize" the file whenever you want (perhaps just once).
Performance should be excellent.
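In Spring Boot, switching from in-memory to file-based is just a datasource URL change. A minimal application.properties sketch, assuming the standard spring.datasource auto-configuration (the file path is illustrative):

# H2 persisted to a file instead of in-memory (path is an example)
spring.datasource.url=jdbc:h2:file:./data/cache
spring.datasource.driver-class-name=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=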
Per your update:
I still need to access the central db to get the latest data in the
fastest way possible. The central db needs to stay for other services
also accessing this
The "fastest" way to get the very latest data ... is to query the central db directly. Period - no ifs/ands/buts.
But, if for whatever reason, you want to "cache" a subset of "recent" data ... then H2 is an excellent choice.
And if you don't want to "rebuild" each time you start your H2 database, then save H2 to a file instead of making it in-memory.
The performance difference between H2:mem and H2:file is small, compared to the network overhead of querying your central db.
'Hope that helps...

Related

Get the size of a PostgreSQL database in Java

I have a Spring Boot web application with Spring Data JPA and Hibernate, and I want to write a Java handler that checks the fill level of my PostgreSQL database at regular intervals and deletes old data accordingly. For example, if it reaches a maximum of 10 GB, the database has to delete the old data. Is there any Java library or any suggestion for this purpose? I have only found SELECT statements to run directly against the PostgreSQL database (https://wiki.postgresql.org/wiki/Disk_Usage)
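For what it's worth, the pg_database_size query from that wiki page can be wired into a scheduled Spring bean. A sketch, assuming @EnableScheduling is set on a configuration class; the threshold, table name, and retention window are illustrative:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class FillLevelChecker {

    private static final long MAX_BYTES = 10L * 1024 * 1024 * 1024; // 10 GB

    private final JdbcTemplate jdbcTemplate;

    public FillLevelChecker(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Scheduled(fixedRate = 3_600_000) // check once an hour
    public void pruneIfFull() {
        // pg_database_size returns the on-disk size of the given database.
        Long size = jdbcTemplate.queryForObject(
                "SELECT pg_database_size(current_database())", Long.class);
        if (size != null && size > MAX_BYTES) {
            // "events"/"created_at" are hypothetical names; delete whatever
            // counts as "old data" in your schema.
            jdbcTemplate.update(
                "DELETE FROM events WHERE created_at < now() - interval '30 days'");
        }
    }
}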

Verify Cassandra data migration

I want to migrate some data from an existing database to Cassandra DB.
Post migration, I want to verify whether all the data were migrated successfully or not.
I was wondering whether the Cassandra driver for Java provides any built-in feature to verify this, so that I can reduce the unnecessary overhead incurred while interacting with the Cassandra DB?
It all depends on the type of database you are migrating from. You can check row by row in your previous database and then query Cassandra to see if the rows are there. That would be the safest approach, IMHO.
Then you can do some very complex stuff, like Spark jobs that do the comparisons.
Or you can iterate over all the rows in Cassandra and check against the original database. Something like this: Fetch all rows in cassandra
The list could go on and on. For details you would have to tell us more about the originating database and the data model in Cassandra, and give some context on what it means for a row to be verified ... other than that it's there.
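For illustration, a sketch of that "iterate over Cassandra and check against the origin" idea with the DataStax Java driver 4.x; the keyspace, table, and the SourceDao accessor are invented:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class MigrationVerifier {

    // Hypothetical accessor for the originating database.
    public interface SourceDao {
        boolean exists(String id);
    }

    // Counts Cassandra rows that cannot be matched back to the source.
    public long countUnmatched(CqlSession session, SourceDao source) {
        long unmatched = 0;
        // The driver pages through the full scan automatically.
        ResultSet rs = session.execute("SELECT id FROM ks.migrated_table");
        for (Row row : rs) {
            String id = row.getString("id");
            if (!source.exists(id)) {
                unmatched++;
            }
        }
        return unmatched;
    }
}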

How to best decouple the database from the application?

We have a command and control system which persists historical data in a database. We'd like to make the system independent of the database. So if the database is there, great we will persist data there, if it is not, we will do some backup storage to files and memory until the database is back. The command and control functionality must be able to continue uninterrupted by the loss or restoration of the database; it should not even know the database exists. So the database and DAO functionality needs to be decoupled from the rest of the application.
We are using RESTful service calls, the Spring framework, ActiveMQ, and JdbcTemplate with a SQL Server database, currently following standard connection practices with a HikariCP datasource and the jTDS driver. The problem is that if the database goes down or the database connection is lost, we start to have data issues, as too many service calls (mainly the getters) still depend on the database being there. This dependence is what we'd like to eliminate.
What are the best practices/technologies for totally decoupling the database from the application? We are considering using AMQ to broadcast data updates and have the DAO listen for those messages and then do the update to the database if it is available or flat files as a backup. Then for the getters, provide replies based on what is available either from the actual database or from the short-term backup.
My team has little experience with this and we want to know what others have done that works well.
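One common shape for this, as a sketch under assumptions (all names here are hypothetical): the rest of the application depends only on an interface, and a delegating implementation falls back to file/memory storage when the database-backed DAO throws.

// HistoryStore.java -- the rest of the application depends only on this.
public interface HistoryStore {
    void save(String record);
}

// FailoverHistoryStore.java -- delegates to the DB-backed store and falls
// back to a file/memory store when the database is unavailable.
public class FailoverHistoryStore implements HistoryStore {

    private final HistoryStore database;   // e.g. a JdbcTemplate-backed DAO
    private final HistoryStore fileBackup; // flat-file/in-memory buffer

    public FailoverHistoryStore(HistoryStore database, HistoryStore fileBackup) {
        this.database = database;
        this.fileBackup = fileBackup;
    }

    @Override
    public void save(String record) {
        try {
            database.save(record);
        } catch (RuntimeException e) {
            // Database down or connection lost: buffer locally; a separate
            // job can replay the backup once the database is back.
            fileBackup.save(record);
        }
    }
}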

What is the best way to index data from Oracle/relational tables into Elasticsearch?

What are the options for indexing large amounts of data from an Oracle DB into an Elasticsearch cluster? The requirement is to index 300 million records one time into multiple indexes, plus incremental updates of approximately 1 million changes every day.
I have tried the JDBC plugin for the Elasticsearch river/feeder; both seem to run inside, or require, a locally running Elasticsearch instance. Please let me know if there is a better option for running an Elasticsearch indexer as a standalone job (probably Java-based). Any suggestions will be very helpful.
Thanks.
We use ES as a reporting DB, and when new records are written to SQL we take the following actions to get them into ES:
Write the primary key into a queue (we use RabbitMQ)
Rabbit picks up the primary key (when it has time) and queries the relational DB to get the info it needs, then writes the data into ES
This process works great because it handles both new data and old data. For old data, just write a quick script to push the 300M primary keys into Rabbit and you're done!
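A minimal sketch of that hand-off, using the RabbitMQ Java client; the queue name is invented, and the DB lookup and ES indexing are left as comments since they depend on your schema and ES client:

import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class EsFeeder {

    private static final String QUEUE = "es-index-keys";

    // Producer side: whenever a row is written to SQL, publish its primary key.
    public static void publishKey(Channel channel, String primaryKey) throws Exception {
        channel.basicPublish("", QUEUE, null, primaryKey.getBytes(StandardCharsets.UTF_8));
    }

    // Consumer side: for each key, load the row and index it into ES.
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare(QUEUE, true, false, false, null);

        DeliverCallback onMessage = (consumerTag, delivery) -> {
            String key = new String(delivery.getBody(), StandardCharsets.UTF_8);
            // 1. SELECT the row from the relational DB by primary key.
            // 2. Build the document and index it into Elasticsearch
            //    (e.g. via the ES REST client) -- omitted here.
        };
        channel.basicConsume(QUEUE, true, onMessage, consumerTag -> { });
    }
}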
There are many integration options. I've listed a few to give you some ideas; the solution is really going to depend on your specific resources and requirements, though.
Oracle GoldenGate will look at the Oracle DB transaction logs and feed them in real time to ES.
An ETL tool, for example Oracle Data Integrator, could run on a schedule, pull data from your DB, transform it, and send it to ES.
Create triggers in the Oracle DB so that data updates can be written to ES using a stored procedure. Or use the trigger to write flags to a "changes" table that some external process (e.g. a Java application) monitors and uses to extract data from the Oracle DB (see the polling sketch after this list).
Get the application that writes to the Oracle DB to also feed ES. Ideally your application and Oracle DB should be loosely coupled: do you have an integration platform that can feed the messages to both ES and Oracle?
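A sketch of the polling side of that "changes" table option in Java, assuming the trigger fills a CHANGES table with primary keys; the table, column, and SQL (Oracle 12c+ FETCH FIRST syntax) are illustrative:

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;

public class ChangePoller {

    private final JdbcTemplate oracle;

    public ChangePoller(JdbcTemplate oracle) {
        this.oracle = oracle;
    }

    // Run this on a schedule; each pass drains up to 1000 flagged keys.
    public void drainOnce() {
        List<Map<String, Object>> flagged = oracle.queryForList(
                "SELECT pk FROM changes FETCH FIRST 1000 ROWS ONLY");
        for (Map<String, Object> row : flagged) {
            Object pk = row.get("PK");
            // Fetch the full record by pk, transform it, and index it into
            // ES here; then remove the flag so it is not processed twice.
            oracle.update("DELETE FROM changes WHERE pk = ?", pk);
        }
    }
}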

Best approach for Spring+MyBatis with Multiple Databases to support failovers

I need to develop some services and expose an API to some third parties.
In those services I may need to fetch/insert/update/delete data, with some complex calculations involved (not just simple CRUD). I am planning to use Spring and MyBatis.
But the real challenge is that there will be multiple DB nodes with the same data (some external setup will take care of keeping them in sync). When I get a request for some data, I need to randomly pick one DB node, query it, and return the results. If the selected DB is unreachable, has network issues, or hits some unknown problem, then I need to try to connect to some other DB node.
I am aware of Spring's AbstractRoutingDataSource. But where do I inject the DB connection retry logic? Will Spring handle transactions properly if I switch the dataSource dynamically?
Or should I avoid the out-of-the-box Spring and MyBatis integration and do transaction management myself using MyBatis?
What do you guys suggest?
I propose using a NoSQL database like MongoDB. It is easy to cluster. You can configure it to use, for example, 10 servers and replicate the data 3 times.
That means that if 2 of your 10 servers fail, your data is still safe.
NoSQL databases are different from RDBMSs, but they can give high performance for clustering.
Also, there is no transaction support in NoSQL; you have to handle it manually in the case of financial operations.
You really should think in a different way when developing with NoSQL.
Yes, it will work. Take AbstractRoutingDataSource and code your own subclass. The only thing you cannot do is change the target database while a transaction is running.
So what you have to do is put the DB retry code in getConnection(). If the connection becomes invalid during the transaction, you should let it fail.
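A minimal sketch of that idea, assuming the node keys have been registered with setTargetDataSources(); the random starting node and the retry loop are illustrative, not a definitive implementation:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class FailoverRoutingDataSource extends AbstractRoutingDataSource {

    // Keys of the target datasources registered via setTargetDataSources().
    private final List<Object> nodeKeys;
    private static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    public FailoverRoutingDataSource(List<Object> nodeKeys) {
        this.nodeKeys = nodeKeys;
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return CURRENT.get();
    }

    @Override
    public Connection getConnection() throws SQLException {
        // Start at a random node, then try the others if it is unreachable.
        int start = ThreadLocalRandom.current().nextInt(nodeKeys.size());
        SQLException last = null;
        for (int i = 0; i < nodeKeys.size(); i++) {
            CURRENT.set(nodeKeys.get((start + i) % nodeKeys.size()));
            try {
                return super.getConnection(); // resolves target, opens connection
            } catch (SQLException e) {
                last = e; // node down -- move on to the next one
            } finally {
                CURRENT.remove();
            }
        }
        throw last;
    }
}

Once a connection is handed out, the transaction stays pinned to it, which is why (per the answer above) a node failure mid-transaction should simply fail rather than be retried.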
