I have an Eclipse RCP application with an instance of an EMF model populated in memory. What is the best way to store that model for external systems to access? Access may occur during and after run time.
Reads and writes of the model are pretty balanced and can occur several times a second.
I think a database populated using Hibernate + Teneo + EMF would work nicely, but I want to know what other options are out there.
I'm using CDO (Connected Data Objects) in conjunction with EMF to do something similar. If you use the examples in the Eclipse wiki, it doesn't take too long to get it running. A couple of caveats:
For data that changes often, you will probably want to use non-audit mode for your persistence. Otherwise, every commit saves a new revision of each changed EObject while retaining the old ones.
You can choose to commit every time your data changes, or you can choose to commit at less frequent intervals, depending on how frequently you need to publish your updates.
You also have fairly flexible locking options if you need them.
My application uses Derby for persistence, though it will be migrated to SQL Server before long.
There's a one-hour webinar on Eclipse Live (http://live.eclipse.org/node/635) that introduces CDO and gives some good examples of its usage.
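For reference, non-audit mode is selected in the server configuration. Below is a rough sketch of the relevant part of cdo-server.xml, using Derby as in the answer above; property and element names can vary between CDO releases, so treat this as an illustration rather than a drop-in config:

```xml
<!-- Sketch only: repository configured for non-audit mode. -->
<repository name="repo1">
  <property name="supportingAudits" value="false"/>
  <store type="db">
    <mappingStrategy type="horizontal"/>
    <dbAdapter name="derby-embedded"/>
    <dataSource class="org.apache.derby.jdbc.EmbeddedDataSource"
                databaseName="/tmp/cdodb"
                createDatabase="create"/>
  </store>
</repository>
```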
I'd go with Teneo to do the heavy lifting unless performance is a real problem (which it won't be unless your models are vast). Even if it is slow you can tune it using JPA annotations.
I am creating a webapp in Spring Boot (Spring + Hibernate + MySQL).
I have already created all the CRUD operations for the data of my app, and now I need to process the data and create reports.
Given the complexity of these reports, I will create some summary or pre-processed tables. That way, I can trigger report creation once and then fetch the results efficiently.
My question is whether I should build the reports in Java or in MySQL stored procedures.
Pros of doing it in Java:
More logging
More control over the data structures (entities, maps, lists, etc.)
Catching exceptions
Portability if I ever change the DB engine (unlikely, but you never know)
Cons of doing it in Java:
Possibly higher memory usage?
Any thoughts on this?
Thanks!
Java, though both are possible. It depends on what is most important, what skills are available for maintenance, and what maintenance will cost. Stored procedures are usually very fast, but their availability and performance depend on which database you use. You will need specialized skills, and the result is tied to that specific database.
Hibernate ships a dialect for each major database to get the best performance out of the persistence layer. It is not as fast as a stored procedure, but it comes pretty close. With Spring Data on top, much of the difficulty disappears. Maintenance will not cost as much, and people who know Spring Data are easier to find than specialists in any particular database.
You can still write the various "difficult" queries easily with HQL, so that is no blocker. And Hibernate brings more possibilities: you can have your caching handled by Ehcache, and with Hibernate Envers your audit trail is done in no time. That is the nice thing about this framework: it is widely used, and many free-to-use Maven dependencies are there for the taking. If in the future you want to change your database, you can do it by changing a few properties in your application.properties file when using Spring Data.
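For instance, with Spring Boot the database-specific bits typically live in a handful of properties; the values below are placeholders:

```properties
# Switching databases is mostly a matter of changing these values
# (plus swapping the JDBC driver dependency in the build file).
spring.datasource.url=jdbc:mysql://localhost:3306/reportdb
spring.datasource.username=app
spring.datasource.password=secret
# Optional: pin the Hibernate dialect instead of letting Boot detect it.
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL8Dialect
```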
You can also play with annotations and see what performs better. For example, with the @Inheritance annotation you can map a class hierarchy to a single table or split it across several. You also have @MappedSuperclass, which lets you define one JpaObject holding the id that all your entities extend. If you want more JPA tricks, maybe check this post with my answer on how to use a superclass and a general repository.
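A rough sketch of the @MappedSuperclass idea (assumes the JPA API on the classpath; the Report entity and its field are invented for illustration):

```java
// Sketch only: a shared base class holding the id, which every entity extends.
@MappedSuperclass
public abstract class JpaObject {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    protected Long id;

    public Long getId() { return id; }
}

@Entity
public class Report extends JpaObject {
    private String title;  // hypothetical field for illustration
}
```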
As per the complexity of these reports, I will create some summary or pre-processed tables. This way, I can trigger the reports creation once, and then get them efficiently.
My first thought is: is this required? It seems like added complexity the application perhaps doesn't need; premature optimisation and all that. Try writing the reports in SQL and inspecting an execution plan. If it's good enough, you have less code to maintain and no added batch jobs to administer. Consider load testing with e.g. JMeter or Gatling to see how it holds up under stress.
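If a summary table does turn out to be necessary, the pre-processing step can often be a single statement. The orders table and its columns below are invented for illustration:

```sql
-- Hypothetical summary table refreshed in one pass.
TRUNCATE TABLE daily_sales_summary;

INSERT INTO daily_sales_summary (sale_date, total_orders, total_amount)
SELECT DATE(created_at), COUNT(*), SUM(amount)
FROM orders
GROUP BY DATE(created_at);
```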
Consider using Querydsl or jOOQ for reporting. Both provide a database abstraction layer and a fluent API for querying, which deliver the benefits listed under "Pros of doing it in Java" in the question and may be better suited to the problem. The blog post "jOOQ vs. Hibernate: When to Choose Which" is well worth a read.
I have a working code that basically copies records from one database to another one using JPA. It works fine but it takes a while, so I wonder if there's any faster way to do this.
I considered threads, but I ran into race conditions, and synchronizing those pieces of code ended up taking as long as the one-by-one process.
Any ideas?
Update
Here's the scenario:
Application (Core) has a database.
Plugins have default data (same structure as Core, but with different data)
When a plugin is enabled, it checks the Core database and, if its records are not found, copies its default data into the Core database.
Most databases provide native tools to support this. Unless you need to write additional custom logic to transform the data in some way, I would recommend looking at the export/import tools provided by your database vendor.
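For example, with MySQL the copy can often be done with the stock client tools; the database and table names below are made up for illustration:

```shell
# Illustrative only (assumes MySQL): dump the plugin's default data
# without table definitions, then load it into the core database.
mysqldump --no-create-info plugin_defaults seed_table > seed.sql
mysql core_db < seed.sql
```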
I am using Spring 2.5 and the Hibernate that goes with it. I'm running against an Oracle 11g database.
I have created my DAOs, which extend HibernateTemplate. Now I want to write a loader that inserts 5 million rows into my person table. I have written this in a simple-minded fashion: read a row from a CSV file, turn it into a Person, save it to the table, and repeat until the CSV file is empty.
The problem is that I run out of heap space at about 450,000 rows. So I doubled the heap from 1024m to 2048m, and now I run out of memory after about 900,000 rows.
Hmmmmm....
So I've read some things about turning off the query cache for Hibernate, but I'm not using an L2 cache, so I don't think this is the issue.
I've read some things about JDBC2 batching, but I don't think that applies to Hibernate.
So, I'm wondering if maybe there's a fundamental thing about Hibernate that I'm missing.
To be honest, I wouldn't be using Hibernate for that. ORMs are not designed to load millions of rows into a DB. Not saying you can't, but it's a bit like digging a swimming pool with an electric drill: you'd use an excavator for that, not a drill.
In your case, I'd load the CSV directly into the DB with the bulk loader that ships with the database. If you don't want to do that, then yes, batch inserts will be far more efficient. I don't think Hibernate lets you do that easily, though. If I were you I'd just use plain JDBC, or at most Spring JDBC.
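For illustration, a batched plain-JDBC load might look like the sketch below (it assumes an open Connection, a suitable driver on the classpath, and made-up table and column names):

```java
// Sketch of a plain-JDBC batched load: inserts are grouped and sent
// to the database in round trips of 1000 instead of one at a time.
try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO person (first_name, last_name) VALUES (?, ?)")) {
    int count = 0;
    for (String[] row : csvRows) {
        ps.setString(1, row[0]);
        ps.setString(2, row[1]);
        ps.addBatch();
        if (++count % 1000 == 0) {
            ps.executeBatch();   // send 1000 inserts in one round trip
        }
    }
    ps.executeBatch();           // flush the final partial batch
}
```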
If you have complicated business logic in the entities and absolutely have to use Hibernate, you could flush every N records as Richard suggests. However, I'd consider that a pretty bad hack.
In my experience with EclipseLink, holding a single transaction open while inserting/updating many records results in the symptoms you've experienced.
You are working with an EntityManager of some sort (JPA or Hibernate-specific; either way it manages entities). It tries to keep the working set in memory for the life of the transaction.
A general solution was to commit and then restart the transaction after every N inserts; a typical N for me was 1000.
As a footnote, with some version (undefined, it's been a few years) of EclipseLink, a session flush/clear didn't solve the problem.
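The commit-every-N idea can be sketched in pure Java, with the persistence calls reduced to comments so only the chunking logic remains (the class and method names here are invented for the sketch):

```java
// Pure-Java illustration of the commit-every-N pattern: the actual
// EntityManager calls are stand-ins in comments, so the sketch only
// shows where the chunk boundaries fall.
public class ChunkedCommit {
    static int countCommits(int totalRows, int batchSize) {
        int commits = 0;
        int pending = 0;
        for (int i = 0; i < totalRows; i++) {
            // em.persist(entity) would go here
            pending++;
            if (pending == batchSize) {
                // tx.commit(); em.clear(); tx.begin(); would go here
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            commits++; // commit the final partial chunk
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println(countCommits(2500, 1000)); // prints 3
    }
}
```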
It sounds like you are running out of space due to your first-level cache (the Hibernate session). You can flush the Hibernate session periodically to keep memory usage down, and break up the work into chunks by committing every few thousand rows, keeping the database's transaction log from getting too big.
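A minimal sketch of that pattern, assuming an open Hibernate Session and Transaction (the variable names are illustrative):

```java
// Flush and clear the first-level cache every N rows so it cannot
// grow unbounded during the load.
int batchSize = 1000;
for (int i = 0; i < people.size(); i++) {
    session.save(people.get(i));
    if (i % batchSize == 0) {
        session.flush();  // push pending inserts to the database
        session.clear();  // detach everything from the session cache
    }
}
tx.commit();
```

Setting hibernate.jdbc.batch_size in the Hibernate configuration additionally lets Hibernate group the generated inserts into JDBC batches.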
But using Hibernate for a load task like that will be slow, because it pushes rows through JDBC one at a time. If you know the environment well, have a cap on the amount of data, and have a big enough processing window, you can manage. But if you want this to work at multiple client sites, and you want to minimize the time spent debugging some client site's failed load job, then you should go with the database's bulk-copy tool.
With the bulk-copy approach, the database suspends constraint checking, index building, and transaction logging, and instead concentrates on slurping the data in as fast as possible. JDBC gets nothing like that level of cooperation from the database, so it can't compete. At a previous job we replaced a JDBC loader task that took over 8 hours to run with a SQL*Loader task that took 20 minutes.
You do sacrifice database independence, but every database has a bulk-copy tool (because DBAs rely on them), so the process is very similar for each database; only the executable you invoke and the way the file format is specified should change. And this way you make the best use of your processing window.
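As an example of what such a tool looks like, a minimal SQL*Loader control file might be (the file and column names are invented):

```
-- Illustrative SQL*Loader control file for the CSV load.
LOAD DATA
INFILE 'people.csv'
APPEND
INTO TABLE person
FIELDS TERMINATED BY ','
(first_name, last_name, email)
```

It would be invoked with something like `sqlldr user/pass control=people.ctl`.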
I don't want to persist any data, but I still want to use Neo4j for its graph traversal and algorithm capabilities. In an embedded database, I've configured cache_type = strong, and after all the writes I mark the transaction as failed so nothing is committed. But my write speeds (node and relationship creation) are slow, and this is becoming a big bottleneck in my process.
So, the question is, can Neo4j be run without any persistence aspects to it at all and just as a pure API? I tried others like JGraphT but those don't have traversal mechanisms like the ones Neo4j provides.
As far as I know, Neo4j data storage and Lucene indexes are always written to files. On Linux, at least, you could set up a RAM-backed file system (ramfs/tmpfs) to hold the files in memory.
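A sketch of that setup (requires root; the mount point and size are arbitrary):

```shell
# Mount a RAM-backed tmpfs and point the embedded Neo4j store
# directory at it, so all writes stay in memory.
mkdir -p /mnt/neo4j-ram
mount -t tmpfs -o size=2g tmpfs /mnt/neo4j-ram
```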
See also:
Loading all Neo4J db to RAM
How many changes do you group in each transaction? You should try to group up to thousands of changes in each transaction since committing a transaction forces the logical log to disk.
However, in your case you could instead begin your transactions with:
db.tx().unforced().begin();
Instead of:
db.beginTx();
This makes the transaction skip waiting for the logical log to be forced to disk, which makes small transactions much faster; the trade-off is that a power outage could potentially lose the last couple of seconds of data.
The tx() method sits on GraphDatabaseAPI, which for example EmbeddedGraphDatabase implements.
You can try a virtual drive. Neo4j would still persist to the drive, but it would all happen in memory.
https://thelinuxexperiment.com/create-a-virtual-hard-drive-volume-within-a-file-in-linux/
I know this is a subjective question, but why does Hibernate seem to be designed for short-lived sessions? Generally in my apps I create DAOs to abstract my data layer, but since I can't predict how the entity objects will be used, some of their collections are lazy-loaded, or rather, fail to load once the session is closed.
Why did they not design it so that it would automatically re-open the session, or have sessions always stay open?
Because once you move out of your transaction boundary, you can't hit the database again without starting a new transaction. Having long-running transactions 'just in case' is a bad thing (tm).
I guess you want to lazy-load objects from your view; take a look here for some options. I prefer to define exactly how much of the object graph is going to be returned by my session facade methods. I find this makes it easier to unit test and performance test my business tier.
I worked on a desktop app that used EJB and Hibernate. We had to set lazy=false everywhere, because when the objects get serialized, they lose their ability to be fetched from the backend. That's just how it goes, unfortunately.
If you are concerned with performance, you could use caching on the backend so that your non-lazy fetches are not as painful.
You're looking for the OpenSessionInView pattern, which is essentially a conceptual filter (and sometimes implemented as a servlet filter) that detects when a session needs to be transparently reopened. Several frameworks implement this so it handles it automagically.
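For reference, the classic servlet-filter form of the pattern in the Hibernate 3 / Spring era is declared in web.xml (the filter name here is arbitrary):

```xml
<filter>
  <filter-name>openSessionInView</filter-name>
  <filter-class>
    org.springframework.orm.hibernate3.support.OpenSessionInViewFilter
  </filter-class>
</filter>
<filter-mapping>
  <filter-name>openSessionInView</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
```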
I'm writing a desktop application so using a filter isn't applicable.
Connections are a scarce resource that needs to be recycled as soon as you are done with them. If you are also using connection pooling, getting another one when you need it should be quick. This is the architecture you have to use to make websites scale, and even though you are writing a desktop app, Hibernate's use cases concentrate on scalable sites.
If you look at MS ADO.NET, you will see a similar focus on keeping connections open for a short time -- they have a whole offline model for updating data disconnected and then applying to a database when you are ready.
Hibernate is designed as a way to map objects to relational database tables, and it accomplishes that job very well. But it can't please everybody all of the time. There is some complexity in learning how initialization works, but once you get the hang of it, it makes sense. I don't know that it was "designed" specifically to frustrate you; it's just the way it happened.
If it magically reopened sessions in non-webapps, I think the complexity of learning the framework would far outweigh the benefits.