Concerns with NoSQL/MongoDB - java

I'm starting to build a new Spring-based multi-user document management application and I would like to venture into the world of NoSQL/MongoDB. Coming from a RDBMS background, I have several concerns with MongoDB, primarily:
Lack of transactions
More focused on performance/scalability than data integrity
Lack of a JPA standard
To start with, I do not expect high load or a massive write volume; I suspect the read-to-write ratio will be about 10 to 1, and overall load will be light, especially at first.
1) From what I can tell, there is no easy way to do multi-collection transactions. Where in a RDBMS I can easily have a per-user document ID counter maintained in a separate table, there does not seem to be a way to do this reliably in MongoDB given that it would be in a separate collection/document. Consequently, I'm not sure if/how one resolves this problem.
2) Additionally, from what I have read, NoSQL is great where data integrity isn't the primary concern (ex: blog comments, etc). However, I'm not sure how this translates to being the primary data store for an application. Does this mean that one can update a document and have the update silently fail? I ran across an older, unattributed rant which discusses failed commits etc., which further fuels those concerns.
3) The seeming lack of a JPA-like standard for NoSQL would imply that I have to choose my DB and stick with it. Unlike JPA, where I can easily swap one DB vendor for another JPQL/SQL-compliant vendor, I have to code with MongoDB in mind and redesign my structures/queries if I ever wanted to switch to another NoSQL DB. I've seen Hibernate OGM, but it seems that it is still very much in its infancy and only provides rudimentary support - definitely not something that would let me avoid MongoDB-specific queries.
Are these concerns easily mitigated? Being new to the NoSQL world, I'm still having trouble understanding the right business case when to use NoSQL.

These are good questions. Here's my 2 cents about MongoDB and some references to help you learn more. I won't speak about any other NoSQL thingies as there are a lot out there and there's no real unifying principle to NoSQL other than "it doesn't use SQL", except sometimes people make it work with SQL, so, yeah.
MongoDB does not do joins. Period. MongoDB does not have transactions - whether within one collection or involving multiple collections. The unit of atomicity is the document. How does this work in an application? Via schema design and some techniques for recovering parts of ACID semantics where necessary, for example using two-phase commits. In relational databases, schema design is straightforward and is based on the structure of the data rather than its use case; joins and transactions fill in the gap between the abstract, normalized data representation and the concrete ways the data will be used. The data modeling intro already linked explains the situation for MongoDB, for contrast:
The key challenge in data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data retrieval patterns. When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.
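To make that concrete for the per-user document ID counter in point 1: because a single-document update is atomic, the counter can live in its own counters collection and be bumped with $inc. A minimal sketch with the MongoDB Java driver - the database, collection and field names are just illustrative assumptions:
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.Updates;
import org.bson.Document;

MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoCollection<Document> counters = client.getDatabase("docapp").getCollection("counters");

// One counter document per user; $inc on a single document is atomic, so two
// concurrent callers can never observe the same sequence value.
Document counter = counters.findOneAndUpdate(
        Filters.eq("_id", "user-42"),                 // hypothetical per-user key
        Updates.inc("seq", 1L),
        new FindOneAndUpdateOptions().upsert(true).returnDocument(ReturnDocument.AFTER));
long nextDocumentId = counter.getLong("seq");
This covers the counter pattern specifically; anything that must change two documents together still needs the two-phase-commit style techniques mentioned above.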
That specific "rant" is clearly very old as it talks about writes being unacknowledged by default. This isn't the case anymore. Given any distributed computer system operating over a network, it's pretty easy to come up with a way for it to behave poorly . The MongoDB blog covered a lot of this stuff in a series on consistency. I'd suggest touring the docs about journaling, replication, and write concern and see if that makes you feel better about MongoDB as a primary data store.
Yup. This comes with the NoSQL territory. What doesn't exist is common data access languages or standards because everything is new and trying to be different. Check back in 30 years.


Fast and safe way to save and retrieve data in java [closed]

In order to add data persistence to a billing application, I wonder what is the best way to save and retrieve data in my situation.
I work with a JavaFX TableView populated by custom objects (with many Strings, ints, booleans, ...), each one representing one bill. The user must be able to add, read and edit data on the fly. Everything is stored locally; there is no need for a cloud or anything like that.
I usually use serialization to write my objects, but is it a safe and fast way to store around 10,000 custom objects?
Should I use XML, serialization, a local database (JavaDB?) or something else?
By fast, I mean that the user can write and edit data; I have no problem with a small loading time when the app is launched.
By safe, I don't mean encrypted - safe in the sense that data won't get lost or corrupted.
Finally, if there are multiple solutions, why choose one over another?
Any persistence mechanism (flat file, relational DB, NoSQL) can be safe if used as designed, or unsafe if abused/misunderstood. Yours is a very open-ended question and can get very involved, or stay very light... it all depends.
Typically the choices come down to:
flat file (say binary serialisation, CSV, JSON or XML). A very simple mechanism which takes effort to scale to large files, and care must be taken when making changes to the code base, as changes could prevent older files from being readable. One also has to bear in mind when the data is written in relation to changes coming in from the user and the possibility of a machine crash; i.e. there are no transactions, so data can become corrupted - not a simple topic in its own right. As for which format is best, well, many a religious war has been fought over that, but typically a textual format (JSON, XML or CSV) has the advantage of being human readable, which helps debugging/maintenance tasks. XML and JSON support nested structures, which is an advantage over CSV. As for performance, text manipulation typically slows the parser down by about 10x compared to a binary one. However, there are fast implementations and slow ones, and for 10k objects you are unlikely to notice the difference.
relational database. Very useful for apps that benefit from relational queries (SQL), and a lot of effort has gone into making them transactional and robust to machine crashes. They are generally the persistence mechanism of choice for large businesses and require some knowledge to set up and maintain the DB itself. H2 is a very simple, low-cost entry point and Oracle is at the other extreme end of the spectrum. Relational databases suffer from a domain mismatch: specifically, object design and SQL design do not map together without some effort from the developer. They also typically suffer from scaling problems, as they are not usually clusterable - not a problem for 10k rows.
NoSQL databases (e.g. Redis, Cassandra, Mongo, Couch, Neo4j). Generally not transactional, but they are often faster than the relational DBs and offer clustering from the get-go, making them very robust. They also support different data modelling paradigms such as graph, list and document, making the NoSQL landscape much richer than the relational SQL one.
I assume that you are not working on a professional project and lack a mentor, so I will wrap up by suggesting that you focus on flat files first and then pick a DB product of some kind to experiment with (H2 is very good for learning relational products, and Mongo or Redis for ease of learning one of the NoSql products).
You can use the H2 database (http://www.h2database.com/); it's a really convenient way to store data, and you can use it as an embedded database. That looks like this:
import java.sql.Connection;
import java.sql.DriverManager;

Class.forName("org.h2.Driver");   // modern JDBC drivers register themselves, so this line is optional
try (Connection conn = DriverManager.getConnection("jdbc:h2:~/test", "sa", "")) {
    // add application code here
}
H2 creates the database as a file named test.mv.db (test.h2.db with older versions of H2) in the user's home directory.
Object serialization is safe. It's not particularly fast and you have to be very careful about how you make changes to the class to ensure that you can consistently deserialize. This is the biggest disadvantage in my opinion about object serialization.
XML (or JSON) aren't bad either. There are binding technologies like JAXB, Jackson or Gson which allow you to seamlessly map objects to XML or JSON. Permissive binding makes these formats easier to use than object serialization, having the additional benefit of being human readable and editable, but with the cost of being more verbose (consider file compression). If your storage format is a giant XML file, you can also search for records using XPath.
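For instance, a minimal Jackson sketch for the JSON route - the Bill class, the bills list and the file name are assumptions standing in for your own types:
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.List;

ObjectMapper mapper = new ObjectMapper();
File store = new File("bills.json");

// write the whole table model in one go (Bill needs a no-arg constructor and getters/setters)
mapper.writerWithDefaultPrettyPrinter().writeValue(store, bills);

// reload at startup
List<Bill> reloaded = mapper.readValue(store, new TypeReference<List<Bill>>() {});
For around 10,000 bills this is typically fast enough that the load time at startup is not a concern.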
JavaDB (or H2 or SQLite) is good in that it implements a relational data store, so you can perform SQL queries on the data. Managing lots of records is much more straightforward with a proper database. You could probably save on disk space, too. I would recommend this approach.
Will there be multiple clients reading these files? In that case, for safety, you would have to implement some kind of file locking scheme to prevent data corruption. You can get around this with some kind of out-of-process data management, using a lightweight server such as H2 or one of the NoSQL datastores like Mongo or Cassandra.

Always 'Entity first' approach, when designing java apps from scratch?

I'm just reading the book here: http://www.amazon.com/Java-Architects-Handbook-Second-Edition/dp/0972954880/ trying to find a strategy about how to efficiently design a (generic) medium to large application (200 tables or more) - for instance a classic, multi-layered, corporate intranet. I'm trying to adapt my past experience (as a database designer, but also OOAD) in order to architect such a java application. From what I've read, if you define your entities first, there is no recommended way to infer your database directly (automatically).
The book says that you should build the entity/object model first (OOAD) and THEN it is the DB admin/dev's job (?) to build/infer the database (schema, normalization, etc.) based on the entity model already built. If this is the case, I'm afraid the architect/developer could lose control over important aspects - normalization, entity-attribute-value modeling, etc.
Perhaps like many older developers (back-end developers, architects etc) I feel more comfortable defining the database schema first - and spending a good amount of time on aspects like normalization etc. While this would be certainly possible nowadays, I'm asking myself if this would become (pretty soon, if not already) the 'old fashioned way' and not the norm - as a classic/recommended approach when designing applications from scratch.
I know Entity Framework (.NET) already has these approaches explicitly defined - 'entities first', 'database first', 'code first' - and these can be mixed if necessary. I know that they recommend 'entity first' for newly designed apps, and 'database first' if you have an already defined database schema (which is the case for many older applications, when migrating, etc.). I'm just asking if there is something similar in the Java world.
So, the questions are: (although I know there is no silver bullet etc.)
'Entities first' for newly built apps - is this the norm nowadays?
What tools do you use (if any) to assist with the process of inferring the DB schema? Your experience, pros & cons with concrete UML tools, etc.
What if you have parts of an older/sub-domain database schema (which you'd want to preserve, mainly)? In such a case, would you infer the entities model from the database and then refactor the model using your preferred UML tool?
From a labor-force perspective (let's say for a DB of 200-500 tables): what is the best approach - for instance, to have 2 different people involved in designing the OOAD/entities and the database respectively, working together with an architect?
As you expect - my answer is it depends.
The problem is that there are so many possible flavours and dimensions to a good design you really need to take the widest view possible first.
Ask yourself some of the big questions:
Where is the core of the system? Is the database really the core or is it actually just a persistence layer for the code. It could also perhaps be that the database is the core and the code is really just a snazzy UI on the data. There can also be a mix - where some of the tables are core along with some of the entities.
What do you see in the future? Remember that there are developments going on as we speak that are moving database technology rapidly forward. There are some databases that are all in-ram. Some are designed for a distributed architecture. Some are primarily cloud. If you build your schema first you risk locking yourself in to a certain technology.
What scale do you want to achieve? By insisting on a specific database you may be closing doors to, say, a hand-held or mobile presence.
I generally find entity first as the best initial approach because you can always derive a schema from the entities and some meta-data. It is certainly possible to go schema first and grow the entities out of the schema but that way you generally find the database influences the design too much.
1) I've done database-first in the past, but now I usually do entity-first, mainly because of the tools I'm using to create the applications. Entity-first has a few good advantages over trying to match your entities to your defined schema later, and you're also not locking yourself too tightly to your schema. What your application is for matters a lot as well: is it just a basic CRUD application, write-once-read-many, or does it actually 'do' something? That will inform your choice of how to architect your application.
2) I use Hibernate a lot, which encourages creating your model first - designing all your entities etc. and then generating the schema from that. Hibernate can generate your whole schema from the models you've created (though you may need to tweak it to make sure it's not crazy - a minimal sketch of this appears after this answer). If you have 200 entities in your model then you probably want to do a significant amount of UML modelling ahead of time to ensure your model is consistent.
3) If you're working with a partially legacy database then it can sometimes be good to fall into line with the schema design so that your entities and schema are consistent. It can be a bit of a pain, but then so is trying to explain why part of your app is just different from the other parts. So yes, I would probably infer my entities from the schema in that case. But again, if the schema were totally crazy, it may be better to write some very specific DAO code to hide that part of the schema from the app and pretend it's not there.
4) I can't really give you a good answer on this as I'm not sure what you're driving at really. Once you have the design standards for your schema it's turning the handle to crank it out.
So after all that my answer is 'It depends'
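To make point 2 concrete, here is a hedged sketch of asking the JPA provider (Hibernate or otherwise) to derive the schema from your annotated entities; the persistence-unit name is a placeholder and the exact property you use depends on your setup:
import java.util.HashMap;
import java.util.Map;
import javax.persistence.Persistence;

Map<String, Object> props = new HashMap<>();
// Standard JPA 2.1 switch: build the tables from the entity mappings.
props.put("javax.persistence.schema-generation.database.action", "create");
// Hibernate-specific alternative often used during development:
// props.put("hibernate.hbm2ddl.auto", "update");
Persistence.generateSchema("my-unit", props);    // "my-unit" is a hypothetical persistence unit
Treat the generated DDL as a starting point and review it (indexes, column types, constraints) before it goes anywhere near production.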
While the answers already posted cover a lot of points - and ultimately, all answers probably have to all sum up to "it depends" - I'd like to expand on a point that's been touched on already.
My focus is on data - I'm a business intelligence and data warehousing developer, and I deal with issues like data quality, data governance, having a set of master data, etc. To this end, I have to pull data from other systems - data which is in varying conditions.
When considering whether the core of your system is really the database or the front end (as suggested by OldCurmudgeon), I strongly suggest thinking outside of your own area. I have seen and heard about many systems where it's clear that the database has been treated as an afterthought (sometimes created via an entity-first model, but also sometimes hand-built), despite the fact that most of the business value is in the data. More and more companies are of course realising that their data is valuable and are adopting tools to make use of it - but it's difficult to do if poor transactional databases mean that data has been lost, was never saved in the first place, has been overwritten when a history is needed, or is inconsistent.
While I don't want to do myself and others with similar roles out of a job: if the data that a system you're working on holds is, or might be, valuable, and if there's any reason it might be accessed by anything other than the front end you're creating, then it is worth the time and effort to create a sound data model to hold it. If the system is for an organisation or is going to be sold to organisations, there's a decent chance they'll want to report out of it, will want to run output from it into a data warehouse or other data stores, and will want to carry out analysis on the data it creates and holds.
I don't know enough about tools like Hibernate to know if it's possible to both use them to work in an entity-first manner and still create a good quality database, but I know that I have come across some problematic databases created in this manner. At the very least, as has been suggested, if you are going to work that way, make sure it is producing something sane and perhaps adjust it where necessary to maintain data integrity. If data integrity is a key requirement and you cannot get such a tool to create a suitable database that will ensure data integrity, then perhaps consider going back to doing things the "old fashioned" way.
I would also suggest that there's real value in developers working alongside any data specialists, analysts, architects, etc. they may have as colleagues to do some up-front modelling, even if the system they then produce uses entity-first and even if it veers away from the more conceptual models produced early on for technical reasons. I have seen many baked-in problems in systems which have been caused by a lack of understanding of the wider business entities and relationships, and which could have been avoided if time had been spent understanding the overall structure in this way. I've been personally responsible for building those problems when I was an application developer myself, so this shouldn't be read as criticism of front-end developers - just a vote in favour of cross-functional and collaborative analysis and modelling before development approaches and designs are decided.

Disadvantages of Object Relational Mapping

I am a fan of ORM - Object Relational Mapping - and I have been using it with Rails for the past year and a half. Prior to that, I used to write raw queries using JDBC and make the database do the heavy lifting via stored procedures. With ORM, I was initially happy to do stuff like coach.manager and manager.coaches, which were very simple and easy to read.
But as time went by, innumerable associations crept in and I ended up doing a.b.c.d, which fired queries in all directions behind the scenes. With Rails and Ruby, the garbage collector went nuts and took an insane amount of time to load a very complex page which involved relatively little data. I had to replace this ORM-style code with a simple stored procedure, and the difference I saw was enormous: a page that took 50 seconds to load now takes only 2 seconds.
With this huge difference, should I continue using ORM? It is very clear it has severe overheads compared to a raw query.
In general, what are the general pitfalls of using an ORM framework like Hibernate, ActiveRecord?
An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries it generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scene, and decide if it's appropriate or if another strategy must be adopted.
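A hedged sketch of what such a dedicated query looks like in JPQL (HQL is essentially the same; the Order/OrderItem names, orderId and the entityManager are assumptions):
// One round trip instead of one query per association hop: the items collection
// is fetched eagerly for this use case only, while the mapping can stay lazy.
Order order = entityManager.createQuery(
        "select distinct o from Order o join fetch o.items where o.id = :id", Order.class)
        .setParameter("id", orderId)
        .getSingleResult();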
I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each . along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.
I'd say, one should use the appropriate tool for different tasks.
E.g., for CRUD operations, ORM frameworks like Hibernate can speed up development, and they will perform well enough; sometimes you need to make the necessary tweaks to achieve acceptable performance. I'm not sure your task (the page that took 50 seconds) could not have been done properly with an ORM, because you did not provide us with the details.
On the other hand, for example bulk operations involving hundreds of thousands of records is not the type of task you'd expect Hibernate will do without significant performance penalty.
As was mentioned already, an ORM is only a tool, and you can use it either well or badly.
One of the most typical performance problems with ORMs is the N+1 queries problem: an additional query is issued for each object in a list, caused by the eager fetch of 1-to-n-related entities for each element of the list. The remedies are to use HQL queries (with fetch joins), to specify the fields you need in a projection, or to mark the 1-to-n relations as lazy.
At all times you must know exactly what the ORM is doing in order to achieve good performance. Not understanding what operations are done in the background is a road to disaster (slow, buggy and hard-to-analyze code, full of unnecessary and wrongly written work-arounds).
I'm with Petar from your comments regarding the lazy fetching. Say you have an HTML table filled with fields from object a.b.c.d. You could find your framework round-tripping the database thousands of times (possibly many more). The disadvantage of ORM in this case is that you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching, and many even support adding your own processing logic to bind the data set.
The net of it is that almost any ORM is almost undoubtedly better than anything you are going to write yourself. Otherwise you will find yourself saddled with maintaining huge libraries of boilerplate or, worse, writing the same code over and over again.
We are currently investigating a switch from our own data store layer, with a clean separation of transfer objects and data access objects, to JPA. We used a generator to create the TOs, the DAOs and the SQL DDL from some documentation in DocBook format. This way our documentation, the database structure and the generated Java classes were always in sync, and the database itself was well documented.
What we discovered so far by using JPA:
Foreign key references cannot be used for imports, some special queries and so on, because they must not be placed in a managed entity - JPA only allows the target class there.
Access to some user session scope is difficult up to impossible. We still have no clue how to get the user's id into the column 'userWhoLastMadeAnUpdate' in some PrePersist method.
Something expected to be quite easy with an ORM, namely "class mapping", does not work at all. We are using HalDateTime (http://sourceforge.net/projects/haldatetime/) internally, especially in the client. Mapping it with JPA directly is not possible, although HalDateTime supports it. Due to JPA restrictions we have to use two fields in the entity.
JPA uses either an XML file to describe the mapping - so you have to look into at least two files to even understand the relationship between the Java class and the database, and the XML file becomes huge for large applications.
Alternatively, ORMs provide annotations in the Java class itself. So it's easier to learn and understand the relationship, but it forces you to see all that database stuff in the client layer (which completely breaks a proper layering).
You will have to restrict yourself to stay as close to a clean database structure as possible. Otherwise you will for sure end up with a mess of queries and statements issued by the ORM.
Use an ORM which provides a query language that is close to SQL itself (JPA seems quite acceptable here). An ORM-induced language makes supporting a large application really expensive.

Migrating from hand-written persistence layer to ORM

We are currently evaluating options for migrating from hand-written persistence layer to ORM.
We have a bunch of legacy persistent objects (~200), that implement simple interface like this:
interface JDBC {
public long getId();
public void setId(long id);
public void retrieve();
public void setDataSource(DataSource ds);
}
When retrieve() is called, the object populates itself by issuing hand-written SQL queries against the connection provided, using the ID it received in the setter (this is usually the only parameter to the query). It manages its statements, result sets, etc. itself. Some of the objects have special flavors of the retrieve() method, like retrieveByName(); in this case a different SQL query is issued.
Queries could be quite complex, we often join several tables to populate the sets representing relations to other objects, sometimes join queries are issued on-demand in the specific getter (lazy loading). So basically, we have implemented most of the ORM's functionality manually.
The reason for that was performance. We have very strong requirements for speed, and back in 2005 (when this code was written) performance tests showed that none of the mainstream ORMs was as fast as hand-written SQL.
The problems we are facing now that make us think of ORM are:
Most of the paths in this code are well-tested and are stable. However, some rarely-used code is prone to result set and connection leaks that are very hard to detect
We are currently squeezing some additional performance by adding caching to our persistence layer and it's a huge pain to maintain the cached objects manually in this setup
Support of this code when DB schema changes is a big problem.
I am looking for advice on what could be the best alternative for us. As far as I know, ORMs have advanced in the last 5 years, so it might be that there is now one that offers acceptable performance. As I see the issue, we need to address these points:
Find some way to reuse at least some of the written SQL to express mappings
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Add a non-intrusive caching layer
Keep the performance figures.
Is there any ORM framework you could recommend with regards to that?
UPDATE To give a feeling of what kind of performance figures we are talking about:
The backend database is TimesTen, in-memory database that runs on the same machine as the JVM
We found out that changing rs.getInt("column1") to rs.getInt(42) brings the performance increase we consider significant.
If you want a standard persistence layer that lets you issue native SQL queries, consider using iBATIS. It's a fairly thin mapping between your objects and SQL. http://ibatis.apache.org/
For caching and lazy joins, Hibernate might be a better choice. I haven't used iBATIS for these purposes.
Hibernate provides a lot of flexibility in allowing you to specify certain defaults for lazy loading as you traverse your object graph, yet also pre-fetch data with SQL or HQL queries to your heart's content when you need better-known load times. However, the conversion effort will be complicated for you as it has a fairly high bar to entry in terms of learning and configuration. Annotations made this easier for me.
Two benefits you didn't mention about switching to a standard framework:
(1) running down bugs becomes easier when you have a wealth of sites and forums out there to support you.
(2) new hires are cheaper, easier and faster.
Good luck in addressing your performance and usability issues. The tradeoffs you point out are very common. Sorry if I evangelized.
For the bulk of your queries, I'd go with Hibernate. It's widely used, well documented, and generally performant. You can drop down to hand-written SQL if Hibernate isn't producing efficient enough queries. Hibernate gives you a lot of control in specifying the table names and columns that the domain objects map to, and in most cases you can retrofit it to an existing schema.
Find some way to reuse at least some of the written SQL to express mappings
The mappings are expressed in JPA using annotations. You can use the existing SQL as a guide when creating JPQL queries.
Add a non-intrusive caching layer
Caching in hibernate is automatic and transparent, unless you specifically choose to get involved. You can mark entities as read only, or evict from the cache, control when changes are flushed to the database (inside a transaction of course - automatic use of batching improves performance when network latency is a concern.)
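For example, marking a reference-data entity as cacheable and read-only looks roughly like this - a sketch that assumes a second-level cache provider (such as Ehcache) is configured, and the Country entity is made up:
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Read-only second-level caching: after the first load, lookups by id are
// served from the cache instead of hitting the database again.
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class Country {
    @Id
    private Long id;
    private String isoCode;
    private String name;
    // getters/setters omitted
}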
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Hibernate allows you to write SQL and have it mapped to your entities. You don't deal with the ResultSet directly - Hibernate takes care of the deconstruction into your entity. See Chapter 16, "Native SQL", in the Hibernate manual.
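In practice that means an existing hand-written query can often be reused nearly verbatim while still coming back as mapped objects; the entity name, SQL and regionId below are assumptions:
// The legacy SQL is kept; the provider maps the result columns onto the entity,
// so there is no manual rs.getInt(...) and no sensitivity to column positions.
List<Customer> customers = entityManager
        .createNativeQuery("select * from customers where region_id = ?", Customer.class)
        .setParameter(1, regionId)
        .getResultList();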
Support of this code when DB schema changes is a big problem.
Managing schema changes can still be a pain, since you now effectively have two schemata - the database schema and the JPA mapping (an object schema). If you choose to let Hibernate generate the DB schema and move your data to that, you are no longer directly responsible for what goes into the database, and so you are then faced with managing automatic changes to a machine-generated schema. There are tools that can assist, such as dbmigrate and Liquibase, but it's no walk in the park. Conversely, if you are managing the DB schema by hand, then you will have to carefully recraft your entities, JPA annotations and queries to accommodate the schema changes. Adding columns and new entities is relatively trivial, but more complex changes, such as changing a single property to a collection of properties or restructuring an object hierarchy, will involve considerably more extensive changes. There is no easy way out of this - either the DB or Hibernate is the "master" that decides the schema, and when one changes, the other must follow. The code changes aren't so bad - in my experience, it's migrating the data that's difficult. But this is a basic issue with databases and will be present in any solution you choose.
So, to sum up, I'd go with hibernate, and use the JPA interface.
I've recently drilled through a bunch of Java ORMs and didn't come up with anything much better than Hibernate. Hibernate's performance may get you there and satisfy your performance goals.
Lots of people think that moving to Hibernate will make everything so awesome, but it's really just moving a set of problems from JDBC queries into Hibernate tuning. Read a bunch of books or (better) hire a "Hibernate guy" to come in and help.
During your refactor, I'd recommend using JPA so you can un-plug and re-plug a new persistence provider when the Next Big Thing comes along (or you move to Oracle)
Do you really need to migrate? What's forcing you to move? Is there some REAL need here or someone just inventing work (an 'Astronaut architect')?
I agree with the above answers though - if you HAVE to move - Hibernate or iBatis are good choices. iBatis especially if you want to stay 'closer' to the SQL.
If you need more performance: drop the database (for on-line work) and handle the persistence direct. Adding caching is not going to help you with a TimesTen DB, it just adds an extra copy (slowing you down).
You might want to take a look at GemFire.
There is a lot of good advice already in here that I won't repeat. The only thing I didn't see suggested that might work for you is caching reference data in memory.
I have done quite a bit of this in the past and it does save a lot of time. If you have a large number of fairly static reference tables, load them all into memory at startup time and refresh them every couple minutes. That way you're not hitting the DB over and over again for data that never changes.
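A minimal sketch of that pattern - the DAO, the entity and the refresh interval are all assumptions to adapt to your own code:
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Loads the static reference rows once, serves them from memory, and refreshes
// on a timer so the database is not hit repeatedly for data that never changes.
public class ReferenceDataCache {
    private volatile Map<Long, Country> byId = Collections.emptyMap();
    private final CountryDao dao;    // hypothetical DAO returning Map<Long, Country>

    public ReferenceDataCache(CountryDao dao) {
        this.dao = dao;
        refresh();
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(this::refresh, 2, 2, TimeUnit.MINUTES);
    }

    private void refresh() { byId = dao.loadAll(); }

    public Country byId(long id) { return byId.get(id); }
}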

hibernate object vs database physical model

Is there any real issue - such as performance - when the hibernate object model and the database physical model no longer match? Any concerns? Should they be keep in sync?
Our current system was original designed for a low number of users so not much effort was done to keep the physical and objects in sync. The developers went about their task and the architects did not monitor. Now that we are in the process of rewriting/importing the legacy system into the new system, a concern has been raised in that the legacy system handles a lot of user volume and might bring the new system to its knees.
Update 20090331
From Pete's comments below - the concern was about table/data relationships in the data layer vs the object layer. If there are no dependencies between the two, then there are no performance hits if these relationships do not match? Is that correct?
The concern from my view is that the development team spends a lot of time "tuning" the hibernate queries/objects but nothing at the database layer to improve the performance of the application. I would have assumed that they would tune at both layers.
Could these issues stem from just a poor initial design of the database, with Hibernate being used to cover/make up the difference?
(I am new to this project so playing catchup)
Update: in response to comment: It is CRUCIAL that the database be optimized in addition to the Hibernate use. When you think about it, after all the work Hibernate does, in the end it is just querying the database. If the database doesn't perform well (wrong or missing indexes, poorly set up table spaces, etc.), it doesn't matter how much you tune Hibernate. On the flip side, if your database is set up well but Hibernate isn't (perhaps the caching is not set up properly, and you are going back to the database a lot more than you need to), then performance will suffer as well. It is always important to tune the system end to end, but start at the foundation (database) and work up.
End Update
I'm curious what you mean by 'don't match' - do you mean columns have been added to tables that aren't represented in the hibernate data objects? Tables have been added? I don't think anything like that would affect performance (it is more likely to affect data integrity if you are not inserting/updating all columns).
In general, the goal of the object model should NOT be to match the database schema verbatim. You want to abstract away the underlying data complexity / joins / normalization; that is the whole point of using something like Hibernate.
So, for example, let's say you have (keeping things very simple) 'orders' and 'order items',
your application code should be able to do something like
order.getItems()
without having to know that underneath it is a one to many relationship. The details in your hibernate code control how the load is done (lazy, caching, etc).
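A hedged sketch of what such a mapping might look like (entity and field names are assumed, and OrderItem is presumed to have an 'order' field owning the relationship):
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name = "orders")   // 'order' is a reserved word in SQL, so map to the 'orders' table
public class Order {
    @Id
    @GeneratedValue
    private Long id;

    // The one-to-many relationship lives in the mapping, not in application code.
    // LAZY means the items are only loaded when getItems() is first touched.
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    private List<OrderItem> items = new ArrayList<>();

    public List<OrderItem> getItems() { return items; }
}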
If that doesn't answer your question then please provide more detail
You could of course code your abstraction layer in asm - it "might" (awful word for a developer) be faster.
This is premature optimization - and it may break a clean project layout.
As the Hibernate manual notes, optimization can take different forms - plain-coding some parts "might" be part of it.
It's certainly possible that the changes you describe could cause performance problems.
I would have thought that this should have been part of the design spec, so that when you're coding you bear the performance criteria in mind.
The only way to really know, though, is to load the data onto a test environment and run some tests.
This should definitely be done before going live, as it might produce some quite interesting results.
