How to avoid loading duplicate objects into main memory?


Suppose I am using SQL and I have two tables. One is Company, the other is Employee. Naturally, the employee table has a foreign key referencing the company he or she works for.
When I am using this data set in my code, I'd like to know what company each employee works for. The best solution I've thought of is to add an instance variable to my Employee class called Company (of type Company). This variable may be lazy-loaded, or populated manually.
The problem is that many employees work for the same company, and so each employee would end up storing a completely identical copy of the Company object, unnecessarily. This could be a big issue if something about the Company needs to be updated. Also, the Company object would naturally store a list of its employees, therefore I could also run into the problem of having an infinite circular reference.
What should I be doing differently? It seems object oriented design doesn't work very well with relational data.
This is more of a design/principles sort of question, I do not have any specific code, I am just looking for a step in the right direction!
Let me know if you have any questions.

Do not try to design your business objects to mirror the database schema.
Design objects to serve your business requirements.
For example, when you need to display a list of employees without company information, you can create a function that retrieves only the required information from the database into the object:
public class EmployeeBasicInfo
{
public int id;
public String name;
}
For the next requirement, you need a list of employees with full information; then you will have a function that retrieves the full data from the database:
public class Employee
{
public int id;
public String name;
public int age;
public CompanyBasicInfo company;
}
Here the company class does not have a collection of employees, only the information required by the Employee class:
public class CompanyBasicInfo
{
public int id;
public String name;
}
Of course, in the last case you end up with a bunch of different company objects that hold the same data. But that should be OK.
If you are afraid that having the same data copied into different objects will cause a performance problem: it will not, until you load millions of employees, and needing to do that is usually a good sign that something has gone wrong in your application design.
Of course, in a situation where you actually do need to load millions of employees, the class that loads employees can first load all companies into a Map<Integer, Company>, and then, when loading employees, assign the same Company instance to every employee of that company.
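A minimal sketch of that two-step load, assuming simplified `Company` and `Employee` classes. The `CompanyRow`/`EmployeeRow` records are hypothetical in-memory stand-ins for the rows a real loader would read via JDBC:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Company {
    final int id;
    final String name;
    Company(int id, String name) { this.id = id; this.name = name; }
}

class Employee {
    final int id;
    final String name;
    final Company company; // shared reference, not a copy
    Employee(int id, String name, Company company) {
        this.id = id; this.name = name; this.company = company;
    }
}

class EmployeeLoader {
    // Hypothetical row types standing in for JDBC result-set rows.
    record CompanyRow(int id, String name) {}
    record EmployeeRow(int id, String name, int companyId) {}

    static List<Employee> load(List<CompanyRow> companyRows, List<EmployeeRow> employeeRows) {
        // Step 1: create each company exactly once, indexed by primary key.
        Map<Integer, Company> companiesById = new HashMap<>();
        for (CompanyRow row : companyRows) {
            companiesById.put(row.id(), new Company(row.id(), row.name()));
        }
        // Step 2: every employee points at the already-created Company instance.
        List<Employee> employees = new ArrayList<>();
        for (EmployeeRow row : employeeRows) {
            employees.add(new Employee(row.id(), row.name(), companiesById.get(row.companyId())));
        }
        return employees;
    }
}
```

However many employees a company has, only one `Company` object exists for it, so an update made through one employee's reference is visible through all of them.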

Am I really the only person who is running into this issue? There must be some way to do this without relying on lazy-loading every property.
This problem has been solved many times before already. Avoid re-inventing the wheel by using any of the widely available ORM frameworks.
In a database table, the primary key identifies a record; in a running application, the reference tracks an object; and, at an even lower abstraction, a memory address points to the bytes that represent that object.
When you initialise an object and assign it to a variable, the variable is sufficient to track the object in memory so that you can subsequently access it. However, in the database layer, a primary key is needed to locate the record in a database table. Therefore, to bridge the gap between the relational model and the object model, an artificial identifier property is required in your object.
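ORMs use that identifier to maintain an identity map: within one unit of work (in Hibernate, the Session, i.e. the JPA persistence context), a given primary key always resolves to the same object instance. The class below is a hypothetical, minimal sketch of the idea; the `loader` function stands in for a real `SELECT ... WHERE id = ?` query:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical identity map: at most one in-memory instance per primary key.
class IdentityMap<K, V> {
    private final Map<K, V> instances = new HashMap<>();
    private final Function<K, V> loader; // stands in for the database query
    private int loads = 0;               // how many times the loader actually ran

    IdentityMap(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K id) {
        // Only hit the "database" the first time this key is requested;
        // later calls return the very same instance.
        return instances.computeIfAbsent(id, key -> {
            loads++;
            return loader.apply(key);
        });
    }

    int loadCount() {
        return loads;
    }
}
```

Two employees whose rows carry the same company ID would then receive one shared `Company` object instead of two identical copies.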

Related

"Encapsulation helps make sure clients have no dependence on the choice of representation"

I am familiar with the concept of encapsulation. However, I recently came across the following statement about encapsulation (which, according to the author, is correct):
Encapsulation helps make sure clients have no dependence on the choice of representation
Could you please give me a hint as to what is meant by "clients" and by "the choice of representation"? Many thanks in advance.

What the author is trying to explain is that encapsulation allows you to modify the internal representation of the data in a class without causing any side effects in its clients. The clients are any other classes that use yours, and the choice of representation is the way you decide to store the data in your class.
As an example, imagine that you have a class where you store the Employees of some Company. It could be something like this:
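import java.util.ArrayList;
import java.util.List;
public class Company {
private List<Employee> employees = new ArrayList<>();
public List<Employee> getEmployees() {
return this.employees;
}
public Employee getEmployee(String employeeId) {
// linear search; assumes Employee exposes a String getId()
for (Employee employee : employees) {
if (employee.getId().equals(employeeId)) {
return employee;
}
}
return null; // no employee with that id
}
}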
You store the employees of the company in a List, and provide two methods, one that retrieves all the employees, and another that searches for a given one. Some day you realize that maybe a Set or a Map would be a better structure to store the Employees, and you decide to refactor the code. As long as you have provided methods to retrieve the information of the employees, instead of giving direct access to the "employees" structure, you could change the implementation of these functions to fit with the new definition of your class and your clients wouldn't notice those changes.
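For instance, here is a sketch of the same class after switching to a `Map` keyed by employee ID. It assumes a minimal `Employee` with a `String` ID (the original snippet leaves `Employee` undefined), and `addEmployee` is a hypothetical helper for populating it. The two public methods keep their signatures, so callers compile and behave as before:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assumed minimal Employee; the original example does not define it.
class Employee {
    private final String id;
    private final String name;
    Employee(String id, String name) { this.id = id; this.name = name; }
    String getId() { return id; }
    String getName() { return name; }
}

class Company {
    // Internal representation changed from List to Map; clients never see it.
    private final Map<String, Employee> employeesById = new LinkedHashMap<>();

    // Hypothetical helper for adding employees.
    void addEmployee(Employee e) {
        employeesById.put(e.getId(), e);
    }

    // Same signature as before; returns a copy so clients cannot reach the map.
    public List<Employee> getEmployees() {
        return new ArrayList<>(employeesById.values());
    }

    // Same signature as before; now a hash lookup instead of a linear search.
    public Employee getEmployee(String employeeId) {
        return employeesById.get(employeeId);
    }
}
```

Returning a copy from `getEmployees()` is a design choice: it also stops clients from depending on (or mutating) the internal collection, which is exactly the dependence the quoted statement is about.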

Object Relational mapping and performance

I am currently working on a product that works with Hibernate (HQL) and another one that works with JPQL. As much as I like the concept of the mapping from a relational structure (database) to an object (Java class), I am not convinced of the performance.
EXAMPLE:
Java:
public class Person{
private String name;
private int age;
private char sex;
private List<Person> children;
//...
}
I want to get the age attribute of a certain Person: a person with 10 children (he has been very busy). With Hibernate or JPQL you would retrieve the person as an object.
HQL:
SELECT p
FROM my.package.Person as p
WHERE p.name = 'Hazaart'
Not only will I be retrieving the other attributes of the person that I don't need, it will also retrieve all the children of that person and their attributes. And they might have children as well, and so on... This would mean more tables are accessed at the database level than needed.
Conclusion:
I understand the advantages of Object Relational Mapping. However it would seem that in a lot of cases you will not need every attribute of a certain object. Especially in a complex system. It would seem like the advantages do not nearly justify the performance loss. I've always learned performance should be the main concern.
Can anyone please share their opinion? Maybe I am looking at it the wrong way, maybe I am using it the wrong way...

I'm not familiar with JPQL, but if you set up Hibernate correctly, it will not automatically fetch the children. Instead it will return a proxy list, which will fetch the missing data transparently when it is accessed.
This will also work with simple references to other persistent objects. Hibernate will create a proxy object, containing only the ID, and load the actual data only if it is accessed. ("lazy loading")
This of course has some limitations (like persistent class hierarchies), but overall it works pretty well.
BTW, you should declare the children field with the interface type List<Person>. I'm not sure that Hibernate can use a proxy List if you specify a concrete implementation class.
Update:
In the example above, Hibernate will load the attributes name, age and sex, and will create a List<Person> proxy object that initially contains no data.
Once the application calls any method of the List that requires knowledge of the data, like children.size(), or iterates over the list, the proxy will call Hibernate to read the children objects and populate the List. The children objects, being instances of Person, will also contain a proxy List<Person> of their children.
There are some optimizations Hibernate might perform in the background, like loading the children of other Person objects in this session at the same time, since it is querying the database anyway. But whether this is done, and to what extent, is configurable per attribute.
You can also tell Hibernate never to use lazy loading for certain references or classes, if you are sure you'll need them later, or if you continue to use the persistent object once the session is closed.
Be aware that lazy loading will of course fail if the session is no longer active. If, for example, you load a Person object, don't access the children List, and close the session, a subsequent call to children.size() will fail.
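To make the mechanism concrete, here is a deliberately simplified, hypothetical stand-in for such a proxy. Hibernate's real collection wrappers are far more involved, and Hibernate throws `LazyInitializationException` rather than the `IllegalStateException` used here. The list holds no data until first accessed, then runs its loader once, and fails if the "session" was closed before that:

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical lazy-loading list, loosely modelled on an ORM collection proxy.
class LazyList<E> extends AbstractList<E> {
    private final Supplier<List<E>> loader;    // stands in for the SQL query
    private final BooleanSupplier sessionOpen; // stands in for the session state
    private List<E> data;                      // null until first access

    LazyList(Supplier<List<E>> loader, BooleanSupplier sessionOpen) {
        this.loader = loader;
        this.sessionOpen = sessionOpen;
    }

    private List<E> data() {
        if (data == null) {
            if (!sessionOpen.getAsBoolean()) {
                throw new IllegalStateException("cannot load lazily: session is closed");
            }
            data = loader.get(); // the "query" runs only once, on first use
        }
        return data;
    }

    @Override
    public E get(int index) {
        return data().get(index);
    }

    @Override
    public int size() {
        return data().size();
    }

    boolean isInitialized() {
        return data != null;
    }
}
```

Iteration goes through `size()` and `get()` (inherited from `AbstractList`), so looping over the list triggers the load just like calling `size()` does.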
IIRC Hibernate has a method (Hibernate.initialize()) that forces a not-yet-loaded proxy or collection of a persistent object to be loaded, if needed.
Best read the Hibernate documentation on how to configure all this.

How to access object fields/properties from another object contained in that object efficiently

I am trying to design something similar to the following: I have an object that contains other objects, as in the example below. I am trying to figure out a way to access the first object's properties from the second object without unnecessarily replicating properties between them, which doesn't sound like good programming practice.
For example:
class employee{
String name;
int age;
}
class company{
String companyName;
List<employee> employeeList;
}
My question is: given an employee object, how can I access the companyName of the company this employee works for? One solution is to add companyName to each employee object, but that seems to be a redundant waste of memory, as it is "guaranteed" that each employee works for only one company.

The way you have it now you have a unidirectional dependency. In other words, company depends on employee. However, employee knows nothing about the company it belongs to.
You can simply add a company field to employee and every time you create an employee or add it to a company's employeeList, remember to update both sides of the now bidirectional dependency.
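A minimal sketch of that, using capitalized `Company` and `Employee` class names. The `hire` method is an illustrative name, not something from the question; it updates both sides of the link in one place so callers cannot forget one of them:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Employee {
    String name;
    int age;
    private Company company; // back-reference: the company this employee works for

    Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }

    Company getCompany() {
        return company;
    }

    // Package-private; intended to be called only from Company.hire().
    void setCompany(Company company) {
        this.company = company;
    }
}

class Company {
    String companyName;
    private final List<Employee> employeeList = new ArrayList<>();

    Company(String companyName) {
        this.companyName = companyName;
    }

    // Keeps both sides of the bidirectional link consistent.
    void hire(Employee e) {
        if (e.getCompany() != null) {
            e.getCompany().employeeList.remove(e); // leave the previous company
        }
        employeeList.add(e);
        e.setCompany(this);
    }

    List<Employee> getEmployeeList() {
        return Collections.unmodifiableList(employeeList);
    }
}
```

Each `Employee` stores only a reference to the shared `Company` object, not a copy of its name, so `employee.getCompany().companyName` answers the question without duplicating data.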
Note that Java naming conventions state that type names should start with a capitalized letter.

C++: You could give the employee a company pointer, passed as part of the employee constructor and access companyName through that.

Is it good practice to use domain objects as keys?

Is it good practice to use domain objects as keys for maps (or "get" methods), or is it better to just use the id of the domain object?
It's simpler to explain with an example. Let's say I have Person class, a Club class, and a Membership class (that connects the other two). I.e.,
public class Person {
private int id; // primary key
private String name;
}
public class Club {
private String name; // primary key
}
public class Membership {
private Person person;
private Club club;
private Date expires;
}
Or something like that. Now, I want to add a method getMembership to Club. The question is, should this method take a Person object:
public Membership getMembership(Person person);
or, the id of a person:
public Membership getMembership(int personId);
Which is most idiomatic, which is most convenient, which is most suitable?
Edit: Many very good answers. I went with not exposing the id, because the "Person" (as you might have realized, my real domain does not have anything to do with people and clubs...) instances are easily available, but for now it is internally stored in a HashMap hashed on the id - but at least I am exposing it correctly in the interface.

Don't use the IDs, man; this is just a bad idea for all the reasons mentioned. You'll lock yourself into a design. Let me give an example.
Right now you define your Membership as a mapping between Clubs and People. Rightfully, your Membership should be a map of Clubs to "Members", but you are assuming that all Members are People, and since all of the people IDs are unique, you think you can just use the ID.
But what if in the future you want to extend your membership concept to "family memberships", for which you create a Family table and a Family class. In good OO fashion you extract an interface of Family and Person called Member. As long as both classes implement the equals and hashCode methods properly, no other code will have to be touched. Personally, I would have defined the Member interface right up front.
public interface Member {
}
public class Person implements Member {
private int id; // primary key
private String name;
}
public class Family implements Member {
private int id;
private String name;
}
public class Club {
private String name; // primary key
}
public class Membership {
private Member member;
private Club club;
private Date expires;
}
If you had used IDs in your interface, you would either need to enforce cross-table uniqueness of key values, or maintain two separate Maps and forgo the nice polymorphic interface stuff.
Believe me, unless you are writing one-off, disposable applications, you want to avoid using IDs in your interface.
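A compact sketch of why this works, using Java records (Java 16+) so that `equals` and `hashCode` are generated automatically. `addMember` is a hypothetical helper, and `LocalDate` replaces `Date` for brevity. A `Person` and a `Family` that happen to share ID 1 remain distinct keys, because records of different types are never equal, so no cross-table uniqueness is needed. Note that records compare all components, not only the ID:

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Anything that can hold a membership.
interface Member {}

// Records generate equals() and hashCode() from their components.
record Person(int id, String name) implements Member {}
record Family(int id, String name) implements Member {}

record Membership(Member member, LocalDate expires) {}

class Club {
    private final String name;
    // Keyed by the domain object, so any Member implementation works.
    private final Map<Member, Membership> memberships = new HashMap<>();

    Club(String name) {
        this.name = name;
    }

    // Hypothetical helper for registering a member.
    void addMember(Member member, LocalDate expires) {
        memberships.put(member, new Membership(member, expires));
    }

    Membership getMembership(Member member) {
        return memberships.get(member);
    }
}
```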

Assuming this is a database ID or something used just for indexing (rather than something like an SSN), then in an ideal system, the presence of an ID is an implementation detail.
As an implementation detail, I would prefer to hide it in the interface of other domain objects. Thus, membership involves, fundamentally, individuals rather than numbers.
Of course, I'd make sure I implemented hashCode and equals() and documented well what they meant.
In that case, I would explicitly document that the equality of two Person objects is determined solely based on ID. This is somewhat a risky proposition, but makes code more readable if you can ensure it. I feel more comfortable making it when my objects are immutable, so I would not actually end up with two Person objects with the same ID but different names in the lifetime of the program.
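A minimal sketch of such a class, assuming an `int` database ID. The Javadoc states the equality contract explicitly, and the fields are `final`, so an instance's ID (and therefore its hash code) cannot change while it sits in a `HashMap`:

```java
import java.util.Objects;

/**
 * An immutable person. Two Person objects are equal if and only if they have
 * the same database ID; the name is deliberately ignored by equals/hashCode.
 */
final class Person {
    private final int id;
    private final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = Objects.requireNonNull(name);
    }

    int getId() { return id; }
    String getName() { return name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        return id == ((Person) o).id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id); // consistent with equals: depends on id only
    }
}
```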

I think the first case would be considered "purer" in the sense that the getMembership method might require more specific data from the person itself other than its id (Let's assume you do not know the internals of the getMembership method, even though this makes little sense since it's most likely in the same domain).
If it turns out that it actually requires data from the Person entity then it will not require a DAO or factory for the person in question.
This can be easily called if your language and/or ORM allows you to use proxy objects (and if you have a convenient way to create these proxies).
But let's be honest. If you're inquiring about some membership of a person, you most likely already have this Person instance in memory at hand when you call this method.
Further down the road in the "infrastructure land" there's also this notion about implementation details which Uri already mentioned while I was writing this answer (damn, that was fast bro'!). To be specific, what if you decided that this 'Person' concept suddenly has a composite primary key/identifier in the underlying database... Would you now use an identifier class? Perhaps use that proxy we were talking about?
TL;DR version
Using IDs is really easier in the short run, but if you're already using a solid ORM, I see no reason not to use proxies or some other means of expressing the object-oriented identity of an entity that doesn't leak implementation details.

If you are really practicing object oriented design, then you want to invoke the idea of information hiding. As soon as you start hanging internal field types of the person object in the public interface of the membership object's methods, you start forcing external developers (users) of your objects to start learning all kinds of information about what a person object is, and how it is stored, and what kind of ID it has.
Better yet, since a person can have memberships, why don't you just hang the "getMemberships" method onto the person class. It seems much more logical to ask a person which memberships they have, than to ask a "membership" which clubs a given person may belong to...
Edit - since the OP has updated to indicate that it is the membership itself that he is interested in, and not just used as a relation between Person and Club, I'm updating my answer.
Long story short, you are now asking the "Club" class you are defining to behave as a "club roster". A club has a roster; it isn't a roster. A roster could have several features, including ways to look up persons belonging to the club. In addition to looking up a person by their club ID, you might want to look them up by SSN, name, join date, etc. To me, this says there is a method on class "Club" called getRoster(), which returns a data structure that can look up all the persons in the club. Call it a collection. The question then becomes: can you use the methods on pre-existing collection classes to fulfill the needs you have defined so far, or do you need to create a custom collection subclass to provide the appropriate method to find the membership record?
Since your class hierarchy is most likely backed by a database, and you are probably talking about loading info out of the database, and don't necessarily want to get the entire collection just to get one membership, you may want to create a new class. This class could be called, as I said, "Roster". You would get an instance of it from the getRoster() call on class "Club". You would add "searching" methods to the class based on any search criteria you wanted that were "publicly available" information about the person: name, clubID, personID, SSN, etc.
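A rough sketch of such a class, kept in memory for brevity; a database-backed version would run a query in each search method instead of filtering a list. `Roster`, `findByPersonId` and `findByName` are hypothetical names, and `Person`/`Membership` are reduced to records (Java 16+):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

record Person(int id, String name) {}
record Membership(Person person, LocalDate expires) {}

// A club *has* a roster; the roster owns the lookup logic.
class Roster {
    private final List<Membership> memberships = new ArrayList<>();

    void add(Membership m) {
        memberships.add(m);
    }

    // Each search method exposes one "publicly available" criterion.
    Optional<Membership> findByPersonId(int personId) {
        return memberships.stream()
                .filter(m -> m.person().id() == personId)
                .findFirst();
    }

    List<Membership> findByName(String name) {
        return memberships.stream()
                .filter(m -> m.person().name().equals(name))
                .toList();
    }
}

class Club {
    private final Roster roster = new Roster();

    Roster getRoster() {
        return roster;
    }
}
```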
My original answer only applies if the "membership" is purely a relation to indicate which clubs which persons belong to.

IMO, it very much depends on the flow of the application: do you have the Person available when you want to get the Membership details? If yes, go with:
public Membership getMembership(Person person);
Also, I don't see any reason why the Club cannot keep track of memberships based on the Person's ID rather than the actual object; I think that would mean you don't need to implement the hashCode() and equals() methods (although implementing them is always good practice).
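A minimal sketch of that combination: the public method takes a `Person`, while the `HashMap` inside `Club` is keyed by the person's ID, so `Person` needs no `equals`/`hashCode` for the lookup to work. The `addMembership` method and the simplified `Person`/`Membership` classes are illustrative assumptions:

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

class Person {
    private final int id; // primary key
    private final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
}

class Membership {
    final Person person;
    final LocalDate expires;

    Membership(Person person, LocalDate expires) {
        this.person = person;
        this.expires = expires;
    }
}

class Club {
    // Implementation detail: keyed by ID, never exposed to callers.
    private final Map<Integer, Membership> membershipsByPersonId = new HashMap<>();

    // Hypothetical helper for registering a membership.
    void addMembership(Membership m) {
        membershipsByPersonId.put(m.person.getId(), m);
    }

    // Public API speaks in domain objects, not IDs.
    public Membership getMembership(Person person) {
        return membershipsByPersonId.get(person.getId());
    }
}
```

If the key type later changes (say, to a composite key), only `Club`'s internals change; callers keep passing a `Person`.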
As Uri said, you should document the declaration that two Person objects are equal if their IDs are equal.

Whoa. Back up a sec here. The getMembership() method doesn't belong in Club. It belongs to the set of all memberships, which you haven't implemented.

I would probably use IDs. Why? By taking IDs, I'm making safer assumptions about the caller.
If I have an ID, how much work is it to get the Person? Might be 'easy', but it does require hitting a datastore, which is slow...
If I have Person object, how much work is it to get the ID? Simple member access. Fast and available.

As described by others: use the object.
I work on a system where we had some old code that used int to represent transaction ids. Guess what? We started running out of ids because we used int.
Changing to long or BigInteger proved tricky because people had become very inventive with naming. Some used
int tranNum
some used
int transactionNumber
some used
int trannNum
(complete with spelling mistakes).
Some people got really inventive...
It was a mess and sorting it out was a nightmare. I ended up going through all of the code manually and converting it to a TransactionNumber object.
Hide the details wherever possible.
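The answer doesn't show its `TransactionNumber` class, so the following is only a guess at what such a wrapper might look like. Callers see a type rather than a raw number, so widening the underlying storage from `int` to `long` (as here) touches one class instead of every `tranNum`/`transactionNumber` variable:

```java
/** Hypothetical value object wrapping a transaction identifier. */
final class TransactionNumber implements Comparable<TransactionNumber> {
    // The representation is private; changing it later affects only this class.
    private final long value;

    private TransactionNumber(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative transaction number: " + value);
        }
        this.value = value;
    }

    static TransactionNumber of(long value) {
        return new TransactionNumber(value);
    }

    TransactionNumber next() {
        return new TransactionNumber(Math.addExact(value, 1)); // throws on overflow
    }

    @Override
    public int compareTo(TransactionNumber other) {
        return Long.compare(value, other.value);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TransactionNumber)) return false;
        return ((TransactionNumber) o).value == value;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(value);
    }

    @Override
    public String toString() {
        return Long.toString(value);
    }
}
```

For example, `TransactionNumber.of(Integer.MAX_VALUE).next()` yields 2147483648, a value an `int`-based ID could not represent.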

I would typically stick with less is more. The less information required to invoke your method the better. If you know the ID, only require the ID.
If you want, provide extra overloads which accept extra parameters, such as the entire class.

If you already have the object, there's no reason to pull out the ID to get a hash key.
As long as the IDs are always unique, implement hashCode() to return the ID, and implement equals() as well.
Odds are every time you'll need the Membership, you'll already have the Person, so it saves code and confusion later.

First of all I'd put any getters of such nature inside a DAO (and not on the model). Then I'd use the entity itself as a parameter, and what happens inside the method is an implementation detail.

Unless there's a significant benefit derived elsewhere, it can be said that keys in a map should be single-valued things, if at all possible. That said, by paying attention to equals() and hashCode() you can make any object work as a key, but equals() and hashCode() aren't very pleasant things to have to pay attention to. You'll be happier sticking to IDs as keys.

Actually, what I would do is look it up by ID, but after refactoring the original design a bit:
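public class Person {
private int id; // primary key
private String name;
public int getId() { return id; }
}
public class Club {
private String name; // primary key
private Collection<Membership> memberships;
public Membership getMembershipByPersonId(int id) {
// linear scan; a Map<Integer, Membership> would give constant-time lookup
for (Membership m : memberships) {
if (m.getPerson().getId() == id) {
return m;
}
}
return null;
}
}
public class Membership {
private Date expires;
private Person person;
public Person getPerson() { return person; }
}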
or
public class Person {
private int id; // primary key
private String name;
private Membership membership;
public int getId() { return id; }
public Membership getMembership() { return membership; }
}
public class Club {
private String name; // primary key
private Collection<Person> persons;
public Person getPersonById(int id) {
for (Person p : persons) {
if (p.getId() == id) {
return p;
}
}
return null;
}
}
public class Membership {
private Date expires;
}

Extending JPA entity data at runtime

I need to allow client users to extend the data contained by a JPA entity at runtime. In other words I need to add a virtual column to the entity table at runtime. This virtual column will only be applicable to certain data rows and there could possibly be quite a few of these virtual columns. As such I don't want to create an actual additional column in the database, but rather I want to make use of additional entities that represent these virtual columns.
As an example, consider the following situation. I have a Company entity which has a field labelled Owner, which contains a reference to the Owner of the Company. At runtime a client user decides that all Companies that belong to a specific Owner should have the extra field labelled ContactDetails.
My preliminary design uses two additional entities to accomplish this. The first basically represents the virtual column and contains information such as the field name and type of value expected. The other represents the actual data and connects an entity row to a virtual column. For example, the first entity might contain the data "ContactDetails" while the second entity contains say "555-5555."
Is this the right way to go about doing this? Is there a better alternative? Also, what would be the easiest way to automatically load this data when the original entity is loaded? I want my DAO call to return the entity together with its extensions.
EDIT: I changed the example from a field labelled Type which could be a Partner or a Customer to the present version as it was confusing.

Perhaps a simpler alternative could be to add a CLOB column to each Company and store the extensions as XML. There is a different set of tradeoffs here compared to your solution, but as long as the extra data doesn't need to be SQL-accessible (no indexes, foreign keys and so on), it will probably be simpler than what you are doing now.
It also means that if you have some fancy logic regarding the extra data, you would need to implement it differently. For example, if you need a list of all possible extension types, you would have to maintain it separately. Or if you need searching capabilities (find customer by phone number), you will require Lucene or a similar solution.
I can elaborate more if you are interested.
EDIT:
To enable searching you would want something like Lucene, which is a great engine for free-text search on arbitrary data. There is also Hibernate Search, which integrates Lucene directly with Hibernate using annotations and such; I haven't used it, but I've heard good things about it.
For fetching/writing/accessing data you are basically dealing with XML, so any XML technique should apply. The best approach really depends on the actual content and how it is going to be used. I would suggest looking into XPath for data access, and maybe into defining your own Hibernate UserType so that all access is encapsulated in a class and not just a plain String.
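As a small illustration of the XPath route, assume the CLOB holds a hypothetical document such as `<extensions><ContactDetails>555-5555</ContactDetails></extensions>` (the answer doesn't specify a schema). The JDK's built-in `javax.xml.xpath` API can then read a single value out of the stored string. `CompanyExtensions` is an illustrative name; a wrapper like this is the kind of class a custom Hibernate `UserType` could map the column to:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CompanyExtensions {
    private final String xml; // contents of the hypothetical CLOB column

    CompanyExtensions(String xml) {
        this.xml = xml;
    }

    /** Returns the text of the named extension element, or "" if it is absent. */
    String get(String extensionName) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // extensionName is spliced into the expression, so it must be a valid element name.
            return xpath.evaluate("/extensions/" + extensionName,
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot read extension " + extensionName, e);
        }
    }
}
```

This parses the XML on every call, which is fine for a sketch; a real wrapper would probably parse once and cache the DOM.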

I've run into more problems than I hoped I would and as such I decided to dumb down the requirements for my first iteration. I'm currently trying to allow such Extensions only on the entire Company entity, in other words, I'm dropping the whole Owner requirement. So the problem could be rephrased as "How can I add virtual columns (entries in another entity that act like an additional column) to an entity at runtime?"
My current implementation is as follows (irrelevant parts filtered out):
@Entity
class Company {
// The set of Extension definitions, for example "Location"
@Transient
public Set<Extension> getExtensions() { .. }
// The actual entry, for example "Atlanta"
@OneToMany(fetch = FetchType.EAGER)
@JoinColumn(name = "companyId")
public Set<ExtensionEntry> getExtensionEntries() { .. }
}
@Entity
class Extension {
public String getLabel() { .. }
public ValueType getValueType() { .. } // String, Boolean, Date, etc.
}
@Entity
class ExtensionEntry {
@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "extensionId")
public Extension getExtension() { .. }
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "companyId", insertable = false, updatable = false)
public Company getCompany() { .. }
public String getValueAsString() { .. }
}
The implementation as is allows me to load a Company entity, and Hibernate will ensure that all its ExtensionEntries are also loaded and that I can access the Extensions corresponding to those ExtensionEntries. In other words, if I wanted to, for example, display this additional information on a web page, I could access all of the required information as follows:
Company company = findCompany();
for (ExtensionEntry extensionEntry : company.getExtensionEntries()) {
String label = extensionEntry.getExtension().getLabel();
String value = extensionEntry.getValueAsString();
}
There are a number of problems with this, however. Firstly, when using FetchType.EAGER with an @OneToMany, Hibernate uses an outer join and as such will return duplicate Companies (one for each ExtensionEntry). This can be solved by using Criteria.DISTINCT_ROOT_ENTITY, but that in turn causes errors in my pagination and as such is an unacceptable answer. The alternative is to change the FetchType to LAZY, but that means that I will always have to load ExtensionEntries "manually". As far as I understand, if, for example, I loaded a List of 100 Companies, I'd have to loop over them and query each one, generating 100 SQL statements, which isn't acceptable performance-wise.
The other problem I have is that, ideally, I'd like to load all the Extensions whenever a Company is loaded. By that I mean that I'd like the @Transient getter named getExtensions() to return all the Extensions for any Company. The problem here is that there is no foreign key relation between Company and Extension, as an Extension isn't applicable to any single Company instance, but rather to all of them. Currently I can get past that with code like that below, but this will not work when accessing referenced entities (if, for example, I have an entity Employee with a reference to Company, the Company I retrieve through employee.getCompany() won't have the Extensions loaded):
List<Company> companies = findAllCompanies();
List<Extension> extensions = findAllExtensions();
for (Company company : companies) {
// Extensions are the same for all Companies, but I need them client side
company.setExtensions(extensions);
}
So that's where I'm at currently, and I have no idea how to proceed in order to get past these problems. I'm thinking that my entire design might be flawed, but I'm unsure how else to approach it.
Any and all ideas and suggestions are welcome!

The example with Company, Partner, and Customer is actually a good application for polymorphism, which JPA supports by means of inheritance: you have one of the following three strategies to choose from: single table, table per class, and joined. Your description sounds more like the joined strategy, but not necessarily.
You may also consider just a one-to-one (or one-to-zero) relationship instead. Then you will need such a relationship for each value of your virtual column, since its values represent different entities. Hence, you'll have a relationship with a Partner entity and another with a Customer entity, and either, both, or neither can be null.

Use the decorator pattern and hide your entity inside a decorator class.
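The answer is terse, so this is only one possible reading of it: a hypothetical `ExtendedCompany` decorator that implements the same interface as the entity, delegates the standard properties to the wrapped instance, and adds a map of runtime-defined extension values. `CompanyView`, `ExtendedCompany` and the map-based storage are all assumptions for illustration, and JPA annotations are omitted:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Interface shared by the entity and its decorator.
interface CompanyView {
    String getName();
    String getOwner();
}

// Stand-in for the JPA entity (annotations omitted).
class Company implements CompanyView {
    private final String name;
    private final String owner;

    Company(String name, String owner) {
        this.name = name;
        this.owner = owner;
    }

    public String getName() { return name; }
    public String getOwner() { return owner; }
}

// Decorator: forwards standard properties, adds runtime "virtual columns".
class ExtendedCompany implements CompanyView {
    private final CompanyView delegate;
    private final Map<String, String> extensions = new LinkedHashMap<>();

    ExtendedCompany(CompanyView delegate) {
        this.delegate = delegate;
    }

    public String getName() { return delegate.getName(); }
    public String getOwner() { return delegate.getOwner(); }

    void setExtension(String label, String value) {
        extensions.put(label, value);
    }

    Map<String, String> getExtensions() {
        return Collections.unmodifiableMap(extensions);
    }
}
```

The entity itself stays unchanged, while code that works against `CompanyView` can receive either the plain entity or the decorated one.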

Using the EAV pattern is, IMHO, a bad choice because of performance problems and problems with reporting (many joins). Digging for a solution, I found something else here: http://www.infoq.com/articles/hibernate-custom-fields
