Pattern/Library for sending objects over network, keeping pointers - java

Let's say you have a Client and a Server that want to share/synchronize the same Models/Objects. The models point to each other, and you want them to keep pointing at the same objects after being sent/serialized between the client and the server. My current solution roughly looks like this:
class Person {
    static Map<Integer, Person> allPeople;
    int myDogId;

    static Person getPerson(int key) {
        return allPeople.get(key);
    }

    Dog getMyDog() {
        return Dog.getDog(myDogId);
    }
}

class Dog {
    static Map<Integer, Dog> allDogs;
    int myOwnersId;

    static Dog getDog(int key) {
        return allDogs.get(key);
    }

    Person getMyOwner() {
        return Person.getPerson(myOwnersId);
    }
}
But I'm not too satisfied with this solution, with the references being plain integer fields and so on. This should also be a pretty common problem. So what I'm looking for here is a name for this problem, a pattern, a common solution, or a library/framework.

There are two issues here.
Are you replicating the data in the Client and the Server (if so, why?) or does one, the other, or
a database agent hold the Model?
How does each agent access (its/the) model?
If the model is only held by one agent (Client, Server, Database), then the other agents
need a way to remotely query the model (e.g., object enumerators, getters and setters for various fields)
operating on abstract model entities (e.g, model element identifiers, which might be implemented
as integers as you have done).
Regardless of who holds the model (one or all), each model can be implemented naturally.
The normal implementation has each object simply refer to other objects using normal object references,
as if you had coded this without any thought of sharing between agents, and unlike what
you did.
You can associate an objectid with each object, as you have, but your application
code doesn't need to use it; it is only necessary when referencing a remote copy
of the model. Whether this objectid is associated with each object as a special
field, a hash table, or is computed on the fly is just an implementation detail.
One way to handle this is to compute the objectid on-the-fly. You can do this
if there is a canonical spanning tree over the entire model. In this case,
the objectid is "just" the path from root of the spanning tree to the location
of object. If you don't have a spanning tree or it is too expensive to compute,
you can assign objectids as objects are created.
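A rough sketch of the "assign objectids as objects are created" variant (the class and method names here are mine, purely for illustration): application code keeps ordinary object references, and the registry is only consulted when talking to the remote agent.

import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityRegistry {
    private final Map<Object, Integer> idsByObject = new IdentityHashMap<>();
    private final Map<Integer, Object> objectsById = new HashMap<>();
    private int nextId = 0;

    // Assign an objectid the first time an object is seen (e.g. when it is created).
    synchronized int idFor(Object o) {
        return idsByObject.computeIfAbsent(o, obj -> {
            int id = nextId++;
            objectsById.put(id, obj);
            return id;
        });
    }

    // Resolve an objectid received from the remote agent back to the local object.
    synchronized Object byId(int id) {
        return objectsById.get(id);
    }
}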
The real problem with a duplicated, distributed model, as you have suggested, is
keeping it up to date with both agents updating it. How do you prevent
one from creating an object (and assigning an objectid) at the same time
as the other, such that the objects being created are different but share the same objectid,
or are the same object with different objectids? You'll need remote locking
and signalling to keep the models in sync (this is the same problem as
"cache coherency" for multiple CPUs; just think of each object as acting like a cache line). The way it is generally solved
is to designate who holds the master copy (perhaps of the entire model,
perhaps of individual objects within the model) and then issue queries,
reads, reads-with-intent-to-modify, or writes to ensure that the
"unique" entire model gets updated.

The only solution I am aware of is to send the complete structure, i.e. Dogs and Persons over the network. Then they will end up pointing to the correct copy on the other side of the network. The implementation of this solution however depends on a lot of circumstances. For example when your inclusion relation defines a tree you can go at this problem differently than if it is a graph with cycles.
Have a look at this for more information.

I guess one can use the proxy pattern for this.
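For illustration only, a minimal sketch of that idea, reusing the question's Dog.getDog(): a lightweight proxy holds just the id and resolves the real object lazily - locally here, though the lookup could equally be a network call to the other agent.

class DogProxy {
    private final int dogId;
    private Dog resolved; // lazily resolved target

    DogProxy(int dogId) {
        this.dogId = dogId;
    }

    Dog get() {
        if (resolved == null) {
            resolved = Dog.getDog(dogId); // or a remote lookup on the other side
        }
        return resolved;
    }
}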

Related

DDD - Value Object flavor of an Entity

I've seen some DDD projects with value object representations of entities.
They usually appear like EmployeeDetail, EmployeeDescriptor, EmployeeRecord, etc. Sometimes they hold the entity ID, sometimes not.
Is that a pattern? If yes, does it have a name?
What are the use cases?
Are they value objects, parameter objects, or anything else?
Are they referenced in the domain model (as a property) or are they "floating" around just as parameters and return values of methods?
Going beyond...
I wonder if I can define any aggregate as an ID + BODY (detail, descriptor, etc) + METHODS (behavior).
public class Employee {
    private EmployeeID id;
    private EmployeeDetail detail; // the "body"
}
Could I design my aggregates like this to avoid code duplication when using this kind of object?
The immediate advantage of doing this is to avoid those methods with too many parameters in the aggregate factory method.
public class Employee {
    ...
    public static Employee from(EmployeeID id, EmployeeDetail detail) {...}
}
instead of
public class Employee {
    ...
    public static Employee from(EmployeeID id, + 10 Value Objects here) {...}
}
What do you think?
What you're proposing is the idiomatic (via case classes) approach to modeling an aggregate in Scala: you have an ID essentially pointing to a mutable container of an immutable object graph representing the state (and likely some static functions for defining the state transitions). You are moving away from the more traditional OOP conceptions of domain-driven design to the more FP conceptions (come to the dark side... ;) ).
If doing this, you'll typically want to partition the state so that operations on the aggregate will [as] rarely [as possible] change multiple branches of the state, which enables reuse of as much of the previous object graph as possible.
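A rough Java approximation of that shape (ContactInfo and PayrollInfo are assumed sub-branches, not from the question): the aggregate is an ID plus an immutable state object, and the state is partitioned so an update rebuilds only the branch it touches and reuses the rest.

record EmployeeID(String value) {}
record ContactInfo(String email, String phone) {}
record PayrollInfo(String iban, long salaryCents) {}
record EmployeeDetail(ContactInfo contact, PayrollInfo payroll) {}

final class Employee {
    private final EmployeeID id;   // stable identity
    private EmployeeDetail detail; // immutable state, swapped as a whole

    Employee(EmployeeID id, EmployeeDetail detail) {
        this.id = id;
        this.detail = detail;
    }

    // Changing the email rebuilds only the contact branch; the payroll branch is reused as-is.
    void changeEmail(String email) {
        detail = new EmployeeDetail(
                new ContactInfo(email, detail.contact().phone()),
                detail.payroll());
    }
}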
Could I design my aggregates like this to avoid code duplication when using this kind of object?
What you are proposing is representing the entire entity except its id as a 'bulky' value object. A concept or object's place in your domain (finding that involves defining your bounded contexts and their ubiquitous languages) dictates whether it is treated as a value object or an entity, not coding convenience.
However, if you go with your scheme as a general principle, you risk tangling unrelated data into a single value object. That leads to many conceptual and technical difficulties. Take updating an entity, for example. Entities are designed to evolve over their lifecycle in response to operations performed on them. Each operation updates only the relevant properties of the entity. With your solution, for any operation you have to construct a new value object (as value objects are defined to be immutable) as a replacement, potentially copying a lot of irrelevant data.
The examples you are citing are most likely entities with only one value object attribute.
OK - great question...
DDD Question Answered
The difference between an entity object and a value object comes down to perspective - and needs for the given situation.
Let's take a simple example...
An airplane flight to your favourite destination has...
Seats 1A, 10B, 21C available for you to book (entities)
3 of 22 seats available (value object)
The first reflects individually identifiable seat entities that could be filled.
The second reflects only that there are 3 seats available (a value object).
With the value object you are not concerned with which individual entities (seats) are available - just the total number.
It's not difficult to understand that it depends on who's asking and how much it matters.
On some flights you book a specific seat, and on others you just book a (any) seat on the plane.
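To make the distinction concrete, a small sketch (names are purely illustrative) of the same seat idea modeled both ways:

// As an entity: identity matters - two seats with identical attributes are still different seats.
class Seat {
    private final String seatNumber; // "1A", "10B", ... acts as the identity
    private boolean booked;

    Seat(String seatNumber) {
        this.seatNumber = seatNumber;
    }

    String seatNumber() { return seatNumber; }
    void book() { booked = true; }
}

// As a value object: only the value matters - immutable, equal by value.
record SeatsAvailable(int available, int total) {}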
General
Ask yourself a question! Do I care about the individual element or the totality?
NB. An entity (the plane) can treat its seats as entities and/or value objects, depending on the use case. Also worth noting, it can depend on more than one factor - cockpit seats are more likely to be entity seats, and passenger seats value objects.
I'm pretty sure I want the pilot seat to hold a qualified pilot, and the co-pilot seat a qualified co-pilot; but I don't really care that much where the passengers sit. Well, except that I want to make sure the emergency exit seats are occupied by passengers able to help exit the plane in an emergency.
No simple answer, but a set of pieces to think about, and to weigh for each situation and level of domain complexity.
Hope that explains some bits, happy to answer follow-up questions...

Domain Logic - Read Model

I have two aggregates.
Person {
    private personID personID;
    private nodeID nodeID; // belongs to node
}

Node {
    private nodeID nodeID; // node's id
    private nodeID parent; // parent node referenced by id
    public void assign(Person person);
}
Now I have domain logic for my person assigning service:
Person can be assigned to node "X", only if he belongs to node "Y" which is parent or great-grandfather or great-great-grandfather or... of node "X".
To find that out I would need to query the Read Model.
But I am in the Domain, so I can't just use my Read Model to query it.
I don't think I can just add a connection to the read model to my repository, since the repository is connected to my event store. Especially since the Read Model can be placed on another server and be a separate application.
What is the proper way to implement it?
The following is a constraint:
Person can be assigned to node "X", only if he belongs to node "Y"
which is parent or great-grandfather or great-great-grandfather or...
of node "X".
If it is a constraint that must be enforced, you can model the hierarchy in a separate aggregate on the write side (e.g., Graph) whose sole purpose is ensuring integrity.
The proper way to do this is to support ancestry checks in the command model. This is where you want to enforce the invariant, so the model needs to support this.
Tree structures often lead to performance problems if you need to be able to make unbounded ancestry checks. So you probably need to implement a performance optimization that improves these kinds of queries.
I see the following possibilities:
Use a data store that directly supports the queries you need. This may be difficult if you want to do ES.
Use snapshotting. This may or may not be feasible depending on your tree structures.
Use caching. This is similar to snapshotting, but stores the information in a cache instead of in the event store.
Use the read model. Be sure you understand the consequences, especially the asynchronous data propagation and the increased complexity. I'd only suggest this as a last resort, but YMMV.
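To illustrate the first option above (a write-side structure that directly supports the ancestry query), here is a minimal sketch; the names (NodeHierarchy, recordParent, isAncestor) are assumed, not taken from the question:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class NodeHierarchy {
    // child nodeID -> parent nodeID; maintained from node creation/move events
    private final Map<String, String> parentOf = new HashMap<>();

    void recordParent(String nodeId, String parentId) {
        parentOf.put(nodeId, parentId);
    }

    // True if ancestorId appears anywhere on the path from nodeId up to the root.
    boolean isAncestor(String ancestorId, String nodeId) {
        String current = parentOf.get(nodeId);
        while (current != null) {
            if (Objects.equals(current, ancestorId)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }
}

The command handler for assign(person) would then reject the command unless isAncestor(person's node, target node) holds.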

XStream - treat objects with the same identifier as one

I am working on a legacy system where XStream is being used to serialize objects in order to keep two databases in sync. A new object is first stored in one database, then the stored object is serialized and sent to be stored in the other database.
Up until recently, the structure of the object in question was like this:
public class Project {
    List<Milestone> milestones;
    [...]
}
But, after changes to the requirements, the structure is supposed to be like this:
public class Project {
    List<Goal> goals;
}

public class Goal {
    List<Milestone> milestones;
}
In order to keep the milestones of legacy data, which knew nothing about goals, the final structure of project was this:
public class Project {
    List<Goal> goals;
    List<Milestone> milestones;
}
So, there are two paths from a Project to a Milestone: one directly and one through a Goal. The problem occurs when this structure is deserialized and stored. When it is deserialized by XStream, the objects for the Milestones connected to the Project directly become different objects from the ones connected through Goals, even though they have the same id.
As long as Hibernate's Session#merge() was used to persist this object, it was no problem, since merge() doesn't care about the object identifiers as long as the db identifiers are the same.
But I can no longer use merge() for this purpose, and have to rely on Session#save() instead. And save() DOES care about the object identifiers! So now I get an org.hibernate.NonUniqueObjectException when trying to store the deserialized object.
I figure the least intrusive way to solve this is, if it's possible, to make XStream create 1 object per database id. But is this possible?
After some consideration, it is apparent to me that the problem is not XStream, as it has mechanisms for object references. The problem is another nifty "feature" of the project I'm working on - it has 2 versions of each domain class, one for communication with Hibernate, and one for "logic use" (don't ask me why...). In the conversion between these two versions (which basically moves values from one object to another), objects are new-ed uncritically, resulting in the same "Hibernate object" being transformed into multiple "Java objects". Then, I can't really blame XStream for not understanding that these should be the same :)
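For completeness, since the reference mechanism is mentioned above: a minimal sketch of XStream's ID-based reference mode, which keeps a shared object as a single instance across a serialization round trip (newer XStream versions may additionally require configuring allowed types before deserializing).

import com.thoughtworks.xstream.XStream;

class XStreamReferenceDemo {
    static Project roundTrip(Project project) {
        XStream xstream = new XStream();
        xstream.setMode(XStream.ID_REFERENCES); // write id/reference attributes instead of copies
        String xml = xstream.toXML(project);    // a shared Milestone is serialized once, then referenced
        return (Project) xstream.fromXML(xml);  // and comes back as a single Java object
    }
}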

Is a Data Transfer Object the same as a Value Object?

Is a Data Transfer Object the same as a Value Object or are they different? If they are different then where should we use a DTO and where should we use a VO?
The programming language we are talking about is Java, and the context is: there is a web application which fetches data from a database, processes it, and ultimately displays the processed information on the front-end.
A value object is a simple object whose equality isn't based on identity.
A data transfer object is an object used to transfer data between software application subsystems, usually between business layers and UI. It is focused just on plain data, so it doesn't have any behaviour.
A Data Transfer Object is a kludge for moving a bunch of data from one layer or tier to another, the goal is to minimize the number of calls back and forth by packing a bunch of stuff into the same data structure and sending it together. Some people also use it, like Michael points out in his post here, so that the classes used by one layer are not exposed to the layer calling it. When I refer to DTO as a kludge, I mean there's not a precise abstract concept getting implemented, it's a practical workaround for helping with communication between application layers.
A Value Object is something where we're only interested in its value, like a monetary amount, a date range, or a code from a lookup table. It does not have an identity, meaning you would not be concerned, if you had several of them, of keeping track of which is which, because they are not things in themselves.
Contrast Value Objects to things that do have a unique identity in your system, which are called Entities. If you have a system where it tracks a customer making a payment, the customer and the payment are entities, because they represent specific things, but the monetary amount on the payment is just a value, it doesn't have an existence by itself, as far as your system is concerned. How something relates to your system determines if it is a Value Object or an Entity.
Use a DTO at the boundary of your services if you don't want to send the actual domain object to the service's clients - this helps reduce dependencies between the client and the service.
Value objects are simply objects whose equality isn't based on identity, e.g. java.lang.Integer.
DTOs and value objects aren't really alternatives to each other.
They are different, but I've even used the two interchangeably in the past, which is wrong. I read that DTO (Data Transfer Object) was called a VO ( Value Object) in the first edition of the Core J2EE Patterns book, but wasn't able to find that reference.
A DTO, which I've sometimes called a Dumb Transfer Object to help me remember that it's a container and shouldn't have any business logic, is used to transport data between layers and tiers. It should just be an object with attributes and getters/setters.
A VO, however, is similar to a Java enum and represents a fixed set of data. A VO doesn't have object identity (the address of the object instance in memory); it is identified by its value and is immutable.
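A rough sketch of the two shapes described above (all names made up): a DTO is a dumb container carried between layers, while a value object is immutable and compared by value.

class EmployeePaymentDto {               // DTO: data only, no business logic
    private String employeeName;
    private String currency;
    private long amountCents;

    public String getEmployeeName() { return employeeName; }
    public void setEmployeeName(String employeeName) { this.employeeName = employeeName; }
    // ... further getters/setters for currency and amountCents
}

record Money(String currency, long amountCents) {} // value object: immutable, equals/hashCode by value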
Martin Fowler, talking about Data Transfer Objects (DTOs):
Many people in the Sun community use the term "Value Object" for this pattern. I use it to mean something else.
So the term "Value Object" has been used to mean DTO, but according to him (and the other posters), using it that way seems discouraged.
There is a good, detailed answer in Matthias Noback's article Is it a DTO or a Value Object?
In short a DTO:
Declares and enforces a schema for data: names and types
Offers no guarantees about correctness of values
A value object:
Wraps one or more values or value objects
Provides evidence of the correctness of these values
Maybe it's because of a lack of experience, but I would put it this way: it's a matter of scope.
DTO has the word "transfer" in it, so it means some parts of the system will communicate using it.
A value object has a smaller scope: you pass a set of data in a value object instead of in an array from one service to the other.
As far as I understood, neither of them is an "object whose equality isn't based on identity".

NoSQL Schemaless data and statically typed language

One of the key benefits of NoSQL data stores like MongoDB is that they're schemaless. With dynamically typed languages this seems to be a natural fit. You can receive some arbitrary JSON input, perform business logic on the known fields, and persist the whole thing without first having to define the object.
What if your choice of language is limited to the statically typed, say Java? How could I achieve the same level of flexibility?
A typical data flow looks like the following:
JSON input
Deserialize into a Java object to perform business logic
Serialize into BSON to persist in Mongo
where the deserialize-to-object step is necessary since you want to perform business logic with POJOs, not JSON strings. However, before I can deserialize the input into objects, I must define them first. What if the input contains additional fields undefined in the object? While they may not be used in the business logic, I may still want to be able to persist them. I have seen implementations where the undefined fields are put into a map, but I am not sure if that's the best approach. For one, the undefined fields may be complex objects as well.
Schemaless data doesn't necessarily mean structureless data; the fields are typically known in advance, and some type-safe pattern can be applied on top of it to avoid the Magic Container anti-pattern. But this is not always the case. Sometimes keys are entered by the user and cannot be known in advance.
I've used the Role Object Pattern several times to give coherence to a dynamic structure. I think it is well suited here for both cases.
The Role Object Pattern defines a way to access different views of an object. The canonical example being a User that can assume several roles such as Customer, Vendor, and Seller. Each of these views has different operations it can perform and can be accessed from any of the other views. Common fields are typically available at the interface level (especially userId(), or in your case toJson()).
Here's an example of using the pattern:
public void displayPage(User user) {
    display(user.getName());
    if (user.hasView(Customer.class))
        displayShoppingCart(user.getView(Customer.class));
    if (user.hasView(Seller.class))
        displayProducts(user.getView(Seller.class));
}
In the case of data with a known structure, you can have several views bringing different sets of keys into cohesive units. These different views can read the json data on construction.
In the case of data with a dynamic structure, an authoritative RawDataView can hold the data in its dynamic form (i.e. a Magic Container like a HashMap<String, Object>). This can be used to query the dynamic data. At the same time, type-safe wrappers can be created lazily and can delegate to the RawDataView to assist in program readability/maintainability:
public class Customer implements User {
    private final RawDataView data;

    public Customer(User source) {
        this.data = source.getView(RawDataView.class);
    }

    // All User views must specify this
    @Override
    public long id() {
        return data.getId();
    }

    @Override
    public <T extends User> T getView(Class<T> view) {
        // construct or look up the view
    }

    @Override
    public Json toJson() {
        return data.toJson();
    }

    //
    // Specific to Customer
    //
    public List<Item> shoppingCart() {
        return (List<Item>) data.getValue("items", List.class);
    }

    // etc....
}
I've had success with both of these approaches. Here are some extra pointers that I've discovered along the way:
Have a static structure to your data as much as possible. This makes things a lot easier to maintain. I had to break this rule and use the RawDataView approach when working on a legacy system. You may also have to break it with dynamically-entered user data as mentioned above. In that case, use a convention for non-dynamic field names, such as a leading underscore (_userId).
Have equals() and hashcode() implemented such that user.getView(A.class).equals(user.getView(B.class)) is always true for the same user.
Have a UserCore class that does all the heavy lifting of common code, such as creating views; performing common operations (like toJson()); returning common fields (like userId()); and implementing equals() and hashCode(). Have all views delegate to this core object.
Have an AbstractUserView that delegates to the UserCore and implements equals() and hashcode()
Use a type-safe heterogeneous container (like ClassToInstanceMap) for constructing/caching views.
Allow the existence of a view to be queried. This can be done with either a hasView() method or by having getView return Optional<T>
You can always have a class which provides both:
easy access to attributes you know about, with optional fallback cases to older formats (for example it can return "name" if it exists, or fall back to the older "name.first" + "name.last" if it doesn't (or some similar scenario))
easy access to unknown elements simulating the map interface
Whether you do a full validation or not, whether you allow extra undefined attributes or not depends on what you want to achieve. But I think that creating an abstraction which allows you either way of accessing the data is the best solution.
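A rough sketch of such a wrapper class (all names assumed): typed accessors for the attributes you know about, with a fallback to the older format, plus raw map-style access so extra, undefined fields can still be read and persisted.

import java.util.Collections;
import java.util.Map;

class UserDocument {
    private final Map<String, Object> raw;

    UserDocument(Map<String, Object> raw) {
        this.raw = raw;
    }

    // Known attribute, falling back to the older "name.first" + "name.last" shape.
    String name() {
        Object name = raw.get("name");
        if (name instanceof String) {
            return (String) name;
        }
        Object first = raw.get("name.first");
        Object last = raw.get("name.last");
        return (first != null && last != null) ? first + " " + last : null;
    }

    // Escape hatch: unknown attributes stay available and can be persisted as-is.
    Map<String, Object> asMap() {
        return Collections.unmodifiableMap(raw);
    }
}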
Hopefully over time, you'll get to the stage where your schema is pretty much stable and messing directly with the attributes is not needed anymore.
This is not well solved in Java due to the lack of dynamic types. One way it can be solved is by using Maps.
Map
The top-level object is a Map, and each value can again be a Map of objects.
This is not an elegant way, but it works in Java. An example: the SnakeYAML library for YAML allows traversal in this way.
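A minimal illustration of that nested-Map idea using only the JDK (no particular library assumed): each level of the document is a Map whose values may themselves be Maps.

import java.util.Map;

class MapTraversal {
    @SuppressWarnings("unchecked")
    static Object path(Map<String, Object> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;                       // path does not exist
            }
            current = ((Map<String, Object>) current).get(key);
        }
        return current;                            // e.g. path(doc, "address", "city")
    }
}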
