Collection as a metaphor for real world containers - java

I find modeling physical containers using collections very intuitive. I override/delegate add methods with added capacity constraints based on physical attributes such as volume of added elements, sort based on physical attributes, locate elements by using maps of position to element and so on.
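For illustration, a rough sketch of the kind of wrapper I mean (all the names here are made up):

import java.util.ArrayList;
import java.util.List;

interface HasVolume { double volume(); }

// Hypothetical "crate" with a volume limit, delegating to a List internally
// instead of extending a collection class.
class Crate<T extends HasVolume> {
    private final List<T> contents = new ArrayList<>();
    private final double capacityInLitres;
    private double usedVolume = 0.0;

    Crate(double capacityInLitres) { this.capacityInLitres = capacityInLitres; }

    boolean add(T item) {
        if (usedVolume + item.volume() > capacityInLitres) {
            return false; // the physical constraint wins over the collection contract
        }
        usedVolume += item.volume();
        return contents.add(item);
    }

    List<T> contents() { return List.copyOf(contents); } // read-only snapshot
}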
However, when I read the documentation of collection classes, I get the impression that it's not the intended use, that it's just a mathematical construct and a bounded queue is just meant to be constrained by the number of elements and so forth.
Indeed, I think that unless I'm able to model this collection coherently, I should perhaps not expose this class as a collection but only delegate to it internally. Opinions?

Many structures in software development do not have a physical counterpart. In fact, some structures and algorithms are quite abstract, and do not model objects directly in the physical world. So just because an object does not serve as a suitable model for physical objects in the real world does not necessarily mean it cannot be used effectively to solve problems within a computer program.

Indeed, I think that unless I'm able to model this collection coherently, I should perhaps not expose this class as a collection but only delegate to it internally. Opinions?
Firstly, you don't want to get too hung up with the modeling side of software engineering. UML style models (usually) serve primarily as a way of organizing and expressing the developer's high level ideas about how an application should be implemented. There is no need to have a strict one-to-one relationship between the classes in the model and the implementation classes in the application code.
Second, you don't want to get too hung up about modeling "real world" (i.e. physical) objects and their behavior. Most of the "objects" used in a typical application have no real connection with the real world. For example, a "folder" or "directory" is really little more than an analogy of the physical objects with the same names. There's typically no need for the computer concept to be constrained by the physical behavior of the real world objects.
Finally, there are a number of software engineering reasons why it is a bad idea to have your Java domain classes extend the standard collection types. For example:
The collections have generic behavior that is typically not appropriate to expose in a domain object. For instance, you typically don't want components of a domain object to be added and removed willy-nilly.
By extending a collection type, you are implicitly giving permission for some part of your application to treat domain objects as just lists or sets or whatever.
By extending collection classes, you would be hard-wiring implementation details into your domain APIs. For example, you would need to decide between extending ArrayList or LinkedList, and changing your mind would result (at least) in a binary API incompatibility ... and possibly worse.

Not entirely sure that I've understood you correctly. I gather that you want to know if you should expose the collection (subclassing) or wrap it (have a private field).
As Robert says, it really depends on the case. It's pretty much your choice. Nonetheless I'd say that in many cases the better choice is not to expose the collection, because the constraints define the object you are modelling and are not fully congruent with the underlying collection. In short: users of your object shouldn't need to know that they are dealing with a collection unless it really is a collection with some special property, e.g. it has all the properties of a collection but allows only a certain number of objects.
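To illustrate the "wrap rather than expose" option, a minimal sketch (all names invented):

import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// The toolbox is NOT a Set; callers only see domain operations,
// and the uniqueness and capacity constraints are internal details.
class Toolbox {
    private final Set<String> tools = new LinkedHashSet<>();
    private final int maxTools;

    Toolbox(int maxTools) { this.maxTools = maxTools; }

    boolean store(String tool) {
        if (tools.size() >= maxTools) {
            return false; // physical capacity reached
        }
        return tools.add(tool); // Set semantics give uniqueness for free
    }

    Set<String> inventory() { return Collections.unmodifiableSet(tools); }
}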

Pluggable Adapter as mentioned in the GOF

The related posts on Stack Overflow for this topic:
Post_1 and Post_2
The above posts are good, but I still could not get an answer to my confusion, hence I am putting it as a new post here.
My questions are based on the GoF's Elements of Reusable Object-Oriented Software book content about Pluggable Adapters (quoted after the questions below), hence I would appreciate it if the discussions/answers/comments focused on the existing examples from the GoF regarding Pluggable Adapters rather than other examples.
Q1) What do we mean by built-in interface adaptation?
Q2) How is a Pluggable Adapter special compared to usual Adapters? Usual Adapters also adapt one interface to another.
Q3) In both of the use cases, we see both methods of the extracted "Narrow Interface", GetChildren(Node) and CreateGraphicNode(Node), depending on Node. Node is internal to the Toolkit. Is Node the same as GraphicNode, and is the parameter passed to CreateGraphicNode only for populating the state (name, parentID, etc.) of an already created Node object?
As per the GoF (I have marked a few words/sentences in bold to emphasize the content related to my questions):
ObjectWorks\Smalltalk [Par90] uses the term
pluggable adapter to describe classes with built-in interface adaptation.
Consider a TreeDisplay widget that can display tree structures graphically.
If this were a special-purpose widget for use in just one application, then
we might require the objects that it displays to have a specific interface;
that is, all must descend from a Tree abstract class. But if we wanted to
make TreeDisplay more reusable (say we wanted to make it part of a toolkit
of useful widgets), then that requirement would be unreasonable.
Applications will define their own classes for tree structures. They
shouldn't be forced to use our Tree abstract class. Different tree
structures will have different interfaces.
Pluggable adapters. Let's look at three ways to implement pluggable adapters
for the TreeDisplay widget described earlier, which can lay out and display
a hierarchical structure automatically.
The first step, which is common to all three of the implementations discussed
here, is to find a "narrow" interface for Adaptee, that is, the smallest
subset of operations that lets us do the adaptation. A narrow interface
consisting of only a couple of operations is easier to adapt than an
interface with dozens of operations. For TreeDisplay, the adaptee is any
hierarchical structure. A minimalist interface might include two
operations, one that defines how to present a node in the hierarchical
structure graphically, and another that retrieves the node's children.
Then there are two use cases:
"Narrow Interface" made abstract and part of the TreeDisplay class
"Narrow Interface" extracted out as a separate interface, with the TreeDisplay class holding a composition of it
(There is a third approach, the parameterized adapter, but I am skipping it for simplicity; also, this third one is, I guess, more specific to Smalltalk.)
When we talk about the Adapter design pattern, we typically consider two preexisting APIs that we would like to integrate, but which don't match up because they were implemented at different times with different domains. An Adapter may need to do a lot of mapping from one API to the other, because neither API was designed with this sort of extensibility in mind.
But what if the Target API had been designed with future adaptations in mind? A Target API can simplify the job of future Adapters by minimizing assumptions and providing the narrowest possible interface for Adapters to implement. Note this design requires a priori planning. Unlike typical use cases for the Adapter pattern, you cannot insert a Pluggable Adapter between any two APIs. The Target API must have been designed to support pluggable adaptations.
Q1) This is what the GoF means by built-in interface adaptation: an interface is built into the Target API in order to support future adaptations.
Q2) As mentioned, this is a relatively unusual scenario for an Adapter, since the typical strength of the pattern is its ability to handle APIs that have no common design.
The GoF lists three different approaches to design a Target API for adaptation. The first two are recognizable as a couple of their Behavioral design patterns.
Template Method
Strategy
Closures (what Smalltalk calls code blocks)
Q3) Without getting caught up in details of the GoF's GUI examples, the basic idea behind designing what they call a "narrow interface" is to remove as much domain specificity as possible. In Java, the starting point for a domain-agnostic API would almost certainly be the functional interfaces.
A Target API with dependencies on these interfaces should be much simpler to adapt than an API built around domain-specific methods. The former allows for creation of Pluggable Adapters, while the latter would require a more typical Adapter with heavy mapping between APIs.
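As a rough Java sketch of that idea (TreeDisplay comes from the GoF example; everything else below is invented for illustration):

import java.util.List;
import java.util.function.Function;

// A Target designed for adaptation: it depends only on two "narrow" operations,
// supplied as functional interfaces, so any hierarchical structure can plug in.
class TreeDisplay<N> {
    private final Function<N, List<N>> childrenOf; // how to retrieve a node's children
    private final Function<N, String> labelOf;     // how to present a node

    TreeDisplay(Function<N, List<N>> childrenOf, Function<N, String> labelOf) {
        this.childrenOf = childrenOf;
        this.labelOf = labelOf;
    }

    void display(N root, int depth) {
        System.out.println("  ".repeat(depth) + labelOf.apply(root));
        for (N child : childrenOf.apply(root)) {
            display(child, depth + 1);
        }
    }
}

A client with its own tree structure then plugs in lambdas or method references instead of writing a full Adapter class, e.g. new TreeDisplay<MyNode>(MyNode::children, MyNode::name), where MyNode is whatever node type the application already has.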
Let me share a couple of thoughts.
First, since the question has been posted with the Smalltalk tag, I'll use the Smalltalk syntax which is less verbose (e.g. #children instead of GetChildren(Tree,Node), etc.)
As an introduction to this issue (which may be useful for some readers), let's say that (generic) frameworks need to use a generic language (e.g. #children). However, generic terms may not be natural for the specific object you are considering. For example, in the case of a File System, one usually has #files, #directories, etc., but may not have the selector #children. Even if adding these selectors won't kill anyone, you don't want to populate your classes with new "generic" selectors every time an "abstract" class imposes its naming conventions. In real life, if you do that, sooner or later you will end up having collisions with other frameworks for which the very same selector has a different meaning. This implies that every framework has the potential of producing some impedance (a.k.a. friction) with the objects that try to benefit from it. Well, adapters are meant to mitigate these side effects.
There are several ways to do this. One is making your framework pluggable. This means that you will not require the clients to implement specific behavior. Instead you will ask the clients to provide a selector or a block whose evaluation will produce the required behavior.
In the Directory example, if your class Directory happens to implement, say #entities, then instead of creating #children as a synonym, you will tell the appropriate class in the framework something like childrenSelector: #entities. The object receiving this method will then "plug" (remember) that it has to send you #entities when looking for children. If you don't have such a method, you still can provide the required behavior using a block that does what's needed. In our example the block would look like
childrenSelector: [self directories, self files].
(Side note: the pluggable framework could provide a synonym #childrenBlock: so as to make its interface more friendly. Alternatively, it could provide a more general selector such as childrenGetter:, etc.)
The receiver will now keep the block in its childrenGetter ivar and will evaluate it every time it needs the client's children.
Another solution one might want to consider consists in requiring the client to subclass an abstract class. This has the advantage of exposing very clearly the client's behavior. Note however that this solution has some drawbacks because, in Smalltalk, you can only inherit from one parent. So, imposing the superclass may result in an undesirable (or even unfeasible) constraint.
The other option you mention consists in adding one indirection to the previous one: instead of subclassing the main "object" you offer an abstract superclass for subclassing the behavior your object needs to adapt. This is similar to the first approach in that you don't need to change the client, except that this time you put the adapted protocol in a class by itself. This way, instead of plugging several parameters into the framework, you put them all in an object and pass (or "plug") that object to the framework. Note that these adapting objects act as wrappers in that they know the real thing, and know how to deal with it for translating the few messages the framework needs to send. In general, the use of wrappers provides a lot of flexibility at the cost of populating your system with more classes (which entails the risk of duplicated hierarchies). Moreover, wrapping many objects might impact the performance of your system. Note by the way that GraphicNode also looks like a wrapper of the intrinsic/actual Node.
I'm not sure I've answered your question, but since you asked me to somehow expand my comment, I've happily tried so.
Q1) Interface adaptation just means adapting one interface to implement another, i.e., what adapters are for. I'm not sure what they mean by "built-in", but it sounds like a specific feature of Smalltalk, with which I'm not familiar.
Q2) A "Pluggable Adapter" is an adapter class that implements the target interface by accepting implementations for its individual methods as constructor arguments. The purpose is to allow adapters to be expressed succinctly. In all cases, this requires the target interface to be small, and it usually requires some kind of language facility for succinctly providing a computation - a lambda or delegate or similar. In Java, the facility for inline classes and functional interfaces means that a specific adapter class that accepts lambda arguments is unnecessary.
Pluggable adapters are a convenience. They are not important beyond that. However...
Q3) The quoted text isn't about pluggable adapters, and neither of the two use cases has a pluggable adapter in it. That part is about the Interface Segregation Principle, and it is important.
In the first example, TreeDisplay is subclassed. The actual adapter interface is the subset of methods in TreeDisplay that require implementation. This is less than ideal, because there is no concise definition of the interface that the adapter must implement, and the DirectoryTreeDisplay cannot simultaneously implement another similar target interface. Also, such implementations tend to interact with the superclass in complex ways.
In the second example, TreeDisplay comes with a TreeAccessorDelegate interface that captures the requirements for things it can display. This is a small interface that can be easily implemented in a variety of ways, including by a pluggable adapter (although the example DirectoryBrowser is not pluggable). Also, interface adaptation does not have to be the sole purpose of the adapter class. You see that the DirectoryBrowser class implements methods that have nothing to do with tree display.
The Node type in these examples would be an empty/small interface, i.e., another adapter target, or even a generic type argument so that no adaptation is required. I think this design could be improved, actually, by making Node the only adaptation target.

How does MicroStream (de)serialization work?

I was wondering how the serialization of MicroStream works in detail.
Since it is described as "Super-Fast", it has to rely on code generation, right? Or is it based on reflection?
How would it perform in comparison to Protobuf serialization, which relies on code generation that reads directly out of the Java fields and writes them into a ByteBuffer, and vice versa?
Using reflection would drastically decrease the performance when serializing objects on a huge scale, wouldn't it?
I'm looking for a fast way to transmit and persist objects for a multiplayer-game and every millisecond counts. :)
Thanks in advance!
PS: Since I don't have enough reputation, I can not create the "microstream"-tag. https://microstream.one/
I am the lead developer of MicroStream.
(This is not an alias account. I really just created it. I've been reading StackOverflow for 10 years or so but never had a reason to create an account. Until now.)
On every initialization, MicroStream analyzes the current runtime's versions of all required entity and value type classes and derives optimized metadata from them.
The same is done when encountering a class at runtime that was unknown so far.
The analysis is done per reflection, but since it is only done once for every handled class, the reflection performance cost is negligible.
The actual storing and loading or serialization and deserialization is done via optimized framework code based on the created metadata.
If a class layout changes, the type analysis creates a mapping from the field layout that the class' instances are stored in to that of the current class.
Automatically if possible (unambiguous changes or via some configurable heuristics), otherwise via a user-provided mapping. Performance stays the same since the JVM does not care if it (simplified speaking) copies a loaded value #3 to position #3 or to position #5. It's all in the metadata.
ByteBuffers are used, more precisely direct ByteBuffers, but only as an anchor for off-heap memory to work on via direct "Unsafe" low-level operations. If you are not familiar with "Unsafe" operations, a short and simple notion is: "It's as direct and fast as C++ code.". You can do anything you want very fast and close to memory, but you are also responsible for everything. For more details, google "sun.misc.Unsafe".
No code is generated. No byte code hacking, tacit replacement of instances by proxies or similar monkey business is used. On the technical level, it's just a Java library (including "Unsafe" usage), but with a lot of properly devised logic.
As a side note: reflection is not as slow as it is commonly considered to be. Not any more. It was, but it has been optimized pretty much in some past Java version(s?).
It's only slow if every operation has to do all the class analysis, field lookups, etc. anew (which an awful lot of frameworks seem to do because they are just badly written). If the fields are collected (set accessible, etc.) once and then cached, reflection is actually surprisingly fast.
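A hedged illustration of the difference (not MicroStream's actual code, just the general caching idea in plain Java):

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Do the reflective analysis once per class, cache the Fields,
// then reuse them for every instance instead of looking them up per call.
class FieldCache {
    private final List<Field> fields = new ArrayList<>();

    FieldCache(Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            f.setAccessible(true); // done once, not on every serialization
            fields.add(f);
        }
    }

    Object[] readValues(Object instance) throws IllegalAccessException {
        Object[] values = new Object[fields.size()];
        for (int i = 0; i < fields.size(); i++) {
            values[i] = fields.get(i).get(instance); // cheap once Fields are cached
        }
        return values;
    }
}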
Regarding the comparison to Protobuf-Serialization:
I can't say anything specific about it since I haven't used Protocol Buffers and I don't know how it works internally.
As usual with complex technologies, a truly meaningful comparison might be pretty difficult to do since different technologies have different optimization priorities and limitations.
Most serialization approaches give up referential consistency but only store "data" (i.e. if two objects reference a third, deserialization will create TWO instances of that third object.
Like this: A->C<-B ==serialization==> A->C1 B->C2.
This basically breaks/ruins/destroys object graphs and makes serialization of cyclic graphs impossible, since it creates an endlessly cascading replication. See JSON serialization, for example. Funny stuff.)
Even Brian Goetz' draft for a Java "Serialization 2.0" includes that limitation (see "Limitations" at http://cr.openjdk.java.net/~briangoetz/amber/serialization.html) (and another one which breaks the separation of concerns).
MicroStream does not have that limitation. It handles arbitrary object graphs properly without ruining their references.
Keeping referential consistency intact is by far not "trying to do too much", as he writes. It is "doing it properly". One just has to know how to do it properly. And it even is rather trivial if done correctly.
So, depending on how many limitations Protobuf-Serialization has ("pacts with the devil"), it might be hardly or even not at all comparable to MicroStream in general.
Of course, you can always create some performance comparison tests for your particular requirements and see which technology suits you best. Just make sure you are aware of the limitations a certain technology imposes on you (ruined referential consistency, forbidden types, required annotations, required default constructor / getters / setters, etc.).
MicroStream has none*.
(*) within reason: Serializing/storing system-internals (e.g. Thread) or non-entities (like lambdas or proxy instances) is, while technically possible, intentionally excluded.

Lightweight collection w/ Set contract

I'm developing an application requiring lots of objects in memory. One of the largest structures is of the type
Map<String,Set<OwnObject>> (with Set as HashSet)
with OwnObject being a heavyweight object representing records in a database. The application works, but has a rather large memory footprint. Reading this Java Specialists newsletter from 2001, I've analyzed the memory usage of my large structure above. The HashSet uses a HashMap in the back, which in turn is quite a heavyweight object, and I guess this is where most of my additional memory goes.
Trying to optimize the memory usage of the structure, I experimented with multiple versions:
Map<String,List<OwnObject>> (with List as ArrayList)
Map<String,OwnObject[]>
Both work, and both are much leaner than the version using the Set<>. However, I'd like to keep the Set contract in place (uniqueness of entries).
One way would be to implement the logic myself. I could extend ArrayList and ensure the contract in add().
Are there frameworks implementing lightweight collections that honor the Set contract? Or do I miss something from the Java collections that I could use without ensuring uniqueness by myself?
The solution I implemented is the following:
Map<String,OwnObject[]>
Adding to and removing from the array was done using Arrays.binarySearch() and two slicing System.arraycopy() calls, by which sorting and uniqueness are maintained as a side effect.
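Roughly, the insert looks like this (a sketch of the idea rather than the exact code, assuming OwnObject implements Comparable<OwnObject>):

import java.util.Arrays;

// binarySearch tells us both "already present" (Set contract) and "where to insert" (kept sorted).
static OwnObject[] insert(OwnObject[] array, OwnObject element) {
    int pos = Arrays.binarySearch(array, element);
    if (pos >= 0) {
        return array; // already present, nothing to do
    }
    int insertAt = -(pos + 1);
    OwnObject[] result = new OwnObject[array.length + 1];
    System.arraycopy(array, 0, result, 0, insertAt); // left slice
    result[insertAt] = element;
    System.arraycopy(array, insertAt, result, insertAt + 1, array.length - insertAt); // right slice
    return result;
}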

Data Model Evolution

When writing code I am seeing requirements to change data models (e.g. adding/changing/removing data members from a class). When these data models belong to an interface, it seems difficult to change them without breaking the existing client code. So I am wondering if there is any best practice for designing interfaces/data models in a way that minimizes the impact during evolution.
The closest thing I can find from google is data contract versioning. But that seems to be a .net specific topic. I am wondering if the same practice applies to the Java world, or there is a different or generic way to deal with data model evolution.
Thanks
There are some tools which can help, have a look at LiquiBase.
This article on developerWorks gives a good overview.
There are no easy answers to this in either the Java or data modeling domains.
Some changes are upwards compatible; e.g. addition of new methods, optional fields, subclasses and so on.
Some changes are not compatible, but can be handled using a simple transformation; e.g. addition of a mandatory field could be supported by a transformation that adds an extra constructor argument (sketched below).
Some changes unavoidably require major programmer intervention.
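For instance, a hypothetical sketch of the constructor-argument case mentioned above (names are made up):

// The evolved model makes "country" a mandatory field.
class Customer {
    private final String name;
    private final String country;

    // New canonical constructor with the extra mandatory argument.
    Customer(String name, String country) {
        this.name = name;
        this.country = country;
    }

    // Transformation shim: the old signature is kept and supplies a default,
    // so existing client code keeps compiling while it migrates.
    Customer(String name) {
        this(name, "UNKNOWN");
    }
}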
Another point to note is that the problem gets a lot harder when the data corresponding to the data models is persistent, and cannot be thrown away when the data model changes. This is referred to as the "schema evolution" problem, and I believe that it has been proven that there is no general solution.

How to avoid having very large objects with Domain Driven Design

We are following Domain Driven Design for the implementation of a large website.
However by putting the behaviour on the domain objects we are ending up with some very large classes.
For example, on our WebsiteUser object, we have many, many methods - e.g. dealing with passwords, order history, refunds, customer segmentation. All of these methods are directly related to the user. Many of these methods delegate internally to other child objects, but this still results in some very large classes.
I'm keen to avoid exposing lots of child objects
e.g. user.getOrderHistory().getLatestOrder().
What other strategies can be used to avoid these problems?
The issues you are seeing aren't caused by Domain Driven Design, but rather by a lack of separation of concerns. Domain Driven Design isn't just about placing data and behavior together.
The first thing I would recommend is taking a day or so and reading Domain Driven Design Quickly available as a free download from Info-Q. This will provide an overview of the different types of domain objects: entities, value objects, services, repositories, and factories.
The second thing I would recommend is to go read up on the Single Responsibility Principle.
The third thing I would recommend is that you begin to immerse yourself in Test Driven Development. While learning to design by writing tests first won't necessarily make your designs great, it tends to guide you toward loosely coupled designs and reveal design issues earlier.
In the example you provided, WebsiteUser definitely has way too many responsibilities. In fact, you may not have a need for WebsiteUser at all as users are generally represented by an ISecurityPrincipal.
It's a bit hard to suggest exactly how you should approach your design given the lack of business context, but I would first recommend doing some brainstorming by creating some index cards representing each of the major nouns you have in your system (e.g. Customer, Order, Receipt, Product, etc.). Write down candidate class names at the top, what responsibilities you feel are inherent to the class off to the left, and the classes it will collaborate with to the right. If some behavior doesn't feel like it belongs on any of the objects, it's probably a good service candidate (i.e. AuthenticationService). Spread the cards out on the table with your colleagues and discuss. Don't make too much of this though, as this is really only intended as a brainstorming design exercise. It can be a little easier to do this at times than using a whiteboard because you can move things around.
Long term, you should really pick up the book Domain Driven Design by Eric Evans. It's a big read, but well worth your time. I'd also recommend you pick up either
Agile Software Development, Principles, Patterns, and Practices or Agile Principles, Patterns, and Practices in C# depending on your language preference.
Although real humans have lots of responsibilities, you're heading towards the God object anti-pattern.
As others have hinted, you should extract those responsibilities into separate Repositories and/or Domain Services. E.g.:
SecurityService.Authenticate(credentials, customer)
OrderRepository.GetOrderHistoryFor(Customer)
RefundsService.StartRefundProcess(order)
Be specific with naming conventions (i.e. use OrderRepository or OrderService, instead of OrderManager)
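In Java terms, that extraction might look roughly like this (the interfaces and the domain types Credentials, Customer and Order are assumed to exist in your model):

import java.util.List;

// Each concern gets its own service/repository instead of living on WebsiteUser.
interface SecurityService {
    boolean authenticate(Credentials credentials, Customer customer);
}

interface OrderRepository {
    List<Order> getOrderHistoryFor(Customer customer);
}

interface RefundsService {
    void startRefundProcess(Order order);
}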
You've run into this problem because of convenience. i.e. it's convenient to treat a WebsiteUser as an aggregate root, and to access everything through it.
If you place more emphasis on clarity instead of convenience, it should help separate these concerns. Unfortunately, it does mean that team members must now be aware of the new Services.
Another way to think of it: just as Entities shouldn't perform their own persistence (which is why we use Repositories), your WebsiteUser should not handle Refunds/Segmentation/etc.
Hope that helps!
A very simple rule of thumb to follow is "most of the methods in your class HAVE to use most of the instance variables in your class" - if you follow this rule the classes will be automatically of the right size.
I ran into the same problem, and I found that using child "manager" objects was the best solution in our case.
For example, in your case, you might have:
User u = ...;
OrderHistoryManager histMan = user.getOrderHistoryManager();
Then you can use the histMan for anything you want. Obviously you thought of this, but I don't know why you want to avoid it. It separates concerns when you have objects which seem to do too much.
Think about it this way: if you had a "Human" object and you had to implement the chew() method, would you put it on the Human object or the Mouth child object?
You may want to consider inversing some things. For example, a Customer doesn't need to have an Order property (or a history of orders) - you can leave those out of the Customer class. So instead of
public void doSomethingWithOrders(Customer customer, Calendar from, Calendar to) {
    List<Order> orders = customer.getOrders(from, to);
    for (Order order : orders) {
        order.doSomething();
    }
}
you could instead do:
public void doSomethingWithOrders(Customer customer, Calendar from, Calendar to) {
    List<Order> orders = orderService.getOrders(customer, from, to);
    for (Order order : orders) {
        order.doSomething();
    }
}
This is 'looser' coupling, but still you can get all the orders belonging to a customer. I'm sure there's smarter people than me that have the right names and links referring to the above.
I believe that your problem is actually related to Bounded Contexts. From what I see, "dealing with passwords, order history, refunds, customer segmentation", each one of these can be a bounded context. Therefore, you might consider splitting your WebsiteUser into multiple entities, each one corresponding to a context. Some duplication may arise, but you gain focus on your domain and get rid of very large classes with multiple responsibilities.
