Secret Handshake Anti-Pattern - java

I've just come across a pattern I've seen before, and wanted to get opinions on it. The code in question involves an interface like this:
public interface MyCrazyAnalyzer {
public void setOptions(AnalyzerOptions options);
public void setText(String text);
public void initialize();
public int getOccurances(String query);
}
And the expected usage is like this:
MyCrazyAnalyzer crazy = AnalyzerFactory.getAnalyzer();
crazy.setOptions(true);
crazy.initialize();
Map<String, Integer> results = new HashMap<String, Integer>();
for(String item : items) {
crazy.setText(item);
results.put(item, crazy.getOccurances);
}
There's reasons for some of this. The setText(...) and getOccurances(...) are there because there are multiple queries you might want to do after doing the same expensive analysis on the data, but this can be refactored to a result class.
Why I think this is so bad: the implementation is storing state in a way that isn't clearly indicated by the interface. I've also seen something similar involving an interface that required to call "prepareResult", then "getResult". Now, I can think of well designed code that employs some of these features. Hadoop Mapper interface extends JobConfigurable and Closeable, but I see a big difference because it's a framework that uses user code implementing those interfaces, versus a service that could have multiple implementations. I suppose anything related to including a "close" method that must be called is justified, since there isn't any other reasonable way to do it. In some cases, like JDBC, this is a consequence of a leaky abstraction, but in the two pieces of code I'm thinking of, it's pretty clearly a consequence of programmers hastily adding an interface to a spaghetti code class to clean it up.
My questions are:
Does everyone agree this is a poorly designed interface?
Is this a described anti-pattern?
Does this kind of initialization ever belong in an interface?
Does this only seem wrong to me because I have a preference for functional style and immutability?
If this is common enough to deserve a name, I suggest the "Secret Handshake" anti-pattern for an interface that forces you to call multiple methods in a particular order when the interface isn't inherently stateful (like a Collection).

Yes, it's an anti-pattern: Sequential coupling.
I'd refactor into Options - passed to the factory, and Results, returned from an analyseText() method.

I'd expect to see the AnalyzerFactory get passed the necessary params and do the construction itself; otherwise, what exactly is it doing?
Not sure if it does have a name, but it seems like it should :)
Yes, occassionally it's convenient (and the right level of abstraction) to have setters in your interface and expect classes to call them. I'd suggest that doing so requires extensive documentation of that fact.
Not really, no. A preference for immutability is certainly a good thing, and setter/bean based design can be the "right" choice sometimes too, but your given example is taking it too far.

I'm not sure whether it's a described anti-pattern but I totally agree this is a poorly designed interface. It leaves too much opportunity for error and violates at least one key principle: make your API hard to misuse.
Besides misuse, this API can also lead to hard-to-debug errors if multiple threads make use of the same instance.
Joshua Bloch actually has an excellent presentation (36m16s and 40m30s) on API design and he addresses this as one of the characteristics of a poorly designed API.

I can't see anything bad in here. setText() prepares the stage; after that, you have one or more calls to getOccurances(). Since setText() is so expensive, I can't think of any other way to do this.
getOccurances(text, query) would fix the "secret handshake" at a tremendous performance cost. You could try to cache text in getOccurances() and only update your internal caches when the text changes but that starts to look more and more like sacrifice to some OO principle. If a rule doesn't make sense, then don't apply it. Software developers have a brain for a reason.

One possible solution - use Fluent chaning. That avoids a class containing methods that need to called in a certain order. It's a lot like the builder pattern which ensures you don't read objects that are still in the middle of being populated.

Related

how can I return an object by giving only an interface [duplicate]

when programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List bar = foo();
List myList = bar instanceof LinkedList ? new ArrayList(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList, and decide later on that something else (e.g., LinkedLisk, skip list, etc.) is more appropriate you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type the opportunity is lost.
For instance, if I know that I will
primarily access the data in the list
randomly, a LinkedList would be bad.
But if my library function only
returns the interface, I simply don't
know. To be on the safe side I might
even need to copy the list explicitly
over to an ArrayList.
As everybody else has mentioned, you just mustn't care about how the library has implemented the functionality, to reduce coupling and increasing maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss about the best path to follow (a new method for this case or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
I guess that means in terms of interface versus concrete return is that if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than aribtrarily restrict them to a subset of that returned object's functionality - unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. Needless restrictions like this I compare to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility for callers of your method.
-Oisin
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate until 5th element vs return List:
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
In OO programming, we want to encapsulate as much as possible the data. Hide as much as possible the actual implementation, abstracting the types as high as possible.
In this context, I would answer only return what is meaningful. Does it makes sense at all for the return value to be the concrete class? Aka in your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level Interface. It's much more flexible, and allows you to change the backend
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract is your code, the less changes your are required to do when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, well that's a strong sign that you should probably return instead the concrete class. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of a ArrayList or LinkedList is an implementation detail of the library that should be considered for the most common use case of that library. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that would make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it is designed to be used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementiation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all to common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already and you should always use the interface. I however would just like to comment on
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
You use interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of its consumers are directly affected, as long as you implementation still does what your interface says it does.
To work with interfaces you would instantiate them like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
In general use the interface in all cases if you have no need of the functionality of the concrete class. Note that for lists, Java has added a RandomAccess marker class primarily to distinguish a common case where an algorithm may need to know if get(i) is constant time or not.
For uses of code, Michael above is right that being as generic as possible in the method parameters is often even more important. This is especially true when testing such a method.
You'll find (or have found) that as you return interfaces, they permeate through your code. e.g. you return an interface from method A and you have to then pass an interface to method B.
What you're doing is programming by contract, albeit in a limited fashion.
This gives you enormous scope to change implementations under the covers (provided these new objects fulfill the existing contracts/expected behaviours).
Given all of this, you have benefits in terms of choosing your implementation, and how you can substitute behaviours (including testing - using mocking, for example). In case you hadn't guessed, I'm all in favour of this and try to reduce to (or introduce) interfaces wherever possible.

Java - Is it good practice to return interface or abstract types from methods? [duplicate]

when programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List bar = foo();
List myList = bar instanceof LinkedList ? new ArrayList(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList, and decide later on that something else (e.g., LinkedLisk, skip list, etc.) is more appropriate you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type the opportunity is lost.
For instance, if I know that I will
primarily access the data in the list
randomly, a LinkedList would be bad.
But if my library function only
returns the interface, I simply don't
know. To be on the safe side I might
even need to copy the list explicitly
over to an ArrayList.
As everybody else has mentioned, you just mustn't care about how the library has implemented the functionality, to reduce coupling and increasing maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss about the best path to follow (a new method for this case or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
I guess that means in terms of interface versus concrete return is that if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than aribtrarily restrict them to a subset of that returned object's functionality - unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. Needless restrictions like this I compare to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility for callers of your method.
-Oisin
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate until 5th element vs return List:
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
In OO programming, we want to encapsulate as much as possible the data. Hide as much as possible the actual implementation, abstracting the types as high as possible.
In this context, I would answer only return what is meaningful. Does it makes sense at all for the return value to be the concrete class? Aka in your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level Interface. It's much more flexible, and allows you to change the backend
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract is your code, the less changes your are required to do when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, well that's a strong sign that you should probably return instead the concrete class. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of a ArrayList or LinkedList is an implementation detail of the library that should be considered for the most common use case of that library. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that would make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it is designed to be used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementiation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all to common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already and you should always use the interface. I however would just like to comment on
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
You use interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of its consumers are directly affected, as long as you implementation still does what your interface says it does.
To work with interfaces you would instantiate them like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
In general use the interface in all cases if you have no need of the functionality of the concrete class. Note that for lists, Java has added a RandomAccess marker class primarily to distinguish a common case where an algorithm may need to know if get(i) is constant time or not.
For uses of code, Michael above is right that being as generic as possible in the method parameters is often even more important. This is especially true when testing such a method.
You'll find (or have found) that as you return interfaces, they permeate through your code. e.g. you return an interface from method A and you have to then pass an interface to method B.
What you're doing is programming by contract, albeit in a limited fashion.
This gives you enormous scope to change implementations under the covers (provided these new objects fulfill the existing contracts/expected behaviours).
Given all of this, you have benefits in terms of choosing your implementation, and how you can substitute behaviours (including testing - using mocking, for example). In case you hadn't guessed, I'm all in favour of this and try to reduce to (or introduce) interfaces wherever possible.

What's wrong with returning this?

At the company I work for there's a document describing good practices that we should adhere to in Java. One of them is to avoid methods that return this, like for example in:
class Properties {
public Properties add(String k, String v) {
//store (k,v) somewhere
return this;
}
}
I would have such a class so that I'm able to write:
properties.add("name", "john").add("role","swd"). ...
I've seen such idiom many times, like in StringBuilder and don't find anything wrong with it.
Their argumentation is :
... can be the source of synchronization problems or failed expectations about the states of target objects.
I can't think of a situation where this could be true, can any of you give me an example?
EDIT The document doesn't specify anything about mutability, so I don't see the diference between chaining the calls and doing:
properties.add("name", "john");
properties.add("role", "swd");
I'll try to get in touch with the originators, but I wanted to do it with my guns loaded, thats' why I posted the question.
SOLVED: I got to talk with one of the authors, his original intention was apparently to avoid releasing objects that are not yet ready, like in a Builder pattern, and explained that if a context switch happens between calls, the object could be in an invalid state. I argued that this had nothing to do with returning this since you could make the same mistake buy calling the methods one by one and had more to do with synchronizing the building process properly. He admitted the document could be more explicit and will revise it soon. Victory is mine/ours!
My guess is that they are against mutable state (and often are rightly so). If you are not designing fluent interfaces returning this but rather return a new immutable instance of the object with the changed state, you can avoid synchronization problems or have no "failed expectations about the states of target objects". This might explain their requirement.
The only serious basis for the practice is avoiding mutable objects; the criticism that it is "confusing" and leads to "failed expectations" is quite weak. One should never use an object without first getting familiar with its semantics, and enforcing constraints on the API just to cater for those who opt out of reading Javadoc is not a good practice at all— especially because, as you note, returning this to achieve a fluent API design is one of the standard approaches in Java, and indeed a very welcome one.
I think sometimes this approach can be really useful, for example in 'builder' pattern.
I can say that in my organization this kind of things is controlled by Sonar rules, and we don't have such a rule.
Another guess is that maybe the project was built on top of existing codebase and this is kind of legacy restriction.
So the only thing I can suggest here is to talk to the people who wrote this doc :)
Hope this helps
I think it's perfectly acceptable to use that pattern in some situations.
For example, as a Swing developer, I use GridBagLayout fairly frequently for its strengths and flexibility, but anyone who's ever used it (with it's partener in crime GridBagConstraints) knows that it can be quite verbose and not very readable.
A common workaround that I've seen online (and one that I use) is to subclass GridBagConstraints (GBConstraints) that has a setter for each different property, and each setter returns this. This allows for the developer to chain the different properties on an as-needed basis.
The resultant code is about 1/4 the size, and far more readable/maintainable, even to the casual developer who might not be familiar with using GridBagConstaints.

Dependency injection: templates (/generics) or virtual functions?

This is a question of curiosity about accepted coding practices. I'm (primarily) a Java developer, and have been increasingly making efforts to unit test my code. I've spent some time looking at how to write the most testable code, paying particular attention to Google's How to write untestable code guide (well worth a look, if you haven't seen it).
Naturally, I was arguing recently with a more C++-oriented friend about the advantages of each language's inheritance model, and I thought I'd pull out a trump card by saying how much harder C++ programmers made it to test their code by constantly forgetting the virtual keyword (for C++ers - this is the default in Java; you get rid of it using final).
I posted a code example that I thought would demonstrate the advantages of Java's model quite well (the full thing is over on GitHub). The short version:
class MyClassForTesting {
private final Database mDatabase;
private final Api mApi;
void myFunctionForTesting() {
for (User u : mDatabase.getUsers()) {
mRemoteApi.updateUserData(u);
}
}
MyClassForTesting ( Database usersDatabase, Api remoteApi) {
mDatabase = userDatabase;
mRemoteApi = remoteApi;
}
}
Regardless of the quality of what I've written here, the idea is that the class needs to make some (potentially quite expensive) calls to a database, and some API (maybe on a remote web server). myFunctionForTesting() doesn't have a return type, so how do you unit test this? In Java, I think the answer isn't too difficult - we mock:
/*** Tests ***/
/*
* This will record some stuff and we'll check it later to see that
* the things we expect really happened.
*/
ActionRecorder ar = new ActionRecorder();
/** Mock up some classes **/
Database mockedDatabase = new Database(ar) {
#Override
public Set<User> getUsers() {
ar.recordAction("got list of users");
/* Excuse my abuse of notation */
return new Set<User>( {new User("Jim"), new User("Kyle")} );
}
Database(ActionRecorder ar) {
this.ar = ar;
}
}
Api mockApi = new Api() {
#Override
public void updateUserData(User u) {
ar.recordAction("Updated user data for " + u.name());
}
Api(ActionRecorder ar) {
this.ar = ar;
}
}
/** Carry out the tests with the mocked up classes **/
MyClassForTesting testObj = new MyClassForTesting(mockDatabase, mockApi);
testObj.myFunctionForTesting();
// Check that it really fetches users from the database
assert ar.contains("got list of users");
// Check that it is checking the users we passed it
assert ar.contains("Updated user data for Jim");
assert ar.contains("Updated user data for Kyle");
By mocking up these classes, we inject the dependencies with our own light-weight versions that we can make assertions on for unit testing, and avoid making expensive, time-consuming calls to database/api-land. The designers of Database and Api don't have to be too aware that this is what we're going to do, and the designer of MyClassForTesting certainly doesn't have to know! This seems (to me) like a pretty good way to do things.
My C++ friend, however, retorted that this was a dreadful hack, and there's a good reason C++ won't let you do this! He then presented a solution based on Generics, which does much the same thing. For brevity's sake, I'll just list a part of the solution he gave, but again you can find the whole thing over on Github.
template<typename A, typename D>
class MyClassForTesting {
private:
A mApi;
D mDatabase;
public MyClassForTesting(D database, A api) {
mApi = api;
mDatabase = database;
}
...
};
Which would then be tested much like before, but with the important bits that get replaced shown below:
class MockDatabase : Database {
...
}
class MockApi : Api {
...
}
MyClassForTesting<MockApi, MockDatabase>
testingObj(MockApi(ar), MockDatabase(ar));
So my question is this: What's the preferred method? I always thought the polymorphism-based approach was better - and I see no reason it wouldn't be in Java - but is it normally considered better to use Generics than Virtualise everything in C++? What do you do in your code (assuming you do unit test) ?
I'm probably biased, but I'd say the C++ version is better. Among other things, polymorphism carries some cost. In this case, you're making your users pay that cost, even though they receive no direct benefit from it.
If, for example, you had a list of polymorphic objects, and want to manipulate all of them via the base class, that would justify using polymorphism. In this case, however, the polymorphism is being used for something the user never even sees. You've built in the ability to manipulate polymorphic objects, but never really used it -- for testing you'll only have mock objects, and for real use you'll only have real objects. There will never be a time that you have (for example) an array of database objects, some of which are mock databases and others of which are real databases.
This is also much more than just an efficiency issue (or at least a run-time efficiency issue). The relationships in your code should be meaningful. When somebody sees (public) inheritance, that should tell them something about the design. As you've outlined it in Java, however, the public inheritance relationship involved is basically a lie -- i.e. what he should know from it (that you're dealing with polymorphic descendants) is an outright falsehood. The C++ code, by contrast, correctly conveys the intent to the reader.
To an extent, I'm overstating the case there, of course. People who normally read Java are almost certainly well accustomed to the way inheritance is typically abused, so they don't see this as a lie at all. This is a bit of throwing out the baby with the bathwater though -- instead of seeing the "lie" for what it is, they've learned to completely ignore what inheritance really means (or just never knew, especially if they went to college where Java was the primary vehicle for teaching OOP). As I said, I'm probably somewhat biased, but to to me this makes (most) Java code much more difficult to understand. You basically have to be careful to ignore the basic principles of OOP, and get accustomed to its constant abuse.
Some key advice is "prefer composition to inheritence", which is what your MyClassForTesting has done with respect to the Database and Api. This is good C++ advice too: IIRC it is in Effective C++.
It is a bit rich for your friend to claim that using polymorphism is a "dreadful hack" but using templates is not. On what basis does (s)he claim that one is less hacky than the other? I see none, and I use both all the time in my C++ code.
I'd say the polymorphism approach (as you have done) is better. Consider that Database and Api might be interfaces. In that case you are explicitly declaring the API used by MyClassForTesting: someone can read the Api.java and Database.java files. And you are loosely coupling the modules: the Api and Database interfaces will naturally be the narrowest acceptable interfaces, much narrower than the public interface of any concerete class that implements them.
More importantly, you cannot create templated virtual functions. This makes it impossible to test functions in C++ which use templates, by using inheritance, and therefore testing by inheritance in C++ is unreliable as you cannot test all classes that way, and definitely not every use of a base class can be substituted with that of a derived class, especially w.r.t instantiating templates of them. Of course, templates introduce their own problems, but I think that's beyond the scope of the question.
You're throwing inheritance at the problem but really it's not the right solution- you only need to change between the mock and the real at compile time, not at run time. This fundamental fact makes templates the better option.
In C++, we don't forget the virtual keyword, we just don't need it, because run-time polymorphism should only occur when you need to vary the type at run-time. Else, you're firing a rocket launcher at a nail.

When should I return the Interface and when the concrete class?

when programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List bar = foo();
List myList = bar instanceof LinkedList ? new ArrayList(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList, and decide later on that something else (e.g., LinkedLisk, skip list, etc.) is more appropriate you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type the opportunity is lost.
For instance, if I know that I will
primarily access the data in the list
randomly, a LinkedList would be bad.
But if my library function only
returns the interface, I simply don't
know. To be on the safe side I might
even need to copy the list explicitly
over to an ArrayList.
As everybody else has mentioned, you just mustn't care about how the library has implemented the functionality, to reduce coupling and increasing maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss about the best path to follow (a new method for this case or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
I guess that means in terms of interface versus concrete return is that if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than aribtrarily restrict them to a subset of that returned object's functionality - unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. Needless restrictions like this I compare to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility for callers of your method.
-Oisin
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate until 5th element vs return List:
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
In OO programming, we want to encapsulate as much as possible the data. Hide as much as possible the actual implementation, abstracting the types as high as possible.
In this context, I would answer only return what is meaningful. Does it makes sense at all for the return value to be the concrete class? Aka in your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level Interface. It's much more flexible, and allows you to change the backend
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract is your code, the less changes your are required to do when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, well that's a strong sign that you should probably return instead the concrete class. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of a ArrayList or LinkedList is an implementation detail of the library that should be considered for the most common use case of that library. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that would make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it is designed to be used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementiation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all to common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already and you should always use the interface. I however would just like to comment on
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
You use interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of its consumers are directly affected, as long as you implementation still does what your interface says it does.
To work with interfaces you would instantiate them like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
In general use the interface in all cases if you have no need of the functionality of the concrete class. Note that for lists, Java has added a RandomAccess marker class primarily to distinguish a common case where an algorithm may need to know if get(i) is constant time or not.
For uses of code, Michael above is right that being as generic as possible in the method parameters is often even more important. This is especially true when testing such a method.
You'll find (or have found) that as you return interfaces, they permeate through your code. e.g. you return an interface from method A and you have to then pass an interface to method B.
What you're doing is programming by contract, albeit in a limited fashion.
This gives you enormous scope to change implementations under the covers (provided these new objects fulfill the existing contracts/expected behaviours).
Given all of this, you have benefits in terms of choosing your implementation, and how you can substitute behaviours (including testing - using mocking, for example). In case you hadn't guessed, I'm all in favour of this and try to reduce to (or introduce) interfaces wherever possible.

Categories